Zhengfeng Lai
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction Open
This study focuses on a challenging yet promising task, Text-to-Sounding-Video (T2SV) generation, which aims to generate a video with synchronized audio from text conditions while ensuring that both modalities are aligned with the text. Despit…
ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering Open
Precisely evaluating semantic alignment between text prompts and generated videos remains a challenge in Text-to-Video (T2V) Generation. Existing text-to-video alignment metrics like CLIPScore only generate coarse-grained scores without fi…
STIV: Scalable Text and Image Conditioned Video Generation Open
The field of video generation has made remarkable advancements, yet there remains a pressing need for a clear, systematic recipe that can guide the development of robust and scalable models. In this work, we present a comprehensive study t…
Automating Microinfarct Screening in Hematoxylin and Eosin‐stained Human Brain Tissues: A Machine Learning Approach Open
Background: Microinfarcts, characteristic lesions of vascular dementia (VaD), are heterogenous and vary in appearance, which poses a considerable challenge for VaD grading as there is great interrater variability in microinfarct assessment. …
Contrastive Localized Language-Image Pre-Training Open
Contrastive Language-Image Pre-training (CLIP) has been a celebrated method for training vision encoders to generate image/text representations facilitating various applications. Recently, CLIP has been widely adopted as the vision backbon…
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Open
Recent advancements in multimodal models highlight the value of rewritten captions for improving performance, yet key challenges remain. For example, while synthetic captions often provide superior quality and image-text alignment, it is n…
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Open
We present MM1.5, a new family of multimodal large language models (MLLMs) designed to enhance capabilities in text-rich image understanding, visual referring and grounding, and multi-image reasoning. Building upon the MM1 architecture, MM…
Machine learning quantification of Amyloid-β deposits in the temporal lobe of 131 brain bank cases Open
Reduction in Chemical Fertilizer Rates by Applying Bio-Organic Fertilizer for Optimization Yield and Quality of Hemerocallis citrina Baroni Open
In this study, we investigated if reducing the amount of chemical fertilizer by combining it with organic fertilizer in Hemerocallis citrina Baroni (H. citrina) cultivation could improve plant growth and photosynthetic capacity and, conseq…
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models Open
We propose SlowFast-LLaVA (or SF-LLaVA for short), a training-free video large language model (LLM) that can jointly capture detailed spatial semantics and long-range temporal context without exceeding the token budget of commonly used LLM…
Machine Learning–Based Critical Congenital Heart Disease Screening Using Dual‐Site Pulse Oximetry Measurements Open
Background: Oxygen saturation (SpO2) screening has not led to earlier detection of critical congenital heart disease (CCHD). Adding pulse oximetry features (ie, perfusion data and radiofemoral pulse delay) may improve CCHD detection, esp…
Empowering Source-Free Domain Adaptation via MLLM-Guided Reliability-Based Curriculum Learning Open
Source-Free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to a target domain using only unlabeled target data. Current SFDA methods face challenges in effectively leveraging pre-trained knowledge and exploiting target d…
Semi-Path: An interactive semi-supervised learning framework for gigapixel pathology image analysis Open
The efficacy of supervised deep learning in medical image analyses, particularly in pathology, is hindered by the necessity for extensive manual annotations. Annotating images at the gigapixel level manually proves to be a highly labor-int…
MobilityGPT: Enhanced Human Mobility Modeling with a GPT model Open
Generative models have shown promising results in capturing human mobility characteristics and generating synthetic trajectories. However, it remains challenging to ensure that the generated geospatial mobility data is semantically realist…
Genome-Wide Identification of Tomato (Solanum lycopersicum L.) CKX Gene Family and Expression Analysis in the Callus Tissue under Zeatin Treatment Open
The cytokinin oxidase/dehydrogenase (CKX) enzyme is essential for controlling the fluctuating levels of endogenous cytokinin (CK) and has a significant impact on different aspects of plant growth and development. Nonetheless, there is limi…
VeCLIP: Improving CLIP Training via Visual-enriched Captions Open
Large-scale web-crawled datasets are fundamental for the success of pre-training vision-language models, such as CLIP. However, the inherent noise and potential irrelevance of web-crawled AltTexts pose challenges in achieving precise image…
Preanalytic variable effects on segmentation and quantification machine learning algorithms for amyloid-β analyses on digitized human brain slides Open
Computational machine learning (ML)-based frameworks could be advantageous for scalable analyses in neuropathology. A recent deep learning (DL) framework has shown promise in automating the processes of visualizing and quantifying differen…
Identification, Characterization and Function of Orphan Genes Among the Current Cucurbitaceae Genomes Open
Orphan genes (OGs) that are missing identifiable homologs in other lineages may potentially make contributions to a variety of biological functions. The Cucurbitaceae family consists of a wide range of fruit crops of worldwide or local eco…
Analysis of Transcriptome Response to Low Temperature Stress in Mesembryanthemum crystallinum Linn Open
Second-generation high-throughput sequencing technology was used to sequence ice plant under low temperature stress and to construct a transcriptome database. In total, 24.13 Gb of valid data and 24,045 Unigene annotations were obtained. DEG…
BrainSec: Automated Brain Tissue Segmentation Pipeline for Scalable Neuropathological Analysis Open
As neurodegenerative disease pathological hallmarks have been reported in both grey matter (GM) and white matter (WM) with different density distributions, automating the segmentation process of GM/WM would be extremely advantageous for ai…
A Machine Learning Driven Pipeline for Automated Photoplethysmogram Signal Artifact Detection Open
Recent advances in Critical Congenital Heart Disease (CCHD) research using Photoplethysmography (PPG) signals have yielded an Internet of Things (IoT) based enhanced screening method that performs CCHD detection comparable to SpO2 screenin…
A Semi-supervised Learning for Segmentation of Gigapixel Histopathology Images from Brain Tissues Open
Automated segmentation of grey matter (GM) and white matter (WM) in gigapixel histopathology images is advantageous to analyzing distributions of disease pathologies, further aiding in neuropathologic deep phenotyping. Although supervised …
Joint Semi-supervised and Active Learning for Segmentation of Gigapixel Pathology Images with Cost-Effective Labeling Open
The need for manual and detailed annotations limits the applicability of supervised deep learning algorithms in medical image analyses, specifically in the field of pathology. Semi-supervised learning (SSL) provides an effective way for le…
A novel system to collect dual pulse oximetry data for critical congenital heart disease screening research Open
Introduction: Access to patient medical data is critical to building a real-time data analytic pipeline for improving care providers’ ability to detect, diagnose, and prognosticate diseases. Critical congenital heart disease (CCHD) is a co…