Chun-Mei Feng
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research, yet their progress is shaped by the complex nature of scientific data. This survey presents a compre…
Text to Image for Multi-Label Image Recognition With Joint Prompt-Adapter Learning
Benefiting from image-text contrastive learning, pre-trained vision-language models, e.g., CLIP, allow texts to be directly leveraged as images (TaI) for parameter-efficient fine-tuning (PEFT). While CLIP is capable of making image features to be…
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
Although progress has been made in Composed Image Retrieval (CIR), we empirically find that a certain percentage of failed retrieval results are not consistent with their relative captions. To address this issue, this work provides a Visual…
Achieving flexible fairness metrics in federated medical imaging
PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models
3D Multimodal Large Language Models (MLLMs) have recently made substantial advancements. However, their potential remains untapped, primarily due to the limited quantity and suboptimal quality of 3D datasets. Current approaches attempt to …
Surgical video workflow analysis via visual-language learning
Surgical video workflow analysis has seen intensive development in computer-assisted surgery by combining deep learning models, aiming to enhance surgical scene analysis and decision-making. However, previous research has primarily focused…
Unprejudiced Training Auxiliary Tasks Makes Primary Better: A Multi-Task Learning Perspective
Human beings can leverage knowledge from related tasks to improve learning on a primary task. Similarly, multi-task learning methods suggest using auxiliary tasks to enhance a neural network's performance on a specific primary task. Howev…
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
Large multimodal models (LMMs) have achieved impressive progress in vision-language understanding, yet they face limitations in real-world applications requiring complex reasoning over a large number of images. Existing benchmarks for mult…
Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation
Existing test-time prompt tuning (TPT) methods focus on single-modality data, primarily enhancing images and using confidence ratings to filter out inaccurate images. However, while image generation models can produce visually diverse imag…
Surgical Video Workflow Analysis via Visual-Language Learning
Surgical video workflow analysis has seen intensive development in computer-assisted surgery by combining deep learning models, aiming to enhance surgical scene analysis and decision-making. However, previous research has mainly focused on…
Privacy-Preserving Federated Foundation Model for Generalist Ultrasound Artificial Intelligence
Ultrasound imaging is widely used in clinical diagnosis due to its non-invasive nature and real-time capabilities. However, conventional ultrasound diagnostics face several limitations, including high dependence on physician expertise and …
Let Video Teaches You More: Video-to-Image Knowledge Distillation using DEtection TRansformer for Medical Video Lesion Detection
AI-assisted lesion detection models play a crucial role in the early screening of cancer. However, previous image-based models ignore the inter-frame contextual information present in videos. On the other hand, video-based models capture t…
Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development
Endorectal ultrasound (ERUS) is an important imaging modality that provides high reliability for diagnosing the depth and boundary of invasion in colorectal cancer. However, the lack of a large-scale ERUS dataset with high-quality annotati…
Enhancing Label-efficient Medical Image Segmentation with Text-guided Diffusion Models
Aside from offering state-of-the-art performance in medical image generation, denoising diffusion probabilistic models (DPM) can also serve as a representation learner to capture semantic information and potentially be used as an image rep…
CPT: Consistent Proxy Tuning for Black-box Optimization
Black-box tuning has attracted recent attention because the structure and inner parameters of advanced proprietary models are not accessible. Proxy-tuning provides a test-time output adjustment for tuning black-box language models. It a…
Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition
In real-world conversations, the diversity and ambiguity of stickers often lead to varied interpretations based on the context, which requires understanding stickers comprehensively and supporting multi-tagging. To addre…
Learning Prompt with Distribution-Based Feature Replay for Few-Shot Class-Incremental Learning
Few-shot Class-Incremental Learning (FSCIL) aims to continuously learn new classes based on very limited training data without forgetting the old ones encountered. Existing studies solely relied on pure visual networks, while in this paper…
A New Perspective to Boost Performance Fairness For Medical Federated Learning
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
3D Visual Grounding (3DVG) aims at localizing 3D objects based on textual descriptions. Conventional supervised methods for 3DVG often necessitate extensive annotations and a predefined vocabulary, which can be restrictive. To address this …
Sentence-level Prompts Benefit Composed Image Retrieval
Composed image retrieval (CIR) is the task of retrieving specific images by using a query that involves both a reference image and a relative caption. Most existing CIR models adopt the late-fusion strategy to combine visual and language f…
COCA: Classifier-Oriented Calibration via Textual Prototype for Source-Free Universal Domain Adaptation
Universal domain adaptation (UniDA) aims to address domain and category shifts across data sources. Recently, due to more stringent data restrictions, researchers have introduced source-free UniDA (SF-UniDA). SF-UniDA methods eliminate the…
Federated Pseudo Modality Generation for Incomplete Multi-Modal MRI Reconstruction
While multi-modal learning has been widely used for MRI reconstruction, it relies on paired multi-modal data which is difficult to acquire in real clinical scenarios. Especially in the federated setting, the common situation is that severa…
Rethinking Client Drift in Federated Learning: A Logit Perspective
Federated Learning (FL) enables multiple clients to collaboratively learn in a distributed way, allowing for privacy protection. However, real-world non-IID data leads to client drift, which degrades the performance of FL. Interesti…
Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning
Benefiting from prompt tuning, recent years have witnessed the promising performance of pre-trained vision-language models, e.g., CLIP, on versatile downstream tasks. In this paper, we focus on a particular setting of learning adaptive pro…
Towards Instance-adaptive Inference for Federated Learning
Federated learning (FL) is a distributed learning paradigm that enables multiple clients to learn a powerful global model by aggregating local training. However, the performance of the global model is often hampered by non-i.i.d. distribut…
Learning Federated Visual Prompt in Null Space for MRI Reconstruction
Federated Magnetic Resonance Imaging (MRI) reconstruction enables multiple hospitals to collaborate in a distributed manner without aggregating local data, thereby protecting patient privacy. However, the data heterogeneity caused by different MRI p…
Reliable Federated Disentangling Network for Non-IID Domain Feature
Federated learning (FL), as an effective decentralized distributed learning approach, enables multiple institutions to jointly train a model without sharing their local data. However, the domain feature shift caused by different acquisitio…
A New Dual Stator Permanent Magnet Machine Based on Field Modulation Theory
Increasing industrial development places high demands on the performance of stator permanent magnet (PM) machines, such as torque density and efficiency. The paper proposes a new dual stator PM machine based on field modulatio…