Weidong Cai
A Multimodal Deep Learning Approach for White Matter Shape Prediction in Diffusion MRI Tractography
Recently, shape measures have emerged as promising descriptors of white matter tractography, offering complementary insights into anatomical variability and associations with cognitive and clinical phenotypes. However, conventional methods…
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion
ScSAM: Debiasing Morphology and Distributional Variability in Subcellular Semantic Segmentation
The significant morphological and distributional variability among subcellular components poses a long-standing challenge for learning-based organelle segmentation models, significantly increasing the risk of biased feature learning. Exist…
MotionBeat: Motion-Aligned Music Representation via Embodied Contrastive Learning and Bar-Equivariant Contact-Aware Encoding
Music is both an auditory and an embodied phenomenon, closely linked to human motion and naturally expressed through dance. However, most existing audio representations neglect this embodied dimension, limiting their ability to capture rhy…
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
Universal multimodal embedding models are foundational to various tasks. Existing approaches typically employ in-batch negative mining by measuring the similarity of query-candidate pairs. However, these methods often struggle to capture s…
Improving Multimodal Brain Encoding Model with Dynamic Subject-awareness Routing
Naturalistic fMRI encoding must handle multimodal inputs, shifting fusion styles, and pronounced inter-subject variability. We introduce AFIRE (Agnostic Framework for Multimodal fMRI Response Encoding), an agnostic interface that standardi…
Study of Sex Differences in the Whole Brain White Matter Using Diffusion MRI Tractography and Suprathreshold Fiber Cluster Statistics
Sex-specific characteristics demonstrate a substantial influence on the human brain white matter, suggesting distinct brain structural connectivity patterns between females and males. Diffusion MRI (dMRI) tractography is an important tool …
Not All Tokens are Guided Equal: Improving Guidance in Visual Autoregressive Models
Autoregressive (AR) models based on next-scale prediction are rapidly emerging as a powerful tool for image generation, but they face a critical weakness: information inconsistencies between patches across timesteps introduced by progressi…
A Survey of 3D Reconstruction with Event Cameras
Event cameras are rapidly emerging as powerful vision sensors for 3D reconstruction, uniquely capable of asynchronously capturing per-pixel brightness changes. Compared to traditional frame-based cameras, event cameras produce sparse yet t…
Diversity-Augmented Diffusion Network With LLM Assistance For Radiology Report Generation
Seek Inner: LLM-Enhanced Information Mining for Medical Visual Question Answering
LLM-UM: The 1st Workshop on Large Language Model Using Multi-modal Data for User Modeling
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
The Contrastive Language-Image Pre-training (CLIP) framework has become a widely used approach for multimodal representation learning, particularly in image-text retrieval and clustering. However, its efficacy is constrained by three key l…
CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination
Contrastive Language-Image Pre-training (CLIP) has achieved excellent performance over a wide range of tasks. However, the effectiveness of CLIP heavily relies on a substantial corpus of pre-training data, resulting in notable consumption …
TractCloud‐FOV: Deep Learning‐Based Robust Tractography Parcellation in Diffusion MRI With Incomplete Field of View
Tractography parcellation classifies streamlines reconstructed from diffusion MRI into anatomically defined fiber tracts for clinical and research applications. However, clinical scans often have incomplete fields of view (FOV) where brain…
The Shape of the Brain's Connections Is Predictive of Cognitive Performance: An Explainable Machine Learning Study
The shape of the brain's white matter connections is relatively unexplored in diffusion magnetic resonance imaging (dMRI) tractography analysis. While it is known that tract shape varies in populations and across the human lifespan, it is …
Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding
Existing vision-language models (VLMs) often suffer from visual hallucination, where the generated responses contain inaccuracies that are not grounded in the visual input. Efforts to address this issue without model finetuning primarily m…
Efficient 4D fMRI ASD Classification using Spatial-Temporal-Omics-based Learning Framework
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder impacting social and behavioral development. Resting-state fMRI, a non-invasive tool for capturing brain connectivity patterns, aids in early ASD diagnosis and differentiation…
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
After pre-training on extensive image-text pairs, Contrastive Language-Image Pre-training (CLIP) demonstrates promising performance on a wide variety of benchmarks. However, a substantial volume of multimodal interleaved documents remains …
Cross-Domain Fiber Cluster Shape Analysis for Language Performance Cognitive Score Prediction
A Modified QuEChERS Method Using Single-Sorbent Combined with LC-MS/MS for Simultaneous Determination of Four Phenolic Pesticide Residues in Fruits and Vegetables
Multimodal Causal Reasoning Benchmark: Challenging Multimodal Large Language Models to Discern Causal Links Across Modalities
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease. Multi-modal self-supervised learning has demonstrated remarkable potential in learning patholo…
Cross-View Consistency Regularisation for Knowledge Distillation
Knowledge distillation (KD) is an established paradigm for transferring privileged knowledge from a cumbersome model to a lightweight and efficient one. In recent years, logit-based KD methods are quickly catching up in performance with th…
Gotta Hear Them All: Towards Sound Source Aware Audio Generation
Audio synthesis has broad applications in multimedia. Recent advancements have made it possible to generate relevant audios from inputs describing an audio scene, such as images or texts. However, the immersiveness and expressiveness of th…
Cell as Point: One-Stage Framework for Efficient Cell Tracking
Conventional multi-stage cell tracking approaches rely heavily on detection or segmentation in each frame as a prerequisite, requiring substantial resources for high-quality segmentation masks and increasing the overall prediction time. To…
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation
The objective of Radiology Report Generation (RRG) is to automatically generate coherent textual analyses of diseases based on radiological images, thereby alleviating the workload of radiologists. Current AI-based methods for RRG primaril…
AI-Powered cellular morphometric biomarkers discovered in needle biopsy of prostatic cancer predict neoadjuvant androgen deprivation therapy response and prognosis: an international multicenter retrospective study
It is imperative to identify patients with prostate cancer (PCa) who will benefit from androgen receptor signaling inhibitors that can impact quality of life upon prolonged use. Using our extensively-validated artificial-intelligence techn…