Weidong Cai
A Multimodal Deep Learning Approach for White Matter Shape Prediction in Diffusion MRI Tractography
Recently, shape measures have emerged as promising descriptors of white matter tractography, offering complementary insights into anatomical variability and associations with cognitive and clinical phenotypes. However, conventional methods…
Improving Multimodal Brain Encoding Model with Dynamic Subject-awareness Routing
Naturalistic fMRI encoding must handle multimodal inputs, shifting fusion styles, and pronounced inter-subject variability. We introduce AFIRE (Agnostic Framework for Multimodal fMRI Response Encoding), an agnostic interface that standardi…
Study of Sex Differences in the Whole Brain White Matter Using Diffusion MRI Tractography and Suprathreshold Fiber Cluster Statistics
Sex-specific characteristics demonstrate a substantial influence on the brain white matter (WM), suggesting distinct structural connectivity patterns between females and males. Diffusion MRI (dMRI) tractography is an important tool in asse…
Not All Tokens are Guided Equal: Improving Guidance in Visual Autoregressive Models
Autoregressive (AR) models based on next-scale prediction are rapidly emerging as a powerful tool for image generation, but they face a critical weakness: information inconsistencies between patches across timesteps introduced by progressi…
Beyond Random Masking: A Dual-Stream Approach for Rotation-Invariant Point Cloud Masked Autoencoders
Existing rotation-invariant point cloud masked autoencoders (MAE) rely on random masking strategies that overlook geometric structure and semantic coherence. Random masking treats patches independently, failing to capture spatial relations…
ScSAM: Debiasing Morphology and Distributional Variability in Subcellular Semantic Segmentation
The significant morphological and distributional variability among subcellular components poses a long-standing challenge for learning-based organelle segmentation models, significantly increasing the risk of biased feature learning. Exist…
DeepMultiConnectome: Deep Multi-Task Prediction of Structural Connectomes Directly from Diffusion MRI Tractography
Diffusion MRI (dMRI) tractography enables in vivo mapping of brain structural connections, but traditional connectome generation is time-consuming and requires gray matter parcellation, posing challenges for large-scale studies. We introdu…
A Survey of 3D Reconstruction with Event Cameras
Event cameras are rapidly emerging as powerful vision sensors for 3D reconstruction, uniquely capable of asynchronously capturing per-pixel brightness changes. Compared to traditional frame-based cameras, event cameras produce sparse yet t…
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
The Contrastive Language-Image Pre-training (CLIP) framework has become a widely used approach for multimodal representation learning, particularly in image-text retrieval and clustering. However, its efficacy is constrained by three key l…
CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination
Contrastive Language-Image Pre-training (CLIP) has achieved excellent performance over a wide range of tasks. However, the effectiveness of CLIP heavily relies on a substantial corpus of pre-training data, resulting in notable consumption …
TractCloud‐FOV: Deep Learning‐Based Robust Tractography Parcellation in Diffusion MRI With Incomplete Field of View
Tractography parcellation classifies streamlines reconstructed from diffusion MRI into anatomically defined fiber tracts for clinical and research applications. However, clinical scans often have incomplete fields of view (FOV) where brain…
The Shape of the Brain's Connections Is Predictive of Cognitive Performance: An Explainable Machine Learning Study
The shape of the brain's white matter connections is relatively unexplored in diffusion magnetic resonance imaging (dMRI) tractography analysis. While it is known that tract shape varies in populations and across the human lifespan, it is …
Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding
Existing vision-language models (VLMs) often suffer from visual hallucination, where the generated responses contain inaccuracies that are not grounded in the visual input. Efforts to address this issue without model finetuning primarily m…
TractCloud-FOV: Deep Learning-based Robust Tractography Parcellation in Diffusion MRI with Incomplete Field of View
Tractography parcellation classifies streamlines reconstructed from diffusion MRI into anatomically defined fiber tracts for clinical and research applications. However, clinical scans often have incomplete fields of view (FOV) where brain…
Efficient 4D fMRI ASD Classification using Spatial-Temporal-Omics-based Learning Framework
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder impacting social and behavioral development. Resting-state fMRI, a non-invasive tool for capturing brain connectivity patterns, aids in early ASD diagnosis and differentiation…
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
After pre-training on extensive image-text pairs, Contrastive Language-Image Pre-training (CLIP) demonstrates promising performance on a wide variety of benchmarks. However, a substantial volume of multimodal interleaved documents remains …
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease. Multi-modal self-supervised learning has demonstrated remarkable potential in learning patholo…
Cross-View Consistency Regularisation for Knowledge Distillation
Knowledge distillation (KD) is an established paradigm for transferring privileged knowledge from a cumbersome model to a lightweight and efficient one. In recent years, logit-based KD methods are quickly catching up in performance with th…
Gotta Hear Them All: Towards Sound Source Aware Audio Generation
Audio synthesis has broad applications in multimedia. Recent advancements have made it possible to generate relevant audios from inputs describing an audio scene, such as images or texts. However, the immersiveness and expressiveness of th…
Cell as Point: One-Stage Framework for Efficient Cell Tracking
Conventional multi-stage cell tracking approaches rely heavily on detection or segmentation in each frame as a prerequisite, requiring substantial resources for high-quality segmentation masks and increasing the overall prediction time. To…
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation
The objective of Radiology Report Generation (RRG) is to automatically generate coherent textual analyses of diseases based on radiological images, thereby alleviating the workload of radiologists. Current AI-based methods for RRG primaril…
AI-Powered cellular morphometric biomarkers discovered in needle biopsy of prostatic cancer predict neoadjuvant androgen deprivation therapy response and prognosis: an international multicenter retrospective study
It is imperative to identify patients with prostate cancer (PCa) who will benefit from androgen receptor signaling inhibitors that can impact quality of life upon prolonged use. Using our extensively-validated artificial-intelligence techn…
AMNCutter: Affinity-Attention-Guided Multi-View Normalized Cutter for Unsupervised Surgical Instrument Segmentation
Surgical instrument segmentation (SIS) is pivotal for robotic-assisted minimally invasive surgery, assisting surgeons by identifying surgical instruments in endoscopic video frames. Recent unsupervised surgical instrument segmentation (USI…
TractShapeNet: Efficient Multi-Shape Learning with 3D Tractography Point Clouds
Brain imaging studies have demonstrated that diffusion MRI tractography geometric shape descriptors can inform the study of the brain's white matter pathways and their relationship to brain function. In this work, we investigate the possib…