Nanning Zheng
Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models
The performance of Latent Diffusion Models (LDMs) is critically dependent on the quality of their visual tokenizer. While recent works have explored incorporating Vision Foundation Models (VFMs) via distillation, we identify a fundamental …
FreqPDE: Rethinking Positional Depth Embedding for Multi-View 3D Object Detection Transformers
Detecting 3D objects accurately from multi-view 2D images is a challenging yet essential task in the field of autonomous driving. Current methods resort to integrating depth prediction to recover the spatial information for object query de…
Review of research progress on self-training for object detection in unmanned system perception
Fitting Pair Distribution Function with Backpropagation
Pair distribution function (PDF) analysis is a powerful technique for characterizing both long-range structures and local distortions in materials, gaining significant importance in materials science. However, conventional PDF modeling app…
Semantic Concept in Brain fMRI Spatio-Temporal Voxel Patterns
Cognitive neuroscience bridges insights into human brain mechanisms with artificial intelligence, where brain-inspired architectures have driven unprecedented success in artificial neural networks. However, endowing AI models with the dyna…
EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning
Compositional Zero-Shot Learning (CZSL) investigates the compositional generalization capacity to recognize unknown state-object pairs based on learned primitive concepts. Existing CZSL methods typically derive primitive features through a si…
Unveiling Multi-View Anomaly Detection: Intra-view Decoupling and Inter-view Fusion
Anomaly detection has garnered significant attention for its extensive industrial application value. Most existing methods focus on single-view scenarios and fail to detect anomalies hidden in blind spots, leaving a gap in addressing the d…
See Through Their Minds: Learning Transferable Brain Decoding Models from Cross-Subject fMRI
Deciphering visual content from fMRI sheds light on the human vision system, but data scarcity and noise limit brain decoding model performance. Traditional approaches rely on subject-specific models, which are sensitive to training sample…
Correction to: With or without human interference for precise age estimation based on machine learning?
BrainCLIP: Brain Representation via CLIP for Generic Natural Visual Stimulus Decoding
Functional Magnetic Resonance Imaging (fMRI) presents challenges due to limited paired samples and low signal-to-noise ratios, particularly in tasks involving reconstructing natural images or decoding their semantic content. To address the…
Interactive Design of Developable Surfaces by Patch-Based Learning
Instrucrobo: Object-Centric Multi-Instruction Decoupling Model for Explainable Robotic Manipulation
Hidden States in LLMs Improve EEG Representation Learning and Visual Decoding
Analyzing brain signals and reconstructing visual stimuli from the brain can facilitate further exploration of the cognitive functions of the human brain, which has attracted strong interest in neuroscience and artificial intelligence. Howeve…
Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation
Capturing long-range dependencies while preserving high-resolution visual representations is crucial for dense prediction tasks such as human pose estimation. Vision Transformers (ViTs) have advanced global modeling through self-attention …
Neural P$^3$M: A Long-Range Interaction Modeling Enhancer for Geometric GNNs
Geometric graph neural networks (GNNs) have emerged as powerful tools for modeling molecular geometry. However, they encounter limitations in effectively capturing long-range interactions in large molecular systems. To address this challen…
DexDiff: Towards Extrinsic Dexterity Manipulation of Ungraspable Objects in Unrestricted Environments
Grasping large and flat objects (e.g. a book or a pan) is often regarded as an ungraspable task, which poses significant challenges due to the unreachable grasping poses. Previous works leverage Extrinsic Dexterity like walls or table edge…
PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation
Semi-supervised learning has emerged as a widely adopted technique in the field of medical image segmentation. Existing works focus either on the construction of consistency constraints or on the generation of pseudo labels to provide h…
Improving AlphaFlow for Efficient Protein Ensembles Generation
Investigating conformational landscapes of proteins is a crucial way to understand their biological functions and properties. AlphaFlow stands out as a sequence-conditioned generative model that introduces flexibility into structure predic…
Refracting Once is Enough: Neural Radiance Fields for Novel-View Synthesis of Real Refractive Objects
Neural Radiance Fields (NeRF) have shown promise in novel view synthesis, but they still face challenges when applied to refractive objects. The presence of refraction disrupts multiview consistency, often resulting in renderings that are ei…
A General Theory for Compositional Generalization
Compositional Generalization (CG) embodies the ability to comprehend novel combinations of familiar concepts, representing a significant cognitive leap in human intellectual advancement. Despite its critical importance, the deep neural net…
Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis
Significant progress has been made in scene text detection models since the rise of deep learning, but scene text layout analysis, which aims to group detected text instances as paragraphs, has not kept pace. Previous works either treated …
F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE(3) Guided Flow Matching
Molecular dynamics (MD) is a crucial technique for simulating biological systems, enabling the exploration of their dynamic nature and fostering an understanding of their functions and properties. To address exploration inefficiency, emerg…
Make Your LLM Fully Utilize the Context
While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, a problem known as the lost-in-the-middle challenge. We hypothesize that it stems from insuffic…
Robust Noisy Label Learning via Two-Stream Sample Distillation
Noisy label learning aims to learn robust networks under the supervision of noisy labels, which plays a critical role in deep learning. Existing work either conducts sample selection or label correction to deal with noisy labels during the…
Inside back cover
IS-DARTS: Stabilizing DARTS through Precise Measurement on Candidate Importance
Among existing Neural Architecture Search methods, DARTS is known for its efficiency and simplicity. This approach applies continuous relaxation of network representation to construct a weight-sharing supernet and enables the identificatio…
Voxel or Pillar: Exploring Efficient Point Cloud Representation for 3D Object Detection
Efficient representation of point clouds is fundamental for LiDAR-based 3D object detection. While recent grid-based detectors often encode point clouds into either voxels or pillars, the distinctions between these approaches remain undere…
GSO-Net: Grid Surface Optimization via Learning Geometric Constraints
In the context of surface representations, we find a natural structural similarity between grid surfaces and image data. Motivated by this observation, we propose a novel approach: encoding grid surfaces as geometric images and using image …
Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction
Predicting the mean-field Hamiltonian matrix in density functional theory is a fundamental formulation to leverage machine learning for solving molecular science problems. Yet, its applicability is limited by insufficient labeled data for …
See Through Their Minds: Learning Transferable Neural Representation from Cross-Subject fMRI
Deciphering visual content from functional Magnetic Resonance Imaging (fMRI) helps illuminate the human vision system. However, the scarcity of fMRI data and noise hamper brain decoding model performance. Previous approaches primarily empl…