Junchi Yan
Structure Alignment-driven Cross-Graph Modeling for Functional RNA Design
RNAs are critical for biological processes, with their biological functions closely tied to their three-dimensional structures. RNA inverse folding, the design of RNA sequences that fold into target 3D structures, is a complex challenge du…
Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
Point cloud learning, especially in a self-supervised way without manual labels, has gained growing attention in both vision and learning communities due to its potential utility in a wide range of applications. Most existing generative ap…
Fast Multi-objective RNA Optimization with Autoregressive Reinforcement Learning
Codon optimization is essential in mRNA vaccine development, while existing tools face limitations in the computational efficiency, sequence diversity and universality. To address these challenges, we develop RNAJog (RNA Joint Optimization…
Calibrating Biased Distribution in VFM-derived Latent Space via Cross-Domain Geometric Consistency
Despite the fast progress of deep learning, one standing challenge is the gap between the observed training samples and the underlying true distribution. This gap has multiple causes, e.g., sampling bias and noise. In t…
When Autonomy Goes Rogue: Preparing for Risks of Multi-Agent Collusion in Social Systems
Recent large-scale events like election fraud and financial scams have shown how harmful coordinated efforts by human groups can be. With the rise of autonomous AI systems, there is growing concern that AI-driven groups could also cause si…
TrajTok: Technical Report for 2025 Waymo Open Sim Agents Challenge
In this technical report, we introduce TrajTok, a trajectory tokenizer for discrete next-token-prediction based behavior generation models, which combines data-driven and rule-based methods with better coverage, symmetry and robustness, al…
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
Recent advances in Large Reasoning Models (LRMs) trained with Long Chain-of-Thought (Long CoT) reasoning have demonstrated remarkable cross-domain generalization capabilities. However, the underlying mechanisms supporting such transfer rem…
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence
Multimodal Large Language Models (MLLMs) have achieved remarkable progress in various multimodal tasks. To pursue higher intelligence in space, MLLMs require integrating multiple spatial capabilities, even for handling simple and normal ta…
Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space
We propose AdapTok, an adaptive temporal causal video tokenizer that can flexibly allocate tokens for different frames based on video content. AdapTok is equipped with a block-wise masking strategy that randomly drops tail tokens of each b…
Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)
Reinforcement Learning (RL) can mitigate the causal confusion and distribution shift inherent to imitation learning (IL). However, applying RL to end-to-end autonomous driving (E2E-AD) remains an open problem for its training difficulty, a…
New Evidence of the Two-Phase Learning Dynamics of Neural Networks
Understanding how deep neural networks learn remains a fundamental challenge in modern machine learning. A growing body of evidence suggests that training dynamics undergo a distinct phase transition, yet our understanding of this transiti…
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
As a seemingly self-explanatory task, problem-solving has been a significant component of science and engineering. However, a general yet concrete formulation of problem-solving itself is missing. With the recent development of AI-based pr…
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
The rise of foundation models paves the way for generalist robot policies in the physical world. Existing methods relying on text-only instructions often struggle to generalize to unseen scenarios. We argue that interleaved image-text inpu…
Int2Planner: An Intention-based Multi-modal Motion Planner for Integrated Prediction and Planning
Motion planning is a critical module in autonomous driving, with the primary challenge of uncertainty caused by interactions with other participants. As most previous methods treat prediction and planning as separate tasks, it is difficult…
On the Cone Effect in the Learning Dynamics
Understanding the learning dynamics of neural networks is a central topic in the deep learning community. In this paper, we take an empirical perspective to study the learning dynamics of neural networks in real-world settings. Specificall…
DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving
End-to-end autonomous driving (E2E-AD) has emerged as a trend in the field of autonomous driving, promising a data-driven, scalable approach to system design. However, existing E2E-AD methods usually adopt the sequential paradigm of percep…
Rethinking Video Tokenization: A Conditioned Diffusion-based Approach
Existing video tokenizers typically use the traditional Variational Autoencoder (VAE) architecture for video compression and reconstruction. However, to achieve good performance, its training process often relies on complex multi-stage tra…
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
Transformers consist of diverse building blocks, such as embedding layers, normalization layers, self-attention mechanisms, and point-wise feedforward networks. Thus, understanding the differences and interactions among these blocks is imp…
Wholly-WOOD: Wholly Leveraging Diversified-quality Labels for Weakly-supervised Oriented Object Detection
Accurately estimating the orientation of visual objects with compact rotated bounding boxes (RBoxes) has become a prominent demand, which challenges existing object detection paradigms that only use horizontal bounding boxes (HBoxes). To e…
Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances
With the rapidly increasing demand for oriented object detection (OOD), recent research involving weakly-supervised detectors for learning OOD from point annotations has gained great attention. In this paper, we rethink this challenging ta…
Efficient Packaging Line Object Counting by Cross-Frame Association With Wavelet Convolutions and Trajectory Compensation
Real-time object counting on industrial pipelines is critical for improving efficiency and accuracy in industries like manufacturing and logistics. This paper introduces a novel multi-object association method, namely a tracking method, whi…
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Despite their proficiency in general tasks, Multi-modal Large Language Models (MLLMs) struggle with automatic Geometry Problem Solving (GPS), which demands understanding diagrams, interpreting symbols, and performing complex reasoning. Thi…
Universal Hamming Weight Preserving Variational Quantum Ansatz
Understanding the mathematical properties of variational quantum ansätze is crucial for determining quantum advantage in Variational Quantum Eigensolvers (VQEs). A deeper understanding of ansätze not only enriches theoretical discussions b…
Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation
In recent years, aerial object detection has been increasingly pivotal in various earth observation applications. However, current algorithms are limited to detecting a set of pre-defined object categories, demanding sufficient annotated t…