Explanipedia

ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning

X. P. Qin, Xiaoxing Wang, Ning Liao, Chuanlei Zhang, Xiangdong Zhang, et al. · 2025

Data quality plays a critical role in enhancing supervised fine-tuning (SFT) for large language models (LLMs), and token-level data selection has emerged as a promising direction for its fine-grained nature. Despite their strong empirical …

FreqPDE: Rethinking Positional Depth Embedding for Multi-View 3D Object Detection Transformers

Haisheng Su, Junjie Zhang, Fei Song, Sanping Zhou, Wei Wu, et al. · 2025

Detecting 3D objects accurately from multi-view 2D images is a challenging yet essential task in the field of autonomous driving. Current methods resort to integrating depth prediction to recover the spatial information for object query de…

Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization

Yang Li, Zhiyong Dong, Yuhan Sun, Weixun Wang, Shijiang Xiong, et al. · 2025

The reasoning pattern of large language models (LLMs) remains opaque, and reinforcement learning (RL) typically applies uniform credit across an entire generation, blurring the distinction between pivotal and routine steps. This work posit…

The LLM Era Demands Natural-Language-Aligned Theorem Provers for Mathematics

Qinxiang Cao, Lihan Xie, Junchi Yan · 2025

Physics-informed Neural-operator Predictive Control for Drag Reduction in Turbulent Flows

Zelin Zhao, Zongyi Li, Kimia Hassibi, Kamyar Azizzadenesheli, Junchi Yan, et al. · 2025

Assessing turbulence control effects for wall friction numerically is a significant challenge since it requires expensive simulations of turbulent fluid dynamics. We instead propose an efficient deep reinforcement learning (RL) framework f…

Structure Alignment-driven Cross-Graph Modeling for Functional RNA Design

Xiaoyong Pan, Jun Wang, Xiaojian Liu, Weimin Zhu, et al. · 2025

RNAs are critical for biological processes, with their biological functions closely tied to their three-dimensional structures. RNA inverse folding, the design of RNA sequences that fold into target 3D structures, is a complex challenge du…

Fast Multi-objective RNA Optimization with Autoregressive Reinforcement Learning

Jiaqi Huang, Harrison X. Bai, Yi Fang, Xiaojian Liu, Xiaoyong Pan, et al. · 2025

Codon optimization is essential in mRNA vaccine development, while existing tools face limitations in the computational efficiency, sequence diversity and universality. To address these challenges, we develop RNAJog (RNA Joint Optimization…

Calibrating Biased Distribution in VFM-derived Latent Space via Cross-Domain Geometric Consistency

Yanbiao Ma, Wei Dai, Bowei Liu, Jiayi Chen, Wenke Huang, et al. · 2025

Despite the fast progress of deep learning, one standing challenge is the gap between the observed training samples and the underlying true distribution. This gap has multiple causes, e.g. sampling bias and noise. In t…

BiQAP: Neural Bi-level Optimization-based Framework for Solving Quadratic Assignment Problems

Liangliang Shi, Haoran Zhang, Shuheng Shen, Changhua Meng, Weiqiang Wang, et al. · 2025

Reinvent the Operation not the Architecture: Quantum-inspired High-order Product for Compatible and Improved LLMs Training

Hao Xiong, Y. Yang, Hongbing Wu, Xu Zhong, Yehui Tang, et al. · 2025

ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

He Feng, Zijun Chen, Xiaoyi Liang, Ma Tingting, Yicheng Qiu, et al. · 2025

Recent advances in Large Reasoning Models (LRMs) trained with Long Chain-of-Thought (Long CoT) reasoning have demonstrated remarkable cross-domain generalization capabilities. However, the underlying mechanisms supporting such transfer rem…

SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence

Ziyang Gong, Wenhao Li, Ou Ma, Songyuan Li, Jiayi Ji, et al. · 2025

Multimodal Large Language Models (MLLMs) have achieved remarkable progress in various multimodal tasks. To pursue higher intelligence in space, MLLMs require integrating multiple spatial capabilities, even for handling simple and normal ta…

Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space

Mengqi Li, Changyao Tian, Renqiu Xia, Ning Liao, Weiwei Guo, et al. · 2025

We propose AdapTok, an adaptive temporal causal video tokenizer that can flexibly allocate tokens for different frames based on video content. AdapTok is equipped with a block-wise masking strategy that randomly drops tail tokens of each b…

New Evidence of the Two-Phase Learning Dynamics of Neural Networks

Zhanpeng Zhou, Yongyi Yang, Mahito Sugiyama, Junchi Yan · 2025

Understanding how deep neural networks learn remains a fundamental challenge in modern machine learning. A growing body of evidence suggests that training dynamics undergo a distinct phase transition, yet our understanding of this transiti…

Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving

Qi Liu, Xinhao Zheng, Renqiu Xia, Xiangtong Qi, Qinxiang Cao, et al. · 2025

As a seemingly self-explanatory task, problem-solving has been a significant component of science and engineering. However, a general yet concrete formulation of problem-solving itself is missing. With the recent development of AI-based pr…

Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions

Chia Lun Fan, Xiaosong Jia, Yiwen Sun, Yixiao Wang, Jun Wei, et al. · 2025

The rise of foundation models paves the way for generalist robot policies in the physical world. Existing methods relying on text-only instructions often struggle to generalize to unseen scenarios. We argue that interleaved image-text inpu…

TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

Daocheng Fu, Jianlong Chen, Renqiu Xia, Zijun Chen, Qi Liu, et al. · 2025

Mathematical geometric problem solving (GPS) demands verifiable logical coherence and multimodal reasoning capabilities. While large language models (LLMs) have shown rapid progress in GPS, their advancement is hindered by the lack of reli…

Int2Planner: An Intention-based Multi-modal Motion Planner for Integrated Prediction and Planning

Xiaolei Chen, Junchi Yan, Wenlong Liao, Tao He, Pai Peng · 2025

Motion planning is a critical module in autonomous driving, with the primary challenge of uncertainty caused by interactions with other participants. As most previous methods treat prediction and planning as separate tasks, it is difficult…

On the Cone Effect in the Learning Dynamics

Zhanpeng Zhou, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan · 2025

Understanding the learning dynamics of neural networks is a central topic in the deep learning community. In this paper, we take an empirical perspective to study the learning dynamics of neural networks in real-world settings. Specificall…

DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving

Xiaosong Jia, Junqi You, Zhiyuan Zhang, Junchi Yan · 2025

End-to-end autonomous driving (E2E-AD) has emerged as a trend in the field of autonomous driving, promising a data-driven, scalable approach to system design. However, existing E2E-AD methods usually adopt the sequential paradigm of percep…

Rethinking Video Tokenization: A Conditioned Diffusion-based Approach

Nancy Y. C. Yang, Pandeng Li, Liming Zhao, Yang Li, Chen-Wei Xie, et al. · 2025

Existing video tokenizers typically use the traditional Variational Autoencoder (VAE) architecture for video compression and reconstruction. However, to achieve good performance, its training process often relies on complex multi-stage tra…

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training

Jinbo Wang, Mingze Wang, Zhanpeng Zhou, Junchi Yan, E Weinan, et al. · 2025

Transformers consist of diverse building blocks, such as embedding layers, normalization layers, self-attention mechanisms, and point-wise feedforward networks. Thus, understanding the differences and interactions among these blocks is imp…

Wholly-WOOD: Wholly Leveraging Diversified-quality Labels for Weakly-supervised Oriented Object Detection

Yi Yu, Xue Yang, Yansheng Li, Zhenjun Han, Feipeng Da, et al. · 2025

Accurately estimating the orientation of visual objects with compact rotated bounding boxes (RBoxes) has become a prominent demand, which challenges existing object detection paradigms that only use horizontal bounding boxes (HBoxes). To e…

Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances

Yi Yu, Botao Ren, Peiyuan Zhang, Mingxin Liu, Junwei Luo, et al. · 2025

With the rapidly increasing demand for oriented object detection (OOD), recent research involving weakly-supervised detectors for learning OOD from point annotations has gained great attention. In this paper, we rethink this challenging ta…

Fast T2T: Optimization Consistency Speeds Up Diffusion-Based Training-to-Testing Solving for Combinatorial Optimization

Yang Li, Juan Guo, Runzhong Wang, Hongyuan Zha, Junchi Yan · 2025

Diffusion models have recently advanced Combinatorial Optimization (CO) as a powerful backbone for neural solvers. However, their iterative sampling process requiring denoising across multiple noise levels incurs substantial overhead. We p…

PointOBB-v3: Expanding Performance Boundaries of Single Point-Supervised Oriented Object Detection

Peiyuan Zhang, Junwei Luo, Xue Yang, Yi Yu, Qingyun Li, et al. · 2025

With the growing demand for oriented object detection (OOD), recent studies on point-supervised OOD have attracted significant interest. In this paper, we propose PointOBB-v3, a stronger single point-supervised OOD framework. Compared to e…

Efficient Packaging Line Object Counting by Cross-Frame Association With Wavelet Convolutions and Trajectory Compensation

Long Wei, Yutao Zhu, Yufeng Li, Ming Qian, Xiang Zuo, et al. · 2025

Real-time object counting in the industry pipeline is critical for improving efficiency and accuracy in industries like manufacturing and logistics. This paper introduces a novel multi-object association method, namely tracking method, whi…

Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives

Zhihu Wang, Shiwan Zhao, Yu Wang, Heyuan Huang, Sitao Xie, et al. · 2025

Knowledge-Empowered, Collaborative, and Co-Evolving AI Models: The Post-LLM Roadmap

Fei Wu, Tao Shen, Thomas Bäck, Jingyuan Chen, Gang Huang, et al. · 2024

Large language models (LLMs) have significantly advanced artificial intelligence (AI) by excelling in tasks such as understanding, generation, and reasoning across multiple modalities. Despite these achievements, LLMs have inherent limitat…

Junchi Yan