Daxin Jiang
LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
KV Cache is commonly used to accelerate LLM inference with long contexts, yet its high memory demand drives the need for cache compression. Existing compression methods, however, are largely heuristic and lack dynamic budget allocation. To…
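The idea behind score-based cache compression can be illustrated with a toy eviction step (the scores, budget, and function names below are hypothetical illustrations, not LAVa's actual algorithm):

```python
import numpy as np

def evict_kv(keys, values, scores, budget):
    """Keep only the `budget` cache entries with the highest importance scores.

    keys/values: (seq_len, head_dim) arrays; scores: (seq_len,) importance
    estimates (e.g. accumulated attention weights). Toy sketch only.
    """
    keep = np.argsort(scores)[-budget:]  # indices of the top-`budget` entries
    keep.sort()                          # preserve original token order
    return keys[keep], values[keep], keep

# Toy cache of 6 tokens with 4-dim heads, compressed to a budget of 3.
rng = np.random.default_rng(0)
k, v = rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
scores = np.array([0.9, 0.1, 0.5, 0.05, 0.8, 0.2])
k2, v2, kept = evict_kv(k, v, scores, budget=3)
print(kept)  # -> [0 2 4]
```

Dynamic budget allocation, as the title suggests, would additionally vary `budget` per layer instead of fixing it globally.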
DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
The remarkable reasoning capability of large language models (LLMs) stems from cognitive behaviors that emerge through reinforcement with verifiable rewards. This work investigates how to transfer this principle to Multimodal LLMs (MLLMs) …
Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?
Mixture-of-Experts (MoE) language models dramatically expand model capacity and achieve remarkable performance without increasing per-token compute. However, can MoEs surpass dense architectures under strictly equal resource constraints - …
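The "more capacity without more per-token compute" property comes from routing each token to only a small top-k subset of experts. A minimal sketch of that routing step (generic top-k gating, not this paper's specific architecture; all names are illustrative):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts.

    Per-token compute is k expert applications, regardless of how many
    experts (total capacity) the layer holds. Toy dense-loop sketch.
    """
    logits = x @ gate_w                           # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]    # top-k expert ids per token
    sel = np.take_along_axis(logits, topk, axis=-1)
    w = np.exp(sel - sel.max(-1, keepdims=True))  # softmax over selected only
    w /= w.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j, e in enumerate(topk[t]):
            out[t] += w[t, j] * experts[e](x[t])  # weighted expert mixture
    return out

rng = np.random.default_rng(0)
d, n_experts = 4, 8
# Each "expert" is just a distinct random linear map for illustration.
experts = [lambda h, W=rng.normal(size=(d, d)): h @ W for _ in range(n_experts)]
x = rng.normal(size=(3, d))
y = moe_forward(x, rng.normal(size=(d, n_experts)), experts)
print(y.shape)  # (3, 4)
```

The question the paper poses is whether this routing advantage survives when total parameters, compute, and data are all held strictly equal to a dense baseline.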
Predictable Scale: Part II, Farseer: A Refined Scaling Law in Large Language Models
Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient inno…
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning
Recent advances in reasoning language models have witnessed a paradigm shift from short to long chain-of-thought (CoT) patterns. Given the substantial computational cost of rollouts in long CoT models, maximizing the utility of fixed training datasets becomes…
Beyond the First Error: Process Reward Models for Reflective Mathematical Reasoning
Many studies focus on data annotation techniques for training effective PRMs. However, current methods encounter a significant issue when applied to long CoT reasoning processes: they tend to focus solely on the first incorrect step and al…
Step1X-Edit: A Practical Framework for General Image Editing
In recent years, image editing models have witnessed remarkable and rapid development. The recent unveiling of cutting-edge multimodal models such as GPT-4o and Gemini2 Flash has introduced highly promising image editing capabilities. Thes…
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs). RL for LLMs involves two stages: generation and training. The LLM first generates samples online, which are then used to derive rewar…
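The two-stage structure described here can be sketched as a minimal loop; the policy, reward, and update callables below are hypothetical stand-ins, not StreamRL's actual interfaces:

```python
def rl_step(policy, prompts, reward_fn, update_fn):
    """One RL iteration for an LLM: (1) generation, then (2) training.

    `policy` maps a prompt to a sample; `reward_fn` scores a sample;
    `update_fn` consumes (prompt, sample, reward) triples. Toy sketch of
    the two-stage loop, not a real training system.
    """
    # Stage 1: generate samples online
    samples = [policy(p) for p in prompts]
    # Derive rewards from the generated samples
    rewards = [reward_fn(s) for s in samples]
    # Stage 2: train on the collected experience
    return update_fn(list(zip(prompts, samples, rewards)))

# Toy instantiation: the "policy" echoes the prompt, the reward is the
# sample length, and "training" just reports the mean reward.
avg = rl_step(
    policy=lambda p: p + "!",
    prompts=["a", "bb"],
    reward_fn=len,
    update_fn=lambda batch: sum(r for _, _, r in batch) / len(batch),
)
print(avg)  # 2.5
```

Disaggregating the two stages, as the title suggests, means running generation and training on separate resources rather than interleaving them on the same devices.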
Predictable Scale: Part I, Step Law -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining
The impressive capabilities of Large Language Models (LLMs) across diverse tasks are now well-established, yet their effective deployment necessitates careful hyperparameter optimization. Although existing methods have explored the …
DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training
Diffusion Transformers (DiTs) have shown remarkable performance in generating high-quality videos. However, the quadratic complexity of 3D full attention remains a bottleneck in scaling DiT training, especially with high-definition, length…
InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers
Scaling Large Language Model (LLM) training relies on multi-dimensional parallelism, where High-Bandwidth Domains (HBDs) are critical for communication-intensive parallelism like Tensor Parallelism. However, existing HBD architectures face…
Investigation of Berberine's Cardioprotective Effects and Its Association With the Notch Signaling Pathway in Rat Myocardial Ischemia-Reperfusion Injury
The study suggests an association between berberine's cardioprotective effects and Notch signaling pathway modulation in the context of myocardial ischemia-reperfusion injury. While berberine was found to bind Notch1 and affect gene and pr…
Fine-Grained Distillation for Long Document Retrieval
Long document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become de facto to improve a retriever by mimicking a heterogeneous yet powerful cross-encoder. However, in cont…
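A common way to mimic a cross-encoder teacher is a listwise KL objective over candidate documents. The sketch below shows that generic objective, not this paper's fine-grained method; all names are illustrative:

```python
import numpy as np

def listwise_kd_loss(student_scores, teacher_scores, tau=1.0):
    """KL(teacher || student) over one query's candidate documents.

    Both score lists are softened with temperature `tau`; the student
    retriever is penalized for diverging from the cross-encoder's ranking
    distribution. Generic distillation sketch.
    """
    def softmax(s):
        e = np.exp((s - s.max()) / tau)
        return e / e.sum()
    p, q = softmax(teacher_scores), softmax(student_scores)
    return float(np.sum(p * np.log(p / q)))

teacher = np.array([3.0, 1.0, 0.5])  # cross-encoder scores for 3 candidates
aligned = listwise_kd_loss(np.array([3.0, 1.0, 0.5]), teacher)
shifted = listwise_kd_loss(np.array([0.5, 1.0, 3.0]), teacher)
print(aligned, shifted)  # matching the teacher's ranking gives zero loss
```

The "long document" twist is that a single relevance score compresses many passages, which is what finer-grained supervision aims to address.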
Hypertext Entity Extraction in Webpage
Webpage entity extraction is a fundamental natural language processing task in both research and applications. Nowadays, the majority of webpage entity extraction models are trained on structured datasets which strive to retain textual con…
Nickel Foam Supported NiCoMoMn-Based Oxide Nanosheets for High-Performance Electrochemical Energy Storage
Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency
Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities using length-limited encoders. However, these methods often strug…
Instructed Language Models with Retrievers Are Powerful Entity Linkers
Generative approaches powered by large language models (LLMs) have demonstrated emergent abilities in tasks that require complex reasoning abilities. Yet the generative nature still makes the generated content suffer from hallucinations, t…
RUEL: Retrieval-Augmented User Representation with Edge Browser Logs for Sequential Recommendation
Online recommender systems (RS) aim to match user needs with the vast amount of resources available on various platforms. A key challenge is to model user preferences accurately under the condition of data sparsity. To address this chal…
Investigating the Learning Behaviour of In-Context Learning: A Comparison with Supervised Learning
Large language models (LLMs) have shown remarkable capacity for in-context learning (ICL), where learning a new task from just a few training examples is done without being explicitly pre-trained. However, despite the success of LLMs, ther…
WIERT: Web Information Extraction via Render Tree
Web information extraction (WIE) is a fundamental problem in web document understanding, with a significant impact on various applications. Visual information plays a crucial role in WIE tasks as the nodes containing relevant information a…
A Graph Fusion Approach for Cross-Lingual Machine Reading Comprehension
Although great progress has been made for Machine Reading Comprehension (MRC) in English, scaling out to a large number of languages remains a huge challenge due to the lack of large amounts of annotated training data in non-English langua…
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In…
Synergistic Interplay between Search and Large Language Models for Information Retrieval
Information retrieval (IR) plays a crucial role in locating relevant resources from vast amounts of data, and its applications have evolved from traditional knowledge bases to modern retrieval models (RMs). The emergence of large language …
Alleviating Over-smoothing for Unsupervised Sentence Representation
Currently, learning better unsupervised sentence representations is the pursuit of many natural language processing communities. Lots of approaches based on pre-trained language models (PLMs) and contrastive learning have achieved promisin…
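The contrastive-learning objective such PLM-based methods typically build on can be sketched with a batchwise InfoNCE loss; this is the generic objective, not this paper's specific remedy for over-smoothing, and all names are illustrative:

```python
import numpy as np

def info_nce(z1, z2, tau=0.05):
    """Contrastive loss over a batch of sentence embeddings.

    z1[i] and z2[i] are two views (e.g. augmentations) of sentence i and
    form a positive pair; all other rows act as in-batch negatives.
    Generic sketch of the InfoNCE objective.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                      # (batch, batch) scaled cosines
    logp = sim - np.log(np.exp(sim).sum(1, keepdims=True))
    return float(-np.mean(np.diag(logp)))      # -log p(positive | row)

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
tight = info_nce(z, z + 0.01 * rng.normal(size=(4, 8)))  # near-identical views
loose = info_nce(z, rng.normal(size=(4, 8)))             # unrelated views
print(tight < loose)  # matching views score a much lower loss
```

Pulling positive pairs together while pushing the batch apart is also where over-smoothing of PLM representations becomes a concern, which is the issue the paper targets.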