Daxin Jiang
LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
KV Cache is commonly used to accelerate LLM inference with long contexts, yet its high memory demand drives the need for cache compression. Existing compression methods, however, are largely heuristic and lack dynamic budget allocation. To…
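The idea behind score-based cache compression can be illustrated with a toy eviction step (the scores, budget, and function names below are hypothetical illustrations, not LAVa's actual algorithm):

```python
import numpy as np

def evict_kv(keys, values, scores, budget):
    """Keep only the `budget` cache entries with the highest importance scores.

    keys/values: (seq_len, head_dim) arrays; scores: (seq_len,) importance
    estimates (e.g. accumulated attention weights). Toy sketch only.
    """
    keep = np.argsort(scores)[-budget:]  # indices of the top-`budget` entries
    keep.sort()                          # preserve original token order
    return keys[keep], values[keep], keep

# Toy cache of 6 tokens with 4-dim heads, compressed to a budget of 3.
rng = np.random.default_rng(0)
k, v = rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
scores = np.array([0.9, 0.1, 0.5, 0.05, 0.8, 0.2])
k2, v2, kept = evict_kv(k, v, scores, budget=3)
print(kept)  # -> [0 2 4]
```

Dynamic budget allocation, as the title suggests, would additionally vary `budget` per layer instead of fixing it globally.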
DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
The remarkable reasoning capability of large language models (LLMs) stems from cognitive behaviors that emerge through reinforcement with verifiable rewards. This work investigates how to transfer this principle to Multimodal LLMs (MLLMs) …
Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?
Mixture-of-Experts (MoE) language models dramatically expand model capacity and achieve remarkable performance without increasing per-token compute. However, can MoEs surpass dense architectures under strictly equal resource constraints - …
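The "more capacity without more per-token compute" property comes from routing each token to only a small top-k subset of experts. A minimal sketch of that routing step (generic top-k gating, not this paper's specific architecture; all names are illustrative):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts.

    Per-token compute is k expert applications, regardless of how many
    experts (total capacity) the layer holds. Toy dense-loop sketch.
    """
    logits = x @ gate_w                           # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]    # top-k expert ids per token
    sel = np.take_along_axis(logits, topk, axis=-1)
    w = np.exp(sel - sel.max(-1, keepdims=True))  # softmax over selected only
    w /= w.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j, e in enumerate(topk[t]):
            out[t] += w[t, j] * experts[e](x[t])  # weighted expert mixture
    return out

rng = np.random.default_rng(0)
d, n_experts = 4, 8
# Each "expert" is just a distinct random linear map for illustration.
experts = [lambda h, W=rng.normal(size=(d, d)): h @ W for _ in range(n_experts)]
x = rng.normal(size=(3, d))
y = moe_forward(x, rng.normal(size=(d, n_experts)), experts)
print(y.shape)  # (3, 4)
```

The question the paper poses is whether this routing advantage survives when total parameters, compute, and data are all held strictly equal to a dense baseline.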
Predictable Scale: Part II, Farseer: A Refined Scaling Law in Large Language Models
Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient inno…
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning
Recent advances in reasoning language models have witnessed a paradigm shift from short to long chain-of-thought (CoT) patterns. Given the substantial computational cost of rollouts in long CoT models, maximizing the utility of fixed training datasets becomes…
Beyond the First Error: Process Reward Models for Reflective Mathematical Reasoning
Many studies focus on data annotation techniques for training effective PRMs. However, current methods encounter a significant issue when applied to long CoT reasoning processes: they tend to focus solely on the first incorrect step and al…
Step1X-Edit: A Practical Framework for General Image Editing
In recent years, image editing models have witnessed remarkable and rapid development. The recent unveiling of cutting-edge multimodal models such as GPT-4o and Gemini2 Flash has introduced highly promising image editing capabilities. Thes…
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs). RL for LLMs involves two stages: generation and training. The LLM first generates samples online, which are then used to derive rewar…
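The two-stage structure described here can be sketched as a minimal loop; the policy, reward, and update callables below are hypothetical stand-ins, not StreamRL's actual interfaces:

```python
def rl_step(policy, prompts, reward_fn, update_fn):
    """One RL iteration for an LLM: (1) generation, then (2) training.

    `policy` maps a prompt to a sample; `reward_fn` scores a sample;
    `update_fn` consumes (prompt, sample, reward) triples. Toy sketch of
    the two-stage loop, not a real training system.
    """
    # Stage 1: generate samples online
    samples = [policy(p) for p in prompts]
    # Derive rewards from the generated samples
    rewards = [reward_fn(s) for s in samples]
    # Stage 2: train on the collected experience
    return update_fn(list(zip(prompts, samples, rewards)))

# Toy instantiation: the "policy" echoes the prompt, the reward is the
# sample length, and "training" just reports the mean reward.
avg = rl_step(
    policy=lambda p: p + "!",
    prompts=["a", "bb"],
    reward_fn=len,
    update_fn=lambda batch: sum(r for _, _, r in batch) / len(batch),
)
print(avg)  # 2.5
```

Disaggregating the two stages, as the title suggests, means running generation and training on separate resources rather than interleaving them on the same devices.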
Predictable Scale: Part I, Step Law -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining
The impressive capabilities of Large Language Models (LLMs) across diverse tasks are now well-established, yet their effective deployment necessitates careful hyperparameter optimization. Although existing methods have explored the …
DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training
Diffusion Transformers (DiTs) have shown remarkable performance in generating high-quality videos. However, the quadratic complexity of 3D full attention remains a bottleneck in scaling DiT training, especially with high-definition, length…
InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers
Scaling Large Language Model (LLM) training relies on multi-dimensional parallelism, where High-Bandwidth Domains (HBDs) are critical for communication-intensive parallelism like Tensor Parallelism. However, existing HBD architectures face…
Investigation of Berberine's Cardioprotective Effects and Its Association With the Notch Signaling Pathway in Rat Myocardial Ischemia-Reperfusion Injury
The study suggests an association between berberine's cardioprotective effects and Notch signaling pathway modulation in the context of myocardial ischemia-reperfusion injury. While berberine was found to bind Notch1 and affect gene and pr…
Fine-Grained Distillation for Long Document Retrieval
Long document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become de facto to improve a retriever by mimicking a heterogeneous yet powerful cross-encoder. However, in cont…
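A common way to mimic a cross-encoder teacher is a listwise KL objective over candidate documents. The sketch below shows that generic objective, not this paper's fine-grained method; all names are illustrative:

```python
import numpy as np

def listwise_kd_loss(student_scores, teacher_scores, tau=1.0):
    """KL(teacher || student) over one query's candidate documents.

    Both score lists are softened with temperature `tau`; the student
    retriever is penalized for diverging from the cross-encoder's ranking
    distribution. Generic distillation sketch.
    """
    def softmax(s):
        e = np.exp((s - s.max()) / tau)
        return e / e.sum()
    p, q = softmax(teacher_scores), softmax(student_scores)
    return float(np.sum(p * np.log(p / q)))

teacher = np.array([3.0, 1.0, 0.5])  # cross-encoder scores for 3 candidates
aligned = listwise_kd_loss(np.array([3.0, 1.0, 0.5]), teacher)
shifted = listwise_kd_loss(np.array([0.5, 1.0, 3.0]), teacher)
print(aligned, shifted)  # matching the teacher's ranking gives zero loss
```

The "long document" twist is that a single relevance score compresses many passages, which is what finer-grained supervision aims to address.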
Hypertext Entity Extraction in Webpage
Webpage entity extraction is a fundamental natural language processing task in both research and applications. Nowadays, the majority of webpage entity extraction models are trained on structured datasets which strive to retain textual con…
Nickel Foam Supported NiCoMoMn-Based Oxide Nanosheets for High-Performance Electrochemical Energy Storage
Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency
Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities using length-limited encoders. However, these methods often strug…
Instructed Language Models with Retrievers Are Powerful Entity Linkers
Generative approaches powered by large language models (LLMs) have demonstrated emergent abilities in tasks that require complex reasoning abilities. Yet the generative nature still makes the generated content suffer from hallucinations, t…
RUEL: Retrieval-Augmented User Representation with Edge Browser Logs for Sequential Recommendation
Online recommender systems (RS) aim to match user needs with the vast amount of resources available on various platforms. A key challenge is to model user preferences accurately under the condition of data sparsity. To address this chal…
Investigating the Learning Behaviour of In-Context Learning: A Comparison with Supervised Learning
Large language models (LLMs) have shown remarkable capacity for in-context learning (ICL), where learning a new task from just a few training examples is done without being explicitly pre-trained. However, despite the success of LLMs, ther…
WIERT: Web Information Extraction via Render Tree
Web information extraction (WIE) is a fundamental problem in web document understanding, with a significant impact on various applications. Visual information plays a crucial role in WIE tasks as the nodes containing relevant information a…
A Graph Fusion Approach for Cross-Lingual Machine Reading Comprehension
Although great progress has been made for Machine Reading Comprehension (MRC) in English, scaling out to a large number of languages remains a huge challenge due to the lack of large amounts of annotated training data in non-English langua…
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In…
Synergistic Interplay between Search and Large Language Models for Information Retrieval
Information retrieval (IR) plays a crucial role in locating relevant resources from vast amounts of data, and its applications have evolved from traditional knowledge bases to modern retrieval models (RMs). The emergence of large language …
Alleviating Over-smoothing for Unsupervised Sentence Representation
Currently, learning better unsupervised sentence representations is the pursuit of many natural language processing communities. Lots of approaches based on pre-trained language models (PLMs) and contrastive learning have achieved promisin…
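The contrastive-learning objective such PLM-based methods typically build on can be sketched with a batchwise InfoNCE loss; this is the generic objective, not this paper's specific remedy for over-smoothing, and all names are illustrative:

```python
import numpy as np

def info_nce(z1, z2, tau=0.05):
    """Contrastive loss over a batch of sentence embeddings.

    z1[i] and z2[i] are two views (e.g. augmentations) of sentence i and
    form a positive pair; all other rows act as in-batch negatives.
    Generic sketch of the InfoNCE objective.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                      # (batch, batch) scaled cosines
    logp = sim - np.log(np.exp(sim).sum(1, keepdims=True))
    return float(-np.mean(np.diag(logp)))      # -log p(positive | row)

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
tight = info_nce(z, z + 0.01 * rng.normal(size=(4, 8)))  # near-identical views
loose = info_nce(z, rng.normal(size=(4, 8)))             # unrelated views
print(tight < loose)  # matching views score a much lower loss
```

Pulling positive pairs together while pushing the batch apart is also where over-smoothing of PLM representations becomes a concern, which is the issue the paper targets.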