Bolin Ni
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's…
Practical Continual Forgetting for Pre-trained Vision Models
For privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming evident nowadays. In real-world scenarios, erasure requests originate at any time from both users and model owners, and th…
Xwin-LM: Strong and Scalable Alignment Practice for LLMs
In this work, we present Xwin-LM, a comprehensive suite of alignment methodologies for large language models (LLMs). This suite encompasses several key techniques, including supervised finetuning (SFT), reward modeling (RM), rejection samp…
Enhancing Visual Continual Learning with Language-Guided Supervision
Continual learning (CL) aims to empower models to learn new tasks without forgetting previously acquired knowledge. Most prior works concentrate on the techniques of architectures, replay data, regularization, etc. However, the category n…
Defying Imbalanced Forgetting in Class Incremental Learning
We observe a high level of imbalance in the accuracy of different learned classes in the same old task for the first time. This intriguing phenomenon, discovered in replay-based Class Incremental Learning (CIL), highlights the imbalanced f…
Defying Imbalanced Forgetting in Class Incremental Learning
We observe a high level of imbalance in the accuracy of different classes in the same old task for the first time. This intriguing phenomenon, discovered in replay-based Class Incremental Learning (CIL), highlights the imbalanced forgettin…
Continual Forgetting for Pre-trained Vision Models
For privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming evident nowadays. In real-world scenarios, erasure requests originate at any time from both users and model owners. These …
FP8-LM: Training FP8 Large Language Models
In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs). Our key insight is that most variables, such as gradients and optimizer states, in LLM training can employ low-precision data format…
Expanding Language-Image Pretrained Models for General Video Recognition
Contrastive language-image pretraining has shown great success in learning visual-textual joint representation from web-scale data, demonstrating remarkable "zero-shot" generalization ability for various image tasks. However, how to effect…
Pro-tuning: Unified Prompt Tuning for Vision Tasks
In computer vision, fine-tuning is the de-facto approach to leverage pre-trained vision models to perform downstream tasks. However, deploying it in practice is quite challenging, due to adopting parameter inefficient global update and hea…
Searching the Search Space of Vision Transformer
Vision Transformer has shown great visual representation power in substantial vision tasks such as recognition and detection, and has thus been attracting fast-growing efforts on manually designing more effective architectures. In this paper, …