Explanipedia

Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

Ao Liu, Botong Zhou, Can Xu, C.Z. Zhou, ChenChen Zhang, et al. · 2025

As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's…

Practical Continual Forgetting for Pre-trained Vision Models

Hongbo Zhao, Fei Zhu, Bolin Ni, Feng Zhu, Gaofeng Meng, et al. · 2025

For privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming evident nowadays. In real-world scenarios, erasure requests originate at any time from both users and model owners, and th…

Xwin-LM: Strong and Scalable Alignment Practice for LLMs

Bolin Ni, JingCheng Hu, Yixuan Wei, Houwen Peng, Zheng Zhang, et al. · 2024

In this work, we present Xwin-LM, a comprehensive suite of alignment methodologies for large language models (LLMs). This suite encompasses several key techniques, including supervised finetuning (SFT), reward modeling (RM), rejection samp…

Enhancing Visual Continual Learning with Language-Guided Supervision

Bolin Ni, Zhao Hong-bo, Chenghao Zhang, Ke Hu, Gaofeng Meng, et al. · 2024

Continual learning (CL) aims to empower models to learn new tasks without forgetting previously acquired knowledge. Most prior works concentrate on the techniques of architectures, replay data, regularization, etc. However, the category n…

Defying Imbalanced Forgetting in Class Incremental Learning

Xu Shi-xiong, Gaofeng Meng, Xing Nie, Bolin Ni, Bin Fan, et al. · 2024

We observe a high level of imbalance in the accuracy of different learned classes in the same old task for the first time. This intriguing phenomenon, discovered in replay-based Class Incremental Learning (CIL), highlights the imbalanced f…

Continual Forgetting for Pre-trained Vision Models

Hongbo Zhao, Bolin Ni, Haochen Wang, Junsong Fan, Fei Zhu, et al. · 2024

For privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming evident nowadays. In real-world scenarios, erasure requests originate at any time from both users and model owners. These …

FP8-LM: Training FP8 Large Language Models

Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, et al. · 2023

In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs). Our key insight is that most variables, such as gradients and optimizer states, in LLM training can employ low-precision data format…

Expanding Language-Image Pretrained Models for General Video Recognition

Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, et al. · 2022

Contrastive language-image pretraining has shown great success in learning visual-textual joint representation from web-scale data, demonstrating remarkable "zero-shot" generalization ability for various image tasks. However, how to effect…

Pro-tuning: Unified Prompt Tuning for Vision Tasks

Xing Nie, Bolin Ni, Jianlong Chang, Gaofeng Meng, Chunlei Huo, et al. · 2022

In computer vision, fine-tuning is the de-facto approach to leverage pre-trained vision models to perform downstream tasks. However, deploying it in practice is quite challenging, due to adopting parameter inefficient global update and hea…

Searching the Search Space of Vision Transformer

Minghao Chen, Kan Wu, Bolin Ni, Houwen Peng, Bei Liu, et al. · 2021

Vision Transformer has shown great visual representation power in substantial vision tasks such as recognition and detection, and thus been attracting fast-growing efforts on manually designing more effective architectures. In this paper, …

Bolin Ni