arXiv (Cornell University)
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
February 2024 • Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo
Large language models (LLMs) have demonstrated outstanding performance on various tasks, such as text summarization and question answering. While their performance is impressive, the computational footprint of their vast number of parameters can be prohibitive. Existing solutions such as SparseGPT and Wanda attempt to alleviate this issue through weight pruning. However, their layer-wise approach results in significant perturbation to the model's output and requires meticulous hyperparameter tuning,…
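For orientation, the layer-wise pruning the abstract contrasts against can be illustrated with a minimal Wanda-style sketch: score each weight by its magnitude times the norm of its input activation over a calibration set, then zero the lowest-scoring fraction within each output row, one layer at a time. This is a hedged illustration, not the paper's code; the function name, tensor shapes, and the 50% default sparsity are assumptions for the example.

```python
import torch

def wanda_prune_layer(weight: torch.Tensor,
                      acts: torch.Tensor,
                      sparsity: float = 0.5) -> torch.Tensor:
    """Layer-wise pruning sketch in the style of Wanda (illustrative only).

    weight: (out_features, in_features) linear-layer weight.
    acts:   (n_samples, in_features) calibration activations feeding the layer.
    """
    # Per-input-channel L2 norm of activations over the calibration samples.
    act_norm = acts.norm(p=2, dim=0)                    # (in_features,)
    # Wanda-style importance score: |W_ij| * ||X_j||_2.
    scores = weight.abs() * act_norm.unsqueeze(0)       # (out, in)

    # Zero the lowest-scoring weights within each output row.
    n_prune = int(weight.shape[1] * sparsity)
    prune_idx = torch.topk(scores, n_prune, dim=1, largest=False).indices
    mask = torch.ones_like(weight)
    mask.scatter_(1, prune_idx, 0.0)
    return weight * mask
```

Because each layer is pruned against its own local criterion, the target sparsity per layer is a hand-set hyperparameter, which is the tuning burden the abstract points to; BESA's contribution is to allocate sparsity blockwise instead.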
Pruning
Computer Science
Algorithm
Mathematics
Artificial Intelligence