arXiv (Cornell University)
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
February 2024 • Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo
Large language models (LLMs) have demonstrated outstanding performance on various tasks, such as text summarization and question answering. While their performance is impressive, the computational footprint of their vast number of parameters can be prohibitive. Existing solutions such as SparseGPT and Wanda attempt to alleviate this issue through weight pruning. However, their layer-wise approach results in significant perturbation to the model's output and requires meticulous hyperparameter tuning,…
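For orientation, the layer-wise pruning the abstract contrasts against can be illustrated with a minimal Wanda-style sketch: score each weight by its magnitude times the norm of its input activation over a calibration set, then zero the lowest-scoring fraction within each output row, one layer at a time. This is a hedged illustration, not the paper's code; the function name, tensor shapes, and the 50% default sparsity are assumptions for the example.

```python
import torch

def wanda_prune_layer(weight: torch.Tensor,
                      acts: torch.Tensor,
                      sparsity: float = 0.5) -> torch.Tensor:
    """Layer-wise pruning sketch in the style of Wanda (illustrative only).

    weight: (out_features, in_features) linear-layer weight.
    acts:   (n_samples, in_features) calibration activations feeding the layer.
    """
    # Per-input-channel L2 norm of activations over the calibration samples.
    act_norm = acts.norm(p=2, dim=0)                    # (in_features,)
    # Wanda-style importance score: |W_ij| * ||X_j||_2.
    scores = weight.abs() * act_norm.unsqueeze(0)       # (out, in)

    # Zero the lowest-scoring weights within each output row.
    n_prune = int(weight.shape[1] * sparsity)
    prune_idx = torch.topk(scores, n_prune, dim=1, largest=False).indices
    mask = torch.ones_like(weight)
    mask.scatter_(1, prune_idx, 0.0)
    return weight * mask
```

Because each layer is pruned against its own local criterion, the target sparsity per layer is a hand-set hyperparameter, which is the tuning burden the abstract points to; BESA's contribution is to allocate sparsity blockwise instead.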
Pruning
Computer Science
Algorithm
Mathematics
Artificial Intelligence