BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2402.16880
Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization and text question-answering. While their performance is impressive, the computational footprint due to their vast number of parameters can be prohibitive. Existing solutions such as SparseGPT and Wanda attempt to alleviate this issue through weight pruning. However, their layer-wise approach results in significant perturbation to the model's output and requires meticulous tuning of hyperparameters such as the pruning rate, which can adversely affect overall model performance. To address these issues, this paper introduces a novel LLM pruning technique dubbed blockwise parameter-efficient sparsity allocation (BESA), which applies a blockwise reconstruction loss. In contrast to typical layer-wise pruning techniques, BESA is characterized by two distinctive attributes: i) it targets the overall pruning error with respect to individual transformer blocks, and ii) it allocates layer-specific sparsity in a differentiable manner, both of which ensure reduced performance degradation after pruning. Our experiments show that BESA achieves state-of-the-art performance, efficiently pruning LLMs like LLaMA1 and LLaMA2 with 7B to 70B parameters on a single A100 GPU in just five hours. Code is available at https://github.com/OpenGVLab/LLMPrune-BESA.
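The difference between a uniform per-layer pruning rate and a block-level sparsity allocation can be illustrated with a toy example. The sketch below is not the BESA implementation: it assumes a hypothetical two-layer block (linear, ReLU, linear), uses plain magnitude pruning, and replaces the paper's differentiable allocation with a brute-force search over a few hand-picked sparsity pairs. All names (`block`, `prune_by_magnitude`, `block_error`, `candidates`) are illustrative assumptions. It only shows the idea of scoring an allocation by the reconstruction error of the whole block while holding the overall sparsity fixed.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def block(W1, W2, x):
    """Toy two-layer block (assumption): linear -> ReLU -> linear."""
    hidden = [max(0.0, h) for h in matvec(W1, x)]
    return matvec(W2, hidden)

def prune_by_magnitude(W, sparsity):
    """Return a copy of W with the smallest-magnitude fraction set to zero."""
    cols = len(W[0])
    order = sorted(range(len(W) * cols),
                   key=lambda i: abs(W[i // cols][i % cols]))
    pruned = [row[:] for row in W]
    for i in order[:round(sparsity * len(order))]:
        pruned[i // cols][i % cols] = 0.0
    return pruned

def block_error(dense, pruned, inputs):
    """Mean squared error between dense and pruned block outputs."""
    total = 0.0
    for x in inputs:
        ref = block(dense[0], dense[1], x)
        out = block(pruned[0], pruned[1], x)
        total += sum((a - b) ** 2 for a, b in zip(ref, out))
    return total / len(inputs)

random.seed(0)
DIM = 8
W1 = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
W2 = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
calibration = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(32)]

# Each pair keeps the block at 50% overall sparsity (both layers have equal size).
candidates = [(0.3, 0.7), (0.4, 0.6), (0.5, 0.5), (0.6, 0.4), (0.7, 0.3)]
errors = {}
for s1, s2 in candidates:
    pruned = (prune_by_magnitude(W1, s1), prune_by_magnitude(W2, s2))
    errors[(s1, s2)] = block_error((W1, W2), pruned, calibration)

best = min(errors, key=errors.get)
print("uniform (0.5, 0.5) block error:", errors[(0.5, 0.5)])
print("best allocation:", best, "block error:", errors[best])
```

Because the uniform pair is one of the candidates, the selected allocation can never be worse than uniform on the calibration inputs. According to the abstract, BESA obtains layer-specific sparsities in a differentiable manner rather than through a search like this one.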
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2402.16880, https://arxiv.org/pdf/2402.16880
- OA Status: green
- Cited By: 3
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4392270529
Raw OpenAlex JSON
| Field | Value | Description |
|---|---|---|
| OpenAlex ID | https://openalex.org/W4392270529 | Canonical identifier for this work in OpenAlex |
| DOI | https://doi.org/10.48550/arxiv.2402.16880 | Digital Object Identifier |
| Title | BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation | Work title |
| Type | preprint | OpenAlex work type |
| Language | en | Primary language |
| Publication year | 2024 | Year of publication |
| Publication date | 2024-02-18 | Full publication date if available |
| Authors | Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo | List of authors in order |
| Landing page | https://arxiv.org/abs/2402.16880 | Publisher landing page |
| PDF URL | https://arxiv.org/pdf/2402.16880 | Direct link to full text PDF |
| Open access | Yes | Whether a free full text is available |
| OA status | green | Open access status per OpenAlex |
| OA URL | https://arxiv.org/pdf/2402.16880 | Direct OA link when available |
| Concepts | Pruning, Computer science, Language model, Algorithm, Mathematics, Artificial intelligence, Biology, Agronomy | Top concepts (fields/topics) attached by OpenAlex |
| Cited by | 3 | Total citation count in OpenAlex |
| Citations by year (recent) | 2025: 3 | Per-year citation counts (last 5 years) |
| Related works (count) | 10 | Other works algorithmically related by OpenAlex |
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4392270529 |
| doi | https://doi.org/10.48550/arxiv.2402.16880 |
| ids.doi | https://doi.org/10.48550/arxiv.2402.16880 |
| ids.openalex | https://openalex.org/W4392270529 |
| fwci | |
| type | preprint |
| title | BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9966999888420105 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.994700014591217 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| topics[2].id | https://openalex.org/T10201 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9721999764442444 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Speech Recognition and Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C108010975 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8437478542327881 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q500094 |
| concepts[0].display_name | Pruning |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5202832818031311 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C137293760 |
| concepts[2].level | 2 |
| concepts[2].score | 0.4446121156215668 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[2].display_name | Language model |
| concepts[3].id | https://openalex.org/C11413529 |
| concepts[3].level | 1 |
| concepts[3].score | 0.3915506601333618 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[3].display_name | Algorithm |
| concepts[4].id | https://openalex.org/C33923547 |
| concepts[4].level | 0 |
| concepts[4].score | 0.35531318187713623 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[4].display_name | Mathematics |
| concepts[5].id | https://openalex.org/C154945302 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3373657464981079 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[5].display_name | Artificial intelligence |
| concepts[6].id | https://openalex.org/C86803240 |
| concepts[6].level | 0 |
| concepts[6].score | 0.05238217115402222 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[6].display_name | Biology |
| concepts[7].id | https://openalex.org/C6557445 |
| concepts[7].level | 1 |
| concepts[7].score | 0.0 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q173113 |
| concepts[7].display_name | Agronomy |
| keywords[0].id | https://openalex.org/keywords/pruning |
| keywords[0].score | 0.8437478542327881 |
| keywords[0].display_name | Pruning |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5202832818031311 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/language-model |
| keywords[2].score | 0.4446121156215668 |
| keywords[2].display_name | Language model |
| keywords[3].id | https://openalex.org/keywords/algorithm |
| keywords[3].score | 0.3915506601333618 |
| keywords[3].display_name | Algorithm |
| keywords[4].id | https://openalex.org/keywords/mathematics |
| keywords[4].score | 0.35531318187713623 |
| keywords[4].display_name | Mathematics |
| keywords[5].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[5].score | 0.3373657464981079 |
| keywords[5].display_name | Artificial intelligence |
| keywords[6].id | https://openalex.org/keywords/biology |
| keywords[6].score | 0.05238217115402222 |
| keywords[6].display_name | Biology |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2402.16880 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2402.16880 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2402.16880 |
| locations[1].id | doi:10.48550/arxiv.2402.16880 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2402.16880 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101433114 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-5399-4227 |
| authorships[0].author.display_name | Peng Xu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Xu, Peng |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101827257 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-3781-4086 |
| authorships[1].author.display_name | Wenqi Shao |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Shao, Wenqi |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5073044954 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Mengzhao Chen |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Chen, Mengzhao |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5110510798 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Shitao Tang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Tang, Shitao |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5036606244 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-6105-6532 |
| authorships[4].author.display_name | Kaipeng Zhang |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Zhang, Kaipeng |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5039420666 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-2110-7427 |
| authorships[5].author.display_name | Peng Gao |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Gao, Peng |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5069123107 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-7554-7938 |
| authorships[6].author.display_name | Fengwei An |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | An, Fengwei |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5100748135 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-1889-2567 |
| authorships[7].author.display_name | Yu Qiao |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Qiao, Yu |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5100752685 |
| authorships[8].author.orcid | https://orcid.org/0000-0002-6645-4721 |
| authorships[8].author.display_name | Ping Luo |
| authorships[8].author_position | last |
| authorships[8].raw_author_name | Luo, Ping |
| authorships[8].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2402.16880 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-03-05T00:00:00 |
| display_name | BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9966999888420105 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W1979597421, https://openalex.org/W2007980826, https://openalex.org/W2061531152, https://openalex.org/W3002753104, https://openalex.org/W2077600819, https://openalex.org/W2142036596, https://openalex.org/W2072657027, https://openalex.org/W2600246793, https://openalex.org/W4238204885, https://openalex.org/W2963966623 |
| cited_by_count | 3 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 3 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2402.16880 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2402.16880 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2402.16880 |
| primary_location.id | pmh:oai:arXiv.org:2402.16880 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2402.16880 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2402.16880 |
| publication_date | 2024-02-18 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 87, 100, 139, 172 |
| abstract_inverted_index.7B | 167 |
| abstract_inverted_index.In | 104 |
| abstract_inverted_index.To | 81 |
| abstract_inverted_index.as | 12, 40, 70 |
| abstract_inverted_index.at | 183 |
| abstract_inverted_index.be | 35 |
| abstract_inverted_index.by | 98, 115 |
| abstract_inverted_index.i) | 119 |
| abstract_inverted_index.in | 8, 57, 138, 176 |
| abstract_inverted_index.is | 22, 113, 181 |
| abstract_inverted_index.it | 120, 134 |
| abstract_inverted_index.of | 32, 143 |
| abstract_inverted_index.on | 171 |
| abstract_inverted_index.to | 28, 45, 60, 106, 128, 168 |
| abstract_inverted_index.70B | 169 |
| abstract_inverted_index.GPU | 175 |
| abstract_inverted_index.LLM | 89 |
| abstract_inverted_index.Our | 151 |
| abstract_inverted_index.and | 17, 42, 64, 132, 164 |
| abstract_inverted_index.can | 34, 75 |
| abstract_inverted_index.due | 27 |
| abstract_inverted_index.ii) | 133 |
| abstract_inverted_index.the | 24, 61, 71, 107, 122 |
| abstract_inverted_index.two | 116 |
| abstract_inverted_index.A100 | 174 |
| abstract_inverted_index.BESA | 112, 155 |
| abstract_inverted_index.Code | 180 |
| abstract_inverted_index.LLMs | 161 |
| abstract_inverted_index.both | 142 |
| abstract_inverted_index.etc. | 18 |
| abstract_inverted_index.five | 178 |
| abstract_inverted_index.have | 4 |
| abstract_inverted_index.just | 177 |
| abstract_inverted_index.like | 162 |
| abstract_inverted_index.show | 153 |
| abstract_inverted_index.such | 11, 39, 69 |
| abstract_inverted_index.text | 13, 15 |
| abstract_inverted_index.that | 154 |
| abstract_inverted_index.this | 47, 84 |
| abstract_inverted_index.vast | 30 |
| abstract_inverted_index.with | 126, 166 |
| abstract_inverted_index.Large | 0 |
| abstract_inverted_index.Wanda | 43 |
| abstract_inverted_index.While | 19 |
| abstract_inverted_index.after | 149 |
| abstract_inverted_index.error | 125 |
| abstract_inverted_index.issue | 48 |
| abstract_inverted_index.loss. | 103 |
| abstract_inverted_index.model | 79 |
| abstract_inverted_index.novel | 88 |
| abstract_inverted_index.paper | 85 |
| abstract_inverted_index.rate, | 73 |
| abstract_inverted_index.their | 20, 29, 53 |
| abstract_inverted_index.this, | 83 |
| abstract_inverted_index.which | 74, 144 |
| abstract_inverted_index.(BESA) | 97 |
| abstract_inverted_index.(LLMs) | 3 |
| abstract_inverted_index.LLaMA2 | 165 |
| abstract_inverted_index.affect | 77 |
| abstract_inverted_index.dubbed | 92 |
| abstract_inverted_index.ensure | 145 |
| abstract_inverted_index.hours. | 179 |
| abstract_inverted_index.models | 2 |
| abstract_inverted_index.number | 31 |
| abstract_inverted_index.output | 63 |
| abstract_inverted_index.single | 173 |
| abstract_inverted_index.tasks, | 10 |
| abstract_inverted_index.weight | 50 |
| abstract_inverted_index.LLaMA1, | 163 |
| abstract_inverted_index.address | 82 |
| abstract_inverted_index.attempt | 44 |
| abstract_inverted_index.blocks, | 131 |
| abstract_inverted_index.manner, | 141 |
| abstract_inverted_index.model's | 62 |
| abstract_inverted_index.overall | 78, 123 |
| abstract_inverted_index.pruning | 72, 90, 110, 124, 160 |
| abstract_inverted_index.reduced | 146 |
| abstract_inverted_index.respect | 127 |
| abstract_inverted_index.results | 56 |
| abstract_inverted_index.targets | 121 |
| abstract_inverted_index.through | 49 |
| abstract_inverted_index.tuning, | 68 |
| abstract_inverted_index.typical | 108 |
| abstract_inverted_index.various | 9 |
| abstract_inverted_index.Existing | 37 |
| abstract_inverted_index.However, | 52 |
| abstract_inverted_index.achieves | 156 |
| abstract_inverted_index.applying | 99 |
| abstract_inverted_index.approach | 55 |
| abstract_inverted_index.contrast | 105 |
| abstract_inverted_index.language | 1 |
| abstract_inverted_index.pruning. | 51, 150 |
| abstract_inverted_index.requires | 65 |
| abstract_inverted_index.sparsity | 95, 137 |
| abstract_inverted_index.SparseGPT | 41 |
| abstract_inverted_index.adversely | 76 |
| abstract_inverted_index.alleviate | 46 |
| abstract_inverted_index.allocates | 135 |
| abstract_inverted_index.available | 182 |
| abstract_inverted_index.blockwise | 93, 101 |
| abstract_inverted_index.footprint | 26 |
| abstract_inverted_index.solutions | 38 |
| abstract_inverted_index.technique | 91 |
| abstract_inverted_index.allocation | 96 |
| abstract_inverted_index.individual | 129 |
| abstract_inverted_index.introduces | 86 |
| abstract_inverted_index.layer-wise | 54, 109 |
| abstract_inverted_index.meticulous | 66 |
| abstract_inverted_index.parameters | 33, 170 |
| abstract_inverted_index.attributes: | 118 |
| abstract_inverted_index.degradation | 148 |
| abstract_inverted_index.distinctive | 117 |
| abstract_inverted_index.efficiently | 159 |
| abstract_inverted_index.experiments | 152 |
| abstract_inverted_index.impressive, | 23 |
| abstract_inverted_index.outstanding | 6 |
| abstract_inverted_index.performance | 7, 21, 147 |
| abstract_inverted_index.significant | 58 |
| abstract_inverted_index.techniques, | 111 |
| abstract_inverted_index.transformer | 130 |
| abstract_inverted_index.demonstrated | 5 |
| abstract_inverted_index.performance, | 158 |
| abstract_inverted_index.performance. | 80 |
| abstract_inverted_index.perturbation | 59 |
| abstract_inverted_index.prohibitive. | 36 |
| abstract_inverted_index.characterized | 114 |
| abstract_inverted_index.computational | 25 |
| abstract_inverted_index.differentiable | 140 |
| abstract_inverted_index.hyperparameter | 67 |
| abstract_inverted_index.layer-specific | 136 |
| abstract_inverted_index.reconstruction | 102 |
| abstract_inverted_index.summarization, | 14 |
| abstract_inverted_index.state-of-the-art | 157 |
| abstract_inverted_index.parameter-efficient | 94 |
| abstract_inverted_index.question-answering, | 16 |
| abstract_inverted_index.https://github.com/OpenGVLab/LLMPrune-BESA. | 184 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 9 |
| citation_normalized_percentile | |
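
The `abstract_inverted_index` rows above store the abstract as a map from each token to the positions where it occurs. A minimal sketch of turning such a map back into running text follows. The function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions; the sample copies only the entries for positions 0 to 7 from the table, so the position list for `performance` is truncated to its first occurrence.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a token -> list-of-positions mapping."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join tokens in ascending position order.
    return " ".join(by_position[pos] for pos in sorted(by_position))

# Subset of the payload rows, limited to positions 0..7 (assumption for brevity).
sample = {
    "Large": [0],
    "language": [1],
    "models": [2],
    "(LLMs)": [3],
    "have": [4],
    "demonstrated": [5],
    "outstanding": [6],
    "performance": [7],
}

print(rebuild_abstract(sample))
# prints "Large language models (LLMs) have demonstrated outstanding performance"
```

Applied to the full index, the same function would place repeated tokens such as `pruning` at every position listed for them.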