Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2501.11411
Coupling Large Language Models (LLMs) with Evolutionary Algorithms has recently shown significant promise as a technique to design new heuristics that outperform existing methods, particularly in the field of combinatorial optimisation. An escalating arms race is both rapidly producing new heuristics and improving the efficiency of the processes evolving them. However, driven by the desire to quickly demonstrate the superiority of new approaches, evaluation of the new heuristics produced for a specific domain is often cursory: testing on very few datasets in which instances all belong to a specific class from the domain, and on few instances per class. Taking bin-packing as an example, to the best of our knowledge we conduct the first rigorous benchmarking study of new LLM-generated heuristics, comparing them to well-known existing heuristics across a large suite of benchmark instances using three performance metrics. For each heuristic, we then evolve new instances won by the heuristic and perform an instance space analysis to understand where in the feature space each heuristic performs well. We show that most of the LLM heuristics do not generalise well when evaluated across a broad range of benchmarks in contrast to existing simple heuristics, and suggest that any gains from generating very specialist heuristics that only work in small areas of the instance space need to be weighed carefully against the considerable cost of generating these heuristics.
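To make the kind of comparison described in the abstract concrete, the following minimal sketch implements two well-known online bin-packing heuristics (First Fit and Best Fit) and counts how many instances each one "wins". It is an illustration only, not the authors' benchmark code: the function names, the two toy instances, and the definition of a win as "uses the fewest bins, ties counted for every tied heuristic" are assumptions made here. The abstract does not specify the paper's three performance metrics or its benchmark suite.

```python
from typing import Callable, Dict, List, Sequence

# A heuristic maps (item sizes, bin capacity) to a list of bins.
Heuristic = Callable[[Sequence[int], int], List[List[int]]]


def first_fit(items: Sequence[int], capacity: int) -> List[List[int]]:
    """Place each item in the first open bin with enough room.

    Assumes every item size is at most `capacity`.
    """
    bins: List[List[int]] = []
    for item in items:
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            # No open bin could take the item, so open a new one.
            bins.append([item])
    return bins


def best_fit(items: Sequence[int], capacity: int) -> List[List[int]]:
    """Place each item in the open bin that leaves the least free space.

    Assumes every item size is at most `capacity`.
    """
    bins: List[List[int]] = []
    for item in items:
        best_bin = None
        best_left = None
        for b in bins:
            left = capacity - sum(b) - item
            if left >= 0 and (best_left is None or left < best_left):
                best_bin, best_left = b, left
        if best_bin is None:
            bins.append([item])
        else:
            best_bin.append(item)
    return bins


def count_wins(
    heuristics: Dict[str, Heuristic],
    instances: Sequence[Sequence[int]],
    capacity: int,
) -> Dict[str, int]:
    """Count, per heuristic, the instances on which it uses the fewest bins.

    Ties are credited to every tied heuristic (an assumption of this sketch).
    """
    wins = {name: 0 for name in heuristics}
    for items in instances:
        used = {name: len(h(items, capacity)) for name, h in heuristics.items()}
        fewest = min(used.values())
        for name, n_bins in used.items():
            if n_bins == fewest:
                wins[name] += 1
    return wins
```

Calling `count_wins({"FF": first_fit, "BF": best_fit}, [[4, 7, 3, 6], [5, 5, 5, 5]], 10)` compares the two heuristics on two hand-made instances with capacity 10. A study of the kind the abstract describes would replace these with a broad benchmark suite and more than one metric.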
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2501.11411, https://arxiv.org/pdf/2501.11411
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4406735520
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4406735520 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2501.11411 (Digital Object Identifier)
- Title: Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2025 (Year of publication)
- Publication date: 2025-01-20 (Full publication date if available)
- Authors: Kevin Sim, Quentin Renau, Emma Hart (List of authors in order)
- Landing page: https://arxiv.org/abs/2501.11411 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2501.11411 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2501.11411 (Direct OA link when available)
- Concepts: Benchmarking, Heuristics, Bin packing problem, Bin, Computer science, Business, Marketing, Algorithm, Operating system (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (Total citation count in OpenAlex)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4406735520 |
| doi | https://doi.org/10.48550/arxiv.2501.11411 |
| ids.doi | https://doi.org/10.48550/arxiv.2501.11411 |
| ids.openalex | https://openalex.org/W4406735520 |
| fwci | |
| type | preprint |
| title | Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12176 |
| topics[0].field.id | https://openalex.org/fields/22 |
| topics[0].field.display_name | Engineering |
| topics[0].score | 0.9948999881744385 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2209 |
| topics[0].subfield.display_name | Industrial and Manufacturing Engineering |
| topics[0].display_name | Optimization and Packing Problems |
| topics[1].id | https://openalex.org/T11814 |
| topics[1].field.id | https://openalex.org/fields/22 |
| topics[1].field.display_name | Engineering |
| topics[1].score | 0.9932000041007996 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/2209 |
| topics[1].subfield.display_name | Industrial and Manufacturing Engineering |
| topics[1].display_name | Advanced Manufacturing and Logistics Optimization |
| topics[2].id | https://openalex.org/T11159 |
| topics[2].field.id | https://openalex.org/fields/22 |
| topics[2].field.display_name | Engineering |
| topics[2].score | 0.9878000020980835 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2209 |
| topics[2].subfield.display_name | Industrial and Manufacturing Engineering |
| topics[2].display_name | Manufacturing Process and Optimization |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C86251818 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8720006942749023 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q816754 |
| concepts[0].display_name | Benchmarking |
| concepts[1].id | https://openalex.org/C127705205 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7815362215042114 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q5748245 |
| concepts[1].display_name | Heuristics |
| concepts[2].id | https://openalex.org/C87219788 |
| concepts[2].level | 3 |
| concepts[2].score | 0.6728319525718689 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q814581 |
| concepts[2].display_name | Bin packing problem |
| concepts[3].id | https://openalex.org/C156273044 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6234830021858215 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q4913766 |
| concepts[3].display_name | Bin |
| concepts[4].id | https://openalex.org/C41008148 |
| concepts[4].level | 0 |
| concepts[4].score | 0.501751184463501 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[4].display_name | Computer science |
| concepts[5].id | https://openalex.org/C144133560 |
| concepts[5].level | 0 |
| concepts[5].score | 0.3534632921218872 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q4830453 |
| concepts[5].display_name | Business |
| concepts[6].id | https://openalex.org/C162853370 |
| concepts[6].level | 1 |
| concepts[6].score | 0.19137918949127197 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q39809 |
| concepts[6].display_name | Marketing |
| concepts[7].id | https://openalex.org/C11413529 |
| concepts[7].level | 1 |
| concepts[7].score | 0.13258403539657593 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[7].display_name | Algorithm |
| concepts[8].id | https://openalex.org/C111919701 |
| concepts[8].level | 1 |
| concepts[8].score | 0.09228608012199402 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[8].display_name | Operating system |
| keywords[0].id | https://openalex.org/keywords/benchmarking |
| keywords[0].score | 0.8720006942749023 |
| keywords[0].display_name | Benchmarking |
| keywords[1].id | https://openalex.org/keywords/heuristics |
| keywords[1].score | 0.7815362215042114 |
| keywords[1].display_name | Heuristics |
| keywords[2].id | https://openalex.org/keywords/bin-packing-problem |
| keywords[2].score | 0.6728319525718689 |
| keywords[2].display_name | Bin packing problem |
| keywords[3].id | https://openalex.org/keywords/bin |
| keywords[3].score | 0.6234830021858215 |
| keywords[3].display_name | Bin |
| keywords[4].id | https://openalex.org/keywords/computer-science |
| keywords[4].score | 0.501751184463501 |
| keywords[4].display_name | Computer science |
| keywords[5].id | https://openalex.org/keywords/business |
| keywords[5].score | 0.3534632921218872 |
| keywords[5].display_name | Business |
| keywords[6].id | https://openalex.org/keywords/marketing |
| keywords[6].score | 0.19137918949127197 |
| keywords[6].display_name | Marketing |
| keywords[7].id | https://openalex.org/keywords/algorithm |
| keywords[7].score | 0.13258403539657593 |
| keywords[7].display_name | Algorithm |
| keywords[8].id | https://openalex.org/keywords/operating-system |
| keywords[8].score | 0.09228608012199402 |
| keywords[8].display_name | Operating system |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2501.11411 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2501.11411 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2501.11411 |
| locations[1].id | doi:10.48550/arxiv.2501.11411 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2501.11411 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5059180650 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-6555-7721 |
| authorships[0].author.display_name | Kevin Sim |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Sim, Kevin |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5069347164 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-2487-981X |
| authorships[1].author.display_name | Quentin Renau |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Renau, Quentin |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5006992974 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-5405-4413 |
| authorships[2].author.display_name | Emma Hart |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Hart, Emma |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2501.11411 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12176 |
| primary_topic.field.id | https://openalex.org/fields/22 |
| primary_topic.field.display_name | Engineering |
| primary_topic.score | 0.9948999881744385 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2209 |
| primary_topic.subfield.display_name | Industrial and Manufacturing Engineering |
| primary_topic.display_name | Optimization and Packing Problems |
| related_works | https://openalex.org/W2797538013, https://openalex.org/W2026961896, https://openalex.org/W173368591, https://openalex.org/W2034272113, https://openalex.org/W2138242427, https://openalex.org/W2104955674, https://openalex.org/W4288260564, https://openalex.org/W2068892231, https://openalex.org/W2096053066, https://openalex.org/W2001915926 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2501.11411 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2501.11411 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2501.11411 |
| primary_location.id | pmh:oai:arXiv.org:2501.11411 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2501.11411 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2501.11411 |
| publication_date | 2025-01-20 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 14, 70, 87, 128, 182 |
| abstract_inverted_index.An | 31 |
| abstract_inverted_index.We | 167 |
| abstract_inverted_index.an | 102, 152 |
| abstract_inverted_index.as | 13, 101 |
| abstract_inverted_index.be | 215 |
| abstract_inverted_index.by | 52, 147 |
| abstract_inverted_index.do | 175 |
| abstract_inverted_index.in | 25, 81, 159, 187, 206 |
| abstract_inverted_index.is | 35, 73 |
| abstract_inverted_index.of | 28, 45, 60, 64, 107, 117, 131, 171, 185, 209, 222 |
| abstract_inverted_index.on | 77, 94 |
| abstract_inverted_index.to | 16, 55, 86, 104, 123, 156, 189, 214 |
| abstract_inverted_index.we | 110, 141 |
| abstract_inverted_index.For | 138 |
| abstract_inverted_index.LLM | 173 |
| abstract_inverted_index.all | 84 |
| abstract_inverted_index.and | 41, 93, 150, 193 |
| abstract_inverted_index.any | 196 |
| abstract_inverted_index.few | 79, 95 |
| abstract_inverted_index.for | 69 |
| abstract_inverted_index.has | 8 |
| abstract_inverted_index.new | 18, 39, 61, 66, 118, 144 |
| abstract_inverted_index.not | 176 |
| abstract_inverted_index.our | 108 |
| abstract_inverted_index.per | 97 |
| abstract_inverted_index.the | 26, 43, 46, 53, 58, 65, 91, 105, 112, 148, 160, 172, 210, 219 |
| abstract_inverted_index.won | 146 |
| abstract_inverted_index.arms | 33 |
| abstract_inverted_index.best | 106 |
| abstract_inverted_index.both | 36 |
| abstract_inverted_index.cost | 221 |
| abstract_inverted_index.each | 139, 163 |
| abstract_inverted_index.from | 90, 198 |
| abstract_inverted_index.most | 170 |
| abstract_inverted_index.need | 213 |
| abstract_inverted_index.only | 204 |
| abstract_inverted_index.race | 34 |
| abstract_inverted_index.show | 168 |
| abstract_inverted_index.that | 20, 169, 195, 203 |
| abstract_inverted_index.them | 122 |
| abstract_inverted_index.then | 142 |
| abstract_inverted_index.very | 78, 200 |
| abstract_inverted_index.well | 178 |
| abstract_inverted_index.when | 179 |
| abstract_inverted_index.with | 5 |
| abstract_inverted_index.work | 205 |
| abstract_inverted_index.Large | 1 |
| abstract_inverted_index.areas | 208 |
| abstract_inverted_index.broad | 183 |
| abstract_inverted_index.class | 89 |
| abstract_inverted_index.field | 27 |
| abstract_inverted_index.first | 113 |
| abstract_inverted_index.gains | 197 |
| abstract_inverted_index.large | 129 |
| abstract_inverted_index.often | 74 |
| abstract_inverted_index.range | 184 |
| abstract_inverted_index.shown | 10 |
| abstract_inverted_index.small | 207 |
| abstract_inverted_index.space | 154, 162, 212 |
| abstract_inverted_index.study | 116 |
| abstract_inverted_index.suite | 130 |
| abstract_inverted_index.them. | 49 |
| abstract_inverted_index.these | 224 |
| abstract_inverted_index.three | 135 |
| abstract_inverted_index.using | 134 |
| abstract_inverted_index.well. | 166 |
| abstract_inverted_index.where | 158 |
| abstract_inverted_index.which | 82 |
| abstract_inverted_index.(LLMs) | 4 |
| abstract_inverted_index.Models | 3 |
| abstract_inverted_index.Taking | 99 |
| abstract_inverted_index.across | 127, 181 |
| abstract_inverted_index.belong | 85 |
| abstract_inverted_index.class. | 98 |
| abstract_inverted_index.design | 17 |
| abstract_inverted_index.desire | 54 |
| abstract_inverted_index.domain | 72 |
| abstract_inverted_index.driven | 51 |
| abstract_inverted_index.evolve | 143 |
| abstract_inverted_index.simple | 191 |
| abstract_inverted_index.against | 218 |
| abstract_inverted_index.conduct | 111 |
| abstract_inverted_index.domain, | 92 |
| abstract_inverted_index.feature | 161 |
| abstract_inverted_index.perform | 151 |
| abstract_inverted_index.promise | 12 |
| abstract_inverted_index.quickly | 56 |
| abstract_inverted_index.rapidly | 37 |
| abstract_inverted_index.suggest | 194 |
| abstract_inverted_index.testing | 76 |
| abstract_inverted_index.weighed | 216 |
| abstract_inverted_index.Coupling | 0 |
| abstract_inverted_index.However, | 50 |
| abstract_inverted_index.Language | 2 |
| abstract_inverted_index.analysis | 155 |
| abstract_inverted_index.contrast | 188 |
| abstract_inverted_index.cursory: | 75 |
| abstract_inverted_index.datasets | 80 |
| abstract_inverted_index.evolving | 48 |
| abstract_inverted_index.example, | 103 |
| abstract_inverted_index.existing | 22, 125, 190 |
| abstract_inverted_index.instance | 153, 211 |
| abstract_inverted_index.methods, | 23 |
| abstract_inverted_index.metrics. | 137 |
| abstract_inverted_index.performs | 165 |
| abstract_inverted_index.produced | 68 |
| abstract_inverted_index.recently | 9 |
| abstract_inverted_index.rigorous | 114 |
| abstract_inverted_index.specific | 71, 88 |
| abstract_inverted_index.benchmark | 132 |
| abstract_inverted_index.carefully | 217 |
| abstract_inverted_index.comparing | 121 |
| abstract_inverted_index.evaluated | 180 |
| abstract_inverted_index.heuristic | 149, 164 |
| abstract_inverted_index.improving | 42 |
| abstract_inverted_index.instances | 83, 96, 133, 145 |
| abstract_inverted_index.knowledge | 109 |
| abstract_inverted_index.processes | 47 |
| abstract_inverted_index.producing | 38 |
| abstract_inverted_index.technique | 15 |
| abstract_inverted_index.Algorithms | 7 |
| abstract_inverted_index.benchmarks | 186 |
| abstract_inverted_index.efficiency | 44 |
| abstract_inverted_index.escalating | 32 |
| abstract_inverted_index.evaluation | 63 |
| abstract_inverted_index.generalise | 177 |
| abstract_inverted_index.generating | 199, 223 |
| abstract_inverted_index.heuristic, | 140 |
| abstract_inverted_index.heuristics | 19, 40, 67, 126, 174, 202 |
| abstract_inverted_index.outperform | 21 |
| abstract_inverted_index.specialist | 201 |
| abstract_inverted_index.understand | 157 |
| abstract_inverted_index.well-known | 124 |
| abstract_inverted_index.approaches, | 62 |
| abstract_inverted_index.bin-packing | 100 |
| abstract_inverted_index.demonstrate | 57 |
| abstract_inverted_index.heuristics, | 120, 192 |
| abstract_inverted_index.heuristics. | 225 |
| abstract_inverted_index.performance | 136 |
| abstract_inverted_index.significant | 11 |
| abstract_inverted_index.superiority | 59 |
| abstract_inverted_index.Evolutionary | 6 |
| abstract_inverted_index.benchmarking | 115 |
| abstract_inverted_index.considerable | 220 |
| abstract_inverted_index.particularly | 24 |
| abstract_inverted_index.LLM-generated | 119 |
| abstract_inverted_index.combinatorial | 29 |
| abstract_inverted_index.optimisation. | 30 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile | |
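
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each token to the word positions at which it occurs. As a minimal sketch of how such an index can be turned back into running text, the function below places every token at its positions and joins them in order. The function name `reconstruct_abstract` and the dictionary input shape are assumptions for illustration; the sample values are copied from the first eight positions listed in the table.

```python
from typing import Dict, List


def reconstruct_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> positions map.

    Positions that are missing from the index are simply skipped.
    """
    positions: Dict[int, str] = {}
    for token, indices in inverted_index.items():
        for i in indices:
            positions[i] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(positions[i] for i in sorted(positions))


# Sample taken from the table rows for positions 0 to 7.
sample = {
    "Coupling": [0],
    "Large": [1],
    "Language": [2],
    "Models": [3],
    "(LLMs)": [4],
    "with": [5],
    "Evolutionary": [6],
    "Algorithms": [7],
}
opening_words = reconstruct_abstract(sample)
```

With the sample above, `opening_words` holds the first eight words of the abstract. Passing the full set of rows would reproduce the complete abstract text.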