Large Language Models and Simple, Stupid Bugs
2023 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2303.11455
With the advent of powerful neural language models, AI-based systems to assist developers in coding tasks are becoming widely available; Copilot is one such system. Copilot uses Codex, a large language model (LLM), to complete code conditioned on a preceding "prompt". Codex, however, is trained on public GitHub repositories, that is, on code that may include bugs and vulnerabilities. Previous studies [1], [2] show that Codex reproduces vulnerabilities seen in training. In this study, we examine how prone Codex is to generating an interesting bug category: single-statement bugs, commonly referred to as simple, stupid bugs or SStuBs in the MSR community. We find that Codex and similar LLMs do help avoid some SStuBs, but are up to 2x as likely to produce known, verbatim SStuBs as known, verbatim correct code. We explore the consequences of the Codex-generated SStuBs and propose avoidance strategies that suggest the possibility of reducing the production of known, verbatim SStuBs and increasing the possibility of producing known, verbatim fixes.
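
To make the bug category concrete, here is a hypothetical illustration of a single-statement bug and its one-statement fix. It is not taken from the paper or its dataset; the function names and the off-by-one scenario are assumptions chosen only for this sketch.

```python
def is_valid_index_buggy(i: int, length: int) -> bool:
    # Single-statement bug: `<=` admits i == length, which is out of range.
    return 0 <= i <= length


def is_valid_index_fixed(i: int, length: int) -> bool:
    # One-statement fix: use a strict upper bound.
    return 0 <= i < length


items = ["a", "b", "c"]
print(is_valid_index_buggy(3, len(items)))  # True, although items[3] would raise IndexError
print(is_valid_index_fixed(3, len(items)))  # False
```

The two versions differ in a single operator, which is why such a bug can look plausible in a code completion.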
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2303.11455, https://arxiv.org/pdf/2303.11455
- OA Status: green
- Cited By: 2
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4353113385
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4353113385 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2303.11455 (Digital Object Identifier)
- Title: Large Language Models and Simple, Stupid Bugs (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-03-20 (full publication date if available)
- Authors: Kevin Jesse, Toufique Ahmed, Premkumar Devanbu, Emily Morgan (list of authors in order)
- Landing page: https://arxiv.org/abs/2303.11455 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2303.11455 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2303.11455 (direct OA link when available)
- Concepts: Statement (logic), Computer science, Code (set theory), Simple (philosophy), Coding (social sciences), Programming language, Natural language processing, Artificial intelligence, Computer security, Linguistics, Sociology, Epistemology, Philosophy, Social science, Set (abstract data type) (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 2 (total citation count in OpenAlex)
- Citations by year (recent): 2024: 2 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4353113385 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2303.11455 |
| ids.doi | https://doi.org/10.48550/arxiv.2303.11455 |
| ids.openalex | https://openalex.org/W4353113385 |
| fwci | |
| type | preprint |
| title | Large Language Models and Simple, Stupid Bugs |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10260 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9890999794006348 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1710 |
| topics[0].subfield.display_name | Information Systems |
| topics[0].display_name | Software Engineering Research |
| topics[1].id | https://openalex.org/T11689 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9851999878883362 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Adversarial Robustness in Machine Learning |
| topics[2].id | https://openalex.org/T12423 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.954800009727478 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1712 |
| topics[2].subfield.display_name | Software |
| topics[2].display_name | Software Reliability and Analysis Research |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2777026412 |
| concepts[0].level | 2 |
| concepts[0].score | 0.800804853439331 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q2684591 |
| concepts[0].display_name | Statement (logic) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7221739292144775 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C2776760102 |
| concepts[2].level | 3 |
| concepts[2].score | 0.7015429735183716 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q5139990 |
| concepts[2].display_name | Code (set theory) |
| concepts[3].id | https://openalex.org/C2780586882 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6330468058586121 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q7520643 |
| concepts[3].display_name | Simple (philosophy) |
| concepts[4].id | https://openalex.org/C179518139 |
| concepts[4].level | 2 |
| concepts[4].score | 0.627856969833374 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q5140297 |
| concepts[4].display_name | Coding (social sciences) |
| concepts[5].id | https://openalex.org/C199360897 |
| concepts[5].level | 1 |
| concepts[5].score | 0.5083305239677429 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[5].display_name | Programming language |
| concepts[6].id | https://openalex.org/C204321447 |
| concepts[6].level | 1 |
| concepts[6].score | 0.3710100054740906 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[6].display_name | Natural language processing |
| concepts[7].id | https://openalex.org/C154945302 |
| concepts[7].level | 1 |
| concepts[7].score | 0.36296480894088745 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[7].display_name | Artificial intelligence |
| concepts[8].id | https://openalex.org/C38652104 |
| concepts[8].level | 1 |
| concepts[8].score | 0.34859901666641235 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[8].display_name | Computer security |
| concepts[9].id | https://openalex.org/C41895202 |
| concepts[9].level | 1 |
| concepts[9].score | 0.25571465492248535 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[9].display_name | Linguistics |
| concepts[10].id | https://openalex.org/C144024400 |
| concepts[10].level | 0 |
| concepts[10].score | 0.08439606428146362 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q21201 |
| concepts[10].display_name | Sociology |
| concepts[11].id | https://openalex.org/C111472728 |
| concepts[11].level | 1 |
| concepts[11].score | 0.07860872149467468 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q9471 |
| concepts[11].display_name | Epistemology |
| concepts[12].id | https://openalex.org/C138885662 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[12].display_name | Philosophy |
| concepts[13].id | https://openalex.org/C36289849 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q34749 |
| concepts[13].display_name | Social science |
| concepts[14].id | https://openalex.org/C177264268 |
| concepts[14].level | 2 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[14].display_name | Set (abstract data type) |
| keywords[0].id | https://openalex.org/keywords/statement |
| keywords[0].score | 0.800804853439331 |
| keywords[0].display_name | Statement (logic) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7221739292144775 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/code |
| keywords[2].score | 0.7015429735183716 |
| keywords[2].display_name | Code (set theory) |
| keywords[3].id | https://openalex.org/keywords/simple |
| keywords[3].score | 0.6330468058586121 |
| keywords[3].display_name | Simple (philosophy) |
| keywords[4].id | https://openalex.org/keywords/coding |
| keywords[4].score | 0.627856969833374 |
| keywords[4].display_name | Coding (social sciences) |
| keywords[5].id | https://openalex.org/keywords/programming-language |
| keywords[5].score | 0.5083305239677429 |
| keywords[5].display_name | Programming language |
| keywords[6].id | https://openalex.org/keywords/natural-language-processing |
| keywords[6].score | 0.3710100054740906 |
| keywords[6].display_name | Natural language processing |
| keywords[7].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[7].score | 0.36296480894088745 |
| keywords[7].display_name | Artificial intelligence |
| keywords[8].id | https://openalex.org/keywords/computer-security |
| keywords[8].score | 0.34859901666641235 |
| keywords[8].display_name | Computer security |
| keywords[9].id | https://openalex.org/keywords/linguistics |
| keywords[9].score | 0.25571465492248535 |
| keywords[9].display_name | Linguistics |
| keywords[10].id | https://openalex.org/keywords/sociology |
| keywords[10].score | 0.08439606428146362 |
| keywords[10].display_name | Sociology |
| keywords[11].id | https://openalex.org/keywords/epistemology |
| keywords[11].score | 0.07860872149467468 |
| keywords[11].display_name | Epistemology |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2303.11455 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2303.11455 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2303.11455 |
| locations[1].id | doi:10.48550/arxiv.2303.11455 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2303.11455 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5003662912 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-0484-1766 |
| authorships[0].author.display_name | Kevin Jesse |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Jesse, Kevin |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5072573553 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-4427-1350 |
| authorships[1].author.display_name | Toufique Ahmed |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ahmed, Toufique |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5036744986 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-4346-5276 |
| authorships[2].author.display_name | Prémkumar Dévanbu |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Devanbu, Premkumar T. |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5004332292 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-9486-8203 |
| authorships[3].author.display_name | Emily Morgan |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Morgan, Emily |
| authorships[3].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2303.11455 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Large Language Models and Simple, Stupid Bugs |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10260 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9890999794006348 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1710 |
| primary_topic.subfield.display_name | Information Systems |
| primary_topic.display_name | Software Engineering Research |
| related_works | https://openalex.org/W167088980, https://openalex.org/W2475705533, https://openalex.org/W1585007175, https://openalex.org/W186129870, https://openalex.org/W3200522959, https://openalex.org/W4389944781, https://openalex.org/W2997993211, https://openalex.org/W120415280, https://openalex.org/W2382521049, https://openalex.org/W4383099232 |
| cited_by_count | 2 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 2 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2303.11455 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2303.11455 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2303.11455 |
| primary_location.id | pmh:oai:arXiv.org:2303.11455 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2303.11455 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2303.11455 |
| publication_date | 2023-03-20 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 28, 38 |
| abstract_inverted_index.2x | 121 |
| abstract_inverted_index.In | 69 |
| abstract_inverted_index.We | 100, 129 |
| abstract_inverted_index.an | 80 |
| abstract_inverted_index.as | 90, 118, 120, 122 |
| abstract_inverted_index.do | 107, 113 |
| abstract_inverted_index.in | 13, 67, 96 |
| abstract_inverted_index.is | 21, 43, 77 |
| abstract_inverted_index.of | 3, 133, 146, 150, 158 |
| abstract_inverted_index.on | 37, 45, 50 |
| abstract_inverted_index.or | 94 |
| abstract_inverted_index.to | 10, 33, 78, 89 |
| abstract_inverted_index.we | 72 |
| abstract_inverted_index.MSR | 98 |
| abstract_inverted_index.[2] | 61 |
| abstract_inverted_index.and | 56, 104, 138, 154 |
| abstract_inverted_index.are | 16 |
| abstract_inverted_index.bug | 82 |
| abstract_inverted_index.but | 112 |
| abstract_inverted_index.how | 74 |
| abstract_inverted_index.may | 53 |
| abstract_inverted_index.one | 22 |
| abstract_inverted_index.the | 1, 97, 131, 134, 144, 148, 156 |
| abstract_inverted_index.LLMs | 106 |
| abstract_inverted_index.With | 0 |
| abstract_inverted_index.[1], | 60 |
| abstract_inverted_index.bugs | 55, 93 |
| abstract_inverted_index.code | 35, 51 |
| abstract_inverted_index.find | 101 |
| abstract_inverted_index.help | 108 |
| abstract_inverted_index.much | 119 |
| abstract_inverted_index.seen | 66 |
| abstract_inverted_index.show | 62 |
| abstract_inverted_index.some | 110 |
| abstract_inverted_index.such | 23 |
| abstract_inverted_index.than | 124 |
| abstract_inverted_index.that | 52, 102, 142 |
| abstract_inverted_index.this | 70 |
| abstract_inverted_index.uses | 26 |
| abstract_inverted_index.Codex | 63, 76, 103, 135 |
| abstract_inverted_index.avoid | 109 |
| abstract_inverted_index.bugs, | 86 |
| abstract_inverted_index.code. | 128 |
| abstract_inverted_index.large | 29 |
| abstract_inverted_index.model | 31 |
| abstract_inverted_index.prone | 75 |
| abstract_inverted_index.tasks | 15 |
| abstract_inverted_index.viz., | 49 |
| abstract_inverted_index.(LLM), | 32 |
| abstract_inverted_index.Codex, | 27, 41 |
| abstract_inverted_index.GitHub | 47 |
| abstract_inverted_index.SStuBs | 95, 117, 137 |
| abstract_inverted_index.advent | 2 |
| abstract_inverted_index.assist | 11 |
| abstract_inverted_index.coding | 14 |
| abstract_inverted_index.fixes. | 162 |
| abstract_inverted_index.known, | 115, 125, 151, 160 |
| abstract_inverted_index.likely | 123 |
| abstract_inverted_index.neural | 5 |
| abstract_inverted_index.public | 46 |
| abstract_inverted_index.single | 84 |
| abstract_inverted_index.study, | 71 |
| abstract_inverted_index.stupid | 92 |
| abstract_inverted_index.widely | 18 |
| abstract_inverted_index.Copilot | 20, 25 |
| abstract_inverted_index.SStuBs, | 111 |
| abstract_inverted_index.SStubs, | 153 |
| abstract_inverted_index.correct | 127 |
| abstract_inverted_index.examine | 73 |
| abstract_inverted_index.explore | 130 |
| abstract_inverted_index.include | 54 |
| abstract_inverted_index.models, | 7 |
| abstract_inverted_index.produce | 114 |
| abstract_inverted_index.propose | 139 |
| abstract_inverted_index.similar | 105 |
| abstract_inverted_index.simple, | 91 |
| abstract_inverted_index.studies | 59 |
| abstract_inverted_index.suggest | 143 |
| abstract_inverted_index.system. | 24 |
| abstract_inverted_index.systems | 9 |
| abstract_inverted_index.trained | 44 |
| abstract_inverted_index.AI-based | 8 |
| abstract_inverted_index.Previous | 58 |
| abstract_inverted_index.becoming | 17 |
| abstract_inverted_index.commonly | 87 |
| abstract_inverted_index.complete | 34 |
| abstract_inverted_index.generate | 79 |
| abstract_inverted_index.however, | 42 |
| abstract_inverted_index.increase | 155 |
| abstract_inverted_index.language | 6, 30 |
| abstract_inverted_index.powerful | 4 |
| abstract_inverted_index.reducing | 147 |
| abstract_inverted_index.referred | 88 |
| abstract_inverted_index.verbatim | 116, 126, 152, 161 |
| abstract_inverted_index."prompt". | 40 |
| abstract_inverted_index.avoidance | 140 |
| abstract_inverted_index.category, | 83 |
| abstract_inverted_index.generated | 136 |
| abstract_inverted_index.preceding | 39 |
| abstract_inverted_index.producing | 159 |
| abstract_inverted_index.statement | 85 |
| abstract_inverted_index.training. | 68 |
| abstract_inverted_index.available; | 19 |
| abstract_inverted_index.community. | 99 |
| abstract_inverted_index.developers | 12 |
| abstract_inverted_index.production | 149 |
| abstract_inverted_index.reproduces | 64 |
| abstract_inverted_index.strategies | 141 |
| abstract_inverted_index.conditioned | 36 |
| abstract_inverted_index.interesting | 81 |
| abstract_inverted_index.possibility | 145, 157 |
| abstract_inverted_index.consequences | 132 |
| abstract_inverted_index.repositories, | 48 |
| abstract_inverted_index.vulnerabilities | 65 |
| abstract_inverted_index.vulnerabilities. | 57 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.5099999904632568 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile | |
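
The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of how the text could be rebuilt from such rows follows. The helper names `parse_rows` and `rebuild_abstract` are hypothetical, and the sample rows are trimmed to positions 0 through 7 for brevity, so tokens such as `the` list only their first position here.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(rows: list[str]) -> dict[str, list[int]]:
    """Turn flattened '| abstract_inverted_index.<token> | p1, p2 |' rows into a dict."""
    index: dict[str, list[int]] = {}
    for row in rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue  # skip rows that are not part of the inverted index
        token = cells[0][len(PREFIX):]
        index[token] = [int(pos) for pos in cells[1].split(",")]
    return index


def rebuild_abstract(index: dict[str, list[int]]) -> str:
    """Place every token at each of its positions, then join in position order."""
    by_position: dict[int, str] = {}
    for token, positions in index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Sample rows trimmed to the first eight word positions of the abstract.
sample_rows = [
    "| abstract_inverted_index.of | 3 |",
    "| abstract_inverted_index.the | 1 |",
    "| abstract_inverted_index.With | 0 |",
    "| abstract_inverted_index.advent | 2 |",
    "| abstract_inverted_index.neural | 5 |",
    "| abstract_inverted_index.models, | 7 |",
    "| abstract_inverted_index.language | 6 |",
    "| abstract_inverted_index.powerful | 4 |",
    "| language | en |",  # unrelated payload row, ignored by parse_rows
]

print(rebuild_abstract(parse_rows(sample_rows)))
# With the advent of powerful neural language models,
```

Sorting by position rather than by row order matters because the rows are grouped by token, not by where the token appears in the text.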