Code Copycat Conundrum: Demystifying Repetition in LLM-based Code Generation
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2504.12608
Despite recent advances in Large Language Models (LLMs) for code generation, the quality of LLM-generated code still faces significant challenges. One significant issue is code repetition, which refers to the model's tendency to generate structurally redundant code, resulting in inefficiencies and reduced readability. To address this, we conduct the first empirical study to investigate the prevalence and nature of repetition across 19 state-of-the-art code LLMs using three widely-used benchmarks. Our study includes both quantitative and qualitative analyses, revealing that repetition is pervasive and manifests at various granularities and extents, including character, statement, and block levels. We further summarize a taxonomy of 20 repetition patterns. Building on our findings, we propose DeRep, a rule-based technique designed to detect and mitigate repetition in generated code. We evaluate DeRep using both open-source benchmarks and in an industrial setting. Our results demonstrate that DeRep significantly outperforms baselines in reducing repetition (with an average improvement of 91.3%, 93.5%, and 79.9% in rep-3, rep-line, and sim-line metrics) and enhancing code quality (with a Pass@1 increase of 208.3% over greedy search). Furthermore, integrating DeRep improves the performance of existing repetition mitigation methods, with Pass@1 improvements ranging from 53.7% to 215.7%.
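
The abstract names the rep-3 and rep-line metrics without defining them on this page. As a rough illustration only, the sketch below assumes the definition of rep-n commonly used in text-generation work (one minus the ratio of unique to total n-grams) and applies the same idea at line level. The function names and the naive line-collapsing rule are hypothetical and are not DeRep or the paper's exact metrics.

```python
def rep_n(tokens, n=3):
    """Share of n-grams that repeat an earlier one: 1 - unique / total.

    Assumed definition; the paper may compute rep-3 differently.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def rep_line(code):
    """Share of non-blank lines that duplicate another line (assumed definition)."""
    lines = [ln.strip() for ln in code.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return 1.0 - len(set(lines)) / len(lines)


def collapse_consecutive_duplicates(code):
    """Hypothetical toy rule: drop a non-blank line equal to the line kept before it."""
    out = []
    for ln in code.splitlines():
        if out and ln.strip() and ln.strip() == out[-1].strip():
            continue
        out.append(ln)
    return "\n".join(out)


if __name__ == "__main__":
    generated = "x = 1\nprint(x)\nprint(x)\nprint(x)\nprint(x)"
    print("rep-3:", rep_n(generated.split(), 3))
    print("rep-line:", rep_line(generated))
    print(collapse_consecutive_duplicates(generated))
```

Collapsing identical consecutive lines can change program behavior when the repetition is intentional, so a practical rule set would need more context than this toy rule uses.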
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2504.12608, https://arxiv.org/pdf/2504.12608
- OA Status: green
- OpenAlex ID: https://openalex.org/W4417089497
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4417089497 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2504.12608 (Digital Object Identifier)
- Title: Code Copycat Conundrum: Demystifying Repetition in LLM-based Code Generation (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025
- Publication date: 2025-04-17
- Authors: Mingwei Liu, Juntao Li, Xueying Du, Qiuyuan Chen, Hong Hao, Yong Xu, Fumin Zou, Xin Peng, Yiling Lou (listed in order)
- Landing page: https://arxiv.org/abs/2504.12608 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2504.12608 (direct link to full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2504.12608 (direct OA link)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W4417089497 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2504.12608 |
| ids.doi | https://doi.org/10.48550/arxiv.2504.12608 |
| ids.openalex | https://openalex.org/W4417089497 |
| fwci | |
| type | preprint |
| title | Code Copycat Conundrum: Demystifying Repetition in LLM-based Code Generation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2504.12608 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://arxiv.org/pdf/2504.12608 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2504.12608 |
| locations[1].id | doi:10.48550/arxiv.2504.12608 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2504.12608 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5102154339 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Mingwei Liu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Liu, Mingwei |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100657514 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-6286-7529 |
| authorships[1].author.display_name | Juntao Li |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Li, Juntao |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5060885589 |
| authorships[2].author.orcid | https://orcid.org/0009-0005-0004-9183 |
| authorships[2].author.display_name | Xueying Du |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Du, Xueying |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5002897230 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-1240-9095 |
| authorships[3].author.display_name | Qiuyuan Chen |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Chen, Qiuyuan |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5101983101 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-7509-8653 |
| authorships[4].author.display_name | Hong Hao |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Wei, Zhao |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5014876751 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-4844-2460 |
| authorships[5].author.display_name | Yong Xu |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Xu, Yong |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5101747184 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-4234-1861 |
| authorships[6].author.display_name | Fumin Zou |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Zou, Fangming |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5101854992 |
| authorships[7].author.orcid | https://orcid.org/0000-0003-3376-2581 |
| authorships[7].author.display_name | Xin Peng |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Peng, Xin |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5102858698 |
| authorships[8].author.orcid | https://orcid.org/0000-0001-7814-0693 |
| authorships[8].author.display_name | Yiling Lou |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Lou, Yiling |
| authorships[8].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2504.12608 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Code Copycat Conundrum: Demystifying Repetition in LLM-based Code Generation |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-12-08T23:20:48.110394 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2504.12608 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2504.12608 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2504.12608 |
| primary_location.id | pmh:oai:arXiv.org:2504.12608 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://arxiv.org/pdf/2504.12608 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2504.12608 |
| publication_date | 2025-04-17 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 98, 111, 166 |
| abstract_inverted_index.19 | 61 |
| abstract_inverted_index.20 | 101 |
| abstract_inverted_index.To | 43 |
| abstract_inverted_index.We | 95, 123 |
| abstract_inverted_index.an | 132, 147 |
| abstract_inverted_index.at | 84 |
| abstract_inverted_index.in | 3, 38, 120, 131, 143, 155 |
| abstract_inverted_index.is | 23, 80 |
| abstract_inverted_index.of | 13, 58, 100, 150, 169, 180 |
| abstract_inverted_index.on | 105 |
| abstract_inverted_index.to | 28, 32, 52, 115, 191 |
| abstract_inverted_index.we | 46, 108 |
| abstract_inverted_index.One | 20 |
| abstract_inverted_index.Our | 69, 135 |
| abstract_inverted_index.and | 40, 56, 74, 82, 87, 92, 117, 130, 153, 158, 161 |
| abstract_inverted_index.for | 8 |
| abstract_inverted_index.our | 106 |
| abstract_inverted_index.the | 11, 29, 48, 54, 178 |
| abstract_inverted_index.LLMs | 64 |
| abstract_inverted_index.both | 72, 127 |
| abstract_inverted_index.code | 9, 15, 24, 63, 163 |
| abstract_inverted_index.from | 189 |
| abstract_inverted_index.over | 171 |
| abstract_inverted_index.that | 78, 138 |
| abstract_inverted_index.with | 185 |
| abstract_inverted_index.(with | 146, 165 |
| abstract_inverted_index.53.7% | 190 |
| abstract_inverted_index.79.9% | 154 |
| abstract_inverted_index.DeRep | 125, 139, 176 |
| abstract_inverted_index.Large | 4 |
| abstract_inverted_index.block | 93 |
| abstract_inverted_index.code, | 36 |
| abstract_inverted_index.code. | 122 |
| abstract_inverted_index.faces | 17 |
| abstract_inverted_index.first | 49 |
| abstract_inverted_index.issue | 22 |
| abstract_inverted_index.still | 16 |
| abstract_inverted_index.study | 51, 70 |
| abstract_inverted_index.this, | 45 |
| abstract_inverted_index.three | 66 |
| abstract_inverted_index.using | 65, 126 |
| abstract_inverted_index.which | 26 |
| abstract_inverted_index.(LLMs) | 7 |
| abstract_inverted_index.208.3% | 170 |
| abstract_inverted_index.91.3%, | 151 |
| abstract_inverted_index.93.5%, | 152 |
| abstract_inverted_index.DeRep, | 110 |
| abstract_inverted_index.Models | 6 |
| abstract_inverted_index.Pass@1 | 167, 186 |
| abstract_inverted_index.across | 60 |
| abstract_inverted_index.detect | 116 |
| abstract_inverted_index.greedy | 172 |
| abstract_inverted_index.nature | 57 |
| abstract_inverted_index.recent | 1 |
| abstract_inverted_index.refers | 27 |
| abstract_inverted_index.rep-3, | 156 |
| abstract_inverted_index.215.7%. | 192 |
| abstract_inverted_index.Despite | 0 |
| abstract_inverted_index.address | 44 |
| abstract_inverted_index.average | 148 |
| abstract_inverted_index.conduct | 47 |
| abstract_inverted_index.further | 96 |
| abstract_inverted_index.levels. | 94 |
| abstract_inverted_index.model's | 30 |
| abstract_inverted_index.propose | 109 |
| abstract_inverted_index.quality | 12, 164 |
| abstract_inverted_index.ranging | 188 |
| abstract_inverted_index.reduced | 41 |
| abstract_inverted_index.results | 136 |
| abstract_inverted_index.various | 85 |
| abstract_inverted_index.Building | 104 |
| abstract_inverted_index.Language | 5 |
| abstract_inverted_index.advances | 2 |
| abstract_inverted_index.designed | 114 |
| abstract_inverted_index.evaluate | 124 |
| abstract_inverted_index.existing | 181 |
| abstract_inverted_index.extents, | 88 |
| abstract_inverted_index.generate | 33 |
| abstract_inverted_index.improves | 177 |
| abstract_inverted_index.includes | 71 |
| abstract_inverted_index.increase | 168 |
| abstract_inverted_index.methods, | 184 |
| abstract_inverted_index.metrics) | 160 |
| abstract_inverted_index.mitigate | 118 |
| abstract_inverted_index.reducing | 144 |
| abstract_inverted_index.search). | 173 |
| abstract_inverted_index.setting. | 134 |
| abstract_inverted_index.sim-line | 159 |
| abstract_inverted_index.taxonomy | 99 |
| abstract_inverted_index.tendency | 31 |
| abstract_inverted_index.analyses, | 76 |
| abstract_inverted_index.baselines | 142 |
| abstract_inverted_index.empirical | 50 |
| abstract_inverted_index.enhancing | 162 |
| abstract_inverted_index.findings, | 107 |
| abstract_inverted_index.generated | 121 |
| abstract_inverted_index.including | 89 |
| abstract_inverted_index.manifests | 83 |
| abstract_inverted_index.patterns. | 103 |
| abstract_inverted_index.pervasive | 81 |
| abstract_inverted_index.redundant | 35 |
| abstract_inverted_index.rep-line, | 157 |
| abstract_inverted_index.resulting | 37 |
| abstract_inverted_index.revealing | 77 |
| abstract_inverted_index.summarize | 97 |
| abstract_inverted_index.technique | 113 |
| abstract_inverted_index.benchmarks | 129 |
| abstract_inverted_index.character, | 90 |
| abstract_inverted_index.industrial | 133 |
| abstract_inverted_index.mitigation | 183 |
| abstract_inverted_index.prevalence | 55 |
| abstract_inverted_index.repetition | 59, 79, 102, 119, 145, 182 |
| abstract_inverted_index.rule-based | 112 |
| abstract_inverted_index.statement, | 91 |
| abstract_inverted_index.benchmarks. | 68 |
| abstract_inverted_index.challenges. | 19 |
| abstract_inverted_index.demonstrate | 137 |
| abstract_inverted_index.generation, | 10 |
| abstract_inverted_index.integrating | 175 |
| abstract_inverted_index.investigate | 53 |
| abstract_inverted_index.open-source | 128 |
| abstract_inverted_index.outperforms | 141 |
| abstract_inverted_index.performance | 179 |
| abstract_inverted_index.qualitative | 75 |
| abstract_inverted_index.repetition, | 25 |
| abstract_inverted_index.significant | 18, 21 |
| abstract_inverted_index.widely-used | 67 |
| abstract_inverted_index.Furthermore, | 174 |
| abstract_inverted_index.improvements | 149, 187 |
| abstract_inverted_index.quantitative | 73 |
| abstract_inverted_index.readability. | 42 |
| abstract_inverted_index.structurally | 34 |
| abstract_inverted_index.LLM-generated | 14 |
| abstract_inverted_index.granularities | 86 |
| abstract_inverted_index.significantly | 140 |
| abstract_inverted_index.inefficiencies | 39 |
| abstract_inverted_index.state-of-the-art | 62 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 9 |
| citation_normalized_percentile |
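
The `abstract_inverted_index.*` rows in the table above store the abstract as a mapping from each token to the word positions where it occurs. A minimal sketch of turning such a mapping back into running text is shown below; the function name `rebuild_abstract` is hypothetical, and the sample dictionary copies only the entries for positions 0 to 7 from the table (so `in` keeps position 3 and drops its later occurrences).

```python
def rebuild_abstract(inverted_index):
    """Invert a {token: [positions]} mapping back into space-joined text."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in ascending position order; gaps are simply skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Entries taken from the payload table, restricted to positions 0..7.
sample = {
    "Despite": [0],
    "recent": [1],
    "advances": [2],
    "in": [3],
    "Large": [4],
    "Language": [5],
    "Models": [6],
    "(LLMs)": [7],
}

print(rebuild_abstract(sample))
# prints: Despite recent advances in Large Language Models (LLMs)
```

Feeding in the full set of rows should reproduce the abstract as indexed, with punctuation attached to tokens exactly as it appears in the keys.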