Back to the Basics: Rethinking Issue-Commit Linking with LLM-Assisted Retrieval
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2507.09199
Issue-commit linking, which connects issues with the commits that fix them, is crucial for software maintenance. Existing approaches have shown promise in automatically recovering these links. Evaluations of these techniques assess their ability to distinguish genuine links from plausible but false ones. However, these evaluations overlook the fact that, in reality, the more commits a repository has, the more plausible yet unrelated commits there are to interfere with a tool's ability to single out the correct fix commits. To address this, we propose the Realistic Distribution Setting (RDS) and use it to construct a more realistic evaluation dataset covering 20 open-source projects. Evaluating tools on this dataset, we observe that the performance of the state-of-the-art deep learning-based approach drops by more than half, while the traditional Information Retrieval method, the Vector Space Model (VSM), outperforms it. Inspired by these observations, we propose EasyLink, which uses a vector database as a modern Information Retrieval technique. To address the long-standing problem of the semantic gap between issues and commits, EasyLink leverages a large language model to rerank the commits retrieved from the database. Under our evaluation, EasyLink achieves an average Precision@1 of 75.03%, improving over the state of the art by more than four times. Additionally, this paper provides practical guidelines for advancing research in issue-commit link recovery.
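To make the retrieval framing and the Precision@1 metric in the abstract concrete, here is a minimal sketch of a VSM-style baseline: issues and commits become TF-IDF vectors, commits are ranked by cosine similarity, and Precision@1 is the fraction of issues whose top-ranked commit is a true fix. This is not the paper's implementation. The tokenizer, the smoothed IDF weighting, the function names, and the toy issues and commits are all assumptions made for illustration, and the LLM reranking step is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Smoothed IDF (an assumption; other weightings are equally valid).
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_commits(issue, commits):
    """Return commit indices sorted by descending similarity to the issue."""
    vectors = tfidf_vectors([issue] + commits)
    query, candidates = vectors[0], vectors[1:]
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(commits)), key=lambda i: scores[i], reverse=True)


def precision_at_1(rankings, gold):
    """Fraction of issues whose top-ranked commit is a true fix commit."""
    hits = sum(1 for ranked, fixes in zip(rankings, gold)
               if ranked and ranked[0] in fixes)
    return hits / len(rankings)


# Hypothetical toy data, not taken from the paper's dataset.
commits = [
    "Fix null pointer in config parser when file is empty",
    "Update README badges",
    "Add retry logic to HTTP client on timeout",
]
issues = [
    "Parser crashes with null pointer on empty config file",
    "HTTP client does not retry after timeout",
]
gold = [{0}, {2}]  # indices of the true fix commits for each issue

rankings = [rank_commits(issue, commits) for issue in issues]
print(precision_at_1(rankings, gold))
```

In a setting like the one the abstract describes, the candidate list would be every commit in the repository rather than a hand-picked handful, which is exactly where unrelated but lexically similar commits start to crowd out the true fix.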
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2507.09199, https://arxiv.org/pdf/2507.09199
- OA Status: green
- OpenAlex ID: https://openalex.org/W4414691251
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4414691251 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2507.09199 (Digital Object Identifier)
- Title: Back to the Basics: Rethinking Issue-Commit Linking with LLM-Assisted Retrieval (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-07-12 (full publication date if available)
- Authors: Huihui Huang, Ratnadira Widyasari, Ting Zhang, Ivana Clairine Irsan, Jieke Shi, Han Wei Ang, Frank Liauw, Eng Lieh Ouh, Lwin Khin Shar, Hong Jin Kang, David Lo (list of authors in order)
- Landing page: https://arxiv.org/abs/2507.09199 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2507.09199 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2507.09199 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W4414691251 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2507.09199 |
| ids.doi | https://doi.org/10.48550/arxiv.2507.09199 |
| ids.openalex | https://openalex.org/W4414691251 |
| fwci | |
| type | preprint |
| title | Back to the Basics: Rethinking Issue-Commit Linking with LLM-Assisted Retrieval |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T14330 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9387000203132629 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1710 |
| topics[0].subfield.display_name | Information Systems |
| topics[0].display_name | Library Science and Information Systems |
| topics[1].id | https://openalex.org/T11719 |
| topics[1].field.id | https://openalex.org/fields/18 |
| topics[1].field.display_name | Decision Sciences |
| topics[1].score | 0.9239000082015991 |
| topics[1].domain.id | https://openalex.org/domains/2 |
| topics[1].domain.display_name | Social Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1803 |
| topics[1].subfield.display_name | Management Science and Operations Research |
| topics[1].display_name | Data Quality and Management |
| topics[2].id | https://openalex.org/T10181 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9021999835968018 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Natural Language Processing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2507.09199 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2507.09199 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2507.09199 |
| locations[1].id | doi:10.48550/arxiv.2507.09199 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2507.09199 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101920796 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-8596-7226 |
| authorships[0].author.display_name | Huihui Huang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Huang, Huihui |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5009224648 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-8190-5458 |
| authorships[1].author.display_name | Ratnadira Widyasari |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Widyasari, Ratnadira |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5113671760 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-9066-5625 |
| authorships[2].author.display_name | Ting Zhang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zhang, Ting |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5059690191 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-6350-2700 |
| authorships[3].author.display_name | Ivana Clairine Irsan |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Irsan, Ivana Clairine |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5002667771 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-0799-5018 |
| authorships[4].author.display_name | Jieke Shi |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Shi, Jieke |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5081714483 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Han Wei Ang |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Ang, Han Wei |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5115001461 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Frank Liauw |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Liauw, Frank |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5041682105 |
| authorships[7].author.orcid | https://orcid.org/0000-0001-7759-348X |
| authorships[7].author.display_name | Eng Lieh Ouh |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Ouh, Eng Lieh |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5029828965 |
| authorships[8].author.orcid | https://orcid.org/0000-0001-5130-0407 |
| authorships[8].author.display_name | Lwin Khin Shar |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Shar, Lwin Khin |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5027335548 |
| authorships[9].author.orcid | https://orcid.org/0000-0001-7335-7295 |
| authorships[9].author.display_name | Hong Jin Kang |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Kang, Hong Jin |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5081036622 |
| authorships[10].author.orcid | https://orcid.org/0000-0002-4367-7201 |
| authorships[10].author.display_name | David Lo |
| authorships[10].author_position | last |
| authorships[10].raw_author_name | Lo, David |
| authorships[10].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2507.09199 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Back to the Basics: Rethinking Issue-Commit Linking with LLM-Assisted Retrieval |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T14330 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9387000203132629 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1710 |
| primary_topic.subfield.display_name | Information Systems |
| primary_topic.display_name | Library Science and Information Systems |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2507.09199 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2507.09199 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2507.09199 |
| primary_location.id | pmh:oai:arXiv.org:2507.09199 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2507.09199 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2507.09199 |
| publication_date | 2025-07-12 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 51, 90, 140, 144, 164 |
| abstract_inverted_index.20 | 97 |
| abstract_inverted_index.By | 100 |
| abstract_inverted_index.To | 75, 149 |
| abstract_inverted_index.an | 181 |
| abstract_inverted_index.as | 143 |
| abstract_inverted_index.by | 118, 132, 190 |
| abstract_inverted_index.in | 20, 48, 69, 203 |
| abstract_inverted_index.is | 10 |
| abstract_inverted_index.it | 87 |
| abstract_inverted_index.of | 26, 58, 111, 154, 184 |
| abstract_inverted_index.on | 103 |
| abstract_inverted_index.to | 32, 88, 168 |
| abstract_inverted_index.we | 78, 106, 135 |
| abstract_inverted_index.and | 85, 160 |
| abstract_inverted_index.but | 38 |
| abstract_inverted_index.fix | 8, 73 |
| abstract_inverted_index.for | 12, 200 |
| abstract_inverted_index.gap | 157 |
| abstract_inverted_index.has | 53 |
| abstract_inverted_index.it. | 130 |
| abstract_inverted_index.may | 64 |
| abstract_inverted_index.our | 177 |
| abstract_inverted_index.the | 45, 56, 67, 71, 80, 109, 112, 123, 151, 155, 170, 174, 188 |
| abstract_inverted_index.use | 86 |
| abstract_inverted_index.yet | 61 |
| abstract_inverted_index.VSM, | 128 |
| abstract_inverted_index.deep | 114 |
| abstract_inverted_index.fact | 46 |
| abstract_inverted_index.four | 192 |
| abstract_inverted_index.from | 36, 173 |
| abstract_inverted_index.have | 17 |
| abstract_inverted_index.link | 205 |
| abstract_inverted_index.more | 54, 59, 91, 119 |
| abstract_inverted_index.over | 187, 191 |
| abstract_inverted_index.than | 120 |
| abstract_inverted_index.that | 7, 95, 108 |
| abstract_inverted_index.this | 104, 195 |
| abstract_inverted_index.tool | 68 |
| abstract_inverted_index.when | 50 |
| abstract_inverted_index.with | 5, 66 |
| abstract_inverted_index.(RDS) | 84 |
| abstract_inverted_index.Under | 176 |
| abstract_inverted_index.drops | 117 |
| abstract_inverted_index.false | 39 |
| abstract_inverted_index.half, | 121 |
| abstract_inverted_index.large | 165 |
| abstract_inverted_index.links | 35 |
| abstract_inverted_index.model | 167 |
| abstract_inverted_index.paper | 196 |
| abstract_inverted_index.shown | 18 |
| abstract_inverted_index.that, | 47 |
| abstract_inverted_index.their | 30 |
| abstract_inverted_index.them, | 9 |
| abstract_inverted_index.these | 23, 27, 42, 133 |
| abstract_inverted_index.this, | 77 |
| abstract_inverted_index.tools | 102 |
| abstract_inverted_index.which | 2, 138 |
| abstract_inverted_index.while | 122 |
| abstract_inverted_index.assess | 29 |
| abstract_inverted_index.issues | 4, 159 |
| abstract_inverted_index.links. | 24, 40 |
| abstract_inverted_index.modern | 145 |
| abstract_inverted_index.rerank | 169 |
| abstract_inverted_index.times. | 193 |
| abstract_inverted_index.vector | 141 |
| abstract_inverted_index.Setting | 83 |
| abstract_inverted_index.ability | 31 |
| abstract_inverted_index.address | 76, 150 |
| abstract_inverted_index.average | 182 |
| abstract_inverted_index.between | 158 |
| abstract_inverted_index.commits | 6, 63, 171 |
| abstract_inverted_index.correct | 72 |
| abstract_inverted_index.crucial | 11 |
| abstract_inverted_index.dataset | 94 |
| abstract_inverted_index.genuine | 34 |
| abstract_inverted_index.method, | 127 |
| abstract_inverted_index.observe | 107 |
| abstract_inverted_index.problem | 153 |
| abstract_inverted_index.promise | 19 |
| abstract_inverted_index.propose | 79, 136 |
| abstract_inverted_index.75.03\%, | 185 |
| abstract_inverted_index.EasyLink | 162, 179 |
| abstract_inverted_index.Existing | 15 |
| abstract_inverted_index.However, | 41 |
| abstract_inverted_index.Inspired | 131 |
| abstract_inverted_index.achieves | 180 |
| abstract_inverted_index.approach | 116 |
| abstract_inverted_index.commits, | 55, 161 |
| abstract_inverted_index.commits. | 74 |
| abstract_inverted_index.connects | 3 |
| abstract_inverted_index.database | 142 |
| abstract_inverted_index.dataset, | 105 |
| abstract_inverted_index.identify | 33 |
| abstract_inverted_index.includes | 96 |
| abstract_inverted_index.language | 166 |
| abstract_inverted_index.linking, | 1 |
| abstract_inverted_index.overlook | 44 |
| abstract_inverted_index.presence | 57 |
| abstract_inverted_index.provides | 197 |
| abstract_inverted_index.reality, | 49 |
| abstract_inverted_index.research | 202 |
| abstract_inverted_index.semantic | 156 |
| abstract_inverted_index.software | 13 |
| abstract_inverted_index.utilizes | 139 |
| abstract_inverted_index.EasyLink, | 137 |
| abstract_inverted_index.Realistic | 81 |
| abstract_inverted_index.Retrieval | 126, 147 |
| abstract_inverted_index.advancing | 201 |
| abstract_inverted_index.construct | 89 |
| abstract_inverted_index.database. | 175 |
| abstract_inverted_index.improving | 186 |
| abstract_inverted_index.interfere | 65 |
| abstract_inverted_index.leverages | 163 |
| abstract_inverted_index.plausible | 37, 60 |
| abstract_inverted_index.practical | 198 |
| abstract_inverted_index.projects. | 99 |
| abstract_inverted_index.realistic | 92 |
| abstract_inverted_index.recovery. | 206 |
| abstract_inverted_index.retrieved | 172 |
| abstract_inverted_index.unrelated | 62 |
| abstract_inverted_index.approaches | 16 |
| abstract_inverted_index.evaluating | 101 |
| abstract_inverted_index.evaluation | 93 |
| abstract_inverted_index.guidelines | 199 |
| abstract_inverted_index.recovering | 22 |
| abstract_inverted_index.repository | 52 |
| abstract_inverted_index.technique. | 148 |
| abstract_inverted_index.techniques | 28 |
| abstract_inverted_index.Evaluations | 25 |
| abstract_inverted_index.Information | 125, 146 |
| abstract_inverted_index.Precision@1 | 183 |
| abstract_inverted_index.evaluation, | 178 |
| abstract_inverted_index.evaluations | 43 |
| abstract_inverted_index.open-source | 98 |
| abstract_inverted_index.outperforms | 129 |
| abstract_inverted_index.performance | 110 |
| abstract_inverted_index.traditional | 124 |
| abstract_inverted_index.Distribution | 82 |
| abstract_inverted_index.Issue-commit | 0 |
| abstract_inverted_index.issue-commit | 204 |
| abstract_inverted_index.maintenance. | 14 |
| abstract_inverted_index.Additionally, | 194 |
| abstract_inverted_index.automatically | 21 |
| abstract_inverted_index.long-standing | 152 |
| abstract_inverted_index.observations, | 134 |
| abstract_inverted_index.learning-based | 115 |
| abstract_inverted_index.differentiating | 70 |
| abstract_inverted_index.state-of-the-art | 113, 189 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 11 |
| citation_normalized_percentile | |
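The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the positions where it occurs. A minimal sketch of turning such a map back into running text is shown below. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is only the subset of the rows above restricted to positions 0 through 9.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} inverted index."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Subset of the payload rows above, limited to positions 0-9.
sample = {
    "Issue-commit": [0],
    "linking,": [1],
    "which": [2],
    "connects": [3],
    "issues": [4],
    "with": [5],
    "commits": [6],
    "that": [7],
    "fix": [8],
    "them,": [9],
}

print(rebuild_abstract(sample))
# prints: Issue-commit linking, which connects issues with commits that fix them,
```

Applied to the full set of rows, the same function would reproduce the complete abstract, since every token position from 0 to 206 appears exactly once in the index.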