LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2512.04474
Log parsing transforms raw logs into structured templates containing constants and variables. It underpins anomaly detection, failure diagnosis, and other AIOps tasks. Current parsers are mostly reactive and log-centric. They only infer templates from logs, mostly overlooking the source code. This restricts their capacity to grasp dynamic log structures or adjust to evolving systems. Moreover, per-log LLM inference is too costly for practical deployment. In this paper, we propose LLM-SrcLog, a proactive and unified framework for log template parsing. It extracts templates directly from source code prior to deployment and supplements them with data-driven parsing for logs without available code. LLM-SrcLog integrates a cross-function static code analyzer to reconstruct meaningful logging contexts, an LLM-based white-box template extractor with post-processing to distinguish constants from variables, and a black-box template extractor that incorporates data-driven clustering for remaining unmatched logs. Experiments on two public benchmarks (Hadoop and Zookeeper) and a large-scale industrial system (Sunfire-Compute) show that, compared to two LLM-based baselines, LLM-SrcLog improves average F1-score by 2-17% and 8-35%. Meanwhile, its online parsing latency is comparable to data-driven methods and about 1,000 times faster than per-log LLM parsing. LLM-SrcLog achieves a near-ideal balance between speed and accuracy. Finally, we further validate the effectiveness of LLM-SrcLog through practical case studies in a real-world production environment.
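As a rough illustration of the two-stage idea described in the abstract (match logs against templates known ahead of time, then route whatever is left to a data-driven fallback), here is a minimal Python sketch. It is not the authors' implementation: the `<*>` placeholder convention, the function names `template_to_regex` and `parse_logs`, and the sample log messages are all assumptions made for this example, and the fallback stage is represented only by a list that collects unmatched lines.

```python
import re


def template_to_regex(template: str) -> re.Pattern:
    """Compile a template such as 'Session <*> closed' into a regex.

    Constant text is escaped literally; each '<*>' placeholder (an assumed
    convention, not taken from the paper) becomes a capture group that
    holds one variable.
    """
    parts = [re.escape(p) for p in template.split("<*>")]
    return re.compile("^" + "(.+?)".join(parts) + "$")


def parse_logs(templates, logs):
    """Match each log against the known templates; the first match wins.

    Returns (matched, unmatched). `matched` is a list of
    (log, template, variables) tuples. `unmatched` holds the logs that a
    data-driven fallback parser would still have to handle.
    """
    compiled = [(t, template_to_regex(t)) for t in templates]
    matched, unmatched = [], []
    for log in logs:
        for template, pattern in compiled:
            m = pattern.match(log)
            if m:
                matched.append((log, template, list(m.groups())))
                break
        else:
            # No pre-extracted template fits this line.
            unmatched.append(log)
    return matched, unmatched


# Hypothetical templates and logs, for illustration only.
templates = [
    "Received block <*> of size <*> from <*>",
    "Session <*> closed",
]
logs = [
    "Received block blk_123 of size 67108864 from 10.0.0.1",
    "Session 0x1a closed",
    "Unexpected shutdown",
]
matched, unmatched = parse_logs(templates, logs)
```

Because matching only runs precompiled regular expressions, no model call is needed per log line in this sketch, which is the general reason template matching can be much cheaper online than per-log LLM inference.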
Related Topics
- Type
- preprint
- Landing Page
- http://arxiv.org/abs/2512.04474
- https://arxiv.org/pdf/2512.04474
- OA Status
- green
- OpenAlex ID
- https://openalex.org/W4417086746
Raw OpenAlex JSON
- OpenAlex ID
- https://openalex.org/W4417086746
- DOI
- https://doi.org/10.48550/arxiv.2512.04474
- Title
- LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models
- Type
- preprint
- Publication year
- 2025
- Publication date
- 2025-12-04
- Authors
- Jiaqi Sun, Wei Li, Cheng Ding, Shiyou Qian, Jianliang Cao
- Landing page
- https://arxiv.org/abs/2512.04474
- PDF URL
- https://arxiv.org/pdf/2512.04474
- Open access
- Yes
- OA status
- green
- OA URL
- https://arxiv.org/pdf/2512.04474
- Cited by
- 0
Full payload
| id | https://openalex.org/W4417086746 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2512.04474 |
| ids.doi | https://doi.org/10.48550/arxiv.2512.04474 |
| ids.openalex | https://openalex.org/W4417086746 |
| fwci | |
| type | preprint |
| title | LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | |
| locations[0].id | pmh:oai:arXiv.org:2512.04474 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2512.04474 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2512.04474 |
| locations[1].id | doi:10.48550/arxiv.2512.04474 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2512.04474 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5102025199 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-0294-0122 |
| authorships[0].author.display_name | Jiaqi Sun |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Sun, Jiaqi |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5072580225 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-6786-532X |
| authorships[1].author.display_name | Wei Li |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Li, Wei |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5046668332 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-7244-3535 |
| authorships[2].author.display_name | Cheng Ding |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Ding, Chutong |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5041333646 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-7775-1740 |
| authorships[3].author.display_name | Shiyou Qian |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Qian, Shiyou |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5049265273 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-3570-5185 |
| authorships[4].author.display_name | Jianliang Cao |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Cao, Jian |
| authorships[4].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2512.04474 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-12-06T00:00:00 |
| display_name | LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-12-07T09:55:26.987075 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2512.04474 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2512.04474 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2512.04474 |
| primary_location.id | pmh:oai:arXiv.org:2512.04474 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2512.04474 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2512.04474 |
| publication_date | 2025-12-04 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 70, 102, 125, 146, 187, 207 |
| abstract_inverted_index.In | 64 |
| abstract_inverted_index.It | 12, 79 |
| abstract_inverted_index.an | 112 |
| abstract_inverted_index.by | 162 |
| abstract_inverted_index.in | 206 |
| abstract_inverted_index.is | 58, 171 |
| abstract_inverted_index.of | 200 |
| abstract_inverted_index.on | 138 |
| abstract_inverted_index.or | 49 |
| abstract_inverted_index.to | 44, 51, 87, 107, 119, 154, 173 |
| abstract_inverted_index.we | 67, 195 |
| abstract_inverted_index.LLM | 56, 183 |
| abstract_inverted_index.Log | 0 |
| abstract_inverted_index.and | 10, 18, 27, 72, 89, 124, 143, 145, 164, 176, 192 |
| abstract_inverted_index.are | 24 |
| abstract_inverted_index.for | 61, 75, 95, 133 |
| abstract_inverted_index.its | 167 |
| abstract_inverted_index.log | 47, 76 |
| abstract_inverted_index.raw | 3 |
| abstract_inverted_index.the | 37, 198 |
| abstract_inverted_index.too | 59 |
| abstract_inverted_index.two | 139, 155 |
| abstract_inverted_index.They | 29 |
| abstract_inverted_index.This | 40 |
| abstract_inverted_index.case | 204 |
| abstract_inverted_index.code | 85, 105 |
| abstract_inverted_index.from | 33, 83, 122 |
| abstract_inverted_index.into | 5 |
| abstract_inverted_index.logs | 4, 96 |
| abstract_inverted_index.only | 30 |
| abstract_inverted_index.show | 151 |
| abstract_inverted_index.than | 181 |
| abstract_inverted_index.that | 129 |
| abstract_inverted_index.them | 91 |
| abstract_inverted_index.this | 65 |
| abstract_inverted_index.with | 92, 117 |
| abstract_inverted_index.1,000 | 178 |
| abstract_inverted_index.2-17% | 163 |
| abstract_inverted_index.AIOps | 20 |
| abstract_inverted_index.about | 177 |
| abstract_inverted_index.code. | 39, 99 |
| abstract_inverted_index.grasp | 45 |
| abstract_inverted_index.infer | 31 |
| abstract_inverted_index.logs, | 34 |
| abstract_inverted_index.logs. | 136 |
| abstract_inverted_index.other | 19 |
| abstract_inverted_index.prior | 86 |
| abstract_inverted_index.speed | 191 |
| abstract_inverted_index.that, | 152 |
| abstract_inverted_index.their | 42 |
| abstract_inverted_index.times | 179 |
| abstract_inverted_index.8-35%. | 165 |
| abstract_inverted_index.adjust | 50 |
| abstract_inverted_index.costly | 60 |
| abstract_inverted_index.faster | 180 |
| abstract_inverted_index.mostly | 25, 35 |
| abstract_inverted_index.online | 168 |
| abstract_inverted_index.paper, | 66 |
| abstract_inverted_index.public | 140 |
| abstract_inverted_index.source | 38, 84 |
| abstract_inverted_index.static | 104 |
| abstract_inverted_index.system | 149 |
| abstract_inverted_index.tasks. | 21 |
| abstract_inverted_index.(Hadoop | 142 |
| abstract_inverted_index.Current | 22 |
| abstract_inverted_index.anomaly | 14 |
| abstract_inverted_index.average | 160 |
| abstract_inverted_index.balance | 189 |
| abstract_inverted_index.between | 190 |
| abstract_inverted_index.dynamic | 46 |
| abstract_inverted_index.failure | 16 |
| abstract_inverted_index.further | 196 |
| abstract_inverted_index.latency | 170 |
| abstract_inverted_index.logging | 110 |
| abstract_inverted_index.methods | 175 |
| abstract_inverted_index.parsers | 23 |
| abstract_inverted_index.parsing | 1, 94, 169 |
| abstract_inverted_index.per-log | 55, 182 |
| abstract_inverted_index.propose | 68 |
| abstract_inverted_index.studies | 205 |
| abstract_inverted_index.through | 202 |
| abstract_inverted_index.unified | 73 |
| abstract_inverted_index.without | 97 |
| abstract_inverted_index.F1-score | 161 |
| abstract_inverted_index.Finally, | 194 |
| abstract_inverted_index.achieves | 186 |
| abstract_inverted_index.analyzer | 106 |
| abstract_inverted_index.capacity | 43 |
| abstract_inverted_index.compared | 153 |
| abstract_inverted_index.directly | 82 |
| abstract_inverted_index.evolving | 52 |
| abstract_inverted_index.extracts | 80 |
| abstract_inverted_index.improves | 159 |
| abstract_inverted_index.parsing. | 78, 184 |
| abstract_inverted_index.reactive | 26 |
| abstract_inverted_index.systems. | 53 |
| abstract_inverted_index.template | 77, 115, 127 |
| abstract_inverted_index.validate | 197 |
| abstract_inverted_index.LLM-based | 113, 156 |
| abstract_inverted_index.Moreover, | 54 |
| abstract_inverted_index.accuracy. | 193 |
| abstract_inverted_index.available | 98 |
| abstract_inverted_index.black-box | 126 |
| abstract_inverted_index.constants | 9, 121 |
| abstract_inverted_index.contexts, | 111 |
| abstract_inverted_index.extractor | 116, 128 |
| abstract_inverted_index.framework | 74 |
| abstract_inverted_index.inference | 57 |
| abstract_inverted_index.practical | 62, 203 |
| abstract_inverted_index.proactive | 71 |
| abstract_inverted_index.remaining | 134 |
| abstract_inverted_index.restricts | 41 |
| abstract_inverted_index.templates | 7, 32, 81 |
| abstract_inverted_index.underpins | 13 |
| abstract_inverted_index.unmatched | 135 |
| abstract_inverted_index.white-box | 114 |
| abstract_inverted_index.LLM-SrcLog | 100, 158, 185, 201 |
| abstract_inverted_index.Meanwhile, | 166 |
| abstract_inverted_index.Zookeeper) | 144 |
| abstract_inverted_index.baselines, | 157 |
| abstract_inverted_index.benchmarks | 141 |
| abstract_inverted_index.clustering | 132 |
| abstract_inverted_index.comparable | 172 |
| abstract_inverted_index.containing | 8 |
| abstract_inverted_index.deployment | 88 |
| abstract_inverted_index.detection, | 15 |
| abstract_inverted_index.diagnosis, | 17 |
| abstract_inverted_index.industrial | 148 |
| abstract_inverted_index.integrates | 101 |
| abstract_inverted_index.meaningful | 109 |
| abstract_inverted_index.near-ideal | 188 |
| abstract_inverted_index.production | 209 |
| abstract_inverted_index.real-world | 208 |
| abstract_inverted_index.structured | 6 |
| abstract_inverted_index.structures | 48 |
| abstract_inverted_index.transforms | 2 |
| abstract_inverted_index.variables, | 123 |
| abstract_inverted_index.variables. | 11 |
| abstract_inverted_index.Experiments | 137 |
| abstract_inverted_index.LLM-SrcLog, | 69 |
| abstract_inverted_index.data-driven | 93, 131, 174 |
| abstract_inverted_index.deployment. | 63 |
| abstract_inverted_index.distinguish | 120 |
| abstract_inverted_index.large-scale | 147 |
| abstract_inverted_index.overlooking | 36 |
| abstract_inverted_index.reconstruct | 108 |
| abstract_inverted_index.supplements | 90 |
| abstract_inverted_index.environment. | 210 |
| abstract_inverted_index.incorporates | 130 |
| abstract_inverted_index.log-centric. | 28 |
| abstract_inverted_index.effectiveness | 199 |
| abstract_inverted_index.cross-function | 103 |
| abstract_inverted_index.post-processing | 118 |
| abstract_inverted_index.(Sunfire-Compute) | 150 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile | |
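
The `abstract_inverted_index.*` rows in the payload map each word of the abstract to the zero-based positions where it occurs. A minimal Python sketch of turning such an index back into running text follows. The function name `rebuild_abstract` is an assumption for this example, and `sample` contains only positions 0 to 11 copied from the rows above, not the full index.

```python
def rebuild_abstract(inverted_index: dict) -> str:
    """Rebuild text from a word -> list-of-positions mapping."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Emit the words in ascending position order, separated by spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Subset of the index above, limited to the first sentence of the abstract.
sample = {
    "Log": [0],
    "parsing": [1],
    "transforms": [2],
    "raw": [3],
    "logs": [4],
    "into": [5],
    "structured": [6],
    "templates": [7],
    "containing": [8],
    "constants": [9],
    "and": [10],
    "variables.": [11],
}
first_sentence = rebuild_abstract(sample)
print(first_sentence)
```

Punctuation stays attached to its word (for example `variables.`), so joining on single spaces is enough to recover the sentence.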