SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2505.14976
Software logs are messages recorded during the execution of a software system that provide crucial run-time information about events and activities. Although software logs have a critical role in software maintenance and operation tasks, publicly accessible log datasets remain limited, hindering advances in log analysis research and practices. The presence of sensitive information, particularly Personally Identifiable Information (PII) and quasi-identifiers, introduces serious privacy and re-identification risks, discouraging the publishing and sharing of real-world logs. In practice, log anonymization techniques primarily rely on regular expression patterns, which involve manually crafting rules to identify and replace sensitive information. However, these regex-based approaches suffer from significant limitations, such as extensive manual effort and poor generalizability across diverse log formats and datasets. To mitigate these limitations, we introduce SDLog, a deep learning-based framework designed to identify sensitive information in software logs. Our results show that SDLog overcomes regex limitations and outperforms the best-performing regex patterns in identifying sensitive information. With only 100 fine-tuning samples from the target dataset, SDLog can correctly identify 99.5% of sensitive attributes and achieves an F1-score of 98.4%. To the best of our knowledge, this is the first deep learning alternative to regex-based methods in software log anonymization.
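To make the regex-based baseline described in the abstract concrete, here is a minimal Python sketch of rule-based log anonymization. It is not SDLog and not the authors' regex set: the `RULES` table, the `anonymize` helper, the placeholder labels, and the sample log lines are all hypothetical, chosen only to show how hand-written patterns are applied and where they can fall short.

```python
import re

# Hypothetical hand-written rules (assumption: not taken from the paper).
# Each label maps to one manually crafted pattern.
RULES = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "MAC": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
}


def anonymize(line: str) -> str:
    """Replace every match of every rule with a <LABEL> placeholder."""
    for label, pattern in RULES.items():
        line = pattern.sub(f"<{label}>", line)
    return line


# Hypothetical log line that the rules happen to cover.
sample = "Accepted password for alice@example.com from 192.168.0.12 port 22"
print(anonymize(sample))

# Hypothetical log line in a shape the rules were not written for:
# a bare username and an IPv6 address pass through unchanged.
missed = "session opened for user alice from fe80::1"
print(anonymize(missed))
```

The second line illustrates the generalizability problem the abstract points to: every new log format or attribute type needs another manually written rule, whereas a learned model is meant to pick such attributes up from labeled examples.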
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2505.14976, https://arxiv.org/pdf/2505.14976
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415327325
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415327325 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2505.14976 (Digital Object Identifier)
- Title: SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-05-20 (full publication date if available)
- Authors: Roozbeh Aghili, Xingfang Wu, Foutse Khomh, Heng Li (list of authors in order)
- Landing page: https://arxiv.org/abs/2505.14976 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2505.14976 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2505.14976 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4415327325 |
| doi | https://doi.org/10.48550/arxiv.2505.14976 |
| ids.doi | https://doi.org/10.48550/arxiv.2505.14976 |
| ids.openalex | https://openalex.org/W4415327325 |
| fwci | |
| type | preprint |
| title | SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10260 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9718000292778015 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1710 |
| topics[0].subfield.display_name | Information Systems |
| topics[0].display_name | Software Engineering Research |
| topics[1].id | https://openalex.org/T12127 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9175999760627747 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1705 |
| topics[1].subfield.display_name | Computer Networks and Communications |
| topics[1].display_name | Software System Performance and Reliability |
| topics[2].id | https://openalex.org/T11512 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9067000150680542 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Anomaly Detection Techniques and Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2505.14976 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2505.14976 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2505.14976 |
| locations[1].id | doi:10.48550/arxiv.2505.14976 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2505.14976 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5078321605 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-9361-2369 |
| authorships[0].author.display_name | Roozbeh Aghili |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Aghili, Roozbeh |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101480041 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-7040-3751 |
| authorships[1].author.display_name | Xingfang Wu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wu, Xingfang |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5071052367 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-5704-4173 |
| authorships[2].author.display_name | Foutse Khomh |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Khomh, Foutse |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100338802 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-3187-9041 |
| authorships[3].author.display_name | Heng Li |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Li, Heng |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2505.14976 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-19T00:00:00 |
| display_name | SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10260 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9718000292778015 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1710 |
| primary_topic.subfield.display_name | Information Systems |
| primary_topic.display_name | Software Engineering Research |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2505.14976 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2505.14976 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2505.14976 |
| primary_location.id | pmh:oai:arXiv.org:2505.14976 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2505.14976 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2505.14976 |
| publication_date | 2025-05-20 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 9, 25, 125 |
| abstract_inverted_index.In | 74 |
| abstract_inverted_index.To | 118, 178 |
| abstract_inverted_index.an | 174 |
| abstract_inverted_index.as | 105 |
| abstract_inverted_index.in | 28, 42, 134, 151, 194 |
| abstract_inverted_index.is | 185 |
| abstract_inverted_index.of | 8, 50, 71, 169, 176, 181 |
| abstract_inverted_index.on | 81 |
| abstract_inverted_index.to | 90, 130, 191 |
| abstract_inverted_index.we | 122 |
| abstract_inverted_index.100 | 157 |
| abstract_inverted_index.Our | 137 |
| abstract_inverted_index.The | 48 |
| abstract_inverted_index.and | 19, 31, 46, 58, 63, 69, 92, 109, 116, 145, 172 |
| abstract_inverted_index.are | 2 |
| abstract_inverted_index.can | 165 |
| abstract_inverted_index.log | 36, 43, 76, 114, 196 |
| abstract_inverted_index.our | 182 |
| abstract_inverted_index.the | 6, 67, 147, 161, 179, 186 |
| abstract_inverted_index.With | 155 |
| abstract_inverted_index.best | 180 |
| abstract_inverted_index.deep | 126, 188 |
| abstract_inverted_index.from | 101, 160 |
| abstract_inverted_index.have | 24 |
| abstract_inverted_index.logs | 1, 23 |
| abstract_inverted_index.only | 156 |
| abstract_inverted_index.poor | 110 |
| abstract_inverted_index.rely | 80 |
| abstract_inverted_index.role | 27 |
| abstract_inverted_index.show | 139 |
| abstract_inverted_index.such | 104 |
| abstract_inverted_index.that | 12, 140 |
| abstract_inverted_index.this | 184 |
| abstract_inverted_index.(PII) | 57 |
| abstract_inverted_index.99.5% | 168 |
| abstract_inverted_index.SDLog | 141, 164 |
| abstract_inverted_index.about | 17 |
| abstract_inverted_index.first | 187 |
| abstract_inverted_index.logs. | 73, 136 |
| abstract_inverted_index.regex | 143, 149 |
| abstract_inverted_index.rules | 89 |
| abstract_inverted_index.these | 97, 120 |
| abstract_inverted_index.which | 85 |
| abstract_inverted_index.98.4%. | 177 |
| abstract_inverted_index.SDLog, | 124 |
| abstract_inverted_index.across | 112 |
| abstract_inverted_index.during | 5 |
| abstract_inverted_index.events | 18 |
| abstract_inverted_index.manual | 107 |
| abstract_inverted_index.remain | 38 |
| abstract_inverted_index.risks, | 65 |
| abstract_inverted_index.suffer | 100 |
| abstract_inverted_index.system | 11 |
| abstract_inverted_index.target | 162 |
| abstract_inverted_index.tasks, | 33 |
| abstract_inverted_index.advance | 41 |
| abstract_inverted_index.crucial | 14 |
| abstract_inverted_index.diverse | 113 |
| abstract_inverted_index.efforts | 108 |
| abstract_inverted_index.formats | 115 |
| abstract_inverted_index.involve | 86 |
| abstract_inverted_index.methods | 193 |
| abstract_inverted_index.privacy | 62 |
| abstract_inverted_index.provide | 13 |
| abstract_inverted_index.regular | 82 |
| abstract_inverted_index.replace | 93 |
| abstract_inverted_index.results | 138 |
| abstract_inverted_index.samples | 159 |
| abstract_inverted_index.serious | 61 |
| abstract_inverted_index.sharing | 70 |
| abstract_inverted_index.Although | 21 |
| abstract_inverted_index.F1-score | 175 |
| abstract_inverted_index.However, | 96 |
| abstract_inverted_index.Software | 0 |
| abstract_inverted_index.achieves | 173 |
| abstract_inverted_index.analysis | 44 |
| abstract_inverted_index.crafting | 88 |
| abstract_inverted_index.critical | 26 |
| abstract_inverted_index.dataset, | 163 |
| abstract_inverted_index.datasets | 37 |
| abstract_inverted_index.designed | 129 |
| abstract_inverted_index.identify | 91, 131, 167 |
| abstract_inverted_index.learning | 189 |
| abstract_inverted_index.limited, | 39 |
| abstract_inverted_index.manually | 87 |
| abstract_inverted_index.messages | 3 |
| abstract_inverted_index.mitigate | 119 |
| abstract_inverted_index.patterns | 150 |
| abstract_inverted_index.presence | 49 |
| abstract_inverted_index.publicly | 34 |
| abstract_inverted_index.recorded | 4 |
| abstract_inverted_index.research | 45 |
| abstract_inverted_index.run-time | 15 |
| abstract_inverted_index.software | 10, 22, 29, 135, 195 |
| abstract_inverted_index.correctly | 166 |
| abstract_inverted_index.datasets. | 117 |
| abstract_inverted_index.execution | 7 |
| abstract_inverted_index.extensive | 106 |
| abstract_inverted_index.framework | 128 |
| abstract_inverted_index.hindering | 40 |
| abstract_inverted_index.introduce | 123 |
| abstract_inverted_index.operation | 32 |
| abstract_inverted_index.overcomes | 142 |
| abstract_inverted_index.patterns, | 84 |
| abstract_inverted_index.practice, | 75 |
| abstract_inverted_index.primarily | 79 |
| abstract_inverted_index.sensitive | 51, 94, 132, 153, 170 |
| abstract_inverted_index.Personally | 54 |
| abstract_inverted_index.accessible | 35 |
| abstract_inverted_index.approaches | 99 |
| abstract_inverted_index.attributes | 171 |
| abstract_inverted_index.expression | 83 |
| abstract_inverted_index.introduces | 60 |
| abstract_inverted_index.knowledge, | 183 |
| abstract_inverted_index.practices. | 47 |
| abstract_inverted_index.publishing | 68 |
| abstract_inverted_index.real-world | 72 |
| abstract_inverted_index.techniques | 78 |
| abstract_inverted_index.Information | 56 |
| abstract_inverted_index.activities. | 20 |
| abstract_inverted_index.alternative | 190 |
| abstract_inverted_index.fine-tuning | 158 |
| abstract_inverted_index.identifying | 152 |
| abstract_inverted_index.information | 16, 133 |
| abstract_inverted_index.limitations | 144 |
| abstract_inverted_index.maintenance | 30 |
| abstract_inverted_index.outperforms | 146 |
| abstract_inverted_index.regex-based | 98, 192 |
| abstract_inverted_index.significant | 102 |
| abstract_inverted_index.Identifiable | 55 |
| abstract_inverted_index.discouraging | 66 |
| abstract_inverted_index.information, | 52 |
| abstract_inverted_index.information. | 95, 154 |
| abstract_inverted_index.limitations, | 103, 121 |
| abstract_inverted_index.particularly | 53 |
| abstract_inverted_index.anonymization | 77 |
| abstract_inverted_index.anonymization. | 197 |
| abstract_inverted_index.learning-based | 127 |
| abstract_inverted_index.best-performing | 148 |
| abstract_inverted_index.generalizability | 111 |
| abstract_inverted_index.re-identification | 64 |
| abstract_inverted_index.quasi-identifiers, | 59 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |
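
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the word positions where it occurs. As a rough sketch of how that representation can be turned back into running text, the Python snippet below inverts such a map. The function name `rebuild_abstract` and the `excerpt` dictionary are illustrative assumptions; the excerpt copies only positions 0 to 7 from the rows above, not the full index.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild text from a token -> positions map by sorting on position."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the payload rows (assumption: truncated to positions 0-7
# for brevity; the full index covers positions 0 through 197).
excerpt = {
    "Software": [0],
    "logs": [1],
    "are": [2],
    "messages": [3],
    "recorded": [4],
    "during": [5],
    "the": [6],
    "execution": [7],
}

print(rebuild_abstract(excerpt))
```

Applied to the complete set of rows, the same procedure should yield the abstract shown at the top of the page, with punctuation attached to tokens exactly as stored in the index.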