Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2409.07016
Anomalous Sound Detection (ASD) has gained significant interest through the application of various Artificial Intelligence (AI) technologies in industrial settings. Though possessing great potential, ASD systems can hardly be readily deployed in real production sites due to the generalization problem, which is primarily caused by the difficulty of data collection and the complexity of environmental factors. This paper introduces a robust ASD model that leverages audio pre-trained models. Specifically, we fine-tune these models using machine operation data, employing SpecAug as a data augmentation strategy. Additionally, we investigate the impact of utilizing Low-Rank Adaptation (LoRA) tuning instead of full fine-tuning to address the problem of limited data for fine-tuning. Our experiments on the DCASE2023 Task 2 dataset establish a new benchmark of 77.75% on the evaluation set, with a significant improvement of 6.48% compared with previous state-of-the-art (SOTA) models, including top-tier traditional convolutional networks and speech pre-trained models, which demonstrates the effectiveness of audio pre-trained models with LoRA tuning. Ablation studies are also conducted to showcase the efficacy of the proposed scheme.
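The abstract contrasts LoRA tuning with full fine-tuning. A minimal sketch of the general LoRA idea follows, using the standard formulation in which a frozen weight matrix W is augmented with a trainable low-rank product, y = Wx + (alpha / r) · B(Ax). It is not the authors' implementation: the class name `LoRALinear`, the 8x8 layer size, the rank, and the `alpha` value are all hypothetical, since the abstract does not state which layers are adapted or which hyperparameters are used.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # pre-trained weights, kept frozen
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A starts with small random values, B starts at zero, so the
        # adapted layer initially behaves exactly like the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        # Only A and B would receive gradients during LoRA tuning.
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)


# Hypothetical 8x8 identity "pre-trained" weight, adapted with rank 2.
W = [[float(i == j) for j in range(8)] for i in range(8)]
layer = LoRALinear(W, rank=2, alpha=4.0)
x = [float(i) for i in range(8)]

full_params = len(W) * len(W[0])       # parameters updated by full fine-tuning
lora_params = layer.trainable_params() # parameters updated by LoRA tuning
print(full_params, lora_params)
```

In this toy setting LoRA trains 32 values instead of 64; the saving grows with layer size because the trainable count scales with r · (d_in + d_out) rather than d_in · d_out, which is why the approach is attractive when fine-tuning data is limited.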
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2409.07016, https://arxiv.org/pdf/2409.07016
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4403621304
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4403621304 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2409.07016 (Digital Object Identifier)
- Title: Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-09-11 (full publication date if available)
- Authors: Xinhu Zheng, Anbai Jiang, Bing Han, Yanmin Qian, Pingyi Fan, Jia Liu, Wei-Qiang Zhang (list of authors in order)
- Landing page: https://arxiv.org/abs/2409.07016 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2409.07016 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2409.07016 (direct OA link when available)
- Concepts: Sound (geography), Rank (graph theory), Adaptation (eye), Computer science, Speech recognition, Acoustics, Mathematics, Psychology, Physics, Combinatorics, Neuroscience (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4403621304 |
| doi | https://doi.org/10.48550/arxiv.2409.07016 |
| ids.doi | https://doi.org/10.48550/arxiv.2409.07016 |
| ids.openalex | https://openalex.org/W4403621304 |
| fwci | 0.0 |
| type | preprint |
| title | Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11309 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9976000189781189 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1711 |
| topics[0].subfield.display_name | Signal Processing |
| topics[0].display_name | Music and Audio Processing |
| topics[1].id | https://openalex.org/T10860 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9937999844551086 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Speech and Audio Processing |
| topics[2].id | https://openalex.org/T10201 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9745000004768372 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Speech Recognition and Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C203718221 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7470926642417908 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q491713 |
| concepts[0].display_name | Sound (geography) |
| concepts[1].id | https://openalex.org/C164226766 |
| concepts[1].level | 2 |
| concepts[1].score | 0.66025710105896 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q7293202 |
| concepts[1].display_name | Rank (graph theory) |
| concepts[2].id | https://openalex.org/C139807058 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6176553964614868 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q352374 |
| concepts[2].display_name | Adaptation (eye) |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.5262802243232727 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C28490314 |
| concepts[4].level | 1 |
| concepts[4].score | 0.4738771617412567 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[4].display_name | Speech recognition |
| concepts[5].id | https://openalex.org/C24890656 |
| concepts[5].level | 1 |
| concepts[5].score | 0.35492444038391113 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q82811 |
| concepts[5].display_name | Acoustics |
| concepts[6].id | https://openalex.org/C33923547 |
| concepts[6].level | 0 |
| concepts[6].score | 0.17446181178092957 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[6].display_name | Mathematics |
| concepts[7].id | https://openalex.org/C15744967 |
| concepts[7].level | 0 |
| concepts[7].score | 0.1354309618473053 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[7].display_name | Psychology |
| concepts[8].id | https://openalex.org/C121332964 |
| concepts[8].level | 0 |
| concepts[8].score | 0.1264389455318451 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[8].display_name | Physics |
| concepts[9].id | https://openalex.org/C114614502 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q76592 |
| concepts[9].display_name | Combinatorics |
| concepts[10].id | https://openalex.org/C169760540 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q207011 |
| concepts[10].display_name | Neuroscience |
| keywords[0].id | https://openalex.org/keywords/sound |
| keywords[0].score | 0.7470926642417908 |
| keywords[0].display_name | Sound (geography) |
| keywords[1].id | https://openalex.org/keywords/rank |
| keywords[1].score | 0.66025710105896 |
| keywords[1].display_name | Rank (graph theory) |
| keywords[2].id | https://openalex.org/keywords/adaptation |
| keywords[2].score | 0.6176553964614868 |
| keywords[2].display_name | Adaptation (eye) |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.5262802243232727 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/speech-recognition |
| keywords[4].score | 0.4738771617412567 |
| keywords[4].display_name | Speech recognition |
| keywords[5].id | https://openalex.org/keywords/acoustics |
| keywords[5].score | 0.35492444038391113 |
| keywords[5].display_name | Acoustics |
| keywords[6].id | https://openalex.org/keywords/mathematics |
| keywords[6].score | 0.17446181178092957 |
| keywords[6].display_name | Mathematics |
| keywords[7].id | https://openalex.org/keywords/psychology |
| keywords[7].score | 0.1354309618473053 |
| keywords[7].display_name | Psychology |
| keywords[8].id | https://openalex.org/keywords/physics |
| keywords[8].score | 0.1264389455318451 |
| keywords[8].display_name | Physics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2409.07016 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2409.07016 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2409.07016 |
| locations[1].id | doi:10.48550/arxiv.2409.07016 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article-journal |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2409.07016 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5062424202 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-9898-5543 |
| authorships[0].author.display_name | Xinhu Zheng |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zheng, Xinhu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5032861798 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Anbai Jiang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Jiang, Anbai |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100690517 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-6319-6755 |
| authorships[2].author.display_name | Bing Han |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Han, Bing |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100341993 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-0314-3790 |
| authorships[3].author.display_name | Yanmin Qian |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Qian, Yanmin |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5079233004 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-0658-6079 |
| authorships[4].author.display_name | Pingyi Fan |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Fan, Pingyi |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100409739 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-9845-2720 |
| authorships[5].author.display_name | Jia Liu |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Liu, Jia |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5100692904 |
| authorships[6].author.orcid | https://orcid.org/0000-0003-3841-1959 |
| authorships[6].author.display_name | Wei-Qiang Zhang |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Zhang, Wei-Qiang |
| authorships[6].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2409.07016 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11309 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9976000189781189 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1711 |
| primary_topic.subfield.display_name | Signal Processing |
| primary_topic.display_name | Music and Audio Processing |
| related_works | https://openalex.org/W2909726438, https://openalex.org/W2067046791, https://openalex.org/W2997567050, https://openalex.org/W2909888262, https://openalex.org/W2025747832, https://openalex.org/W3020957235, https://openalex.org/W2056769785, https://openalex.org/W1483272040, https://openalex.org/W4283377908, https://openalex.org/W1533421371 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2409.07016 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2409.07016 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2409.07016 |
| primary_location.id | pmh:oai:arXiv.org:2409.07016 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2409.07016 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2409.07016 |
| publication_date | 2024-09-11 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.2 | 114 |
| abstract_inverted_index.a | 59, 80, 117, 127 |
| abstract_inverted_index.as | 79 |
| abstract_inverted_index.be | 28 |
| abstract_inverted_index.by | 44 |
| abstract_inverted_index.in | 17, 31 |
| abstract_inverted_index.is | 41 |
| abstract_inverted_index.of | 11, 47, 53, 89, 96, 103, 120, 130, 151, 167 |
| abstract_inverted_index.on | 110, 122 |
| abstract_inverted_index.to | 36, 99, 163 |
| abstract_inverted_index.we | 69, 85 |
| abstract_inverted_index.ASD | 24, 61 |
| abstract_inverted_index.Our | 108 |
| abstract_inverted_index.and | 50, 143 |
| abstract_inverted_index.are | 160 |
| abstract_inverted_index.can | 26 |
| abstract_inverted_index.due | 35 |
| abstract_inverted_index.for | 106 |
| abstract_inverted_index.has | 4 |
| abstract_inverted_index.new | 118 |
| abstract_inverted_index.the | 9, 37, 45, 51, 87, 101, 111, 123, 149, 165, 168 |
| abstract_inverted_index.(AI) | 15 |
| abstract_inverted_index.LoRA | 156 |
| abstract_inverted_index.Task | 113 |
| abstract_inverted_index.This | 56 |
| abstract_inverted_index.also | 161 |
| abstract_inverted_index.data | 48, 81, 105 |
| abstract_inverted_index.full | 97 |
| abstract_inverted_index.real | 32 |
| abstract_inverted_index.set, | 125 |
| abstract_inverted_index.that | 63 |
| abstract_inverted_index.with | 126, 133, 155 |
| abstract_inverted_index.(ASD) | 3 |
| abstract_inverted_index.6.48% | 131 |
| abstract_inverted_index.Sound | 1 |
| abstract_inverted_index.audio | 65, 152 |
| abstract_inverted_index.data, | 76 |
| abstract_inverted_index.great | 22 |
| abstract_inverted_index.model | 62 |
| abstract_inverted_index.paper | 57 |
| abstract_inverted_index.sites | 34 |
| abstract_inverted_index.these | 71 |
| abstract_inverted_index.using | 73 |
| abstract_inverted_index.which | 40, 147 |
| abstract_inverted_index.(LoRA) | 93 |
| abstract_inverted_index.(SOTA) | 136 |
| abstract_inverted_index.77.75% | 121 |
| abstract_inverted_index.Though | 20 |
| abstract_inverted_index.caused | 43 |
| abstract_inverted_index.gained | 5 |
| abstract_inverted_index.hardly | 27 |
| abstract_inverted_index.impact | 88 |
| abstract_inverted_index.models | 72, 154 |
| abstract_inverted_index.robust | 60 |
| abstract_inverted_index.speech | 144 |
| abstract_inverted_index.tuning | 94 |
| abstract_inverted_index.SpecAug | 78 |
| abstract_inverted_index.address | 100 |
| abstract_inverted_index.dataset | 115 |
| abstract_inverted_index.instead | 95 |
| abstract_inverted_index.limited | 104 |
| abstract_inverted_index.machine | 74 |
| abstract_inverted_index.models, | 137, 146 |
| abstract_inverted_index.models. | 67 |
| abstract_inverted_index.problem | 102 |
| abstract_inverted_index.readily | 29 |
| abstract_inverted_index.scheme. | 170 |
| abstract_inverted_index.studies | 159 |
| abstract_inverted_index.systems | 25 |
| abstract_inverted_index.through | 8 |
| abstract_inverted_index.tuning. | 157 |
| abstract_inverted_index.various | 12 |
| abstract_inverted_index.Ablation | 158 |
| abstract_inverted_index.Low-Rank | 91 |
| abstract_inverted_index.compared | 132 |
| abstract_inverted_index.deployed | 30 |
| abstract_inverted_index.efficacy | 166 |
| abstract_inverted_index.factors. | 55 |
| abstract_inverted_index.interest | 7 |
| abstract_inverted_index.networks | 142 |
| abstract_inverted_index.previous | 134 |
| abstract_inverted_index.problem, | 39 |
| abstract_inverted_index.proposed | 169 |
| abstract_inverted_index.showcase | 164 |
| abstract_inverted_index.top-tier | 139 |
| abstract_inverted_index.Anomalous | 0 |
| abstract_inverted_index.DCASE2023 | 112 |
| abstract_inverted_index.Detection | 2 |
| abstract_inverted_index.benchmark | 119 |
| abstract_inverted_index.conducted | 162 |
| abstract_inverted_index.employing | 77 |
| abstract_inverted_index.establish | 116 |
| abstract_inverted_index.fine-tune | 70 |
| abstract_inverted_index.including | 138 |
| abstract_inverted_index.leverages | 64 |
| abstract_inverted_index.operation | 75 |
| abstract_inverted_index.primarily | 42 |
| abstract_inverted_index.settings. | 19 |
| abstract_inverted_index.strategy. | 83 |
| abstract_inverted_index.utilizing | 90 |
| abstract_inverted_index.Adaptation | 92 |
| abstract_inverted_index.Artificial | 13 |
| abstract_inverted_index.collection | 49 |
| abstract_inverted_index.complexity | 52 |
| abstract_inverted_index.difficulty | 46 |
| abstract_inverted_index.evaluation | 124 |
| abstract_inverted_index.industrial | 18 |
| abstract_inverted_index.introduces | 58 |
| abstract_inverted_index.possessing | 21 |
| abstract_inverted_index.potential, | 23 |
| abstract_inverted_index.production | 33 |
| abstract_inverted_index.application | 10 |
| abstract_inverted_index.experiments | 109 |
| abstract_inverted_index.fine-tuning | 98 |
| abstract_inverted_index.improvement | 129 |
| abstract_inverted_index.investigate | 86 |
| abstract_inverted_index.pre-trained | 66, 145, 153 |
| abstract_inverted_index.significant | 6, 128 |
| abstract_inverted_index.traditional | 140 |
| abstract_inverted_index.Intelligence | 14 |
| abstract_inverted_index.augmentation | 82 |
| abstract_inverted_index.demonstrates | 148 |
| abstract_inverted_index.fine-tuning. | 107 |
| abstract_inverted_index.technologies | 16 |
| abstract_inverted_index.Additionally, | 84 |
| abstract_inverted_index.Specifically, | 68 |
| abstract_inverted_index.convolutional | 141 |
| abstract_inverted_index.effectiveness | 150 |
| abstract_inverted_index.environmental | 54 |
| abstract_inverted_index.generalization | 38 |
| abstract_inverted_index.state-of-the-art | 135 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile.value | 0.24941047 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
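
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the word positions where it occurs. The following sketch shows one way to turn such rows back into running text. The helper names `parse_positions` and `rebuild_text` are hypothetical, and only the first six tokens from the table are used to keep the example short.

```python
def parse_positions(cell):
    """Parse a table cell such as '59, 80, 117, 127' into a list of ints."""
    return [int(part) for part in cell.split(",") if part.strip()]


def rebuild_text(inverted_index):
    """Place every token at each of its positions, then join in position order."""
    placed = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            placed[pos] = token
    return " ".join(placed[pos] for pos in sorted(placed))


# Token -> position cell, copied from the first six positions in the table.
rows = {
    "Anomalous": "0",
    "Sound": "1",
    "Detection": "2",
    "(ASD)": "3",
    "has": "4",
    "gained": "5",
}

index = {token: parse_positions(cell) for token, cell in rows.items()}
opening = rebuild_text(index)
print(opening)  # Anomalous Sound Detection (ASD) has gained
```

Feeding all of the rows through the same two helpers would reproduce the full abstract, since every position from 0 to 170 appears exactly once across the index.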