Attention-based audio embeddings for query-by-example

2022 · Open Access · DOI: https://doi.org/10.5281/zenodo.7316592
An ideal audio retrieval system efficiently and robustly recognizes a short query snippet from an extensive database. However, the performance of well-known audio fingerprinting systems falls short at high levels of signal distortion. This paper presents an audio retrieval system that generates noise- and reverberation-robust audio fingerprints using a contrastive learning framework. Using these fingerprints, the method performs a comprehensive search to identify the query audio and precisely estimate its timestamp in the reference audio. Our framework trains a CNN to maximize the similarity between pairs of embeddings extracted from clean audio and its corresponding distorted, time-shifted version. A channel-wise spectral-temporal attention mechanism captures salient time indices and spectral bands in the CNN features, enabling the network to discriminate audio more effectively by giving more weight to the salient spectral-temporal patches in the signal. Experimental results indicate that our system is efficient in computation and memory usage, more accurate than competing state-of-the-art systems (particularly at higher distortion levels), and scalable to larger databases.
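To make the training objective concrete: the abstract describes a CNN trained to pull together embeddings of a clean excerpt and its distorted, time-shifted counterpart, with a channel-wise spectral-temporal attention block reweighting the CNN features. The PyTorch sketch below shows one plausible instantiation of that recipe (an NT-Xent-style contrastive loss plus a toy attention module); the layer shapes, the `temperature` value, and the attention design are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: an NT-Xent-style contrastive loss over
# clean/distorted embedding pairs, plus a toy channel-wise spectral-temporal
# attention block. Shapes, temperature, and layer choices are assumptions.
import torch
import torch.nn.functional as F


def ntxent_loss(z_clean: torch.Tensor, z_aug: torch.Tensor,
                temperature: float = 0.05) -> torch.Tensor:
    """z_clean, z_aug: (B, D) embeddings of clean audio and of its distorted,
    time-shifted version. Row i of each tensor forms a positive pair; all
    other rows in the concatenated batch serve as negatives."""
    batch = z_clean.size(0)
    z = F.normalize(torch.cat([z_clean, z_aug], dim=0), dim=1)   # (2B, D), unit norm
    sim = (z @ z.t()) / temperature                              # scaled cosine similarity
    mask = torch.eye(2 * batch, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))                   # exclude self-similarity
    targets = torch.cat([torch.arange(batch) + batch,
                         torch.arange(batch)]).to(z.device)      # index of each row's positive
    return F.cross_entropy(sim, targets)


class SpectralTemporalAttention(torch.nn.Module):
    """Toy channel-wise attention over the frequency and time axes of a CNN
    feature map (B, C, F, T): pool along one axis, score the other, reweight."""

    def __init__(self, channels: int):
        super().__init__()
        self.freq_score = torch.nn.Conv2d(channels, channels, kernel_size=1)
        self.time_score = torch.nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:          # x: (B, C, F, T)
        freq_w = torch.sigmoid(self.freq_score(x.mean(dim=3, keepdim=True)))  # (B, C, F, 1)
        time_w = torch.sigmoid(self.time_score(x.mean(dim=2, keepdim=True)))  # (B, C, 1, T)
        return x * freq_w * time_w                               # emphasize salient bands/frames
```

In training, each batch would pair a clean spectrogram excerpt with a distorted, time-shifted copy, pass both through the attention-equipped CNN, and minimize `ntxent_loss` over the resulting embeddings.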
- Type: paratext
- Language: en
- Landing Page: https://biblio.ugent.be/publication/01GRZHDNX1F3WM193KDVHKYB4K
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4308860392
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4308860392 (canonical identifier for this work in OpenAlex; see the fetch sketch after this list)
- DOI: https://doi.org/10.5281/zenodo.7316592 (Digital Object Identifier)
- Title: Attention-based audio embeddings for query-by-example (work title)
- Type: paratext (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2022 (year of publication)
- Publication date: 2022-12-04 (full publication date if available)
- Authors: Anup K. Singh, Kris Demuynck, Vipul Arora (list of authors in order)
- Landing page: https://biblio.ugent.be/publication/01GRZHDNX1F3WM193KDVHKYB4K (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://biblio.ugent.be/publication/01GRZHDNX1F3WM193KDVHKYB4K (direct OA link when available)
- Concepts: Computer science, Query optimization, Information retrieval (top concepts attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2024: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
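These fields mirror what the public OpenAlex works endpoint returns for this record. A minimal fetch sketch, assuming only the standard `https://api.openalex.org/works/{id}` endpoint and the Python standard library:

```python
# Minimal sketch: fetch this work's raw OpenAlex record with the standard
# library only. Assumes the public OpenAlex works endpoint is reachable.
import json
import urllib.request

OPENALEX_ID = "W4308860392"
url = f"https://api.openalex.org/works/{OPENALEX_ID}"

with urllib.request.urlopen(url) as response:
    work = json.load(response)

print(work["display_name"])              # "Attention-based audio embeddings for query-by-example"
print(work["publication_date"])          # "2022-12-04"
print(work["open_access"]["oa_status"])  # "green"
```

The keys printed here correspond to the `display_name`, `publication_date`, and `open_access.oa_status` rows in the payload table below.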
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4308860392 |
| doi | https://doi.org/10.5281/zenodo.7316592 |
| ids.doi | https://doi.org/10.5281/zenodo.7316592 |
| ids.openalex | https://openalex.org/W4308860392 |
| fwci | 0.19495729 |
| type | paratext |
| title | Attention-based audio embeddings for query-by-example |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11309 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9997000098228455 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1711 |
| topics[0].subfield.display_name | Signal Processing |
| topics[0].display_name | Music and Audio Processing |
| topics[1].id | https://openalex.org/T11349 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9894000291824341 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Music Technology and Sound Studies |
| topics[2].id | https://openalex.org/T10201 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9642000198364258 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Speech Recognition and Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8326883316040039 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C157692150 |
| concepts[1].level | 2 |
| concepts[1].score | 0.4789610207080841 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q2919848 |
| concepts[1].display_name | Query optimization |
| concepts[2].id | https://openalex.org/C23123220 |
| concepts[2].level | 1 |
| concepts[2].score | 0.30363643169403076 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[2].display_name | Information retrieval |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8326883316040039 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/query-optimization |
| keywords[1].score | 0.4789610207080841 |
| keywords[1].display_name | Query optimization |
| keywords[2].id | https://openalex.org/keywords/information-retrieval |
| keywords[2].score | 0.30363643169403076 |
| keywords[2].display_name | Information retrieval |
| language | en |
| locations[0].id | pmh:oai:archive.ugent.be:01GRZHDNX1F3WM193KDVHKYB4K |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400477 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Ghent University Academic Bibliography (Ghent University) |
| locations[0].source.host_organization | https://openalex.org/I32597200 |
| locations[0].source.host_organization_name | Ghent University |
| locations[0].source.host_organization_lineage | https://openalex.org/I32597200 |
| locations[0].license | cc-by |
| locations[0].pdf_url | |
| locations[0].version | publishedVersion |
| locations[0].raw_type | conference |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Proceedings of the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022) |
| locations[0].landing_page_url | https://biblio.ugent.be/publication/01GRZHDNX1F3WM193KDVHKYB4K |
| locations[1].id | doi:10.5281/zenodo.7316592 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400562 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | Zenodo (CERN European Organization for Nuclear Research) |
| locations[1].source.host_organization | https://openalex.org/I67311998 |
| locations[1].source.host_organization_name | European Organization for Nuclear Research |
| locations[1].source.host_organization_lineage | https://openalex.org/I67311998 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.5281/zenodo.7316592 |
| indexed_in | datacite |
| authorships[0].author.id | https://openalex.org/A5101680403 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-3653-1618 |
| authorships[0].author.display_name | Anup K. Singh |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Anup Singh |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5046536366 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-8525-7160 |
| authorships[1].author.display_name | Kris Demuynck |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Kris Demuynck |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5011121139 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-1207-1258 |
| authorships[2].author.display_name | Vipul Arora |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Vipul Arora |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://biblio.ugent.be/publication/01GRZHDNX1F3WM193KDVHKYB4K |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Attention-based audio embeddings for query-by-example |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11309 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9997000098228455 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1711 |
| primary_topic.subfield.display_name | Signal Processing |
| primary_topic.display_name | Music and Audio Processing |
| related_works | https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W2382290278, https://openalex.org/W2350741829, https://openalex.org/W2530322880, https://openalex.org/W1596801655, https://openalex.org/W2359140296 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:archive.ugent.be:01GRZHDNX1F3WM193KDVHKYB4K |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400477 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Ghent University Academic Bibliography (Ghent University) |
| best_oa_location.source.host_organization | https://openalex.org/I32597200 |
| best_oa_location.source.host_organization_name | Ghent University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I32597200 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | conference |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Proceedings of the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022) |
| best_oa_location.landing_page_url | https://biblio.ugent.be/publication/01GRZHDNX1F3WM193KDVHKYB4K |
| primary_location.id | pmh:oai:archive.ugent.be:01GRZHDNX1F3WM193KDVHKYB4K |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400477 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Ghent University Academic Bibliography (Ghent University) |
| primary_location.source.host_organization | https://openalex.org/I32597200 |
| primary_location.source.host_organization_name | Ghent University |
| primary_location.source.host_organization_lineage | https://openalex.org/I32597200 |
| primary_location.license | cc-by |
| primary_location.pdf_url | |
| primary_location.version | publishedVersion |
| primary_location.raw_type | conference |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Proceedings of the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022) |
| primary_location.landing_page_url | https://biblio.ugent.be/publication/01GRZHDNX1F3WM193KDVHKYB4K |
| publication_date | 2022-12-04 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index | word-to-positions index of the abstract (see the reconstruction sketch after this table) |
| cited_by_percentile_year.max | 94 |
| cited_by_percentile_year.min | 90 |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile.value | 0.44225374 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
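The `abstract_inverted_index` field summarized in the payload stores the abstract as a map from each word to the positions at which it occurs. A minimal decoding sketch follows; the reconstruction logic is a generic assumption about that word-to-positions format, not an official OpenAlex helper.

```python
# Sketch: rebuild a plain-text abstract from an OpenAlex-style inverted index
# (word -> list of positions). The decoding is a straightforward assumption
# about the format, not an official OpenAlex utility.
def abstract_from_inverted_index(inverted_index: dict[str, list[int]]) -> str:
    positions = {}                       # position -> word
    for word, idxs in inverted_index.items():
        for idx in idxs:
            positions[idx] = word
    return " ".join(word for _, word in sorted(positions.items()))


# Toy usage with a tiny index; the real field covers the full abstract.
example = {"An": [0], "ideal": [1], "audio": [2], "retrieval": [3], "system": [4]}
print(abstract_from_inverted_index(example))   # "An ideal audio retrieval system"
```

Applied to the full index in the payload, this recovers the work's abstract text.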