Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2408.03706
A common approach for sequence tagging tasks based on contextual word representations is to train a machine learning classifier directly on these embedding vectors. This approach has two shortcomings. First, such methods consider single input sequences in isolation and are unable to put an individual embedding vector in relation to vectors outside the current local context of use. Second, the high performance of these models relies on fine-tuning the embedding model in conjunction with the classifier, which may not always be feasible due to the size or inaccessibility of the underlying feature-generation model. It is thus desirable, given a collection of embedding vectors of a corpus, i.e., a datastore, to find features of each vector that describe its relation to other, similar vectors in the datastore. With this in mind, we introduce complexity measures of the local topology of the latent space of a contextual language model with respect to a given datastore. The effectiveness of our features is demonstrated through their application to dialogue term extraction. Our work continues a line of research that explores the manifold hypothesis for word embeddings, demonstrating that local structure in the space carved out by word embeddings can be exploited to infer semantic properties.
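The abstract does not spell out the measures themselves. As an illustration only, here is a minimal sketch of the general recipe it describes: keep a datastore of embedding vectors, retrieve each query vector's k nearest neighbours from it, and summarise the shape of that neighbourhood as a per-vector feature. The local measure below is a swapped-in stand-in, a TwoNN-style intrinsic-dimension estimate, and is not claimed to be one of the paper's complexity measures. The function names, the Euclidean metric, the neighbourhood size `k` and the synthetic data are all assumptions. Real vectors would come from a contextual language model, and a realistic datastore would need an approximate-nearest-neighbour index rather than this brute-force search.

```python
import numpy as np


def knn_indices(datastore: np.ndarray, queries: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest datastore vectors (Euclidean) for each query.

    Brute force over the whole datastore; only suitable for small demos.
    """
    # Squared distances via broadcasting: shape (n_queries, n_datastore).
    d2 = ((queries[:, None, :] - datastore[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]


def twonn_dimension(points: np.ndarray) -> float:
    """TwoNN-style maximum-likelihood intrinsic-dimension estimate.

    For each point, mu = r2 / r1 is the ratio of the distances to its second
    and first nearest neighbour within the cloud. Under the TwoNN model mu is
    Pareto-distributed, and the maximum-likelihood dimension is n / sum(log mu).
    Assumes no duplicate points (otherwise r1 would be zero).
    """
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)  # ignore each point's zero distance to itself
    d_sorted = np.sort(d, axis=1)
    r1, r2 = d_sorted[:, 0], d_sorted[:, 1]
    mu = r2 / r1
    return float(len(points) / np.log(mu).sum())


def local_complexity_features(
    datastore: np.ndarray, queries: np.ndarray, k: int = 50
) -> np.ndarray:
    """One scalar feature per query: estimated dimension of its k-NN neighbourhood.

    Hypothetical helper: the neighbourhood is taken from the datastore only,
    so the embedding model itself is never updated.
    """
    idx = knn_indices(datastore, queries, k)
    return np.array([twonn_dimension(datastore[row]) for row in idx])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a datastore: 2000 vectors on a 2-D subspace of R^16.
    datastore = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 16))
    features = local_complexity_features(datastore, datastore[:5], k=100)
    print(features)  # compare with the true subspace dimension, 2
```

Because such features depend only on the frozen datastore, they could be passed to a downstream tagger without fine-tuning the embedding model, which is the constraint the abstract's second shortcoming points to.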
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2408.03706 (PDF: https://arxiv.org/pdf/2408.03706)
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4403662392
- Publication date: 2024-08-07
- Authors: Benjamin Ruppik, Michael Heck, Carel van Niekerk, Renato Vukovic, Hsien-chin Lin, Shutong Feng, Marcus Zibrowius, Milica Gašić
- Concepts (assigned by OpenAlex): Term (time), Topology (electrical circuits), Computer science, Natural language processing, Mathematics, Physics, Combinatorics, Quantum mechanics
- Cited by: 0
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4403662392 |
| doi | https://doi.org/10.48550/arxiv.2408.03706 |
| ids.doi | https://doi.org/10.48550/arxiv.2408.03706 |
| ids.openalex | https://openalex.org/W4403662392 |
| fwci | |
| type | preprint |
| title | Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12031 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9685999751091003 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech and dialogue systems |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9663000106811523 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| topics[2].id | https://openalex.org/T10181 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9567999839782715 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Natural Language Processing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C61797465 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7623804807662964 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1188986 |
| concepts[0].display_name | Term (time) |
| concepts[1].id | https://openalex.org/C184720557 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5937932729721069 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q7825049 |
| concepts[1].display_name | Topology (electrical circuits) |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.48886555433273315 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C204321447 |
| concepts[3].level | 1 |
| concepts[3].score | 0.3524629473686218 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[3].display_name | Natural language processing |
| concepts[4].id | https://openalex.org/C33923547 |
| concepts[4].level | 0 |
| concepts[4].score | 0.2301563024520874 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[4].display_name | Mathematics |
| concepts[5].id | https://openalex.org/C121332964 |
| concepts[5].level | 0 |
| concepts[5].score | 0.07061266899108887 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[5].display_name | Physics |
| concepts[6].id | https://openalex.org/C114614502 |
| concepts[6].level | 1 |
| concepts[6].score | 0.05813667178153992 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q76592 |
| concepts[6].display_name | Combinatorics |
| concepts[7].id | https://openalex.org/C62520636 |
| concepts[7].level | 1 |
| concepts[7].score | 0.0 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[7].display_name | Quantum mechanics |
| keywords[0].id | https://openalex.org/keywords/term |
| keywords[0].score | 0.7623804807662964 |
| keywords[0].display_name | Term (time) |
| keywords[1].id | https://openalex.org/keywords/topology |
| keywords[1].score | 0.5937932729721069 |
| keywords[1].display_name | Topology (electrical circuits) |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.48886555433273315 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/natural-language-processing |
| keywords[3].score | 0.3524629473686218 |
| keywords[3].display_name | Natural language processing |
| keywords[4].id | https://openalex.org/keywords/mathematics |
| keywords[4].score | 0.2301563024520874 |
| keywords[4].display_name | Mathematics |
| keywords[5].id | https://openalex.org/keywords/physics |
| keywords[5].score | 0.07061266899108887 |
| keywords[5].display_name | Physics |
| keywords[6].id | https://openalex.org/keywords/combinatorics |
| keywords[6].score | 0.05813667178153992 |
| keywords[6].display_name | Combinatorics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2408.03706 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2408.03706 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2408.03706 |
| locations[1].id | doi:10.48550/arxiv.2408.03706 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2408.03706 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5065297641 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-9035-9217 |
| authorships[0].author.display_name | Benjamin Ruppik |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ruppik, Benjamin Matthias |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5015873217 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-9841-5025 |
| authorships[1].author.display_name | Michael Heck |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Heck, Michael |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5023425411 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-4551-7447 |
| authorships[2].author.display_name | Carel van Niekerk |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | van Niekerk, Carel |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5068337980 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-6303-9402 |
| authorships[3].author.display_name | Renato Vukovic |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Vukovic, Renato |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5103130759 |
| authorships[4].author.orcid | https://orcid.org/0009-0006-0027-226X |
| authorships[4].author.display_name | Hsien-chin Lin |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Lin, Hsien-chin |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5031650617 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-1307-4223 |
| authorships[5].author.display_name | Shutong Feng |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Feng, Shutong |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5067361843 |
| authorships[6].author.orcid | https://orcid.org/0000-0003-0806-3228 |
| authorships[6].author.display_name | Marcus Zibrowius |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Zibrowius, Marcus |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5051889115 |
| authorships[7].author.orcid | https://orcid.org/0000-0003-0318-9147 |
| authorships[7].author.display_name | Milica Gašić |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Gašić, Milica |
| authorships[7].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2408.03706 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12031 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9685999751091003 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech and dialogue systems |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2408.03706 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2408.03706 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2408.03706 |
| primary_location.id | pmh:oai:arXiv.org:2408.03706 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2408.03706 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2408.03706 |
| publication_date | 2024-08-07 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| citation_normalized_percentile | |