Enriching Word Usage Graphs with Cluster Definitions
Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev, Dominik Schlechtweg
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2403.18024
We present a dataset of word usage graphs (WUGs), where the existing WUGs for multiple languages are enriched with cluster labels functioning as sense definitions. They are generated from scratch by fine-tuned encoder-decoder language models. The conducted human evaluation has shown that these definitions match the existing clusters in WUGs better than the definitions chosen from WordNet by two baseline systems. At the same time, the method is straightforward to use and easy to extend to new languages. The resulting enriched datasets can be extremely helpful for moving on to explainable semantic change modeling.
Metadata
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2403.18024, https://arxiv.org/pdf/2403.18024
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4393299685
All OpenAlex metadata
- OpenAlex ID: https://openalex.org/W4393299685 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2403.18024 (Digital Object Identifier)
- Title: Enriching Word Usage Graphs with Cluster Definitions (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-03-26 (full publication date if available)
- Authors: Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev, Dominik Schlechtweg (list of authors in order)
- Landing page: https://arxiv.org/abs/2403.18024 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2403.18024 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2403.18024 (direct OA link when available)
- Concepts: Word (group theory), Computer science, Cluster (spacecraft), Natural language processing, Linguistics, Programming language, Philosophy (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4393299685 |
| doi | https://doi.org/10.48550/arxiv.2403.18024 |
| ids.doi | https://doi.org/10.48550/arxiv.2403.18024 |
| ids.openalex | https://openalex.org/W4393299685 |
| fwci | |
| type | preprint |
| title | Enriching Word Usage Graphs with Cluster Definitions |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9987999796867371 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.984499990940094 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| topics[2].id | https://openalex.org/T12031 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.97079998254776 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Speech and dialogue systems |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C90805587 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6109753847122192 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q10944557 |
| concepts[0].display_name | Word (group theory) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6036943197250366 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C164866538 |
| concepts[2].level | 2 |
| concepts[2].score | 0.48880836367607117 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q367351 |
| concepts[2].display_name | Cluster (spacecraft) |
| concepts[3].id | https://openalex.org/C204321447 |
| concepts[3].level | 1 |
| concepts[3].score | 0.4170558452606201 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[3].display_name | Natural language processing |
| concepts[4].id | https://openalex.org/C41895202 |
| concepts[4].level | 1 |
| concepts[4].score | 0.3518626391887665 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[4].display_name | Linguistics |
| concepts[5].id | https://openalex.org/C199360897 |
| concepts[5].level | 1 |
| concepts[5].score | 0.19742074608802795 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[5].display_name | Programming language |
| concepts[6].id | https://openalex.org/C138885662 |
| concepts[6].level | 0 |
| concepts[6].score | 0.06045454740524292 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[6].display_name | Philosophy |
| keywords[0].id | https://openalex.org/keywords/word |
| keywords[0].score | 0.6109753847122192 |
| keywords[0].display_name | Word (group theory) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6036943197250366 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/cluster |
| keywords[2].score | 0.48880836367607117 |
| keywords[2].display_name | Cluster (spacecraft) |
| keywords[3].id | https://openalex.org/keywords/natural-language-processing |
| keywords[3].score | 0.4170558452606201 |
| keywords[3].display_name | Natural language processing |
| keywords[4].id | https://openalex.org/keywords/linguistics |
| keywords[4].score | 0.3518626391887665 |
| keywords[4].display_name | Linguistics |
| keywords[5].id | https://openalex.org/keywords/programming-language |
| keywords[5].score | 0.19742074608802795 |
| keywords[5].display_name | Programming language |
| keywords[6].id | https://openalex.org/keywords/philosophy |
| keywords[6].score | 0.06045454740524292 |
| keywords[6].display_name | Philosophy |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2403.18024 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2403.18024 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2403.18024 |
| locations[1].id | doi:10.48550/arxiv.2403.18024 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2403.18024 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5037966941 |
| authorships[0].author.orcid | https://orcid.org/0009-0002-8207-5593 |
| authorships[0].author.display_name | Mariіa Fedorova |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Fedorova, Mariia |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5071409817 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-2540-5912 |
| authorships[1].author.display_name | Andrey Kutuzov |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Kutuzov, Andrey |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5047490968 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Nikolay Arefyev |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Arefyev, Nikolay |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5013366042 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-0685-2576 |
| authorships[3].author.display_name | Dominik Schlechtweg |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Schlechtweg, Dominik |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2403.18024 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Enriching Word Usage Graphs with Cluster Definitions |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9987999796867371 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W2382290278, https://openalex.org/W2478288626, https://openalex.org/W4391913857, https://openalex.org/W2350741829, https://openalex.org/W2296205523 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2403.18024 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2403.18024 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2403.18024 |
| primary_location.id | pmh:oai:arXiv.org:2403.18024 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2403.18024 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2403.18024 |
| publication_date | 2024-03-26 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 2 |
| abstract_inverted_index.At | 61 |
| abstract_inverted_index.We | 0 |
| abstract_inverted_index.as | 22 |
| abstract_inverted_index.be | 83 |
| abstract_inverted_index.by | 30, 57 |
| abstract_inverted_index.in | 48 |
| abstract_inverted_index.is | 67 |
| abstract_inverted_index.of | 4 |
| abstract_inverted_index.on | 88 |
| abstract_inverted_index.to | 69, 73, 75, 89 |
| abstract_inverted_index.The | 35, 78 |
| abstract_inverted_index.and | 71 |
| abstract_inverted_index.are | 16, 26 |
| abstract_inverted_index.can | 82 |
| abstract_inverted_index.for | 13, 86 |
| abstract_inverted_index.has | 39 |
| abstract_inverted_index.new | 76 |
| abstract_inverted_index.the | 10, 45, 52, 62, 65 |
| abstract_inverted_index.two | 58 |
| abstract_inverted_index.use | 70 |
| abstract_inverted_index.They | 25 |
| abstract_inverted_index.WUGs | 12, 49 |
| abstract_inverted_index.easy | 72 |
| abstract_inverted_index.from | 28, 55 |
| abstract_inverted_index.same | 63 |
| abstract_inverted_index.than | 51 |
| abstract_inverted_index.that | 41 |
| abstract_inverted_index.with | 18 |
| abstract_inverted_index.word | 5 |
| abstract_inverted_index.human | 37 |
| abstract_inverted_index.match | 44 |
| abstract_inverted_index.sense | 23 |
| abstract_inverted_index.shown | 40 |
| abstract_inverted_index.these | 42 |
| abstract_inverted_index.time, | 64 |
| abstract_inverted_index.usage | 6 |
| abstract_inverted_index.where | 9 |
| abstract_inverted_index.better | 50 |
| abstract_inverted_index.change | 92 |
| abstract_inverted_index.chosen | 54 |
| abstract_inverted_index.extend | 74 |
| abstract_inverted_index.graphs | 7 |
| abstract_inverted_index.labels | 20 |
| abstract_inverted_index.method | 66 |
| abstract_inverted_index.moving | 87 |
| abstract_inverted_index.(WUGs), | 8 |
| abstract_inverted_index.WordNet | 56 |
| abstract_inverted_index.cluster | 19 |
| abstract_inverted_index.dataset | 3 |
| abstract_inverted_index.helpful | 85 |
| abstract_inverted_index.models. | 34 |
| abstract_inverted_index.present | 1 |
| abstract_inverted_index.scratch | 29 |
| abstract_inverted_index.baseline | 59 |
| abstract_inverted_index.clusters | 47 |
| abstract_inverted_index.datasets | 81 |
| abstract_inverted_index.enriched | 17, 80 |
| abstract_inverted_index.existing | 11, 46 |
| abstract_inverted_index.language | 33 |
| abstract_inverted_index.multiple | 14 |
| abstract_inverted_index.semantic | 91 |
| abstract_inverted_index.systems. | 60 |
| abstract_inverted_index.conducted | 36 |
| abstract_inverted_index.extremely | 84 |
| abstract_inverted_index.generated | 27 |
| abstract_inverted_index.languages | 15 |
| abstract_inverted_index.modeling. | 93 |
| abstract_inverted_index.resulting | 79 |
| abstract_inverted_index.evaluation | 38 |
| abstract_inverted_index.fine-tuned | 31 |
| abstract_inverted_index.languages. | 77 |
| abstract_inverted_index.definitions | 43, 53 |
| abstract_inverted_index.explainable | 90 |
| abstract_inverted_index.functioning | 21 |
| abstract_inverted_index.definitions. | 24 |
| abstract_inverted_index.encoder-decoder | 32 |
| abstract_inverted_index.straightforward | 68 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |
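
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the positions where it occurs, rather than as running text. As a minimal sketch of how such a mapping can be turned back into a readable string, the Python below places every token at its listed positions and joins them in order. The function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions, not part of OpenAlex; the sample uses only the first nine tokens and positions taken from the table.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild running text from a token -> positions mapping.

    Hypothetical helper for illustration; it assumes each position
    is claimed by exactly one token.
    """
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in ascending position order, separated by spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# First nine tokens of the abstract, copied from the payload rows above.
sample = {
    "We": [0],
    "present": [1],
    "a": [2],
    "dataset": [3],
    "of": [4],
    "word": [5],
    "usage": [6],
    "graphs": [7],
    "(WUGs),": [8],
}

print(rebuild_abstract(sample))
```

A token that occurs several times, such as `to` at positions 69, 73, 75 and 89, is handled by the inner loop, which writes the same token into each of its positions.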