Nikolay Arefyev
Can Large Language Models Compete with Specialized Models in Lexical Semantic Change Detection?
In this paper, we present a comprehensive comparison between specialized Lexical Semantic Change Detection (LSCD) models and Large Language Models (LLMs) for the LSCD task. In addition to comparing models, we also investigate the role of a…
An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT)
Training state-of-the-art large language models requires vast amounts of clean and diverse textual data. However, building suitable multilingual datasets remains a challenge. In this work, we present HPLT v2, a collection of high-quality m…
Sense through time: diachronic word sense annotations for word sense induction and Lexical Semantic Change Detection
There has been extensive work on human word sense annotation, i.e., manually labeling word uses in natural texts according to their senses. Such labels were primarily created for the tasks of Word Sense Disambiguation (WSD) and Word Sense …
Deep-change at AXOLOTL-24: Orchestrating WSD and WSI Models for Semantic Change Modeling
This paper describes our solution to the first subtask from the AXOLOTL-24 shared task on Semantic Change Modeling. The goal of this subtask is to distribute a given set of usages of a polysemous word from a newer time period between sense…
Multilingual Substitution-based Word Sense Induction
Word Sense Induction (WSI) is the task of discovering senses of an ambiguous word by grouping usages of this word into clusters corresponding to these senses. Many approaches were proposed to solve WSI in English and a few other languages,…
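WSI is described above as grouping usages of a word into clusters that correspond to its senses. The title suggests that usages are represented through lexical substitutes. The following is a toy sketch under that assumption and does not reproduce the paper's algorithm. Each usage is a hand-written set of substitutes (in practice these would be generated by a language model). Two usages are linked when the Jaccard overlap of their sets reaches a threshold, and the connected components are returned as sense ids. Single-linkage clustering with a fixed threshold is a simplification chosen here for brevity. The names `induce_senses`, `jaccard` and the default `threshold=0.2` are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two substitute sets (0 = disjoint, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def induce_senses(substitutes: list[set[str]], threshold: float = 0.2) -> list[int]:
    """Toy WSI (hypothetical helper): link usages whose substitute sets overlap
    by at least `threshold`, then label connected components as senses."""
    parent = list(range(len(substitutes)))  # union-find forest over usages

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(substitutes)), 2):
        if jaccard(substitutes[i], substitutes[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    # Relabel component roots as consecutive sense ids 0, 1, ...
    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(substitutes))]


# Usage: four usages of "bank" with hand-written substitutes.
subs = [
    {"shore", "riverside", "edge"},
    {"riverside", "shore", "slope"},
    {"lender", "institution", "firm"},
    {"institution", "lender", "branch"},
]
print(induce_senses(subs))  # [0, 0, 1, 1]
```

Real systems typically replace the fixed threshold with a tuned clustering algorithm, because a single shared substitute can otherwise chain unrelated usages together.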
Tell Me Why: Language Models Help Explain the Rationale Behind Internet Protocol Design
Requests for Comments (RFCs) serve as guidebooks for the implementation of Internet protocols or network mechanisms. They reveal how these protocols and mechanisms work, but the underlying reasons for their operation are not always availabl…
The LSCD Benchmark: a Testbed for Diachronic Word Meaning Tasks
Lexical Semantic Change Detection (LSCD) is a complex, lemma-level task, which is usually operationalized based on two subsequently applied usage-level tasks: First, Word-in-Context (WiC) labels are derived for pairs of usages. Then, these…
Enriching Word Usage Graphs with Cluster Definitions
We present a dataset of word usage graphs (WUGs), where the existing WUGs for multiple languages are enriched with cluster labels functioning as sense definitions. They are generated from scratch by fine-tuned encoder-decoder language mode…
A New Massive Multilingual Dataset for High-Performance Language Technologies
We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Inter…
Text augmentation for semantic frame induction and parsing
Kashtanka.pet labeled data
Data partition from kashtanka.pet
GlossReader at LSCDiscovery: Train to Select a Proper Gloss in English -- Discover Lexical Semantic Change in Spanish
Precomputed vectors for the GlossReader system. LSCDiscovery Competition: https://codalab.lisn.upsaclay.fr/competitions/2243.
GlossReader at LSCDiscovery: Train to Select a Proper Gloss in English – Discover Lexical Semantic Change in Spanish
The contextualized embeddings obtained from neural networks pre-trained as Language Models (LM) or Masked Language Models (MLM) are not well suited for solving the Lexical Semantic Change Detection (LSCD) task because they are more sensi…
DeepMistake at LSCDiscovery: Can a Multilingual Word-in-Context Model Replace Human Annotators?
In this paper we describe our solution to the LSCDiscovery shared task on Lexical Semantic Change Discovery (LSCD) in Spanish. Our solution employs a Word-in-Context (WiC) model, which is trained to determine if a particular word has the s…
The Document Vectors Using Cosine Similarity Revisited
The current state-of-the-art test accuracy (97.42%) on the IMDB movie reviews dataset was reported by Thongtan and Phienthrakul (2019) and achieved by the logistic regression classifier trained on the Document Vectors usi…
BOS at LSCDiscovery: Lexical Substitution for Interpretable Lexical Semantic Change Detection
We propose a solution for the LSCDiscovery shared task on Lexical Semantic Change Detection in Spanish. Our approach is based on generating lexical substitutes that describe old and new senses of a given word. This approach achieves the…
An Interpretable Approach to Lexical Semantic Change Detection with Lexical Substitution
In this paper we propose a new Word Sense Induction (WSI) method and apply it to construct a solution for the RuShiftEval shared task on Lexical Semantic Change Detection (LSCD) for the Russian language. Our WSI algorithm based on lexical …
Zero-shot Cross-lingual Transfer of a Gloss Language Model for Semantic Change Detection
Consulting word definitions from a dictionary is a familiar way for a human to find out which senses a particular word has. We hypothesize that a system that can select a proper definition for a particular word occurrence can also naturally…
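The abstract hypothesizes a system that selects the proper dictionary definition for a given word occurrence. The following is a minimal sketch of that selection step only. It assumes that an embedding of the occurrence and one embedding per gloss have already been computed by some encoder, which is not shown. It also assumes that glosses are scored by dot product and normalized with a softmax. The vector values, the dot-product scoring and the helper names `gloss_probabilities` and `select_gloss` are all hypothetical illustration choices, not the paper's model.

```python
import math

Vector = list[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def gloss_probabilities(context_vec: Vector, gloss_vecs: dict[str, Vector]) -> dict[str, float]:
    """Softmax over dot-product scores between the occurrence embedding
    and each (assumed precomputed) gloss embedding."""
    scores = {g: dot(context_vec, v) for g, v in gloss_vecs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(exps.values())
    return {g: e / total for g, e in exps.items()}


def select_gloss(context_vec: Vector, gloss_vecs: dict[str, Vector]) -> str:
    """Pick the most probable definition for this occurrence."""
    probs = gloss_probabilities(context_vec, gloss_vecs)
    return max(probs, key=probs.get)


# Usage with made-up 2-d vectors for two senses of "bank".
glosses = {
    "sloping land beside a river": [0.9, 0.1],
    "a financial institution": [0.1, 0.9],
}
print(select_gloss([0.2, 0.8], glosses))  # a financial institution
```

The softmax is not required for picking the argmax. It is included because a probability distribution over glosses per occurrence is a convenient intermediate if occurrences are later compared across time periods.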
DeepMistake: Which Senses are Hard to Distinguish for a Word-in-Context Model
In this paper, we describe our solution to the Lexical Semantic Change Detection (LSCD) problem. It is based on a Word-in-Context (WiC) model detecting whether two occurrences of a particular word carry the same meaning. We propose and compare…
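The abstract is cut off before it explains how usage-level WiC predictions become a lemma-level answer. One common way to do this, shown here as an assumption rather than as the paper's method, is to pair every older usage with every newer usage. The score is then the average predicted probability that the two usages have *different* meanings. The WiC model below is a hypothetical callable returning P(same meaning). The toy lambda in the usage example only stands in for a trained model.

```python
from itertools import product
from typing import Callable, Sequence

# Hypothetical WiC model: returns P(same meaning) for two usages of the target word.
WiCModel = Callable[[str, str], float]


def change_score(old_usages: Sequence[str], new_usages: Sequence[str], wic: WiCModel) -> float:
    """Mean predicted 'different meaning' probability over all (old, new) usage pairs.
    Higher values suggest stronger semantic change between the two periods."""
    pairs = list(product(old_usages, new_usages))
    if not pairs:
        raise ValueError("need at least one usage from each period")
    return sum(1.0 - wic(a, b) for a, b in pairs) / len(pairs)


# Toy stand-in for a trained WiC model: "same meaning" iff both usages mention a river.
toy_wic: WiCModel = lambda a, b: 1.0 if ("river" in a) == ("river" in b) else 0.0

old = ["the bank of the river", "a grassy river bank"]
new = ["the bank raised rates", "a grassy river bank"]
print(change_score(old, new, toy_wic))  # 0.5
```

Averaging over all cross-period pairs keeps the score graded, so a word that keeps its old sense while also gaining a new one receives an intermediate value rather than a binary verdict.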
LIORI at SemEval-2021 Task 2: Span Prediction and Binary Classification approaches to Word-in-Context Disambiguation
This paper presents our approaches to SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation task. The first approach attempted to reformulate the task as a question answering problem, while the second one frame…
NB-MLM: Efficient Domain Adaptation of Masked Language Models for Sentiment Analysis
While Masked Language Models (MLM) are pre-trained on massive datasets, the additional training with the MLM objective on domain or task-specific data before fine-tuning for the final task is known to improve the final performance. This is…
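The abstract refers to continued training with the MLM objective on domain or task-specific data. As a hedged illustration of what that objective consumes, the following is a minimal sketch of the generic BERT-style masking recipe. Each token is selected with probability 0.15. A selected token is replaced by [MASK] 80% of the time, by a random token 10% of the time, and kept unchanged 10% of the time. This is the standard recipe, not the NB-MLM masking strategy from this paper, whose details are cut off above. The `mask_tokens` helper and the use of `None` as the "ignored by the loss" label are hypothetical choices.

```python
import random
from typing import Optional

MASK = "[MASK]"


def mask_tokens(
    tokens: list[str],
    vocab: list[str],
    rate: float = 0.15,
    rng: Optional[random.Random] = None,
) -> tuple[list[str], list[Optional[str]]]:
    """Generic BERT-style masking (hypothetical helper): returns (inputs, labels).

    Selected positions get the original token as label; unselected positions
    get None, meaning they are ignored by the MLM loss.
    """
    if rng is None:
        rng = random.Random()
    inputs: list[str] = []
    labels: list[Optional[str]] = []
    for tok in tokens:
        if rng.random() < rate:
            labels.append(tok)  # the model must recover the original token here
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)  # 80%: hide the token
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random replacement
            else:
                inputs.append(tok)  # 10%: keep the token unchanged
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels


tokens = "the movie was great".split()
print(mask_tokens(tokens, vocab=["good", "bad"], rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the masking reproducible. Setting `rate=0.0` returns the input unchanged with every label set to `None`.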
SkoltechNLP at SemEval-2021 Task 2: Generating Cross-Lingual Training Data for the Word-in-Context Task
In this paper, we present a system for solving the cross-lingual and multilingual word-in-context disambiguation task. Task organizers provided monolingual data in several languages, but no cross-lingual training data were availabl…
GlossReader at SemEval-2021 Task 2: Reading Definitions Improves Contextualized Word Embeddings
Consulting a dictionary or a glossary is a familiar way for many humans to figure out what a word in a particular context means. We hypothesize that a system that can select a proper definition for a particular word occurrence can also…
Conventional Monetary Policy Re-Estimated
LIORI at SemEval-2021 Task 8: Ask Transformer for measurements
This work describes our approach to the subtasks of SemEval-2021 Task 8: MeasEval: Counts and Measurements, which took the official first place in the competition. To solve all subtasks we use multi-task learning in a question-answering-like m…
A Comparative Study of Lexical Substitution Approaches based on Neural Language Models
Lexical substitution in context is an extremely powerful technology that can be used as a backbone of various NLP applications, such as word sense induction, lexical relation extraction, data augmentation, etc. In this paper, we present a …