Fast Extraction of Word Embedding from Q-contexts
2021 · Open Access
DOI: https://doi.org/10.1145/3459637.3482343
The notion of word embedding plays a fundamental role in natural language processing (NLP). However, pre-training word embeddings for a very large-scale vocabulary is computationally challenging for most existing methods. In this work, we show that with merely a small fraction of contexts (Q-contexts) that are typical in the whole corpus (and their mutual information with words), one can construct high-quality word embeddings with negligible error. Mutual information between contexts and words can be encoded canonically as a sampling state, so Q-contexts can be constructed quickly. Furthermore, we present an efficient and effective method, WEQ, which extracts word embeddings directly from these typical contexts. In practical scenarios, our algorithm runs 11–13 times faster than well-established methods. By comparing with well-known methods such as matrix factorization, word2vec, GloVe, and fastText, we demonstrate that our method achieves comparable performance on a variety of downstream NLP tasks while maintaining run-time and resource advantages over all these baselines.
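The abstract describes the idea only at a high level. As a rough illustration (not the authors' WEQ implementation), the sketch below shows the general recipe of deriving word embeddings by factorizing a word-context mutual-information matrix restricted to a small subset of context columns. The uniform column sampling, the positive-PMI weighting, and all function names here are assumptions for illustration; the paper instead selects typical Q-contexts via a sampling state that encodes the mutual information between contexts and words.

```python
import numpy as np

def ppmi_matrix(cooc, eps=1e-12):
    """Positive PMI from a word-by-context co-occurrence count matrix."""
    total = cooc.sum()
    p_wc = cooc / total                      # joint probabilities
    p_w = p_wc.sum(axis=1, keepdims=True)    # word marginals
    p_c = p_wc.sum(axis=0, keepdims=True)    # context marginals
    pmi = np.log((p_wc + eps) / (p_w * p_c + eps))
    return np.maximum(pmi, 0.0)              # clip to positive PMI

def embed_from_sampled_contexts(cooc, n_contexts, dim, seed=0):
    """Embed words using only a subset of context columns.
    Uniform sampling is a simplifying assumption; the paper's WEQ
    selects typical Q-contexts via a sampling-state construction."""
    rng = np.random.default_rng(seed)
    cols = rng.choice(cooc.shape[1], size=n_contexts, replace=False)
    sub = ppmi_matrix(cooc[:, cols])
    # Truncated SVD of the reduced word-context matrix yields the embeddings.
    U, S, _ = np.linalg.svd(sub, full_matrices=False)
    return U[:, :dim] * np.sqrt(S[:dim])

# Toy usage: 1000 words, 5000 contexts, keep 200 contexts, 50-dim embeddings.
cooc = np.random.default_rng(1).poisson(0.05, size=(1000, 5000)).astype(float)
vectors = embed_from_sampled_contexts(cooc, n_contexts=200, dim=50)
print(vectors.shape)  # (1000, 50)
```

The point of the sketch is that the factorization touches only the sampled columns, which is where the run-time advantage over factorizing the full word-context matrix comes from.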
- Type: preprint
- Language: en
- Landing Page: https://doi.org/10.1145/3459637.3482343
- OA Status: green
- Cited By: 2
- References: 47
- Related Works: 10
- OpenAlex ID: https://openalex.org/W3199747066