kōan: A Corrected CBOW Implementation
It is a common belief in the NLP community that continuous bag-of-words (CBOW) word embeddings tend to underperform skip-gram (SG) embeddings. We find that this belief is founded less on theoretical differences in their training objectives than on faulty CBOW implementations in standard software libraries such as the official word2vec.c and Gensim. We show that our correct implementation of CBOW yields word embeddings that are fully competitive with SG on various intrinsic and extrinsic tasks, while being more than three times as fast to train. We release our implementation, kōan, at this https URL.
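A minimal sketch of the point at issue, under the assumption that the implementation flaw concerns how the gradient is propagated back to the averaged context vectors: CBOW forms its hidden representation as the *mean* of the context word vectors, so consistency requires dividing the context-side gradient by the context size. The function names and update loop below are illustrative, not taken from kōan itself.

```python
import numpy as np


def cbow_ns_step(W_in, W_out, context_ids, target_id, neg_ids, lr=0.025):
    """One CBOW negative-sampling update with a context-averaged gradient.

    W_in, W_out : (vocab, dim) input/output embedding matrices.
    The forward pass averages the context vectors, so the backward pass
    divides the context gradient by len(context_ids) to stay consistent.
    (Illustrative sketch; not the kōan source.)
    """
    C = len(context_ids)
    h = W_in[context_ids].mean(axis=0)  # hidden layer: mean of context vectors
    grad_h = np.zeros_like(h)
    # one positive (label 1) plus sampled negatives (label 0)
    for wid, label in [(target_id, 1.0)] + [(n, 0.0) for n in neg_ids]:
        score = 1.0 / (1.0 + np.exp(-h @ W_out[wid]))  # sigmoid of dot product
        g = lr * (score - label)
        grad_h += g * W_out[wid]
        W_out[wid] -= g * h
    # divide by C, matching the mean taken in the forward pass
    W_in[context_ids] -= grad_h / C
    return W_in, W_out
```

Dropping the `/ C` division (or applying it on only one side) makes the effective context-side learning rate depend on window size, which is the kind of forward/backward inconsistency the abstract attributes to common CBOW implementations.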