This is a repo for training static word embeddings by combining skip-gram with BERT.
See our journal paper for details: Improving Skip-Gram Embeddings Using BERT (TASLP 2021)
1. Prepare your corpus (e.g. Wikipedia) as 'data.txt_plain', one sentence per line, for example:
anarchism anarchism is a political philosophy that advocates self-governed societies based on voluntary institutions .
these are often described as stateless societies , although several authors have defined them more specifically as institutions based on non-hierarchical or free associations .
...
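The sample lines above are lowercased, with punctuation split off as separate tokens. If your raw text is not in that shape yet, a minimal sketch like the following may help. It is not part of this repo: the function names, the naive sentence splitter, and the tokenization regex are all assumptions made for illustration, and a proper sentence tokenizer would likely do better on real Wikipedia text.

```python
import re

def to_plain_lines(raw_text):
    """Split raw text into lowercased, space-tokenized sentences (hypothetical helper)."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", raw_text.strip())
    lines = []
    for sentence in sentences:
        # Keep hyphenated words whole ("self-governed") and emit every other
        # punctuation mark as its own token ("societies ," rather than "societies,").
        tokens = re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence.lower())
        if tokens:
            lines.append(" ".join(tokens))
    return lines

def write_corpus(raw_text, path="data.txt_plain"):
    """Write one sentence per line to the file name used in step 1."""
    with open(path, "w", encoding="utf-8") as f:
        for line in to_plain_lines(raw_text):
            f.write(line + "\n")
```

Calling write_corpus(raw_text) on your raw text produces 'data.txt_plain' in the current directory.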
2. Train your word embeddings:
python word2vec.py
You can download the final 300-dimensional word embeddings from the links below (Baidu Pan or Google Drive):
https://pan.baidu.com/s/11hV_SFO36XabzFf2SE4GLw (code: vham)
https://drive.google.com/file/d/1WIfJ7XgbPoRHBDfYdx-BhzhCPaxmNRxJ/view?usp=sharing
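Once downloaded, the vectors can be loaded and compared with a few lines of Python. The sketch below assumes the file is in the plain-text word2vec layout (one word per line followed by its space-separated vector components, optionally preceded by a "<vocab_size> <dim>" header line). That layout is an assumption, not something this README confirms, so check the actual file first. The function names are hypothetical.

```python
import math

def load_embeddings(path):
    """Read 'word v1 v2 ...' lines into a dict mapping word -> list of floats."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) <= 2:
                # Skip blank lines and an optional "<vocab_size> <dim>" header.
                continue
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For example, after vectors = load_embeddings(path_to_downloaded_file), cosine(vectors["king"], vectors["queen"]) gives the similarity of the two words, provided both are in the vocabulary.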