This repository contains the datasets we used in the following paper:
@inproceedings{yaghoobzadeh2019probing,
title = {Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings},
author = {Yadollah Yaghoobzadeh and Katharina Kann and Eneko Agirre and Timothy J. Hazen and Hinrich Sch{\"{u}}tze},
booktitle = {Proceedings of ACL},
url = {https://www.aclweb.org/anthology/P19-1574}
}
(If you only need the (word, semantic-class) datasets, see the "dataset" directory.)
All WIKI-PSE files, including the corpus and trained embeddings, can be downloaded from this link:
After unzipping data.zip, "corpus.txt" in the root directory is the corpus we used to train word embeddings (e.g., "@apple@").
The "word_sclass" subdirectory contains another "corpus.txt", which is used to train (word, semantic-class) embeddings (e.g., "@apple@-organization").
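
As a rough illustration of how the annotated tokens in the two corpus files might be read, here is a minimal Python sketch. It assumes, based only on the two examples above, that an annotated word is wrapped in "@" signs and that the semantic class, when present, follows after a hyphen. The function names (parse_token, count_annotated) and the sample sentences are hypothetical and are not part of the released data or code.

```python
import re
from collections import Counter

# Assumed token shape, inferred only from the two examples in this README:
#   "@apple@"               -> annotated word without a class
#   "@apple@-organization"  -> annotated word plus its semantic class
TOKEN_RE = re.compile(r"^@(?P<word>[^@\s]+)@(?:-(?P<sclass>\S+))?$")


def parse_token(token):
    """Return (word, semantic_class or None), or None for a plain token."""
    match = TOKEN_RE.match(token)
    if match is None:
        return None
    return match.group("word"), match.group("sclass")


def count_annotated(lines):
    """Count (word, semantic_class) pairs over whitespace-tokenized lines."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            parsed = parse_token(token)
            if parsed is not None:
                counts[parsed] += 1
    return counts


# Made-up sentences for illustration only; they are not taken from the corpus.
sample_lines = [
    "she works at @apple@-organization in california",
    "he ate an @apple@-food after lunch",
    "the @apple@ fell from the tree",
]
sample_counts = count_annotated(sample_lines)
```

To apply this to the real data, pass an open file handle instead of the sample list, for example `count_annotated(open("word_sclass/corpus.txt", encoding="utf-8"))`. The path is relative to the unzipped root, and the UTF-8 encoding is an assumption.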