WIKI-PSE

A Wikipedia-based resource for Probing Semantics in word Embeddings

This repository contains the datasets we used in the following paper:

@inproceedings{yaghoobzadeh2019probing,
  title     = {Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings},
  author    = {Yadollah Yaghoobzadeh and Katharina Kann and Eneko Agirre and Timothy J. Hazen and Hinrich Sch{\"{u}}tze},
  booktitle = {Proceedings of ACL},
  url       = {https://www.aclweb.org/anthology/P19-1574}
}

Data

(If you need only the (word, semantic-classes) datasets, see the dataset directory.)

All WIKI-PSE files, including the corpus and trained embeddings, can be downloaded from this link:

data.zip

After unzipping data.zip, "corpus.txt" in the root directory is the corpus we used to train word embeddings (e.g., "@apple@").

The "word_sclass" subdirectory contains another "corpus.txt". This one is used to train (word, semantic-class) embeddings (e.g., "@apple@-organization").
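
The following is a minimal sketch of how the annotated tokens in the two corpora might be parsed. It assumes the token shape is exactly what the two examples above suggest ("@word@", optionally followed by "-" and a semantic class) and that tokens are whitespace-separated; neither is confirmed beyond those examples. The names `parse_token` and `count_annotations` are hypothetical helpers, not part of this repository.

```python
import re
from collections import Counter

# Assumed token shape, inferred from "@apple@" and "@apple@-organization":
# "@word@" optionally followed by "-semantic_class".
ANNOTATED = re.compile(r"^@(?P<word>[^@]+)@(?:-(?P<sclass>.+))?$")


def parse_token(token):
    """Return (word, semantic_class or None) for an annotated token, else None."""
    match = ANNOTATED.match(token)
    if match is None:
        return None
    return match.group("word"), match.group("sclass")


def count_annotations(lines):
    """Count (word, semantic_class) pairs over an iterable of corpus lines."""
    counts = Counter()
    for line in lines:
        # Assumption: tokens are separated by whitespace.
        for token in line.split():
            parsed = parse_token(token)
            if parsed is not None:
                counts[parsed] += 1
    return counts
```

Because `count_annotations` accepts any iterable of lines, an open file handle for either "corpus.txt" can be passed directly, so the corpus is streamed rather than loaded into memory at once.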
