The metadata for a dataset includes:
- language (e.g. en, ja)
- task (e.g. analogy, similarity)
- description (e.g. Bigger Analogy Test Set)
- version (e.g. 3.0)
- cite (BibTeX entry for the paper to cite)
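As a rough illustration, the fields listed above could be held in a small record and checked on load. This is only a sketch: the `DatasetMetadata` class, the `parse_metadata` helper, and the choice of JSON as the storage format are assumptions made for the example, not the actual format used, and the `cite` value is a placeholder rather than a real reference.

```python
import json
from dataclasses import dataclass

# Hypothetical container: field names mirror the metadata list above,
# but the real on-disk layout is an assumption.
@dataclass
class DatasetMetadata:
    language: str     # e.g. "en", "ja"
    task: str         # e.g. "analogy", "similarity"
    description: str  # e.g. "Bigger Analogy Test Set"
    version: str      # e.g. "3.0"
    cite: str         # BibTeX entry for the paper to cite

REQUIRED_FIELDS = ("language", "task", "description", "version", "cite")


def parse_metadata(raw: str) -> DatasetMetadata:
    """Parse a JSON string and verify that every listed field is present."""
    data = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    # Values are coerced to strings so that a numeric version such as 3.0
    # is stored as "3.0".
    return DatasetMetadata(**{name: str(data[name]) for name in REQUIRED_FIELDS})


example = json.dumps({
    "language": "en",
    "task": "analogy",
    "description": "Bigger Analogy Test Set",
    "version": "3.0",
    "cite": "@misc{placeholder_key}",  # placeholder, not a real citation
})

meta = parse_metadata(example)
print(meta.task, meta.version)
```

Keeping the required field names in one tuple means a dataset with incomplete metadata fails early with a message naming the missing fields.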
Word similarity:
- WordSim 353
- MEN
- SimLex
- Rare Words
- MTurk
Word analogy:
- BATS
Text classification:
- IMDb movie reviews sample
Japanese word similarity:
- Japanese word similarity (https://github.com/tmu-nlp/JapaneseWordSimilarityDataset)
Japanese word analogy:
- JBATS