vecto-ai/benchmarks

Benchmarks for word embeddings evaluation

The metadata for a dataset includes:

  • language (en, ja, etc.)
  • task (analogy, similarity, etc.)
  • description (e.g. Bigger Analogy Test Set)
  • version (e.g. 3.0)
  • cite (bibtex for the paper to cite)
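
As an illustration of the fields listed above, here is a minimal Python sketch of what one dataset's metadata record might look like, with a small completeness check. The dictionary layout, the helper name missing_fields, and the placeholder BibTeX string are assumptions made for this example; the repository's actual metadata file name and format are not stated here.

```python
# Field names come from the list above; everything else is illustrative.
REQUIRED_FIELDS = ("language", "task", "description", "version", "cite")


def missing_fields(metadata):
    """Return the required fields that are absent or empty in `metadata`."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Hypothetical record for an English analogy dataset.
example_metadata = {
    "language": "en",
    "task": "analogy",
    "description": "Bigger Analogy Test Set",
    "version": "3.0",
    # Placeholder only: substitute the real BibTeX entry for the dataset.
    "cite": "@inproceedings{placeholder_key, title={placeholder}}",
}

if __name__ == "__main__":
    print(missing_fields(example_metadata))
    print(missing_fields({"language": "ja"}))
```

A check of this kind could be run over every dataset folder before publishing, so that no dataset ships without a language tag or citation.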

Available datasets

English

Word similarity:

  1. WordSim 353
  2. MEN
  3. SimLex
  4. Rare Words
  5. MTurk

Word analogy:

  1. BATS

Text classification:

  1. IMDb movie reviews sample

Japanese

Word similarity:

  1. Japanese word similarity (https://github.com/tmu-nlp/JapaneseWordSimilarityDataset)

Word analogy:

  1. JBATS
