Code and datasets for MMTEB (Multilingual MTEB), used in our Multilingual Representations in Embeddings Models blog post for MIT 6.S898. The repo includes code to run the tests on a Transformers model.
To run MMTEB, use `mmteb.py`. Tasks are defined in `tasks.py`. All datasets are hosted on the Hugging Face Hub under the names linked below, except for SciFact, which is stored locally in the repo.
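For orientation, here is a minimal sketch of how an MTEB-style retrieval task such as SciFact can score a model's embeddings: rank documents by cosine similarity to the query, then check how many relevant documents land in the top k. This is not the code in `mmteb.py`. The function names (`cosine`, `rank_documents`, `recall_at_k`) and the toy vectors are hypothetical, and recall@k is used here only because it is easy to read. The metric the repo actually reports may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted by descending cosine similarity to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k ranked ids."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


# Toy embeddings standing in for model output (hypothetical values).
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
query = [0.9, 0.1]

ranking = rank_documents(query, docs)
print(ranking)
print(recall_at_k(ranking, {"d1"}, k=1))
```

In a real run the vectors would come from the Transformers model under test rather than being written by hand.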
Huge thanks to the MTEB authors, Niklas Muennighoff, Nouamane Tazi, Loïc Magne and Nils Reimers.
Further thanks to Conviction for supplying API credits.
Dataset | Link |
---|---|
Content Cell | Content Cell |
Content Cell | Content Cell |