We're a two-person open-source lab (@pringled and @stephantul) focused on Natural Language Processing.
We believe that if you make models fast enough, you unlock new possibilities.
Using our software, you can:
- Embed the entire English Wikipedia in 5 minutes
- Classify tens of thousands of documents per second on a CPU
- Approximately deduplicate extremely large datasets in minutes
- Build the fastest RAG application in the world
- Easily evaluate which ANN algorithm works best for your data
Our projects:
- model2vec: tiny static embedding models with state-of-the-art performance.
- potion: the best small models in the world. 100-500x faster than a sentence-transformer, and almost as good.
- vicinity: consistent interfaces to many approximate nearest neighbor algorithms.
- semhash: lightning-fast, highly accurate semantic deduplication and filtering for your text datasets.
- model2vec-rs: a Rust port of model2vec.
You can also find us on:
- 🤗 Hugging Face
- 💬 Discord