Minish

Solving big problems with small models

Hello, we're Minish!

We're a two-person (@pringled and @stephantul) open-source lab, with a focus on Natural Language Processing.

We believe that if you make models fast enough, you unlock new possibilities.

Using our software, you can:

  • Embed the entire English Wikipedia in 5 minutes
  • Classify tens of thousands of documents per second on a CPU
  • Approximately deduplicate extremely large datasets in minutes
  • Build the fastest RAG application in the world
  • Easily evaluate which ANN algorithm works best for your data

Our projects:

  • model2vec: tiny static embedding models with state-of-the-art performance.
  • potion: the best small models in the world. 100-500x faster than a sentence-transformer, and almost as good.
  • vicinity: consistent interfaces to many approximate nearest neighbor algorithms.
  • semhash: lightning-fast, super accurate semantic deduplication and filtering for your text datasets.
  • model2vec-rs: a Rust port of model2vec.

Pinned repositories:

  1. model2vec

    Fast State-of-the-Art Static Embeddings

    Python · 1.7k stars · 92 forks

  2. semhash

    Fast Semantic Text Deduplication & Filtering

    Python · 737 stars · 42 forks

  3. vicinity

    Lightweight Nearest Neighbors with Flexible Backends

    Python · 290 stars · 8 forks

  4. tokenlearn

    Pre-train Static Word Embeddings

    Python · 79 stars · 8 forks

  5. model2vec-rs

    Official Rust Implementation of Model2Vec

    Rust · 118 stars · 5 forks
