
The Minish Lab

Solving big problems with small models

Hello, we're minish!

We're a two-person (@pringled and @stephantul) open-source company, with a focus on Natural Language Processing.

We believe that if you make models fast enough, you unlock new possibilities.

Using our software, you can:

  • Embed the entire English Wikipedia in 5 minutes
  • Classify tens of thousands of documents per second on CPU
  • Approximately deduplicate extremely large datasets in minutes
  • Build the fastest RAG application in the world
  • Easily evaluate which ANN algorithm works best for your data

Our projects:

  • model2vec: make tiny models that are still really, really good.
  • potion: the best small model in the world. 100-500x faster than a sentence-transformer, and almost as good.
  • vicinity: consistent interfaces to many approximate nearest neighbor algorithms.
  • semhash: lightning-fast, super accurate, approximate deduplication for your text datasets.
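
To make the ideas behind these projects concrete, here is a toy sketch of static embeddings plus similarity-based deduplication. It is not the implementation or API of model2vec, semhash, or vicinity: the vectors in `TOKEN_VECTORS`, the helper names `embed`, `cosine`, and `deduplicate`, and the 0.9 threshold are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical hand-made token vectors. A real static-embedding model learns
# these; this table exists only to illustrate the lookup-and-average idea.
TOKEN_VECTORS = {
    "fast": [1.0, 0.0, 0.0],
    "quick": [0.9, 0.1, 0.0],
    "model": [0.0, 1.0, 0.0],
    "models": [0.0, 0.9, 0.1],
    "cat": [0.0, 0.0, 1.0],
}


def embed(text):
    """Embed a text by averaging the vectors of its known tokens."""
    vectors = [TOKEN_VECTORS[t] for t in text.lower().split() if t in TOKEN_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(texts, threshold=0.9):
    """Keep a text only if it is not too similar to any text already kept."""
    kept, kept_vectors = [], []
    for text in texts:
        vector = embed(text)
        # Exhaustive comparison against everything kept so far. An
        # approximate nearest neighbor index would replace this loop at scale.
        if all(cosine(vector, other) < threshold for other in kept_vectors):
            kept.append(text)
            kept_vectors.append(vector)
    return kept


docs = ["fast model", "quick models", "cat"]
unique_docs = deduplicate(docs)
print(unique_docs)
```

Because embedding is only a table lookup and an average, it needs no neural network forward pass, which is the general reason static embeddings can be fast on CPU. The near-duplicate "quick models" is dropped here because its averaged vector points in almost the same direction as that of "fast model".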

Pinned

  1. model2vec Public

    Fast State-of-the-Art Static Embeddings

    Python · 1.1k stars · 49 forks

  2. semhash Public

    Fast Semantic Text Deduplication

    Python · 567 stars · 24 forks

  3. vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python · 258 stars · 8 forks

  4. tokenlearn Public

    Pre-train Static Word Embeddings

    Python · 48 stars · 3 forks

Repositories

  • korok Public

    Lightweight Hybrid Search and Reranking

    Python · 9 stars · MIT · 1 fork · 0 open issues · 0 open pull requests · Updated Mar 10, 2025
  • tokenlearn Public

    Pre-train Static Word Embeddings

    Python · 48 stars · MIT · 3 forks · 2 open issues · 1 open pull request · Updated Mar 7, 2025
  • vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python · 258 stars · MIT · 8 forks · 1 open issue · 0 open pull requests · Updated Mar 2, 2025
  • model2vec Public

    Fast State-of-the-Art Static Embeddings

    Python · 1,092 stars · MIT · 49 forks · 4 open issues · 3 open pull requests · Updated Mar 2, 2025
  • semhash Public

    Fast Semantic Text Deduplication

    Python · 567 stars · MIT · 24 forks · 1 open issue · 2 open pull requests · Updated Feb 28, 2025
  • .github Public

    Readme

    0 0 0 0 Updated Feb 15, 2025
  • SCSS 0 MIT 0 0 0 Updated Feb 6, 2025
  • watertemplate Public template

    Template

    Makefile · 2 stars · MIT · 1 fork · 0 open issues · 0 open pull requests · Updated Dec 9, 2024
  • evaluation Public

    Code to evaluate performance for embeddings

    Python · 10 stars · MIT · 0 forks · 0 open issues · 0 open pull requests · Updated Sep 25, 2024
