themaximalist/embeddings.js

Embeddings.js

Embeddings.js — Simple Embeddings library for Node.js

Embeddings.js is a simple way to get text embeddings in Node.js. Embeddings are useful for text similarity search using a vector database.

await embeddings("Hello World!"); // embedding array

Install

npm install @themaximalist/embeddings.js

To use local embeddings, also install the Transformers.js package that runs the local model:

npm install @xenova/transformers

Configure

Embeddings.js works out of the box with local embeddings, but if you use the OpenAI or Mistral embeddings you'll need an API key in your environment.

export OPENAI_API_KEY=<your-openai-api-key>
export MISTRAL_API_KEY=<your-mistral-api-key>
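
A missing key is easier to diagnose before a request is made. The following is a minimal sketch of such a check, not part of Embeddings.js: the helper name `missingApiKey` and the `API_KEY_VARS` table are hypothetical, and the variable names are assumed to be `OPENAI_API_KEY` and `MISTRAL_API_KEY`.

```javascript
// Environment variable assumed for each hosted service.
// This table and the helper below are illustrative, not part of Embeddings.js.
const API_KEY_VARS = {
    openai: "OPENAI_API_KEY",
    mistral: "MISTRAL_API_KEY",
};

// Returns the name of the missing variable, or null if nothing is missing.
function missingApiKey(service, env = process.env) {
    if (!Object.hasOwn(API_KEY_VARS, service)) {
        return null; // local embeddings need no API key
    }
    const name = API_KEY_VARS[service];
    return env[name] ? null : name;
}

console.log(missingApiKey("transformers", {})); // null
console.log(missingApiKey("openai", {}));       // OPENAI_API_KEY
```

Passing the environment as a parameter keeps the check easy to test without touching the real `process.env`.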

Usage

Using Embeddings.js is as simple as calling a function with any string.

import embeddings from "@themaximalist/embeddings.js";

// defaults to local embeddings
const embedding = await embeddings("Hello World!");
// 384 dimension embedding array

Switching embedding models is easy:

// openai
const openaiEmbedding = await embeddings("Hello World!", {
    service: "openai"
});
// 1536 dimension embedding array

// mistral
const mistralEmbedding = await embeddings("Hello World!", {
    service: "mistral"
});
// 1024 dimension embedding array

Cache

Embeddings.js caches by default, but you can disable it by passing cache: false as an option.

// don't cache (on by default)
const embedding = await embeddings("Hello World!", {
    cache: false
});

The cache file is written to .embeddings.cache.json. Delete this file to reset the cache.

API

The Embeddings.js API is a simple function you call with your text, with an optional config object.

await embeddings(
    input, // Text input to compute embeddings
    {
        service: "openai", // Embedding service
        model: "text-embedding-ada-002", // Embedding model
        cache: true, // Cache embeddings
        cache_file: ".embeddings.cache.json", // Cache file
    }
);

Options

  • service <string>: Embedding service provider. Default is transformers, a local embedding provider.
  • model <string>: Embedding service model. Default is Xenova/all-MiniLM-L6-v2, a local embedding model. If no model is provided, it will use the default for the selected service.
  • cache <bool>: Cache embeddings. Default is true.
  • cache_file <string>: Cache file. Default is .embeddings.cache.json.
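
The defaults listed above can be pictured as a plain object merge. This is a hedged sketch of that idea rather than the library's actual implementation; `resolveOptions`, `DEFAULTS`, and `LOCAL_MODEL` are hypothetical names. Only the local default model is documented here, so the sketch leaves `model` unset for other services.

```javascript
// Defaults as documented in the Options list; names here are illustrative.
const DEFAULTS = {
    service: "transformers",
    cache: true,
    cache_file: ".embeddings.cache.json",
};
const LOCAL_MODEL = "Xenova/all-MiniLM-L6-v2";

function resolveOptions(options = {}) {
    // Caller-supplied values override the documented defaults.
    const resolved = { ...DEFAULTS, ...options };
    // Fill in the model only for the local service; for hosted services
    // the library is assumed to choose its own default model.
    if (resolved.model === undefined && resolved.service === "transformers") {
        resolved.model = LOCAL_MODEL;
    }
    return resolved;
}

console.log(resolveOptions().model);                      // Xenova/all-MiniLM-L6-v2
console.log(resolveOptions({ service: "openai" }).cache); // true
```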

Response

Embeddings.js returns a float[] — an array of floating-point numbers.

[ -0.011776604689657688,   0.024298833683133125,  0.0012317118234932423, ... ]

The length of the array is the dimensionality of the embedding. A vector database index is typically created for a fixed dimensionality, so you need to know this number before storing embeddings for text similarity search.

Embedding Dimensions

  • Local: 384
  • OpenAI: 1536
  • Mistral: 1024

The Embeddings.js API ensures you have a simple way to use embeddings from multiple providers.
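
Dimensions matter because two embeddings can only be compared when they have the same length, which in practice means they came from the same service and model. The sketch below shows cosine similarity, a common measure for text similarity, with an explicit length check. The function `cosineSimilarity` is not part of Embeddings.js, and the toy 3-dimensional vectors stand in for real embeddings.

```javascript
// Cosine similarity between two embedding arrays of equal length.
function cosineSimilarity(a, b) {
    if (a.length !== b.length) {
        throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA === 0 || normB === 0) {
        return 0; // convention chosen here for zero vectors
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0, 0], [1, 0, 0])); // 1
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0])); // 0
```

With real embeddings, the two arguments would be the arrays returned by two `await embeddings(...)` calls that use the same service.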

Debug

Embeddings.js uses the debug npm module with the embeddings.js namespace.

View debug logs by setting the DEBUG environment variable.

> DEBUG=embeddings.js* node src/get_embeddings.js
# debug logs

Vector Database

Embeddings can be stored and queried in any vector database, such as Pinecone, Chroma, or pgvector.
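
To illustrate what a vector database does with embeddings, here is a minimal in-memory sketch of nearest-neighbor search. It is a brute-force toy, not how any of the databases above are implemented; the `MemoryIndex` class and `normalize` helper are hypothetical, and the short vectors stand in for real embeddings.

```javascript
// Scale a vector to unit length so a dot product equals cosine similarity.
function normalize(v) {
    const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
    return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Brute-force index: stores normalized vectors and scans all of them per query.
class MemoryIndex {
    constructor() {
        this.entries = [];
    }

    add(text, vector) {
        this.entries.push({ text, vector: normalize(vector) });
    }

    // Return the k entries most similar to the query, best match first.
    search(queryVector, k = 3) {
        const q = normalize(queryVector);
        return this.entries
            .map(({ text, vector }) => ({
                text,
                score: vector.reduce((sum, x, i) => sum + x * q[i], 0),
            }))
            .sort((a, b) => b.score - a.score)
            .slice(0, k);
    }
}

const index = new MemoryIndex();
index.add("red", [1, 0, 0]);
index.add("orange", [0.9, 0.4, 0]);
index.add("blue", [0, 0, 1]);
console.log(index.search([1, 0.1, 0], 2).map((r) => r.text)); // [ 'red', 'orange' ]
```

In real use, the vectors passed to `add` and `search` would come from `await embeddings(text)`, all produced with the same service so the dimensions match.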

For a local vector database that runs in-memory and uses Embeddings.js internally, check out VectorDB.js.

Projects

Embeddings.js is currently used in the following projects:

License

MIT

Author

Created by The Maximalist. See our open-source projects.