Proposal name
Built-in Embedding API
Short description
We propose a new Web Platform API that lets developers generate high-dimensional vector representations (embeddings) of text directly on the user's device, using a browser-provided model. This enables low-latency, privacy-preserving semantic features (such as search, RAG, and moderation) without the cost of cloud APIs or the bandwidth overhead of downloading large models to run via WebAssembly or WebGPU.
Read the complete Explainer
Example use cases
- Semantic Search: Search notes or documents based on meaning rather than exact keyword matches, entirely on-device.
- Retrieval-Augmented Generation (RAG): Embed user queries to find relevant offline passages to feed into a local or cloud LLM.
- Real-time Content Intelligence: Proactively flag potentially toxic comments or provide moderation hints as a user types.
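As a rough illustration of the semantic search case, the sketch below ranks stored note vectors against a query vector by cosine similarity. The helper names (cosineSimilarity, rankBySimilarity) and the tiny hand-written vectors are invented for illustration; in practice the vectors would come from the proposed API.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new RangeError("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat a zero-length vector as having no similarity to anything.
  return denom === 0 ? 0 : dot / denom;
}

// Return the topK documents most similar to the query, best match first.
function rankBySimilarity(queryVector, docs, topK = 3) {
  return docs
    .map((doc) => ({
      id: doc.id,
      score: cosineSimilarity(queryVector, doc.vector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy 3-dimensional vectors standing in for real embeddings (assumption).
const notes = [
  { id: "groceries", vector: new Float32Array([0.9, 0.1, 0.0]) },
  { id: "tax-deadline", vector: new Float32Array([0.0, 0.2, 0.9]) },
  { id: "recipe", vector: new Float32Array([0.8, 0.3, 0.1]) },
];
const query = new Float32Array([1.0, 0.2, 0.0]);

const ranked = rankBySimilarity(query, notes, 2);
console.log(ranked.map((r) => r.id));
```

A linear scan like this is likely adequate for a few thousand locally stored vectors; larger collections would probably call for an approximate nearest-neighbor index, which is outside the scope of this sketch.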
A rough idea or two about implementation
The API follows the standard availability -> create -> execute pattern of other Built-In AI APIs. It supports batching and returns a structured object containing the raw Float32Array vectors and relevant metadata (like the embedding space identifier for compatibility tracking).
const semanticEmbedder = await SemanticEmbedder.create();
// Embed strings (supports batching)
const result = await semanticEmbedder.embed([
  "The quick brown fox jumps over the lazy dog.",
  "A fast, dark-colored fox leaps over a resting hound."
]);
// Extract the raw vectors for storage (e.g., IndexedDB) or Cosine Similarity
const vector1 = result.embeddings[0].values;
const vector2 = result.embeddings[1].values;
semanticEmbedder.destroy();
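The snippet above starts at the create step. A minimal sketch of the full availability -> create -> execute flow might look like the following. Several things here are assumptions rather than part of the proposal: the static availability() method and its "unavailable" return value are borrowed from other Built-In AI APIs, the metadata field name embeddingSpace is hypothetical (the proposal only says an embedding space identifier is returned), and StubEmbedder is a stand-in so the sketch can run outside a supporting browser.

```javascript
// Stand-in for the proposed SemanticEmbedder global (assumption, not the real API).
class StubEmbedder {
  static async availability() {
    return "available";
  }
  static async create() {
    return new StubEmbedder();
  }
  async embed(texts) {
    return {
      embeddingSpace: "stub-space-v1", // hypothetical metadata field name
      embeddings: texts.map((t) => ({
        values: new Float32Array([t.length, 1]),
      })),
    };
  }
  destroy() {}
}

// Check availability, create an embedder, embed a batch, and release the
// embedder. Returns records tagged with the embedding space identifier so
// stored vectors are only ever compared with vectors from the same space.
// Returns null when the API is missing or reports itself unavailable.
async function embedForStorage(EmbedderClass, texts) {
  if (typeof EmbedderClass?.availability !== "function") {
    return null; // API not exposed
  }
  const status = await EmbedderClass.availability();
  if (status === "unavailable") {
    return null;
  }
  const embedder = await EmbedderClass.create();
  try {
    const result = await embedder.embed(texts);
    return texts.map((text, i) => ({
      text,
      vector: result.embeddings[i].values,
      space: result.embeddingSpace,
    }));
  } finally {
    // Release the embedder even if embed() throws.
    embedder.destroy();
  }
}

embedForStorage(StubEmbedder, ["hello", "hello world"]).then((records) => {
  console.log(records.map((r) => `${r.space}: ${r.text}`));
});
```

In a supporting browser, the first argument would presumably be globalThis.SemanticEmbedder, which is undefined (and so yields null here) wherever the API is not implemented.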