Simple full-stack demo that shows a semantic cache layer using OpenAI embeddings.
- Embeddings: `text-embedding-3-small` (cheap and good for clustering/caching)
- LLM: `gpt-4o-mini` by default
- Cache: JSON file at `data/cache.json` (example entry below)
- Endpoints: `POST /api/query` first checks the cache, then calls the LLM on a miss
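For reference, the cache is just JSON on disk. A single entry might look roughly like the sketch below; this is a guess at the layout based on the stored `{ prompt, embedding, response }` shape described later, so the real file may differ.

```json
[
  {
    "prompt": "What is the weather like in New York today?",
    "embedding": [0.0123, -0.0456, 0.0789],   // truncated — text-embedding-3-small vectors have 1536 dimensions
    "response": "..."
  }
]
```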
1. Install: `npm install`
2. Configure env: `cp .env.example .env`, then edit `.env` and set `OPENAI_API_KEY` (see the `.env` sketch below)
3. Run: `npm start`
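A minimal `.env` needs only the API key; anything else in `.env.example` is project-specific. A sketch:

```bash
# .env — sketch only; start from .env.example and paste in your real key
OPENAI_API_KEY=sk-...
```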
- Cache lookup: the incoming prompt is embedded via `text-embedding-3-small`. We compute cosine similarity against the stored embeddings and pick the best of the top `topK` candidates. If the similarity is >= `threshold`, it's a cache hit (see the sketch after this list).
- Cache miss: we call the chat model, return the response, and store `{ prompt, embedding, response }` for future hits.
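To make the lookup step concrete, here is a minimal sketch of the similarity math in plain Node. The function and variable names (`cosineSimilarity`, `findCacheHit`) are illustrative, not the demo's actual source.

```js
// Sketch of the cache-lookup step (illustrative names, not the demo's actual code).

// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every cached entry, keep the top-K candidates by similarity,
// and treat the best one as a hit only if it clears the threshold.
function findCacheHit(queryEmbedding, cacheEntries, threshold = 0.9, topK = 3) {
  const candidates = cacheEntries
    .map(entry => ({ entry, similarity: cosineSimilarity(queryEmbedding, entry.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, topK);

  const best = candidates[0];
  return best && best.similarity >= threshold ? best : null; // null => cache miss, call the LLM
}
```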
- `POST /api/query` (client sketch after the endpoint list)

  Request body:

  ```json
  {
    "prompt": "How's the weather in NYC right now?",
    "threshold": 0.9,
    "topK": 3
  }
  ```

  Response:

  ```json
  {
    "source": "cache" | "llm",
    "cacheHit": true | false,
    "similarity": 0.9432, // present on hits
    "matchedPrompt": "What is the weather like in New York today?", // on hits
    "response": "...",
    "prompt": "..."
  }
  ```
- `GET /api/cache` returns the cache content.
- `DELETE /api/cache` clears the cache.
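For quick testing outside the UI, the endpoints can be exercised from any JavaScript environment with `fetch` (an ES module on Node 18+, or a browser console, since the snippet uses top-level `await`). The base URL is an assumption; use whatever host and port `npm start` reports.

```js
// Sketch of client calls against the demo's API.
// Assumption: the server listens on http://localhost:3000 — adjust to your actual port.
const BASE = 'http://localhost:3000';

// Query: checks the semantic cache first, falls back to the LLM on a miss.
const res = await fetch(`${BASE}/api/query`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: "How's the weather in NYC right now?",
    threshold: 0.9,
    topK: 3,
  }),
});
const data = await res.json();
console.log(data.source, data.cacheHit, data.response);

// Inspect the current cache contents.
console.log(await (await fetch(`${BASE}/api/cache`)).json());

// Clear the cache.
await fetch(`${BASE}/api/cache`, { method: 'DELETE' });
```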
1. Clear the cache with the UI button.
2. Send: "What is the weather like in New York today?" → the first time will be a miss (the LLM is called).
3. Send: "How's the weather in NYC right now?" → should be a cache hit (similar meaning).
4. Try unrelated prompts → they miss and then build up the cache.
- You can tweak `threshold` and `topK` in the UI. Start with a threshold of 0.90: raising it reduces false hits; lowering it increases reuse.
- This demo stores responses verbatim in a local JSON file. For production, use a vector DB and handle privacy/security appropriately.