
Semantic Caching Demo

A simple full-stack demo that shows a semantic cache layer built on OpenAI embeddings.

  • Embeddings: text-embedding-3-small (cheap and good for clustering/caching)
  • LLM: gpt-4o-mini by default
  • Cache: JSON file at data/cache.json
  • Endpoints: POST /api/query first checks the cache, then calls the LLM on a miss
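
As a rough sketch of the embedding step (assuming the official openai Node SDK; the embed function name is illustrative, not the repo's actual code):

```js
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Turn a prompt into a vector with text-embedding-3-small, the model named above.
async function embed(text) {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding; // array of floats (1536 dimensions)
}
```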

Setup

1. Install

   npm install

2. Configure env

   cp .env.example .env
   # edit .env and set OPENAI_API_KEY

3. Run

   npm start

Open http://localhost:3000

How it works

  • Cache lookup: The incoming prompt is embedded with text-embedding-3-small. We compute cosine similarity against the stored embeddings and pick the best of the topK candidates. If that similarity is >= threshold, it's a cache hit.
  • Cache miss: We call the chat model, return the response, and store { prompt, embedding, response } for future hits.
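
A minimal sketch of that lookup, assuming each cache entry stores the { prompt, embedding, response } fields described above (function and variable names here are illustrative, not the repo's actual identifiers):

```js
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every cached entry, keep the topK best, and report a hit only if the
// best score clears the threshold; otherwise the caller falls through to the LLM.
function findBestMatch(queryEmbedding, entries, topK, threshold) {
  const best = entries
    .map((entry) => ({
      entry,
      similarity: cosineSimilarity(queryEmbedding, entry.embedding),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, topK)[0];
  return best && best.similarity >= threshold ? best : null; // null => cache miss
}
```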

API

  • POST /api/query
{
  "prompt": "How's the weather in NYC right now?",
  "threshold": 0.9,
  "topK": 3
}

Response:

{
  "source": "cache" | "llm",
  "cacheHit": true | false,
  "similarity": 0.9432, // present on hits
  "matchedPrompt": "What is the weather like in New York today?", // on hits
  "response": "...",
  "prompt": "..."
}
  • GET /api/cache returns the cache content.
  • DELETE /api/cache clears the cache.
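
For example, a client-side call to the query endpoint from Node (assuming the server started by npm start is listening on port 3000, and Node 18+ where fetch is built in):

```js
// Run inside an ES module or an async function (top-level await).
const res = await fetch("http://localhost:3000/api/query", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt: "How's the weather in NYC right now?",
    threshold: 0.9,
    topK: 3,
  }),
});
const data = await res.json();
console.log(data.source, data.cacheHit, data.response);
```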

Demo steps

1. Clear the cache with the UI button.
2. Send: What is the weather like in New York today? → the first time will be a miss (the LLM is called).
3. Send: How's the weather in NYC right now? → this should be a cache hit (similar meaning).
4. Try unrelated prompts → they miss and gradually build up the cache.

Notes

  • You can tweak threshold and topK in the UI. Start with a threshold of 0.90. Raising it reduces false hits; lowering it increases reuse.
  • This demo stores responses verbatim in a local JSON file. For production, use a vector DB and handle privacy/security appropriately.
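
For reference, a stored entry might look roughly like this (a hypothetical shape based on the { prompt, embedding, response } fields described in "How it works"; the actual layout of data/cache.json may differ):

```js
// Illustrative cache entry; the real file layout may differ.
const entry = {
  prompt: "What is the weather like in New York today?",
  embedding: [0.0123, -0.0456 /* ... 1536 floats from text-embedding-3-small */],
  response: "...", // the LLM response, stored verbatim
};
```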
