victorgearhead/Conversational-Memory-Stack

🧠 Conversational Memory Stack Architecture for AI Systems

⚡ Why This Exists

Every common memory approach solves only part of the problem.
Buffer, summary, vector, graph: each is useful, but incomplete on its own.

Real memory is not a tool. It’s a stack.


🧩 The Big Idea: Layered Memory

| Layer | Role | Speed | Purpose |
| --- | --- | --- | --- |
| ⚡ Cache | Reflex | ~16ms | Instant responses |
| 🧠 Vector DB | Episodes | Medium | Semantic recall |
| 🕸️ Graph DB | Reasoning | Slower | Multi-hop logic |
| 🔄 Invalidation | Truth | Critical | Correctness over time |

🏗️ Architecture Overview

```
User Query
   ↓
⚡ Semantic Cache (Redis)
   ↓ (miss)
🧠 Vector Search (Qdrant)
   ↓
🕸️ Graph Traversal (FalkorDB)
   ↓
Response + Cache Write
```
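The flow above can be sketched as a single lookup function. The names below (`search_vectors`, `expand_graph`) are illustrative stand-ins, not the repo's actual API:

```python
def answer(query, cache, search_vectors, expand_graph):
    """Route a query through cache -> vector -> graph, then write back."""
    if query in cache:
        return cache[query]                  # ⚡ reflex path: no recompute
    episodes = search_vectors(query)         # 🧠 semantic recall
    context = expand_graph(episodes)         # 🕸️ multi-hop reasoning
    result = {"episodes": episodes, "context": context}
    cache[query] = result                    # write-back so the next hit is instant
    return result
```

Each layer only runs when the cheaper one above it misses, which is where the ~50x speedup on repeated queries comes from.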

⚡ Lightning-Fast Cache Layer

  • Semantic caching (NOT string matching)
  • ~16ms latency on RTX 3050
  • Stores query + result + metadata

The fastest answer is the one you don’t compute twice.
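A minimal sketch of semantic (not string-match) caching. The real layer uses Redis and a proper embedding model; here a toy character-bigram embedding stands in so the idea is runnable:

```python
import math

def toy_embed(text):
    # Stand-in embedding: character-bigram counts. A real system would
    # embed with a sentence-transformer; this just makes the sketch runnable.
    vec = {}
    for a, b in zip(text, text[1:]):
        vec[a + b] = vec.get(a + b, 0) + 1
    return vec

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

class SemanticCache:
    """Matches by meaning (embedding similarity), not exact strings."""
    def __init__(self, threshold=0.9):
        self.entries = []            # (embedding, query, result, metadata)
        self.threshold = threshold

    def get(self, query):
        q = toy_embed(query)
        for emb, _, result, _ in self.entries:
            if cosine(q, emb) >= self.threshold:
                return result        # near-duplicate query: reuse the answer
        return None

    def put(self, query, result, metadata=None):
        self.entries.append((toy_embed(query), query, result, metadata or {}))
```

The threshold is the key knob: too low and unrelated queries collide, too high and paraphrases miss the cache.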


🧠 Vector Memory (Qdrant)

  • Stores embeddings + summaries
  • Handles “what happened before”
  • User-scoped filtering (security-first)
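An in-memory stand-in for the Qdrant layer, showing the security-first pattern: the user filter is applied *before* similarity ranking, so another user's memories can never enter the candidate set:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

class VectorMemory:
    """Embeddings + summaries, scoped per user at retrieval time."""
    def __init__(self):
        self.points = []   # (embedding, summary, user_id)

    def add(self, embedding, summary, user_id):
        self.points.append((embedding, summary, user_id))

    def search(self, query_emb, user_id, top_k=3):
        # Filter first, rank second: cross-user leakage is structurally impossible.
        scoped = [(emb, s) for emb, s, uid in self.points if uid == user_id]
        ranked = sorted(scoped, key=lambda p: cosine(query_emb, p[0]), reverse=True)
        return [summary for _, summary in ranked[:top_k]]
```

In Qdrant itself the same effect comes from a payload filter on the search call rather than a Python list comprehension.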

🕸️ Graph Memory (FalkorDB)

  • Nodes = memories
  • Edges = relationships
  • Enables multi-hop reasoning
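Multi-hop reasoning over the graph layer reduces to a bounded breadth-first traversal. FalkorDB does this with a Cypher query; a dict-based sketch of the same idea:

```python
from collections import deque

def multi_hop(graph, start, max_hops=2):
    """BFS over a memory graph (node -> list of (relation, node)),
    returning every edge reachable within max_hops of the start node."""
    seen = {start}
    frontier = deque([(start, 0)])
    reached = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue                      # hop budget exhausted on this path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append((node, relation, neighbor))
                frontier.append((neighbor, depth + 1))
    return reached
```

Two hops is enough to answer chained questions ("where does Alice's employer operate?") that no single vector lookup can.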

🔄 Invalidation > TTL

  • TTL = time-based expiry ❌
  • Invalidation = correctness ✅

✔ Mark, don’t delete
✔ Preserve history
✔ Enable audit & rollback
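"Mark, don't delete" is a small but strict contract: superseded facts leave active recall but never leave the log. A sketch (class names are illustrative):

```python
import time

class MemoryRecord:
    def __init__(self, content):
        self.content = content
        self.invalidated_at = None   # None = still believed true

class MemoryLog:
    """Invalidation over TTL: facts are marked stale, never removed,
    so audit and rollback stay possible."""
    def __init__(self):
        self.records = []

    def add(self, content):
        rec = MemoryRecord(content)
        self.records.append(rec)
        return rec

    def invalidate(self, rec):
        rec.invalidated_at = time.time()   # marked, not deleted

    def active(self):
        return [r.content for r in self.records if r.invalidated_at is None]

    def history(self):
        return [r.content for r in self.records]   # full trail survives
```

TTL would have silently dropped the old fact at an arbitrary time; invalidation drops it exactly when it becomes wrong, and keeps the evidence.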


🎯 Query Modes

| Mode | Description |
| --- | --- |
| ⚡ Direct | Cache hit |
| 🔍 Search | Vector retrieval |
| 🧠 Reason | Graph traversal |
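Routing between these modes can be as simple as "pick the cheapest layer that can answer". The `needs_reasoning` predicate below is a hypothetical stand-in for whatever classifier decides a query requires multi-hop logic:

```python
def route(query, cache, needs_reasoning):
    """Pick the cheapest capable mode: Direct -> Reason -> Search."""
    if query in cache:
        return "direct"            # ⚡ answered in ~16ms, nothing recomputed
    if needs_reasoning(query):
        return "reason"            # 🧠 graph traversal for multi-hop questions
    return "search"                # 🔍 plain semantic recall
```

This is what "routing is leverage" means in practice: one cheap decision keeps most traffic off the expensive layers.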

🔐 Security by Design

  • User-scoped filtering applied at retrieval time
  • No cross-user leakage
  • Permissions enforced early, not bolted on after retrieval

🧪 Performance

  • Model: Granite 3.3
  • GPU: RTX 3050
  • Cache Latency: ~16ms
  • Speedup: ~50x vs full pipeline

🧠 Key Principles

  • Cache at meaning level, not strings
  • Memory is layered, not singular
  • Invalidation is mandatory
  • Routing is leverage
  • Continuity is baseline UX

References

  • Zep Conversational Memory

🚀 Future Work

  • Dual cache (global + user)
  • Smart routing layer
  • Event-driven invalidation
  • Graph-aware reranking

🏁 Final Thought

Enterprise AI isn’t a chatbot.
It’s memory + routing + reasoning + correctness — working together.
