Your AI Agent Doesn't Need Neo4j — Lightweight Ontology in Practice #20
xg-gh-25 started this conversation in Show and tell
Your AI Agent Doesn't Need Neo4j — Darwinian Knowledge Management
You've Seen This Problem
Your AI Agent has been running for 3 months. It's accumulated 200+ "lessons learned." Then you notice:
Knowledge bloat isn't a storage problem. It's a lifecycle problem.
The Obvious Solution: Graph Database
Neo4j. Amazon Neptune. Stardog. Define ontology in OWL, query with SPARQL, traverse with Cypher. Structured storage + semantic queries + relationship reasoning. Sounds perfect.
We considered it. Then hit three fatal issues:
1. Maintenance tax is brutal. Every code change requires syncing triples. A rename operation in Neo4j is a chain of MATCH + DELETE + CREATE statements. In Markdown it's sed.
2. Query cost > injection cost. A 1M-token context window ≈ 750K words. Your entire knowledge base might be 12K tokens. Directly injecting it into the system prompt costs far less than maintaining a RAG pipeline + embeddings + retrieval + re-ranking.
3. Graph databases don't manage death. Neo4j stores forever. Nodes once created, never die. You need custom cron jobs + business logic + versioned cleanup. And "when to forget" is the hardest decision.
The Core Thesis: Darwinism vs Encyclopedia
Most knowledge management systems follow the encyclopedia model: store everything, never delete, filter at query time.
We chose the Darwinian model: knowledge has a lifecycle. Used = strengthened. Unused = decays. Eventually = dies.
Why Darwinian is better for AI Agents:
Agents don't need "all knowledge." They need "knowledge relevant right now." A 6-month-old Python 3.9 workaround in a 3.12 environment isn't just useless — it's harmful.
Forgetting is a feature, not a bug. Human brains forget because forgetting frees cognitive resources for what matters. An agent's context window is finite, so useless knowledge occupying space crowds out useful knowledge.
References = natural selection. If a knowledge entry hasn't been referenced by any pipeline, any decision, any conversation in 90 days — it's probably not important. No human judgment needed. The system knows.
Implementation: Three Layers (~1,000 Lines of Python)
Don't let the theory scare you. The core is: a schema + entries with lifecycles + a relation file.
Layer 1: Define How Knowledge Is Organized (Schema)
You need to answer: how many types? What does each look like?
We use 5 types (MECE — Mutually Exclusive, Collectively Exhaustive):
Why not free-form tags? Because tags have no query contract. You can't say "BUILD stage needs all guidelines + pitfalls" if every entry has arbitrary tags. Fixed types = programmable query interface.
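A minimal sketch of what "fixed types = programmable query interface" can look like. Only four type names are visible in this article (guideline, pitfall, decision, process), so the sketch uses those and leaves the fifth out. The `Entry` class, the `STAGE_NEEDS` table, and the example titles are hypothetical names for illustration, not the author's code.

```python
from dataclasses import dataclass
from enum import Enum


class EntryType(Enum):
    # Only the type names that appear in the article; the fifth is not shown there.
    GUIDELINE = "guideline"
    PITFALL = "pitfall"
    DECISION = "decision"
    PROCESS = "process"


@dataclass
class Entry:
    title: str
    type: EntryType


# Hypothetical stage -> required types contract.
# The BUILD row mirrors the example in the text; other stages would be added the same way.
STAGE_NEEDS = {
    "BUILD": {EntryType.GUIDELINE, EntryType.PITFALL},
}


def entries_for_stage(stage: str, entries: list[Entry]) -> list[Entry]:
    """Return every entry whose type the given stage asks for."""
    wanted = STAGE_NEEDS[stage]
    return [e for e in entries if e.type in wanted]


entries = [
    Entry("Pin dependency versions in CI", EntryType.GUIDELINE),
    Entry("Race condition in parallel test setup", EntryType.PITFALL),
    Entry("Chose Markdown over a graph database", EntryType.DECISION),
]
build_context = entries_for_stage("BUILD", entries)
print([e.title for e in build_context])
```

Because the set of types is closed, the stage table can be checked and tested like any other lookup; with free-form tags there is nothing stable to key it on.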
Schema doesn't need OWL. A Markdown file's section structure IS schema. The agent just reads it.
Layer 2: Each Entry Is a Living Entity
Three fields determine life and death:
ref:N
last:date
decay:state
Decay rules:
Key: no human involvement. The decay engine runs daily. You don't need "quarterly knowledge reviews" — the system knows who lives and who dies.
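To make the three fields concrete, here is a rough sketch of reading them off an entry line. The on-disk layout (space-separated `key:value` tokens at the end of the line, ISO dates) and the state name `active` are assumptions; the article only names the fields `ref`, `last`, and `decay`.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Matches tokens such as "ref:3", "last:2025-01-10", "decay:active".
FIELD_RE = re.compile(r"\b(ref|last|decay):([\w-]+)")


@dataclass
class Lifecycle:
    ref: int = 0                  # how many times the entry was referenced
    last: Optional[date] = None   # date of the most recent reference, if any
    decay: str = "active"         # state name is an assumption


def parse_lifecycle(line: str) -> Lifecycle:
    """Extract the lifecycle fields from one entry line; missing fields keep defaults."""
    fields = dict(FIELD_RE.findall(line))
    last = date.fromisoformat(fields["last"]) if "last" in fields else None
    return Lifecycle(
        ref=int(fields.get("ref", 0)),
        last=last,
        decay=fields.get("decay", "active"),
    )


lc = parse_lifecycle("Avoid shared temp dirs in tests ref:3 last:2025-01-10 decay:active")
print(lc)
```

An entry without a `last` field parses to `last=None`, which leaves the "no date" case visible to whatever runs the decay pass instead of hiding it behind a default date.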
Layer 3: Relations Between Entries
10 relation types (motivated_by, supersedes, extends, applies_to, conflicts_with, etc.), all in one YAML file.
The killer feature: relations grow automatically. When the system processes a file and references a knowledge entry, it auto-creates an applies_to relation. Next time it processes the same file, that entry gets a priority boost.
More use → richer relations → better recommendations → more use. Flywheel.
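The flywheel can be sketched in a few lines. This version keeps relations in an in-memory dict instead of the YAML file the article describes, since the Python standard library has no YAML parser. The function names, entry IDs, file path, and the flat `boost` value are all hypothetical.

```python
from collections import defaultdict

# (entry_id, relation_type) -> set of targets. Stand-in for the single YAML file.
relations: dict[tuple[str, str], set[str]] = defaultdict(set)


def record_reference(entry_id: str, file_path: str) -> None:
    """Called when an entry is referenced while processing a file."""
    relations[(entry_id, "applies_to")].add(file_path)


def rank_for_file(entry_ids: list[str], file_path: str, boost: float = 1.0) -> list[str]:
    """Order entries so those already linked to this file come first."""
    def score(entry_id: str) -> float:
        linked = relations.get((entry_id, "applies_to"), set())
        return boost if file_path in linked else 0.0

    # sorted() is stable, so entries with equal scores keep their input order.
    return sorted(entry_ids, key=score, reverse=True)


record_reference("pitfall-007", "src/scheduler.py")
ranked = rank_for_file(
    ["guideline-001", "pitfall-007", "decision-003"], "src/scheduler.py"
)
print(ranked)
```

A real scorer would presumably combine this boost with reference counts and decay state; the sketch isolates the relation signal only.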
Mistakes We Made (So You Don't Have To)
Mistake 1: Signal words for type classification can't be common words.
First version used "pipeline", "step", "→" as process signals. Result: 37/179 entries misclassified, because those words appear in ALL types of technical text. Fix: signal words must be unique to the type ("race condition" → pitfall, "workflow:" → process).
Mistake 2: Title matching for reference tracking has false positives.
Title "Build" matches any text containing "Build". Fix: skip titles < 8 chars, use word boundary regex for 8-20 chars, substring for 20+.
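The length-based rule above can be written as a single function. This is a sketch under two assumptions the text does not settle: a title of exactly 20 characters takes the word-boundary path, and matching is case-sensitive. The function name and the sample titles are illustrative.

```python
import re


def title_referenced(title: str, text: str) -> bool:
    """Decide whether `text` references an entry by its title."""
    n = len(title)
    if n < 8:
        # Too short to match reliably ("Build" would hit almost anything).
        return False
    if n <= 20:
        # Medium titles: require word boundaries on both sides.
        return re.search(rf"\b{re.escape(title)}\b", text) is not None
    # Long titles are specific enough for a plain substring test.
    return title in text


print(title_referenced("Build", "Build the project"))
print(title_referenced("Retry budget", "Applied Retry budget here"))
print(title_referenced("Retry budget", "Applied Retry budgets here"))
```

Note that `\b` only behaves as expected when the title starts and ends with a word character; titles that begin or end with punctuation would need a different boundary check.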
Mistake 3: The decay system can't have "exemption paths."
First version: if an entry has no date, skip decay assessment. Result: manually added knowledge never decays. Fix: no date = treat as infinitely old → immediate decay trigger.
Mistake 4 (most expensive): If a gate can pass because data is "absent," the gate doesn't exist.
Our pipeline validator checked "is the adversarial review tier correct?" But what if the entire field is missing? None isn't wrong and isn't right; it skips the check entirely. We hit the same bug class four times before we learned: absent = violation, not exemption.
The Compound Loop: Why Knowledge Gets Better Over Time
Evidence:
Start Today: 3 Steps (10 Minutes to 1 Hour)
You don't need our system. The core idea works with any agent:
Step 1: Add type labels to your lessons (10 minutes)
Just [guideline]/[pitfall]/[decision] is enough to start.
Step 2: Add reference counting (30 minutes)
In your agent's post-execution hook:
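A minimal sketch of such a hook, assuming entries are already loaded into memory and that the hook receives the text the agent just produced. `Entry`, `post_execution_hook`, and the sample data are hypothetical names; the plain substring match is the simplest possible matcher and has the false-positive problem described in Mistake 2.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Entry:
    title: str
    ref: int = 0                  # reference count
    last: Optional[date] = None   # date of the most recent reference


def post_execution_hook(output_text: str, entries: list[Entry], today: date) -> int:
    """Bump ref and last for every entry mentioned in the output; return how many were hit."""
    touched = 0
    for entry in entries:
        if entry.title in output_text:
            entry.ref += 1
            entry.last = today
            touched += 1
    return touched


entries = [
    Entry("Pin dependency versions in CI"),
    Entry("Avoid shared temp directories"),
]
hits = post_execution_hook(
    "Applied: Pin dependency versions in CI before the release build.",
    entries,
    today=date(2025, 6, 1),
)
print(hits, entries[0].ref, entries[0].last)
```

Passing `today` in as a parameter instead of calling `date.today()` inside the hook keeps the function easy to test.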
Step 3: Write a daily cron for decay (1 hour)
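A sketch of the function a daily job could call. The 90-day threshold comes from the figure used earlier in this article; the 180-day death threshold and the state names `active`, `stale`, and `dead` are assumptions, as are the `Entry` class and function names. An entry with no date is treated as infinitely old, per Mistake 3.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Optional

STALE_AFTER_DAYS = 90    # the 90-day figure from the article
DEAD_AFTER_DAYS = 180    # assumption: not stated in the article


@dataclass
class Entry:
    title: str
    last: Optional[date] = None
    decay: str = "active"


def age_in_days(entry: Entry, today: date) -> float:
    """Days since the last reference; a missing date counts as infinitely old."""
    if entry.last is None:
        return math.inf
    return (today - entry.last).days


def run_decay(entries: list[Entry], today: date) -> list[Entry]:
    """Update each entry's decay state and return only the survivors."""
    survivors = []
    for entry in entries:
        age = age_in_days(entry, today)
        if age >= DEAD_AFTER_DAYS:
            entry.decay = "dead"
            continue
        entry.decay = "stale" if age >= STALE_AFTER_DAYS else "active"
        survivors.append(entry)
    return survivors


today = date(2025, 6, 1)
entries = [
    Entry("fresh", last=date(2025, 5, 20)),
    Entry("old", last=date(2025, 2, 1)),
    Entry("ancient", last=date(2024, 1, 1)),
    Entry("undated"),
]
alive = run_decay(entries, today)
print([(e.title, e.decay) for e in entries])
```

Scheduling is left to cron or any other daily job runner; the function itself has no side effects beyond updating the entries it is given.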
That's it. The relation layer (Layer 3) is advanced — add it when you have 100+ entries and discover "globally popular ≠ currently relevant."
When You DO Need a Real KG
Honest boundaries. Consider Neo4j/Neptune when ANY of these hold:
For 1-3 person + AI teams, these conditions won't hold for the foreseeable future. When 179 entries scan in 0.1ms, you don't need an index.
Principles (Take These, Forget the Implementation Details)
Conclusion
A best practice nobody has referenced in 90 days has the same value to the agent as a deleted entry — zero. The difference is the former still occupies tokens, pollutes context, and consumes attention.
Kill it. Let the living knowledge breathe.
~1,000 lines of Python + Markdown + YAML. Zero external dependencies. Start today.
Author: XG | SwarmAI — Human directs, AI delivers
Code: github.com/xg-gh-25/SwarmAI
Discussion: github.com/xg-gh-25/SwarmAI/discussions/20