# 📓 The GenAI Revolution Cookbook

**Title:** How to Build a Knowledge Graph Chatbot with Neo4j, Chainlit, GPT-4o

**Description:** Ship a Python knowledge graph chatbot using Neo4j, Chainlit, and GPT-4o—auto-generate Cypher, visualize results, and answer complex data questions accurately.

**📖 Read the full article:** [How to Build a Knowledge Graph Chatbot with Neo4j, Chainlit, GPT-4o](https://blog.thegenairevolution.com/article/how-to-build-a-knowledge-graph-chatbot-with-neo4j-chainlit-gpt-4o)

---

*This Jupyter notebook contains executable code examples. Run the cells below to try out the code yourself!*



## Introduction to the Build

Knowledge graphs are fantastic at modeling complex relationships, but here's the problem: querying them requires Cypher, and that's a real barrier for non\-technical users. So I decided to build something that would solve this \- a production\-ready chatbot that takes natural language questions, converts them into Cypher queries, runs them against Neo4j, and returns both tabular results and interactive visualizations.

What you'll be building here combines GPT\-4o for query generation, Neo4j for graph storage, CrewAI for agent orchestration, and Chainlit for the conversational UI. By the time you're done, you'll have a working system that handles multi\-hop reasoning, enforces read\-only safety, and gracefully manages errors. This directly addresses those common pain points we all face \- LLM hallucinations, brittle hand\-coded query templates, and all that UI plumbing nobody wants to deal with. And yes, the whole thing runs in a Colab notebook with validation at each step.

## Why This Approach Works

**Graphs over tables**: Look, relational databases really struggle with multi\-hop queries. You know, things like friends\-of\-friends or recommendation paths, where every extra hop means another self\-join. Neo4j stores relationships directly and traverses them natively, which makes these queries not just possible but fast to run and easy to express.

**GPT\-4o for Cypher generation**: I've found that GPT\-4o combines strong language understanding with surprisingly reliable structured output when you give it proper schema context and examples. This means you don't need hand\-coded query templates anymore, and the system adapts when your schema evolves, as long as you update the schema description you pass to the model.
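Here's a minimal sketch of what "schema context and examples" can look like once assembled into a prompt. The `build_cypher_prompt` helper and its example pair are hypothetical; in this notebook the schema actually reaches GPT\-4o through the CrewAI task description, so treat this as an illustration of the idea rather than the code path used below.

```python
from typing import List, Tuple


def build_cypher_prompt(question: str, schema: str, examples: List[Tuple[str, str]]) -> str:
    """Assemble a schema-grounded, few-shot prompt for Cypher generation (hypothetical helper)."""
    parts = [
        "You translate questions into read-only Cypher for Neo4j.",
        "Use only the labels, relationship types, and properties in this schema:",
        schema.strip(),
        "",
    ]
    # Each worked example pairs a question with the Cypher that answers it.
    for example_question, example_cypher in examples:
        parts.append(f"Question: {example_question}")
        parts.append(f"Cypher: {example_cypher}")
        parts.append("")
    # The real question goes last, leaving the model to complete the Cypher line.
    parts.append(f"Question: {question}")
    parts.append("Cypher:")
    return "\n".join(parts)


SCHEMA = """
Nodes: Person(name, age), Movie(title, year)
Relationships: (Person)-[:ACTED_IN]->(Movie), (Person)-[:DIRECTED]->(Movie)
"""
EXAMPLES = [
    ("Who directed The Matrix?",
     "MATCH (p:Person)-[:DIRECTED]->(m:Movie {title: 'The Matrix'}) RETURN p.name"),
]

prompt = build_cypher_prompt("Who acted in movies released after 2010?", SCHEMA, EXAMPLES)
print(prompt)
```

Putting the schema before the examples lets the model see which labels and properties exist before it sees how they're used.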

**CrewAI for orchestration**: CrewAI gives you a lightweight agent framework that manages tool invocation and conversation flow. You don't have to build a custom orchestration layer from scratch, which, trust me, is more complicated than it sounds.

**Chainlit for UI**: Here's the thing about Chainlit \- it offers a chat interface with minimal boilerplate. You can focus on your backend logic instead of wrestling with frontend plumbing. And it supports streaming, tables, and charts right out of the box.

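**Read\-only enforcement**: By rejecting write clauses (CREATE, MERGE, DELETE, SET, REMOVE) before a query runs and connecting with a read\-only Neo4j role, you prevent accidental or malicious writes. The keyword check is a first line of defense; the database role is what actually guarantees it. Together they make the system safe for production use, which is obviously critical.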

## Setup \& Installation

First, run this cell to install all dependencies with pinned versions:

In [None]:
%pip install -q neomodel==5.2.1 crewai==0.28.8 crewai-tools==0.1.6 openai==1.12.0 chainlit==1.0.200 python-dotenv==1.0.0 plotly==5.18.0 pandas==2.0.3 tabulate==0.9.0

Next, you'll need to create a .env file with your credentials. This cell writes a template \- just replace the placeholders with your real values:

In [None]:
%%writefile .env
NEO4J_URI=bolt+s://your-instance.databases.neo4j.io:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o

Now load and validate your environment variables:

In [None]:
import os
from dotenv import load_dotenv

load_dotenv()

required_keys = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENAI_API_KEY", "OPENAI_MODEL"]
missing = [k for k in required_keys if not os.getenv(k)]
if missing:
    raise EnvironmentError(f"Missing required environment variables: {', '.join(missing)}")

print("All required environment variables found.")

## Step\-by\-Step Implementation

### Step 1: Connect to Neo4j and Validate

Let's start by creating the directory structure and writing the Neo4j connection module. This module normalizes the URI, initializes the connection, provides a test function, and exposes `run_cypher`, which executes read\-only queries and returns columns, rows, and records:

In [None]:
!mkdir -p tools utils config

In [None]:
%%writefile tools/query_knowledge_graph.py
import os
import re
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse

from dotenv import load_dotenv
from neomodel import db, config as neo_config

# Load .env BEFORE checking credentials; otherwise the check below fails
# even when the values are present in the file.
load_dotenv()

required_keys = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"]
missing = [k for k in required_keys if not os.getenv(k)]
if missing:
    raise EnvironmentError(
        f"Missing required environment variables: {', '.join(missing)}\n"
        "Please set them before running the application."
    )

print("All required Neo4j environment variables found.")

# Write clauses rejected by run_cypher. \b word boundaries stop names such as
# "dataset", "offset", or "created_at" from matching "set" or "create".
UNSAFE_CYPHER = re.compile(
    r"\b(create|merge|delete|detach|set|remove|drop|foreach|load\s+csv)\b",
    re.IGNORECASE,
)


def _normalize_neo4j_uri(uri: str) -> str:
    return uri.replace("neo4j+s://", "bolt+s://").replace("neo4j://", "bolt://")


def init_neo4j_connection() -> None:
    uri = os.getenv("NEO4J_URI")
    user = os.getenv("NEO4J_USERNAME")
    pwd = os.getenv("NEO4J_PASSWORD")
    if not uri or not user or not pwd:
        raise RuntimeError("Missing Neo4j credentials.")

    parsed = urlparse(_normalize_neo4j_uri(uri))
    # neomodel's db.set_connection() expects one URL with the credentials embedded
    # (scheme://user:password@host:port); it has no user/password keyword arguments.
    url = f"{parsed.scheme}://{user}:{pwd}@{parsed.netloc}"
    neo_config.DATABASE_URL = url
    neo_config.MAX_CONNECTION_POOL_SIZE = 20
    db.set_connection(url)
    print("Neo4j connection initialized.")


def test_connection() -> Tuple[List[List[Any]], List[str]]:
    return db.cypher_query("RETURN 'Connection successful' AS message")


def safe_init() -> None:
    try:
        init_neo4j_connection()
        res, _ = test_connection()
        print(f"Neo4j: {res[0][0]}")
    except Exception as e:
        raise RuntimeError(f"Neo4j connection failed: {e}") from e


def run_cypher(cypher: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # Best-effort keyword guard; a read-only database role is the real safeguard.
    if UNSAFE_CYPHER.search(cypher):
        return {"error": "Unsafe Cypher detected. Only read-only queries allowed.", "columns": [], "rows": [], "data": []}

    try:
        results, meta = db.cypher_query(cypher, params or {})
        # neomodel returns the column names themselves as `meta` (plain strings).
        columns = list(meta)
        rows = [list(r) for r in results]
        data = [dict(zip(columns, row)) for row in rows]
        return {"columns": columns, "rows": rows, "data": data}
    except Exception as e:
        print(f"Cypher execution error: {e}")
        return {"error": str(e), "columns": [], "rows": [], "data": []}


def query_knowledge_graph(nl_query: str, cypher: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    result = run_cypher(cypher, params)
    result["nl_query"] = nl_query
    result["cypher"] = cypher
    return result

Test the connection to make sure everything's working:

In [None]:
from tools.query_knowledge_graph import safe_init, run_cypher

safe_init()
print(run_cypher("RETURN 1 AS one"))
print(run_cypher("MATCH (n) RETURN count(n) AS nodes LIMIT 1"))

You should see something like: `{'columns': ['one'], 'rows': [[1]], 'data': [{'one': 1}]}` along with a node count.

### Step 2: Define Agent and Task Configuration

Now we'll write the agent and task YAML files. The agent uses GPT\-4o and the knowledge graph tool to generate and execute Cypher queries:

In [None]:
%%writefile config/agents.yaml
cypher_agent:
  role: "Knowledge Graph Query Specialist"
  goal: "Generate accurate Cypher queries from natural language and return results."
  backstory: |
    You are an expert in Neo4j and Cypher. Given a schema and a question, you produce a valid Cypher query.
    You always return the query in a ```cypher code block, followed by any chart configuration in a ```json block if visualization is requested.

In [None]:
%%writefile config/tasks.yaml
generate_and_execute_query:
  description: |
    Given the question: {question}
    And the schema: {schema}
    Generate a Cypher query to answer it. Return the query in a ```cypher block.
    If the user asks for a chart, also return a ```json block with chart config (type, x, y, title).
  expected_output: |
    A ```cypher code block with the query, and optionally a ```json block with chart config.
  agent: cypher_agent

### Step 3: Build the CrewAI Agent and Tool

Create a utility to load YAML configuration:

In [None]:
%%writefile utils/yaml_loader.py
import yaml
from pathlib import Path

def load_yaml(file_path: str):
    with open(Path(file_path), "r") as f:
        return yaml.safe_load(f)

Define the CrewAI tool wrapper for the knowledge graph query function:

In [None]:
%%writefile tools/__init__.py
from crewai_tools import tool
from .query_knowledge_graph import query_knowledge_graph

@tool("Query Knowledge Graph")
def query_knowledge_graph_tool(nl_query: str, cypher: str) -> dict:
    """
    Execute a Cypher query against the Neo4j knowledge graph.
    Args:
        nl_query (str): The natural language question.
        cypher (str): The Cypher query to execute.
    Returns:
        dict: Query results with columns, rows, and data.
    """
    return query_knowledge_graph(nl_query, cypher)

Time to assemble the agent and task. This cell creates the agent, attaches the tool, and defines the task:

In [None]:
%%writefile agent_setup.py
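import os
from crewai import Agent, Task
from langchain_openai import ChatOpenAI
from tools import query_knowledge_graph_tool
from utils.yaml_loader import load_yaml

agents_config = load_yaml("config/agents.yaml")
tasks_config = load_yaml("config/tasks.yaml")

llm = ChatOpenAI(model=os.getenv("OPENAI_MODEL", "gpt-4o"), temperature=0)

# Unpack the YAML fields explicitly so the agent doesn't depend on
# version-specific handling of a `config=` argument.
agent_cfg = agents_config["cypher_agent"]
cypher_agent = Agent(
    role=agent_cfg["role"],
    goal=agent_cfg["goal"],
    backstory=agent_cfg["backstory"],
    tools=[query_knowledge_graph_tool],
    llm=llm,
    verbose=True,
)


def create_query_task(question: str, schema: str) -> Task:
    task_cfg = tasks_config["generate_and_execute_query"]
    # Task.context expects a list of upstream Task objects, not template
    # variables, so fill the {question} and {schema} placeholders directly.
    return Task(
        description=task_cfg["description"].format(question=question, schema=schema),
        expected_output=task_cfg["expected_output"],
        agent=cypher_agent,
    )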

### Step 4: Parse Agent Output and Execute Tool

Write a parser to extract Cypher and chart JSON from the agent's response:

In [None]:
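%%writefile utils/parser.py
import re
import json
from typing import Optional

# `{3} matches three backticks, i.e. a Markdown code fence. Spelling it this way
# keeps a literal fence out of the source, which some renderers would mangle.
CYPHER_BLOCK = re.compile(r"`{3}cypher\s*(.*?)\s*`{3}", re.DOTALL | re.IGNORECASE)
JSON_BLOCK = re.compile(r"`{3}json\s*(.*?)\s*`{3}", re.DOTALL | re.IGNORECASE)


def extract_cypher(text: str) -> Optional[str]:
    """Return the body of the first cypher-tagged code block, or None."""
    match = CYPHER_BLOCK.search(text)
    return match.group(1).strip() if match else None


def extract_chart_config(text: str) -> Optional[dict]:
    """Return the first json-tagged code block parsed as a dict, or None."""
    match = JSON_BLOCK.search(text)
    if match:
        try:
            return json.loads(match.group(1).strip())
        except json.JSONDecodeError:
            return None
    return None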

Let's test the agent and parser with a sample question:

In [None]:
from agent_setup import cypher_agent, create_query_task
from crewai import Crew
from utils.parser import extract_cypher, extract_chart_config

schema = """
Nodes: Person(name, age), Movie(title, year)
Relationships: (Person)-[:ACTED_IN]->(Movie), (Person)-[:DIRECTED]->(Movie)
"""

question = "Who acted in movies released after 2010?"
task = create_query_task(question, schema)
crew = Crew(agents=[cypher_agent], tasks=[task], verbose=True)

result = crew.kickoff()
print("Agent output:", result)

cypher = extract_cypher(str(result))
chart_config = extract_chart_config(str(result))
print("Extracted Cypher:", cypher)
print("Chart config:", chart_config)

You should get a Cypher query like `MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.year > 2010 RETURN p.name, m.title` and optionally a chart config JSON.

### Step 5: Wire the Chainlit UI

Create the Chainlit app. This app keeps a short conversation history in the user session, sends each question to the agent, extracts and executes the Cypher, and renders results as tables and charts:

In [None]:
%%writefile chat.py
import os
import chainlit as cl
from crewai import Crew
from agent_setup import cypher_agent, create_query_task
from tools.query_knowledge_graph import query_knowledge_graph, safe_init
from utils.parser import extract_cypher, extract_chart_config
import plotly.graph_objects as go
import pandas as pd

safe_init()

SCHEMA = """
Nodes: Person(name, age), Movie(title, year)
Relationships: (Person)-[:ACTED_IN]->(Movie), (Person)-[:DIRECTED]->(Movie)
"""

@cl.on_chat_start
async def start():
    cl.user_session.set("history", [])
    await cl.Message(content="Ask me anything about the knowledge graph!").send()

@cl.on_message
async def main(message: cl.Message):
    question = message.content
    history = cl.user_session.get("history")
    history.append({"role": "user", "content": question})

    task = create_query_task(question, SCHEMA)
    crew = Crew(agents=[cypher_agent], tasks=[task], verbose=False)
    
    try:
        # crew.kickoff() blocks; cl.make_async runs it in a worker thread so the
        # Chainlit event loop stays responsive for other users.
        result = await cl.make_async(crew.kickoff)()
        agent_output = str(result)
        history.append({"role": "assistant", "content": agent_output})
        cl.user_session.set("history", history[-10:])

        cypher = extract_cypher(agent_output)
        chart_config = extract_chart_config(agent_output)

        if not cypher:
            await cl.Message(content="Could not extract a valid Cypher query.").send()
            return

        query_result = query_knowledge_graph(question, cypher)

        if "error" in query_result:
            await cl.Message(content=f"Query error: {query_result['error']}").send()
            return

        df = pd.DataFrame(query_result["data"])
        if df.empty:
            await cl.Message(content="The query ran successfully but returned no rows.").send()
            return

        # DataFrame.to_markdown() relies on the optional `tabulate` package.
        table_md = df.to_markdown(index=False)
        await cl.Message(content=f"**Results:**\n\n{table_md}").send()

        if chart_config and not df.empty:
            chart_type = chart_config.get("type", "bar")
            x_col = chart_config.get("x")
            y_col = chart_config.get("y")
            title = chart_config.get("title", "Chart")

            if x_col in df.columns and y_col in df.columns:
                if chart_type == "bar":
                    fig = go.Figure(data=[go.Bar(x=df[x_col], y=df[y_col])])
                elif chart_type == "line":
                    fig = go.Figure(data=[go.Scatter(x=df[x_col], y=df[y_col], mode='lines+markers')])
                else:
                    fig = go.Figure(data=[go.Bar(x=df[x_col], y=df[y_col])])
                
                fig.update_layout(title=title, xaxis_title=x_col, yaxis_title=y_col)
                await cl.Message(content="", elements=[cl.Plotly(name="chart", figure=fig)]).send()

    except Exception as e:
        await cl.Message(content=f"Error: {str(e)}").send()

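To run the Chainlit app, execute the cell below. It starts a server on port 8000 and keeps the cell busy until you stop it. In Colab that port isn't reachable from your browser directly, so you'll also need a tunnel service like ngrok to expose it publicly: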

In [None]:
!chainlit run chat.py --host 0.0.0.0 --port 8000

If you're developing locally, just run `chainlit run chat.py` and open the provided URL.

### Step 6: Validate End\-to\-End

Test the complete system with a sample question. This cell simulates the agent workflow without the UI:

In [None]:
from agent_setup import cypher_agent, create_query_task
from crewai import Crew
from tools.query_knowledge_graph import query_knowledge_graph
from utils.parser import extract_cypher

schema = """
Nodes: Person(name, age), Movie(title, year)
Relationships: (Person)-[:ACTED_IN]->(Movie)
"""

question = "List all actors and their movies"
task = create_query_task(question, schema)
crew = Crew(agents=[cypher_agent], tasks=[task], verbose=False)

result = crew.kickoff()
cypher = extract_cypher(str(result))
print("Generated Cypher:", cypher)

if cypher:
    query_result = query_knowledge_graph(question, cypher)
    print("Query result:", query_result)

You should see a Cypher query and a result dictionary with columns, rows, and data.

## Production Considerations

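**Read\-only enforcement**: Use a Neo4j role with read\-only permissions. The run\_cypher function already rejects common write keywords, but that check is best\-effort: it scans raw text and can't tell what a procedure invoked with CALL does. Enforce read\-only access in your Neo4j user setup as well, since the database role is the real guarantee.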
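The keyword filter in `run_cypher` scans the raw query text, so a perfectly legitimate filter such as `WHERE m.title = 'Set It Up'` gets rejected. Here's a minimal sketch of a stricter check, assuming you blank out string literals and comments before scanning for write clauses. `is_read_only` is a hypothetical helper, not part of the notebook, and it still can't see inside `CALL`ed procedures.

```python
import re

# Write clauses to reject. \b word boundaries stop "dataset" or "offset" from matching "set".
WRITE_CLAUSES = re.compile(
    r"\b(create|merge|delete|detach|set|remove|drop|foreach|load\s+csv)\b",
    re.IGNORECASE,
)

# One left-to-right pass over string literals and comments, so a quote inside a
# comment (or "//" inside a string) is handled by whichever token starts first.
LITERALS_AND_COMMENTS = re.compile(
    r"'(?:[^'\\]|\\.)*'"   # single-quoted string
    r'|"(?:[^"\\]|\\.)*"'  # double-quoted string
    r"|//[^\n]*"           # line comment
    r"|/\*.*?\*/",         # block comment
    re.DOTALL,
)


def is_read_only(cypher: str) -> bool:
    """Best-effort read-only check; a read-only Neo4j role is the real guarantee."""
    # Blank out literals and comments so their contents can't trigger a match.
    code_only = LITERALS_AND_COMMENTS.sub(" ", cypher)
    return WRITE_CLAUSES.search(code_only) is None


print(is_read_only("MATCH (m:Movie) WHERE m.title = 'Set It Up' RETURN m"))
print(is_read_only("MATCH (n) DETACH DELETE n"))
```

An unterminated quote simply fails to match as a literal, so its contents are still scanned; the check fails closed rather than open.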

**Error handling and retries**: You'll want to wrap OpenAI and Neo4j calls in retry logic with exponential backoff. Libraries like tenacity work great for handling transient 429 or network errors.
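If you'd rather not add a dependency, the same pattern fits in a few lines of standard\-library Python. This is a minimal sketch, not tenacity's API: `retry_with_backoff` is a hypothetical helper, and the default `retry_on` tuple is an assumption you'd extend with the client\-specific errors you actually see (for example `openai.RateLimitError` for 429s).

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retries: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call fn, retrying transient failures with exponential backoff and full jitter."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # sleeping a random fraction of it spreads out clients that failed together.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
    raise RuntimeError("unreachable")


calls = {"n": 0}


def flaky_lookup() -> str:
    # Hypothetical stand-in for an OpenAI or Neo4j call that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"


result = retry_with_backoff(flaky_lookup, base_delay=0.01)
print(result, "after", calls["n"], "calls")
```

Only the exception types listed in `retry_on` are retried, so bugs such as a `KeyError` still fail immediately instead of being masked by retries.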

**Token and cost control**: I recommend limiting conversation history to the last 10 messages. Prune or summarize older context. And definitely monitor token usage per request and set budget alerts in your OpenAI dashboard.
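chat.py already keeps only the last 10 messages; a token budget is the natural next step. Here's a minimal sketch, assuming a rough four\-characters\-per\-token estimate instead of a real tokenizer. `trim_history` and `approx_tokens` are hypothetical names.

```python
from typing import Dict, List


def approx_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token for English text); swap in a real
    # tokenizer if you need accurate counts.
    return max(1, len(text) // 4)


def trim_history(
    history: List[Dict[str, str]], max_messages: int = 10, max_tokens: int = 2000
) -> List[Dict[str, str]]:
    """Keep the newest messages that fit both a message cap and a token budget."""
    kept: List[Dict[str, str]] = []
    used = 0
    # Walk backwards so the most recent messages are kept first.
    for msg in reversed(history[-max_messages:]):
        cost = approx_tokens(msg["content"])
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order


history = [{"role": "user", "content": "x" * 400} for _ in range(15)]
trimmed = trim_history(history, max_messages=10, max_tokens=500)
print(len(trimmed))
```

Each sample message costs about 100 tokens, so a 500\-token budget keeps the five newest even though the message cap would allow ten.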

**Logging and observability**: Add correlation IDs to each request. Log Cypher queries, parameters, execution time, and errors. Use structured logging (JSON) for easy parsing and alerting \- this has saved me countless debugging hours.
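A minimal sketch of structured logging with the standard `logging` module: a formatter that emits one JSON object per line, plus a per\-request correlation ID generated with `uuid4`. The logger name `kg_chatbot`, the `log_query` helper, and the field names are assumptions to adapt to your own log pipeline.

```python
import json
import logging
import time
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # These attributes are attached through the `extra=` argument below.
            "correlation_id": getattr(record, "correlation_id", None),
            "cypher": getattr(record, "cypher", None),
            "duration_ms": getattr(record, "duration_ms", None),
        }
        return json.dumps(payload)


logger = logging.getLogger("kg_chatbot")
if not logger.handlers:  # avoid duplicate handlers when a notebook cell is re-run
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False


def log_query(cypher: str, started: float, correlation_id: str) -> None:
    duration_ms = round((time.perf_counter() - started) * 1000, 2)
    logger.info(
        "cypher_executed",
        extra={"correlation_id": correlation_id, "cypher": cypher, "duration_ms": duration_ms},
    )


request_id = str(uuid.uuid4())  # one ID per incoming chat message
start = time.perf_counter()
log_query("MATCH (n) RETURN count(n) AS nodes", start, request_id)
```

Passing the same `request_id` to every log call for a message lets you pull the prompt, the generated Cypher, and any error for that one question out of the logs together.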

**Schema versioning**: Store schema definitions in version control. Update prompts when the schema changes. Always test queries against a staging graph before deploying to production. I learned this one the hard way.

**Caching**: Cache query results under a hash of the normalized Cypher (plus its parameters) so repeated questions return saved results instead of hitting the database again. Use an LRU cache with a TTL so stale entries expire. For deeper optimization, consider adding semantic caching with Redis vector search, which reuses answers to similar questions by comparing their embeddings. This can dramatically improve response times for common queries.
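A minimal in\-process sketch of the hash\-keyed LRU cache with TTL, using only the standard library. `cache_key` and `TTLCache` are hypothetical names, the key combines whitespace\-normalized Cypher with its parameters, and the cached value is a placeholder result. Semantic caching with Redis is out of scope here.

```python
import hashlib
import json
import time
from collections import OrderedDict
from typing import Any, Dict, Optional, Tuple


def cache_key(cypher: str, params: Optional[Dict[str, Any]] = None) -> str:
    # Collapse runs of whitespace so reformatted copies of a query share a key.
    # Caveat: this also collapses whitespace inside string literals.
    normalized = " ".join(cypher.split())
    # Parameters change the result, so they belong in the key (sorted for stability).
    payload = normalized + "\x00" + json.dumps(params or {}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    """Small LRU cache whose entries also expire after ttl_seconds."""

    def __init__(self, max_size: int = 256, ttl_seconds: float = 300.0) -> None:
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._data: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        stored_at, value = item
        if time.monotonic() - stored_at > self.ttl_seconds:
            del self._data[key]  # expired entry
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (time.monotonic(), value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry


cache = TTLCache(max_size=128, ttl_seconds=600)
key_a = cache_key("MATCH (m:Movie)\n  RETURN m.title")
key_b = cache_key("MATCH (m:Movie) RETURN m.title")
cache.set(key_a, {"columns": ["m.title"], "rows": [["Alien"]]})
print(cache.get(key_b))  # whitespace-only differences hit the same entry
```

Case is deliberately left alone in the key: Cypher labels and string comparisons are case\-sensitive, so lowercasing could make two different queries share one cached result.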

**Testing**: Write unit tests for run\_cypher, extract\_cypher, and extract\_chart\_config. Create integration tests that validate end\-to\-end flows with seed data. Assert expected column names, row counts, and chart configs. Don't skip this \- it'll save you from embarrassing production issues.
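Here's a minimal sketch of those unit tests, written for pytest but also runnable directly. To keep the block self\-contained it inlines its own copies of `extract_cypher` and `extract_chart_config`; in the project you'd import them from `utils.parser` instead. Tests for `run_cypher` need a live or mocked Neo4j connection and aren't shown.

```python
import json
import re
from typing import Optional


# Inlined copies so this block runs on its own; in the project use
# `from utils.parser import extract_cypher, extract_chart_config`.
def extract_cypher(text: str) -> Optional[str]:
    match = re.search(r"`{3}cypher\s*(.*?)\s*`{3}", text, re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


def extract_chart_config(text: str) -> Optional[dict]:
    match = re.search(r"`{3}json\s*(.*?)\s*`{3}", text, re.DOTALL | re.IGNORECASE)
    if not match:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


FENCE = "`" * 3  # built at runtime to keep literal fences out of this example

SAMPLE = (
    f"Here is the query:\n{FENCE}cypher\nMATCH (p:Person)-[:ACTED_IN]->(m:Movie)\n"
    f"WHERE m.year > 2010\nRETURN p.name, m.title\n{FENCE}\n"
    f'{FENCE}json\n{{"type": "bar", "x": "m.title", "y": "p.name", "title": "Actors"}}\n{FENCE}\n'
)


def test_extract_cypher_returns_query_body():
    assert extract_cypher(SAMPLE).startswith("MATCH (p:Person)")
    assert extract_cypher(SAMPLE).endswith("RETURN p.name, m.title")


def test_extract_cypher_missing_block():
    assert extract_cypher("No code here.") is None


def test_extract_chart_config_parses_json():
    assert extract_chart_config(SAMPLE)["type"] == "bar"


def test_extract_chart_config_rejects_bad_json():
    assert extract_chart_config(f"{FENCE}json\n{{not json}}\n{FENCE}") is None


if __name__ == "__main__":
    # Under pytest these functions are collected automatically; this loop is for direct runs.
    for name, fn in list(globals().items()):
        if name.startswith("test_") and callable(fn):
            fn()
    print("all parser tests passed")
```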

## Conclusion and Next Steps

So you've built a knowledge graph chatbot that translates natural language into Cypher, executes queries safely, and visualizes results. The system uses GPT\-4o for query generation, CrewAI for orchestration, and Chainlit for a conversational UI. You've implemented read\-only enforcement, error handling, and incremental validation at each step.

Here's what you can do next:

* **Swap LLMs**: Replace GPT\-4o with Claude or Llama by changing the llm parameter in agent\_setup.py. Test prompt compatibility and adjust schema examples if needed. Each model has its quirks.
* **Add RAG for schema context**: If your schema is large or dynamic, embed schema documentation and retrieve relevant subsets before query generation. Check out our guide to structured data extraction with LLMs for best practices on deterministic outputs.
* **Integrate with external APIs**: Extend the agent with additional tools \- REST API calls, database lookups, whatever you need to enrich query results or trigger actions based on graph insights.
* **Deploy to production**: Containerize the app with Docker, set up environment\-specific configs, and deploy to your cloud platform of choice (AWS, GCP, Azure). Use a reverse proxy like nginx and enable HTTPS. Don't forget this last part.
* **Build stateful workflows**: For more complex agent behaviors, explore building a stateful AI agent with LangGraph to manage multi\-turn conversations and conditional logic. This opens up a whole new world of possibilities.
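Before reaching for embeddings, you can prototype the schema\-retrieval idea with plain word overlap. The sketch below is a deliberately simple stand\-in for embedding similarity: `select_schema_lines` is a hypothetical helper, and the `Studio` node and `PRODUCED` relationship are made\-up schema lines added only to show irrelevant parts being filtered out.

```python
import re
from typing import List, Set


def _words(text: str) -> Set[str]:
    # Lowercase word tokens with a crude plural strip ("movies" -> "movie").
    return {
        w[:-1] if w.endswith("s") and len(w) > 3 else w
        for w in re.findall(r"[a-z]+", text.lower())
    }


def select_schema_lines(question: str, schema_lines: List[str], top_k: int = 4) -> List[str]:
    """Rank schema lines by word overlap with the question (stand-in for embedding search)."""
    q_words = _words(question)
    scored = [(len(q_words & _words(line)), i, line) for i, line in enumerate(schema_lines)]
    # Highest overlap first; original order breaks ties so results are deterministic.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [line for score, _, line in scored[:top_k] if score > 0]


schema_lines = [
    "Person(name, age)",
    "Movie(title, year)",
    "Studio(name, founded)",  # hypothetical extra node
    "(Person)-[:ACTED_IN]->(Movie)",
    "(Person)-[:DIRECTED]->(Movie)",
    "(Studio)-[:PRODUCED]->(Movie)",  # hypothetical extra relationship
]

selected = select_schema_lines(
    "Which movies did each person act in, and what year were they released?", schema_lines
)
print("\n".join(selected))
```

Swapping the overlap score for cosine similarity between embeddings keeps the same interface, so the rest of the pipeline only ever sees a shorter schema string.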