# Knowledge Graphs and RAG with Neo4j and LangChain v1.x

This notebook explores:
- Why Knowledge Graphs matter for RAG systems
- Neo4j fundamentals and Cypher query language
- Building RAG systems with structured knowledge
- Advanced patterns using LangChain v1.x agents

## Part 1: Why Knowledge Graphs?

### The Core Difference
Traditional RAG stores information as text chunks. Knowledge Graphs store information as **relationships between entities**.

### Key Components
1. **Nodes (Entities)**: Things that exist - people, places, concepts
2. **Edges (Relations)**: How things connect - FOUNDED, ACTED_IN, DIRECTED
3. **Properties (Attributes)**: Details about nodes/edges - {name: "Alice", age: 30}

### The Problem Knowledge Graphs Solve

**Traditional RAG:** Query "Which companies founded by Stanford dropouts are now worth over $1 trillion?"
- Embed question, find similar documents
- Hope the right chunks are retrieved
- LLM must infer relationships from disconnected text

**Knowledge Graph RAG:** Same query
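- A direct Cypher query over a graph that models these facts, e.g. with a hypothetical schema: `MATCH (p:Person)-[:ATTENDED {graduated: false}]->(:University {name: "Stanford"}), (p)-[:FOUNDED]->(c:Company) WHERE c.valuation > 1e12 RETURN c.name`
- The relationships are stored explicitly, so nothing has to be inferred from text
- The query runs over structured knowledge, not text fragments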

## Part 2: Environment Setup

In [1]:
import os
from dotenv import load_dotenv

# Copy variables from a local .env file into os.environ
load_dotenv()

# Fail fast with a clear message if a required variable is missing
required_vars = ["GROQ_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"]
missing = [name for name in required_vars if not os.getenv(name)]
if missing:
    raise EnvironmentError(f"Missing environment variables: {', '.join(missing)}")

## Part 3: Neo4j Fundamentals

### Cypher Query Language Basics

**Node Syntax:** `(variable:Label {property: value})`
- `()` defines a node
- `:Label` categorizes the node type
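- `{properties}` filters on node attributes in `MATCH`, or assigns them in `CREATE`/`MERGE`

**Relationship Syntax:** `-[:TYPE {property: value}]->`
- `-[...]-` connects two nodes; a bare `--` matches any relationship in either direction
- `->` or `<-` fixes the direction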
- `[:TYPE]` categorizes the relationship

**Pattern Matching:** Describe what you're looking for
```cypher
MATCH (person:Person)-[:ACTED_IN]->(movie:Movie)
WHERE movie.title = "Casino"
RETURN person.name
```
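
In plain Python terms, a `MATCH` pattern acts like a filter over stored relationships. Below is a rough analogue of the query above, assuming relationships are held as hypothetical `(start, type, end)` triples of toy data. It only illustrates the semantics; Neo4j itself relies on its storage engine, indexes, and query planner rather than a linear scan.

```python
# Hypothetical toy relationships stored as (start node, relationship type, end node)
triples = [
    ("Robert De Niro", "ACTED_IN", "Casino"),
    ("Joe Pesci", "ACTED_IN", "Casino"),
    ("Martin Scorsese", "DIRECTED", "Casino"),
    ("Tom Hanks", "ACTED_IN", "Toy Story"),
]


def match_actors(triples, title):
    """Analogue of MATCH (person:Person)-[:ACTED_IN]->(movie:Movie)
    WHERE movie.title = title RETURN person.name."""
    return [start for start, rel_type, end in triples if rel_type == "ACTED_IN" and end == title]


print(match_actors(triples, "Casino"))  # ['Robert De Niro', 'Joe Pesci']
```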

In [2]:
# Connect to Neo4j (Neo4jGraph now lives in the langchain-neo4j package)
from langchain_neo4j import Neo4jGraph

graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD")
)

print("Connected to Neo4j successfully")


Connected to Neo4j successfully


## Part 4: Load Movie Dataset into Neo4j

We'll load a movies dataset with actors, directors, and genres to demonstrate graph capabilities.

In [3]:
# Clear existing data (optional, destructive): deletes every node and relationship in the database
graph.query("MATCH (n) DETACH DELETE n")
print("Existing data cleared")

Existing data cleared


In [4]:
# Load movies dataset from CSV
# Creates Movie nodes with properties, Person nodes for actors/directors, Genre nodes
# Establishes ACTED_IN, DIRECTED, and IN_GENRE relationships

movie_query = """
LOAD CSV WITH HEADERS FROM
'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv' as row

MERGE (m:Movie {id: row.movieId})
SET m.released = date(row.released),
    m.title = row.title,
    m.imdbRating = toFloat(row.imdbRating)

FOREACH (director in split(row.director, '|') | 
    MERGE (p:Person {name:trim(director)})
    MERGE (p)-[:DIRECTED]->(m))

FOREACH (actor in split(row.actors, '|') | 
    MERGE (p:Person {name:trim(actor)})
    MERGE (p)-[:ACTED_IN]->(m))

FOREACH (genre in split(row.genres, '|') | 
    MERGE (g:Genre {name:trim(genre)})
    MERGE (m)-[:IN_GENRE]->(g))
"""

graph.query(movie_query)
print("Movie dataset loaded successfully")

Movie dataset loaded successfully
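
The `FOREACH ... MERGE` pattern in this query splits the pipe-delimited columns, trims each name, and reuses an existing node when one with the same key already exists. A rough plain-Python analogue follows, with sets standing in for `MERGE`, run on two made-up sample rows that use a subset of the CSV's column names (`movieId`, `title`, `director`, `actors`, `genres`). Skipping empty names is a choice made in this sketch, not something the Cypher above does.

```python
import csv
import io

# Made-up sample rows using a subset of the CSV's columns (illustration only)
SAMPLE_CSV = """movieId,title,director,actors,genres
1,Casino,Martin Scorsese,Robert De Niro|Joe Pesci,Crime|Drama
2,Taxi Driver,Martin Scorsese,Robert De Niro|Jodie Foster,Crime|Drama
"""


def split_names(field):
    """Mimic split(field, '|') plus trim(); empty entries are skipped (a choice made here)."""
    return [name.strip() for name in field.split("|") if name.strip()]


def load_movies(csv_text):
    movies, people, genres, edges = {}, set(), set(), set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        movie_id = row["movieId"]
        movies[movie_id] = row["title"]  # MERGE on id, then SET the title
        # Sets behave like MERGE here: adding an existing node or edge changes nothing
        for director in split_names(row["director"]):
            people.add(director)
            edges.add((director, "DIRECTED", movie_id))
        for actor in split_names(row["actors"]):
            people.add(actor)
            edges.add((actor, "ACTED_IN", movie_id))
        for genre in split_names(row["genres"]):
            genres.add(genre)
            edges.add((movie_id, "IN_GENRE", genre))
    return movies, people, genres, edges


movies, people, genres, edges = load_movies(SAMPLE_CSV)
print(len(movies), len(people), len(genres), len(edges))  # 2 4 2 10
```

Martin Scorsese and Robert De Niro each appear in both rows but end up as a single person entry, which is the deduplication that `MERGE` provides in the database.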


In [5]:
# Refresh schema to get current graph structure
graph.refresh_schema()
print("Graph Schema:")
print(graph.schema)

Graph Schema:
Node properties:
Movie {id: STRING, released: DATE, title: STRING, imdbRating: FLOAT}
Person {name: STRING}
Genre {name: STRING}
Relationship properties:

The relationships:
(:Movie)-[:IN_GENRE]->(:Genre)
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)


## Part 5: Basic Cypher Query Examples

In [6]:
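# Example 1: Top 5 movies by IMDb rating
# (Cypher sorts nulls first in DESC order, so unrated movies are filtered out)
result = graph.query("""
MATCH (m:Movie)
WHERE m.imdbRating IS NOT NULL
RETURN m.title, m.released, m.imdbRating
ORDER BY m.imdbRating DESC
LIMIT 5
""")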

print("Top 5 rated movies:")
for movie in result:
    print(f"{movie['m.title']} ({movie['m.released']}) - Rating: {movie['m.imdbRating']}")

Top 5 rated movies:
Shawshank Redemption, The (1994-10-14) - Rating: 9.3
Pulp Fiction (1994-10-14) - Rating: 8.9
Star Wars: Episode IV - A New Hope (1977-05-25) - Rating: 8.7
Seven (a.k.a. Se7en) (1995-09-22) - Rating: 8.6
Usual Suspects, The (1995-09-15) - Rating: 8.6


In [7]:
# Example 2: Find actors in a specific movie
result = graph.query("""
MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Casino'})
RETURN p.name
""")

print("Actors in Casino:")
for actor in result:
    print(f"- {actor['p.name']}")

Actors in Casino:
- Robert De Niro
- Joe Pesci
- Sharon Stone
- James Woods
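
Hard-coding a value such as `'Casino'` into the query string is fine for a demo, but for user-supplied values Cypher parameters (`$title`) are safer and sidestep quoting problems with titles such as Schindler's List. `Neo4jGraph.query` accepts a `params` dictionary. In the sketch below, `actors_in_movie` and the `FakeGraph` stub are hypothetical helpers so the example runs without a database; with the real connection you would call `actors_in_movie(graph, "Casino")`.

```python
def actors_in_movie(graph, title: str) -> list:
    """Run the Example 2 pattern with the title passed as a $title parameter."""
    rows = graph.query(
        "MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: $title}) RETURN p.name AS name",
        params={"title": title},
    )
    return [row["name"] for row in rows]


class FakeGraph:
    """Hypothetical stand-in for Neo4jGraph so the sketch runs without a database."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = []

    def query(self, query, params=None):
        self.calls.append((query, params))
        return self.rows


fake_graph = FakeGraph([{"name": "Robert De Niro"}, {"name": "Joe Pesci"}])
print(actors_in_movie(fake_graph, "Casino"))

# A title containing an apostrophe is passed as data, so no Cypher escaping is needed
actors_in_movie(fake_graph, "Schindler's List")
print(fake_graph.calls[-1][1])
```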


In [8]:
# Example 3: Find director and actors for a movie
result = graph.query("""
MATCH (d:Person)-[:DIRECTED]->(m:Movie {title: 'Casino'})
OPTIONAL MATCH (a:Person)-[:ACTED_IN]->(m)
RETURN d.name AS director, collect(a.name) AS actors
""")

print("Casino details:")
for row in result:
    print(f"Director: {row['director']}")
    print(f"Actors: {', '.join(row['actors'][:3])}...")

Casino details:
Director: Martin Scorsese
Actors: Robert De Niro, Joe Pesci, Sharon Stone...


In [9]:
# Example 4: Multi-hop query - Find co-actors (actors who worked with Tom Hanks)
result = graph.query("""
MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(coactor:Person)
WHERE tom <> coactor
RETURN DISTINCT coactor.name AS coactor, m.title AS movie
LIMIT 5
""")

print("Actors who worked with Tom Hanks:")
for row in result:
    print(f"- {row['coactor']} in '{row['movie']}'")

Actors who worked with Tom Hanks:
- Jim Varney in 'Toy Story'
- Tim Allen in 'Toy Story'
- Don Rickles in 'Toy Story'
- Kevin Bacon in 'Apollo 13'
- Bill Paxton in 'Apollo 13'
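
The co-actor query is a two-hop traversal: out along `ACTED_IN` to a movie, then back along `ACTED_IN` to the other people in that movie. The same logic in plain Python, over hypothetical toy edges (the two index dictionaries play the role of the stored relationships):

```python
from collections import defaultdict

# Hypothetical toy ACTED_IN edges as (actor, movie) pairs
acted_in = [
    ("Tom Hanks", "Toy Story"),
    ("Tim Allen", "Toy Story"),
    ("Tom Hanks", "Apollo 13"),
    ("Kevin Bacon", "Apollo 13"),
    ("Robert De Niro", "Casino"),
]


def coactors(edges, name):
    """Two-hop traversal: (name)-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(other)."""
    cast = defaultdict(set)         # movie -> actors in it
    filmography = defaultdict(set)  # actor -> movies they acted in
    for actor, movie in edges:
        cast[movie].add(actor)
        filmography[actor].add(movie)

    pairs = set()  # a set mirrors RETURN DISTINCT
    for movie in filmography.get(name, ()):  # hop 1: actor -> movie
        for other in cast[movie]:            # hop 2: movie <- co-actor
            if other != name:                # WHERE tom <> coactor
                pairs.add((other, movie))
    return sorted(pairs)


print(coactors(acted_in, "Tom Hanks"))
```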


In [10]:
# Example 5: Aggregation - Count movies per genre
result = graph.query("""
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)
RETURN g.name AS genre, COUNT(m) AS movie_count
ORDER BY movie_count DESC
LIMIT 5
""")

print("Top genres by movie count:")
for row in result:
    print(f"{row['genre']}: {row['movie_count']} movies")

Top genres by movie count:
Drama: 162 movies
Comedy: 104 movies
Romance: 63 movies
Thriller: 55 movies
Action: 46 movies
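
In the aggregation above, `g.name` becomes a grouping key simply because it appears next to the aggregate `COUNT(m)`; Cypher has no separate `GROUP BY` clause. A plain-Python analogue with `collections.Counter`, on hypothetical toy data:

```python
from collections import Counter

# Hypothetical toy IN_GENRE edges as (movie, genre) pairs
in_genre = [
    ("Casino", "Crime"),
    ("Casino", "Drama"),
    ("Heat", "Crime"),
    ("Toy Story", "Comedy"),
]


def movies_per_genre(edges, limit=5):
    """Analogue of RETURN g.name, COUNT(m) ORDER BY COUNT(m) DESC LIMIT limit.

    Counter groups by genre and counts, like Cypher's implicit grouping.
    """
    counts = Counter(genre for _movie, genre in edges)
    return counts.most_common(limit)


print(movies_per_genre(in_genre))
```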


## Part 6: Initialize LLM and Embeddings

In [25]:
# Initialize Groq LLM
from langchain_groq import ChatGroq

llm = ChatGroq(
    model="openai/gpt-oss-120b",
    temperature=0
)

print("Groq LLM initialized")

Groq LLM initialized


In [12]:
# Initialize HuggingFace embeddings for vector operations
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={'normalize_embeddings': True}
)

print("HuggingFace embeddings initialized")

HuggingFace embeddings initialized
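
`normalize_embeddings=True` scales every embedding to unit length. For unit vectors the dot product equals cosine similarity, and squared Euclidean distance equals 2 - 2 * cosine, so ranking by any of the three gives the same order. That matters because, as far as we know, Chroma's default distance is squared L2 rather than cosine. A small stdlib check with made-up 2-D vectors:

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (what normalize_embeddings=True does per embedding)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


a, b = [3.0, 4.0], [4.0, 3.0]
unit_a, unit_b = l2_normalize(a), l2_normalize(b)

dot_of_units = sum(x * y for x, y in zip(unit_a, unit_b))
squared_l2 = sum((x - y) ** 2 for x, y in zip(unit_a, unit_b))

print(round(cosine_similarity(a, b), 6))  # 0.96
print(round(dot_of_units, 6))             # 0.96: same as cosine for unit vectors
print(round(squared_l2, 6))               # 0.08 = 2 - 2 * 0.96
```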


## Part 7: Traditional Text-to-Cypher with GraphCypherQAChain

GraphCypherQAChain automatically:
1. Analyzes the graph schema
2. Generates Cypher queries from natural language
3. Executes queries against Neo4j
4. Formats results into natural language
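
The four steps can be sketched as a plain function with each stage passed in as a callable. This is a hypothetical simplification, not `GraphCypherQAChain`'s source; the real chain adds its own prompt templates and options such as `top_k`. The stubs let the sketch run without an LLM or a database.

```python
from typing import Callable, Dict, List


def text_to_cypher_qa(
    question: str,
    schema: str,
    generate_cypher: Callable[[str, str], str],
    run_query: Callable[[str], List[dict]],
    summarize: Callable[[str, List[dict]], str],
) -> Dict[str, object]:
    """Hypothetical four-step text-to-Cypher pipeline with pluggable steps."""
    # 1. The schema string (e.g. graph.schema) tells the model which labels exist
    # 2. An LLM turns question + schema into a Cypher query
    cypher = generate_cypher(question, schema)
    # 3. The query is executed against the graph database
    rows = run_query(cypher)
    # 4. An LLM phrases the returned rows as a natural-language answer
    answer = summarize(question, rows)
    return {"query": question, "cypher": cypher, "context": rows, "result": answer}


# Stub steps so the sketch runs without an LLM or a database
demo = text_to_cypher_qa(
    "Who directed Casino?",
    "(:Person)-[:DIRECTED]->(:Movie)",
    generate_cypher=lambda q, s: "MATCH (p:Person)-[:DIRECTED]->(:Movie {title: 'Casino'}) RETURN p.name",
    run_query=lambda cypher: [{"p.name": "Martin Scorsese"}],
    summarize=lambda q, rows: f"{rows[0]['p.name']} directed Casino.",
)
print(demo["result"])
```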

In [13]:
from langchain_neo4j import GraphCypherQAChain

# Create basic text-to-cypher chain
cypher_chain = GraphCypherQAChain.from_llm(
    graph=graph,
    llm=llm,
    verbose=True,
    allow_dangerous_requests=True  # Must be True: acknowledges that LLM-generated Cypher runs against your database
)

print("GraphCypherQAChain created")

GraphCypherQAChain created
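
Setting `allow_dangerous_requests=True` means whatever Cypher the LLM writes is executed. The LangChain docs recommend connecting with narrowly scoped credentials (ideally read-only). As an extra, purely illustrative layer, a keyword check can reject obviously write-capable queries before they run. The regex below is a hypothetical heuristic: it is easy to bypass (for example via procedures invoked with `CALL`) and no substitute for database permissions.

```python
import re

# Keywords that write to the graph or pull in external data (not an exhaustive list)
WRITE_KEYWORDS = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV)\b",
    re.IGNORECASE,
)
# Single- or double-quoted string literals, allowing backslash escapes inside
STRING_LITERAL = re.compile(r"'(?:\\.|[^'\\])*'" + r'|"(?:\\.|[^"\\])*"')


def looks_read_only(cypher: str) -> bool:
    """Crude heuristic: True if no write keyword appears outside string literals."""
    code_only = STRING_LITERAL.sub("", cypher)  # so a title like 'Set It Off' is ignored
    return WRITE_KEYWORDS.search(code_only) is None


print(looks_read_only("MATCH (m:Movie {title: 'Casino'}) RETURN m.title"))  # True
print(looks_read_only("MATCH (n) DETACH DELETE n"))                          # False
```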


In [14]:
# Test query 1: Who directed Casino?
response = cypher_chain.invoke({"query": "Who was the director of the movie Casino?"})
print(f"\nAnswer: {response['result']}")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:DIRECTED]->(m:Movie {title: 'Casino'}) RETURN p.name
Full Context:
[{'p.name': 'Martin Scorsese'}]

> Finished chain.

Answer: Martin Scorsese was the director of the movie Casino.


In [15]:
# Test query 2: How many movies did Tom Hanks act in?
response = cypher_chain.invoke({"query": "How many movies has Tom Hanks acted in?"})
print(f"\nAnswer: {response['result']}")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)
Full Context:
[{'count(m)': 2}]

> Finished chain.

Answer: Tom Hanks has acted in 2 movies.


In [16]:
# Test query 3: Multi-hop reasoning
response = cypher_chain.invoke({"query": "What are the genres of movies that Robert De Niro acted in?"})
print(f"\nAnswer: {response['result']}")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: 'Robert De Niro'})-[:ACTED_IN]->(m:Movie)-[:IN_GENRE]->(g:Genre) RETURN g.name
Full Context:
[{'g.name': 'Action'}, {'g.name': 'Crime'}, {'g.name': 'Thriller'}, {'g.name': 'Drama'}, {'g.name': 'Crime'}, {'g.name': 'Drama'}, {'g.name': 'Horror'}, {'g.name': 'Sci-Fi'}]

> Finished chain.

Answer: Robert De Niro acted in movies of the following genres: Action, Crime, Thriller, Drama, Horror, Sci-Fi.
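
The context lists Crime and Drama twice because the query returns one row per matched (movie, genre) pair. Adding `DISTINCT` to the Cypher (`RETURN DISTINCT g.name`) deduplicates inside the database. When post-processing rows in Python instead, an order-preserving deduplication of the rows shown in the Full Context output looks like this:

```python
# Rows copied from the "Full Context" output of the query above
rows = [
    {"g.name": "Action"}, {"g.name": "Crime"}, {"g.name": "Thriller"}, {"g.name": "Drama"},
    {"g.name": "Crime"}, {"g.name": "Drama"}, {"g.name": "Horror"}, {"g.name": "Sci-Fi"},
]

# dict.fromkeys keeps the first occurrence of each genre and preserves order
unique_genres = list(dict.fromkeys(row["g.name"] for row in rows))
print(unique_genres)  # ['Action', 'Crime', 'Thriller', 'Drama', 'Horror', 'Sci-Fi']
```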


## Part 8: Improved Text-to-Cypher with Few-Shot Examples

Few-shot prompting improves Cypher generation accuracy by showing the LLM example patterns.

In [35]:
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Define example question-query pairs with ESCAPED curly braces in Cypher
examples = [
    {
        "question": "How many artists are there?",
        "query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)"
    },
    {
        "question": "Which actors played in the movie Casino?",
        # Double curly braces to escape: {{title: 'Casino'}}
        "query": "MATCH (m:Movie {{title: 'Casino'}})<-[:ACTED_IN]-(a) RETURN a.name"
    },
    {
        "question": "How many movies has Tom Hanks acted in?",
        # Escape: {{name: 'Tom Hanks'}}
        "query": "MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)"
    },
    {
        "question": "List all the genres of the movie Schindler's List",
        # Cypher escapes the apostrophe as \' (written "\\'" in this Python string); braces are doubled for the template
        "query": "MATCH (m:Movie {{title: 'Schindler\\'s List'}})-[:IN_GENRE]->(g:Genre) RETURN g.name"
    },
    {
        "question": "Which actors have worked in movies from both the comedy and action genres?",
        "query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name"
    }
]

# Create example prompt template
example_prompt = PromptTemplate.from_template(
    "User input: {question}\nCypher query: {query}"
)

# Create few-shot prompt with schema included
few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="""You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query.

Schema:
{schema}

Here are some examples:""",
    suffix="User input: {question}\nCypher query: ",
    input_variables=["question", "schema"]
)

print("Few-shot prompt template created")
print(few_shot_prompt.format(question="How many artists are there?", schema="foo"))

Few-shot prompt template created
You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query.

Schema:
foo

Here are some examples:

User input: How many artists are there?
Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)

User input: Which actors played in the movie Casino?
Cypher query: MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(a) RETURN a.name

User input: How many movies has Tom Hanks acted in?
Cypher query: MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)

User input: List all the genres of the movie Schindler's List
Cypher query: MATCH (m:Movie {title: 'Schindler\'s List'})-[:IN_GENRE]->(g:Genre) RETURN g.name

User input: Which actors have worked in movies from both the comedy and action genres?
Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT 
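
Why double the braces? As far as we can tell from LangChain's behavior, `FewShotPromptTemplate` renders each example with `example_prompt`, joins the results with `prefix` and `suffix` into one template, and then formats that combined template with the input variables. The example text therefore passes through f-string-style formatting a second time. This stdlib sketch imitates the two passes with `str.format`; the variable names are hypothetical and it is not LangChain's code.

```python
# Hypothetical example mirroring one entry in `examples`
example = {
    "question": "Which actors played in the movie Casino?",
    "query": "MATCH (m:Movie {{title: 'Casino'}})<-[:ACTED_IN]-(a) RETURN a.name",
}
prefix = "Schema:\n{schema}\n\nHere are some examples:"
suffix = "User input: {question}\nCypher query: "

# Pass 1: render the example; substituted values are not re-parsed, so {{ survives
example_text = "User input: {question}\nCypher query: {query}".format(**example)

# Pass 2: join everything into one template and format it again; {{ becomes {
final_prompt = "\n\n".join([prefix, example_text, suffix]).format(
    schema="foo", question="Who directed Casino?"
)
print(final_prompt)

# Without doubling, pass 2 treats {title: 'Casino'} as a placeholder named "title"
try:
    "MATCH (m:Movie {title: 'Casino'})".format(schema="foo", question="...")
except KeyError as err:
    print("KeyError:", err)
```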

In [36]:
# Create improved chain with few-shot examples
improved_chain = GraphCypherQAChain.from_llm(
    graph=graph,
    llm=llm,
    cypher_prompt=few_shot_prompt,
    verbose=True,
    allow_dangerous_requests=True
)

print("Improved GraphCypherQAChain created with few-shot examples")

Improved GraphCypherQAChain created with few-shot examples


In [37]:
# Test improved chain
response = improved_chain.invoke({"query": "Which actors played in the movie Casino?"})
print(f"\nAnswer: {response['result']}")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(a:Person)
RETURN a.name
Full Context:
[{'a.name': 'Robert De Niro'}, {'a.name': 'Joe Pesci'}, {'a.name': 'Sharon Stone'}, {'a.name': 'James Woods'}]

> Finished chain.

Answer: Robert De Niro, Joe Pesci, Sharon Stone, James Woods played in the movie Casino.


## Part 9: Hybrid RAG - Combining Vector Search with Graph Queries

For complex questions, we can combine:
1. Vector similarity search on unstructured text
2. Structured graph traversal on relationships

Vector search supplies descriptive background text, while graph queries supply exact facts and relationships.

In [38]:
# Load Wikipedia articles for vector search (general film background, not per-movie plot data)
from langchain_community.document_loaders import WikipediaLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Fetch up to 3 Wikipedia pages returned for the search "Film noir"
loader = WikipediaLoader(query="Film noir", load_max_docs=3)
documents = loader.load()

# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
docs = text_splitter.split_documents(documents)

print(f"Loaded {len(documents)} documents, split into {len(docs)} chunks")

Loaded 3 documents, split into 14 chunks
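
With `chunk_size=1000` and `chunk_overlap=200`, each new window starts 800 characters after the previous one, so neighbouring chunks share 200 characters of context. The sketch below is a simplified fixed-size splitter (a hypothetical helper) that only shows this arithmetic. `RecursiveCharacterTextSplitter` first tries to split on separators (by default `"\n\n"`, `"\n"`, `" "`, `""`) and merges the pieces up to `chunk_size`, so its chunks break at natural boundaries, are often shorter, and overlap only approximately.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Fixed-size character windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # 800 with the defaults above
    last_start = max(len(text) - chunk_overlap, 1)  # later windows would lie inside the previous one
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])             # [1000, 1000, 900]
print(chunks[0][-200:] == chunks[1][:200])  # True: 200 shared characters
```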


In [39]:
# Create vector store with ChromaDB
from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="movie_knowledge",
    embedding_function=embeddings,
    persist_directory="./chroma_movie_db"
)

# Add documents to vector store
vector_store.add_documents(docs)

print("Vector store created and populated")

Vector store created and populated
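
Part 10 lets an agent decide which source to consult. A simpler, fixed alternative is to query both sources for every question and merge the results into one context for the LLM. The helper below is a hypothetical sketch of that merge step, with placeholder stubs in place of the real retrievers:

```python
from typing import Callable, List


def hybrid_context(
    question: str,
    vector_search: Callable[[str], List[str]],
    graph_answer: Callable[[str], str],
) -> str:
    """Hypothetical helper: merge graph facts and text passages into one prompt context."""
    sections = []
    facts = graph_answer(question)
    if facts:
        sections.append("Graph facts:\n" + facts)  # precise, structured answer first
    passages = vector_search(question)
    if passages:
        sections.append("Background passages:\n" + "\n---\n".join(passages))
    return "\n\n".join(sections)


# Placeholder stubs stand in for the vector store and the Cypher chain
context = hybrid_context(
    "Is Casino a film noir?",
    vector_search=lambda q: ["<chunk about film noir>", "<another chunk>"],
    graph_answer=lambda q: "<graph answer about Casino>",
)
print(context)
```

With the objects from this notebook, the stubs could be replaced by `lambda q: [d.page_content for d in vector_store.similarity_search(q, k=3)]` and `lambda q: cypher_chain.invoke({"query": q})["result"]`.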


## Part 10: Agent-Based RAG with LangChain v1.x

Using the new `create_agent` pattern, we'll build an agent that can:
1. Search the vector store for contextual information
2. Query the graph database for structured data
3. Combine both sources to answer complex questions

In [40]:
from langchain.tools import tool
from langchain.agents import create_agent

# Tool 1: Search vector store for contextual information
@tool
def search_movie_context(query: str) -> str:
    """Search for general movie information, history, and context."""
    results = vector_store.similarity_search(query, k=3)
    context = "\n\n".join([doc.page_content for doc in results])
    return context

# Tool 2: Query Neo4j graph for structured data
@tool
def query_movie_graph(question: str) -> str:
    """Query the movie database for specific facts about movies, actors, directors, and genres."""
    try:
        response = cypher_chain.invoke({"query": question})
        return response['result']
    except Exception as e:
        return f"Error querying graph: {str(e)}"

print("Tools defined")

Tools defined


In [41]:
# Create agent with both tools
tools = [search_movie_context, query_movie_graph]

agent = create_agent(
    llm,
    tools=tools,
    system_prompt="""You are a movie expert assistant. You have access to:
    1. A vector search tool for general movie knowledge and context
    2. A graph database tool for specific facts about movies, actors, and directors
    
    Use the appropriate tool based on the question:
    - Use vector search for conceptual questions, themes, and general information
    - Use graph database for specific facts, relationships, and structured queries
    - Combine both when needed for comprehensive answers
    
    Be concise and accurate in your responses."""
)

print("Agent created with hybrid RAG capabilities")

Agent created with hybrid RAG capabilities
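
Conceptually, a tool-calling agent repeats a simple loop: send the conversation to the model, run any tool it requests, append the tool output, and stop when the model answers without requesting a tool. The framework-free sketch below shows only that control flow, using a hypothetical scripted model and tool registry. `create_agent` is built on LangGraph with its own message types, tool-call format, and middleware, so this is not its implementation.

```python
from typing import Callable, Dict, List


def run_tool_loop(
    model: Callable[[List[dict]], dict],
    tool_registry: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Simplified tool-calling loop: ask the model, run any requested tool, repeat."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = model(messages)
        if "tool" not in reply:  # no tool requested, so this is the final answer
            return reply["content"]
        tool_output = tool_registry[reply["tool"]](reply["input"])
        messages.append({"role": "tool", "name": reply["tool"], "content": tool_output})
    return "Stopped: no final answer within max_steps."


def scripted_model(messages):
    """Hypothetical stand-in for the LLM: first requests a tool, then answers."""
    if messages[-1]["role"] == "user":
        return {"tool": "query_movie_graph", "input": "Who directed Casino?"}
    return {"content": "Based on the graph: " + messages[-1]["content"]}


fake_tools = {"query_movie_graph": lambda q: "Martin Scorsese directed Casino."}
print(run_tool_loop(scripted_model, fake_tools, "Who directed Casino?"))
```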


In [42]:
# Test 1: Specific graph query
result = agent.invoke({
    "messages": [{"role": "user", "content": "Who directed Casino and who were the main actors?"}]
})

print("\nQuery: Who directed Casino and who were the main actors?")
print(f"Answer: {result['messages'][-1].content}")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:DIRECTED]->(m:Movie {title: "Casino"})
MATCH (a:Person)-[:ACTED_IN]->(m)
RETURN p.name, collect(a.name)
Full Context:
[{'p.name': 'Martin Scorsese', 'collect(a.name)': ['Robert De Niro', 'Joe Pesci', 'Sharon Stone', 'James Woods']}]

> Finished chain.

Query: Who directed Casino and who were the main actors?
Answer: **Director:** Martin Scorsese  

**Main Actors:**  
- Robert De Niro  
- Joe Pesci  
- Sharon Stone  
- James Woods


In [43]:
# Test 2: Contextual question requiring vector search
result = agent.invoke({
    "messages": [{"role": "user", "content": "What are the characteristics of film noir as a genre?"}]
})

print("\nQuery: What are the characteristics of film noir as a genre?")
print(f"Answer: {result['messages'][-1].content}")


Query: What are the characteristics of film noir as a genre?
Answer: **Film noir** is less a rigid “genre” than a stylistic and thematic mode that emerged in American cinema in the early‑1940s and peaked through the late‑1950s. Its most widely‑cited characteristics fall into two groups: visual‑style elements and narrative‑theme elements.

| Visual‑style hallmarks | Narrative‑theme hallmarks |
|------------------------|---------------------------|
| **Low‑key, high‑contrast lighting** (chiaroscuro) that creates deep shadows and striking light‑and‑dark patterns. | **Moral ambiguity** – protagonists are often cynical, world‑weary anti‑heroes or reluctant criminals. |
| **Oblique or “Dutch” camera angles** and unconventional framing that heighten disorientation. | **Fatalism & existential dread** – characters feel trapped by fate, circumstance, or a corrupt society. |
| **Urban night settings** – rain‑slick streets, neon signs, fog, smoke‑filled interiors, and cramped alleys. | **Crime‑ce

In [44]:
# Test 3: Hybrid query requiring both tools
result = agent.invoke({
    "messages": [{"role": "user", "content": "Is Casino a film noir? Explain why or why not, and who were involved in making it."}]
})

print("\nQuery: Is Casino a film noir? Explain why or why not, and who were involved in making it.")
print(f"Answer: {result['messages'][-1].content}")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "Casino", released: "1995-11-22"})
OPTIONAL MATCH (m)<-[:DIRECTED]-(d:Person)
OPTIONAL MATCH (m)<-[:ACTED_IN]-(a:Person)
OPTIONAL MATCH (m)-[:IN_GENRE]->(g:Genre)
RETURN m.title, collect(d.name) AS directors, collect(a.name) AS main_cast, collect(g.name) AS genres
Full Context:
[]

> Finished chain.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "Casino", released: "1995"})-[:IN_GENRE]->(g:Genre)
MATCH (m)-[:DIRECTED]->(d:Person)
MATCH (m)-[:ACTED_IN]->(a:Person)
RETURN g.name, collect(d.name), collect(a.name)
Full Context:
[]

> Finished chain.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "Casino", released: "1995-01-01"})
MATCH (p1:Person {name: "Martin Scorsese"}) 
MATCH (p2:Person {name: "Rob
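
The first two generated queries above return an empty context, most likely because they filter `released`, which the schema types as `DATE`, with string literals (`"1995-11-22"`, `"1995"`): a DATE property never matches a string, so no movie is found. The second query also reverses the `DIRECTED` and `ACTED_IN` directions relative to the schema. Matching on `m.released = date("1995-11-22")` or `m.released.year = 1995` (the form a later generated query uses) avoids the type mismatch, and `GraphCypherQAChain.from_llm` has a `validate_cypher` option intended to correct relationship directions; check it against the documentation for your installed version. The same type pitfall in plain Python, with a hypothetical stored date:

```python
from datetime import date

# Hypothetical stored value: the schema types Movie.released as DATE
released = date(1995, 11, 22)

print(released == "1995-11-22")                      # False: a date never equals a string
print(released == date.fromisoformat("1995-11-22"))  # True: compare date with date
print(released.year == 1995)                         # True: compare one component
```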

## Part 11: Advanced Pattern - Multi-Step Reasoning with State

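For complex queries requiring multiple steps, we can extend the agent state with custom fields. Note that the tools defined above return plain strings and never write to these fields, so the fields are only carried through the run; the system prompt's request to "store results in the state" cannot be acted on as written. Populating them would require tools that return explicit state updates (check the LangChain docs for the mechanism in your version).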

In [45]:
# AgentState is itself a TypedDict that already declares the `messages` key
from langchain.agents import AgentState

# Define custom state for complex queries
class MovieResearchState(AgentState):
    query: str
    graph_results: str
    vector_results: str
    analysis: str

# Create agent with custom state
research_agent = create_agent(
    llm,
    tools=tools,
    state_schema=MovieResearchState,
    system_prompt="""You are conducting comprehensive movie research.
    Use both tools systematically and store results in the state.
    Provide thorough, well-researched answers."""
)

print("Research agent with custom state created")

Research agent with custom state created
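
`MovieResearchState` declares `query`, `graph_results`, `vector_results`, and `analysis` as ordinary TypedDict keys, which a static type checker treats as required; passing empty strings at `invoke` time keeps the input consistent with that. Declaring extra keys in a `total=False` class makes them optional instead. The stdlib illustration below uses hypothetical class names; whether `create_agent` behaves any differently at runtime with optional keys is an assumption to verify.

```python
from typing import TypedDict


class BaseState(TypedDict):
    """Simplified stand-in for AgentState (the real one types messages more precisely)."""
    messages: list


class StrictResearchState(BaseState):
    graph_results: str  # required: a type checker expects callers to supply it


class LenientResearchState(BaseState, total=False):
    graph_results: str  # optional: may be left out of the input


print(sorted(StrictResearchState.__required_keys__))   # ['graph_results', 'messages']
print(sorted(LenientResearchState.__optional_keys__))  # ['graph_results']
```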


In [46]:
# Complex research query
result = research_agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Research Martin Scorsese's crime films: What patterns do you see in his collaborations and genre choices?"
    }],
    "query": "",
    "graph_results": "",
    "vector_results": "",
    "analysis": ""
})

print("\nResearch Query: Martin Scorsese's crime films")
print(f"Answer: {result['messages'][-1].content}")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
cypher
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre {name: 'Crime'})
MATCH (m)<-[:DIRECTED]-(d:Person {name: 'Martin Scorsese'})
MATCH (m)<-[:ACTED_IN]-(a:Person)
RETURN m.released.year AS year, collect(a.name) AS actors,
       'Not available' AS writer, 'Not available' AS composer, 'Not available' AS cinematographer
Full Context:
[{'year': 1995, 'actors': ['Robert De Niro', 'Joe Pesci', 'Sharon Stone', 'James Woods'], 'writer': 'Not available', 'composer': 'Not available', 'cinematographer': 'Not available'}, {'year': 1976, 'actors': ['Diahnne Abbott', 'Gino Ardito', 'Frank Adu', 'Victor Argo'], 'writer': 'Not available', 'composer': 'Not available', 'cinematographer': 'Not available'}]

> Finished chain.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre), (p:Person)-[:DIRECTED]->(m)
WHERE p.name = 'Mart

## Part 12: Performance Comparison - Traditional RAG vs Knowledge Graph RAG

In [47]:
# Test queries that benefit from graph structure
test_queries = [
    "What movies did Robert De Niro and Joe Pesci both act in?",
    "How many movies did Tom Hanks act in?",
    "Who directed the highest rated movie?"
]

print("Comparing Traditional RAG vs Graph RAG:\n")

for query in test_queries:
    print(f"Query: {query}")
    
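    # Traditional vector search (the store holds only film-noir articles, so these
    # structured questions have no relevant chunk to retrieve; results are not printed)
    vector_results = vector_store.similarity_search(query, k=1)
    print("Traditional RAG: Limited to text chunks, may miss structured relationships")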
    
    # Graph query
    graph_result = cypher_chain.invoke({"query": query})
    print(f"Graph RAG: {graph_result['result']}")
    print("-" * 80)
    print()

Comparing Traditional RAG vs Graph RAG:

Query: What movies did Robert De Niro and Joe Pesci both act in?
Traditional RAG: Limited to text chunks, may miss structured relationships


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p1:Person {name: 'Robert De Niro'})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(p2:Person {name: 'Joe Pesci'}) RETURN m.title
Full Context:
[{'m.title': 'Casino'}]

> Finished chain.
Graph RAG: Robert De Niro and Joe Pesci both acted in Casino.
--------------------------------------------------------------------------------

Query: How many movies did Tom Hanks act in?
Traditional RAG: Limited to text chunks, may miss structured relationships


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)
Full Context:
[{'count(m)': 2}]

> Finished chain.
Graph RAG: Tom Hank

## Part 13: Key Takeaways

### Why Knowledge Graphs Matter
1. **Direct relationship queries**: No need to infer connections from text
2. **Multi-hop reasoning**: Traverse complex relationships efficiently
3. **Structured knowledge**: Combine with vector search for comprehensive RAG

### When to Use Knowledge Graphs
- Queries about relationships ("who worked with whom")
- Multi-step reasoning ("friends of friends")
- Aggregations ("count movies per genre")
- Path finding ("shortest connection between two people")
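
Path finding is breadth-first search at heart: in an unweighted graph, BFS reaches each node by a fewest-hop route. Cypher exposes this through `shortestPath(...)`, for example `shortestPath((a:Person {name: 'Tom Hanks'})-[*]-(b:Person {name: 'Joe Pesci'}))`. The sketch below runs BFS in plain Python over a hypothetical toy actor-movie graph (illustrative data, not loaded from the movie dataset):

```python
from collections import deque
from typing import Dict, List, Optional, Set


def build_adjacency(pairs):
    """Undirected adjacency sets from (person, movie) ACTED_IN pairs."""
    adjacency: Dict[str, Set[str]] = {}
    for person, movie in pairs:
        adjacency.setdefault(person, set()).add(movie)
        adjacency.setdefault(movie, set()).add(person)
    return adjacency


def shortest_connection(adjacency, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: fewest-hop path from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(adjacency.get(node, ())):  # sorted for deterministic output
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == goal:
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical toy data: (person, movie) pairs
toy_pairs = [
    ("Tom Hanks", "Apollo 13"),
    ("Kevin Bacon", "Apollo 13"),
    ("Kevin Bacon", "Sleepers"),
    ("Robert De Niro", "Sleepers"),
    ("Robert De Niro", "Casino"),
    ("Joe Pesci", "Casino"),
]
adjacency = build_adjacency(toy_pairs)
print(" -> ".join(shortest_connection(adjacency, "Tom Hanks", "Joe Pesci")))
```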

### When to Use Vector Search
- Semantic similarity ("find similar concepts")
- Unstructured content ("search documents")
- Fuzzy matching ("approximate search")

### Best of Both Worlds
Combine Knowledge Graphs + Vector Search for:
- Comprehensive context from vectors
- Precise relationships from graphs
- Robust RAG systems that handle diverse queries

### Modern LangChain v1.x Patterns
- Use `create_agent` to orchestrate tools; specialized chains such as `GraphCypherQAChain` can be wrapped as tools
- Define tools with the `@tool` decorator
- Extend `AgentState` (a `TypedDict`) for custom state in complex workflows
- Combine multiple data sources in agent tools