https://python.langchain.com/docs/use_cases/graph/quickstart/

**Start the Movie graph database in Neo4j Desktop**

In [1]:
import os
import warnings

from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.graphs import Neo4jGraph

load_dotenv()  # read MODEL_NAME, TEMPERATURE, GOOGLE_API_KEY (and optionally the Neo4j settings) from .env
warnings.filterwarnings("ignore")  # hide library deprecation warnings in the notebook output

**Load the LLM (e.g., Gemini 2.5 Flash)**

In [2]:
llm = ChatGoogleGenerativeAI(
    model=os.getenv("MODEL_NAME"),                     # e.g. "gemini-2.5-flash"
    temperature=float(os.getenv("TEMPERATURE", "0")),  # env vars are strings, so convert explicitly
    google_api_key=os.getenv("GOOGLE_API_KEY"),
)

**Add Neo4j credentials (this information should be kept secret)**

In [3]:
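# Read the credentials from the environment (e.g. the .env file loaded above);
# the fallbacks are the local values used in this notebook
NEO4J_URI = os.getenv("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USERNAME = os.getenv("NEO4J_USERNAME", "neo4j")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD", "12345678")
NEO4J_DATABASE = os.getenv("NEO4J_DATABASE", "neo4j")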

In [4]:

graph = Neo4jGraph(url=NEO4J_URI, username=NEO4J_USERNAME, password=NEO4J_PASSWORD, database=NEO4J_DATABASE)


**Print the graph database schema**

In [5]:
graph.refresh_schema()
print(graph.schema)

Node properties:
Movie {imdbRating: FLOAT, taglineEmbedding: LIST, tagline: STRING, released: DATE, title: STRING, id: STRING}
Person {name: STRING}
Genre {name: STRING}
Location {name: STRING}
SimilarMovie {name: STRING}
Relationship properties:

The relationships:
(:Movie)-[:IN_GENRE]->(:Genre)
(:Movie)-[:WAS_TAKEN_IN]->(:Location)
(:Movie)-[:IS_SIMILAR_TO]->(:SimilarMovie)
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)


**Questions**

In [6]:
q_one = "What was the cast of the Casino?"
q_two = "What are the most common genres for movies released in 1995?"
q_three = "What are the similar movies to the ones that Tom Hanks acted in?"

### **Chain**

**`Simple Agent (a)`:**

In [7]:
from langchain.chains import GraphCypherQAChain

chain = GraphCypherQAChain.from_llm(
    graph=graph, 
    llm=llm, 
    verbose=True,
    return_intermediate_steps=True,
    allow_dangerous_requests=True
)

In [8]:
response = chain.invoke({"query": q_one})
print(response)
print("\nLLM response:", response["result"])



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mcypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Casino' RETURN p.name
[0m
Full Context:
[32;1m[1;3m[{'p.name': 'James Woods'}, {'p.name': 'Robert De Niro'}, {'p.name': 'Sharon Stone'}, {'p.name': 'Joe Pesci'}][0m

[1m> Finished chain.[0m
{'query': 'What was the cast of the Casino?', 'result': 'The cast of the Casino was James Woods, Robert De Niro, Sharon Stone, Joe Pesci.', 'intermediate_steps': [{'query': "cypher\nMATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Casino' RETURN p.name\n"}, {'context': [{'p.name': 'James Woods'}, {'p.name': 'Robert De Niro'}, {'p.name': 'Sharon Stone'}, {'p.name': 'Joe Pesci'}]}]}

LLM response: The cast of the Casino was James Woods, Robert De Niro, Sharon Stone, Joe Pesci.


In [9]:
response = chain.invoke({"query": q_two})
print(response)
print("\nLLM response:", response["result"])



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mcypher
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)
WHERE m.released.year = 1995
RETURN g.name AS genre, count(m) AS movieCount
ORDER BY movieCount DESC
[0m
Full Context:
[32;1m[1;3m[{'genre': 'Comedy', 'movieCount': 10}, {'genre': 'Adventure', 'movieCount': 6}, {'genre': 'Romance', 'movieCount': 5}, {'genre': 'Action', 'movieCount': 5}, {'genre': 'Children', 'movieCount': 4}, {'genre': 'Drama', 'movieCount': 4}, {'genre': 'Crime', 'movieCount': 3}, {'genre': 'Thriller', 'movieCount': 3}, {'genre': 'Fantasy', 'movieCount': 2}, {'genre': 'Animation', 'movieCount': 2}][0m

[1m> Finished chain.[0m
{'query': 'What are the most common genres for movies released in 1995?', 'result': 'The most common genres for movies released in 1995 are Comedy with 10 movies, Adventure with 6 movies, Romance and Action with 5 movies each, Children and Drama with 4 movies each, Crime and Thriller with 3 movies each, and Fanta

In [10]:
response = chain.invoke({"query": q_three})
print(response)
print("\nLLM response:", response["result"])



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mcypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
MATCH (m)-[:IS_SIMILAR_TO]->(sm:SimilarMovie)
RETURN sm.name
[0m
Full Context:
[32;1m[1;3m[{'sm.name': 'Finding Nemo'}][0m

[1m> Finished chain.[0m
{'query': 'What are the similar movies to the ones that Tom Hanks acted in?', 'result': "I don't know the answer.", 'intermediate_steps': [{'query': "cypher\nMATCH (p:Person)-[:ACTED_IN]->(m:Movie)\nWHERE p.name = 'Tom Hanks'\nMATCH (m)-[:IS_SIMILAR_TO]->(sm:SimilarMovie)\nRETURN sm.name\n"}, {'context': [{'sm.name': 'Finding Nemo'}]}]}

LLM response: I don't know the answer.


**`Simple Agent (b)`:**

**Validating relationship direction**

LLMs can struggle with relationship directions in generated Cypher statements. Since the graph schema is predefined, we can validate and optionally correct the relationship directions in the generated Cypher statements by setting the `validate_cypher` parameter.
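The idea behind direction validation can be illustrated with a toy sketch. This is not LangChain's `CypherQueryCorrector` implementation: it uses a regular expression to find simple single-hop patterns whose nodes both carry a label, and checks each (start label, relationship type, end label) triple against the relationships printed in the schema above. The names `SCHEMA`, `PATTERN` and `check_directions` are hypothetical.

```python
import re

# Hypothetical copy of the relationship triples printed by graph.schema
SCHEMA = {
    ("Movie", "IN_GENRE", "Genre"),
    ("Movie", "WAS_TAKEN_IN", "Location"),
    ("Movie", "IS_SIMILAR_TO", "SimilarMovie"),
    ("Person", "DIRECTED", "Movie"),
    ("Person", "ACTED_IN", "Movie"),
}

# Matches single-hop patterns such as (p:Person)-[:ACTED_IN]->(m:Movie)
# or (m:Movie)<-[:ACTED_IN]-(p:Person); unlabelled nodes like (m) are not matched
PATTERN = re.compile(r"\(\w*:(\w+)\)(<?)-\[\w*:(\w+)\]-(>?)\(\w*:(\w+)\)")


def check_directions(cypher: str) -> list:
    """Return (start, type, end, valid) for each simple pattern found in `cypher`."""
    results = []
    for left, arrow_in, rel, arrow_out, right in PATTERN.findall(cypher):
        if arrow_in and not arrow_out:
            start, end = right, left  # (a)<-[:R]-(b) means b -> a
        elif arrow_out and not arrow_in:
            start, end = left, right  # (a)-[:R]->(b) means a -> b
        else:
            # Undirected or malformed pattern: accept either orientation
            valid = (left, rel, right) in SCHEMA or (right, rel, left) in SCHEMA
            results.append((left, rel, right, valid))
            continue
        results.append((start, rel, end, (start, rel, end) in SCHEMA))
    return results


print(check_directions("MATCH (m:Movie)-[:ACTED_IN]->(p:Person) RETURN p.name"))
# [('Movie', 'ACTED_IN', 'Person', False)]
```

A real corrector would also rewrite the arrow when the reversed triple exists in the schema; the sketch only reports the problem.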

In [11]:
chain = GraphCypherQAChain.from_llm(
    graph=graph, 
    llm=llm, 
    verbose=True, 
    validate_cypher=True,
    return_intermediate_steps=True,
    allow_dangerous_requests=True
)

In [12]:
response = chain.invoke({"query": q_one})
print(response)
print("\nLLM response:", response["result"])



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mcypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Casino' RETURN p.name
[0m
Full Context:
[32;1m[1;3m[{'p.name': 'James Woods'}, {'p.name': 'Robert De Niro'}, {'p.name': 'Sharon Stone'}, {'p.name': 'Joe Pesci'}][0m

[1m> Finished chain.[0m
{'query': 'What was the cast of the Casino?', 'result': 'The cast of the Casino was James Woods, Robert De Niro, Sharon Stone, Joe Pesci.', 'intermediate_steps': [{'query': "cypher\nMATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Casino' RETURN p.name\n"}, {'context': [{'p.name': 'James Woods'}, {'p.name': 'Robert De Niro'}, {'p.name': 'Sharon Stone'}, {'p.name': 'Joe Pesci'}]}]}

LLM response: The cast of the Casino was James Woods, Robert De Niro, Sharon Stone, Joe Pesci.


In [13]:
response = chain.invoke({"query": q_two})
print(response)
print("\nLLM response:", response["result"])



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (m:Movie)-[:IN_GENRE]->(g:Genre)
WHERE m.released.year = 1995
RETURN g.name AS Genre, count(g) AS MovieCount
ORDER BY MovieCount DESC[0m
Full Context:
[32;1m[1;3m[{'Genre': 'Comedy', 'MovieCount': 10}, {'Genre': 'Adventure', 'MovieCount': 6}, {'Genre': 'Romance', 'MovieCount': 5}, {'Genre': 'Action', 'MovieCount': 5}, {'Genre': 'Children', 'MovieCount': 4}, {'Genre': 'Drama', 'MovieCount': 4}, {'Genre': 'Crime', 'MovieCount': 3}, {'Genre': 'Thriller', 'MovieCount': 3}, {'Genre': 'Fantasy', 'MovieCount': 2}, {'Genre': 'Animation', 'MovieCount': 2}][0m

[1m> Finished chain.[0m
{'query': 'What are the most common genres for movies released in 1995?', 'result': 'The most common genres for movies released in 1995 are: Comedy (10), Adventure (6), Romance (5), Action (5), Children (4), Drama (4), Crime (3), Thriller (3), Fantasy (2), and Animation (2).', 'intermediate_steps': [{'query': 'MATCH (m:M

In [14]:
response = chain.invoke({"query": q_three})
print(response)
print("\nLLM response:", response["result"])



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mcypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
MATCH (m)-[:IS_SIMILAR_TO]->(sm:SimilarMovie)
RETURN sm.name
[0m
Full Context:
[32;1m[1;3m[{'sm.name': 'Finding Nemo'}][0m

[1m> Finished chain.[0m
{'query': 'What are the similar movies to the ones that Tom Hanks acted in?', 'result': "I don't know the answer.", 'intermediate_steps': [{'query': "cypher\nMATCH (p:Person)-[:ACTED_IN]->(m:Movie)\nWHERE p.name = 'Tom Hanks'\nMATCH (m)-[:IS_SIMILAR_TO]->(sm:SimilarMovie)\nRETURN sm.name\n"}, {'context': [{'sm.name': 'Finding Nemo'}]}]}

LLM response: I don't know the answer.


----------------------------------------

**`Improved Agents`: four steps**
1. Detect entities in the user input.
2. Map the entities to values in the database.
3. Define a custom Cypher prompt that takes the entity mapping, the schema, and the user question, and constructs a Cypher statement.
4. Generate answers based on the database results.

### **Strategies to improve graph database query generation by mapping values from user inputs to database**

When using the built-in graph chains, the LLM is aware of the graph schema but has no information about the property values stored in the database. We can therefore add a step to the graph QA system that maps values from the user input to the values actually stored in the database.

**Detecting entities in the user input**

First, we extract the entities/values that we want to map to the graph database. Since this example uses a movie graph, we map movies and people.

In [15]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda
import json
import re

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Extract all important entities (like person names, movie names, or years) from the user's question. "
               "Return only valid JSON in format: {{\"names\": [\"...\"]}}"),
        ("human", "Question: {question}")
    ]
)

# The LLM may return bare JSON/Cypher or wrap it in a Markdown code fence
# (```json ... ``` or ```cypher ... ```). This helper strips the fence and
# parses JSON content into a dict; any other content is returned as a string.
def parse_entities_flexible(output):
    if isinstance(output, dict):
        return output

    try:
        # Remove leading/trailing whitespace
        cleaned = output.strip()
        
        # Match and remove triple backtick code block with optional language tag
        code_block_match = re.match(r"^```(?:\s*)(\w+)?(?:\s*)\n?(.*)```$", cleaned, re.DOTALL | re.IGNORECASE)
        if code_block_match:
            language = code_block_match.group(1) or ""
            content = code_block_match.group(2).strip()
        else:
            # Fallback: remove stray backticks if not a full code block
            content = cleaned.strip("`").strip()
            language = ""

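        # Parse JSON when the fence says so, or when an unfenced reply looks like a JSON object
        if language.lower() == "json" or (not language and content.startswith("{")):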
            try:
                return json.loads(content)
            except Exception as e:
                print("Error parsing JSON:", e)
                return {"names": []}
        else:
            # For non-JSON (e.g., cypher), just return the content as-is
            return content

    except Exception as e:
        print("Unexpected error:", e)
        return {"names": []}


entity_chain = prompt | llm | StrOutputParser() | RunnableLambda(parse_entities_flexible)

In [16]:
entities_q_two = entity_chain.invoke({"question": q_two})
print(entities_q_two)

entities_q_three = entity_chain.invoke({"question": q_three})
print(entities_q_three)

{'names': ['1995']}
{'names': ['Tom Hanks']}


**Use a simple `CONTAINS` clause to match entities to the database. In practice, you might want to use fuzzy search or a full-text index to tolerate minor misspellings.**
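Cypher's `CONTAINS` is case-sensitive and exact, so "tom hanks" or "Tom Hanx" would not match. As a minimal sketch of the fuzzy-matching idea, the code below uses Python's standard `difflib` on an in-memory list instead of a Neo4j full-text index; `KNOWN_VALUES` and `fuzzy_map` are hypothetical, and the 0.8 cutoff is an arbitrary assumption.

```python
import difflib
from typing import List, Optional

# Hypothetical in-memory sample of names/titles; in the notebook these live in Neo4j
KNOWN_VALUES = ["Tom Hanks", "Robert De Niro", "Sharon Stone", "Casino", "Finding Nemo"]


def fuzzy_map(value: str, candidates: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest candidate to `value`, or None if nothing is similar enough."""
    # Compare lower-cased strings, but return the candidate with its original casing
    lowered = {c.lower(): c for c in candidates}
    matches = difflib.get_close_matches(value.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


print(fuzzy_map("Tom Hanx", KNOWN_VALUES))   # Tom Hanks
print(fuzzy_map("Godfather", KNOWN_VALUES))  # None
```

For a real database, a full-text index queried from Cypher avoids loading every name into Python.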

In [17]:
match_query = """MATCH (p:Person|Movie)
WHERE p.name CONTAINS $value OR p.title CONTAINS $value
RETURN coalesce(p.name, p.title) AS result, labels(p)[0] AS type
LIMIT 1
"""

def map_to_database(values: dict) -> str:
    """
    Maps the extracted values to entities in the database and returns the mapping information.

    Args:
        values (dict): The entity-extraction output, e.g. {"names": ["Tom Hanks"]}.

    Returns:
        str: One line per value that matched a node, describing the database entity it maps to.
    """
    result = ""
    for entity in values.get("names", []):
        # Look up the first Person/Movie whose name or title contains the value
        response = graph.query(match_query, {"value": str(entity)})
        try:
            result += f"{entity} maps to {response[0]['result']} {response[0]['type']} in database\n"
        except IndexError:
            # No matching node in the database: skip this value
            pass
    return result

In [18]:
print("2:", map_to_database(entities_q_two))
print("3:", map_to_database(entities_q_three))

2: 
3: Tom Hanks maps to Tom Hanks Person in database



**Custom Cypher generating chain**

We need to define a custom Cypher prompt that takes the entity mapping information, the schema, and the user question, and constructs a Cypher statement. We use the LangChain Expression Language (LCEL) to build it.
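To make the data flow of `cypher_response` concrete, here is a plain-Python stand-in for `RunnablePassthrough.assign`. The helper `assign` and the fake chains are hypothetical, not LangChain code. As with the real method, every step receives the incoming dict and its result is merged in under a new key. The example also shows the shape of the `names` key: it holds the whole parser output `{"names": [...]}`, so the mapping step has to be given `x["names"]` rather than `x`.

```python
from typing import Any, Callable


def assign(state: dict, **steps: Callable[[dict], Any]) -> dict:
    """Plain-Python stand-in for RunnablePassthrough.assign: run every step on the
    incoming dict and merge the results into a copy of it."""
    return {**state, **{key: fn(state) for key, fn in steps.items()}}


# Hypothetical stand-ins for entity_chain and map_to_database
fake_entity_chain = lambda x: {"names": ["Tom Hanks"]}
fake_map = lambda values: "\n".join(
    f"{n} maps to {n} Person in database" for n in values["names"]
)

state = {"question": "What are the similar movies to the ones that Tom Hanks acted in?"}
state = assign(state, names=fake_entity_chain)
# state["names"] == {"names": ["Tom Hanks"]}, so pass state["names"] to the mapper
state = assign(state, entities_list=lambda x: fake_map(x["names"]))
print(state["entities_list"])
# Tom Hanks maps to Tom Hanks Person in database
```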

In [19]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Generate Cypher statement based on natural language input
cypher_template = """Based on the Neo4j graph schema below, write a Cypher query that would answer the user's question:
{schema}
Entities in the question map to the following database values:
{entities_list}
Question: {question}
Cypher query:"""  # noqa: E501

cypher_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Given an input question, convert it to a Cypher query. No pre-amble.",
        ),
        ("human", cypher_template),
    ]
)

cypher_response = (
    RunnablePassthrough.assign(names=entity_chain)
    | RunnablePassthrough.assign(
        # x["names"] holds the entity-chain output, e.g. {"names": ["Tom Hanks"]}
        entities_list=lambda x: map_to_database(x["names"]),
        schema=lambda _: graph.get_schema,
    )
    | cypher_prompt
    | llm.bind(stop=["\nCypherResult:"])
    | StrOutputParser()
    | parse_entities_flexible
)

In [20]:
cypher_q_three = cypher_response.invoke({"question": q_three})
print(cypher_q_three)


MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
MATCH (m)-[:IS_SIMILAR_TO]->(sm:SimilarMovie)
RETURN DISTINCT sm.name


**Generating answers based on database results**

Now that we have a chain that generates the Cypher statement, we need to execute the statement against the database and send the results back to an LLM to generate the final answer. Again, we use LCEL.
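The execute-and-answer step can be sketched without a database or an LLM by passing each stage in as a function. `answer_question` and the stub lambdas are hypothetical stand-ins for `cypher_response`, `cypher_validation`, `graph.query` and the response LLM. The guard on an empty query is an added assumption: `CypherQueryCorrector` may return an empty string when it cannot repair a query, and sending that to the database would fail, so check the behaviour of your LangChain version.

```python
from typing import Callable


def answer_question(
    question: str,
    generate_cypher: Callable[[str], str],
    validate: Callable[[str], str],
    run_query: Callable[[str], list],
    generate_answer: Callable[[str, str, list], str],
) -> str:
    """Sketch of the final step: generate Cypher, validate it, run it, then answer."""
    query = validate(generate_cypher(question))
    if not query:
        # Assumption: the validator signals an unrepairable query with an empty string
        return "I could not build a valid Cypher query for this question."
    rows = run_query(query)
    return generate_answer(question, query, rows)


# Hypothetical stubs in place of the LLM, the corrector and Neo4j
fake_cypher = lambda q: "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Casino' RETURN p.name"
identity_validator = lambda cypher: cypher
fake_db = lambda cypher: [{"p.name": "Robert De Niro"}, {"p.name": "Sharon Stone"}]
fake_answer = lambda q, cypher, rows: "Cast: " + ", ".join(r["p.name"] for r in rows)

print(answer_question("What was the cast of the Casino?", fake_cypher, identity_validator, fake_db, fake_answer))
# Cast: Robert De Niro, Sharon Stone
```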

In [21]:
from langchain.chains.graph_qa.cypher_utils import CypherQueryCorrector, Schema

# Cypher validation tool for relationship directions
corrector_schema = [
    Schema(el["start"], el["type"], el["end"])
    for el in graph.structured_schema.get("relationships")
]
cypher_validation = CypherQueryCorrector(corrector_schema)

# Generate natural language response based on database results
response_template = """Based on the question, Cypher query, and Cypher response, write a natural language response:
Question: {question}
Cypher query: {query}
Cypher Response: {response}"""

response_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Given an input question and Cypher response, convert it to a natural"
            " language answer. No pre-amble.",
        ),
        ("human", response_template),
    ]
)

chain = (
    RunnablePassthrough.assign(query=cypher_response)
    | RunnablePassthrough.assign(
        response=lambda x: graph.query(cypher_validation(x["query"])),
    )
    | response_prompt
    | llm
    | StrOutputParser()
)



In [22]:
chain.invoke({"question": q_one})

'The cast of Casino included James Woods, Robert De Niro, Sharon Stone, and Joe Pesci.'

In [23]:
chain.invoke({"question": q_two})

'For movies released in 1995, the most common genres were Comedy with 10 movies, followed by Adventure with 6 movies, and Romance and Action each with 5 movies. Other popular genres included Children and Drama (4 movies each), Crime and Thriller (3 movies each), Fantasy and Animation (2 movies each), and Horror (1 movie).'

In [24]:
chain.invoke({"question": q_three})

'The similar movies to the ones that Tom Hanks acted in include Finding Nemo.'

In [25]:
chain.invoke({"question": "How many of the movies have the Action genre?"})

'There are 5 movies with the Action genre.'

**Exercise:**

In [None]:
chain.invoke({"question": "From the movies that were taken in United States, how many had the comedy genre?"})