# graph-nd
__Knowledge in Graphs not Documents!__

1. GraphRAG in 4 lines of code in 5 minutes. No Cypher necessary.
2. Designed to extend to production - not just a demo.
3. Easily merges mixed structured & unstructured data.

In [8]:
from dotenv import load_dotenv
import os

load_dotenv('.env', override=True)

uri = os.getenv('NEO4J_URI')
username = os.getenv('NEO4J_USERNAME')
password = os.getenv('NEO4J_PASSWORD')

In [9]:
import os

from graph_nd import GraphRAG
from neo4j import GraphDatabase
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

db_client = GraphDatabase.driver(uri, auth=(username, password))
embedding_model = OpenAIEmbeddings(model='text-embedding-ada-002')
llm = ChatOpenAI(model="gpt-4o", temperature=0.0)


# Instantiate graph
graphrag = GraphRAG(db_client, llm, embedding_model)

# 1) Infer the graph schema. You can also define it exactly from a JSON/Pydantic spec with the .define method
graphrag.schema.infer("a simple graph of hardware components "
                      "where components (with id, name, and description properties)  "
                      "can be types of or inputs to other components.")

# 2) Merge data. You can also directly merge node & rel records extracted elsewhere
graphrag.data.merge_csvs(['component-types.csv', 'component-input-output.csv']) # structured data
graphrag.data.merge_pdf('component-catalog.pdf') # unstructured data

# 3) GraphRAG agent for better answers.
graphrag.agent("what sequence of components depend on silicon wafers?")

[Schema] Generated schema:
 {
    "description": "A simple graph schema for hardware components and their relationships.",
    "nodes": [
        {
            "description": "Represents a hardware component with an id, name, and description.",
            "id": {
                "description": "",
                "name": "id",
                "type": "STRING"
            },
            "label": "Component",
            "properties": [
                {
                    "description": "",
                    "name": "name",
                    "type": "STRING"
                },
                {
                    "description": "",
                    "name": "description",
                    "type": "STRING"
                }
            ],
            "searchFields": [
                {
                    "description": "Semantic search field for the component's name.",
                    "name": "name_textembedding",
                    "type": "TEXT_EMBEDDING",
           

Extracting entities from text: 100%|██████████| 8/8 [00:10<00:00,  1.35s/it]
Merging entities from text: 100%|██████████| 8/8 [00:22<00:00,  2.82s/it]



what sequence of components depend on silicon wafers?
Tool Calls:
  node_search (call_NQNd0dDWHvMO9k1sB5LmakVQ)
 Call ID: call_NQNd0dDWHvMO9k1sB5LmakVQ
  Args:
    search_query: silicon wafers
    top_k: 5
    search_config: {'search_type': 'SEMANTIC', 'node_label': 'Component', 'search_prop': 'name'}
Name: node_search

[
    {
        "id": "N26",
        "name": "Wafer",
        "description": "Silicon wafers are the basic building block for chip production. To produce them, a furnace forms a cylinder of silicon (or other semiconducting materials), which is then cut into disc-shaped wafers. These wafers are then processed, split and packaged into individual chips. Most wafers are made purely of silicon or another material, but others have more complex structures. Dopants, such as boron, aluminum, phosphorous, platinum or other elements, may be added to alter the level of semiconductivity. 300 mm wafers, produced by Japanese, Taiwanese, German, and Korean firms, are used to produce n

In [10]:
graphrag.agent("what components have the most input?")


what components have the most input?
Tool Calls:
  aggregate (call_GTKiwNEzsDvZJIHTTwpY5tSk)
 Call ID: call_GTKiwNEzsDvZJIHTTwpY5tSk
  Args:
    agg_instructions: Aggregate the components based on the number of INPUT_TO relationships they have, and return the components with the most INPUT_TO relationships.
Running Query:
MATCH (c:Component)-[:INPUT_TO]->(:Component)
WITH c, COUNT(*) AS inputCount
RETURN c.id, c.name, inputCount
ORDER BY inputCount DESC
Name: aggregate

[
    {
        "c.id": "N91",
        "c.name": "Electronic gases",
        "inputCount": 5
    },
    {
        "c.id": "N59",
        "c.name": "Wafer and photomask handling",
        "inputCount": 4
    },
    {
        "c.id": "N60",
        "c.name": "Process control",
        "inputCount": 4
    },
    {
        "c.id": "N85",
        "c.name": "Core intellectual property",
        "inputCount": 4
    },
    {
        "c.id": "N84",
        "c.name": "Electronic design automation software",
        "inputCount":

In [11]:
graphrag.agent("can you describe what gpus do?")


can you describe what gpus do?
Tool Calls:
  node_search (call_lsNuQwwCtwCk8Jaq8VQc1yeh)
 Call ID: call_lsNuQwwCtwCk8Jaq8VQc1yeh
  Args:
    search_query: GPU
    top_k: 1
    search_config: {'search_type': 'SEMANTIC', 'node_label': 'Component', 'search_prop': 'name'}
Name: node_search

[
    {
        "id": "N2",
        "name": "Logic chip design: Discrete GPUs",
        "description": "Discrete graphics processing units (\"GPUs\") have long been used for graphics processing (for example, in video game consoles) and in the last decade have become the most used chip for training artificial intelligence algorithms. The United States monopolizes the design market for GPUs, including standalone \"discrete GPUs,\" the most powerful GPUs.",
        "search_score": 0.910675048828125
    }
]

A GPU, or Graphics Processing Unit, is a specialized hardware component primarily used for graphics processing. Discrete GPUs, in particular, are standalone units that have been traditionally used in a

## Saving & Reloading GraphSchema
You can also `.export` and `.load` the schema to and from JSON files, which makes it easy to save, reload, iterate on, and version-control the schema. It also lets you make custom edits.

Note: You can also use the `schema.define` method to specify the schema exactly by passing a Pydantic `GraphSchema` object.
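As a rough illustration of a custom edit, the sketch below round-trips a schema-like JSON file with the standard library and fills in an empty property description. The file contents are a stand-in that mirrors only the fields visible in the printed schema excerpt earlier, so the real exported layout may differ. `describe_property` and the file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for an exported schema file. It mirrors only the fields visible in the
# printed schema excerpt and is an assumption about the on-disk layout.
sample_schema = {
    "description": "A simple graph schema for hardware components and their relationships.",
    "nodes": [
        {
            "label": "Component",
            "description": "Represents a hardware component with an id, name, and description.",
            "id": {"name": "id", "type": "STRING", "description": ""},
            "properties": [
                {"name": "name", "type": "STRING", "description": ""},
                {"name": "description", "type": "STRING", "description": ""},
            ],
        }
    ],
}


def describe_property(schema, label, prop_name, text):
    """Set the description of one property on the node with the given label.

    Returns True if the property was found and updated, otherwise False.
    """
    for node in schema["nodes"]:
        if node["label"] != label:
            continue
        for prop in node["properties"]:
            if prop["name"] == prop_name:
                prop["description"] = text
                return True
    return False


# Write the stand-in file, then edit it the way an exported schema could be edited.
schema_path = Path(tempfile.mkdtemp()) / "example-schema.json"
schema_path.write_text(json.dumps(sample_schema, indent=4))

schema = json.loads(schema_path.read_text())
updated = describe_property(schema, "Component", "name", "Human-readable component name.")
schema_path.write_text(json.dumps(schema, indent=4))
```

An edited file of this kind could then be handed to `schema.load`, and kept under version control alongside the code that uses it.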

In [12]:
# export and look at graph schema
graphrag.schema.export("your-graphrag-schema.json")

[Schema] Schema successfully exported to your-graphrag-schema.json


In [13]:
new_graphrag = GraphRAG(db_client, llm, embedding_model)
new_graphrag.schema.load("your-graphrag-schema.json")
new_graphrag.agent("can you describe what gpus do?")

[Schema] Schema successfully loaded from your-graphrag-schema.json

can you describe what gpus do?
Tool Calls:
  node_search (call_99hXAIoo7JKRqCCqCyTqKuwv)
 Call ID: call_99hXAIoo7JKRqCCqCyTqKuwv
  Args:
    search_query: GPU
    top_k: 5
    search_config: {'search_type': 'SEMANTIC', 'node_label': 'Component', 'search_prop': 'description'}
Name: node_search

[
    {
        "id": "N2",
        "name": "Logic chip design: Discrete GPUs",
        "description": "Discrete graphics processing units (\"GPUs\") have long been used for graphics processing (for example, in video game consoles) and in the last decade have become the most used chip for training artificial intelligence algorithms. The United States monopolizes the design market for GPUs, including standalone \"discrete GPUs,\" the most powerful GPUs.",
        "search_score": 0.9160003662109375
    },
    {
        "id": "N4",
        "name": "Logic chip design: AI ASICs",
        "description": "Application-specific integrated

## Merge Nodes & Rels Directly
For cases where you need to control the mapping yourself (instead of relying on the LLM in GraphRAG), you can format your own node and relationship dict records and merge directly via the `data.merge_nodes` and `data.merge_relationships` methods.

In [None]:
##TODO Show direct NODE/REL Merges
# graphrag.data.merge_nodes(label:str, records: List[Dict], source_metadata: Union[bool, Dict[str, Any]] = True)

# graphrag.data.merge_relationships(rel_type:str, start_node_label:str, end_node_label: str, records: List[Dict],
#                                 source_metadata: Union[bool, Dict[str, Any]] = True)
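Until the example above is filled in, here is a minimal sketch of how such records might be prepared. The component ids and names are placeholders, and the relationship keys `start_node_id` and `end_node_id` are assumptions rather than confirmed graph-nd field names, so check the library for the expected record shape. The commented calls follow the signatures listed above.

```python
from typing import Any, Dict, List

# Placeholder node records: one dict per Component, keyed by property name.
node_records: List[Dict[str, Any]] = [
    {"id": "C1", "name": "Example input component", "description": "Placeholder."},
    {"id": "C2", "name": "Example output component", "description": "Placeholder."},
]

# Placeholder relationship records. The key names are assumed, not confirmed.
rel_records: List[Dict[str, Any]] = [
    {"start_node_id": "C1", "end_node_id": "C2"},
]


def dangling_endpoints(nodes, rels):
    """Return relationship endpoint ids that have no matching node record."""
    known = {n["id"] for n in nodes}
    endpoints = {r[key] for r in rels for key in ("start_node_id", "end_node_id")}
    return sorted(endpoints - known)


# Check the records before merging so that no relationship points at a missing node.
missing = dangling_endpoints(node_records, rel_records)

# With a connected GraphRAG instance, the merges would then look like:
# graphrag.data.merge_nodes("Component", node_records)
# graphrag.data.merge_relationships("INPUT_TO", "Component", "Component", rel_records)
```

Validating endpoints up front is a design choice for this sketch; it keeps mapping mistakes out of the graph when you are building records yourself instead of relying on the LLM.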

In [None]:
##TODO Remove functionality by source and remove specific nodes & rels

In [None]:
##TODO run read query
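One way to run a read query, pending the example above, is to go through the Neo4j Python driver directly rather than through graph-nd. The sketch assumes a 5.x driver that provides `execute_query`. `top_input_components` is a hypothetical helper, and the Cypher is adapted from the aggregate query the agent generated earlier.

```python
# Cypher adapted from the agent's aggregate query, with aliases and a LIMIT parameter added.
READ_QUERY = """
MATCH (c:Component)-[:INPUT_TO]->(:Component)
WITH c, COUNT(*) AS inputCount
RETURN c.id AS id, c.name AS name, inputCount
ORDER BY inputCount DESC
LIMIT $limit
"""


def top_input_components(driver, limit=5):
    """Return the components with the most outgoing INPUT_TO relationships as dicts.

    `driver` is expected to be a neo4j.Driver (assumption: driver version 5.x).
    """
    # routing_="r" marks the query as read-only; extra keyword arguments
    # such as `limit` are passed to the query as parameters.
    records, _summary, _keys = driver.execute_query(
        READ_QUERY, limit=limit, routing_="r"
    )
    return [record.data() for record in records]


# Assumed usage with the driver created earlier in this notebook:
# top_input_components(db_client, limit=3)
```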

## Create React Agent Factory

You can create a LangGraph agent with GraphRAG and memory store/checkpointers prebuilt.
Think of this as an agent with "knowledge": an agent that has an embedded "left brain" knowledge graph.

![Agent with Knowledge](img/kg_agent.png)


In [14]:
from langchain_core.messages import HumanMessage

# create LangGraph agent
agent = graphrag.create_react_agent()

# use just like any other LangGraph agent
config = {"configurable": {"thread_id": "thread-1"}}

for step in agent.stream(
    {"messages": [HumanMessage(content="what sequence of components depend on silicon wafers?")]},
    stream_mode="values", config=config
):
    step["messages"][-1].pretty_print()


what sequence of components depend on silicon wafers?
Tool Calls:
  node_search (call_KTQWyiBuEkyFC3cI8x2hquZq)
 Call ID: call_KTQWyiBuEkyFC3cI8x2hquZq
  Args:
    search_query: silicon wafers
    top_k: 1
Name: node_search

Error: 1 validation error for node_search
search_config
  Field required [type=missing, input_value={'search_query': 'silicon wafers', 'top_k': 1}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
 Please fix your mistakes.
Tool Calls:
  node_search (call_0xCj89OSTWLYIvYYei5aCilZ)
 Call ID: call_0xCj89OSTWLYIvYYei5aCilZ
  Args:
    search_query: silicon wafers
    top_k: 1
    search_config: {'search_type': 'SEMANTIC', 'node_label': 'Component', 'search_prop': 'name'}
Name: node_search

[
    {
        "id": "N26",
        "name": "Wafer",
        "description": "Silicon wafers are the basic building block for chip production. To produce them, a furnace forms a cylinder of silicon (or other semiconducting materials), wh

### Add Additional Tools
You can add any additional tools you would like.
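A custom tool can be as small as a typed, documented Python function. The sketch below defines one. `wafer_area_mm2` is a hypothetical example, and passing a plain function in `tools` assumes that `create_react_agent` forwards its tools to LangGraph, which accepts callables. If it does not, wrapping the function with LangChain's `tool` decorator is the usual alternative.

```python
import math


def wafer_area_mm2(diameter_mm: float) -> float:
    """Return the surface area in square millimetres of a circular wafer with the given diameter in millimetres."""
    if diameter_mm <= 0:
        raise ValueError("diameter_mm must be positive")
    radius = diameter_mm / 2
    return math.pi * radius ** 2


# Assumed usage with a connected GraphRAG instance:
# agent = graphrag.create_react_agent(tools=[wafer_area_mm2])
```

The docstring and type hints matter here because agent frameworks typically use them to describe the tool and its arguments to the LLM.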

In [15]:
import getpass
from langchain_community.tools.tavily_search import TavilySearchResults

if not os.environ.get("TAVILY_API_KEY"):
    os.environ["TAVILY_API_KEY"] = getpass.getpass("Tavily API key:\n")

web_search = TavilySearchResults(max_results=3)
# create LangGraph agent
agent = graphrag.create_react_agent(tools=[web_search])

# use just like any other LangGraph agent
config = {"configurable": {"thread_id": "thread-1"}}

for step in agent.stream(
    {"messages": [HumanMessage(content="what sequence of components depend on silicon wafers? and what companies may be involved?")]},
    stream_mode="values", config=config
):
    step["messages"][-1].pretty_print()


what sequence of components depend on silicon wafers? and what companies may be involved?
Tool Calls:
  node_search (call_pPj0tQIBIyXlwGMrd165yNz3)
 Call ID: call_pPj0tQIBIyXlwGMrd165yNz3
  Args:
    search_query: silicon wafer
    top_k: 5
    search_config: {'search_type': 'SEMANTIC', 'node_label': 'Component', 'search_prop': 'name'}
Name: node_search

[
    {
        "id": "N26",
        "name": "Wafer",
        "description": "Silicon wafers are the basic building block for chip production. To produce them, a furnace forms a cylinder of silicon (or other semiconducting materials), which is then cut into disc-shaped wafers. These wafers are then processed, split and packaged into individual chips. Most wafers are made purely of silicon or another material, but others have more complex structures. Dopants, such as boron, aluminum, phosphorous, platinum or other elements, may be added to alter the level of semiconductivity. 300 mm wafers, produced by Japanese, Taiwanese, German, and 

In [None]:
#TODO: Show Example with MCP via https://github.com/langchain-ai/langchain-mcp-adapters

## Clean up

In [7]:
# drop everything in the graph (nodes, rels, indexes,...everything)
graphrag.data.nuke()