# graph-nd
__Knowledge in Graphs not Documents!__

1. GraphRAG in 4 lines of code in 5 minutes. No Cypher necessary.
2. Designed to extend to production - not just a demo.
3. Easily merges mixed structured & unstructured data.

In [1]:
from dotenv import load_dotenv
import os

load_dotenv('.env', override=True)

uri = os.getenv('NEO4J_URI')
username = os.getenv('NEO4J_USERNAME')
password = os.getenv('NEO4J_PASSWORD')
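
If any of these variables is missing from `.env`, `os.getenv` returns `None`, and the problem shows up only when the driver is created or used. A minimal sketch of an early check, assuming the same variable names as above; `require_env` is a hypothetical helper and not part of graph-nd:

```python
import os


def require_env(names, environ=None):
    """Return the values for `names`, raising if any is unset or empty.

    `require_env` is a hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return [environ[name] for name in names]


# Example with an explicit mapping instead of the real environment.
example_env = {
    "NEO4J_URI": "neo4j://localhost:7687",  # placeholder value
    "NEO4J_USERNAME": "neo4j",              # placeholder value
    "NEO4J_PASSWORD": "example-password",   # placeholder value
}
uri_, username_, password_ = require_env(
    ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"], example_env
)
```

Calling `require_env([...])` without the second argument checks the real process environment after `load_dotenv` has run.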

In [3]:
from graphrag import GraphRAG
from neo4j import GraphDatabase
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

db_client = GraphDatabase.driver(uri, auth=(username, password))
embedding_model = OpenAIEmbeddings(model='text-embedding-ada-002')
llm = ChatOpenAI(model="gpt-4o", temperature=0.0)


# Instantiate graph
graphrag = GraphRAG(db_client, llm, embedding_model)

# 1) Infer the graph schema from a description. You can also define it exactly via a JSON/pydantic spec.
graphrag.schema.infer("a simple graph of hardware components "
                      "where components (with id, name, and description properties)  "
                      "can be types of or inputs to other components.")

# 2) Merge data. You can also directly merge node & relationship records extracted elsewhere.
graphrag.data.merge_csvs(['component-types.csv', 'component-input-output.csv'])  # structured data
graphrag.data.merge_pdf('component-catalog.pdf')  # unstructured data

# 3) Ask the GraphRAG agent a question; it answers using the graph.
graphrag.agent("what sequence of components depend on silicon wafers?")

[Schema] Generated schema:
 {
    "description": "A simple graph schema for hardware components and their relationships.",
    "nodes": [
        {
            "description": "Represents a hardware component with an id, name, and description.",
            "id": {
                "description": "",
                "name": "id",
                "type": "STRING"
            },
            "label": "Component",
            "properties": [
                {
                    "description": "",
                    "name": "name",
                    "type": "STRING"
                },
                {
                    "description": "",
                    "name": "description",
                    "type": "STRING"
                }
            ],
            "searchFields": [
                {
                    "description": "Semantic search field for the component name.",
                    "name": "name_textembedding",
                    "type": "TEXT_EMBEDDING",
             

Extracting entities from text: 100%|██████████| 8/8 [00:49<00:00,  6.22s/it]
Merging entities from text: 100%|██████████| 8/8 [00:23<00:00,  2.90s/it]



what sequence of components depend on silicon wafers?
Tool Calls:
  node_search (call_YHZMf0kc1g8BTCi8Z2DGZxhI)
 Call ID: call_YHZMf0kc1g8BTCi8Z2DGZxhI
  Args:
    search_config: {'search_type': 'SEMANTIC', 'node_label': 'Component', 'search_prop': 'name'}
    search_query: silicon wafers
Name: node_search

[
    {
        "id": "N26",
        "name": "Wafer",
        "description": "Silicon wafers are the basic building block for chip production. To produce them, a furnace forms a cylinder of silicon (or other semiconducting materials), which is then cut into disc-shaped wafers. These wafers are then processed, split and packaged into individual chips. Most wafers are made purely of silicon or another material, but others have more complex structures. Dopants, such as boron, aluminum, phosphorous, platinum or other elements, may be added to alter the level of semiconductivity. 300 mm wafers, produced by Japanese, Taiwanese, German, and Korean firms, are used to produce nearly all adv

In [8]:
graphrag.agent("what sequence of components depend on silicon wafers?")


what sequence of components depend on silicon wafers?
Tool Calls:
  node_search (call_fX0u7489aIWKch8e2qViWDHL)
 Call ID: call_fX0u7489aIWKch8e2qViWDHL
  Args:
    search_config: {'search_type': 'SEMANTIC', 'node_label': 'Component', 'search_prop': 'name'}
    search_query: silicon wafers
Name: node_search

[
    {
        "id": "N26",
        "name": "Wafer",
        "description": "Silicon wafers are the basic building block for chip production. To produce them, a furnace forms a cylinder of silicon (or other semiconducting materials), which is then cut into disc-shaped wafers. These wafers are then processed, split and packaged into individual chips. Most wafers are made purely of silicon or another material, but others have more complex structures. Dopants, such as boron, aluminum, phosphorous, platinum or other elements, may be added to alter the level of semiconductivity. 300 mm wafers, produced by Japanese, Taiwanese, German, and Korean firms, are used to produce nearly all adv

In [9]:
graphrag.agent("can you describe what gpus do?")


can you describe what gpus do?
Tool Calls:
  node_search (call_PoXCaQ0jNHPQbVeDKC0E3yiy)
 Call ID: call_PoXCaQ0jNHPQbVeDKC0E3yiy
  Args:
    search_config: {'search_type': 'SEMANTIC', 'node_label': 'Component', 'search_prop': 'name'}
    search_query: GPU
Name: node_search

[
    {
        "id": "N2",
        "name": "Logic chip design: Discrete GPUs",
        "description": "Discrete graphics processing units (\"GPUs\") have long been used for graphics processing (for example, in video game consoles) and in the last decade have become the most used chip for training artificial intelligence algorithms. The United States monopolizes the design market for GPUs, including standalone \"discrete GPUs,\" the most powerful GPUs.",
        "search_score": 0.910675048828125
    },
    {
        "id": "N78",
        "name": "Testing",
        "description": "Chips undergo tests requiring a range of specialized equipment, including system-on-a-chip test tools, linear and discrete devices, burn-i

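The `search_score` values in the output above rank nodes by how close their embedded `name` is to the embedded query. As a rough illustration of that idea (not graph-nd's or Neo4j's actual implementation), the sketch below ranks a few hypothetical 3-dimensional vectors by cosine similarity. Real embeddings have far more dimensions, and the vectors here are made up:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, named_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in named_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings, for illustration only.
toy_vectors = {
    "Discrete GPUs": [0.9, 0.1, 0.0],
    "Testing": [0.2, 0.8, 0.1],
    "Wafer": [0.0, 0.3, 0.9],
}
toy_query = [1.0, 0.2, 0.0]  # stands in for the embedded query "GPU"

ranking = rank_by_similarity(toy_query, toy_vectors)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```
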
In [10]:
graphrag.agent("what components have the most input?")


what components have the most input?
Tool Calls:
  aggregate (call_ERuYm1rHjgUWYRZPYepfC302)
 Call ID: call_ERuYm1rHjgUWYRZPYepfC302
  Args:
    agg_instructions: Aggregate the components based on the number of 'INPUT_TO' relationships they have, and return the components with the most 'INPUT_TO' relationships.
Running Query:
MATCH (c:Component)-[:INPUT_TO]->(:Component)
WITH c, COUNT(*) AS inputCount
RETURN c.id, c.name, inputCount
ORDER BY inputCount DESC
Name: aggregate

[
    {
        "c.id": "N91",
        "c.name": "Electronic gases",
        "inputCount": 5
    },
    {
        "c.id": "N59",
        "c.name": "Wafer and photomask handling",
        "inputCount": 4
    },
    {
        "c.id": "N60",
        "c.name": "Process control",
        "inputCount": 4
    },
    {
        "c.id": "N85",
        "c.name": "Core intellectual property",
        "inputCount": 4
    },
    {
        "c.id": "N84",
        "c.name": "Electronic design automation software",
        "inputCou

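The Cypher generated above counts, for each component, the `INPUT_TO` relationships that start at that component, i.e. how many other components it feeds into. The same aggregation can be sketched in plain Python over a hypothetical edge list (the ids below are illustrative, not taken from the dataset). Counting the second element of each pair instead gives the number of inputs each component receives:

```python
from collections import Counter

# Hypothetical (source_id, target_id) pairs standing in for
# (:Component)-[:INPUT_TO]->(:Component) relationships.
edges = [
    ("A", "B"),
    ("A", "C"),
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]

# Mirrors the query: group by the source node and count its outgoing edges.
outgoing = Counter(source for source, _ in edges)

# The alternative reading: count incoming edges per target node.
incoming = Counter(target for _, target in edges)

top_outgoing = outgoing.most_common()  # sorted by count, descending
top_incoming = incoming.most_common()

print("feeds into the most components:", top_outgoing[0])
print("receives the most inputs:", top_incoming[0])
```
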
In [7]:
# Export the graph schema to a JSON file for inspection
graphrag.schema.export("graphrag-schema.json")

[Schema] Schema successfully exported to graphrag-schema.json
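
To inspect the exported file programmatically, a minimal sketch along these lines may help. It assumes the file has the same shape as the schema printed earlier (a top-level `nodes` list whose entries carry `label` and `properties`); the export format is not shown in full here, so treat the key names as assumptions. `summarize_nodes` is a hypothetical helper:

```python
import json
from pathlib import Path


def summarize_nodes(schema):
    """Map each node label to the list of its property names.

    Assumes `schema` looks like the printed schema:
    {"nodes": [{"label": ..., "properties": [{"name": ...}, ...]}, ...]}
    """
    summary = {}
    for node in schema.get("nodes", []):
        props = [prop["name"] for prop in node.get("properties", [])]
        summary[node["label"]] = props
    return summary


# Only read the file if the export step above has been run.
schema_path = Path("graphrag-schema.json")
if schema_path.exists():
    with schema_path.open(encoding="utf-8") as f:
        print(summarize_nodes(json.load(f)))
```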


## Clean up

In [13]:
# Drop everything in the graph (nodes, relationships, indexes, etc.)
graphrag.data.nuke()
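
Because `nuke()` removes everything, it can be worth wrapping it so that it cannot run by accident, for example when a notebook is re-executed top to bottom. A minimal sketch, assuming only that the object exposes `data.nuke()` as above; `nuke_if_confirmed` and its `confirm` argument are hypothetical and not part of graph-nd:

```python
def nuke_if_confirmed(graphrag, confirm=""):
    """Call graphrag.data.nuke() only when confirm == "DELETE EVERYTHING".

    Hypothetical guard for illustration; returns True if nuke() was called.
    """
    if confirm != "DELETE EVERYTHING":
        print("Skipping cleanup: pass confirm='DELETE EVERYTHING' to proceed.")
        return False
    graphrag.data.nuke()
    return True
```

With this in place, `nuke_if_confirmed(graphrag)` does nothing, while `nuke_if_confirmed(graphrag, "DELETE EVERYTHING")` drops the graph. Once cleanup is done, `db_client.close()` releases the Neo4j driver's connections.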