<a href="https://colab.research.google.com/github/that1guy15/ai_notebooks/blob/main/GraphRAG_101.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Source: https://medium.com/data-science-in-your-pocket/graphrag-using-langchain-31b1ef8328b9

In [1]:
%pip install --upgrade --quiet  json-repair networkx langchain-core langchain-openai langchain-experimental langchain-community

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m50.4/50.4 kB[0m [31m1.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m399.7/399.7 kB[0m [31m5.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m51.5/51.5 kB[0m [31m3.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m206.9/206.9 kB[0m [31m11.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.3/2.3 MB[0m [31m28.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.0/1.0 MB[0m [31m31.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m290.2/290.2 kB[0m [31m16.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m375.6/375.6 kB[0m [31m19.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

# Initialize your LLM object & reference text

Use any SOTA chat LLM for best results as Knowledge Graph creation is a complicated task.

In [2]:
import os
from pprint import pprint as pp
from google.colab import userdata
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import OpenAI, ChatOpenAI
import networkx as nx
from langchain.chains import GraphQAChain
from langchain_core.documents import Document
from langchain_community.graphs.networkx_graph import NetworkxEntityGraph


os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
os.environ["LANGCHAIN_API_KEY"] = userdata.get("LANGCHAIN_API_KEY")
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "GraphRAG_101"

open_api_key = os.environ.get("OPEN_API_KEY")
langchain_api_key = os.environ.get("LANGCHAIN_API_KEY")

In [3]:
llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0, max_tokens=4000)
# llm.client.api_type = "chat"

text = """
Marie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""

Load this text as `GraphDocuments` and create a `GraphTransformer` object using the LLM-loaded

In [4]:
documents = [Document(page_content=text)]
llm_transformer = LLMGraphTransformer(llm=llm)
graph_documents = llm_transformer.convert_to_graph_documents(documents)

# Create the Knowledge Graph.

For this, you better provide a list of entities and relationships you wish to extract else LLM might identify everything as an entity or relationship

In [5]:
llm_transformer_filtered = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["Person", "Country", "Organization"],
    allowed_relationships=["NATIONALITY", "LOCATED_IN", "WORKED_AT", "SPOUSE"],
)
graph_documents_filtered = llm_transformer_filtered.convert_to_graph_documents(
    documents
)

This creates a graph with nodes defined from `allowed_nodes` and relationships from `allowed_relationships`.

In [6]:
pp(graph_documents_filtered[0].nodes)
pp(graph_documents_filtered[0].relationships)

[Node(id='Marie Curie', type='Person', properties={}),
 Node(id='Pierre Curie', type='Person', properties={}),
 Node(id='University Of Paris', type='Organization', properties={}),
 Node(id='Poland', type='Country', properties={}),
 Node(id='France', type='Country', properties={})]
[Relationship(source=Node(id='Marie Curie', type='Person', properties={}), target=Node(id='Poland', type='Country', properties={}), type='NATIONALITY', properties={}),
 Relationship(source=Node(id='Marie Curie', type='Person', properties={}), target=Node(id='France', type='Country', properties={}), type='NATIONALITY', properties={}),
 Relationship(source=Node(id='Marie Curie', type='Person', properties={}), target=Node(id='Pierre Curie', type='Person', properties={}), type='SPOUSE', properties={}),
 Relationship(source=Node(id='Marie Curie', type='Person', properties={}), target=Node(id='University Of Paris', type='Organization', properties={}), type='WORKED_AT', properties={})]


If you want an llm to define the graph nodes and relationships you can just pass the LLM object `llm` and let it discover the schema.

This is great for discovering a graph when you are not familar with the data or schema. Be aware, the Graph schema can change for numerous reasons and you need to handle these possible changes accordingly.

In [7]:
llm_transformer_discovered = LLMGraphTransformer(
    llm=llm
)
graph_documents_discovered = llm_transformer_filtered.convert_to_graph_documents(
    documents
)

In [8]:
pp(graph_documents_discovered[0].nodes)
pp(graph_documents_discovered[0].relationships)

[Node(id='Marie Curie', type='Person', properties={}),
 Node(id='Pierre Curie', type='Person', properties={}),
 Node(id='Poland', type='Country', properties={}),
 Node(id='France', type='Country', properties={}),
 Node(id='University Of Paris', type='Organization', properties={})]
[Relationship(source=Node(id='Marie Curie', type='Person', properties={}), target=Node(id='Poland', type='Country', properties={}), type='NATIONALITY', properties={}),
 Relationship(source=Node(id='Marie Curie', type='Person', properties={}), target=Node(id='France', type='Country', properties={}), type='NATIONALITY', properties={}),
 Relationship(source=Node(id='Marie Curie', type='Person', properties={}), target=Node(id='Pierre Curie', type='Person', properties={}), type='SPOUSE', properties={}),
 Relationship(source=Node(id='Marie Curie', type='Person', properties={}), target=Node(id='University Of Paris', type='Organization', properties={}), type='WORKED_AT', properties={})]


# Create Networkx Graph

In [9]:
graph = NetworkxEntityGraph()

# Add nodes to the graph
for node in graph_documents_filtered[0].nodes:
    graph.add_node(node.id)

# Add edges to the graph
for edge in graph_documents_filtered[0].relationships:
    graph._graph.add_edge(
            edge.source.id,
            edge.target.id,
            relation=edge.type,
        )

# Create GraphQAChain

In [10]:
chain = GraphQAChain.from_llm(
    llm=llm,
    graph=graph,
    verbose=True
)

In [11]:
question = """Who is Marie Curie?"""
chain.run(question)

  chain.run(question)


Entities Extracted:
[32;1m[1;3mMarie Curie[0m
Full Context:
[32;1m[1;3mMarie Curie NATIONALITY Poland
Marie Curie NATIONALITY France
Marie Curie SPOUSE Pierre Curie
Marie Curie WORKED_AT University Of Paris[0m

[1m> Finished chain.[0m


'Marie Curie was a Polish and French scientist known for her pioneering research on radioactivity. She was married to Pierre Curie and worked at the University of Paris.'

# Using GraphIndexCreator

Another approach is to use GraphIndexCreator in LangChain which is very similar to the above approach

In [13]:
from langchain.indexes import GraphIndexCreator
from langchain.chains import GraphQAChain

index_creator = GraphIndexCreator(llm=llm)
graph = index_creator.from_text(text)

chain = GraphQAChain.from_llm(llm, graph=graph, verbose=True)
chain.run("What did Pierre Curie won?")



Entities Extracted:
[32;1m[1;3mPierre Curie[0m
Full Context:
[32;1m[1;3mPierre Curie was a co-winner of Marie Curie's first Nobel Prize[0m

[1m> Finished chain.[0m


'Pierre Curie won a Nobel Prize.'