# GraphRAG with MongoDB and LangChain

This notebook is a companion to the **Graph Retrieval-Augmented Generation (GraphRAG)** page. Refer to the page for set-up instructions and detailed explanations.

This notebook walks you through a **GraphRAG implementation using MongoDB Atlas**, leveraging **Atlas Search** and **LangChain's MongoDBGraphStore**. Unlike traditional vector-based RAG, GraphRAG enhances retrieval by structuring knowledge as a graph, allowing for **relationship-aware retrieval and multi-hop reasoning**.

## What You'll Learn

- **Automatically construct a knowledge graph** from documents using an LLM.
- **Store and query entity relationships** within MongoDB.
- **Retrieve context-aware responses** by combining knowledge graph traversal with LLM-generated answers.

By the end of this notebook, you will have a working **GraphRAG implementation** that improves accuracy and explainability in retrieval-augmented generation systems.

<a target="_blank" href="https://colab.research.google.com/github/mongodb/docs-notebooks/blob/main/use-cases/graphrag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [1]:
pip install --quiet --upgrade pymongo langchain_community wikipedia langchain_openai langchain_mongodb networkx pyvis

Note: you may need to restart the kernel to use updated packages.


## Prerequisites

Before you begin, make sure you have the following set up:

- An Atlas cluster up and running (you'll need the [connection string](https://www.mongodb.com/docs/guides/atlas/connection-string/))
- An OpenAI API key to use GPT-4o as the LLM *(you can switch the chat model using [LangChain integrations](https://python.langchain.com/docs/integrations/chat/))*

In [2]:
import os

ATLAS_CONNECTION_STRING = "<connection-string>"
ATLAS_DB_NAME = "documents"
ATLAS_COLLECTION = "wikipedia"

os.environ["OPENAI_API_KEY"] = "<openai-api-key>"
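If you prefer not to paste secrets into the notebook, the settings above can be read from environment variables instead. The following is a minimal sketch; the helper `read_setting` and the environment variable name `ATLAS_CONNECTION_STRING` are assumptions made for illustration, not something the notebook or the libraries require.

```python
import os


def read_setting(name: str, placeholder: str) -> str:
    """Return environment variable `name`, or `placeholder` if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    return value or placeholder


# Hypothetical variable name; falls back to the placeholder used in this notebook
ATLAS_CONNECTION_STRING = read_setting("ATLAS_CONNECTION_STRING", "<connection-string>")
```

With this approach the placeholder still has to be replaced, or the variable exported in the shell, before the connection cell can succeed.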

In [3]:
from pymongo import MongoClient

# Connect to your local Atlas deployment or Atlas Cluster
client = MongoClient(ATLAS_CONNECTION_STRING)

# Select the collection that will store the knowledge graph
collection = client[ATLAS_DB_NAME][ATLAS_COLLECTION]

In [58]:
from langchain.chat_models import init_chat_model

# Set up the LLM
# We strongly recommend a highly capable model, such as gpt-4o or Claude 3.5 Sonnet or later, for best results
chatModel = init_chat_model("gpt-4o", model_provider="openai", temperature=0)

## Load Data from Wikipedia

Wikipedia is a rich source of unstructured information. Using the LangChain Wikipedia loader, you can fetch multiple pages for a given query. 

Unlike traditional vector-based RAG, which struggles to capture relationships across scattered content, GraphRAG links entities and concepts across pages. This makes Wikipedia a good dataset for demonstrating how GraphRAG connects related facts that are spread over several articles.

In [44]:
from langchain_community.document_loaders import WikipediaLoader
from langchain.text_splitter import TokenTextSplitter

# Load up to 3 Wikipedia pages matching the query "Sherlock Holmes"
wikipedia_pages = WikipediaLoader(query="Sherlock Holmes", load_max_docs=3).load()
len(wikipedia_pages)

1

In [45]:
# Split the documents into chunks for efficient downstream processing (graph creation)
text_splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=0)
wikipedia_docs = text_splitter.split_documents(wikipedia_pages)

# Print the first document
# wikipedia_docs[0]
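To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a small standalone sketch of fixed-size windowing. It is an illustration only: whitespace-separated words stand in for tokenizer tokens, and `chunk_tokens` is a hypothetical helper, not the `TokenTextSplitter` implementation.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap=0):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks


words = "a b c d e f g h i j".split()  # stand-in for real tokens
chunks = chunk_tokens(words, chunk_size=4, chunk_overlap=0)
print(chunks)
```

With `chunk_overlap=0`, as in the cell above, no text is repeated between chunks, so each passage is sent to the entity extraction model only once.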

## Create and load the knowledge graph in MongoDB

MongoDB stores graph-like structures by using **document references** within collections. Each document acts as a **node (entity)**, and relationships are represented by fields that reference **other documents**, forming **edges** in the graph.
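As a rough illustration of this layout, the sketch below shows what one entity document might look like and how its edges can be read back. The field names (`type`, `attributes`, `relationships.target_ids`, `relationships.types`, `relationships.attributes`) are the ones the visualization code later in this notebook reads; the values and the `iter_edges` helper are hand-written assumptions, not output from the store.

```python
# Hand-written example of one entity document; values are illustrative only.
example_entity = {
    "_id": "Sherlock Holmes",  # the entity name doubles as the node id
    "type": "Person",  # assumed entity type label
    "attributes": {"role": ["detective"]},
    "relationships": {
        # Parallel lists: position i describes the edge to target_ids[i]
        "target_ids": ["Dr. John H. Watson", "Arthur Conan Doyle"],
        "types": ["Friend", "Created by"],
        "attributes": [{}, {}],
    },
}


def iter_edges(doc):
    """Yield (source, relationship_type, target) triples for one entity document."""
    rels = doc.get("relationships", {})
    for target, rel_type in zip(rels.get("target_ids", []), rels.get("types", [])):
        yield doc["_id"], rel_type, target


for edge in iter_edges(example_entity):
    print(edge)
```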

With MongoDB's `$graphLookup` aggregation stage, which the `MongoDBGraphStore` class uses behind the scenes, you can perform **recursive traversals** to query related entities, enabling efficient **graph-style queries** directly within the database.
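The sketch below shows what such a recursive traversal could look like. It is not necessarily the exact pipeline that `MongoDBGraphStore` builds: the helper names, the `connections` and `depth` output fields, and the default depth are assumptions. The second function mimics the same traversal on an in-memory dictionary so the behavior can be checked without a database.

```python
def build_graph_lookup_pipeline(start_ids, collection_name, max_depth=2):
    """Build an aggregation pipeline that follows relationships.target_ids recursively."""
    return [
        {"$match": {"_id": {"$in": list(start_ids)}}},
        {
            "$graphLookup": {
                "from": collection_name,
                "startWith": "$relationships.target_ids",
                "connectFromField": "relationships.target_ids",
                "connectToField": "_id",
                "as": "connections",  # assumed output field name
                "maxDepth": max_depth,
                "depthField": "depth",  # assumed output field name
            }
        },
    ]


def traverse_in_memory(docs_by_id, start_id, max_depth):
    """Mimic the traversal on a dict of documents; return {entity_id: depth}."""
    frontier = docs_by_id[start_id].get("relationships", {}).get("target_ids", [])
    depths = {}
    depth = 0
    while frontier and depth <= max_depth:
        next_frontier = []
        for entity_id in frontier:
            # Skip entities already visited or not stored as documents
            if entity_id in depths or entity_id not in docs_by_id:
                continue
            depths[entity_id] = depth
            rels = docs_by_id[entity_id].get("relationships", {})
            next_frontier.extend(rels.get("target_ids", []))
        frontier = next_frontier
        depth += 1
    return depths


toy_docs = {
    "A": {"_id": "A", "relationships": {"target_ids": ["B"]}},
    "B": {"_id": "B", "relationships": {"target_ids": ["C"]}},
    "C": {"_id": "C", "relationships": {"target_ids": []}},
}
print(traverse_in_memory(toy_docs, "A", max_depth=1))
pipeline = build_graph_lookup_pipeline(["Sherlock Holmes"], "wikipedia")
# With a live connection, the pipeline could be run as: collection.aggregate(pipeline)
```

Raising `max_depth` widens the neighborhood that is retrieved, which is what enables multi-hop answers, at the cost of pulling in more loosely related entities.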

In [47]:
from langchain_mongodb.graphrag.graph import MongoDBGraphStore

# Set up a MongoDBGraphStore that points to the collection storing the graph-like structure
# The LLM handles entity extraction, both when building the knowledge graph and when processing queries
store = MongoDBGraphStore(
    connection_string=ATLAS_CONNECTION_STRING,
    database_name=ATLAS_DB_NAME,
    collection_name=ATLAS_COLLECTION,
    entity_extraction_model=chatModel,
)

# Extract entities, build the knowledge graph, and load it into MongoDB
store.add_documents(wikipedia_docs)

[BulkWriteResult({'writeErrors': [], 'writeConcernErrors': [], 'nInserted': 0, 'nUpserted': 7, 'nMatched': 0, 'nModified': 0, 'nRemoved': 0, 'upserted': [{'index': 0, '_id': 'Sherlock Holmes'}, {'index': 1, '_id': 'Arthur Conan Doyle'}, {'index': 2, '_id': 'Dr. John H. Watson'}, {'index': 3, '_id': 'Joseph Bell'}, {'index': 4, '_id': 'Sir Henry Littlejohn'}, {'index': 5, '_id': 'C. Auguste Dupin'}, {'index': 6, '_id': 'Monsieur Lecoq'}]}, acknowledged=True)]
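In the output above, `nUpserted: 7` means seven entity documents were created, each keyed by the entity name as its `_id`. As a small sketch, the list of results can be condensed into a count and a list of ids. `summarize_upserts` is a hypothetical helper; it relies only on the `upserted_count` and `upserted_ids` properties of PyMongo's `BulkWriteResult`.

```python
def summarize_upserts(results):
    """Return (total upserted documents, list of upserted _id values)."""
    total = 0
    ids = []
    for result in results:
        total += result.upserted_count
        # upserted_ids maps the operation index to the _id of the upserted document
        ids.extend((result.upserted_ids or {}).values())
    return total, ids


# Example usage with the store from this notebook (requires a live connection):
# total, ids = summarize_upserts(store.add_documents(wikipedia_docs))
```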

## Visualize the knowledge graph

To **visualize the knowledge graph**, you can load the stored documents into a NetworkX graph and render it with a visualization library like pyvis.

This makes it easy to explore and understand the relationships and hierarchies within your data.

In [55]:
import networkx as nx
from pyvis.network import Network
from pymongo import MongoClient

client = MongoClient(ATLAS_CONNECTION_STRING)
collection = client[ATLAS_DB_NAME][ATLAS_COLLECTION]

docs = list(collection.find())

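# Convert an attributes dictionary to an HTML string for hover tooltips
def format_attributes(attributes):
    if not attributes:
        return ""
    parts = []
    for key, values in attributes.items():
        # Values are expected to be lists; wrap scalars and stringify items so join() cannot fail
        if not isinstance(values, (list, tuple)):
            values = [values]
        parts.append(f"{key}: {', '.join(str(v) for v in values)}")
    return "<br>".join(parts)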

# Create a NetworkX graph
G = nx.DiGraph()

# Add nodes with their attributes
for doc in docs:
    node_id = doc["_id"]
    # Combine document type and its attributes (if any) for the hover tooltip
    node_info = f"Type: {doc.get('type', '')}"
    if "attributes" in doc:
        attr_str = format_attributes(doc["attributes"])
        if attr_str:
            node_info += "<br>" + attr_str
    G.add_node(node_id, title=node_info, label=node_id)

# Add edges based on relationships
for doc in docs:
    source = doc["_id"]
    rels = doc.get("relationships", {})
    target_ids = rels.get("target_ids", [])
    rel_types = rels.get("types", [])
    rel_attrs = rels.get("attributes", [])
    
    # The three lists are parallel; a missing type or attribute entry falls back to an empty value
    for i in range(len(target_ids)):
        target = target_ids[i]
        edge_type = rel_types[i] if i < len(rel_types) else ""
        # Get edge attributes info if available
        extra_attr = {}
        if i < len(rel_attrs) and rel_attrs[i]:
            extra_attr = rel_attrs[i]
        edge_info = f"Relationship: {edge_type}"
        if extra_attr:
            edge_info += "<br>" + format_attributes(extra_attr)
        # Add the edge with title attribute for hover
        G.add_edge(source, target, title=edge_info, label=edge_type)

# Create and show the network using pyvis
net = Network(height="550px", width="100%", notebook=True, directed=True)
net.from_nx(G)
net.show("graph.html")

graph.html


## LLM-based Question Answering with Graph Retrieval (GraphRAG)

The `MongoDBGraphStore` class offers a convenient `chat_response` method that enables LLM-based question answering grounded in graph data. This method retrieves relevant entities and relationships from the graph based on the user query to generate accurate and context-aware responses.

Use this for building intelligent assistants that leverage structured knowledge graphs for enhanced understanding and retrieval.
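The flow described above (find entities in the query, retrieve related entities from the graph, then generate an answer from that context) can be sketched as follows. This is a conceptual outline only: `answer_with_graph` and its three callables are hypothetical stand-ins, not the `MongoDBGraphStore` API, and the stubs below replace both the database and the LLM.

```python
def answer_with_graph(query, extract_names, fetch_related, generate):
    """Retrieve graph context for the entities in the query, then generate an answer."""
    names = extract_names(query)  # step 1: entities mentioned in the query
    entities = fetch_related(names)  # step 2: graph traversal from those entities
    context = "\n".join(str(entity) for entity in entities)
    prompt = (
        "Answer the question using only the graph context.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
    return generate(prompt)  # step 3: LLM call grounded in the context


# Stub callables so the sketch runs without a database or an LLM
toy_graph = {"Sherlock Holmes": ["Sherlock Holmes -[inspired by]-> Joseph Bell"]}
reply = answer_with_graph(
    "Who inspired Sherlock Holmes?",
    extract_names=lambda q: [name for name in toy_graph if name in q],
    fetch_related=lambda names: [fact for n in names for fact in toy_graph[n]],
    generate=lambda prompt: prompt.splitlines()[2],  # echoes the first context line
)
print(reply)
```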

In [54]:
query = "Who inspired Sherlock Holmes?"

answer = store.chat_response(query)
answer.content

'Sherlock Holmes was inspired by Dr. Joseph Bell, a lecturer at the University of Edinburgh, known for his keen observational skills and logical reasoning, which greatly influenced Arthur Conan Doyle, the creator of Sherlock Holmes.'