# End-to-End TigerGraph GraphRAG for Document Question Answering

This notebook demonstrates how to use TigerGraph Graph-RAG (currently in Beta), an AI assistant for your TigerGraph databases. It lets you ask natural-language questions about the document data stored in TigerGraph and get answers in a human-readable format. GraphRAG is a graph-based retrieval-augmented generation approach; TigerGraph Graph-RAG is built to help users get started with it and to give them a seamless way to interact with their document data within TigerGraph.

## Setup Environment


* Follow the [Docker setup](https://github.com/tigergraph/ecosys/blob/master/demos/guru_scripts/docker/README.md) guide to set up your Docker environment.
* Follow [Overview of installing Docker Compose](https://docs.docker.com/compose/install/) to install Docker Compose for your platform.


#### TigerGraph Docker Image

To use TigerGraph Community Edition without a license key, download the corresponding Docker image from https://dl.tigergraph.com/ and load it into Docker:
```
docker load -i ./tigergraph-4.2.0-community-docker-image.tar.gz
docker images
```

You should be able to find `tigergraph/community:4.2.0` in the image list.

#### Graph-RAG Docker Images

The following images are also needed for TigerGraph Graph-RAG. Docker Compose will automatically download them, but you can download them manually if preferred:

```
docker pull tigergraph/graphrag:latest
docker pull tigergraph/ecc:latest
docker pull tigergraph/chat-history:latest
docker pull tigergraph/graphrag-ui:latest
docker pull nginx:latest
```

### Deploy Graph-RAG with Docker Compose
#### Get docker-compose file
Download the [docker-compose.yml](https://raw.githubusercontent.com/tigergraph/ecosys/refs/heads/master/tutorials/graphrag/docker-compose.yml) file directly.

The Docker Compose file contains all dependencies for Graph-RAG, including a TigerGraph database. If you want to use a separate TigerGraph instance, you can comment out the `tigergraph` section in the Docker Compose file and restart all services. In that case, follow the instructions below to make sure your standalone TigerGraph server is accessible from the other Graph-RAG containers.

#### Set up configurations

Next, download the following configuration files and put them in a `configs` subdirectory of the directory that contains the Docker Compose file:
* [configs/db_config.json](https://raw.githubusercontent.com/tigergraph/ecosys/refs/heads/master/tutorials/graphrag/configs/db_config.json)
* [configs/llm_config.json](https://raw.githubusercontent.com/tigergraph/ecosys/refs/heads/master/tutorials/graphrag/configs/llm_config.json)
* [configs/chat_config.json](https://raw.githubusercontent.com/tigergraph/ecosys/refs/heads/master/tutorials/graphrag/configs/chat_config.json)
* [configs/nginx.conf](https://raw.githubusercontent.com/tigergraph/ecosys/refs/heads/master/tutorials/graphrag/configs/nginx.conf)

#### Adjust configurations

Edit `configs/llm_config.json` and replace `<YOUR_OPENAI_API_KEY>` with your own OpenAI API key.
 
> If desired, you can also change the models used by the embedding service and the completion service to your preferred models to adjust the output of the LLM service.
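
If you prefer to script this edit rather than do it by hand, the following is a minimal sketch. It assumes only that the placeholder string `<YOUR_OPENAI_API_KEY>` appears somewhere in the file and makes no assumption about the rest of the file's structure. The helper names `fill_api_key` and `update_config_file` are hypothetical and not part of any TigerGraph tooling.

```python
import json
from pathlib import Path

PLACEHOLDER = "<YOUR_OPENAI_API_KEY>"  # placeholder string named in the text above


def fill_api_key(config_text: str, api_key: str) -> str:
    """Return config_text with every placeholder occurrence replaced by api_key."""
    if PLACEHOLDER not in config_text:
        raise ValueError("placeholder not found; the file may already be configured")
    updated = config_text.replace(PLACEHOLDER, api_key)
    json.loads(updated)  # fail early if the result is not valid JSON
    return updated


def update_config_file(path: str, api_key: str) -> None:
    """Rewrite the config file at path in place (path is supplied by the caller)."""
    config_path = Path(path)
    config_path.write_text(fill_api_key(config_path.read_text(), api_key))
```

For example, `update_config_file("configs/llm_config.json", os.environ["OPENAI_API_KEY"])` would keep the key out of your shell history and notebooks, assuming the key is exported in an environment variable of that name.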

#### Start all services

Now, simply run `docker compose up -d` and wait for all the services to start.
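
The containers can take a while to come up. As a rough way to wait for them from Python, the sketch below polls a TCP port until it accepts connections. The helper name `wait_for_port` and its defaults are hypothetical; the ports you would check (14240 for TigerGraph, 8000 for the GraphRAG service, 80 for the UI) are the ones this notebook uses elsewhere. An open port only shows that something is listening, not that the service has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll host:port until a TCP connection succeeds or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as `wait_for_port("192.168.11.12", 14240)` returns `True` once the port is reachable and `False` if the timeout expires first.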

## Build GraphRAG From Scratch

If you want to walk through the whole Graph-RAG process, you can build the GraphRAG from scratch. Review the LLM model and service settings carefully first, because regenerating the embeddings and data structures from the raw data incurs LLM usage costs.

#### Step 1: Database Connection Creation

In [None]:
import os
from pyTigerGraph import TigerGraphConnection
from dotenv import load_dotenv

load_dotenv()
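# We first create a connection to the database.
# Replace the host with the address of your own TigerGraph server.
host = "http://192.168.11.12"
# Credentials come from the environment (or a .env file) and default to "tigergraph".
# Note: some systems (e.g. Windows) already define USERNAME as the OS login name,
# and load_dotenv() does not override existing variables by default.
username = os.getenv("USERNAME", "tigergraph")
password = os.getenv("PASS", "tigergraph")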
conn = TigerGraphConnection(
    host=host,
    username=username,
    password=password,
    gsPort="14240",
    restppPort="14240",
    graphname = "TigerGraphRAG"
)

# And then add GraphRAG's address to the connection. This address
# is the host's address where the GraphRAG container is running.
conn.ai.configureGraphRAGHost(f"{host}:8000")

## Create a Graph and Ingest Data

We provide utilities to set up your TigerGraph database with a schema and load your desired documents. In this example, we use the TigerGraph documentation as our dataset. The documents are processed into a JSONL file in the following format:

```json
{"url": "some_url_here", "content": "Text of the document"}
```
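
If you are preparing your own documents, a small sketch like the one below can write and sanity-check a file in this one-object-per-line layout. The helper names `write_jsonl` and `read_jsonl` are hypothetical, and the required field names default to the `url` and `content` keys shown above; adjust them if your loader configuration uses different fields.

```python
import json
from typing import Iterable, Iterator


def write_jsonl(path: str, docs: Iterable[dict]) -> None:
    """Write one JSON object per line, the layout shown above."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc, ensure_ascii=False) + "\n")


def read_jsonl(path: str, required=("url", "content")) -> Iterator[dict]:
    """Yield each record, raising ValueError if a required field is missing."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            doc = json.loads(line)
            missing = [k for k in required if k not in doc]
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            yield doc
```

Running `list(read_jsonl(path))` before ingestion is a cheap way to catch malformed lines early.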

The following cells create a graph called `TigerGraphRAG` and load the documents into it. The resulting schema looks like this:

![graphrag_schema](./pictures/GraphRAGSchema.png)

Create Graph

In [None]:
conn.gsql(f"""CREATE GRAPH {conn.graphname}()""")

Get connection token if authentication is enabled

In [None]:
# We need to authenticate the connection
conn.getToken()

Create SupportAI schema and install related queries

In [None]:
conn.ai.initializeSupportAI()

Create DocumentIngest for local file

In [None]:
res = conn.ai.createDocumentIngest(
    data_source="local",
    data_source_config={"data_path": "./data/tg_tutorials.jsonl"},
    loader_config={"doc_id_field": "doc_id", "content_field": "content", "doc_type": "markdown"},
    file_format="json",
)
print(res)

Run DocumentIngest to load documents to graph

In [None]:
conn.ai.runDocumentIngest(res["load_job_id"], res["data_source_id"], res["data_path"])

Alternatively, create and run DocumentIngest for data files in cloud storage

In [None]:
access = ""
sec = ""
res = conn.ai.createDocumentIngest(
    data_source="s3",
    data_source_config={"aws_access_key": access, "aws_secret_key": sec},
    loader_config={"doc_id_field": "url", "content_field": "content", "doc_type": ""},
    file_format="json",
)

In [None]:
conn.ai.runDocumentIngest(res["load_job_id"], res["data_source_id"], "s3://tg-documentation/pytg_current/pytg_current.jsonl")
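
The `access` and `sec` strings above are left empty and must be filled in before the S3 ingest can work. Rather than pasting credentials into the notebook, one option is to read them from the environment, as sketched below. The dictionary keys match the `data_source_config` used above; the environment variable names follow the usual AWS convention and are an assumption here, as is the helper name `s3_source_config`.

```python
import os


def s3_source_config(env=os.environ) -> dict:
    """Build the data_source_config dict from environment variables."""
    try:
        return {
            # Variable names are an assumption (the conventional AWS names).
            "aws_access_key": env["AWS_ACCESS_KEY_ID"],
            "aws_secret_key": env["AWS_SECRET_ACCESS_KEY"],
        }
    except KeyError as exc:
        raise RuntimeError(f"missing environment variable: {exc.args[0]}") from None
```

The result can then be passed as `data_source_config=s3_source_config()` in the `createDocumentIngest` call.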

## Build Knowledge Graph from the Loaded Documents

To construct the full knowledge graph with semantics, context, and structure, we trigger an end-to-end pipeline that performs the LLM integration operations, including document chunking, embedding, upserting, extraction, and community detection.

In [None]:
conn.ai.forceConsistencyUpdate("graphrag")

## Comparing Document Search Methods

TigerGraph GraphRAG offers hybrid vector and graph-based methods for searching documents within the graph, including:

- **Hybrid**: This method combines vector search and graph traversal to find the information most relevant to the query. A similarity search over the embeddings of documents, document chunks, entities, and relationships yields the starting points, and a graph traversal from those points then gathers the related context.


- **GraphRAG (Community Search)**: This method enhances retrieval by leveraging graph traversal and community detection. It starts from top-k similar communities and performs a graph traversal across relevant relationships to identify communities of related chunks. The traversal is guided by connection patterns in the graph rather than just semantic similarity, enabling richer and more coherent context retrieval. GraphRAG is especially effective in complex knowledge graphs where multi-hop reasoning or structural connections are important.
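
To make the Hybrid idea concrete, here is a toy sketch in plain Python: rank vertices by cosine similarity to the query embedding, take the top k as seeds, then expand a fixed number of hops along graph edges. It is an illustration of the general pattern only, not TigerGraph's implementation; the data, the function names, and the use of cosine similarity are all assumptions made for this example.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_seeds(query_vec, embeddings, k):
    """Return the k vertex ids whose embeddings are most similar to query_vec."""
    ranked = sorted(embeddings, key=lambda v: cosine(query_vec, embeddings[v]), reverse=True)
    return ranked[:k]


def expand(seeds, edges, num_hops):
    """Breadth-first expansion: collect every vertex within num_hops of any seed."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == num_hops:
            continue  # do not traverse beyond the hop limit
        for neighbor in edges.get(vertex, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


# Toy data (all hypothetical): three chunks with 2-d embeddings and a small graph.
embeddings = {"chunk_a": [1.0, 0.0], "chunk_b": [0.9, 0.1], "chunk_c": [0.0, 1.0]}
edges = {"chunk_a": ["entity_x"], "entity_x": ["chunk_c"], "chunk_b": []}

seeds = top_k_seeds([1.0, 0.0], embeddings, k=1)
context = expand(seeds, edges, num_hops=2)
```

In this toy graph, `chunk_c` ends up in the context because it is connected to the seed through `entity_x`, even though its embedding is dissimilar to the query. That is the kind of result a purely vector-based search would miss.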

TigerGraph GraphRAG generates the response to the user's query with an LLM, based on the search results from the methods above. You can compare the responses generated for each search method to see which one is most relevant to the query. In this example, the Hybrid method generates the most relevant response.

Now, open [http://{your-server-ip}:80](http://192.168.11.12:80) to access and try the GraphRAG system through the UI.

In addition to the UI, APIs are available to generate responses programmatically:


In [None]:
query = "How do I connect to a TigerGraph database using Python?"

### Answer question using GraphRAG (Community Search)

In [None]:
resp = conn.ai.answerQuestion(query,
                        method="graphrag",
                        method_parameters={"community_level": 2, "top_k": 3, "verbose": True})

In [None]:
print(resp["response"])

In [None]:
print(resp["retrieved"])

Check verbose info for more details if needed

In [None]:
import json
print(json.dumps(resp["verbose"], indent=4))

### Answer question using Hybrid Search

In [None]:
resp = conn.ai.answerQuestion(query,
                        method="hybrid",
                        method_parameters = {"indices": ["Document", "DocumentChunk", "Entity", "Relationship"],
                                             "top_k": 5,
                                             "num_hops": 2,
                                             "num_seen_min": 3,
                                             "verbose": True})

In [None]:
print(resp["response"])

In [None]:
print(resp["retrieved"])
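
To compare the search methods side by side, a small helper along these lines may be convenient. It is a sketch: `compare_methods` and `print_comparison` are hypothetical names, and the only assumptions about the API are the ones visible in the cells above, namely that the callable accepts `method` and `method_parameters` keyword arguments and returns a dictionary with a `"response"` key.

```python
from typing import Callable, Dict


def compare_methods(ask: Callable[..., dict], query: str, methods: Dict[str, dict]) -> Dict[str, str]:
    """Call ask once per search method and collect each generated response."""
    answers = {}
    for method, params in methods.items():
        resp = ask(query, method=method, method_parameters=params)
        answers[method] = resp["response"]
    return answers


def print_comparison(answers: Dict[str, str]) -> None:
    """Print each method's answer under a labelled header."""
    for method, text in answers.items():
        print(f"=== {method} ===\n{text}\n")


# Usage with the connection from this notebook (parameters copied from the cells above):
# answers = compare_methods(
#     conn.ai.answerQuestion,
#     query,
#     {
#         "graphrag": {"community_level": 2, "top_k": 3},
#         "hybrid": {"indices": ["Document", "DocumentChunk", "Entity", "Relationship"],
#                    "top_k": 5, "num_hops": 2, "num_seen_min": 3},
#     },
# )
# print_comparison(answers)
```

Passing the callable in as an argument keeps the helper independent of the connection object, so it can be exercised with a stub before spending LLM calls.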
