# Using TigerGraph Graph-RAG for Document Question Answering

This notebook demonstrates how to use TigerGraph Graph-RAG (currently in Beta), an AI assistant for your TigerGraph databases. It lets you ask questions in natural language about document data stored in TigerGraph and get answers in a human-readable format. GraphRAG is a graph-based retrieval-augmented generation approach; TigerGraph Graph-RAG is built to help users get started with it and to provide a seamless way to interact with their document data within TigerGraph.

## Setup Environment


* Follow the [Docker setup](https://github.com/tigergraph/ecosys/blob/master/demos/guru_scripts/docker/README.md) guide to set up your Docker environment.
* Follow the [Overview of installing Docker Compose](https://docs.docker.com/compose/install/) to install Docker Compose for your platform.


#### TigerGraph Docker Image

To use TigerGraph Community Edition without a license key, download the corresponding Docker image from https://dl.tigergraph.com/ and load it into Docker:
```
docker load -i ./tigergraph-4.2.0-community-docker-image.tar.gz
docker images
```

You should be able to find `tigergraph/community:4.2.0` in the image list.

#### Graph-RAG Docker Images

The following images are also needed for TigerGraph Graph-RAG. Docker Compose will automatically download them, but you can download them manually if preferred:

```
docker pull tigergraphml/copilot:latest
docker pull tigergraphml/ecc:latest
docker pull tigergraphml/chat-history:latest
docker pull tigergraphml/copilot-ui:latest
docker pull nginx:latest
```

### Deploy Graph-RAG with Docker Compose

#### Get docker-compose file

Download the [docker-compose.yml](https://raw.githubusercontent.com/tigergraph/ecosys/refs/heads/master/tutorials/copilot/docker-compose.yml) file directly.

The Docker Compose file contains all dependencies for Graph-RAG, including a TigerGraph database. If you want to use a separate TigerGraph instance, you can comment out the `tigergraph` section in the Docker Compose file and restart all services. However, please follow the instructions below to make sure your standalone TigerGraph server is accessible from the other Graph-RAG containers.

#### Set up configurations

Next, download the following configuration files and put them in a `configs` subdirectory of the directory that contains the Docker Compose file:
* [configs/db_config.json](https://raw.githubusercontent.com/tigergraph/ecosys/refs/heads/master/tutorials/copilot/configs/db_config.json)
* [configs/llm_config.json](https://raw.githubusercontent.com/tigergraph/ecosys/refs/heads/master/tutorials/copilot/configs/llm_config.json)
* [configs/chat_config.json](https://raw.githubusercontent.com/tigergraph/ecosys/refs/heads/master/tutorials/copilot/configs/chat_config.json)
* [configs/nginx.conf](https://raw.githubusercontent.com/tigergraph/ecosys/refs/heads/master/tutorials/copilot/configs/nginx.conf)
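
The download step above can also be scripted. The following is a minimal sketch, assuming the four raw GitHub links listed above stay valid; the helper names `config_targets` and `download_configs` are hypothetical and not part of any TigerGraph tooling.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names are copied from the links listed above.
BASE_URL = (
    "https://raw.githubusercontent.com/tigergraph/ecosys/"
    "refs/heads/master/tutorials/copilot"
)
CONFIG_FILES = ["db_config.json", "llm_config.json", "chat_config.json", "nginx.conf"]


def config_targets(compose_dir="."):
    """Return (url, local_path) pairs, one per configuration file."""
    configs_dir = Path(compose_dir) / "configs"
    return [(f"{BASE_URL}/configs/{name}", configs_dir / name) for name in CONFIG_FILES]


def download_configs(compose_dir="."):
    """Download every file into <compose_dir>/configs (requires network access)."""
    for url, path in config_targets(compose_dir):
        path.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(path))
        print(f"saved {path}")


# Example (performs network requests), run from the directory
# that holds docker-compose.yml:
# download_configs(".")
```

Keeping the URL construction in a separate function makes it easy to check the target paths before anything is downloaded.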

#### Adjust configurations

Edit `configs/llm_config.json` and replace `<YOUR_OPENAI_API_KEY>` with your own OpenAI API key.

> If desired, you can also change the models used by the embedding service and the completion service to adjust the output of the LLM service.
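
If you prefer not to edit the file by hand, the placeholder can be substituted with a few lines of Python. This is a sketch that treats the file as plain text, so it makes no assumption about the JSON keys inside `llm_config.json`; the helper names and the use of an `OPENAI_API_KEY` environment variable are illustrative choices, not part of Graph-RAG.

```python
from pathlib import Path

# Placeholder string as it appears in the downloaded llm_config.json.
PLACEHOLDER = "<YOUR_OPENAI_API_KEY>"


def fill_api_key(text, api_key):
    """Return text with every placeholder occurrence replaced by api_key."""
    if PLACEHOLDER not in text:
        raise ValueError("placeholder not found; was the file already edited?")
    return text.replace(PLACEHOLDER, api_key)


def update_llm_config(path, api_key):
    """Rewrite the config file in place with the real key."""
    config_path = Path(path)
    original = config_path.read_text(encoding="utf-8")
    config_path.write_text(fill_api_key(original, api_key), encoding="utf-8")


# Example: read the key from the environment rather than hard-coding it.
# import os
# update_llm_config("configs/llm_config.json", os.environ["OPENAI_API_KEY"])
```

Raising an error when the placeholder is missing avoids silently leaving the file unchanged.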

#### Start all services

Now, simply run `docker compose up -d` and wait for all the services to start.
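
One way to wait for the services is to poll a port until it accepts TCP connections. The sketch below assumes the Graph-RAG service listens on port 8000, which is the port used in the host address later in this notebook; `wait_for_port` is a hypothetical helper. An open port only shows that the container accepts connections, not that the service has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (port 8000 is an assumption, see the note above):
# if not wait_for_port("localhost", 8000, timeout=120):
#     print("service did not come up; check `docker compose logs`")
```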

## Build GraphRAG From Scratch

If you want to experience the whole Graph-RAG process, you can build the GraphRAG from scratch. However, review the LLM model and service settings carefully first, because regenerating the embeddings and data structures from the raw data incurs LLM usage costs.

#### Step 1: Database Connection Creation

In [4]:
import os
import json
from pyTigerGraph import TigerGraphConnection

if __name__ == "__main__":
    with open("./configs/db_config.json") as cfg:
        config = json.load(cfg)

    config["hostname"] = "http://192.168.11.11"
    config["username"] = "tigergraph"
    config["password"] = "tigergraph"
    
    # We first create a connection to the database
    conn = TigerGraphConnection(
        host=config["hostname"],
        username=config["username"],
        password=config["password"],
        restppPort=config["restppPort"],
    )
    conn.graphname = "TigerGraphRAG_demo"

conn.gsql(f"""CREATE GRAPH {conn.graphname}()""")

# Request an authentication token for this connection.
conn.getToken()

# And then add Graph-RAG's address to the connection. This address
# is the host's address where the Graph-RAG container is running.
conn.ai.configureCoPilotHost("http://192.168.11.11:8000")


#### Step 2: Initialize Graph and Ingest Data

We provide utilities to set up your TigerGraph database with a schema and load your desired documents. In this example, we use the TigerGraph documentation as our dataset. The documents are processed into a JSONL file with the following format:

```json
{"url": "some_url_here", "content": "Text of the document"}
```
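
To prepare your own documents in this shape, a small helper can serialize them as one JSON object per line. This is a minimal sketch using only the two fields shown above; the function names `to_jsonl` and `from_jsonl` are hypothetical.

```python
import json

REQUIRED_FIELDS = {"url", "content"}


def to_jsonl(docs):
    """Serialize documents as JSON Lines: one JSON object per line."""
    lines = []
    for doc in docs:
        missing = REQUIRED_FIELDS - doc.keys()
        if missing:
            raise ValueError(f"document is missing fields: {sorted(missing)}")
        # json.dumps escapes newlines inside strings, so each record stays on one line.
        lines.append(json.dumps({"url": doc["url"], "content": doc["content"]}))
    return "".join(line + "\n" for line in lines)


def from_jsonl(text):
    """Parse JSON Lines text back into a list of dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Example:
# with open("my_docs.jsonl", "w", encoding="utf-8") as fh:
#     fh.write(to_jsonl([{"url": "some_url_here", "content": "Text of the document"}]))
```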

The following code cell runs the schema change jobs for `TigerGraphRAG_demo`, which add the basic schema, vector embeddings, and indexes, and installs the retriever and related queries.

In [7]:
conn.ai.initializeSupportAI()

{'host_name': 'http://tigergraph',

#### Step 3: Ingest Data

The following code will ingest data using a local loading job. 

In [8]:
def load_data(conn: TigerGraphConnection):
    load_job = """CREATE LOADING JOB load_documents_content_as_json {
    DEFINE FILENAME DocumentContent;
    LOAD DocumentContent TO TEMP_TABLE tc (doc_id, doc_type, content) VALUES (flatten_json_array($0, $"doc_id", $"doc_type", $"content")) USING SEPARATOR="|||||||||||";

    LOAD TEMP_TABLE tc TO VERTEX Document VALUES($"doc_id", gsql_current_time_epoch(0), _, _);
    LOAD TEMP_TABLE tc TO VERTEX Content VALUES($"doc_id", $"doc_type", $"content", gsql_current_time_epoch(0));
    LOAD TEMP_TABLE tc TO EDGE HAS_CONTENT VALUES($"doc_id" Document, $"doc_id" Content);
    }"""
    
    conn.gsql(f"USE GRAPH {conn.graphname}\n{load_job}")
    conn.runLoadingJobWithFile("./data/tg_tutorials.jsonl", "DocumentContent", "load_documents_content_as_json", sep="|||||||||||||")


load_data(conn)

Alternatively, create and run a DocumentIngest job for data files in cloud storage:

In [None]:
access = ""
sec = ""
res = conn.ai.createDocumentIngest(
    data_source="s3",
    data_source_config={"aws_access_key": access, "aws_secret_key": sec},
    loader_config={"doc_id_field": "url", "content_field": "content", "doc_type": ""},
    file_format="json",
)
conn.ai.runDocumentIngest(res["load_job_id"], res["data_source_id"], "s3://tg-documentation/pytg_current/pytg_current.jsonl")

#### Step 4: Build Knowledge Graph 

The following code builds the knowledge graph by performing chunking, embedding, upserting, and extraction using an LLM.

In [9]:
conn.ai.forceConsistencyUpdate(method="graphrag")

{'status': 'submitted'}

## Comparing Document Search Methods and LLM Generated Responses


TigerGraph Graph-RAG provides multiple methods to search documents in the graph:
- **HNSW Overlap**: This method combines vector search and graph traversal. It uses the HNSW algorithm to search the embeddings of documents, document chunks, entities, and relationships. These results serve as the starting points for a graph traversal that collects the information most relevant to the query.

- **Vector Search**: This method uses the HNSW algorithm to search one of the vector indices (document, document chunk, entity, or relationship). It returns the most relevant information to the query based on the embeddings. This method is what you would expect from a traditional vector RAG solution.

- **Sibling Search**: This method is very similar to the Vector Search method, but it uses the sibling (IS_AFTER) relationships between document chunks to expand the context around the document chunk that is most relevant to the query. This method is useful when you want to get more context around the most relevant document chunk.

- **GraphRAG (Community Search)**: This method enhances retrieval by leveraging graph structure and community detection. It starts from top-k similar document chunks and performs a graph traversal across relevant relationships to identify communities of related chunks. The traversal is guided by connection patterns in the graph rather than just semantic similarity, enabling richer and more coherent context retrieval. GraphRAG is especially effective in complex knowledge graphs where multi-hop reasoning or structural connections are important.
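
To make the difference between Vector Search and Sibling Search concrete, here is a toy sketch in plain Python. It is not TigerGraph's implementation: it uses brute-force cosine similarity instead of HNSW, assumes the chunks of a single document are stored in reading order, and the `lookback` and `lookahead` parameter names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, chunk_vecs, top_k):
    """Return the indices of the top_k most similar chunks (brute force, not HNSW)."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:top_k]


def sibling_expand(hits, num_chunks, lookback=1, lookahead=1):
    """Add the chunks just before and after each hit, returned in document order."""
    expanded = set()
    for i in hits:
        start = max(0, i - lookback)
        stop = min(num_chunks, i + lookahead + 1)
        expanded.update(range(start, stop))
    return sorted(expanded)


# Five chunks of one document with made-up 2-D embeddings.
chunk_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
hits = vector_search([0.0, 1.0], chunk_vecs, top_k=1)
context = sibling_expand(hits, len(chunk_vecs))
```

Here plain vector search returns only the best-matching chunk, while the sibling expansion also pulls in its immediate neighbours, which is the extra context that Sibling Search aims to provide.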

TigerGraph Graph-RAG can generate a response to the user's query with an LLM, based on the search results from the methods above. You can compare the responses generated for each search method to see which one is most relevant to the query. In this example, the HNSW Overlap method generates the most relevant response.

In [None]:
query = "how to load data to tigergraph vector store, give an example in Python"
print(f"""Fetching answer for question: {query}""")

resp = conn.ai.answerQuestion(
    query,
    method="hnswoverlap",
    method_parameters = {
        "indices": ["Document", "DocumentChunk", "Entity", "Relationship"],
        "top_k": 2,
        "num_hops": 2,
        "num_seen_min": 2,
        "verbose": True
    })

print(f"""\nAnswer using HNSW_Overlap:\n{resp["response"]}""")


In [None]:

resp = conn.ai.answerQuestion(query,
                        method="vdb",
                        method_parameters={"index": "DocumentChunk",
                                           "top_k": 5,
                                           "withHyDE": False})

print(f"""\nAnswer using HNSW:\n{resp["response"]}""")

In [None]:
resp = conn.ai.answerQuestion(
    query,
    method="graphrag",
    method_parameters={
        "community_level": 2,
        "combine": False,
        "top_k": 5,
        "verbose": True
    })

print(f"""\nAnswer using GraphRAG:\n{resp["response"]}""")
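
To compare the methods side by side, the three calls above can be wrapped in one loop. The sketch below reuses the method names and parameters from the cells above; `compare_methods` is a hypothetical helper that accepts any callable with the same call shape as `conn.ai.answerQuestion`, so it can be tried without a running server.

```python
# Method names and parameters copied from the cells above.
METHOD_CONFIGS = {
    "hnswoverlap": {
        "indices": ["Document", "DocumentChunk", "Entity", "Relationship"],
        "top_k": 2,
        "num_hops": 2,
        "num_seen_min": 2,
        "verbose": True,
    },
    "vdb": {"index": "DocumentChunk", "top_k": 5, "withHyDE": False},
    "graphrag": {"community_level": 2, "combine": False, "top_k": 5, "verbose": True},
}


def compare_methods(ask, query, method_configs):
    """Call ask(query, method=..., method_parameters=...) once per method."""
    answers = {}
    for method, params in method_configs.items():
        try:
            resp = ask(query, method=method, method_parameters=params)
            answers[method] = resp["response"]
        except Exception as exc:
            # Keep going so one failing method does not hide the others.
            answers[method] = f"<error: {exc}>"
    return answers


# Example with the connection created earlier in this notebook:
# for method, answer in compare_methods(conn.ai.answerQuestion, query, METHOD_CONFIGS).items():
#     print(f"\nAnswer using {method}:\n{answer}")
```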