# Using TigerGraph GraphRAG for Document Question Answering

This notebook demonstrates how to use TigerGraph GraphRAG, an AI assistant for your TigerGraph databases. GraphRAG is a graph-based retrieval-augmented generation approach: you ask questions in natural language about the document data stored in TigerGraph and get answers in a human-readable format. TigerGraph GraphRAG is built to help users get started with this approach and to provide a seamless way to interact with their document data within TigerGraph.

In [None]:
import os
from pyTigerGraph import TigerGraphConnection
from dotenv import load_dotenv

load_dotenv()
# We first create a connection to the database
host = "http://localhost"
username = os.getenv("USERNAME", "tigergraph")
password = os.getenv("PASS", "tigergraph")
conn = TigerGraphConnection(
    host=host,
    username=username,
    password=password,
    gsPort="14240",
    restppPort="14240",
    graphname="TigerGraphRAG",
)

# And then add GraphRAG's address to the connection. This address
# is the host's address where the GraphRAG container is running.
conn.ai.configureGraphRAGHost(f"{host}:8000")

## Create a Graph and Ingest Data

We provide utilities to set up your TigerGraph database with a schema and load your desired documents. In this example, we use the TigerGraph documentation as our dataset. The documents are processed into a JSONL file with the following format:

```json
{"doc_id": "id_for_document_here", "content": "Text content of the document"}
```
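
If your documents are not yet in this format, the snippet below is a minimal sketch of how such a file could be produced and sanity-checked with the Python standard library. The helper names `write_jsonl` and `read_jsonl` are hypothetical and are not part of pyTigerGraph; only the `doc_id` and `content` field names come from the format shown above.

```python
import json
from pathlib import Path


def write_jsonl(docs, path):
    """Write (doc_id, content) pairs to a JSONL file, one JSON object per line."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for doc_id, content in docs:
            record = {"doc_id": doc_id, "content": content}
            # json.dumps escapes newlines inside content, so each record stays on one line
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Read the file back and check that every line has the two expected fields."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = {"doc_id", "content"} - record.keys()
            if missing:
                raise ValueError(f"line {line_no} is missing {sorted(missing)}")
            records.append(record)
    return records
```

Running `read_jsonl` on a file before ingesting it is a cheap way to catch malformed lines early.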

The following cells create a graph called `TigerGraphRAG` and load the documents into it. The resulting schema looks like this:

![graphrag_schema](../img/GraphRAGSchema.png)

Create Graph

In [None]:
conn.gsql(f"""CREATE GRAPH {conn.graphname}()""")

Get connection token if authentication is enabled

In [None]:
# We need to authenticate the connection
conn.getToken()

Create SupportAI schema and install related queries

In [None]:
conn.ai.initializeSupportAI()

Create DocumentIngest for local file

In [None]:
res = conn.ai.createDocumentIngest(
    data_source="local",
    data_source_config={"data_path": "./data/tg_tutorials.jsonl"},
    loader_config={"doc_id_field": "doc_id", "content_field": "content", "doc_type": "markdown"},
    file_format="json",
)
print(res)

Run DocumentIngest to load documents to graph

In [None]:
conn.ai.runDocumentIngest(res["load_job_id"], res["data_source_id"], res["data_path"])

Alternatively, create and run DocumentIngest for data files in cloud storage

In [None]:
# Fill in your AWS credentials before running this cell
access = ""  # AWS access key
sec = ""  # AWS secret key
res = conn.ai.createDocumentIngest(
    data_source="s3",
    data_source_config={"aws_access_key": access, "aws_secret_key": sec},
    loader_config={"doc_id_field": "url", "content_field": "content", "doc_type": ""},
    file_format="json",
)

In [None]:
conn.ai.runDocumentIngest(res["load_job_id"], res["data_source_id"], "s3://tg-documentation/pytg_current/pytg_current.jsonl")

## Build Knowledge Graph from the documents loaded

In [None]:
conn.ai.forceConsistencyUpdate("graphrag")

## Comparing Document Search Methods

TigerGraph GraphRAG provides multiple methods to search documents in the graph. The methods are:
- **Hybrid Search**: This method combines vector search and graph traversal. It uses the selected algorithm to search the embeddings of documents, document chunks, entities, and relationships; these results serve as the starting points for a graph traversal that finds the information most relevant to the query.

- **Similarity Search**: This method uses the selected algorithm to search the embeddings in one of the document, document chunk, entity, or relationship vector indices, and returns the information most relevant to the query based on those embeddings. This is what you would expect from a traditional vector RAG solution.

- **Sibling Search**: This method is very similar to the Similarity Search method, but it uses the sibling (IS_AFTER) relationships between document chunks to expand the context around the document chunk that is most relevant to the query. It is useful when you want more context around the most relevant document chunk.
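
As a rough illustration of how the three methods above differ, here is a toy, in-memory sketch in plain Python. It is not TigerGraph's implementation: the function names, the cosine scoring, the `chunk_order` list standing in for the IS_AFTER chain, and the `edges` dictionary are all assumptions made for illustration, and `num_seen_min` is left out because its exact semantics are not described here.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query_vec, embeddings, top_k):
    """Return the ids of the top_k embeddings closest to query_vec."""
    ranked = sorted(embeddings, key=lambda i: cosine(query_vec, embeddings[i]), reverse=True)
    return ranked[:top_k]


def sibling_search(query_vec, embeddings, chunk_order, top_k, lookback, lookahead):
    """Expand each similarity hit with its neighbours in document order."""
    position = {cid: i for i, cid in enumerate(chunk_order)}
    result = []
    for hit in similarity_search(query_vec, embeddings, top_k):
        i = position[hit]
        window = chunk_order[max(0, i - lookback): i + lookahead + 1]
        result.extend(c for c in window if c not in result)
    return result


def hybrid_search(query_vec, embeddings, edges, top_k, num_hops):
    """Use similarity hits as seeds, then collect everything within num_hops edges."""
    seen = set(similarity_search(query_vec, embeddings, top_k))
    frontier = deque((seed, 0) for seed in seen)
    while frontier:
        node, depth = frontier.popleft()
        if depth == num_hops:
            continue  # do not expand beyond the hop limit
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


# Toy data (hypothetical): four chunks in document order and one entity node.
embeddings = {"c0": [1.0, 0.0], "c1": [0.9, 0.1], "c2": [0.0, 1.0], "c3": [0.1, 0.9]}
chunk_order = ["c0", "c1", "c2", "c3"]
edges = {"c2": ["e_vertex"], "e_vertex": ["c0"]}
query_vec = [0.0, 1.0]

print(similarity_search(query_vec, embeddings, top_k=1))
# ['c2']
print(sibling_search(query_vec, embeddings, chunk_order, top_k=1, lookback=1, lookahead=1))
# ['c1', 'c2', 'c3']
print(sorted(hybrid_search(query_vec, embeddings, edges, top_k=1, num_hops=2)))
# ['c0', 'c2', 'e_vertex']
```

In this toy setup, all three methods start from the same best-matching chunk. Sibling search adds the chunks before and after it, while hybrid search follows graph edges and can reach a chunk (`c0`) that is not similar to the query at all.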

In [None]:
query = "How do I get the vertex count from TigerGraph using Python?"

### Hybrid Search in TigerGraph

In [None]:
conn.ai.searchDocuments(query,
                        method="hybrid",
                        method_parameters = {"indices": ["DocumentChunk", "Entity"],
                                             "top_k": 5,
                                             "num_hops": 2,
                                             "num_seen_min": 3,
                                             "verbose": False})

### Document Chunk Similarity Search

In [None]:
conn.ai.searchDocuments(query,
                        method="similarity",
                        method_parameters={"index": "DocumentChunk",
                                           "top_k": 5,
                                           "withHyDE": False,
                                           "verbose": False})

### Sibling Document Chunk Similarity Search

In [None]:
conn.ai.searchDocuments(query,
                        method="sibling",
                        method_parameters={"index": "DocumentChunk",
                                           "top_k": 5,
                                           "lookahead": 3,
                                           "lookback": 3,
                                           "withHyDE": False,
                                           "verbose": False})

### GraphRAG Document Chunk Community Search

In [None]:
conn.ai.searchDocuments(query,
                        method="graphrag",
                        method_parameters={"community_level": 2, "top_k": 3, "verbose": True})

## Comparing LLM Generated Responses

TigerGraph GraphRAG can also generate an answer to the user's query with an LLM, based on the search results from the methods above. Comparing the responses produced for each search method shows which one is most relevant to the query. In this example, none of the responses were wrong, but the Hybrid Search method generated the most relevant one by suggesting the `getVertexCount()` function to get the number of vertices in the graph.
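
To view the answers side by side rather than cell by cell, a small helper along the following lines may be convenient. This is only a sketch: `compare_methods` and `METHOD_CONFIGS` are hypothetical names, and the helper assumes the function passed in behaves like `conn.ai.answerQuestion` as used in this notebook, that is, it accepts `method` and `method_parameters` and returns a dictionary with a `"response"` key. The parameter values are copied from the cells in this notebook.

```python
def compare_methods(answer_fn, question, method_configs):
    """Call answer_fn once per search method and collect the generated answers.

    answer_fn is assumed to take (question, method=..., method_parameters=...)
    and to return a dict with a "response" key.
    """
    answers = {}
    for method, params in method_configs.items():
        try:
            resp = answer_fn(question, method=method, method_parameters=params)
            answers[method] = resp["response"]
        except Exception as exc:
            # Keep going so that one failing method does not hide the others
            answers[method] = f"<failed: {exc}>"
    return answers


# Parameter sets mirroring the individual answerQuestion cells in this notebook
METHOD_CONFIGS = {
    "hybrid": {"indices": ["DocumentChunk", "Entity"], "top_k": 5,
               "num_hops": 2, "num_seen_min": 3, "verbose": True},
    "similarity": {"index": "DocumentChunk", "top_k": 5, "withHyDE": False},
    "sibling": {"index": "DocumentChunk", "top_k": 5, "lookahead": 3,
                "lookback": 3, "withHyDE": False},
    "graphrag": {"community_level": 2, "top_k": 3, "verbose": True},
}
```

With the connection from the first cell, calling `compare_methods(conn.ai.answerQuestion, query, METHOD_CONFIGS)` would return one answer string per method, which you can then print in a loop.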

In [None]:
resp = conn.ai.answerQuestion(query,
                        method="graphrag",
                        method_parameters={"community_level": 2, "top_k": 3, "verbose": True})

In [None]:
print(resp["response"])

Check verbose info for more details if needed

In [None]:
import json
print(json.dumps(resp["verbose"], indent=4))

### Answer question using Hybrid Search

In [None]:
resp = conn.ai.answerQuestion(query,
                        method="hybrid",
                        method_parameters = {"indices": ["DocumentChunk", "Entity"],
                                             "top_k": 5,
                                             "num_hops": 2,
                                             "num_seen_min": 3,
                                             "verbose": True})

In [None]:
print(resp["response"])

In [None]:
print(resp["retrieved"])

### Answer question using Similarity Search

In [None]:
resp = conn.ai.answerQuestion(query,
                        method="similarity",
                        method_parameters={"index": "DocumentChunk",
                                           "top_k": 5,
                                           "withHyDE": False})

In [None]:
print(resp["response"])

### Answer question using Sibling Search

In [None]:
resp = conn.ai.answerQuestion(query,
                        method="sibling",
                        method_parameters={"index": "DocumentChunk",
                                           "top_k": 5,
                                           "lookahead": 3,
                                           "lookback": 3,
                                           "withHyDE": False})

In [None]:
print(resp["response"])