# Using TigerGraph CoPilot for Document Question Answering

This notebook demonstrates how to use TigerGraph CoPilot (currently in Beta), an AI assistant for your TigerGraph databases. CoPilot lets you ask natural-language questions about the document data stored in TigerGraph and returns answers in a human-readable format. It answers these questions with GraphRAG, a graph-based approach to retrieval-augmented generation. TigerGraph CoPilot is built to help users get started with GraphRAG and to make it easy to interact with document data in TigerGraph.

In [1]:
import os
from pyTigerGraph import TigerGraphConnection
from dotenv import load_dotenv

load_dotenv()
# We first create a connection to the database
host = "http://192.168.99.201" #os.environ["HOST"]
username = os.getenv("USERNAME", "tigergraph")
password = os.getenv("PASS", "tigergraph")
conn = TigerGraphConnection(
    host=host,
    username=username,
    password=password,
    gsPort="31409"
)

# And then add CoPilot's address to the connection. This address
# is the host's address where the CoPilot container is running.
conn.ai.configureCoPilotHost("http://localhost:8000")

## Create a Graph and Ingest Data

We provide utilities to set up your TigerGraph database with a schema and load your desired documents. In this example, we use the pyTigerGraph documentation as our dataset. The documents are processed into a JSONL file in which each line has the following format:

```json
{"url": "some_url_here", "content": "Text of the document"}
```
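
As a minimal sketch of how a file in this format could be produced, the snippet below serializes a list of documents to JSONL using only the standard library. The helper name `to_jsonl` and the sample records are hypothetical and are not part of CoPilot or pyTigerGraph; only the `url` and `content` field names come from the format above.

```python
import json

def to_jsonl(documents):
    """Serialize an iterable of dicts with "url" and "content" keys to JSONL text."""
    lines = []
    for doc in documents:
        # Keep only the two fields shown in the format above.
        record = {"url": doc["url"], "content": doc["content"]}
        lines.append(json.dumps(record, ensure_ascii=False))
    # One JSON object per line, with a trailing newline.
    return "\n".join(lines) + "\n"

# Hypothetical sample documents, for illustration only.
sample_docs = [
    {"url": "https://example.com/page-one", "content": "Text of the first document"},
    {"url": "https://example.com/page-two", "content": "Text of the second document"},
]

jsonl_text = to_jsonl(sample_docs)
print(jsonl_text, end="")
```

Writing `jsonl_text` to a file and uploading it to a location the loader can read would be the remaining steps; those depend on your environment and are not shown here.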

The following code block will create a graph called `pyTigerGraphRAG` and load the documents into the graph. The schema that is created looks like this:

![supportai_schema](../img/SupportAISchema.png)

In [2]:
conn.gsql("""CREATE GRAPH pyTigerGraphRAG()""")

'The graph pyTigerGraphRAG is created.'

In [3]:
conn.graphname = "pyTigerGraphRAG"
#conn.getToken()

In [9]:
conn.ai.initializeSupportAI()

{'host_name': 'http://192.168.99.201',

In [10]:
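# Read AWS credentials from the environment rather than hardcoding them.
# The variable names below are an assumption; use whatever your setup defines.
access = os.environ["AWS_ACCESS_KEY_ID"]
sec = os.environ["AWS_SECRET_ACCESS_KEY"]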
res = conn.ai.createDocumentIngest(
    data_source="s3",
    data_source_config={"aws_access_key": access, "aws_secret_key": sec},
    loader_config={"doc_id_field": "url", "content_field": "content"},
    file_format="json",
)

In [11]:
conn.ai.runDocumentIngest(res["load_job_id"], res["data_source_id"], "s3://tg-documentation/pytg_current/pytg_current.jsonl")

{'job_name': 'load_documents_content_json_4cc0b2115f754540b4543469612743f6',
 'job_id': 'pyTigerGraphRAG.load_documents_content_json_4cc0b2115f754540b4543469612743f6.stream.SupportAI_pyTigerGraphRAG_bc71b650248d41df83eae15155c2bce5.1733184693598',
 'log_location': '/home/tigergraph/tigergraph/log/kafkaLoader/pyTigerGraphRAG.load_documents_content_json_4cc0b2115f754540b4543469612743f6.stream.SupportAI_pyTigerGraphRAG_bc71b650248d41df83eae15155c2bce5.1733184693598'}

In [None]:
conn.ai.forceConsistencyUpdate()

{'status': 'submitted'}

## Comparing Document Search Methods

TigerGraph CoPilot provides multiple methods to search documents in the graph. The methods are:
- **HNSW Overlap**: This method combines vector search with graph traversal. It uses the HNSW algorithm to search the embeddings of documents, document chunks, entities, and relationships, then uses those results as the starting points for a graph traversal that finds the information most relevant to the query.

- **Vector Search**: This method uses the HNSW algorithm to search the embeddings in one of the vector indices (document, document chunk, entity, or relationship). It returns the most relevant information to the query based on the embeddings. This method is what you would expect from a traditional vector RAG solution.

- **Sibling Search**: This method is very similar to the Vector Search method, but it uses the sibling (IS_AFTER) relationships between document chunks to expand the context around the document chunk that is most relevant to the query. This method is useful when you want to get more context around the most relevant document chunk.
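
To make the difference between Vector Search and Sibling Search concrete, here is a small self-contained sketch. It is an illustration under stated assumptions, not CoPilot's implementation: it ranks chunks with a brute-force cosine similarity instead of an HNSW index, uses toy two-dimensional embeddings, and models the sibling relationships as simple list order. The function names are hypothetical; `lookback` and `lookahead` mirror the parameter names used later in this notebook.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_chunk(query_vec, chunk_vecs):
    """Index of the chunk whose embedding is most similar to the query.

    Brute-force scan; a real system would use an approximate index such as HNSW.
    """
    return max(range(len(chunk_vecs)), key=lambda i: cosine(query_vec, chunk_vecs[i]))

def expand_with_siblings(chunks, hit_index, lookback, lookahead):
    """Return the hit chunk plus up to `lookback` preceding and
    `lookahead` following chunks, in document order."""
    start = max(0, hit_index - lookback)
    end = min(len(chunks), hit_index + lookahead + 1)
    return chunks[start:end]

# Toy data: five consecutive chunks of one document with made-up embeddings.
chunks = [f"chunk {i}" for i in range(5)]
chunk_vecs = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.1], [0.2, 0.9], [-1.0, 0.0]]
query_vec = [1.0, 0.0]

hit = best_chunk(query_vec, chunk_vecs)  # plain vector search: one best chunk
context = expand_with_siblings(chunks, hit, lookback=1, lookahead=1)  # sibling search
print(hit, context)
```

Plain vector search would return only the best-matching chunk, while the sibling variant also returns its neighbors, clipped at the start and end of the document.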

In [None]:
query = "How do I get a count of vertices in Python?"

### HNSW Index Overlap in Graph

In [None]:
conn.ai.searchDocuments(query,
                        method="hnswoverlap",
                        method_parameters={"indices": ["Document", "DocumentChunk", "Entity", "Relationship"],
                                           "top_k": 2,
                                           "num_hops": 2,
                                           "num_seen_min": 2})

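One plausible reading of the `num_hops` and `num_seen_min` parameters is sketched below: traverse up to `num_hops` from each vector-search hit and keep the vertices reached from at least `num_seen_min` of the hits. This interpretation is an assumption inferred from the parameter names, not a description of CoPilot's actual query; the function name and the toy graph are hypothetical.

```python
from collections import Counter, deque

def overlap_traversal(graph, seeds, num_hops, num_seen_min):
    """Keep vertices reachable within `num_hops` from at least
    `num_seen_min` of the seed vertices (assumed semantics)."""
    seen = Counter()
    for seed in seeds:
        # Breadth-first search from this seed, limited to num_hops.
        visited = {seed}
        frontier = deque([(seed, 0)])
        while frontier:
            vertex, depth = frontier.popleft()
            if depth == num_hops:
                continue  # do not expand beyond the hop limit
            for neighbor in graph.get(vertex, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    frontier.append((neighbor, depth + 1))
        # Each seed contributes at most one count per vertex.
        seen.update(visited)
    return {v for v, count in seen.items() if count >= num_seen_min}

# Toy undirected graph as an adjacency list (hypothetical vertex names).
graph = {
    "doc1": ["chunkA"],
    "chunkA": ["doc1", "entityX"],
    "entityX": ["chunkA", "chunkB"],
    "chunkB": ["entityX", "doc2"],
    "doc2": ["chunkB"],
}

# Pretend vector search returned these two vertices as starting points.
result = overlap_traversal(graph, seeds=["doc1", "chunkB"], num_hops=2, num_seen_min=2)
print(sorted(result))
```

In this toy graph only the vertices that lie between the two starting points are reached from both, which is the intuition behind using overlap as a relevance signal.
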
### Document Chunk Vector Search

In [None]:
conn.ai.searchDocuments(query,
                        method="vdb",
                        method_parameters={"index": "DocumentChunk",
                                           "top_k": 5,
                                           "withHyDE": False})

### Sibling Document Chunk Vector Search

In [None]:
conn.ai.searchDocuments(query,
                        method="sibling",
                        method_parameters={"index": "DocumentChunk",
                                           "top_k": 5,
                                           "lookahead": 3,
                                           "lookback": 3,
                                           "withHyDE": False})

## Comparing LLM Generated Responses

TigerGraph CoPilot can also generate an answer to the user's query with an LLM, based on the search results from the methods above. Comparing the responses produced with each search method shows which one answers the query best. In this example, none of the responses were wrong, but the HNSW Overlap method produced the most relevant one: it suggests using the `getVertexCount()` function to get the number of vertices in the graph.
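
The side-by-side comparison can also be scripted. The sketch below loops over the three methods with the same parameters used in this notebook and collects the `response` field of each result. The helper `compare_methods` and the stand-in `fake_answer` are hypothetical; the stand-in exists only so the snippet runs without a database, and in practice you would pass `conn.ai.answerQuestion` instead.

```python
def compare_methods(answer_fn, query, method_configs):
    """Call `answer_fn` once per search method and collect the response text."""
    results = {}
    for method, params in method_configs.items():
        resp = answer_fn(query, method=method, method_parameters=params)
        results[method] = resp["response"]
    return results

# Same parameters as the individual cells in this notebook.
method_configs = {
    "hnswoverlap": {"indices": ["Document", "DocumentChunk", "Entity", "Relationship"],
                    "top_k": 2, "num_hops": 2, "num_seen_min": 2},
    "vdb": {"index": "DocumentChunk", "top_k": 5, "withHyDE": False},
    "sibling": {"index": "DocumentChunk", "top_k": 5,
                "lookahead": 3, "lookback": 3, "withHyDE": False},
}

def fake_answer(query, method, method_parameters):
    """Hypothetical stand-in that mimics the shape of an answerQuestion result."""
    return {"response": f"[{method}] answer to: {query}", "retrieved": []}

question = "How do I get a count of vertices in Python?"
answers = compare_methods(fake_answer, question, method_configs)
for method, text in answers.items():
    print(f"{method}: {text}")
```

With a live connection, `compare_methods(conn.ai.answerQuestion, query, method_configs)` would issue the same three calls as the cells that follow.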

In [None]:
resp = conn.ai.answerQuestion(query,
                        method="hnswoverlap",
                        method_parameters={"indices": ["Document", "DocumentChunk", "Entity", "Relationship"],
                                           "top_k": 2,
                                           "num_hops": 2,
                                           "num_seen_min": 2})

In [None]:
print(resp["response"])

In [None]:
print(resp["retrieved"])

In [None]:
resp = conn.ai.answerQuestion(query,
                        method="vdb",
                        method_parameters={"index": "DocumentChunk",
                                           "top_k": 5,
                                           "withHyDE": False})

In [None]:
print(resp["response"])

In [None]:
print(resp["retrieved"])

In [None]:
resp = conn.ai.answerQuestion(query,
                        method="sibling",
                        method_parameters={"index": "DocumentChunk",
                                           "top_k": 5,
                                           "lookahead": 3,
                                           "lookback": 3,
                                           "withHyDE": False})

In [None]:
print(resp["response"])

In [None]:
print(resp["retrieved"])