# Transaction Fraud Investigation Report Generation

This notebook shows one way to generate a report for a transaction fraud investigation. It is only a starting point and can be customized to fit the needs of your investigation. The report is generated with an LLM of your choice and TigerGraph CoPilot, which we connect to a TigerGraph database containing the Transaction Fraud solution kit. To get started, you need a TigerGraph Cloud account with the Transaction Fraud solution kit installed. Go to [TigerGraph Cloud](https://tgcloud.io) to sign up for an account and install the kit.

## Connect to TigerGraph and TigerGraph CoPilot

First, we connect to the TigerGraph database and to TigerGraph CoPilot with the `pyTigerGraph` library. Make sure to configure the connection parameters below before running the cell.

In [1]:
from pyTigerGraph import TigerGraphConnection

conn = TigerGraphConnection(host="https://YOUR_TIGERGRAPH_CLOUD_HOST_HERE", graphname="Transaction_Fraud", username="YOUR_USERNAME_HERE", password="YOUR_PASSWORD_HERE")

conn.getToken()

conn.ai.configureCoPilotHost()
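The cell above hard-codes credentials, which makes them easy to leak when the notebook is shared. A minimal sketch of reading them from environment variables instead; the variable names `TG_HOST`, `TG_USERNAME`, and `TG_PASSWORD` are hypothetical, and the returned dictionary uses the same keyword arguments passed to `TigerGraphConnection` above.

```python
import os


def tigergraph_settings_from_env(graphname="Transaction_Fraud"):
    """Collect TigerGraphConnection keyword arguments from environment variables.

    The environment variable names are hypothetical; rename them to match your setup.
    """
    required = {"host": "TG_HOST", "username": "TG_USERNAME", "password": "TG_PASSWORD"}
    missing = [var for var in required.values() if not os.environ.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    settings = {key: os.environ[var] for key, var in required.items()}
    settings["graphname"] = graphname
    return settings
```

With the variables set, `conn = TigerGraphConnection(**tigergraph_settings_from_env())` replaces the hard-coded call.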

## Connect to OpenAI
We use OpenAI's GPT-4o model (`gpt-4o-2024-05-13`) to generate the report, connecting to it through the `langchain` library. You will need your own OpenAI API key.

In [2]:
from langchain_community.chat_models import ChatOpenAI

llm = ChatOpenAI(
    temperature=0.00001, model_name="gpt-4o-2024-05-13", openai_api_key="YOUR_OPENAI_KEY_HERE"
)



## Starting the Investigation

In the cells below, we ask the LLM to generate an initial list of questions for the investigation. The graph schema is included in the prompt so the LLM knows what information is available to answer them.

In [3]:
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser

from langchain.pydantic_v1 import BaseModel, Field

class Questions(BaseModel):
    questions: dict[str, str] = Field(description="A dictionary of questions to ask the data analyst. The key is the question and the value is the explanation on why to ask the question.")

question_parser = PydanticOutputParser(pydantic_object=Questions)

PROMPT = PromptTemplate(
            template="""You are a bank fraud investigator and you want to investigate a transaction with id {transaction_id}.
            You may interact with a data analyst to get insights on the transaction. 
            The data analyst has access to the following data: {schema}.
            What would you like to ask the data analyst? Format your questions in the following way:
            {format_instructions}""",
            input_variables=["schema", "transaction_id"],
            partial_variables={
                "format_instructions": question_parser.get_format_instructions()
            }
)
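`PydanticOutputParser` asks the model for JSON matching the `Questions` schema and raises an error when the reply does not fit. The check can be approximated with the standard library alone, which is handy for inspecting a raw model reply by hand. This is a simplified sketch, not the parser's actual implementation; it assumes the reply is a JSON object with a `questions` key, optionally wrapped in a Markdown code fence.

```python
import json

FENCE = "`" * 3  # a Markdown code fence


def parse_questions(raw: str) -> dict[str, str]:
    """Parse and validate a {"questions": {question: reason}} JSON payload."""
    text = raw.strip()
    # Drop an optional fence such as json ... around the payload.
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    questions = data.get("questions") if isinstance(data, dict) else None
    if not isinstance(questions, dict):
        raise ValueError("expected an object with a 'questions' dictionary")
    for question, reason in questions.items():
        if not isinstance(reason, str):
            raise ValueError(f"reason for {question!r} must be a string")
    return questions
```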

In [4]:
def generate_schema_rep(conn):
    """Describe every vertex and edge type of the graph in plain text for the LLM prompt."""
    vertex_schema = []
    for vert in conn.getVertexTypes():
        vert_def = conn.getVertexType(vert)  # fetch each definition once
        primary_id = vert_def["PrimaryId"]["AttributeName"]
        attributes = "\n\t\t".join([attr["AttributeName"] + " of type " + attr["AttributeType"]["Name"] for attr in vert_def["Attributes"]])
        if attributes == "":
            attributes = "No attributes"
        vertex_schema.append(f"{vert}\n\tPrimary Id Attribute: {primary_id}\n\tAttributes: \n\t\t{attributes}")

    edge_schema = []
    for edge in conn.getEdgeTypes():
        edge_def = conn.getEdgeType(edge)
        from_vertex = edge_def["FromVertexTypeName"]
        to_vertex = edge_def["ToVertexTypeName"]
        #reverse_edge = edge_def["Config"].get("REVERSE_EDGE")
        # Use the edge type's own attributes
        attributes = "\n\t\t".join([attr["AttributeName"] + " of type " + attr["AttributeType"]["Name"] for attr in edge_def["Attributes"]])
        if attributes == "":
            attributes = "No attributes"
        edge_schema.append(f"{edge}\n\tFrom Vertex: {from_vertex}\n\tTo Vertex: {to_vertex}\n\tAttributes: \n\t\t{attributes}") #\n\tReverse Edge: \n\t\t{reverse_edge}")

    schema_rep = f"""The schema of the graph is as follows:
    Vertex Types:
    {chr(10).join(vertex_schema)}

    Edge Types:
    {chr(10).join(edge_schema)}
    """
    return schema_rep
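The attribute formatting is written out twice in `generate_schema_rep`, once for vertices and once for edges; factoring it into one helper keeps the two loops consistent and makes the output easy to check offline. A minimal sketch: `format_attributes` and `describe_schema` are hypothetical helpers producing a similar (not identical) layout, and `StubConnection` is a hypothetical stand-in exposing only the four schema methods used above. Its type names and attributes are illustrative, not the solution kit's real schema.

```python
def format_attributes(type_def: dict) -> str:
    """Render the Attributes list of a vertex or edge definition, one per line."""
    lines = [
        f'{attr["AttributeName"]} of type {attr["AttributeType"]["Name"]}'
        for attr in type_def.get("Attributes", [])
    ]
    return "\n\t\t".join(lines) or "No attributes"


def describe_schema(conn) -> str:
    """Describe vertex and edge types, using format_attributes for both."""
    parts = ["Vertex Types:"]
    for name in conn.getVertexTypes():
        v = conn.getVertexType(name)
        parts.append(
            f'{name}\n\tPrimary Id Attribute: {v["PrimaryId"]["AttributeName"]}'
            f"\n\tAttributes: \n\t\t{format_attributes(v)}"
        )
    parts.append("Edge Types:")
    for name in conn.getEdgeTypes():
        e = conn.getEdgeType(name)
        parts.append(
            f'{name}\n\tFrom Vertex: {e["FromVertexTypeName"]}\n\tTo Vertex: {e["ToVertexTypeName"]}'
            f"\n\tAttributes: \n\t\t{format_attributes(e)}"
        )
    return "\n".join(parts)


class StubConnection:
    """Hypothetical offline stand-in; the schema below is illustrative only."""

    _vertices = {
        "Card": {"PrimaryId": {"AttributeName": "id"},
                 "Attributes": [{"AttributeName": "is_fraud", "AttributeType": {"Name": "BOOL"}}]},
        "Merchant": {"PrimaryId": {"AttributeName": "id"}, "Attributes": []},
    }
    _edges = {
        "Card_To_Merchant": {"FromVertexTypeName": "Card", "ToVertexTypeName": "Merchant",
                             "Attributes": [{"AttributeName": "amount", "AttributeType": {"Name": "DOUBLE"}}]},
    }

    def getVertexTypes(self):
        return list(self._vertices)

    def getVertexType(self, name):
        return self._vertices[name]

    def getEdgeTypes(self):
        return list(self._edges)

    def getEdgeType(self, name):
        return self._edges[name]
```

Running `print(describe_schema(StubConnection()))` shows the text layout without a live database; passing `conn` instead describes the real graph.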

In [5]:
chain = PROMPT | llm | question_parser
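The `|` operator here is LangChain Expression Language (LCEL) syntax: each step's output becomes the next step's input, so `PROMPT | llm | question_parser` formats the prompt, sends it to the model, and parses the reply. The toy sketch below mimics only that left-to-right data flow with plain callables; `Step` is a hypothetical class, not LangChain's `Runnable` implementation, and `toy_llm` is a stand-in that never calls a model.

```python
import json
from typing import Any, Callable


class Step:
    """Hypothetical pipeline step: `a | b` runs a, then passes its output to b."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # Compose left to right, mirroring how data flows through an LCEL chain.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


toy_prompt = Step(lambda inputs: f"Investigate transaction {inputs['transaction_id']}")
# Stand-in for the model: wraps the prompt text in a JSON "questions" payload.
toy_llm = Step(lambda text: json.dumps({"questions": {text + "?": "demo"}}))
toy_parser = Step(json.loads)

toy_chain = toy_prompt | toy_llm | toy_parser
```

`toy_chain.invoke({"transaction_id": "100575"})` returns `{"questions": {"Investigate transaction 100575?": "demo"}}`, the same shape of call as `chain.invoke(...)` below.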

In [6]:
out = chain.invoke({"schema": generate_schema_rep(conn), "transaction_id": '100575'})



In [7]:
out.questions

{'What are the details of the transaction with id 100575?': 'To understand the specifics of the transaction, including the amount, time, and whether it is marked as fraudulent.',
 'Which card was used in the transaction with id 100575?': 'To identify the card involved in the transaction and check its history and attributes.',
 'Who is the party associated with the card used in transaction 100575?': 'To identify the owner of the card and gather more information about their profile and history.',
 'Which merchant received the transaction with id 100575?': 'To identify the merchant involved in the transaction and check their attributes and history.',
 'Is the IP address or device used in transaction 100575 blocked?': 'To determine if the transaction was conducted using a suspicious or previously blocked IP address or device.',
 'What is the pagerank of the card and merchant involved in transaction 100575?': 'To assess the influence and connectivity of the card and merchant within the netw

## Asking Questions to TigerGraph CoPilot

Once the questions are generated, we ask TigerGraph CoPilot to answer each of them. The answers are then used to draft the report.

In [8]:
qa_res = {}

for key, val in out.questions.items():
    print("Asking question: ", key)
    result = conn.ai.query(key)
    qa_res[key] = {"answer": result, "reason": val}


Asking question:  What are the details of the transaction with id 100575?
Asking question:  Which card was used in the transaction with id 100575?
Asking question:  Who is the party associated with the card used in transaction 100575?
Asking question:  Which merchant received the transaction with id 100575?
Asking question:  Is the IP address or device used in transaction 100575 blocked?
Asking question:  What is the pagerank of the card and merchant involved in transaction 100575?
Asking question:  What is the community size and ID of the card and merchant involved in transaction 100575?
Asking question:  What is the shortest path length for transaction 100575?
Asking question:  What is the occupation and gender of the party associated with the card used in transaction 100575?
Asking question:  What is the merchant category for the merchant involved in transaction 100575?


In [9]:
import json

for key, val in qa_res.items():
    if val["answer"]["answered_question"]:
        print("Question: ", key)
        print("Reason: ", val["reason"])
        print("Answer: ", val["answer"]["natural_language_response"])
        print("\n\n")

Question:  What are the details of the transaction with id 100575?
Reason:  To understand the specifics of the transaction, including the amount, time, and whether it is marked as fraudulent.
Answer:  The transaction with ID 100575 has the following details:

- **Transaction Time:** 2021-04-07 02:53:00
- **Amount:** $9.34
- **Is Fraud:** Yes
- **Unix Time:** 1617763980
- **Shortest Path Length:** 0
- **Max Transaction Amount Interval:** 1848
- **Max Transaction Count Interval:** 0
- **Count of Repeated Card:** 0
- **Common Merchant Transaction Count:** 1587
- **Common Merchant Transaction Total Amount:** $132,379.90
- **Common Merchant Transaction Average Amount:** $83.42
- **Common Merchant Transaction Max Amount:** $1450.25
- **Common Merchant Transaction Min Amount:** $1.00
- **Common Card Transaction Count:** 0
- **Common Card Transaction Total Amount:** $0.00
- **Common Card Transaction Average Amount:** $0.00
- **Common Card Transaction Max Amount:** $0.00
- **Common Card Transac

In [10]:
investigation_results = {key: {
                         "reason": val["reason"],
                         "answer": val["answer"]["natural_language_response"],
                         "answer_source": val["answer"]["query_sources"]["function_call"],
                         "answer_data": val["answer"]["query_sources"]["result"]}
                         for key, val in qa_res.items() if val["answer"]["answered_question"]}
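The comprehension above assumes every answered question carries both `query_sources["function_call"]` and `query_sources["result"]`. A slightly more defensive variant, assuming answers have the same shape as the CoPilot responses shown in this notebook, uses `.get()` so a missing key becomes `None` instead of raising a `KeyError`. The function name `build_investigation_results` is hypothetical.

```python
def build_investigation_results(qa_res: dict) -> dict:
    """Keep answered questions and flatten the fields used in the report prompt."""
    results = {}
    for question, entry in qa_res.items():
        answer = entry.get("answer", {})
        if not answer.get("answered_question"):
            continue  # skip questions CoPilot could not answer
        sources = answer.get("query_sources") or {}
        results[question] = {
            "reason": entry.get("reason"),
            "answer": answer.get("natural_language_response"),
            "answer_source": sources.get("function_call"),
            "answer_data": sources.get("result"),
        }
    return results
```

Usage: `investigation_results = build_investigation_results(qa_res)`.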

In [11]:
print(json.dumps(investigation_results, indent=4))

{
    "What are the details of the transaction with id 100575?": {
        "reason": "To understand the specifics of the transaction, including the amount, time, and whether it is marked as fraudulent.",
        "answer": "The transaction with ID 100575 has the following details:\n\n- **Transaction Time:** 2021-04-07 02:53:00\n- **Amount:** $9.34\n- **Is Fraud:** Yes\n- **Unix Time:** 1617763980\n- **Shortest Path Length:** 0\n- **Max Transaction Amount Interval:** 1848\n- **Max Transaction Count Interval:** 0\n- **Count of Repeated Card:** 0\n- **Common Merchant Transaction Count:** 1587\n- **Common Merchant Transaction Total Amount:** $132,379.90\n- **Common Merchant Transaction Average Amount:** $83.42\n- **Common Merchant Transaction Max Amount:** $1450.25\n- **Common Merchant Transaction Min Amount:** $1.00\n- **Common Card Transaction Count:** 0\n- **Common Card Transaction Total Amount:** $0.00\n- **Common Card Transaction Average Amount:** $0.00\n- **Common Card Transaction Max

## Draft a Report

Using the answers to these questions, we ask the LLM to draft a report on the suspected fraudulent transaction. We also ask it to generate a list of follow-up questions for the next round of the investigation.

In [12]:
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser

from langchain.pydantic_v1 import BaseModel, Field

class DraftReport(BaseModel):
    draft_report: str = Field(description="A draft of the report based on the answers received from the data analyst. Include citations by adding `[x]` where x is the function call that was used to determine the answer.")
    questions: dict[str, str] = Field(description="A dictionary of questions to ask the data analyst. The key is the question and the value is the explanation on why to ask the question.")

draft_parser = PydanticOutputParser(pydantic_object=DraftReport)

REPORT_PROMPT = PromptTemplate(
            template="""You are a bank fraud investigator and you want to investigate a transaction with id {transaction_id}.
            You may interact with a data analyst to get insights on the transaction. 
            The data analyst has access to the following data: {schema}.

            So far, you have asked the following questions and received the following answers: {investigation_results}

            Please write a draft of your report based on the answers you have received so far, and include citations to the answers you have received.
            Make sure to analyze the answers and explain why each answer is important.

            Add any additional questions you would like to ask the data analyst. Format your questions in the following way:
            {format_instructions}""",
            input_variables=["schema", "transaction_id", "investigation_results"],
            partial_variables={
                "format_instructions": draft_parser.get_format_instructions()
            }
)
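The `DraftReport` field description asks for `[x]` citations, and the draft below uses numbered markers such as `[1]`. One way to make those numbers traceable is to assign them before prompting and keep a numbered list of the function calls they point to. A minimal sketch with a hypothetical helper `number_sources`, assuming each entry of `investigation_results` has the `answer_source` field built earlier; this step is an optional addition, not part of the notebook's flow.

```python
def number_sources(investigation_results: dict) -> tuple[dict, list[str]]:
    """Attach a citation number to each distinct function call and list them in order."""
    citations: dict[str, int] = {}
    numbered = {}
    for question, entry in investigation_results.items():
        source = entry.get("answer_source") or "unknown source"
        if source not in citations:
            citations[source] = len(citations) + 1  # first appearance gets the next number
        numbered[question] = {**entry, "citation": citations[source]}
    reference_list = [f"[{n}] {source}" for source, n in citations.items()]
    return numbered, reference_list
```

The numbered results can be passed to the prompt in place of `investigation_results`, and the reference list appended to the finished report.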

In [13]:
chain = REPORT_PROMPT | llm | draft_parser

In [14]:
out = chain.invoke({"schema": generate_schema_rep(conn), "transaction_id": 100575, "investigation_results": investigation_results})



In [15]:
print(out.draft_report)

### Draft Report on Transaction ID 100575

**Transaction Details**

The transaction with ID 100575 occurred on April 7, 2021, at 02:53 AM. The transaction amount was $9.34, and it has been marked as fraudulent [1]. This information is crucial as it sets the context for the investigation, indicating that the transaction is indeed suspicious and warrants further scrutiny.

**Card Involved**

The card used in this transaction is identified by the card number 180077352287056 [2]. Knowing the card involved helps in tracing its history and identifying any patterns of fraudulent activities associated with it.

**Merchant Involved**

The merchant that received the transaction is identified as fraud_Auer-Mosciski [3]. This information is vital as it allows us to investigate the merchant's history and any potential involvement in fraudulent activities.

**IP Address and Device Status**

There is no information available regarding whether the IP address or device used in this transaction is block

## Ask Follow-up Questions

Here, we ask TigerGraph CoPilot to answer the follow-up questions. The answers are then used to generate the final report.

In [16]:
print(json.dumps(out.questions, indent=4))

{
    "What is the IP address and device ID used in transaction 100575?": "To determine if the transaction was conducted using a suspicious or previously blocked IP address or device.",
    "What are the details of the card with ID 180077352287056?": "To investigate the history and attributes of the card involved in the transaction.",
    "What are the details of the merchant fraud_Auer-Mosciski?": "To investigate the history and attributes of the merchant involved in the transaction.",
    "Are there any other transactions involving the card 180077352287056 that are marked as fraudulent?": "To identify any patterns of fraudulent activities associated with the card.",
    "Are there any other transactions involving the merchant fraud_Auer-Mosciski that are marked as fraudulent?": "To identify any patterns of fraudulent activities associated with the merchant."
}
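Follow-up questions may repeat ones from an earlier round. A minimal sketch that skips exact repeats after normalizing case, whitespace, and trailing punctuation; the helper names are hypothetical, and this normalization is an assumption that will not catch paraphrases such as the two IP-address questions in this notebook.

```python
import re


def _normalize(question: str) -> str:
    # Collapse whitespace, drop trailing punctuation, and lower-case.
    return re.sub(r"\s+", " ", question).strip().rstrip("?.!").strip().lower()


def new_questions(candidates: dict[str, str], already_asked) -> dict[str, str]:
    """Return the candidate questions (with reasons) that were not asked before."""
    seen = {_normalize(q) for q in already_asked}
    fresh = {}
    for question, reason in candidates.items():
        key = _normalize(question)
        if key not in seen:
            fresh[question] = reason
            seen.add(key)  # also drop duplicates within the candidates
    return fresh
```

Usage: `followups = new_questions(out.questions, investigation_results)`; iterating a dictionary yields its keys, which are the earlier questions.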


In [17]:
qa_res = {}

for key, val in out.questions.items():
    print("Asking question: ", key)
    result = conn.ai.query(key)
    qa_res[key] = {"answer": result, "reason": val}

Asking question:  What is the IP address and device ID used in transaction 100575?
Asking question:  What are the details of the card with ID 180077352287056?
Asking question:  What are the details of the merchant fraud_Auer-Mosciski?
Asking question:  Are there any other transactions involving the card 180077352287056 that are marked as fraudulent?
Asking question:  Are there any other transactions involving the merchant fraud_Auer-Mosciski that are marked as fraudulent?


In [18]:
qa_res

{'What is the IP address and device ID used in transaction 100575?': {'answer': {'natural_language_response': 'CoPilot had an issue answering your question. Please try again, or rephrase your prompt.',
   'answered_question': False,
   'response_type': 'inquiryai',
   'query_sources': {}},
  'reason': 'To determine if the transaction was conducted using a suspicious or previously blocked IP address or device.'},
 'What are the details of the card with ID 180077352287056?': {'answer': {'natural_language_response': 'The card with ID 180077352287056 has the following details:\n\n- **Card Number**: 180077352287056\n- **Is Fraud**: No\n- **PageRank**: 1.282823\n- **Customer ID**: 274726916\n- **Customer Size**: 247\n- **Occupation**: Psychotherapist\n\nIf you need any more information, feel free to ask!',
   'answered_question': True,
   'response_type': 'inquiryai',
   'query_sources': {'function_call': "getVerticesById(vertexType='Card', vertexIds='180077352287056')",
    'result': '[{"v_
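As the output above shows, CoPilot can return `answered_question: False` with a generic error message. A minimal sketch of retrying such questions a limited number of times; `ask_with_retry` and its parameters are hypothetical, and the query function is passed in (for example `conn.ai.query`) so the logic can be tested without a live instance. Resending the same wording may not help, so rephrasing, which the error message itself suggests, is another option.

```python
import time
from typing import Callable


def ask_with_retry(query: Callable[[str], dict], question: str,
                   attempts: int = 3, delay_seconds: float = 1.0) -> dict:
    """Call `query` until it reports answered_question=True or attempts run out."""
    result = {}
    for attempt in range(1, attempts + 1):
        result = query(question)
        if result.get("answered_question"):
            break
        if attempt < attempts:
            time.sleep(delay_seconds)  # brief pause before trying again
    return result  # the last response, answered or not
```

In the question loop, `result = ask_with_retry(conn.ai.query, key)` would replace `result = conn.ai.query(key)`.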

## Finalize the Report

Finally, we generate the final report from the draft report and the answers to the follow-up questions.

In [20]:
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser

from langchain.pydantic_v1 import BaseModel, Field

class Report(BaseModel):
    report: str = Field(description="A final report based on the answers received from the data analyst. Include citations by adding `[x]` where x is the function call that was used to determine the answer.")

report_parser = PydanticOutputParser(pydantic_object=Report)

REPORT_PROMPT = PromptTemplate(
            template="""You are a bank fraud investigator and you want to investigate a transaction with id {transaction_id}.
            You may interact with a data analyst to get insights on the transaction. 
            
            You have already written a draft of your report based on the answers you have received so far, found below:
            {draft_report}

            You have also asked the following questions and received the following answers: {investigation_results}

            Please write a final copy of your report based on the answers you have received, and include citations to the answers you have received.
            Make sure to analyze the answers and explain why each answer is important.

            Format your report in the following way:
            {format_instructions}""",
            input_variables=["draft_report", "transaction_id", "investigation_results"],
            partial_variables={
                "format_instructions": report_parser.get_format_instructions()
            }
)

In [21]:
chain = REPORT_PROMPT | llm | report_parser

In [22]:
final_out = chain.invoke({"draft_report": out.draft_report, "transaction_id": 100575, "investigation_results": qa_res})



In [23]:
print(final_out.report)

### Final Report on Transaction ID 100575

**Transaction Details**

The transaction with ID 100575 occurred on April 7, 2021, at 02:53 AM. The transaction amount was $9.34, and it has been marked as fraudulent [1]. This information is crucial as it sets the context for the investigation, indicating that the transaction is indeed suspicious and warrants further scrutiny.

**Card Involved**

The card used in this transaction is identified by the card number 180077352287056 [2]. The card details are as follows:
- **Card Number**: 180077352287056
- **Is Fraud**: No
- **PageRank**: 1.282823
- **Customer ID**: 274726916
- **Customer Size**: 247
- **Occupation**: Psychotherapist [2]

Knowing the card involved helps in tracing its history and identifying any patterns of fraudulent activities associated with it. Despite the card not being marked as fraudulent, its involvement in a fraudulent transaction raises concerns.

**Merchant Involved**

The merchant that received the transaction is ident

In [24]:
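# Save the final report as Markdown; the file is closed when the block exits
with open("report.md", "w", encoding="utf-8") as f:
    chars_written = f.write(final_out.report)
chars_written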

4489
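Writing every run to `report.md` overwrites the previous report. A minimal sketch that writes one file per transaction with `pathlib`; the `reports/` directory, the `report_<id>.md` naming, and the `save_report` helper are hypothetical choices.

```python
from pathlib import Path


def save_report(report: str, transaction_id, directory="reports") -> Path:
    """Write the report to <directory>/report_<transaction_id>.md and return the path."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    path = out_dir / f"report_{transaction_id}.md"
    path.write_text(report, encoding="utf-8")
    return path
```

Usage: `save_report(final_out.report, 100575)`.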