# Chat on Tabular Data

TableGPT Agent excels at analyzing and processing tabular data. Before it can analyze a dataset, the agent first needs to "see" it, which happens through a dedicated "file-reading" workflow: you "upload" the dataset and let the agent read it. Once the data has been read, you can ask the agent questions about it.

> To learn more about the file-reading workflow, see [File Reading](../../explanation/file-reading).

For data analysis tasks, we introduce two important parameters when creating the agent: `checkpointer` and `session_id`.

- The `checkpointer` should be an instance of `langgraph.checkpoint.base.BaseCheckpointSaver`, which acts as a versioned "memory" for the agent. (See [langgraph's persistence concept](https://langchain-ai.github.io/langgraph/concepts/persistence) for more details.)
- The `session_id` is a unique identifier for the current session. It ties the agent's execution to a specific kernel, ensuring that the agent's results are retained across multiple invocations.
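To make the role of the checkpointer more concrete, here is a toy, stdlib-only sketch of what a checkpointer does conceptually: it stores versioned snapshots of state keyed by a thread ID, so a later call on the same thread resumes from the latest snapshot. `ToyCheckpointer` and `run_turn` are hypothetical names used only for illustration. They are not part of langgraph or TableGPT, and the real `BaseCheckpointSaver` interface is considerably richer. The sketch also says nothing about `session_id`, which binds the agent to a kernel rather than to stored state.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCheckpointer:
    """Hypothetical stand-in for a checkpointer: keeps every state version per thread."""

    history: dict[str, list[list[str]]] = field(default_factory=dict)

    def latest(self, thread_id: str) -> list[str]:
        # Return a copy of the most recent snapshot for this thread, or an empty state.
        versions = self.history.get(thread_id, [])
        return list(versions[-1]) if versions else []

    def save(self, thread_id: str, messages: list[str]) -> None:
        # Append a new version instead of overwriting the previous one.
        self.history.setdefault(thread_id, []).append(list(messages))


def run_turn(saver: ToyCheckpointer, thread_id: str, new_messages: list[str]) -> list[str]:
    """Hypothetical 'invoke': load prior state, add only the new messages, persist."""
    state = saver.latest(thread_id) + new_messages
    saver.save(thread_id, state)
    return state


saver = ToyCheckpointer()
run_turn(saver, "some-thread-id", ["<attachment: titanic.csv>"])
state = run_turn(saver, "some-thread-id", ["How many men survived?"])
print(state)  # ['<attachment: titanic.csv>', 'How many men survived?']
print(len(saver.history["some-thread-id"]))  # 2
other = run_turn(saver, "other-thread", ["Hello"])
print(other)  # ['Hello']
```

The same idea explains why a follow-up call only needs to pass its new messages: as long as the `thread_id` matches, earlier state is restored from the checkpointer, while a different `thread_id` starts from scratch.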


```python
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from pybox import LocalPyBoxManager
from tablegpt import DEFAULT_TABLEGPT_IPYKERNEL_PROFILE_DIR
from tablegpt.agent import create_tablegpt_graph

# Point the client at the vLLM server that serves TableGPT2-7B.
llm = ChatOpenAI(openai_api_base="YOUR_VLLM_URL", openai_api_key="whatever", model_name="TableGPT2-7B")
pybox_manager = LocalPyBoxManager(profile_dir=DEFAULT_TABLEGPT_IPYKERNEL_PROFILE_DIR)
# In-memory checkpointer: saved state is lost when the process exits.
checkpointer = MemorySaver()

agent = create_tablegpt_graph(
    llm=llm,
    pybox_manager=pybox_manager,
    checkpointer=checkpointer,
    session_id="some-session-id",  # Required for the file-reading workflow
)
```

To process a file, attach it through the `additional_kwargs` of a `HumanMessage`. Here's an example using the [Titanic dataset](https://github.com/tablegpt/tablegpt-agent/blob/main/examples/datasets/titanic.csv).


```python
from typing import TypedDict

from langchain_core.messages import HumanMessage


class Attachment(TypedDict):
    """An attachment entry; each entry must contain a `filename` key."""

    filename: str


attachment_msg = HumanMessage(
    content="",
    # Make sure the IPython kernel can access this file path.
    additional_kwargs={"attachments": [Attachment(filename="titanic.csv")]},
)
```
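Since the kernel must be able to access the file, it can help to check this before invoking the agent. The sketch below is a stdlib-only pre-flight check that builds the same `{"attachments": [{"filename": ...}]}` shape used above. It assumes the kernel resolves relative filenames against a known working directory (`kernel_cwd`); whether that holds, and which directory it is, depends on how your `LocalPyBoxManager` is set up. `build_attachment_kwargs` is a hypothetical helper, not a TableGPT API.

```python
import tempfile
from pathlib import Path


def build_attachment_kwargs(filenames: list[str], kernel_cwd: str) -> dict:
    """Hypothetical helper: build additional_kwargs, failing early on missing files.

    Assumes the kernel resolves relative filenames against `kernel_cwd`.
    """
    missing = [name for name in filenames if not (Path(kernel_cwd) / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Not visible from {kernel_cwd}: {missing}")
    return {"attachments": [{"filename": name} for name in filenames]}


# Usage with a throwaway directory standing in for the kernel's working directory.
with tempfile.TemporaryDirectory() as cwd:
    (Path(cwd) / "titanic.csv").write_text("PassengerId,Survived\n1,0\n")
    kwargs = build_attachment_kwargs(["titanic.csv"], kernel_cwd=cwd)
    print(kwargs)  # {'attachments': [{'filename': 'titanic.csv'}]}
    try:
        build_attachment_kwargs(["missing.csv"], kernel_cwd=cwd)
    except FileNotFoundError as err:
        print("error:", err)
```

Failing in your own code gives a clearer error than discovering a missing file only after the agent has tried to read it inside the kernel.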

Invoke the agent as shown in the quick start:


```python
from datetime import date

from tablegpt.agent.file_reading import Stage

# Read and process the uploaded file.
# Top-level `await` works in Jupyter/IPython; in a plain script, run this inside an async function.
response = await agent.ainvoke(
    input={
        "entry_message": attachment_msg,
        "processing_stage": Stage.UPLOADED,
        "messages": [attachment_msg],
        "parent_id": "some-parent-id1",
        "date": date.today(),
    },
    config={
        # A checkpointer requires a thread_id to be bound at runtime.
        "configurable": {"thread_id": "some-thread-id"},
    },
)
response["messages"]
```

```
[HumanMessage(content='', additional_kwargs={'attachments': [{'filename': 'titanic.csv'}]}, response_metadata={}, id='ab0a7157-ad7d-4de8-9b24-1bee78ad7c55'),
 AIMessage(content="I have received your data file. I need to look at its contents to get an initial understanding of the dataset. First, I will read the data into the `df` variable and use `df.info` to check for NaN values and data types.\n```python\n# Load the data into a DataFrame\ndf = read_df('titanic.csv')\n\n# Remove leading and trailing whitespaces in column names\ndf.columns = df.columns.str.strip()\n\n# Remove rows and columns that contain only empty values\ndf = df.dropna(how='all').dropna(axis=1, how='all')\n\n# Get the basic information of the dataset\ndf.info(memory_usage=False)\n```", additional_kwargs={'parent_id': 'some-parent-id1', 'thought': 'I have received your data file. I need to look at its contents to get an initial understanding of the dataset. First, I will read the data into the `df` variable and use `df.info` to check for NaN values and data types.', 'action': {'tool': 'python', 'tool_input': "# Load the data into a DataFrame\ndf = read_df('titanic.csv')\n\n# Remove leading and trailing whitespaces in column names\ndf.columns = df.columns.str.strip()\n\n# Remove rows a
```
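In the output above, the agent's reply embeds the code it plans to run in a fenced `python` block inside `content` (the same code also appears under `additional_kwargs['action']['tool_input']`). If you want to log or review that code yourself, a minimal stdlib sketch is to pull the fenced blocks out of the `content` string with a regular expression. `extract_python_blocks` is a hypothetical helper, and it assumes replies use exactly the fence format shown above.

```python
import re

FENCE = "`" * 3  # built this way so the example renders cleanly inside Markdown
# Non-greedy body match; DOTALL lets the body span multiple lines.
FENCED_PYTHON = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_python_blocks(content: str) -> list[str]:
    """Hypothetical helper: return the bodies of all fenced python blocks in a reply."""
    return [body.rstrip("\n") for body in FENCED_PYTHON.findall(content)]


reply = (
    "I will read the data into `df`.\n"
    + FENCE + "python\n"
    + "df = read_df('titanic.csv')\n"
    + "df.info(memory_usage=False)\n"
    + FENCE
)
blocks = extract_python_blocks(reply)
print(blocks[0])
```

You would apply it to the `content` of an `AIMessage` taken from `response["messages"]`; a reply without code, such as a final answer, yields an empty list.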

You can then keep asking questions about the data. Because the checkpointer restores the previous run, only the new message needs to be passed:


```python
human_message = HumanMessage(content="How many men survived?")

async for event in agent.astream_events(
    input={
        # With a checkpointer, only the new messages need to be passed here.
        "messages": [human_message],
        "parent_id": "some-parent-id2",
        "date": date.today(),
    },
    version="v2",
    # Reuse the same thread_id so the checkpointer restores the memory of the previous run.
    config={"configurable": {"thread_id": "some-thread-id"}},
):
    event_name: str = event["name"]
    evt: str = event["event"]
    if evt == "on_chat_model_end":
        # The complete model reply for this step.
        print(event["data"]["output"])
    elif event_name == "tool_node" and evt == "on_chain_stream":
        # Tool results, e.g. the output of the Python code run in the kernel.
        for lc_msg in event["data"]["chunk"]["messages"]:
            print(lc_msg)
    else:
        # Other events can be handled here.
        pass
```


```
content="To answer your question, I will filter for all male passengers and count how many of them survived.\n```python\n# Filter male passengers who survived and count them\nmale_survivors = df[(df['Sex'] == 'male') & (df['Survived'] == 1)]\nmale_survivors_count = male_survivors.shape[0]\nmale_survivors_count\n```" additional_kwargs={} response_metadata={'finish_reason': 'stop', 'model_name': 'TableGPT2-7B'} id='run-661d7496-341d-4a6b-84d8-b4094db66ef0'
content=[{'type': 'text', 'text': '```pycon\n1\n```'}] name='python' id='1c7531db-9150-451d-a8dd-f07176454e6f' tool_call_id='2860e8bb-0fa7-421b-bb2d-bfeca873354b' artifact=[]
content='According to the dataset, 1 male passenger survived.' additional_kwargs={} response_metadata={'finish_reason': 'stop', 'model_name': 'TableGPT2-7B'} id='run-db640705-0085-4f47-adb4-3e0adce694cd'
```
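If you would rather collect the relevant messages than print them as they arrive, the filtering in the streaming loop can be wrapped in a small async helper. The sketch below is a hypothetical `collect_messages` function that applies the same two conditions to any async iterable of event dicts. To keep it self-contained it runs against a fake event stream. The `"ChatOpenAI"` event names and the string payloads are placeholders, not real output; only the keys (`name`, `event`, `data`, `output`, `chunk`, `messages`) are taken from the loop in this guide.

```python
import asyncio
from typing import Any, AsyncIterator


async def collect_messages(events: AsyncIterator[dict[str, Any]]) -> list[Any]:
    """Hypothetical helper: keep model outputs and tool_node messages, in order.

    Applies the same filtering as the streaming loop; other events are skipped.
    """
    collected: list[Any] = []
    async for event in events:
        if event["event"] == "on_chat_model_end":
            collected.append(event["data"]["output"])
        elif event["name"] == "tool_node" and event["event"] == "on_chain_stream":
            collected.extend(event["data"]["chunk"]["messages"])
    return collected


async def fake_events() -> AsyncIterator[dict[str, Any]]:
    # Stand-in for agent.astream_events(...); payloads are placeholder strings.
    yield {"name": "ChatOpenAI", "event": "on_chat_model_start", "data": {}}
    yield {"name": "ChatOpenAI", "event": "on_chat_model_end", "data": {"output": "code reply"}}
    yield {"name": "tool_node", "event": "on_chain_stream", "data": {"chunk": {"messages": ["tool result"]}}}
    yield {"name": "ChatOpenAI", "event": "on_chat_model_end", "data": {"output": "final answer"}}


messages = asyncio.run(collect_messages(fake_events()))
print(messages)  # ['code reply', 'tool result', 'final answer']
```

Inside Jupyter, where an event loop is already running, you would write `await collect_messages(agent.astream_events(...))` instead of calling `asyncio.run`.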
