## Prerequisites

You need to install the following libraries to run the code in this article:

1. GridDB C Client
2. GridDB Python client

Follow the instructions on the GridDB Python Package Index (PyPI) page to install these clients.

You must also install the LangChain, NumPy, pandas, and Seaborn libraries.

The scripts below install and import the libraries you will need to run the code in this article.

In [1]:
!pip install langchain
!pip install langchain-core
!pip install langchain-openai
!pip install langchain-experimental
!pip install tabulate

Defaulting to user installation because normal site-packages is not writeable


In [24]:

import griddb_python as griddb
import pandas as pd
from langchain_openai import OpenAI
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.agents import AgentExecutor, Tool
from langchain.memory import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from typing import List, Dict
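If the import cell above fails, it can help to check which packages are actually visible to the current interpreter. The following is a minimal sketch that uses only the standard library; the helper name `missing_modules` and the module list are assumptions made for illustration, based on the imports used in this article.

```python
import importlib.util

# Top-level module names imported in this article (assumed list).
REQUIRED_MODULES = [
    "griddb_python",
    "pandas",
    "langchain",
    "langchain_core",
    "langchain_openai",
    "langchain_experimental",
    "tabulate",
]


def missing_modules(module_names):
    """Return the names that cannot be found on the current Python path."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", missing if missing else "none")
```

Any name reported as missing needs to be installed before the rest of the notebook will run.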

## Creating a Connection With GridDB

In [3]:
factory = griddb.StoreFactory.get_instance()

DB_HOST = "127.0.0.1:10001"
DB_CLUSTER = "myCluster"
DB_USER = "admin"
DB_PASS = "admin"

try:
    gridstore = factory.get_store(
        notification_member = DB_HOST,
        cluster_name = DB_CLUSTER,
        username = DB_USER,
        password = DB_PASS
    )

    container1 = gridstore.get_container("container1")
    if container1 is None:
        print("Container does not exist")
    print("Successfully connected to GridDB")

except griddb.GSException as e:
    for i in range(e.get_error_stack_size()):
        print("[", i, "]")
        print(e.get_error_code(i))
        print(e.get_location(i))
        print(e.get_message(i))

Container does not exist
Successfully connected to GridDB
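The `except` block above walks the GridDB error stack, and the same loop appears again later in this article. As a sketch, that loop could be moved into a small helper. The names `format_gs_error` and `FakeGSException` are hypothetical; the fake class exists only so the helper can be tried without a running GridDB server, and it mimics the four `GSException` methods already used above.

```python
def format_gs_error(e):
    """Return one printable line per entry in a GridDB error stack."""
    return [
        f"[{i}] code={e.get_error_code(i)} "
        f"location={e.get_location(i)} message={e.get_message(i)}"
        for i in range(e.get_error_stack_size())
    ]


class FakeGSException(Exception):
    """Hypothetical stand-in so the helper can be tried without a server."""

    def __init__(self, entries):
        super().__init__("fake GridDB error")
        self.entries = entries  # list of (code, location, message) tuples

    def get_error_stack_size(self):
        return len(self.entries)

    def get_error_code(self, i):
        return self.entries[i][0]

    def get_location(self, i):
        return self.entries[i][1]

    def get_message(self, i):
        return self.entries[i][2]


demo = FakeGSException([(1, "demo_location", "demo message")])
for line in format_gs_error(demo):
    print(line)
```

With a real connection, the call would sit inside `except griddb.GSException as e:` and print each line returned by `format_gs_error(e)`.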


## Inserting Sample Data into GridDB

In [4]:
## Dataset link: https://www.kaggle.com/datasets/iamsouravbanerjee/world-population-dataset

dataset = pd.read_csv(r"/home/mani/GridDB Projects/world_population.csv")
print(dataset.shape)
dataset.head()


(234, 17)


| | Rank | CCA3 | Country/Territory | Capital | Continent | 2022 Population | 2020 Population | 2015 Population | 2010 Population | 2000 Population | 1990 Population | 1980 Population | 1970 Population | Area (km²) | Density (per km²) | Growth Rate | World Population Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36 | AFG | Afghanistan | Kabul | Asia | 41128771 | 38972230 | 33753499 | 28189672 | 19542982 | 10694796 | 12486631 | 10752971 | 652230 | 63.0587 | 1.0257 | 0.52 |
| 1 | 138 | ALB | Albania | Tirana | Europe | 2842321 | 2866849 | 2882481 | 2913399 | 3182021 | 3295066 | 2941651 | 2324731 | 28748 | 98.8702 | 0.9957 | 0.04 |
| 2 | 34 | DZA | Algeria | Algiers | Africa | 44903225 | 43451666 | 39543154 | 35856344 | 30774621 | 25518074 | 18739378 | 13795915 | 2381741 | 18.8531 | 1.0164 | 0.56 |
| 3 | 213 | ASM | American Samoa | Pago Pago | Oceania | 44273 | 46189 | 51368 | 54849 | 58230 | 47818 | 32886 | 27075 | 199 | 222.4774 | 0.9831 | 0.0 |
| 4 | 203 | AND | Andorra | Andorra la Vella | Europe | 79824 | 77700 | 71746 | 71519 | 66097 | 53569 | 35611 | 19860 | 468 | 170.5641 | 1.01 | 0.0 |


In [5]:
dataset.columns = dataset.columns.str.replace('[^a-zA-Z0-9]', '_', regex=True)
dataset.dtypes

Rank                             int64
CCA3                            object
Country_Territory               object
Capital                         object
Continent                       object
2022_Population                  int64
2020_Population                  int64
2015_Population                  int64
2010_Population                  int64
2000_Population                  int64
1990_Population                  int64
1980_Population                  int64
1970_Population                  int64
Area__km__                       int64
Density__per_km__              float64
Growth_Rate                    float64
World_Population_Percentage    float64
dtype: object
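The renaming step above replaces every character that is not an ASCII letter or digit with an underscore. The sketch below applies the same regular expression to a few of the original column names with the standard `re` module, to show where names such as `Area__km__` come from. The helper name `sanitize_column` is an assumption made for illustration.

```python
import re


def sanitize_column(name):
    """Replace every character outside a-z, A-Z, 0-9 with an underscore."""
    return re.sub(r"[^a-zA-Z0-9]", "_", name)


original = [
    "Country/Territory",
    "2022 Population",
    "Area (km²)",
    "Density (per km²)",
]
renamed = [sanitize_column(name) for name in original]

for before, after in zip(original, renamed):
    print(f"{before!r} -> {after!r}")
```

Each space, slash, parenthesis, and the `²` character becomes one underscore, which is why some names end with two underscores.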

In [6]:
# see all GridDB data types: https://docs.griddb.net/architecture/data-model/#data-type

def map_pandas_dtype_to_griddb(dtype):
    if dtype == 'int64':
        return griddb.Type.LONG
    elif dtype == 'float64':
        # pandas float64 is 64-bit, so use DOUBLE rather than the 32-bit FLOAT
        return griddb.Type.DOUBLE
    elif dtype == 'object':
        return griddb.Type.STRING
    # Add more column types if you want
    else:
        raise ValueError(f'Unsupported pandas type: {dtype}')

container_columns = []
for column_name, dtype in dataset.dtypes.items():
    griddb_dtype = map_pandas_dtype_to_griddb(str(dtype))
    container_columns.append([column_name, griddb_dtype])

container_info = griddb.ContainerInfo("PopulationStats",
                                      container_columns,
                                      griddb.ContainerType.COLLECTION, True)


try:
    cont = gridstore.put_container(container_info)
    for index, row in dataset.iterrows():
        cont.put(row.tolist())
    print("All rows have been successfully stored in the GridDB container.")

except griddb.GSException as e:
    for i in range(e.get_error_stack_size()):
        print("[", i, "]")
        print(e.get_error_code(i))
        print(e.get_location(i))
        print(e.get_message(i))


All rows have been successfully stored in the GridDB container.
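The comment in the mapping function invites adding more column types. One possible way to do that, sketched below, is a lookup table keyed by the pandas dtype string. It stores GridDB type names as plain strings so the snippet runs without a GridDB installation. Here `float64` is mapped to `DOUBLE` because pandas `float64` is a 64-bit value. The `bool` and `datetime64[ns]` entries are assumptions that were not exercised in this article, and `griddb_type_name` is a hypothetical helper.

```python
# pandas dtype string -> name of the attribute on griddb.Type (assumed mapping)
PANDAS_TO_GRIDDB_TYPE_NAME = {
    "int64": "LONG",
    "float64": "DOUBLE",
    "object": "STRING",
    "bool": "BOOL",                 # assumption, not used by this dataset
    "datetime64[ns]": "TIMESTAMP",  # assumption, not used by this dataset
}


def griddb_type_name(dtype):
    """Look up the GridDB type name for a pandas dtype, or raise ValueError."""
    try:
        return PANDAS_TO_GRIDDB_TYPE_NAME[str(dtype)]
    except KeyError:
        raise ValueError(f"Unsupported pandas type: {dtype}") from None


# With griddb_python installed, the name can be resolved to the real constant:
#     griddb_dtype = getattr(griddb.Type, griddb_type_name(dtype))
print(griddb_type_name("int64"), griddb_type_name("float64"))
```

Keeping the mapping in a dictionary makes unsupported dtypes fail early with a clear message, before any container is created.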


## Creating a LangChain Chatbot to Interact with GridDB Data

In [7]:
OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"
llm = ChatOpenAI(api_key=OPENAI_API_KEY,
                 temperature=0,
                 model_name="gpt-4o")
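Rather than pasting the key into the notebook, it could be read from an environment variable. The sketch below shows one way to do this with the standard library. The helper name `load_api_key` is hypothetical, and `OPENAI_API_KEY` as the variable name is an assumption about how your environment is set up.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read an API key from the environment and fail loudly if it is absent."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Example usage (commented out so the sketch runs without a key configured):
# OPENAI_API_KEY = load_api_key()
```

The returned value can then be passed to `ChatOpenAI` through the `api_key` argument exactly as in the cell above.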

### Problem with LangChain Chains for Creating a Chatbot for Tabular Data

In [8]:
class SelectData(BaseModel):
    container_name: str = Field(description="the container name from the user query")
    query:str = Field(description="natural language converted to SELECT query")



system_command = """ 
Convert user commands into SQL queries for Griddb.
"""

user_prompt = ChatPromptTemplate.from_messages([
    ("system", system_command),
    ("user", "{input}")
])

select_chain = user_prompt | llm.with_structured_output(SelectData)

def select_records(query):

    select_data = select_chain.invoke(query)
    container_name = select_data.container_name
    select_query = select_data.query

    print(select_query)
    
    result_container = gridstore.get_container(container_name)
    query = result_container.query(select_query)
    rs = query.fetch()
    result_data = rs.fetch_rows()
    return result_data


select_records("From the PopulationStats container, return the top 3 countries with the highest population in 2020")

SELECT country, population FROM PopulationStats WHERE year = 2020 ORDER BY population DESC LIMIT 3


GSException: 
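The generated query refers to `country`, `population`, and `year`, none of which are columns in the `PopulationStats` container, which is one plausible reason the call fails. As a rough sketch, the identifiers in a generated query can be compared with the known column names before the query is sent. The function `unknown_identifiers` and the keyword list are assumptions made for illustration. This is a simple token check, not a SQL parser.

```python
import re

# Keywords that may appear in the simple queries shown here (assumed list).
SQL_KEYWORDS = {
    "select", "from", "where", "order", "by",
    "desc", "asc", "limit", "and", "or", "not",
}

COLUMNS = [
    "Rank", "CCA3", "Country_Territory", "Capital", "Continent",
    "2022_Population", "2020_Population", "2015_Population",
    "2010_Population", "2000_Population", "1990_Population",
    "1980_Population", "1970_Population", "Area__km__",
    "Density__per_km__", "Growth_Rate", "World_Population_Percentage",
]


def unknown_identifiers(query, known_columns, container_name):
    """Return sorted tokens that are not keywords, numbers, or known names."""
    known = {c.lower() for c in known_columns} | {container_name.lower()}
    tokens = re.findall(r"\w+", query)
    return sorted({
        t for t in tokens
        if not t.isdigit()
        and t.lower() not in SQL_KEYWORDS
        and t.lower() not in known
    })


generated = ("SELECT country, population FROM PopulationStats "
             "WHERE year = 2020 ORDER BY population DESC LIMIT 3")
print(unknown_identifiers(generated, COLUMNS, "PopulationStats"))
```

A non-empty result suggests the model guessed at a schema it was never shown.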

### LangChain Agents for Interacting with Tabular Data

In [9]:
class SelectData(BaseModel):
    container_name: str = Field(description="the container name from the user query")
    natural_query:str = Field(description = "user query string to retrieve additional information from result returned by the SELECT query")

system_command = """ 
Convert user commands into SQL queries for Griddb.
"""

user_prompt = ChatPromptTemplate.from_messages([
    ("system", system_command),
    ("user", "{input}")
])

select_chain = user_prompt | llm.with_structured_output(SelectData)

In [10]:
def select_records(query):

    select_data = select_chain.invoke(query)
    container_name = select_data.container_name
    select_query = f"SELECT * FROM {container_name}"
    natural_query = select_data.natural_query

    print(f"Select query: {select_query}")
    print(f"Additional query: {natural_query}")
 
    result_container = gridstore.get_container(container_name)
    query = result_container.query(select_query)
    rs = query.fetch()
    result_data = rs.fetch_rows()

    agent = create_pandas_dataframe_agent(
        ChatOpenAI(
            api_key=OPENAI_API_KEY,
            temperature=0,
            model="gpt-4o"
        ),
        result_data,
        verbose=True,
        agent_type=AgentType.OPENAI_FUNCTIONS,
        allow_dangerous_code=True
    )

    response = agent.invoke(f"Return the following information: {natural_query}")
    return response

In [11]:
select_records("From the PopulationStats container, return the top 3 countries with the highest population in 2020")

Select query: SELECT * FROM PopulationStats
Additional query: top 3 countries with the highest population in 2020


> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': "df.nlargest(3, '2020_Population')[['Country_Territory', '2020_Population']]"}`


    Country_Territory  2020_Population
34              China       1424929781
71              India       1396387127
172     United States        335942003

The top 3 countries with the highest population in 2020 are:

1. **China** with a population of 1,424,929,781
2. **India** with a population of 1,396,387,127
3. **United States** with a population of 335,942,003

> Finished chain.


{'input': 'Return the following information: top 3 countries with the highest population in 2020',
 'output': 'The top 3 countries with the highest population in 2020 are:\n\n1. **China** with a population of 1,424,929,781\n2. **India** with a population of 1,396,387,127\n3. **United States** with a population of 335,942,003'}
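In the trace above, the agent answers the question by running `df.nlargest(3, '2020_Population')` on the DataFrame. To illustrate what that operation computes, the sketch below performs the same top-3 selection with the standard library's `heapq.nlargest` on a handful of rows. The rows are copied from values shown in this article, and the list-of-dicts layout is a stand-in for the DataFrame, not what the agent actually runs.

```python
import heapq

# A few (country, 2020 population) rows taken from the outputs in this article.
rows = [
    {"Country_Territory": "Afghanistan", "2020_Population": 38972230},
    {"Country_Territory": "Algeria", "2020_Population": 43451666},
    {"Country_Territory": "Andorra", "2020_Population": 77700},
    {"Country_Territory": "China", "2020_Population": 1424929781},
    {"Country_Territory": "India", "2020_Population": 1396387127},
    {"Country_Territory": "United States", "2020_Population": 335942003},
]

# Pick the three rows with the largest 2020 population, largest first.
top3 = heapq.nlargest(3, rows, key=lambda r: r["2020_Population"])
result = [(r["Country_Territory"], r["2020_Population"]) for r in top3]

for country, population in result:
    print(f"{country}: {population:,}")
```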

### Creating a LangChain Chatbot That Interacts with GridDB Data

In [32]:
class SelectData(BaseModel):
    container_name: str = Field(description="the container name from the user query")
    query:str = Field(description="natural language converted to SELECT query")



system_command = """ 
Convert user commands into SQL queries for Griddb.
"""

user_prompt = ChatPromptTemplate.from_messages([
    ("system", system_command),
    ("user", "{input}")
])

select_chain = user_prompt | llm.with_structured_output(SelectData)

def select_records(query):

    select_data = select_chain.invoke(query)
    container_name = select_data.container_name
    select_query = select_data.query
    
    result_container = gridstore.get_container(container_name)
    query = result_container.query(select_query)
    rs = query.fetch()
    result_data = rs.fetch_rows()
    return result_data


result_data = select_records("SELECT all records from PopulationStats container")

In [33]:
agent = create_pandas_dataframe_agent(
    ChatOpenAI(
        api_key=OPENAI_API_KEY,
        temperature=0,
        model="gpt-4"
    ),
    result_data,
    agent_type=AgentType.OPENAI_FUNCTIONS,
    allow_dangerous_code=True,
)



def get_response(natural_query):
    # Send the question to the agent; each call is independent,
    # so no conversation history is carried between questions
    response = agent.invoke(f"Return the following information: {natural_query}")
    return response


# Function to chat with the agent
def chat_with_agent():
    while True:
        user_input = input("You: ")
        if user_input.lower() in ['exit', 'quit', 'bye']:
            print("AI: Goodbye!")
            break
        
        response = get_response(user_input)
        print(f"AI: {response['output']}")

chat_with_agent()

You:  What are the top 3 countries wth the highest population in 2020?


AI: The top 3 countries with the highest population in 2020 are China, India, and the United States.


You:  What percentage of world population China had in 2020?


AI: China had approximately 17.88% of the world population in 2020.


You:  And what percentage of world population brazil had in 2022?


AI: Brazil had 2.7% of the world population in 2022.


You:  bye


AI: Goodbye!
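Each question in the loop above is sent to the agent on its own, so a follow-up such as the Brazil question has to restate its full context. One possible way to carry context forward, sketched below, is to keep earlier turns in a list and prepend them to each new prompt. The `ChatSession` class, its `max_turns` parameter, and the prompt layout are all hypothetical and are not part of LangChain. The agent is passed in as a plain callable so the sketch runs without an API key.

```python
class ChatSession:
    """Keeps earlier turns and prepends them to each new question."""

    def __init__(self, ask, max_turns=5):
        self.ask = ask              # callable: prompt string -> answer string
        self.max_turns = max_turns  # how many past turns to include
        self.turns = []             # list of (question, answer) pairs

    def build_prompt(self, question):
        recent = self.turns[-self.max_turns:]
        history = "\n".join(f"User: {q}\nAI: {a}" for q, a in recent)
        prefix = f"Conversation so far:\n{history}\n\n" if history else ""
        return f"{prefix}Return the following information: {question}"

    def send(self, question):
        answer = self.ask(self.build_prompt(question))
        self.turns.append((question, answer))
        return answer


# Stand-in for the agent so the sketch can run offline.
def echo_agent(prompt):
    return f"(received {len(prompt)} characters)"


session = ChatSession(echo_agent)
print(session.send("What are the top 3 countries by population in 2020?"))
print(session.send("And what about 2022?"))
```

With the agent from this article, the callable could be `lambda prompt: agent.invoke(prompt)["output"]`. Whether the model makes good use of the prepended history is something to verify on real questions.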
