### A note on terminology:

You'll notice quite a few similarities between LangChain and LlamaIndex. In some ways, LlamaIndex can be thought of as an extension of LangChain, but it renames several concepts. Let's spend a few moments disambiguating the terms.

- `QueryEngine` -> `RetrievalQA`:
  -  `QueryEngine` is just LlamaIndex's way of indicating something is an LLM "chain" on top of a retrieval system
- `OpenAIAgent` vs. `ZeroShotAgent`:
  - The two agents have the same fundamental pattern: Decide which of a list of tools to use to answer a user's query.
  - `OpenAIAgent` (LlamaIndex's primary agent) does not need an agent executor because it uses OpenAI's [function-calling API](https://openai.com/blog/function-calling-and-other-api-updates), which lets the agent interface "directly" with the tools instead of operating through an intermediary application process.
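
To make the shared agent pattern concrete, here is a minimal sketch of tool dispatch in plain Python. It assumes the model replies with a tool name plus JSON-encoded arguments, which is the general shape of a function-calling response. The tool names and the `dispatch` helper are hypothetical and are not part of LlamaIndex or LangChain.

```python
import json
from typing import Callable, Dict


# Hypothetical tools; the names and return values are illustrative only.
def lookup_text(query: str) -> str:
    return f"semantic results for: {query}"


def run_sql(question: str) -> str:
    return f"sql results for: {question}"


TOOLS: Dict[str, Callable[..., str]] = {
    "lookup_text": lookup_text,
    "run_sql": run_sql,
}


def dispatch(function_call: dict) -> str:
    """Run the tool named in a function-call style message."""
    name = function_call["name"]
    # Arguments arrive as a JSON string, so decode them first.
    arguments = json.loads(function_call["arguments"])
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


# A message shaped like what a function-calling model might return.
example_call = {
    "name": "run_sql",
    "arguments": json.dumps({"question": "average rating?"}),
}
result = dispatch(example_call)
print(result)
```

The point of the sketch is that the model only chooses a name and arguments; the surrounding code is what actually runs the tool.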

There is, however, a much larger terminological difference when it comes to discussing data.

##### Nodes vs. Documents

As you're aware from the previous weeks' assignments, NLP uses the term `documents` for the text objects that make up a corpus.

LlamaIndex takes this a step further and reclassifies `documents` as `nodes`. Confusingly, it refers to the `Source Document` as simply `Documents`.

The `Document` -> `node` structure is almost exactly equivalent to the `Source Document` -> `Document` structure found in LangChain, but the new terminology brings some clarity when working with different structured indices.

We won't be leveraging those structured indices today, but we will be leveraging a "benefit" of the `node` structure that exists as a default in LlamaIndex, which is the ability to quickly filter nodes based on their metadata.
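
The `Document` -> `node` relationship can be sketched with two small dataclasses. This is only an illustration of the idea: the class names, the character-based chunking, and the `to_nodes` helper are assumptions made for the example and do not mirror LlamaIndex's actual classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceDocument:
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Node:
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def to_nodes(doc: SourceDocument, chunk_chars: int) -> List[Node]:
    """Split a document into fixed-width character chunks that copy its metadata."""
    return [
        Node(text=doc.text[i : i + chunk_chars], metadata=dict(doc.metadata))
        for i in range(0, len(doc.text), chunk_chars)
    ]


docs = [
    SourceDocument("a" * 25, {"title": "first"}),
    SourceDocument("b" * 12, {"title": "second"}),
]
nodes = [node for doc in docs for node in to_nodes(doc, chunk_chars=10)]

# Because every node carries metadata, narrowing the search space is a simple filter.
second_only = [n for n in nodes if n.metadata["title"] == "second"]
print(len(nodes), len(second_only))
```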

![image](https://i.imgur.com/B1QDjs5.png)

# Creating a more robust RAQA system using LlamaIndex

We'll be putting together a system for querying both qualitative and quantitative data using LlamaIndex. 

This notebook uses Carbon Disclosure Project (CDP) business travel data as its base, but the approach can, and should, be extended to other topics and domains.

# Build 🏗️
There are 3 main tasks in this notebook:

- Create a Qualitative VectorStore query engine
- Create a quantitative NLtoSQL query engine
- Combine the two using LlamaIndex's OpenAI agent framework.

# Ship 🚢
Create and host a Gradio or Chainlit application to serve your project on Hugging Face Spaces.

# Share 🚀
Make a social media post about your final application and tag @AIMakerspace

### BOILERPLATE

This is only relevant when running the code in a Jupyter Notebook.

In [1]:
import nest_asyncio

nest_asyncio.apply()

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
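
One side effect of the logging setup above is that many log messages in this notebook appear twice, once with a `LEVEL:logger:` prefix and once without. A likely explanation is that `basicConfig` installs one handler and `addHandler` installs a second. The sketch below reproduces the effect on a private logger (the logger name and the `StringIO` buffer are assumptions for the demo) so it does not touch the root logger.

```python
import io
import logging

buffer = io.StringIO()
demo_logger = logging.getLogger("demo_duplicate_handlers")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo isolated from the root logger

# First handler: same format string that logging.basicConfig uses by default.
prefixed = logging.StreamHandler(stream=buffer)
prefixed.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
demo_logger.addHandler(prefixed)

# Second handler: no formatter, so only the bare message is written.
plain = logging.StreamHandler(stream=buffer)
demo_logger.addHandler(plain)

demo_logger.info("NumExpr defaulting to 8 threads.")
lines = buffer.getvalue().splitlines()
print(lines)
```

If the doubled output is distracting, dropping one of the two handlers should be enough.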

### Primary Dependencies and Context Setting

#### Dependencies and OpenAI API key setting

First of all, we'll need our primary libraries - and to set up our OpenAI API key.

In [2]:
%pip install -U -q openai==0.27.8 llama-index==0.8.40 nltk==3.8.1 

Note: you may need to restart the kernel to use updated packages.


In [3]:
import llama_index

llama_index.__version__

INFO:numexpr.utils:Note: NumExpr detected 10 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
Note: NumExpr detected 10 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
NumExpr defaulting to 8 threads.


'0.8.40'

In [4]:
import getpass
import os
from dotenv import load_dotenv

load_dotenv()

True

In [5]:
pip install arxiv

Note: you may need to restart the kernel to use updated packages.


In [6]:
from langchain.document_loaders import WikipediaLoader, CSVLoader, ArxivLoader
from aimakerspace.text_utils import TextFileLoader, CharacterTextSplitter

# Note: despite its name, this variable holds arXiv results, not Wikipedia pages.
sec_wikipedia_docs = ArxivLoader(
    query="CDP Questionnaire with regards transport",
    load_max_docs=5,
    doc_content_chars_max=1_000_000,
).load()

# Load the report template used later in the notebook.
text_loader = TextFileLoader("./data/BusinessTravelReport.txt")
business_travel_documents = text_loader.load_documents()

INFO:arxiv.arxiv:Requesting page (first: True, try: 0): https://export.arxiv.org/api/query?search_query=CDP+Questionnaire+with+regards+transport&id_list=&sortBy=relevance&sortOrder=descending&start=0&max_results=100
Requesting page (first: True, try: 0): https://export.arxiv.org/api/query?search_query=CDP+Questionnaire+with+regards+transport&id_list=&sortBy=relevance&sortOrder=descending&start=0&max_results=100
INFO:arxiv.arxiv:Got first page: 50 of 1683690 total results
Got first page: 50 of 1683690 total results


In [7]:
import os
import getpass

# os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")

import openai
openai.api_key = os.environ["OPENAI_API_KEY"]

In [8]:
# os.environ["WANDB_API_KEY"] = getpass.getpass("WandB API Key: ")

In [9]:
pip install wandb

Note: you may need to restart the kernel to use updated packages.


In [13]:

from llama_index import set_global_handler

set_global_handler("wandb", run_args={"project": "llamaindex-demo-v1"})
wandb_callback = llama_index.global_handler

ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


wandb: Streaming LlamaIndex events to W&B at https://wandb.ai/sumush/llamaindex-demo-v1/runs/dcqo5nko
wandb: `WandbCallbackHandler` is currently in beta.
wandb: Please report any issues to https://github.com/wandb/wandb/issues with the tag `llamaindex`.


#### Context Setting

Now, LlamaIndex has the ability to set `ServiceContext`. You can think of this as a config file of sorts. The basic idea here is that we use this to establish some core properties and then can pass it to various services. 

While we could set this up as a global context, we're going to leave it as `ServiceContext` so we can see where it's applied.

We'll set a few significant contexts:

- `chunk_size` - the maximum size of each text chunk, measured in tokens
- `llm` - this is where we can set what model we wish to use as our primary LLM when we're making `QueryEngine`s and more
- `embed_model` - this will help us keep our embedding model consistent across use cases


We'll also create some resources we're going to keep consistent across all of our indices today.

- `text_splitter` - This is what we'll use to split our text, feel free to experiment here
- `SimpleNodeParser` - This is what will work in tandem with the `text_splitter` to parse our full sized documents into nodes.
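
To build intuition for what `chunk_size` controls, here is a rough sketch of token-based splitting. It treats whitespace-separated words as tokens, which is an assumption made to keep the example dependency-free; a real splitter would use a model tokenizer. The `split_tokens` function and its `chunk_overlap` parameter are hypothetical helpers, not LlamaIndex code.

```python
from typing import List


def split_tokens(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Split text into chunks of at most `chunk_size` whitespace tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    tokens = text.split()
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start : start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = split_tokens(sample, chunk_size=4, chunk_overlap=1)
print(chunks)
```

Smaller chunks give more precise retrieval hits but less context per hit, so this is worth experimenting with.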

In [14]:
from llama_index import ServiceContext
from llama_index.node_parser.simple import SimpleNodeParser
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index.llms import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding()
chunk_size = 500
llm = OpenAI(
    temperature=0,
    model="gpt-3.5-turbo-0613",
    streaming=True,
)

service_context = ServiceContext.from_defaults(
    llm=llm,
    chunk_size=chunk_size,
    embed_model=embed_model,
)

text_splitter = TokenTextSplitter(
    chunk_size=chunk_size,
)

node_parser = SimpleNodeParser(
    text_splitter=text_splitter,
)

###  Wikipedia Retrieval Tool

Now we can get to work creating our semantic `QueryEngine`!

We'll follow a similar pattern as we did with LangChain here - and the first step (as always) is to get dependencies.

In [15]:
%pip install -U -q tiktoken==0.4.0 sentence-transformers==2.2.2 pydantic==1.10.11

Note: you may need to restart the kernel to use updated packages.


#### GPTIndex

We'll be using [GPTIndex](https://gpt-index.readthedocs.io/en/v0.6.2/reference/indices/vector_store.html) as our `VectorStore` today!

It works in a similar fashion to tools like Pinecone, Weaviate, and more - but it's locally hosted and will serve our purposes fine.

The `GPTIndex` is also integrated with WandB for index versioning.

You'll also notice the return of `OpenAIEmbedding()`, which is the embeddings model we'll be leveraging. Of course, this is using the `ada` model under the hood - and already comes equipped with in-memory caching.

You'll notice we can pass our `service_context` into our `VectorStoreIndex`!

In [16]:
from llama_index import GPTVectorStoreIndex

index = GPTVectorStoreIndex.from_documents([], service_context=service_context)

wandb: Logged trace tree to W&B.


In [17]:
%pip install -U -q wikipedia

Note: you may need to restart the kernel to use updated packages.


As in the LangChain example, we'll pull information straight from Wikipedia, this time using the built-in `WikipediaReader`.

Setting `auto_suggest=False` ensures we run into fewer auto-correct based errors.

In [18]:
from llama_index.readers.wikipedia import WikipediaReader

cdp_list = [
    "Carbon Disclosure Project", 
    "Carbon Disclosure Project"
]

wiki_docs = WikipediaReader().load_data(
    pages=cdp_list,
    auto_suggest=False,
)

#### Node Construction

Now we will loop through our documents and metadata and construct nodes (associated with particular metadata for easy filtration later).

We're using the `node_parser` we created at the top of the Notebook.

In [19]:
for cdp_doc, wiki_doc in zip(cdp_list, wiki_docs):
    nodes = node_parser.get_nodes_from_documents([wiki_doc])
    for node in nodes:
        node.metadata = {"title" : cdp_doc}
    index.insert_nodes(nodes)

In [21]:
wandb_callback.persist_index(index, index_name="wiki-index")

wandb: Adding directory to artifact (/Users/ukizhake/Documents/LLM-Ops-Cohort-2-main-week3-thurs/Week 3/Thursday/wandb/run-20231202_161558-dcqo5nko/files/storage)... Done. 0.0s


In [23]:
from llama_index import load_index_from_storage

storage_context = wandb_callback.load_storage_context(
    artifact_url="sumush/llamaindex-demo-v1/wiki-index:v33" ### YOUR ARTIFACT URL HERE
)

index = load_index_from_storage(storage_context, service_context=service_context)

wandb:   4 of 4 files downloaded.


INFO:llama_index.indices.loading:Loading all indices.
Loading all indices.
Failed to log trace tree to W&B: list index out of range


In [24]:
wandb_callback.load_storage_context(artifact_url="sumush/llamaindex-demo-v1/wiki-index:v33")

wandb:   4 of 4 files downloaded.


StorageContext(docstore=<llama_index.storage.docstore.simple_docstore.SimpleDocumentStore object at 0x2a5848290>, index_store=<llama_index.storage.index_store.simple_index_store.SimpleIndexStore object at 0x2a5b0fc90>, vector_store=<llama_index.vector_stores.simple.SimpleVectorStore object at 0x2a5b1fc90>, graph_store=<llama_index.graph_stores.simple.SimpleGraphStore object at 0x2a56f7a10>)

#### Auto Retriever Functional Tool

This tool will leverage OpenAI's function-calling endpoint to select the correct metadata filter and query the filtered index, looking only at nodes with the desired metadata.

A simplified diagram: ![image](https://i.imgur.com/AICDPav.png)

First, we need to create our `VectorStoreInfo` object, which will hold all the relevant metadata we need for each component (in this case, title metadata).

Note that the allowed metadata values need to be listed, as text, in the field's description.

In [25]:
from llama_index.tools import FunctionTool
from llama_index.vector_stores.types import (
    VectorStoreInfo,
    MetadataInfo,
    ExactMatchFilter,
    MetadataFilters,
)
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine

from typing import List, Tuple, Any
from pydantic import BaseModel, Field

top_k = 3



Now we'll create our base Pydantic object that we can use to ensure compatibility with our application layer. This verifies that the response from the OpenAI endpoint conforms to this schema.

In [26]:
class AutoRetrieveModel(BaseModel):
    query: str = Field(..., description="natural language query string")
    filter_key_list: List[str] = Field(
        ..., description="List of metadata filter field names"
    )
    filter_value_list: List[str] = Field(
        ...,
        description=(
            "List of metadata filter field values (corresponding to names specified in filter_key_list)"
        )
    )

Now we can build the function that we will expose to the function-calling endpoint.

>The `docstring` is important to the functionality of the application.
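
To see why a docstring can matter, the sketch below builds a simple tool description from a function's signature and docstring using the standard `inspect` module. This is not LlamaIndex's implementation; `describe_tool` and `example_retrieve` are hypothetical names, and the output dictionary is only loosely shaped like a function-calling schema.

```python
import inspect
from typing import List


def describe_tool(fn) -> dict:
    """Build a simple tool description from a function's signature and docstring."""
    signature = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": list(signature.parameters),
    }


def example_retrieve(
    query: str, filter_key_list: List[str], filter_value_list: List[str]
):
    """Retrieve passages that match a query, restricted by metadata filters."""
    return []


tool_description = describe_tool(example_retrieve)
print(tool_description)
```

Whatever text ends up in the description is all the model sees when deciding whether to call the tool, so it pays to write it carefully.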

In [27]:
def auto_retrieve_fn(
    query: str, filter_key_list: List[str], filter_value_list: List[str]
):
    """Auto retrieval function.

    Performs auto-retrieval from a vector database, and then applies a set of filters.

    """
    query = query or "Query"

    exact_match_filters = [
        ExactMatchFilter(key=k, value=v)
        for k, v in zip(filter_key_list, filter_value_list)
    ]
    retriever = VectorIndexRetriever(
        index, filters=MetadataFilters(filters=exact_match_filters), top_k=top_k
    )
    query_engine = RetrieverQueryEngine.from_args(retriever, service_context=service_context)

    response = query_engine.query(query)
    return str(response)
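
The filtering step in `auto_retrieve_fn` pairs each key with its value and keeps only exact matches. The sketch below imitates that behavior with plain dictionaries (the node contents and the `exact_match` helper are stand-ins, not LlamaIndex objects). It also illustrates a pitfall: a filter value that no node carries returns nothing. That may be why the first agent call in this notebook, which filters on `Carbon Disclosure Project air`, comes back with `Empty Response`, since the nodes were tagged with the title `Carbon Disclosure Project`.

```python
from typing import List

# Stand-in nodes, tagged the same way the notebook tags its nodes.
nodes = [
    {"text": "placeholder chunk 1", "metadata": {"title": "Carbon Disclosure Project"}},
    {"text": "placeholder chunk 2", "metadata": {"title": "Carbon Disclosure Project"}},
]


def exact_match(nodes: List[dict], keys: List[str], values: List[str]) -> List[dict]:
    """Keep nodes whose metadata equals every (key, value) pair exactly."""
    filters = list(zip(keys, values))
    return [
        node
        for node in nodes
        if all(node["metadata"].get(k) == v for k, v in filters)
    ]


hits = exact_match(nodes, ["title"], ["Carbon Disclosure Project"])
misses = exact_match(nodes, ["title"], ["Carbon Disclosure Project air"])
print(len(hits), len(misses))
```

Keeping the values advertised in the tool description identical to the values stored in node metadata avoids this kind of silent miss.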

Now we need to wrap our system in a tool in order to integrate it into the larger application.

Source Code Here:
- [`FunctionTool`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/tools/function_tool.py#L21)

In [28]:
from typing import Callable 
vector_store_info = VectorStoreInfo(
    content_info="semantic information about carbon disclosure",
    metadata_info=[MetadataInfo(
        name="title",
        type="str",
        description="title of the emissions reporting methods, one of [Carbon Disclosure Project air, Carbon Disclosure Project rail]",
        # to_openai_function=
    )]
)
description = f"""\
Use this tool to look up semantic information about films.
The vector database schema is given below:
{vector_store_info.json()}
"""

auto_retrieve_tool = FunctionTool.from_defaults(
    fn=auto_retrieve_fn,
    name="semantic-cdp-info",
    description=description,
    fn_schema=AutoRetrieveModel,
)

All that's left to do is attach the tool to an OpenAIAgent and let it rip!

Source Code Here:
- [`OpenAIAgent`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/agent/openai_agent.py#L361)

In [29]:
from llama_index.agent import OpenAIAgent

agent = OpenAIAgent.from_tools(
    tools=[
        auto_retrieve_tool,
    ],
)

In [30]:
agent.chat("what are the different business travel carbon emissions")

[<llama_index.tools.function_tool.FunctionTool object at 0x2a5bf9890>]
ToolMetadata(description='Use this tool to look up semantic information about films.\nThe vector database schema is given below:\n{"metadata_info": [{"name": "title", "type": "str", "description": "title of the emissions reporting methods, one of [Carbon Disclosure Project air, Carbon Disclosure Project rail]"}], "content_info": "semantic information about carbon disclosure"}\n', name='semantic-cdp-info', fn_schema=<class '__main__.AutoRetrieveModel'>)


[34m[1mwandb[0m: Logged trace tree to W&B.


AgentChatResponse(response="I'm sorry, but I couldn't find any specific information about different business travel carbon emissions in the Carbon Disclosure Project database.", sources=[ToolOutput(content='Empty Response', tool_name='semantic-cdp-info', raw_input={'args': (), 'kwargs': {'query': 'business travel carbon emissions', 'filter_key_list': ['title'], 'filter_value_list': ['Carbon Disclosure Project air']}}, raw_output='Empty Response')], source_nodes=[])

In [33]:
response = agent.chat("Tell me briefly about the Carbon Disclosure Project  ")
print(str(response))

[<llama_index.tools.function_tool.FunctionTool object at 0x2a5bf9890>]
ToolMetadata(description='Use this tool to look up semantic information about films.\nThe vector database schema is given below:\n{"metadata_info": [{"name": "title", "type": "str", "description": "title of the emissions reporting methods, one of [Carbon Disclosure Project air, Carbon Disclosure Project rail]"}], "content_info": "semantic information about carbon disclosure"}\n', name='semantic-cdp-info', fn_schema=<class '__main__.AutoRetrieveModel'>)


[34m[1mwandb[0m: Logged trace tree to W&B.


The Carbon Disclosure Project (CDP) is an international non-profit organization that works with companies, cities, states, and regions to help them disclose their environmental impact. CDP collects and analyzes data on carbon emissions, water usage, and deforestation from thousands of organizations worldwide. This data is then made available to investors, corporations, and policymakers to inform decision-making and drive action towards a more sustainable economy. CDP's mission is to encourage transparency and accountability in environmental reporting, and to promote the adoption of sustainable practices to address climate change and other environmental challenges.


### Business travel air SQL Tool

We'll walk through the steps of creating a natural language to SQL system in the following section.

> NOTICE: This system does not parse inputs or intermediary calls to ensure that the generated SQL queries are safe. Do not use it in a production environment without adding guardrails on both sides of the application.
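
As one example of a guardrail, the sketch below rejects anything that is not a single `SELECT` statement before it reaches the database. It is deliberately naive and conservative: the `is_read_only_select` helper and its keyword list are assumptions for illustration, it will also reject harmless queries that merely contain a blocked word, and it is not a substitute for read-only database credentials.

```python
import re

# Keywords that should never appear in a read-only query (illustrative list).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|detach|pragma|replace)\b",
    re.IGNORECASE,
)
STARTS_WITH_SELECT = re.compile(r"^select\b", re.IGNORECASE)


def is_read_only_select(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not STARTS_WITH_SELECT.match(statement):
        return False
    return FORBIDDEN.search(statement) is None


print(is_read_only_select("SELECT AVG(co2) FROM travel;"))
print(is_read_only_select("SELECT 1; DROP TABLE travel"))
```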

In [34]:
%pip install -q -U sqlalchemy pandas

Note: you may need to restart the kernel to use updated packages.


The next few steps should be largely straightforward. We'll want to:

1. Read in our `.csv` files into `pd.DataFrame` objects
2. Create an in-memory `sqlite` powered `sqlalchemy` engine
3. Cast our `pd.DataFrame` objects to the SQL engine
4. Create an `SQLDatabase` object through LlamaIndex
5. Use that to create a `QueryEngineTool` that we can interact with through the `NLSQLTableQueryEngine`!

If you get stuck, please consult the documentation.
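
For orientation, the data side of these steps can be sketched with only the standard library: load CSV rows, write them to an in-memory SQLite table, and run the kind of aggregate query an NL-to-SQL engine would generate. The four rows are copied from the air-travel table displayed later in this notebook; the simplified column names and the table name are assumptions, not the real CSV headers.

```python
import csv
import io
import sqlite3

# Rows copied from the notebook's air-travel output; headers are simplified assumptions.
CSV_TEXT = """id,description,length,miles,co2,ch4,n2o
JD-001,John Doe 1,"Medium Haul (>= 300 miles, < 2300 miles)",1000,229,10.4,8.5
JD-002,Mark,Short Haul (< 300 miles),2200,455,14.1,14.5
JD-003,Chris,Long Haul (>= 2300 miles),4799,782,2.9,25.0
JD-004,Brian,"Medium Haul (>= 300 miles, < 2300 miles)",2000,258,1.2,8.2
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_travel_air "
    "(id TEXT, description TEXT, length TEXT, miles INTEGER, "
    "co2 INTEGER, ch4 REAL, n2o REAL)"
)
conn.executemany(
    "INSERT INTO business_travel_air "
    "VALUES (:id, :description, :length, :miles, :co2, :ch4, :n2o)",
    rows,
)

# The kind of SQL an NL-to-SQL engine might produce for "average CO2 and CH4".
avg_co2, avg_ch4 = conn.execute(
    "SELECT AVG(co2), AVG(ch4) FROM business_travel_air"
).fetchone()
conn.close()
print(avg_co2, round(avg_ch4, 2))
```

Running a query by hand like this is also a handy way to check the numbers an agent reports.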

#### Read `.csv` Into Pandas

In [35]:
import pandas as pd

cdp_business_travel_air_df = pd.read_csv("./data/BusinessTravelAir.csv")
cdp_business_travel_rail_df = pd.read_csv("./data/BusinessTravelRail.csv")

In [36]:
cdp_business_travel_air_df

| Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | `Table 3. Air,,,,,,\r\n ID, Description, Length,Miles"` | CO2 Emissions | CH4 Emissions | N2O Emissions |
|---|---|---|---|---|---|---|
| JD-001 | John Doe 1 | Medium Haul (>= 300 miles, < 2300 miles) | 1000 | 229 | 10.4 | 8.5 |
| JD-002 | Mark | Short Haul (< 300 miles) | 2200 | 455 | 14.1 | 14.5 |
| JD-003 | Chris | Long Haul (>= 2300 miles) | 4799 | 782 | 2.9 | 25.0 |
| JD-004 | Brian | Medium Haul (>= 300 miles, < 2300 miles) | 2000 | 258 | 1.2 | 8.2 |


#### Create SQLAlchemy engine with SQLite

In [37]:
from sqlalchemy import create_engine

engine = create_engine("sqlite+pysqlite:///:memory:")

#### Convert `pd.DataFrame` to SQL tables

In [38]:
cdp_business_travel_air_df.to_sql(
    "cdp_business_travel_air",
    engine
)

4

In [39]:
cdp_business_travel_rail_df.to_sql(
    "cdp_business_travel_rail",
    engine
)

7

#### Construct a `SQLDatabase` index

Source Code Here:
- [`SQLDatabase`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/langchain_helpers/sql_wrapper.py#L9)

In [40]:
from llama_index import SQLDatabase

sql_database = SQLDatabase(
    engine=engine,
    include_tables=[
        "cdp_business_travel_air",
        "cdp_business_travel_rail",
    ],
)

#### Create the NLSQLTableQueryEngine interface for all added SQL tables

Source Code Here:
- [`NLSQLTableQueryEngine`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/indices/struct_store/sql_query.py#L75C1-L75C1)

In [41]:
from llama_index.indices.struct_store.sql_query import NLSQLTableQueryEngine

sql_query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=[
        "cdp_business_travel_air",
        "cdp_business_travel_rail",
    ],
    service_context=service_context,
)

#### Wrap It All Up in a `QueryEngineTool`

You'll want to ensure you have a descriptive...description. 

An example is provided here:

```
"Useful for translating a natural language query into a SQL query over a table containing: "
"barbie, containing information related to reviews of the Barbie movie"
"oppenheimer, containing information related to reviews of the Oppenheimer movie"
```

Source Code Here:

- [`QueryEngineTool`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/tools/query_engine.py#L13)

In [42]:
from llama_index.tools.query_engine import QueryEngineTool, ToolMetadata

sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_query_engine,
    name="sql-query",
    description=(
        "Useful for translating a natural language query into a SQL query over a table containing: "
        "business travel air, containing information wrt company business travel by air "
        "business travel rail, containing information wrt  company business travel by rail"
    ),
)


### Combining The Tools Together

Now, we can simply add our tools into the `OpenAIAgent`, and off we go!

In [44]:
co2_new_agent = OpenAIAgent.from_tools(
    tools=[
        auto_retrieve_tool,
        sql_tool,
    ],
)

In [45]:
response = co2_new_agent.chat("What is the average CO2 emissions and CH4 emissions for  business travel air . think step by step and show us the steps")

[<llama_index.tools.function_tool.FunctionTool object at 0x2a5bf9890>, <llama_index.tools.query_engine.QueryEngineTool object at 0x2b08a8f50>]
ToolMetadata(description='Use this tool to look up semantic information about films.\nThe vector database schema is given below:\n{"metadata_info": [{"name": "title", "type": "str", "description": "title of the emissions reporting methods, one of [Carbon Disclosure Project air, Carbon Disclosure Project rail]"}], "content_info": "semantic information about carbon disclosure"}\n', name='semantic-cdp-info', fn_schema=<class '__main__.AutoRetrieveModel'>)


INFO:llama_index.indices.struct_store.sql_query:> Table desc str: Table 'cdp_business_travel_air' has columns: level_0 (TEXT), level_1 (TEXT), level_2 (TEXT), Table 3.  Air,,,,,,
 ID, Description, Length,Miles" (TEXT), CO2 Emissions (BIGINT), CH4 Emissions (FLOAT), N2O Emissions (FLOAT), and foreign keys: .

Table 'cdp_business_travel_rail' has columns: level_0 (TEXT), level_1 (TEXT), level_2 (TEXT), Table 2.  Rail,,,,,,
 ID, Description,Vehicle ,Miles" (TEXT), CO2 Emissions (BIGINT), CH4 Emissions (FLOAT), N2O Emissions (FLOAT), and foreign keys: .
> Table desc str: Table 'cdp_business_travel_air' has columns: level_0 (TEXT), level_1 (TEXT), level_2 (TEXT), Table 3.  Air,,,,,,
 ID, Description, Length,Miles" (TEXT), CO2 Emissions (BIGINT), CH4 Emissions (FLOAT), N2O Emissions (FLOAT), and foreign keys: .

Table 'cdp_business_travel_rail' has columns: level_0 (TEXT), level_1 (TEXT), level_2 (TEXT), Table 2.  Rail,,,,,,
 ID, Description,Vehicle ,Miles" (TEXT), CO2 Emissions (BIGINT), CH

[34m[1mwandb[0m: Logged trace tree to W&B.


In [46]:
print(str(response))

The average CO2 emissions for business travel by air is 431.0, and the average CH4 emissions for business travel by air is approximately 7.15.


In [55]:
response = co2_new_agent.chat("What is the average CO2 emissions, CH4 emissions and N2O emissions for Business travel air emissions, also calculate average CO2 emissions, CH4 emissions and N2O emissions for Business travel rail emissions and give me a summary of Co2 ch4 and n2o emissions?")

[<llama_index.tools.function_tool.FunctionTool object at 0x2a5bf9890>, <llama_index.tools.query_engine.QueryEngineTool object at 0x2b08a8f50>]
ToolMetadata(description='Use this tool to look up semantic information about films.\nThe vector database schema is given below:\n{"metadata_info": [{"name": "title", "type": "str", "description": "title of the emissions reporting methods, one of [Carbon Disclosure Project air, Carbon Disclosure Project rail]"}], "content_info": "semantic information about carbon disclosure"}\n', name='semantic-cdp-info', fn_schema=<class '__main__.AutoRetrieveModel'>)
INFO:llama_index.indices.struct_store.sql_query:> Table desc str: Table 'cdp_business_travel_air' has columns: level_0 (TEXT), level_1 (TEXT), level_2 (TEXT), Table 3.  Air,,,,,,
 ID, Description, Length,Miles" (TEXT), CO2 Emissions (BIGINT), CH4 Emissions (FLOAT), N2O Emissions (FLOAT), and foreign keys: .

Table 'cdp_business_travel_rail' has columns: level_0 (TEXT), level_1 (TEXT), level_2 (TEX

[34m[1mwandb[0m: Logged trace tree to W&B.


In [56]:
res=str(response)
res


'None'

In [57]:
business_travel_documents[0]

'Business travel emissions covered by target (metric tons CO2e)\nThe average CO2 emissions for business travel by air is \nThe average CH4 emissions for business travel by air is \nThe average N2O emissions for business travel by air is \nThe average CO2 emissions for business travel by rail is \nThe average CH4 emissions for business travel by rail is \nThe average N2O emissions for business travel by rail is \nThe max CO2 emissions for business travel by air is \nThe max CH4 emissions for business travel by air is \nThe max N2O emissions for business travel by air is \nThe max CO2 emissions for business travel by rail is \nThe max CH4 emissions for business travel by rail is \nThe max N2O emissions for business travel by rail is \nIn summary, \n'

In [58]:
chatmsg = "given this information "+ res +" and given this template : "+business_travel_documents[0]+ " and now,please generate a report"
chatmsg

'given this information None and given this template : Business travel emissions covered by target (metric tons CO2e)\nThe average CO2 emissions for business travel by air is \nThe average CH4 emissions for business travel by air is \nThe average N2O emissions for business travel by air is \nThe average CO2 emissions for business travel by rail is \nThe average CH4 emissions for business travel by rail is \nThe average N2O emissions for business travel by rail is \nThe max CO2 emissions for business travel by air is \nThe max CH4 emissions for business travel by air is \nThe max N2O emissions for business travel by air is \nThe max CO2 emissions for business travel by rail is \nThe max CH4 emissions for business travel by rail is \nThe max N2O emissions for business travel by rail is \nIn summary, \n and now,please generate a report'
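
Note that the message above contains the literal text `None`, because the previous agent response was empty. A small guard can catch this before the prompt is sent. The sketch below is one possible approach; `build_report_prompt` is a hypothetical helper, and the prompt wording is only loosely based on the cell above.

```python
def build_report_prompt(answer: str, template: str) -> str:
    """Combine an agent answer and a report template, refusing empty answers."""
    cleaned = (answer or "").strip()
    if cleaned in ("", "None"):
        raise ValueError(
            "agent returned no answer; re-run the query before building the report"
        )
    return (
        f"Given this information:\n{cleaned}\n\n"
        f"and this template:\n{template}\n\n"
        "please generate a report."
    )


prompt = build_report_prompt(
    "The average CO2 emissions for business travel by air is 431.0.",
    "The average CO2 emissions for business travel by air is ",
)
print(prompt)
```

Failing fast here avoids a report filled with placeholder text instead of numbers.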

In [59]:
# report = co2_new_agent.chat(" create a report using {res}".format(res=response))
report = co2_new_agent.chat(chatmsg)

[<llama_index.tools.function_tool.FunctionTool object at 0x2a5bf9890>, <llama_index.tools.query_engine.QueryEngineTool object at 0x2b08a8f50>]
ToolMetadata(description='Use this tool to look up semantic information about films.\nThe vector database schema is given below:\n{"metadata_info": [{"name": "title", "type": "str", "description": "title of the emissions reporting methods, one of [Carbon Disclosure Project air, Carbon Disclosure Project rail]"}], "content_info": "semantic information about carbon disclosure"}\n', name='semantic-cdp-info', fn_schema=<class '__main__.AutoRetrieveModel'>)


[34m[1mwandb[0m: Logged trace tree to W&B.


In [60]:
print(str(report))

Business Travel Emissions Report:

Summary of CO2, CH4, and N2O emissions for business travel:

- Business travel by air:
  - Average CO2 emissions: [Insert average CO2 emissions for business travel by air] metric tons CO2e
  - Average CH4 emissions: [Insert average CH4 emissions for business travel by air] metric tons CO2e
  - Average N2O emissions: [Insert average N2O emissions for business travel by air] metric tons CO2e
  - Maximum CO2 emissions: [Insert maximum CO2 emissions for business travel by air] metric tons CO2e
  - Maximum CH4 emissions: [Insert maximum CH4 emissions for business travel by air] metric tons CO2e
  - Maximum N2O emissions: [Insert maximum N2O emissions for business travel by air] metric tons CO2e

- Business travel by rail:
  - Average CO2 emissions: [Insert average CO2 emissions for business travel by rail] metric tons CO2e
  - Average CH4 emissions: [Insert average CH4 emissions for business travel by rail] metric tons CO2e
  - Average N2O emissions: [Inse

In [61]:
print(str(response))

None


In [62]:
wandb_callback.finish()

