# Building our First RAG bot - Skill: talk to Search Engine

We now have all the building blocks to build our first bot that "talks with my data". These blocks are:

1) A well indexed hybrid (text and vector) engine with my data in chunks -> Azure AI Search
2) A good LLM python framework to build LLM Apps -> LangChain
3) Quality OpenAI GPT models that understand language and follow instructions -> GPT3.5 and GPT4
4) A persistent memory database -> CosmosDB

We are missing just one thing: **Agents**.

In this notebook we introduce the concept of Agents and use it to build our first RAG bot.

In [4]:
import os  # needed below to read environment variables via os.environ
import random
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Type

from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import AzureChatOpenAI
from langchain_core.runnables import ConfigurableField, ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory, CosmosDBChatMessageHistory
from langchain.callbacks.manager import AsyncCallbackManagerForToolRun, CallbackManagerForToolRun
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import BaseTool, StructuredTool, tool

#custom libraries that we will use later in the app
from common.utils import  GetDocSearchResults_Tool
from common.prompts import AGENT_DOCSEARCH_PROMPT

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

from dotenv import load_dotenv
load_dotenv()


True

In [5]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

## Introducing: Agents

The implementation of Agents is inspired by two papers: the [MRKL Systems](https://arxiv.org/abs/2205.00445) paper (pronounced ‘miracle’ 😉) and the [ReAct](https://arxiv.org/abs/2210.03629) paper.

Agents are a way to leverage the ability of LLMs to understand and act on prompts. In essence, an Agent is an LLM that has been given a very clever initial prompt. The prompt tells the LLM to break down the process of answering a complex query into a sequence of steps that are resolved one at a time.

Agents become really powerful when we combine them with ‘experts’, introduced in the MRKL paper. A simple example: an Agent might not be able to reliably perform mathematical calculations by itself. However, we can introduce an expert, in this case a calculator. Now, when we need to perform a calculation, the Agent can call the expert rather than trying to predict the result itself. This is the concept behind [ChatGPT Plugins](https://openai.com/blog/chatgpt-plugins).
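
To make the expert idea concrete, here is a minimal sketch in plain Python with no LLM involved. The names `calculator`, `EXPERTS`, and `run_agent_step` are hypothetical, and the `decision` dictionary only imitates what an LLM might emit when it chooses between calling a tool and answering directly.

```python
import ast
import operator

# Arithmetic operators the hypothetical calculator expert supports.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def calculator(expression: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval").body)


# Registry of experts the agent may call, keyed by tool name.
EXPERTS = {"calculator": calculator}


def run_agent_step(decision: dict) -> str:
    """Dispatch one step: either return the final answer or call an expert."""
    if decision["action"] == "final_answer":
        return decision["input"]
    expert = EXPERTS[decision["action"]]
    return str(expert(decision["input"]))
```

For example, `run_agent_step({"action": "calculator", "input": "12 * 12"})` routes the arithmetic to the expert instead of asking the model to guess the result.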

In our case, to solve the problem "How do I build a smart bot that talks to my data", we need this ReAct/MRKL approach: we instruct the LLM to use 'experts/tools' in order to read, load, understand, and interact with any particular source of data.

Let's then create an Agent that interacts with the user and uses a Tool to get information from the search engine.

#### We start by defining the Tool/Expert

In [6]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
index3_name = "cogsrch-index-books"
indexes = [index1_name, index2_name, index3_name]

We have to convert the Retriever object into a Tool object ("the expert"). Check out the Tool `GetDocSearchResults_Tool` in `utils.py`.
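
As a rough illustration of what "wrapping a retriever as a tool" means, the sketch below uses only the standard library. It is not the implementation in `utils.py`: `SimpleTool`, `fake_retriever`, and the tiny corpus are hypothetical stand-ins. The point is that a tool pairs a name and a description (which the LLM reads to decide when to call it) with a function that does the actual work.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a tool: metadata plus a callable."""

    name: str
    description: str
    func: Callable[[str], List[str]]

    def run(self, query: str) -> str:
        # Format the retrieved texts as numbered sources for the LLM to cite.
        results = self.func(query)
        return "\n".join(f"[{i}] {text}" for i, text in enumerate(results, start=1))


def fake_retriever(query: str) -> List[str]:
    """Toy keyword retriever over an in-memory corpus (assumption for the demo)."""
    corpus = [
        "Markov chains model state transitions.",
        "CosmosDB stores chat history.",
        "Azure AI Search indexes document chunks.",
    ]
    words = query.lower().split()
    return [doc for doc in corpus if any(word in doc.lower() for word in words)]


docsearch = SimpleTool(
    name="docsearch",
    description="Search the indexed documents for information relevant to a query.",
    func=fake_retriever,
)
```

Calling `docsearch.run("markov chains")` returns the matching text as a numbered source, which is the kind of string an agent would receive back from a search tool.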

Declare the tools the agent will use

In [7]:
topK = 5  # number of chunks the tool returns per search; previously defined as 7 but never used
tools = [GetDocSearchResults_Tool(indexes=indexes, k=topK, reranker_th=1, sas_token=os.environ['BLOB_SAS_TOKEN'])]

Get the prompt to use `AGENT_DOCSEARCH_PROMPT` - you can modify this in `prompts.py`! Check it out!

In [8]:
prompt = AGENT_DOCSEARCH_PROMPT

Define the LLM to use

In [9]:
COMPLETION_TOKENS = 1500
llm = AzureChatOpenAI(deployment_name=os.environ["GPT35_DEPLOYMENT_NAME"], temperature=0.5, max_tokens=COMPLETION_TOKENS, streaming=True).configurable_alternatives(
    ConfigurableField(id="model"),
    default_key="gpt35",
    gpt4=AzureChatOpenAI(deployment_name=os.environ["GPT4_DEPLOYMENT_NAME"], temperature=0.5, max_tokens=COMPLETION_TOKENS, streaming=True),
)

Construct the OpenAI Tools agent.
> OpenAI API has deprecated functions in favor of tools. The difference between the two is that the tools API allows the model to request that multiple functions be invoked at once, which can reduce response times in some architectures. It’s recommended to use the tools agent for OpenAI models.
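
To see why requesting several tool invocations at once can reduce response time, the following sketch compares sequential and concurrent execution of simulated searches. `fake_search` and its fixed delay are assumptions made for the demo; no real search engine or OpenAI call is involved.

```python
import asyncio
import time


async def fake_search(query: str, delay: float = 0.1) -> str:
    """Simulated tool call that waits for `delay` seconds (assumption)."""
    await asyncio.sleep(delay)
    return f"results for {query!r}"


async def run_sequential(queries):
    # One call at a time: total time is roughly the sum of all delays.
    return [await fake_search(q) for q in queries]


async def run_parallel(queries):
    # All calls at once: total time is roughly the longest single delay.
    return list(await asyncio.gather(*(fake_search(q) for q in queries)))


def timed(coro_fn, queries):
    """Run a coroutine function and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(queries))
    return results, time.perf_counter() - start
```

`timed` uses `asyncio.run`, which works in a script; inside a notebook, where an event loop is already running, you would `await run_parallel(queries)` directly instead.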

In [10]:
agent = create_openai_tools_agent(llm.with_config(configurable={"model": "gpt35"}), tools, prompt)

Create an agent executor by passing in the agent and tools

In [11]:
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=False)

Give it memory. Since AgentExecutor is also a Runnable class, we do the same as we did in Notebook 5.

In [12]:
def get_session_history(session_id: str, user_id: str) -> CosmosDBChatMessageHistory:
    cosmos = CosmosDBChatMessageHistory(
        cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
        cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
        cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
        connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
        session_id=session_id,
        user_id=user_id
        )

    # prepare the cosmosdb instance
    cosmos.prepare_cosmos()
    return cosmos

Because CosmosDB needs two fields (an id and a partition), and `RunnableWithMessageHistory` takes only one identifier for memory by default (`session_id`), we need to use the `history_factory_config` parameter and define multiple keys for the memory class.
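
The idea behind a multi-key history factory can be sketched without LangChain or CosmosDB. In this simplified model, which is an assumption about the mechanism rather than the library's actual code, each configured key is looked up in the `configurable` dictionary and passed to the factory as a keyword argument, so the key ids must match the factory's parameter names. `_STORE`, `get_history`, and `call_factory` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

# In-memory stand-in for the database: one message list per (user, session) pair.
_STORE: Dict[Tuple[str, str], List[str]] = {}


def get_history(session_id: str, user_id: str) -> List[str]:
    """Return the history for this user and session, creating it if needed."""
    return _STORE.setdefault((user_id, session_id), [])


def call_factory(factory: Callable[..., List[str]], config: dict, keys: List[str]) -> List[str]:
    """Pass each configured key to the factory as a keyword argument."""
    configurable = config["configurable"]
    kwargs = {key: configurable[key] for key in keys}
    return factory(**kwargs)
```

With this setup, two users who happen to share the same `session_id` still get separate histories, because the lookup uses both fields.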

In [13]:
userid_spec = ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user.",
            default="",
            is_shared=True,
        )
session_id = ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation.",
            default="",
            is_shared=True,
        )

In [14]:
agent_with_chat_history = RunnableWithMessageHistory(
    agent_executor,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
    history_factory_config=[userid_spec,session_id]
)

In [15]:
# configure the session id and user id
random_session_id = "session" + str(random.randint(1, 1000))
random_user_id = "user" + str(random.randint(1, 1000))

config = {"configurable": {"session_id": random_session_id, "user_id": random_user_id}}
config

{'configurable': {'session_id': 'session986', 'user_id': 'user57'}}

Run the Agent!

In [16]:
%%time
agent_with_chat_history.invoke({"question": "Hi, I'm Pablo Marin. What's yours"}, config=config)

CPU times: user 197 ms, sys: 9.14 ms, total: 206 ms
Wall time: 1.03 s


{'question': "Hi, I'm Pablo Marin. What's yours",
 'history': [],
 'output': "Hello Pablo Marin, I'm Jarvis, your AI assistant. How can I assist you today?"}

In [17]:
printmd(agent_with_chat_history.invoke(
    {"question": "What are markov chains and is there an application in medicine?"}, 
    config=config)["output"])

Markov chains are mathematical models that describe a sequence of events where the probability of each event depends only on the previous event. These models are based on the concept of memorylessness, meaning that the future behavior of the system depends only on its current state and is independent of its past states.

In the context of medicine, Markov chains have various applications. Here are a few examples:

1. Disease Progression Modeling: Markov chains can be used to model the progression of diseases over time. By dividing the disease progression into discrete states, such as healthy, mild, moderate, and severe, and assigning transition probabilities between these states, healthcare professionals can simulate the progression of the disease and assess the effectiveness of different interventions or treatment strategies.

2. Pharmacokinetics: Markov models are used to study the absorption, distribution, metabolism, and excretion of drugs in the body. These models help in understanding how drugs are processed by the body over time and how their concentrations change in different organs or tissues. This information is crucial for optimizing drug dosing regimens and predicting drug interactions.

3. Epidemiology and Public Health: Markov models are used to study the spread of infectious diseases within a population. By modeling the transitions between different disease states (e.g., susceptible, infected, recovered), researchers can estimate the impact of interventions such as vaccination or quarantine measures on disease transmission and control.

4. Health Economic Modeling: Markov models are widely used in health economics to evaluate the cost-effectiveness of healthcare interventions. These models simulate the natural history of a disease, treatment options, and associated costs over time. By comparing different treatment strategies, policymakers can make informed decisions about resource allocation and healthcare policy.

It's important to note that while Markov chains provide valuable insights into various medical scenarios, they are simplified representations of complex systems and make certain assumptions about the underlying processes. Therefore, their accuracy and applicability depend on the specific context and the quality of the data used to parameterize the models.

References:
1. [Modeling Infectious Disease Transmission using Markov Chains](https://doi.org/10.1111/ina.12056) - Source 1
2. [Fast Prediction of Transient Particle Transport in Enclosed Environments using Combined CFD and Markov Chain Method](https://www.ncbi.nlm.nih.gov/pubmed/23789964/) - Source 1
3. [Fast Prediction of Transient Particle Transport using Combined FFD and Markov Chain Model](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7090511/) - Source 2
4. [Rapid Mixing of the Face-Reversal Markov Chain on Eulerian Orientations of Solid Subgraphs of the Triangular Lattice](https://datasetsgptsmartsearch.blob.core.windows.net/arxivcs/pdf/0703/0703031v1.pdf) - Source 3

In [18]:
try:
    printmd(agent_with_chat_history.invoke(
        {"question": "Interesting, Tell me more about the use specifically in the spread of viruses"},
        config=config)["output"])
except Exception as e:
    print(e)

Certainly! Markov chains have been widely used to model the spread of viruses and infectious diseases within a population. These models help researchers and public health officials understand the dynamics of disease transmission and evaluate the effectiveness of various control measures. Here are some key aspects of using Markov chains in the context of virus spread:

1. State Transitions: In a Markov chain model for virus spread, different states represent the possible conditions of individuals within the population, such as susceptible, infected, recovered, or deceased. The transitions between these states are governed by probabilities that depend on factors like contact rates, transmission probabilities, and recovery rates. By modeling these transitions, researchers can simulate the spread of the virus over time.

2. Epidemiological Parameters: Markov chain models require the estimation of epidemiological parameters to accurately represent the virus's behavior. These parameters include the basic reproduction number (R0), which indicates the average number of secondary infections caused by an infected individual in a susceptible population. Other parameters include the duration of the infectious period, the probability of transmission per contact, and the rate of recovery or mortality. Estimating these parameters from available data is crucial for model accuracy.

3. Intervention Strategies: Markov chain models allow researchers to evaluate the impact of different intervention strategies on virus spread. By modifying the transition probabilities in the model, such as reducing contact rates or increasing the effectiveness of preventive measures like vaccination or social distancing, researchers can assess the potential effectiveness of these interventions in reducing the spread of the virus.

4. Sensitivity Analysis: Markov chain models can be used to perform sensitivity analyses to understand the impact of uncertainties in parameter values on model outcomes. By varying the parameter values within plausible ranges, researchers can assess the robustness of their findings and identify key factors that influence the spread of the virus.

It's important to note that Markov chain models provide a simplified representation of virus spread and make certain assumptions about the population dynamics and transmission processes. Real-world virus spread is influenced by various complex factors, including population demographics, behavior, and spatial dynamics. Therefore, these models are most effective when used in combination with other modeling approaches and empirical data to inform public health decision-making.

References:
1. [Modeling Infectious Disease Transmission using Markov Chains](https://doi.org/10.1111/ina.12056) - Source 1
2. [Fast Prediction of Transient Particle Transport in Enclosed Environments using Combined CFD and Markov Chain Method](https://www.ncbi.nlm.nih.gov/pubmed/23789964/) - Source 1
3. [Fast Prediction of Transient Particle Transport using Combined FFD and Markov Chain Model](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7090511/) - Source 2
4. [Rapid Mixing of the Face-Reversal Markov Chain on Eulerian Orientations of Solid Subgraphs of the Triangular Lattice](https://datasetsgptsmartsearch.blob.core.windows.net/arxivcs/pdf/0703/0703031v1.pdf) - Source 3

In [19]:
printmd(agent_with_chat_history.invoke({"question": "Thhank you!"}, config=config)["output"])

You're welcome! If you have any more questions, feel free to ask. I'm here to help!

#### Important: GPT3.5 has a limited context window. Once we add long prompts, long contexts, and thorough answers, or the agent makes multiple searches for multi-step questions, we run out of token space!

You can minimize this by:
- Using a shorter system prompt
- Using smaller chunks (less than the default of 5000 characters)
- Reducing topK to bring in fewer chunks

However, you are ultimately sacrificing quality to make everything work with GPT3.5 (a cheaper and faster model).
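
A back-of-the-envelope estimate shows how quickly the token budget fills up. The sketch below relies on several assumptions: roughly 4 characters per token (a common heuristic, not an exact count), a hypothetical 4000-character system prompt, and a hypothetical context window of 16,384 tokens. The 5000-character chunk size, 5 chunks per search, and 1500 completion tokens come from this notebook. Chat history is ignored, so real usage would be higher.

```python
import math


def estimate_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate based on a characters-per-token heuristic."""
    return math.ceil(num_chars / chars_per_token)


def estimate_request_tokens(
    system_prompt_chars: int,
    chunk_chars: int,
    top_k: int,
    num_searches: int,
    completion_tokens: int,
) -> int:
    """Estimate prompt + retrieved context + completion tokens for one answer."""
    prompt_tokens = estimate_tokens(system_prompt_chars)
    context_tokens = estimate_tokens(chunk_chars) * top_k * num_searches
    return prompt_tokens + context_tokens + completion_tokens


ASSUMED_CONTEXT_WINDOW = 16_384  # hypothetical limit, check your model deployment

one_search = estimate_request_tokens(4000, 5000, top_k=5, num_searches=1, completion_tokens=1500)
three_searches = estimate_request_tokens(4000, 5000, top_k=5, num_searches=3, completion_tokens=1500)

print(one_search, one_search <= ASSUMED_CONTEXT_WINDOW)
print(three_searches, three_searches <= ASSUMED_CONTEXT_WINDOW)
```

Under these assumptions a single search fits comfortably, while three searches in one turn exceed the assumed window. Shrinking the chunks or lowering topK reduces the `context_tokens` term directly.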

### Let's add more things we have learned so far: dynamic LLM selection of GPT4 and asynchronous streaming

In [20]:
agent = create_openai_tools_agent(llm.with_config(configurable={"model": "gpt4"}), tools, prompt) # We select now GPT-4
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=False)
agent_with_chat_history = RunnableWithMessageHistory(agent_executor,get_session_history,input_messages_key="question", 
                                                     history_messages_key="history", history_factory_config=[userid_spec,session_id])

In prior notebooks we used the `.stream()` function of the runnable to stream the tokens. However, if you need to stream individual tokens from the agent or surface steps occurring within tools, you need to use a combination of `Callbacks` and `.astream()`, or the new `astream_events` API (beta).

Let’s use the `astream_events` API here to stream the following events:

- Agent start, with inputs
- Tool start, with inputs
- Tool end, with outputs
- The agent's final answer, streamed token by token
- Agent end, with outputs
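
Before connecting to the real agent, it can help to see the shape of the event stream in isolation. The sketch below simulates it with a plain async generator. `fake_astream_events` and `consume` are hypothetical, the event payloads are invented for the demo, and the chunk is a plain string here, whereas the real API yields a message chunk object with a `.content` attribute.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Tuple


async def fake_astream_events() -> AsyncIterator[Dict]:
    """Yield simulated events shaped like {"event": ..., "name": ..., "data": ...}."""
    events = [
        {"event": "on_chain_start", "name": "AgentExecutor", "data": {}},
        {"event": "on_tool_start", "name": "docsearch", "data": {"input": {"query": "markov chains"}}},
        {"event": "on_tool_end", "name": "docsearch", "data": {"output": "3 documents"}},
        {"event": "on_chat_model_stream", "name": "llm", "data": {"chunk": "Markov "}},
        {"event": "on_chat_model_stream", "name": "llm", "data": {"chunk": "chains"}},
        {"event": "on_chain_end", "name": "AgentExecutor", "data": {}},
    ]
    for event in events:
        await asyncio.sleep(0)  # hand control back to the event loop, as a real stream would
        yield event


async def consume(stream: AsyncIterator[Dict]) -> Tuple[List[str], str]:
    """Collect a log of tool activity and the final answer assembled from tokens."""
    log: List[str] = []
    tokens: List[str] = []
    async for event in stream:
        kind = event["event"]
        if kind == "on_tool_start":
            log.append(f"start {event['name']}")
        elif kind == "on_tool_end":
            log.append(f"end {event['name']}")
        elif kind == "on_chat_model_stream":
            tokens.append(event["data"]["chunk"])
    return log, "".join(tokens)
```

In a script you would run `asyncio.run(consume(fake_astream_events()))`; in a notebook, `await consume(fake_astream_events())` works directly.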

In [21]:
QUESTION = "Tell me more about your last answer, search again multiple times and provide a deeper explanation"

In [22]:
async for event in agent_with_chat_history.astream_events(
    {"question": QUESTION}, config=config, version="v1",
):
    kind = event["event"]
    if kind == "on_chain_start":
        if (
            event["name"] == "AgentExecutor"
        ):  # "AgentExecutor" is the default run name, since no custom `run_name` was set
            print(
                f"Starting agent: {event['name']}"
            )
    elif kind == "on_chain_end":
        if (
            event["name"] == "AgentExecutor"
        ):  # "AgentExecutor" is the default run name, since no custom `run_name` was set
            print()
            print("--")
            print(
                f"Done agent: {event['name']}"
            )
    elif kind == "on_chat_model_stream":
        content = event["data"]["chunk"].content
        if content:
            # Empty content in the context of OpenAI means
            # that the model is asking for a tool to be invoked.
            # So we only print non-empty content
            print(content, end="")
    elif kind == "on_tool_start":
        print("--")
        print(
            f"Starting tool: {event['name']} with inputs: {event['data'].get('input')}"
        )
    elif kind == "on_tool_end":
        print(f"Done tool: {event['name']}")
        # print(f"Tool output was: {event['data'].get('output')}")
        print("--")

Starting agent: AgentExecutor
--
Starting tool: docsearch with inputs: {'query': 'Markov chains in virus spread'}
--
Starting tool: docsearch with inputs: {'query': 'Markov models epidemiology'}
--
Starting tool: docsearch with inputs: {'query': 'Markov chains infectious diseases'}
Done tool: docsearch
--
Done tool: docsearch
--
Done tool: docsearch
--
Markov chain models provide a framework for understanding the spread of viruses by representing the transmission of infection as a series of probabilistic events. Here's a deeper explanation of how these models can be applied to the spread of viruses, with a focus on the specific studies and methods mentioned in the documents:

1. **Spatial Markov Chain Model for Virus Spread**:
   - A spatial Markov chain model represents individuals as nodes in a graph, with connections (vertices) between nodes representing relationships between people. The likelihood of virus transmission from one person to another depends on the intensity of their co

#### Note: Try running this last question with GPT3.5 and see how you run out of token space in the LLM

# Summary

We just built our first RAG bot!

- We learned that **Agents + Tools are the best way to go about building Bots**.
- We converted the Azure Search retriever into a Tool using `GetDocSearchResults_Tool` in `utils.py`.
- We learned about the `astream_events` API (beta), one way to stream the answer from agents.
- We learned that comprehensive, quality answers can exceed the token space of GPT3.5, at which point GPT4 becomes necessary.


# NEXT

Now that we have a bot with one skill (Document Search), let's build more skills! In the next notebook, we are going to build an agent that can understand tabular data in CSV files and execute Python commands.