### Retrieval-Augmented Generation (RAG) is a technique that improves Large Language Model (LLM) responses by retrieving relevant information from external data stores and supplying it to the model as context.

### A RAG architecture involves three key steps: understanding the query, retrieving relevant information, and generating a response.

In [1]:
from helper import load_mistral_api_key
api_key, dlai_endpoint = load_mistral_api_key(ret_key=True)

### Get Data

#### Parse the article with BeautifulSoup

In [3]:
import requests
from bs4 import BeautifulSoup
import re

response = requests.get(
    "https://www.deeplearning.ai/the-batch/a-roadmap-explores-how-ai-can-detect-and-mitigate-greenhouse-gases/"
)
html_doc = response.text
soup = BeautifulSoup(html_doc, "html.parser")
# Find the first <div> whose CSS class starts with "prose--styled"
tag = soup.find("div", class_=re.compile("^prose--styled"))
text = tag.text
print(text)

How can AI help to fight climate change? A new report evaluates progress so far and explores options for the future.What’s new: The Innovation for Cool Earth Forum, a conference of climate researchers hosted by Japan, published a roadmap for the use of data science, computer vision, and AI-driven simulation to reduce greenhouse gas emissions. The roadmap evaluates existing approaches and suggests ways to scale them up.How it works: The roadmap identifies 6 “high-potential opportunities”: activities in which AI systems can make a significant difference based on the size of the opportunity, real-world results, and validated research. The authors emphasize the need for data, technical and scientific talent, computing power, funding, and leadership to take advantage of these opportunities.Monitoring emissions. AI systems analyze data from satellites, drones, and ground sensors to measure greenhouse gas emissions. The European Union uses them to measure methane emissions, environmental orga

#### Chunking

In [4]:
chunk_size = 512
chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

In [6]:
len(chunks)

8
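
Fixed-size slicing can cut a word or sentence in half at a chunk boundary. One common mitigation is to let consecutive chunks overlap. The helper below is a minimal sketch of that idea; `chunk_text` and its `overlap` parameter are illustrative names that are not part of the original notebook, and the sample string is stand-in data.

```python
def chunk_text(text, chunk_size=512, overlap=64):
    """Split text into character chunks of chunk_size that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij" * 3  # 30 characters of stand-in text
overlapping_chunks = chunk_text(sample, chunk_size=12, overlap=4)
print(overlapping_chunks)
```

With `overlap=0` the helper reduces to the list comprehension used above. A larger overlap produces more chunks, and therefore more embedding calls.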

#### Get embeddings of the chunks

In [7]:
from mistralai.client import MistralClient


def get_text_embedding(txt):
    client = MistralClient(api_key=api_key)
    embeddings_batch_response = client.embeddings(model="mistral-embed", input=txt)
    return embeddings_batch_response.data[0].embedding

In [8]:
import numpy as np

text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])

In [9]:
text_embeddings

array([[-0.03274536,  0.04751587,  0.04489136, ..., -0.03289795,
         0.02278137, -0.01459503],
       [-0.03631592,  0.05548096,  0.03271484, ..., -0.03125   ,
         0.01594543, -0.01722717],
       [-0.04876709,  0.04779053,  0.05670166, ...,  0.0046463 ,
         0.0184021 , -0.01251984],
       ...,
       [-0.02597046,  0.04049683,  0.03543091, ..., -0.01013184,
        -0.00962067, -0.00917053],
       [-0.03025818,  0.0541687 ,  0.06280518, ..., -0.00900269,
        -0.00782776, -0.00432587],
       [-0.02456665,  0.05093384,  0.04879761, ..., -0.0064888 ,
         0.02600098, -0.01386261]])

In [10]:
len(text_embeddings[0])

1024

#### Store Embeddings in a vector database


In [11]:
import faiss

d = text_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(text_embeddings)
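
`IndexFlatL2` is an exact, brute-force index: it compares the query against every stored vector and ranks them by squared Euclidean (L2) distance. The sketch below reproduces that idea in plain Python so the arithmetic is visible. `l2_search` and the toy vectors are hypothetical and only for illustration; they are not how FAISS is implemented internally.

```python
def l2_search(vectors, query, k=2):
    """Return (squared distances, indices) of the k stored vectors nearest to query."""
    scored = []
    for idx, vec in enumerate(vectors):
        # Squared L2 distance: sum of squared coordinate differences
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # smallest distance first
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [2.0, 2.0]]
distances, indices = l2_search(toy_vectors, [0.9, 0.1], k=2)
print(indices)  # positions of the two closest vectors, nearest first
```

The returned indices play the same role as `I` in `index.search`: they are positions in the list of stored vectors, which is why they can be used directly to look up the original chunks.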

#### Embed the user query

In [12]:
question = "What are the ways that AI can reduce emissions in Agriculture?"
question_embeddings = np.array([get_text_embedding(question)])

In [13]:
question_embeddings

array([[-0.00073624,  0.04116821,  0.04318237, ..., -0.02453613,
         0.01029968,  0.00930023]])

#### Search for chunks that are similar to the query

In [14]:
D, I = index.search(question_embeddings, k=2)
print(I)

[[4 5]]


In [15]:
retrieved_chunk = [chunks[i] for i in I.tolist()[0]]
print(retrieved_chunk)

['data to help factories use more recycled materials, cut waste, minimize energy use, and reduce downtime. Similarly, they can optimize supply chains to reduce emissions contributed by logistics.\xa0Agriculture.\xa0Farmers use AI-equipped sensors to simulate different crop rotations and weather events to forecast crop yield or loss. Armed with this data, food producers can cut waste and reduce carbon footprints. The authors cite lack of food-related datasets and investment in adapting farming practices as primary b', 'arriers to taking full advantage of AI in the food industry.Transportation.\xa0AI systems can reduce greenhouse-gas emissions by improving traffic flow, ameliorating congestion, and optimizing public transportation. Moreover, reinforcement learning can reduce the impact of electric vehicles on the power grid by optimizing their charging. More data, uniform standards, and AI talent are needed to realize this potential.Materials.\xa0Materials scientists use AI models to stu

In [16]:
prompt = f"""
Context information is below.
---------------------
{retrieved_chunk}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""

In [17]:
from mistralai.models.chat_completion import ChatMessage


def mistral(user_message, model="mistral-small-latest", is_json=False):
    client = MistralClient(api_key=api_key)
    messages = [ChatMessage(role="user", content=user_message)]

    if is_json:
        chat_response = client.chat(
            model=model, messages=messages, response_format={"type": "json_object"}
        )
    else:
        chat_response = client.chat(model=model, messages=messages)

    return chat_response.choices[0].message.content

In [18]:
response = mistral(prompt)
print(response)

In Agriculture, AI-equipped sensors can be used to simulate different crop rotations and weather events to forecast crop yield or loss. With this data, food producers can cut waste and reduce their carbon footprints. However, the context information mentions that the full advantage of AI in the food industry is hindered by a lack of food-related datasets and investment in adapting farming practices.


### RAG + Function calling

In [19]:
def qa_with_context(text, question, chunk_size=512):
    # split document into chunks
    chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]
    # load into a vector database
    text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])
    d = text_embeddings.shape[1]
    index = faiss.IndexFlatL2(d)
    index.add(text_embeddings)
    # create embeddings for a question
    question_embeddings = np.array([get_text_embedding(question)])
    # retrieve similar chunks from the vector database
    D, I = index.search(question_embeddings, k=2)
    retrieved_chunk = [chunks[i] for i in I.tolist()[0]]
    # generate a response based on the retrieved text chunks

    prompt = f"""
    Context information is below.
    ---------------------
    {retrieved_chunk}
    ---------------------
    Given the context information and not prior knowledge, answer the query.
    Query: {question}
    Answer:
    """
    response = mistral(prompt)
    return response

In [20]:
import functools

names_to_functions = {"qa_with_context": functools.partial(qa_with_context, text=text)}

In [21]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "qa_with_context",
            "description": "Answer user question by retrieving relevant context",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "user question",
                    }
                },
                "required": ["question"],
            },
        },
    },
]

In [23]:
question = """
What are the ways AI can mitigate climate change in transportation?
"""

client = MistralClient(api_key=api_key)

response = client.chat(
    model="mistral-large-latest",
    messages=[ChatMessage(role="user", content=question)],
    tools=tools,
    tool_choice="any",
)

response

ChatCompletionResponse(id='859b1032c705467f8368ec5cadf41ef9', object='chat.completion', created=1718651395, model='mistral-large-latest', choices=[ChatCompletionResponseChoice(index=0, message=ChatMessage(role='assistant', content='', name=None, tool_calls=[ToolCall(id='HS3rphHFl', type=<ToolType.function: 'function'>, function=FunctionCall(name='qa_with_context', arguments='{"question": "What are the ways AI can mitigate climate change in transportation?"}'))]), finish_reason=<FinishReason.tool_calls: 'tool_calls'>)], usage=UsageInfo(prompt_tokens=92, total_tokens=126, completion_tokens=34))

In [25]:
tool_function = response.choices[0].message.tool_calls[0].function
tool_function

FunctionCall(name='qa_with_context', arguments='{"question": "What are the ways AI can mitigate climate change in transportation?"}')

In [27]:
import json
args = json.loads(tool_function.arguments)
args

{'question': 'What are the ways AI can mitigate climate change in transportation?'}

In [30]:
function_response = names_to_functions[tool_function.name](**args)
function_response

'The context information does not provide specific details on how AI can mitigate climate change in transportation. However, it does mention that AI has a significant role to play in reducing greenhouse gas emissions across various sectors, including transportation. The report from the Innovation for Cool Earth Forum suggests that there are 6 "high-potential opportunities" for AI, but it does not specify which of these opportunities are related to transportation. Therefore, I cannot provide a specific answer based on the given context information.'
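
The lookup-and-call step above trusts that the model returned a known function name and valid JSON arguments. A slightly more defensive version might look like the sketch below. `dispatch_tool_call`, `demo_registry`, and the `SimpleNamespace` stand-in for the SDK's `FunctionCall` object are assumptions made for illustration, so the example runs without an API key.

```python
import json
from types import SimpleNamespace


def dispatch_tool_call(function_call, registry):
    """Look up the requested tool, decode its JSON arguments, and call it."""
    if function_call.name not in registry:
        raise KeyError(f"Unknown tool: {function_call.name!r}")
    try:
        kwargs = json.loads(function_call.arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Tool arguments are not valid JSON: {exc}") from exc
    if not isinstance(kwargs, dict):
        raise ValueError("Tool arguments must decode to a JSON object")
    return registry[function_call.name](**kwargs)


# Stand-in registry and tool call (hypothetical, for demonstration only)
demo_registry = {"echo_question": lambda question: f"received: {question}"}
demo_call = SimpleNamespace(
    name="echo_question", arguments='{"question": "What is RAG?"}'
)
demo_result = dispatch_tool_call(demo_call, demo_registry)
print(demo_result)
```

In the notebook, the same helper could be called with `tool_function` and `names_to_functions` in place of the stand-ins.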

In [34]:
question = """
Write a python function to sort the letters in a string
"""
client = MistralClient(api_key=api_key)
response = client.chat(
    model="mistral-large-latest",
    messages=[ChatMessage(role="user", content=question)],
    tools=tools,
    tool_choice="any",
)


In [35]:
response

ChatCompletionResponse(id='f6598598b6114719a766ba7b3759f876', object='chat.completion', created=1718651753, model='mistral-large-latest', choices=[ChatCompletionResponseChoice(index=0, message=ChatMessage(role='assistant', content='', name=None, tool_calls=[ToolCall(id='byxUDHAfz', type=<ToolType.function: 'function'>, function=FunctionCall(name='qa_with_context', arguments='{"question": "Write a python function to sort the letters in a string"}'))]), finish_reason=<FinishReason.tool_calls: 'tool_calls'>)], usage=UsageInfo(prompt_tokens=90, total_tokens=123, completion_tokens=33))

In [39]:
tool_function = response.choices[0].message.tool_calls[0].function
tool_function

FunctionCall(name='qa_with_context', arguments='{"question": "Write a python function to sort the letters in a string"}')

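The model still calls the defined function, even though this question has nothing to do with the article. Two things contribute to this: the tool description is too generic, and `tool_choice="any"` forces the model to call a tool. We therefore make the tool description more specific and, in the next request, switch to `tool_choice="auto"` so the model can decide whether a tool is needed.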

In [46]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "qa_with_context",
            "description": "Answer user question about AI by retrieving relevant context about the article",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "user question",
                    }
                },
                "required": ["question"],
            },
        },
    },
]

In [48]:
question = """
Write a python function to sort the letters in a string
"""
client = MistralClient(api_key=api_key)
response = client.chat(
    model="mistral-large-latest",
    messages=[ChatMessage(role="user", content=question)],
    tools=tools,
    tool_choice="auto",
)
response

ChatCompletionResponse(id='f50930fa1f474f80956ed78b78d203cf', object='chat.completion', created=1718652111, model='mistral-large-latest', choices=[ChatCompletionResponseChoice(index=0, message=ChatMessage(role='assistant', content="To sort the letters in a string using Python, you can use the built-in `sorted()` function. Here's a simple function that takes a string as input and returns a new string with its letters sorted in alphabetical order:\n\n```python\ndef sort_letters(s):\n    return ''.join(sorted(s))\n```\n\nUsage example:\n\n```python\n>>> sort_letters('hello')\n'ehllo'\n```\n\nKeep in mind that this function sorts all characters in the input string, not just letters. If you want to sort only the letters and maintain the original positions of non-letter characters, you can use the following function:\n\n```python\nimport re\n\ndef sort_letters_only(s):\n    non_letters = re.findall(r'\\W+|\\d+', s)\n    letters_only = re.sub(r'\\W+|\\d+', '', s)\n    sorted_letters = ''.join

In [51]:
# The reply now carries text content, so inspect the message itself
message = response.choices[0].message
message

ChatMessage(role='assistant', content="To sort the letters in a string using Python, you can use the built-in `sorted()` function. Here's a simple function that takes a string as input and returns a new string with its letters sorted in alphabetical order:\n\n```python\ndef sort_letters(s):\n    return ''.join(sorted(s))\n```\n\nUsage example:\n\n```python\n>>> sort_letters('hello')\n'ehllo'\n```\n\nKeep in mind that this function sorts all characters in the input string, not just letters. If you want to sort only the letters and maintain the original positions of non-letter characters, you can use the following function:\n\n```python\nimport re\n\ndef sort_letters_only(s):\n    non_letters = re.findall(r'\\W+|\\d+', s)\n    letters_only = re.sub(r'\\W+|\\d+', '', s)\n    sorted_letters = ''.join(sorted(letters_only))\n\n    result = ''\n    i = 0\n    for char in s:\n        if not char.isalpha():\n            result += non_letters.pop(0)\n        else:\n            result += sorted_l
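
With `tool_choice="auto"`, a reply may either request a tool call or contain a direct answer, so calling code has to handle both cases. The sketch below shows one way to branch on that. `route_message` and the `SimpleNamespace` objects are hypothetical stand-ins that mimic the `content` and `tool_calls` attributes seen in the responses above; they are not part of the Mistral SDK.

```python
from types import SimpleNamespace


def route_message(message, run_tool):
    """Return final text: run the first requested tool, or fall back to the direct answer."""
    tool_calls = getattr(message, "tool_calls", None)
    if tool_calls:
        # The model asked for a tool; hand its function call to the caller-supplied runner
        return run_tool(tool_calls[0].function)
    return message.content


# Stand-in messages mimicking the two response shapes shown above
direct = SimpleNamespace(content="Use sorted().", tool_calls=None)
via_tool = SimpleNamespace(
    content="",
    tool_calls=[
        SimpleNamespace(
            function=SimpleNamespace(
                name="qa_with_context", arguments='{"question": "q"}'
            )
        )
    ],
)

direct_reply = route_message(direct, run_tool=lambda fn: "unused")
tool_reply = route_message(via_tool, run_tool=lambda fn: f"would call {fn.name}")
print(direct_reply)
print(tool_reply)
```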