---
---

<h1>Notebook: [ Week #04: Building your own RAG Bot ]</h1>

- Your objective in this notebook is to create a RAG Bot that allows users to interact with some notes from the AI Champions Bootcamp.
- A convenient way to work on this notebook is to open the earlier Jupyter Notebook in `Topic 4` (yes, the one with pre-populated code cells).
- You can refer to it to see how a simple RAG Bot (or, more precisely, a RAG pipeline) is built.
- You may extend the functionalities of the bot as you wish.
- At a minimum, you should have a simple RAG Bot like the one in the earlier `Topic 4` Jupyter Notebook.


---
---

# Setup

In [1]:
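import os
from openai import OpenAI

# Read the API key from a local file kept outside the notebook
with open('../openai_key', 'r') as file:
    key = file.read().strip()

# Set up the OpenAI API key by setting the OPENAI_API_KEY environment variable
os.environ["OPENAI_API_KEY"] = key

# Create the client that the helper functions below rely on
client = OpenAI()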

---

## Helper Functions

---

### Function for Generating Embedding

In [2]:
def get_embedding(input, model='text-embedding-3-small'):
    response = client.embeddings.create(
        input=input,
        model=model
    )
    return [x.embedding for x in response.data]

### Function for Text Generation

In [3]:
# This is the "Updated" helper function for calling LLM
def get_completion(prompt, model="gpt-4o-mini", temperature=0, top_p=1.0, max_tokens=256, n=1, json_output=False):
    # Ask for a JSON object only when the caller requests it
    if json_output:
        output_json_structure = {"type": "json_object"}
    else:
        output_json_structure = None

    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(  # originally was openai.chat.completions
        model=model,
        messages=messages,
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        n=n,  # pass the argument through; only the first choice is returned below
        response_format=output_json_structure,
    )
    return response.choices[0].message.content

In [4]:
# This is a "modified" helper function that we will discuss in this session
# Note that this function directly takes in "messages" as the parameter.
def get_completion_by_messages(messages, model="gpt-4o-mini", temperature=0, top_p=1.0, max_tokens=1024, n=1):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        n=n  # pass the argument through; only the first choice is returned below
    )
    return response.choices[0].message.content

### Functions for Token Counting

In [5]:
# These functions are for calculating the tokens.
# ⚠️ These are simplified implementations that are good enough for a rough estimation.

import tiktoken

def count_tokens(text):
    encoding = tiktoken.encoding_for_model('gpt-4o-mini')
    return len(encoding.encode(text))

def count_tokens_from_message(messages):
    encoding = tiktoken.encoding_for_model('gpt-4o-mini')
    # Treat a missing or None "content" as an empty string so join() does not fail
    value = ' '.join([x.get('content') or '' for x in messages])
    return len(encoding.encode(value))
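
One use for a token counter is to cap how much retrieved text goes into a prompt. The sketch below is only an illustration: `select_within_budget` and `approx_count` are hypothetical names that do not appear in the course material, and a whitespace word count stands in for the tokenizer so the example runs offline. In the notebook, `count_tokens` could be passed as `count_fn` instead.

```python
def select_within_budget(chunks, max_tokens, count_fn):
    """Keep chunks in order until adding the next one would exceed max_tokens."""
    selected = []
    used = 0
    for chunk in chunks:
        cost = count_fn(chunk)
        if used + cost > max_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected, used


def approx_count(text):
    # Assumption: a whitespace word count is a crude stand-in for a real tokenizer.
    return len(text.split())


chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
selected, used = select_within_budget(chunks, max_tokens=5, count_fn=approx_count)
print(selected, used)
```

The helper stops at the first chunk that does not fit rather than searching for a smaller one further down, which keeps the highest-ranked chunks together.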


---
---

# Create a "Chat with your Document" Bot

**\[ Overview of Steps in RAG \]**

- 1. **Document Loading**
	- In this initial step, relevant documents are ingested and prepared for further processing. This process typically occurs offline.
- 2. **Splitting & Chunking**
	- The text from the documents is split into smaller chunks or segments.
	- These chunks serve as the building blocks for subsequent stages.
- 3. **Storage**
	- The embeddings (vector representations) of these chunks are created and stored in a vector store.
	- These embeddings capture the semantic meaning of the text.
- 4. **Retrieval**
	- When an online query arrives, the system retrieves relevant chunks from the vector store based on the query.
	- This retrieval step ensures that the system identifies the most pertinent information.
- 5. **Output**
	- Finally, the retrieved chunks are used to generate a coherent response.
	- This output can be in the form of natural language text, summaries, or other relevant content.

![](https://abc-notes.data.tech.gov.sg/resources/img/topic-4-rag-overview.png)
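
The five steps above can be sketched end to end in plain Python. This is a toy illustration, not the pipeline built later in this notebook: `toy_embed` is a hypothetical word-count stand-in for a real embedding model, the "vector store" is a list, and the sample document text is made up.

```python
import math
from collections import Counter


def toy_embed(text):
    # Assumption: word counts stand in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# 1. Document loading (hard-coded sample text)
document = (
    "Temperature controls randomness. "
    "Top-p limits sampling to likely tokens. "
    "Hallucination means a confident but false output."
)

# 2. Splitting & chunking: one chunk per sentence
chunks = [s.strip() for s in document.split(".") if s.strip()]

# 3. Storage: keep each chunk next to its vector
store = [(chunk, toy_embed(chunk)) for chunk in chunks]


# 4. Retrieval: rank the stored chunks by similarity to the query
def retrieve(query, k=1):
    query_vector = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# 5. Output: place the retrieved chunks in a prompt for the LLM
question = "What does temperature control?"
context = "\n".join(retrieve(question, k=1))
prompt = f"Context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

In a real pipeline, steps 3 and 4 use an embedding model and a vector store, and the prompt from step 5 is sent to an LLM.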

---
---

## Document Loading

Here are the "notes" that you must include in your RAG pipeline as the `Documents`:
- [Key Parameters for LLMs](https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/2.-key-parameters-for-llms.html)
- [LLMs and Hallucinations](https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/3.-llms-and-hallucinations.html)
- [Prompting Techniques for Builders](https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/4.-prompting-techniques-for-builders.html)

You have three options.
1) 💪🏼 Take up the challenge to find a way to get the content directly from the webpages above.
2) 🥴 Go the easy route and download the notes, nicely prepared in `.txt` format. Download the zipped file [here](https://abc-notes.data.tech.gov.sg/resources/data/notes.zip)
3) 😎 “Only children choose; adults take all.” Experiment with both data sources and see which one helps the Bot provide more accurate information for user queries.

---

> 💡 **Feel free to add as many code cells as you need.**

---

In [6]:
import requests
from bs4 import BeautifulSoup

def soup_scrape(url):
    # Send the HTTP request; the timeout keeps a stalled server from hanging the cell
    response = requests.get(url, timeout=30)

    # On failure, report it and return an empty string so later splitting still works
    if response.status_code != 200:
        print(f"Failed to retrieve page. Status code: {response.status_code}")
        return ""

    # Parse the page content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract all visible text, one text node per line
    return soup.get_text(separator='\n', strip=True)

In [7]:
# < Your Code Here >
texts = {
    "key_params": soup_scrape("https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/2.-key-parameters-for-llms.html"),
    "halu": soup_scrape("https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/3.-llms-and-hallucinations.html"),
    "prompt_techs": soup_scrape("https://abc-notes.data.tech.gov.sg/notes/topic-2-deeper-dive-into-llms/4.-prompting-techniques-for-builders.html")
}
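
Text pulled from a whole page usually includes navigation menus and sidebars, not just the article body. The sketch below shows one way to skip such regions using only the standard library. It runs on a made-up HTML snippet, and the set of tags to skip is an assumption, since the actual notes pages may mark their navigation differently. `VisibleTextParser` and `extract_main_text` are hypothetical names.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    # Assumption: these tags hold page chrome rather than article content.
    SKIP_TAGS = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside any skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_main_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


sample_html = """
<html><body>
<nav><a href="/a">LLMs and Hallucination</a></nav>
<script>var x = 1;</script>
<h1>Key Parameters for LLMs</h1>
<p>Temperature controls randomness.</p>
</body></html>
"""
print(extract_main_text(sample_html))
```

The same idea applies with BeautifulSoup: remove the unwanted elements from the parsed tree before extracting text.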

## Splitting & Chunking

In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],
    chunk_size=50,
    chunk_overlap=10
)

def text_split(text):
    split_raw = text_splitter.split_text(text)
    split_clean = [chunk.replace("\n", " ") for chunk in split_raw]
    return split_clean

In [9]:
# < Your Code Here >
from langchain.schema import Document
docs = []
for article, text in texts.items():
    chunks = text_split(text)
    docs.extend([Document(page_content=chunk, metadata={"source": article}) for chunk in chunks])

In [10]:
docs[:5]

[Document(metadata={'source': 'key_params'}, page_content='2. Key Parameters for LLMs icon : LiNotebookTabs'),
 Document(metadata={'source': 'key_params'}, page_content='Copy Title: Key Parameters for LLMs Tokens'),
 Document(metadata={'source': 'key_params'}, page_content='Tokens Key Parameters for LLM'),
 Document(metadata={'source': 'key_params'}, page_content='LLMs and Hallucination'),
 Document(metadata={'source': 'key_params'}, page_content='Prompting Techniques for Builders')]
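
With `chunk_size=50`, each chunk holds at most 50 characters (the splitter measures length in characters by default), which is why the chunks above are little more than titles. The sketch below shows how chunk size and overlap affect the number of chunks. It uses a simplified fixed-window splitter, not the separator-aware algorithm of `RecursiveCharacterTextSplitter`, and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


text = "0123456789" * 100  # 1,000 characters of sample text
small = chunk_text(text, chunk_size=50, chunk_overlap=10)
large = chunk_text(text, chunk_size=500, chunk_overlap=50)
print(len(small), len(large))
```

Larger chunks give the retriever more context per hit at the cost of less precise matching, so it is worth trying a few sizes on the notes.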

## Storage: Embedding & Vectorstores

In [11]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

In [12]:
# < Your Code Here >
embeddings_model = OpenAIEmbeddings(model='text-embedding-3-small')

vector_store = Chroma.from_documents(
    collection_name="llm_playbook",
    documents=docs,
    embedding=embeddings_model,
    persist_directory="./chroma_langchain_db",  # Where to save data locally, remove if not necessary
)

In [13]:
# Show the number of documents in the vector store
vector_store._collection.count()

3175

In [14]:
# Peek at one of the documents in the vector store
vector_store._collection.peek(limit=1)

{'ids': ['aac2eb8e-ded3-47b5-a47d-50e4665e4b65'],
 'embeddings': array([[ 0.01395904, -0.02815107, -0.01866687, ..., -0.02932974,
          0.01704963, -0.0009868 ]], shape=(1, 1536)),
 'documents': ['2. Key Parameters for LLMs icon : LiNotebookTabs'],
 'uris': None,
 'included': ['metadatas', 'documents', 'embeddings'],
 'data': None,
 'metadatas': [{'source': 'key_params'}]}

## Retrieval

In [15]:
# < Your Code Here >
vector_store.similarity_search('Zero Shot', k=3)

[Document(id='be751e05-2a9b-4860-9979-23e4ecd0deba', metadata={'source': 'prompt_techs'}, page_content='None'),
 Document(id='2fecc57f-3519-402b-a845-1909b363f4b6', metadata={'source': 'prompt_techs'}, page_content='None'),
 Document(id='116d30de-aafa-4c27-b3bc-14013dcfabe6', metadata={'source': 'prompt_techs'}, page_content='None')]

In [16]:
vector_store.similarity_search_with_relevance_scores('Zero Shot', k=3)

[(Document(id='be751e05-2a9b-4860-9979-23e4ecd0deba', metadata={'source': 'prompt_techs'}, page_content='None'),
  0.10985333324070279),
 (Document(id='2fecc57f-3519-402b-a845-1909b363f4b6', metadata={'source': 'prompt_techs'}, page_content='None'),
  0.10985333324070279),
 (Document(id='116d30de-aafa-4c27-b3bc-14013dcfabe6', metadata={'source': 'prompt_techs'}, page_content='None'),
  0.10985333324070279)]
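
All three hits above are the identical string `'None'`, so the store appears to contain duplicated, low-information chunks. One possible remedy is to filter the chunks before they are embedded. The sketch below is an assumption-laden illustration: `clean_chunks`, the 20-character minimum, and the sample strings are all made up for the example.

```python
def clean_chunks(chunks, min_chars=20, banned=("None",)):
    """Drop very short, banned, or repeated chunks while keeping the original order."""
    seen = set()
    kept = []
    for chunk in chunks:
        text = chunk.strip()
        if len(text) < min_chars or text in banned or text in seen:
            continue
        seen.add(text)
        kept.append(text)
    return kept


raw = [
    "None",
    "None",
    "LLMs and Hallucination",
    "LLMs and Hallucination",
    "Zero-shot prompting gives the model no examples.",
    "  ",
]
print(clean_chunks(raw))
```

Applying a filter like this to the output of `text_split` before building the `Document` list would shrink the collection and leave more room for informative chunks in the top-k results.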

## Question & Answer

In [19]:
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate

In [20]:
# < Your Code Here >
# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Run chain
qa_chain = RetrievalQA.from_chain_type(
    ChatOpenAI(model='gpt-4o-mini'),
    retriever=vector_store.as_retriever(),
    return_source_documents=True, # Make inspection of document possible
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [21]:
qa_chain.invoke("Why LLM hallucinate?")

{'query': 'Why LLM hallucinate?',
 'result': 'LLMs may hallucinate due to their reliance on statistical patterns and associations in training data, which can lead to generating plausible but false information. Additionally, they sometimes lack true understanding of context, resulting in incorrect or nonsensical outputs. Thanks for asking!',
 'source_documents': [Document(id='d8648a7c-5c48-495f-a682-8636e42f5e3c', metadata={'source': 'key_params'}, page_content='LLMs and Hallucination'),
  Document(id='d85cb3cc-b79c-43a9-bcc8-2590b68e9d1c', metadata={'source': 'prompt_techs'}, page_content='LLMs and Hallucination'),
  Document(id='ab98b484-d1e8-4043-aa7e-4501f281ed49', metadata={'source': 'halu'}, page_content='LLMs and Hallucination'),
  Document(id='0af4f85d-2af4-4927-bc79-714269cafd4e', metadata={'source': 'key_params'}, page_content='LLMs and Hallucination')]}
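
The `source_documents` above are all the same page title, so the context passed to the model carried very little information, and the answer likely draws mostly on the model's own knowledge. The sketch below shows roughly how retrieved chunks are placed into the prompt template. It assumes the chunks are joined with blank lines, uses a shortened copy of the template, and `build_prompt` is a hypothetical helper rather than a LangChain function.

```python
template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def build_prompt(chunks, question, separator="\n\n"):
    # Assumption: chunks are concatenated with a blank line between them.
    context = separator.join(chunks)
    return template.format(context=context, question=question)


retrieved = ["LLMs and Hallucination", "LLMs and Hallucination"]
prompt = build_prompt(retrieved, "Why LLM hallucinate?")
print(prompt)
print("context characters:", len("\n\n".join(retrieved)))
```

Printing the assembled prompt like this is a quick way to check whether the retrieved context could actually support the answer.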