---
---
# Notebook: [ Week #05: Build Your RAG Pipeline with Enhanced Retrieval ]

The challenges in this notebook are to implement at least:

- 1 x Technique from **Pre-Retrieval Processes**
- 1 x Technique from **Retrieval Processes**
- 1 x Technique from **Post-Retrieval Processes**

---

Note:
- You may want to challenge yourself by implementing techniques that are covered in our **Course Notes** but **NOT in the walkthrough of the notebook**.
- You can create as many code cells as needed.

## Setup
---

In [None]:
# Feel free to install dependencies that you need
!pip install openai --quiet
!pip install langchain --quiet
!pip install langchain_openai --quiet
!pip install langchain-community --quiet
!pip install unstructured --quiet
!pip install langchain-experimental --quiet
!pip install langchain_cohere --quiet
!pip install pysbd --quiet
!pip install ragas --quiet
!pip install pypdf --quiet
!pip install lolviz --quiet
!pip install chromadb --quiet
!pip install pdfminer.six --quiet
!pip install tiktoken --quiet
!pip install datasets --quiet
!pip install langchain-chroma --quiet


In [None]:
from openai import OpenAI
from getpass import getpass

API_KEY = getpass("Enter your OpenAI API Key")
client = OpenAI(api_key=API_KEY)

---

## Helper Functions

---

### Function for Generating Embedding

In [None]:
def get_embedding(input, model='text-embedding-3-small'):
    response = client.embeddings.create(
        input=input,
        model=model
    )
    return [x.embedding for x in response.data]

### Function for Text Generation

In [None]:
# This is the "Updated" helper function for calling LLM
def get_completion(prompt, model="gpt-4o-mini", temperature=0, top_p=1.0, max_tokens=256, n=1, json_output=False):
    # Request a JSON object only when the caller asks for it
    if json_output:
        output_json_structure = {"type": "json_object"}
    else:
        output_json_structure = None

    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(  # originally was openai.chat.completions
        model=model,
        messages=messages,
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        n=n,
        response_format=output_json_structure,
    )
    # Only the first choice is returned, even when n > 1
    return response.choices[0].message.content

In [None]:
# This is a "modified" helper function that we will discuss in this session
# Note that this function takes "messages" directly as its parameter.
def get_completion_by_messages(messages, model="gpt-4o-mini", temperature=0, top_p=1.0, max_tokens=1024, n=1):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        n=n
    )
    return response.choices[0].message.content

### Functions for Token Counting

In [None]:
# These functions count the tokens in a text string or in a list of "messages"
# ⚠️ This is a simplified implementation that is good enough for a rough estimate

import tiktoken

def count_tokens(text):
    encoding = tiktoken.encoding_for_model('gpt-4o-mini')
    return len(encoding.encode(text))

def count_tokens_from_message_rough(messages):
    encoding = tiktoken.encoding_for_model('gpt-4o-mini')
    value = ' '.join([x.get('content') or '' for x in messages])
    return len(encoding.encode(value))


## Setting up Credentials & Common Components for LangChain

In [None]:
import os
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
os.environ["OPENAI_API_KEY"] = API_KEY

# embedding model that we will use for the session
embeddings_model = OpenAIEmbeddings(model='text-embedding-3-small')

# llm to be used in RAG pipelines in this notebook
llm = ChatOpenAI(model='gpt-4o-mini', temperature=0, seed=42)

---
---

<br>

**\[ Overview of Steps in RAG \]**

- 1. **Document Loading**
	- In this initial step, relevant documents are ingested and prepared for further processing. This process typically occurs offline.
- 2. **Splitting & Chunking**
	- The text from the documents is split into smaller chunks or segments.
	- These chunks serve as the building blocks for subsequent stages.
- 3. **Storage**
	- The embeddings (vector representations) of these chunks are created and stored in a vector store.
	- These embeddings capture the semantic meaning of the text.
- 4. **Retrieval**
	- When an online query arrives, the system retrieves relevant chunks from the vector store based on the query.
	- This retrieval step ensures that the system identifies the most pertinent information.
- 5. **Output**
	- Finally, the retrieved chunks are used to generate a coherent response.
	- This output can be in the form of natural language text, summaries, or other relevant content.

![](https://abc-notes.data.tech.gov.sg/resources/img/topic-4-rag-overview.png)
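
The five steps above can be sketched end to end in plain Python. This is only a toy illustration, not the pipeline used in this notebook: the bag-of-words `embed` function stands in for a real embedding model, a plain list stands in for the vector store, and the final step only builds the prompt instead of calling an LLM. All function names and the sample document are made up for this example.

```python
import math
import re
from collections import Counter

def split_into_chunks(text):
    """Step 2: split a document into paragraph-sized chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def embed(text):
    """Step 3 stand-in: a bag-of-words count vector, NOT a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, store, k=2):
    """Step 4: return the k chunks whose vectors are closest to the query."""
    query_vector = embed(query)
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query, chunks):
    """Step 5 (first half): place the retrieved chunks into the prompt for the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

# Step 1: "load" a document (a hard-coded string in this sketch)
document = (
    "Chunking splits long documents into smaller pieces.\n\n"
    "A vector store keeps the embeddings of every chunk.\n\n"
    "Reranking reorders retrieved chunks before generation."
)

chunks = split_into_chunks(document)
vector_store = [(chunk, embed(chunk)) for chunk in chunks]  # in-memory stand-in for a vector store

question = "Where are the embeddings of each chunk kept?"
top_chunks = retrieve(question, vector_store, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In the real pipeline, `embed` would be replaced by the embedding model and the list by a vector store such as Chroma, while the prompt would be sent to the LLM to produce the final answer.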

# Setting Up the Common Process

In [None]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

## Download the notes

In [None]:
# Download and unzip into local folder
url = "https://abc-notes.data.tech.gov.sg/resources/data/notes_rag.zip"

import requests
import zipfile
import io

response = requests.get(url)
response.raise_for_status()  # stop early if the download failed
z = zipfile.ZipFile(io.BytesIO(response.content))

# Take note that the files are unzipped into a folder
z.extractall('./notes_rag')


## Document Loading

In [None]:
from langchain_community.document_loaders import TextLoader

In [None]:
< Your code here >

---
---


# Technique(s) for Improving Pre-Retrieval Process

In [None]:
< Your code here >
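
As a hint rather than a model answer, here is a minimal sketch of one idea that is often grouped under pre-retrieval: sliding-window chunking, where neighbouring chunks share some words so that context is not cut off at chunk boundaries. Whether the Course Notes classify it this way is an assumption, and the function name, word-based splitting, and default sizes are made up for this example.

```python
def sliding_window_chunks(text, chunk_size=8, overlap=3):
    """Split text into windows of `chunk_size` words that share `overlap` words with the previous window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks

sample = "one two three four five six seven eight nine ten"
windows = sliding_window_chunks(sample, chunk_size=4, overlap=2)
for window in windows:
    print(window)
```

A larger overlap preserves more context across boundaries but stores more duplicated text, so the two values are usually tuned together.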

---
---
<br>

# Technique(s) for Improving Retrieval Process

In [None]:
< Your code here >
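
As a hint rather than a model answer, here is a minimal sketch of one way to combine two retrievers at retrieval time: reciprocal rank fusion (RRF), which merges several ranked lists (for example one from keyword search and one from vector search) by summing `1 / (k + rank)` for each document. The document IDs and the two input rankings are made up, and `k=60` is simply a commonly used default, not a value taken from this course.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (highest first); ties are broken by document ID
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results from two different retrievers for the same query
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Because RRF only looks at rank positions, it does not require the scores of the two retrievers to be on the same scale.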

---
---
<br>

# Technique(s) for Improving Post-Retrieval Process

In [None]:
< Your code here >
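
As a hint rather than a model answer, here is a minimal sketch of one post-retrieval idea: re-rank the retrieved chunks by relevance, then keep only as many as fit into a context budget. The `keyword_overlap_score` function is a hypothetical stand-in for a real reranker (such as a cross-encoder or a reranking API), and the word-based budget is a simplification of a token budget.

```python
import re

def keyword_overlap_score(query, chunk):
    """Hypothetical stand-in for a reranker: fraction of query terms that appear in the chunk."""
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    chunk_terms = set(re.findall(r"[a-z0-9]+", chunk.lower()))
    return len(query_terms & chunk_terms) / len(query_terms) if query_terms else 0.0

def rerank_and_trim(query, chunks, score_fn=keyword_overlap_score, max_words=20):
    """Sort chunks by relevance, then keep those that fit within the word budget."""
    ranked = sorted(chunks, key=lambda chunk: score_fn(query, chunk), reverse=True)
    kept, used = [], 0
    for chunk in ranked:
        length = len(chunk.split())
        if used + length > max_words:
            continue  # skip a chunk that would exceed the budget
        kept.append(chunk)
        used += length
    return kept

retrieved = [
    "Chunk overlap keeps context across boundaries.",
    "Reranking reorders retrieved chunks so the most relevant come first.",
    "Reranking can improve retrieval quality.",
]
final_context = rerank_and_trim("How does reranking improve retrieval?", retrieved, max_words=15)
for chunk in final_context:
    print(chunk)
```

Swapping `score_fn` for a call to a real reranking model keeps the rest of the function unchanged.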