<a href="https://colab.research.google.com/github/vixdang0x7d3/rag-int14124-final/blob/main/simple-rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What are we going to build?

A simple RAG (retrieval-augmented generation) pipeline that can process a PDF document, in this case the [nutrition-textbook](https://pressbooks.oer.hawaii.edu/humannutrition2/).

We'll write the code to:

1. Open a PDF document & extract the text.
2. Format the text into appropriately sized chunks for feeding into an embedding model.
3. Embed the text, i.e. turn the chunks into numerical representations that we can store for later use.
4. Build a **retrieval system** that finds relevant chunks of text based on a query.
5. Create a prompt that incorporates the retrieved pieces of text.
6. Generate an answer to the query based on texts from the textbook.

In [20]:
import os

if "COLAB_GPU" in os.environ:
    print("[INFO] Running in Google Colab, installing requirements.")
    !pip install torch==2.6.0 # requires torch 2.1.1+ (for efficient sdpa implementation)
    !pip install PyMuPDF # for reading PDFs with Python
    !pip install tqdm # for progress bars
    !pip install sentence-transformers # for embedding models
    !pip install accelerate # for quantization model loading
    !pip install bitsandbytes # for quantizing models (less storage space)
    !pip install flash-attn --no-build-isolation # for faster attention mechanism = faster LLM inference


### Document/Text Processing and Embedding Generation

In [1]:
import os
import requests

from pathlib import Path

pdf_path = Path(os.getcwd()) / "data" / "human-nutrition-text.pdf"

if not pdf_path.parent.exists():
    print(f"Creating directory {pdf_path.parent}")
    pdf_path.parent.mkdir(parents=True, exist_ok=True)

if not pdf_path.exists():
    print("File not found, downloading...")

    url = "https://pressbooks.oer.hawaii.edu/humannutrition2/open/download?type=pdf"

    filename = pdf_path.name

    response = requests.get(url)

    if response.status_code == 200:
        pdf_path.write_bytes(response.content)
        print(f"Downloaded {filename} to {pdf_path}")
    else:
        print(f"Failed to download {filename}. Status code: {response.status_code}")
else:
    print(f"File already exists at {pdf_path}.")

File already exists at /content/data/human-nutrition-text.pdf.


In [2]:
import pymupdf
from tqdm import tqdm

def clean_text(text: str) -> str:
    cleaned_text = text.replace("\n", " ").strip()
    return cleaned_text

def extract_text(pdf_path: Path) -> list[dict]:
    with pymupdf.open(pdf_path) as doc:
        text_data = []

        for pageno, page in tqdm(enumerate(doc), desc="Extracting text", total=len(doc)):
            text = page.get_text("text")
            text = clean_text(text)

            # The main content starts after PDF page 42, so offset the page number.
            # Rough heuristic: 1 token ~ 4 characters.
            text_data.append({
                "page_number": pageno - 42,
                "page_char_count" : len(text),
                "page_word_count" : len(text.split()),
                "page_sentence_count_raw" : len(text.split(".")),
                "page_token_count" : len(text) / 4,
                "text": text
            })

        return text_data

text_data = extract_text(pdf_path)
text_data[:2]

Extracting text: 100%|██████████| 1208/1208 [00:05<00:00, 235.03it/s]


[{'page_number': -42,
  'page_char_count': 29,
  'page_word_count': 4,
  'page_sentence_count_raw': 1,
  'page_token_count': 7.25,
  'text': 'Human Nutrition: 2020 Edition'},
 {'page_number': -41,
  'page_char_count': 0,
  'page_word_count': 0,
  'page_sentence_count_raw': 1,
  'page_token_count': 0.0,
  'text': ''}]
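The per-page statistics are rough heuristics, and the two pages shown above already hint at their limits. The sketch below recomputes them on small strings; the helper name `page_stats` and the third example sentence are made up for illustration and are not part of the notebook.

```python
def page_stats(text: str) -> dict:
    """Rough per-page statistics, mirroring the heuristics in extract_text."""
    return {
        "char_count": len(text),
        "word_count": len(text.split()),
        # Splitting on "." over-counts: decimals and abbreviations add pieces,
        # and an empty string still yields one (empty) piece.
        "sentence_count_raw": len(text.split(".")),
        # Rule of thumb used in the notebook: about 4 characters per token.
        "token_count": len(text) / 4,
    }

title = page_stats("Human Nutrition: 2020 Edition")
empty = page_stats("")
decimal = page_stats("Doses of 2.5 percent were used.")

print(title)
print(empty["sentence_count_raw"])    # 1, even though the page is empty
print(decimal["sentence_count_raw"])  # 3, although there is only one sentence
```

This is why the raw sentence count is only a first estimate, and why a dedicated sentence splitter is worth using before chunking.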

In [3]:
from pprint import pprint
import random

pprint(
    random.sample(text_data, k=3)
)


[{'page_char_count': 0,
  'page_number': 705,
  'page_sentence_count_raw': 1,
  'page_token_count': 0.0,
  'page_word_count': 0,
  'text': ''},
 {'page_char_count': 1806,
  'page_number': 5,
  'page_sentence_count_raw': 16,
  'page_token_count': 451.5,
  'page_word_count': 276,
  'text': 'down digestible complex carbohydrates to simple sugars, mostly  '
          'glucose. Glucose is then transported to all our cells where it is  '
          'stored, used to make energy, or used to build macromolecules. '
          'Fiber  is also a complex carbohydrate, but it cannot be broken down '
          'by  digestive enzymes in the human intestine. As a result, it '
          'passes  through the digestive tract undigested unless the bacteria '
          'that  inhabit the colon or large intestine break it down.  One gram '
          'of digestible carbohydrates yields four kilocalories of  energy for '
          'the cells in the body to perform work. In addition to  providing '
          'en

### Get some stats on the text

In [4]:
import pandas as pd

df = pd.DataFrame(text_data)
df.head()

Unnamed: 0,page_number,page_char_count,page_word_count,page_sentence_count_raw,page_token_count,text
0,-42,29,4,1,7.25,Human Nutrition: 2020 Edition
1,-41,0,0,1,0.0,
2,-40,320,42,1,80.0,Human Nutrition: 2020 Edition UNIVERSITY OF ...
3,-39,212,30,3,53.0,Human Nutrition: 2020 Edition by University of...
4,-38,797,116,3,199.25,Contents Preface University of Hawai‘i at Mā...


In [5]:
df.describe().round(2)

Unnamed: 0,page_number,page_char_count,page_word_count,page_sentence_count_raw,page_token_count
count,1208.0,1208.0,1208.0,1208.0,1208.0
mean,561.5,1148.0,171.97,14.18,287.0
std,348.86,560.38,86.49,9.54,140.1
min,-42.0,0.0,0.0,1.0,0.0
25%,259.75,762.0,109.0,8.0,190.5
50%,561.5,1231.5,183.0,13.0,307.88
75%,863.25,1603.5,239.0,19.0,400.88
max,1165.0,2308.0,393.0,82.0,577.0


### Splitting pages into sentences

In [6]:
from spacy.lang.en import English

nlp = English()

nlp.add_pipe("sentencizer")

doc = nlp("This is a sentence. This is another sentence")
assert len(list(doc.sents)) == 2

list(doc.sents)

[This is a sentence., This is another sentence]

Apply the sentencizer to every page of our text.

In [7]:
for item in tqdm(text_data):
    item["sentences"] = list(nlp(item["text"]).sents)
    item["sentences"] = [str(sent) for sent in item["sentences"]]
    item["page_sentence_count_spacy"] = len(item["sentences"])

100%|██████████| 1208/1208 [00:02<00:00, 504.26it/s]


In [8]:
random.sample(text_data, k=1)

[{'page_number': 957,
  'page_char_count': 622,
  'page_word_count': 86,
  'page_sentence_count_raw': 8,
  'page_token_count': 155.5,
  'text': 'features interactive learning activities.\xa0 These activities are  available in the web-based textbook and not available in the  downloadable versions (EPUB, Digital PDF, Print_PDF, or  Open Document).  Learning activities may be used across various mobile  devices, however, for the best user experience it is strongly  recommended that users complete these activities using a  desktop or laptop computer and in Google Chrome.  \xa0 An interactive or media element has been  excluded from this version of the text. You can  view it online here:  http://pressbooks.oer.hawaii.edu/ humannutrition2/?p=503  \xa0 958  |  Fuel Sources',
  'sentences': ['features interactive learning activities.',
   '\xa0 These activities are  available in the web-based textbook and not available in the  downloadable versions (EPUB, Digital PDF, Print_PDF, or  Open Docum

In [9]:
df = pd.DataFrame(text_data)
df.describe().round(2)

Unnamed: 0,page_number,page_char_count,page_word_count,page_sentence_count_raw,page_token_count,page_sentence_count_spacy
count,1208.0,1208.0,1208.0,1208.0,1208.0,1208.0
mean,561.5,1148.0,171.97,14.18,287.0,10.32
std,348.86,560.38,86.49,9.54,140.1,6.3
min,-42.0,0.0,0.0,1.0,0.0,0.0
25%,259.75,762.0,109.0,8.0,190.5,5.0
50%,561.5,1231.5,183.0,13.0,307.88,10.0
75%,863.25,1603.5,239.0,19.0,400.88,15.0
max,1165.0,2308.0,393.0,82.0,577.0,28.0


### Create chunks from sentences

In [10]:
sent_per_chunk = 10

def split_list(
    input_list: list[str],
    chunk_size: int,
) -> list[list[str]]:
    """Split a list of sentences into sub-lists of at most `chunk_size` items."""
    return [input_list[i:i+chunk_size] for i in range(0, len(input_list), chunk_size)]


for item in tqdm(text_data):
    item["chunks"] = split_list(item["sentences"], chunk_size=sent_per_chunk)
    item["num_chunks"] = len(item["chunks"])

100%|██████████| 1208/1208 [00:00<00:00, 337984.07it/s]
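To make the chunking behaviour concrete, here is a small standalone run of the same slicing logic on 25 made-up sentences. The last chunk is simply shorter, and an empty page produces no chunks at all.

```python
def split_list(input_list: list[str], chunk_size: int) -> list[list[str]]:
    """Split a list into consecutive sub-lists of at most chunk_size items."""
    return [input_list[i:i + chunk_size] for i in range(0, len(input_list), chunk_size)]

# 25 placeholder sentences, invented for this example.
sentences = [f"Sentence {n}." for n in range(1, 26)]

chunks = split_list(sentences, chunk_size=10)
print([len(c) for c in chunks])       # [10, 10, 5]
print(split_list([], chunk_size=10))  # []
```

Since the spaCy statistics above report at most 28 sentences on a page, no page should yield more than three chunks with `sent_per_chunk = 10`.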


In [11]:
random.sample(text_data, k=1)

[{'page_number': 279,
  'page_char_count': 1992,
  'page_word_count': 282,
  'page_sentence_count_raw': 24,
  'page_token_count': 498.0,
  'text': 'about their safety. The health concerns of sugar substitutes  originally  stemmed  from  scientific  studies,  which  were  misinterpreted by both scientists and the public.  In the early 1970s scientific studies were published that  demonstrated that high doses of saccharin caused bladder tumors  in rats. This information fueled the still-ongoing debate of the  health consequences of all artificial sweeteners. In actuality, the  results from the early studies were completely irrelevant to humans.  The large doses (2.5 percent of diet) of saccharine caused a pellet  to form in the rat’s bladder. That pellet chronically irritated the  bladder wall, eventually resulting in tumor development. Since this  study, scientific investigation in rats, monkeys, and humans have  not found any relationship between saccharine consumption and  bladder can

In [12]:
df = pd.DataFrame(text_data)
df.describe().round(2)

Unnamed: 0,page_number,page_char_count,page_word_count,page_sentence_count_raw,page_token_count,page_sentence_count_spacy,num_chunks
count,1208.0,1208.0,1208.0,1208.0,1208.0,1208.0,1208.0
mean,561.5,1148.0,171.97,14.18,287.0,10.32,1.53
std,348.86,560.38,86.49,9.54,140.1,6.3,0.64
min,-42.0,0.0,0.0,1.0,0.0,0.0,0.0
25%,259.75,762.0,109.0,8.0,190.5,5.0,1.0
50%,561.5,1231.5,183.0,13.0,307.88,10.0,1.0
75%,863.25,1603.5,239.0,19.0,400.88,15.0,2.0
max,1165.0,2308.0,393.0,82.0,577.0,28.0,3.0


Merge each chunk's list of sentences into a single string.

In [13]:
import re


chunk_data = []
for item in tqdm(text_data):
    for chunk in item["chunks"]:
        chunk_dict = {}
        chunk_dict["page_number"] = item["page_number"]

        merged_chunk = "".join(chunk).replace("  ", " ").strip()
        merged_chunk = re.sub(r'\.([A-Z])', r'. \1', merged_chunk)

        chunk_dict["chunk"] = merged_chunk
        chunk_dict["chunk_char_count"] = len(merged_chunk)
        chunk_dict["chunk_word_count"] = len(merged_chunk.split(" "))
        chunk_dict["chunk_token_count"] = len(merged_chunk) / 4

        chunk_data.append(chunk_dict)

len(chunk_data)

100%|██████████| 1208/1208 [00:00<00:00, 30115.30it/s]


1843
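The regular expression in the merge step is easy to miss, so here is a minimal sketch of what it does. The helper name `merge_sentences` and the sample sentences are invented for illustration. Joining the sentence strings with `""` glues them together, and the substitution puts a space back wherever a period is followed directly by a capital letter.

```python
import re

def merge_sentences(sentences: list[str]) -> str:
    """Join sentence strings and restore the space after a full stop."""
    merged = "".join(sentences).replace("  ", " ").strip()
    # ".A" -> ". A": re-insert a space where a period is glued to a capital letter.
    return re.sub(r"\.([A-Z])", r". \1", merged)

print(merge_sentences(["Fiber is a carbohydrate.", "It is not digested."]))
# Fiber is a carbohydrate. It is not digested.

print(merge_sentences(["See the U.S.", "guidelines."]))
# See the U. S.guidelines.
```

The second call shows the trade-off: the pattern also splits abbreviations such as "U.S." and does nothing for a sentence that starts with a lowercase letter or a digit. Joining with `" ".join(...)` would be one alternative, but that is a design choice rather than something the notebook does.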

In [14]:
random.sample(chunk_data, 1)

[{'page_number': -29,
  'chunk': 'Factors Affecting Energy Intake University of Hawai‘i at Mānoa Food Science and Human Nutrition Program and Human Nutrition Program 485 Factors Affecting Energy Expenditure University of Hawai‘i at Mānoa Food Science and Human Nutrition Program and Human Nutrition Program 492 Dietary, Behavioral, and Physical Activity Recommendations for Weight Management University of Hawai‘i at Mānoa Food Science and Human Nutrition Program and Human Nutrition Program 501 Part\xa0IX.\xa0Chapter 9. Vitamins Introduction University of Hawai‘i at Mānoa Food Science and Human Nutrition Program and Human Nutrition Program 515 Fat-Soluble Vitamins University of Hawai‘i at Mānoa Food Science and Human Nutrition Program and Human Nutrition Program 521 Water-Soluble Vitamins University of Hawai‘i at Mānoa Food Science and Human Nutrition Program and Human Nutrition Program 550 Antioxidants University of Hawai‘i at Mānoa Food Science and Human Nutrition Program and Human Nutri

In [15]:
df =  pd.DataFrame(chunk_data)
df.describe().round(2)

Unnamed: 0,page_number,chunk_char_count,chunk_word_count,chunk_token_count
count,1843.0,1843.0,1843.0,1843.0
mean,582.38,734.44,112.33,183.61
std,347.79,447.54,71.22,111.89
min,-42.0,12.0,3.0,3.0
25%,279.5,315.0,44.0,78.75
50%,585.0,746.0,114.0,186.5
75%,889.0,1118.5,173.0,279.62
max,1165.0,1831.0,297.0,457.75


Some chunks have a very low token count. Let's sample chunks with 30 tokens or fewer and see whether they are worth keeping.

In [16]:
min_token_length = 30
for row in df[df["chunk_token_count"] <= min_token_length].sample(5).iterrows():
    print(
        f'Token count: {row[1]["chunk_token_count"]}\n'
        f'Text: {row[1]["chunk"]}\n\n'
    )

Token count: 16.0
Text: PART II CHAPTER 2. THE HUMAN BODY Chapter 2. The Human Body | 53


Token count: 11.75
Text: Accessed March 17, 2018. Sports Nutrition | 961


Token count: 20.25
Text: Honor your health – gentle nutrition       Calories In Versus Calories Out | 1075


Token count: 25.5
Text: http://www.ajcn.org/cgi/ pmidlookup?view=long&pmid=10197575. Accessed October 6, 2017. 640 | Magnesium


Token count: 11.75
Text: Polan EU, Taylor DR. (2003), 782 | Introduction




Many of these are page headers and footers that don't seem to offer much information. \
We can remove them and keep only chunk dicts with more than 30 tokens.
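As a quick illustration of the cut-off, the same filter can be written in plain Python. The records below are made up; only the `chunk_token_count` field matters here. Note that the comparison is strict, so a chunk sitting exactly at the threshold is dropped.

```python
min_token_length = 30

# Invented chunk records for illustration.
toy_chunks = [
    {"chunk": "Sports Nutrition | 961", "chunk_token_count": 5.5},
    {"chunk": "exactly at the threshold", "chunk_token_count": 30.0},
    {"chunk": "a longer passage of body text", "chunk_token_count": 186.5},
]

# Keep only chunks with strictly more than min_token_length estimated tokens.
kept = [c for c in toy_chunks if c["chunk_token_count"] > min_token_length]
print(len(kept))  # 1
```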

In [17]:
chunk_data = df[df["chunk_token_count"] > min_token_length].to_dict(orient="records")

pprint(
    chunk_data[:2]
)


[{'chunk': 'Human Nutrition: 2020 Edition UNIVERSITY OF HAWAI‘I AT MĀNOA FOOD '
           'SCIENCE AND HUMAN NUTRITION PROGRAM ALAN TITCHENAL, SKYLAR HARA, '
           'NOEMI ARCEO CAACBAY, WILLIAM MEINKE-LAU, YA-YUN YANG, MARIE KAINOA '
           'FIALKOWSKI REVILLA, JENNIFER DRAPER, GEMADY LANGFELDER, CHERYL '
           'GIBBY, CHYNA NICOLE CHUN, AND ALLISON CALABRESE',
  'chunk_char_count': 308,
  'chunk_token_count': 77.0,
  'chunk_word_count': 42,
  'page_number': -40},
 {'chunk': 'Human Nutrition: 2020 Edition by University of Hawai‘i at Mānoa '
           'Food Science and Human Nutrition Program is licensed under a '
           'Creative Commons Attribution 4.0 International License, except '
           'where otherwise noted.',
  'chunk_char_count': 210,
  'chunk_token_count': 52.5,
  'chunk_word_count': 30,
  'page_number': -39}]


### Embedding text chunks

In [18]:
from sentence_transformers import SentenceTransformer

# TODO(vi): Research SentenceTransformer

embedding_model = SentenceTransformer(
    model_name_or_path="all-mpnet-base-v2",
)

sentences = [
    "The Sentences Transformers library provides an easy and open-source way to create embeddings.",
    "Sentences can be embedded one by one or as a list of strings.",
    "Embeddings are one of the most powerful concepts in machine learning!",
    "Learn to use embeddings well and you'll be well on your way to being an AI engineer."
]

embeddings = embedding_model.encode(sentences)

embedding_dict = dict(zip(sentences, embeddings))

for sent, emb in embedding_dict.items():
    print(f"Sentence: {sent}\n")
    print(f"Embedding: {emb}\n")
    print("-----------------------------------\n\n")


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.4k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

Sentence: The Sentences Transformers library provides an easy and open-source way to create embeddings.

Embedding: [-2.07981113e-02  3.03164795e-02 -2.01218147e-02  6.86483756e-02
 -2.55255271e-02 -8.47689621e-03 -2.07084100e-04 -6.32377341e-02
  2.81606186e-02 -3.33352946e-02  3.02635301e-02  5.30720539e-02
 -5.03526032e-02  2.62288153e-02  3.33314314e-02 -4.51578870e-02
  3.63044329e-02 -1.37113058e-03 -1.20171346e-02  1.14946300e-02
  5.04510924e-02  4.70857024e-02  2.11912952e-02  5.14607318e-02
 -2.03746632e-02 -3.58889513e-02 -6.67892222e-04 -2.94393133e-02
  4.95858938e-02 -1.05639584e-02 -1.52014289e-02 -1.31752621e-03
  4.48197350e-02  1.56022953e-02  8.60380283e-07 -1.21392391e-03
 -2.37978548e-02 -9.09427938e-04  7.34480796e-03 -2.53931968e-03
  5.23369759e-02 -4.68043573e-02  1.66214537e-02  4.71578874e-02
 -4.15599234e-02  9.01929627e-04  3.60278897e-02  3.42214443e-02
  9.68227461e-02  5.94828576e-02 -1.64984558e-02 -3.51249650e-02
  5.92514267e-03 -7.08006672e-04 -2.410

Using a GPU can significantly speed up this step.

In [19]:
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using CUDA? {'YES' if device=='cuda' else 'NO'}")
if device == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")


Using CUDA? YES
GPU: Tesla T4


In [20]:
embedding_model.to(device)
_ = embedding_model.encode(sentences)

In [21]:
# Extract only text chunks
text_chunks = [item["chunk"] for item in chunk_data]

embeddings = embedding_model.encode(
    text_chunks,
    batch_size=32,
)

In [22]:
# Merge embedding back to our dataset
for chunk, emb in zip(chunk_data, embeddings):
    chunk["embedding"] = emb

chunk_data[0]

{'page_number': -40,
 'chunk': 'Human Nutrition: 2020 Edition UNIVERSITY OF HAWAI‘I AT MĀNOA FOOD SCIENCE AND HUMAN NUTRITION PROGRAM ALAN TITCHENAL, SKYLAR HARA, NOEMI ARCEO CAACBAY, WILLIAM MEINKE-LAU, YA-YUN YANG, MARIE KAINOA FIALKOWSKI REVILLA, JENNIFER DRAPER, GEMADY LANGFELDER, CHERYL GIBBY, CHYNA NICOLE CHUN, AND ALLISON CALABRESE',
 'chunk_char_count': 308,
 'chunk_word_count': 42,
 'chunk_token_count': 77.0,
 'embedding': array([ 6.74242750e-02,  9.02281851e-02, -5.09549817e-03, -3.17545645e-02,
         7.39082173e-02,  3.51976193e-02, -1.97986774e-02,  4.67692614e-02,
         5.35727032e-02,  5.01228590e-03,  3.33928987e-02, -1.62219873e-03,
         1.76080856e-02,  3.62653472e-02, -3.16645222e-04, -1.07118469e-02,
         1.54257836e-02,  2.62176432e-02,  2.77656200e-03,  3.64942439e-02,
        -4.44109775e-02,  1.89361908e-02,  4.90117818e-02,  1.64020006e-02,
        -4.85782698e-02,  3.18291062e-03,  2.72992644e-02, -2.04756460e-03,
        -1.22829070e-02, -7.28049

In [23]:
# Save embeddings to file
chunk_embedding_df = pd.DataFrame(chunk_data)
save_path = "chunk_embedding_df.parquet"
chunk_embedding_df.to_parquet(save_path, index=False)

Try loading them up

In [24]:
chunk_embedding_df = pd.read_parquet(save_path)
chunk_embedding_df.head()

Unnamed: 0,page_number,chunk,chunk_char_count,chunk_word_count,chunk_token_count,embedding
0,-40,Human Nutrition: 2020 Edition UNIVERSITY OF HA...,308,42,77.0,"[0.067424275, 0.090228185, -0.005095498, -0.03..."
1,-39,Human Nutrition: 2020 Edition by University of...,210,30,52.5,"[0.05521561, 0.059213944, -0.016616723, -0.020..."
2,-38,Contents Preface University of Hawai‘i at Māno...,766,114,191.5,"[0.027980171, 0.03398143, -0.020642664, 0.0019..."
3,-37,Lifestyles and Nutrition University of Hawai‘i...,941,142,235.25,"[0.06825669, 0.038127504, -0.008468552, -0.018..."
4,-36,The Cardiovascular System University of Hawai‘...,998,152,249.5,"[0.03302642, -0.008497655, 0.009571581, -0.004..."


Storing the data as CSV would serialize the embedding arrays into strings, so we use the Parquet format instead.
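To see the CSV problem in isolation, here is a small sketch using only the standard library. The three-element vector is a shortened, made-up stand-in for a real 768-dimensional embedding.

```python
import csv
import io
import json

embedding = [0.0674, 0.0902, -0.0051]  # shortened, made-up vector

# Write one row to an in-memory CSV file.
buffer = io.StringIO()
csv.writer(buffer).writerow(["some chunk text", embedding])

# Read it back: every field is a string, including the embedding.
buffer.seek(0)
chunk, raw_embedding = next(csv.reader(buffer))
print(type(raw_embedding).__name__, raw_embedding)

# Recovering the numbers needs an explicit parsing step.
restored = json.loads(raw_embedding)
print(restored == embedding)  # True
```

With NumPy arrays the string form uses spaces instead of commas, so the parsing step is more awkward still. A columnar format that stores the list of floats directly avoids the round trip through text.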

In [25]:
# parquet format retains numpy array
chunk_embedding_df.loc[0, 'embedding']

array([ 6.74242750e-02,  9.02281851e-02, -5.09549817e-03, -3.17545645e-02,
        7.39082173e-02,  3.51976193e-02, -1.97986774e-02,  4.67692614e-02,
        5.35727032e-02,  5.01228590e-03,  3.33928987e-02, -1.62219873e-03,
        1.76080856e-02,  3.62653472e-02, -3.16645222e-04, -1.07118469e-02,
        1.54257836e-02,  2.62176432e-02,  2.77656200e-03,  3.64942439e-02,
       -4.44109775e-02,  1.89361908e-02,  4.90117818e-02,  1.64020006e-02,
       -4.85782698e-02,  3.18291062e-03,  2.72992644e-02, -2.04756460e-03,
       -1.22829070e-02, -7.28049576e-02,  1.20445676e-02,  1.07300524e-02,
        2.10001133e-03, -8.17772895e-02,  2.67830137e-06, -1.81428511e-02,
       -1.20803136e-02,  2.47174762e-02, -6.27467185e-02,  7.35438392e-02,
        2.21624617e-02, -3.28767933e-02, -1.80095788e-02,  2.22952310e-02,
        5.61365038e-02,  1.79509621e-03,  5.25931977e-02, -3.31744622e-03,
       -8.33871402e-03, -1.06284171e-02,  2.31918902e-03, -2.23934203e-02,
       -1.53012052e-02, -

### Retrieval

At this point, our document index is ready in the form of a simple DataFrame. \
In this stage, we'll convert our embeddings into a tensor for GPU-accelerated computation and define a similarity search function that retrieves the $k$ most relevant text passages for a user query.
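Before moving to tensors, the idea of a top-$k$ similarity search can be sketched in plain Python. This is an illustration under simplifying assumptions, not the notebook's implementation: the function name `top_k_dot` is hypothetical, vectors are plain lists, and the four-row index is invented.

```python
import heapq

def top_k_dot(
    query: list[float],
    index: list[list[float]],
    k: int = 3,
) -> list[tuple[float, int]]:
    """Return the k highest (score, row) pairs by dot product."""
    scores = [sum(q * x for q, x in zip(query, row)) for row in index]
    return heapq.nlargest(k, ((s, i) for i, s in enumerate(scores)))

# Toy "document index": four 2-dimensional embeddings.
toy_index = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.6, 0.8],
    [-1.0, 0.0],
]

print(top_k_dot([1.0, 0.0], toy_index, k=2))  # [(1.0, 0), (0.6, 2)]
```

The returned row numbers play the same role as the indices used to look chunks up in the document index. The tensor version computes the same scores as a single matrix operation, which is what makes it fast on a GPU.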

In [26]:
import torch
import numpy as np
import pandas as pd

device = "cuda" if torch.cuda.is_available() else "cpu"

embeddings = torch.tensor(
    np.array(chunk_embedding_df["embedding"].tolist()),
    dtype=torch.float32
).to(device)

embeddings.shape

torch.Size([1680, 768])

In [27]:
embeddings[0]

tensor([ 6.7424e-02,  9.0228e-02, -5.0955e-03, -3.1755e-02,  7.3908e-02,
         3.5198e-02, -1.9799e-02,  4.6769e-02,  5.3573e-02,  5.0123e-03,
         3.3393e-02, -1.6222e-03,  1.7608e-02,  3.6265e-02, -3.1665e-04,
        -1.0712e-02,  1.5426e-02,  2.6218e-02,  2.7766e-03,  3.6494e-02,
        -4.4411e-02,  1.8936e-02,  4.9012e-02,  1.6402e-02, -4.8578e-02,
         3.1829e-03,  2.7299e-02, -2.0476e-03, -1.2283e-02, -7.2805e-02,
         1.2045e-02,  1.0730e-02,  2.1000e-03, -8.1777e-02,  2.6783e-06,
        -1.8143e-02, -1.2080e-02,  2.4717e-02, -6.2747e-02,  7.3544e-02,
         2.2162e-02, -3.2877e-02, -1.8010e-02,  2.2295e-02,  5.6137e-02,
         1.7951e-03,  5.2593e-02, -3.3174e-03, -8.3387e-03, -1.0628e-02,
         2.3192e-03, -2.2393e-02, -1.5301e-02, -9.9306e-03,  4.6532e-02,
         3.5747e-02, -2.5476e-02,  2.6369e-02,  3.7491e-03, -3.8268e-02,
         2.5833e-02,  4.1287e-02,  2.5818e-02,  3.3297e-02, -2.5178e-02,
         4.5152e-02,  4.4905e-04, -9.9662e-02,  4.9

In [28]:
from sentence_transformers import util, SentenceTransformer

# prepare another embedding model instance
# so we don't have to scroll all the way up
embedding_model = SentenceTransformer(
    model_name_or_path="all-mpnet-base-v2",
    device=device
)

embedding_model

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

In [29]:
query = "macronutrients functions"

query_embedding = embedding_model.encode(
    query,
    convert_to_tensor=True,
)

# benchmarking similarity search
from time import perf_counter as timer

start_time = timer()

# dot-product similarity between the query and every chunk embedding
dot_scores = util.dot_score(query_embedding, embeddings)[0]

end_time =  timer()

print(f'Query: {query}')
print(
    f'Time taken to get scores on {len(embeddings)} embeddings: {end_time-start_time:.5f} seconds.'
)

top_5 = torch.topk(dot_scores, k=5)
top_5

Query: macronutrients functions
Time taken to get scores on 1680 embeddings: 0.02434 seconds.


torch.return_types.topk(
values=tensor([0.6926, 0.6738, 0.6646, 0.6536, 0.6473], device='cuda:0'),
indices=tensor([42, 47, 41, 51, 46], device='cuda:0'))

In [30]:
import textwrap

def print_wrapped(text, wrap_length=80):
    wrapped_text = textwrap.fill(text, wrap_length)
    print(wrapped_text)

In [31]:
print(f"Query: '{query}'")
print("Result:")

for score, idx in zip(top_5[0], top_5[1]):
    print(f'Score: {score:.4f}')

    print('Text: ')
    print_wrapped(chunk_data[idx]['chunk'])
    print(f'Page No. : {chunk_data[idx]["page_number"]}\n\n')


Query: 'macronutrients functions'
Result:
Score: 0.6926
Text: 
Macronutrients Nutrients that are needed in large amounts are called
macronutrients. There are three classes of macronutrients: carbohydrates,
lipids, and proteins. These can be metabolically processed into cellular energy.
The energy from macronutrients comes from their chemical bonds. This chemical
energy is converted into cellular energy that is then utilized to perform work,
allowing our bodies to conduct their basic functions. A unit of measurement of
food energy is the calorie. On nutrition food labels the amount given for
“calories” is actually equivalent to each calorie multiplied by one thousand. A
kilocalorie (one thousand calories, denoted with a small “c”) is synonymous with
the “Calorie” (with a capital “C”) on nutrition food labels. Water is also a
macronutrient in the sense that you require a large amount of it, but unlike the
other macronutrients, it does not yield calories. Carbohydrates Carbohydrates
are m

### Define similarity functions

In [32]:
import torch

def dot_product(vec_1, vec_2):
    return torch.dot(vec_1, vec_2)


def cosine_similarity(vec_1, vec_2):
    dot_product = torch.dot(vec_1, vec_2)

    norm_vec_1 = torch.sqrt(torch.sum(vec_1 ** 2))
    norm_vec_2 = torch.sqrt(torch.sum(vec_2 ** 2))

    return dot_product / (norm_vec_1 * norm_vec_2)


# Example tensors
vector1 = torch.tensor([1, 2, 3], dtype=torch.float32)
vector2 = torch.tensor([1, 2, 3], dtype=torch.float32)
vector3 = torch.tensor([4, 5, 6], dtype=torch.float32)
vector4 = torch.tensor([-1, -2, -3], dtype=torch.float32)

# Calculate dot product
print("Dot product between vector1 and vector2:", dot_product(vector1, vector2))
print("Dot product between vector1 and vector3:", dot_product(vector1, vector3))
print("Dot product between vector1 and vector4:", dot_product(vector1, vector4))

# Calculate cosine similarity
print(
    "Cosine similarity between vector1 and vector2:",
    cosine_similarity(vector1, vector2),
)
print(
    "Cosine similarity between vector1 and vector3:",
    cosine_similarity(vector1, vector3),
)
print(
    "Cosine similarity between vector1 and vector4:",
    cosine_similarity(vector1, vector4),
)

Dot product between vector1 and vector2: tensor(14.)
Dot product between vector1 and vector3: tensor(32.)
Dot product between vector1 and vector4: tensor(-14.)
Cosine similarity between vector1 and vector2: tensor(1.0000)
Cosine similarity between vector1 and vector3: tensor(0.9746)
Cosine similarity between vector1 and vector4: tensor(-1.0000)
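The two measures coincide once vectors are scaled to unit length. The embedding model printed earlier ends with a `Normalize()` module, which is presumably why a plain dot product is enough for ranking in this notebook. The pure-Python sketch below checks the equivalence on the same example vectors; the helper names are chosen for this illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

v1, v3 = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
u1, u3 = normalize(v1), normalize(v3)

print(round(cosine(v1, v3), 4))  # 0.9746
print(round(dot(u1, u3), 4))     # 0.9746, same value after normalization
```

For unnormalized vectors the dot product also rewards magnitude, which is why `vector1` and `vector3` score 32 above while their cosine similarity stays below 1.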


### Functionizing the semantic search pipeline

In [33]:
def retrieve_chunks(
    query: str,
    embeddings: torch.Tensor,
    model: SentenceTransformer,
    topk: int = 5,
):
    query_embedding = model.encode(
        query,
        convert_to_tensor=True,
    )

    dot_scores = util.dot_score(query_embedding, embeddings)[0]

    scores, indices = torch.topk(
        dot_scores,
        k=topk,
    )

    return scores, indices


def print_topk(
    query: str,
    embeddings: torch.Tensor,
    document_index: list[dict],
    model: SentenceTransformer,
    topk: int = 5,
):
    scores, indices = retrieve_chunks(
        query,
        embeddings,
        model,
        topk=topk,
    )

    print(f"Query: {query}")
    print("Result: ")

    for score, idx in zip(scores, indices):
        print(f"Score: {score:.4f}")
        print_wrapped(document_index[idx]["chunk"])
        print(f"Page No. : {document_index[idx]['page_number']}\n\n")

In [34]:
query = "symtomps of pellagra"

scores, indices = retrieve_chunks(query, embeddings, embedding_model)

scores, indices

(tensor([0.4970, 0.3987, 0.3469, 0.3208, 0.3031], device='cuda:0'),
 tensor([ 822,  853, 1530, 1531, 1536], device='cuda:0'))

In [35]:
chunk_embedding_data = chunk_embedding_df.to_dict(orient='records')

print_topk(query, embeddings, chunk_embedding_data, embedding_model)

Query: symtomps of pellagra
Result: 
Score: 0.4970
Niacin deficiency is commonly known as pellagra and the symptoms include
fatigue, decreased appetite, and indigestion.  These symptoms are then commonly
followed by the four D’s: diarrhea, dermatitis, dementia, and sometimes death.
Figure 9.12  Conversion of Tryptophan to Niacin Water-Soluble Vitamins | 565
Page No. : 564


Score: 0.3987
car. Does it drive faster with a half-tank of gas or a full one?It does not
matter; the car drives just as fast as long as it has gas. Similarly, depletion
of B vitamins will cause problems in energy metabolism, but having more than is
required to run metabolism does not speed it up. Buyers of B-vitamin supplements
beware; B vitamins are not stored in the body and all excess will be flushed
down the toilet along with the extra money spent. B vitamins are naturally
present in numerous foods, and many other foods are enriched with them. In the
United States, B-vitamin deficiencies are rare; however in th

### Prepare LLM for local generation

In [36]:
if "COLAB_GPU" not in os.environ:
    from dotenv import load_dotenv
    load_dotenv()

In [37]:
import torch


gpu_mem_bytes = torch.cuda.get_device_properties(0).total_memory
gpu_mem_gb = round(gpu_mem_bytes / (2 ** 30))

print(f'Available GPU Memory: {gpu_mem_gb} GB')


Available GPU Memory: 15 GB


In [38]:
if gpu_mem_gb < 5.1:
    print(f"Your available GPU memory is {gpu_mem_gb}GB, you may not have enough memory to run a Gemma LLM locally without quantization.")
elif gpu_mem_gb < 8.1:
    print(f"GPU memory: {gpu_mem_gb} GB | Recommended model: Gemma 2B in 4-bit precision.")
    use_quantization_config = True
    model_id = "google/gemma-2b-it"
elif gpu_mem_gb < 19.0:
    print(f"GPU memory: {gpu_mem_gb} GB | Recommended model: Gemma 2B in float16 or Gemma 7B in 4-bit precision.")
    use_quantization_config = False
    model_id = "google/gemma-2b-it"
else:
    print(f"GPU memory: {gpu_mem_gb} GB | Recommended model: Gemma 7B in 4-bit or float16 precision.")
    use_quantization_config = False
    model_id = "google/gemma-7b-it"

print(f"use_quantization_config set to: {use_quantization_config}")
print(f"model_id set to: {model_id}")

GPU memory: 15 GB | Recommended model: Gemma 2B in float16 or Gemma 7B in 4-bit precision.
use_quantization_config set to: False
model_id set to: google/gemma-2b-it
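The branching above can also be expressed as a small function, which makes the thresholds easy to test and makes the "too little memory" case explicit instead of leaving the variables undefined. This is a sketch that reuses the notebook's thresholds and model IDs; the function name `choose_gemma` and the `None` return value are assumptions made for this example.

```python
from typing import Optional

def choose_gemma(gpu_mem_gb: float) -> Optional[tuple[bool, str]]:
    """Return (use_quantization_config, model_id), or None if memory is too small."""
    if gpu_mem_gb < 5.1:
        return None  # probably not enough memory without quantization
    if gpu_mem_gb < 8.1:
        return True, "google/gemma-2b-it"   # Gemma 2B in 4-bit precision
    if gpu_mem_gb < 19.0:
        return False, "google/gemma-2b-it"  # Gemma 2B in float16
    return False, "google/gemma-7b-it"      # Gemma 7B

print(choose_gemma(15))  # (False, 'google/gemma-2b-it')
```

A caller can then check for `None` before trying to load a model, for example by printing a warning and stopping early.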


In [39]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.utils import is_flash_attn_2_available

from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)


if (is_flash_attn_2_available()) and (torch.cuda.get_device_capability(0)[0] >= 8):
    attn_implementation = "flash_attention_2"
else:
    attn_implementation = "sdpa"

print(f"[INFO] Using attention implementation: {attn_implementation}")

print(f"[INFO] Using model id: {model_id}")

tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path=model_id,
)

llm_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_id,
    torch_dtype=torch.float16,
    quantization_config=quantization_config if use_quantization_config else None,
    low_cpu_mem_usage=False,
    attn_implementation=attn_implementation
)

if not use_quantization_config:
    llm_model.to(device)

[INFO] Using attention implementation: sdpa
[INFO] Using model id: google/gemma-2b-it



In [40]:
llm_model

GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 2048, padding_idx=0)
    (layers): ModuleList(
      (0-17): 18 x GemmaDecoderLayer(
        (self_attn): GemmaAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear(in_features=2048, out_features=16384, bias=False)
          (up_proj): Linear(in_features=2048, out_features=16384, bias=False)
          (down_proj): Linear(in_features=16384, out_features=2048, bias=False)
          (act_fn): GELUActivation()
        )
        (input_layernorm): GemmaRMSNorm((2048,), eps=1e-06)
        (post_attention_layernorm): GemmaRMSNorm((2048,), eps=1e-06)
      )
    )
    (norm): GemmaRMSNorm((2048,), 

In [41]:
def get_model_nparam(model: torch.nn.Module):
    """Count the total number of parameters in a model"""
    return sum(param.numel() for param in model.parameters())


def get_model_memsize(model: torch.nn.Module):
    """Get how much memory a model takes up"""

    mem_params = sum(
        [param.nelement() * param.element_size() for param in model.parameters()]
    )
    mem_buffers = sum(
        [buf.nelement() * buf.element_size() for buf in model.buffers()]
    )

    model_mem_bytes = mem_params + mem_buffers
    model_mem_mb = model_mem_bytes / (1024 ** 2)
    model_mem_gb = model_mem_bytes / (1024 ** 3)

    return {
        "model_mem_bytes" : model_mem_bytes,
        "model_mem_mb" : round(model_mem_mb, 2),
        "model_mem_gb" : round(model_mem_gb, 2),
    }

In [42]:
# get the number of parameters in our model
get_model_nparam(llm_model)

2506172416

In [43]:
# get the memory requirement of our model
get_model_memsize(llm_model)

{'model_mem_bytes': 5012345344, 'model_mem_mb': 4780.15, 'model_mem_gb': 4.67}
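
The figure above can be cross-checked by hand: weight memory is roughly the parameter count times the bytes per parameter. The sketch below is a back-of-the-envelope estimate only. `estimate_weight_gb` and the `BYTES_PER_PARAM` table are hypothetical names introduced here, and the 4-bit value ignores quantization overhead, so treat it as a lower bound.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# N_PARAMS is the value printed by get_model_nparam(llm_model) above.
N_PARAMS = 2_506_172_416

# Assumed storage cost per parameter for each precision (illustrative table).
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "4bit": 0.5,
}


def estimate_weight_gb(n_params: int, dtype: str) -> float:
    """Return the approximate size of the weights in GB (2**30 bytes)."""
    return round(n_params * BYTES_PER_PARAM[dtype] / 2**30, 2)


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: ~{estimate_weight_gb(N_PARAMS, dtype)} GB")
```

The float16 estimate matches the 4.67 GB reported by `get_model_memsize`. Generation needs extra memory on top of the weights (activations and the key/value cache), which is one reason the thresholds in the model-selection cell leave some headroom.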

### Generating text with the LLM

In [44]:
input_text = (
    "What are the macronutrients,"
    " and what roles do they play in the human body?"
)

print(
    f"Query: {input_text}"
)

dialog_template = [
    {
        "role" : "user",
        "content" : input_text,
    }
]

prompt = tokenizer.apply_chat_template(
    conversation=dialog_template,
    tokenize=False,
    add_generation_prompt=True,
)

print(
    f"\nPrompt (formatted):\n{prompt}"
)

Query: What are the macronutrients, and what roles do they play in the human body?

Prompt (formatted):
<bos><start_of_turn>user
What are the macronutrients, and what roles do they play in the human body?<end_of_turn>
<start_of_turn>model
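
To make the template less of a black box, here is a minimal sketch that rebuilds the formatted prompt printed above by hand for a single user turn. `format_gemma_turn` is a hypothetical helper, not part of `transformers`. The real template ships with the tokenizer and also handles multi-turn conversations, so `apply_chat_template` remains the safer choice in practice.

```python
def format_gemma_turn(user_message: str) -> str:
    """Hand-written equivalent of the single-turn prompt shown above."""
    return (
        "<bos><start_of_turn>user\n"          # start of the user turn
        f"{user_message}<end_of_turn>\n"      # user content, then end of turn
        "<start_of_turn>model\n"              # generation prompt for the model
    )


print(format_gemma_turn("What are the macronutrients?"))
```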



In [45]:
%%time

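# Tokenize the prompt and move the tensors to the GPU.
# The chat template already prepended <bos>, and the tokenizer adds another by
# default, hence the two leading 2s in the output below; passing
# add_special_tokens=False to the tokenizer call would avoid the duplicate.
input_ids = tokenizer(prompt, return_tensors="pt").to(device)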
print(f"Model input (tokenized):\n{input_ids}")

outputs = llm_model.generate(
    **input_ids,
    max_new_tokens=256,
)

print(f"Model output (tokens):\n{outputs[0]}\n")

Model input (tokenized):
{'input_ids': tensor([[     2,      2,    106,   1645,    108,   1841,    708,    573, 186809,
         184592, 235269,    578,   1212,  16065,    749,    984,   1554,    575,
            573,   3515,   2971, 235336,    107,    108,    106,   2516,    108]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1]], device='cuda:0')}
Model output (tokens):
tensor([     2,      2,    106,   1645,    108,   1841,    708,    573, 186809,
        184592, 235269,    578,   1212,  16065,    749,    984,   1554,    575,
           573,   3515,   2971, 235336,    107,    108,    106,   2516,    108,
         21404, 235269,   1517, 235303, 235256,    476,  25497,    576,    573,
        186809, 184592,    578,   1024,  16065,    575,    573,   3515,   2971,
        235292,    109,    688,  12298,   1695, 184592,  66058,    109, 235287,
          5231, 156615,  56227,  66058,    108,    

In [46]:
# Decode the output tokens to text
outputs_decoded = tokenizer.decode(outputs[0])
print(
    f"Model output (decoded):\n{outputs_decoded}\n"
)

Model output (decoded):
<bos><bos><start_of_turn>user
What are the macronutrients, and what roles do they play in the human body?<end_of_turn>
<start_of_turn>model
Sure, here's a breakdown of the macronutrients and their roles in the human body:

**Macronutrients:**

* **Carbohydrates:**
    * Provide energy for the body's cells and tissues.
    * Carbohydrates are the primary source of energy for most cells.
    * Complex carbohydrates are those that take longer to digest, such as whole grains, fruits, and vegetables.
    * Simple carbohydrates are those that are quickly digested, such as sugar, starch, and lactose.

* **Proteins:**
    * Build and repair tissues, enzymes, and hormones.
    * Proteins are essential for immune function, hormone production, and tissue repair.
    * There are different types of proteins, each with specific functions.

* **Fats:**
    * Provide energy, insulation, and help absorb vitamins.
    * Healthy fats include olive oil, avocado, nuts, and seeds.
  

In [47]:

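def format_output(text: str) -> str:
    """Strip the prompt and the <bos>/<eos> markers from decoded model output"""
    # Reads the global `prompt` at call time, exactly as the lambda version did.
    return (
        text
        .replace(prompt, '')
        .replace('<bos>', '')
        .replace('<eos>', '')
    )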

print(f"Input Text: {input_text}\n")
print(f"Output Text:\n{format_output(outputs_decoded)}")

Input Text: What are the macronutrients, and what roles do they play in the human body?

Output Text:
Sure, here's a breakdown of the macronutrients and their roles in the human body:

**Macronutrients:**

* **Carbohydrates:**
    * Provide energy for the body's cells and tissues.
    * Carbohydrates are the primary source of energy for most cells.
    * Complex carbohydrates are those that take longer to digest, such as whole grains, fruits, and vegetables.
    * Simple carbohydrates are those that are quickly digested, such as sugar, starch, and lactose.

* **Proteins:**
    * Build and repair tissues, enzymes, and hormones.
    * Proteins are essential for immune function, hormone production, and tissue repair.
    * There are different types of proteins, each with specific functions.

* **Fats:**
    * Provide energy, insulation, and help absorb vitamins.
    * Healthy fats include olive oil, avocado, nuts, and seeds.
    * Trans fats can raise cholesterol levels and increase the r
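
`format_output` removes the prompt by string replacement. An alternative, sketched here with plain lists, is to drop the prompt tokens before decoding, since `generate` returns the prompt ids followed by the newly generated ids (as the token output above shows). `new_token_ids` is a hypothetical helper, and the toy ids are copied from the printed tensors.

```python
def new_token_ids(output_ids: list[int], n_prompt_tokens: int) -> list[int]:
    """Return only the ids that were generated after the prompt."""
    return output_ids[n_prompt_tokens:]


# Toy example: the first five ids stand in for the prompt, the rest for the answer.
prompt_ids = [2, 2, 106, 1645, 108]
output_ids = prompt_ids + [21404, 235269, 1517]

print(new_token_ids(output_ids, len(prompt_ids)))  # [21404, 235269, 1517]
```

With the notebook's own objects, the same idea would read `tokenizer.decode(outputs[0][input_ids["input_ids"].shape[1]:], skip_special_tokens=True)`, which also removes special tokens without listing them by hand.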

### Augmenting prompt with contextual chunks

In [48]:
import textwrap

def print_wrapped(text, wrap_length=79):
    """
    Print text wrapped to `wrap_length` columns while preserving
    the indentation of the LLM output and the prompt.
    """

    for line in text.splitlines():
        # Preserve leading whitespace (indentation)
        indent = len(line) - len(line.lstrip())
        wrapped = textwrap.fill(
            line,
            width=wrap_length,
            subsequent_indent=' ' * indent,
            replace_whitespace=False,
            drop_whitespace=False,
        )
        print(wrapped)


def prompt_builder(query: str, context: list[dict], tokenizer: AutoTokenizer) -> str:
    """
    Augments query with text-based context.
    """

    context = "- " + "\n- ".join([
        item["chunk"] for item in context
    ])

    base_prompt = """Based on the following context items, please answer the query.
Give yourself room to think by extracting relevant passages from the context before answering the query.
Don't return the thinking, only return the answer.
Make sure your answers are as explanatory as possible.
Use the following examples as reference for the ideal answer style.
\nExample 1:
Query: What are the fat-soluble vitamins?
Answer: The fat-soluble vitamins include Vitamin A, Vitamin D, Vitamin E, and Vitamin K. These vitamins are absorbed along with fats in the diet and can be stored in the body's fatty tissue and liver for later use. Vitamin A is important for vision, immune function, and skin health. Vitamin D plays a critical role in calcium absorption and bone health. Vitamin E acts as an antioxidant, protecting cells from damage. Vitamin K is essential for blood clotting and bone metabolism.
\nExample 2:
Query: What are the causes of type 2 diabetes?
Answer: Type 2 diabetes is often associated with overnutrition, particularly the overconsumption of calories leading to obesity. Factors include a diet high in refined sugars and saturated fats, which can lead to insulin resistance, a condition where the body's cells do not respond effectively to insulin. Over time, the pancreas cannot produce enough insulin to manage blood sugar levels, resulting in type 2 diabetes. Additionally, excessive caloric intake without sufficient physical activity exacerbates the risk by promoting weight gain and fat accumulation, particularly around the abdomen, further contributing to insulin resistance.
\nExample 3:
Query: What is the importance of hydration for physical performance?
Answer: Hydration is crucial for physical performance because water plays key roles in maintaining blood volume, regulating body temperature, and ensuring the transport of nutrients and oxygen to cells. Adequate hydration is essential for optimal muscle function, endurance, and recovery. Dehydration can lead to decreased performance, fatigue, and increased risk of heat-related illnesses, such as heat stroke. Drinking sufficient water before, during, and after exercise helps ensure peak physical performance and recovery.
\nNow use the following context items to answer the user query:
{context}
\nRelevant passages: <extract relevant passages from the context here>
User query: {query}
Answer:"""

    augmented = base_prompt.format(
        context=context,
        query=query,
    )

    dialog_template = [{
        "role" : "user",
        "content" : augmented,
    }]

    prompt = tokenizer.apply_chat_template(
        conversation=dialog_template,
        tokenize=False,
        add_generation_prompt=True,
    )

    return prompt
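
The context-formatting step in `prompt_builder` is easy to check in isolation. The sketch below uses placeholder passages and a shortened template, both made up for illustration. Only the `"chunk"` key and the dash-joined layout come from the function above, and `format_context` is a hypothetical name.

```python
# Placeholder retrieval results; real items come from the retrieval step.
toy_context = [
    {"chunk": "First retrieved passage."},
    {"chunk": "Second retrieved passage."},
]


def format_context(items: list[dict]) -> str:
    """Join chunk texts into a dash-prefixed list, one item per line."""
    return "- " + "\n- ".join(item["chunk"] for item in items)


# Shortened stand-in for base_prompt, keeping the same two placeholders.
template = "Context items:\n{context}\nUser query: {query}\nAnswer:"

print(template.format(
    context=format_context(toy_context),
    query="How does saliva help with digestion?",
))
```

Because the chunk text is passed in as a `format` argument and is not part of the template string itself, curly braces inside a chunk do not interfere with the `{context}` and `{query}` placeholders.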

In [49]:
# Nutrition-style questions generated with GPT4
gpt4_questions = [
    "What are the macronutrients, and what roles do they play in the human body?",
    "How do vitamins and minerals differ in their roles and importance for health?",
    "Describe the process of digestion and absorption of nutrients in the human body.",
    "What role does fibre play in digestion? Name five fibre containing foods.",
    "Explain the concept of energy balance and its importance in weight management."
]

# Manually created question list
manual_questions = [
    "How often should infants be breastfed?",
    "What are symptoms of pellagra?",
    "How does saliva help with digestion?",
    "What is the RDI for protein per day?",
    "water soluble vitamins"
]

query_list = gpt4_questions + manual_questions

In [50]:
import random

query = random.choice(query_list)

print(f"Query: {query}")

scores, indices = retrieve_chunks(query, embeddings, embedding_model)

context = [chunk_embedding_data[i] for i in indices]

prompt = prompt_builder(
    query,
    context,
    tokenizer
)

print_wrapped(prompt)

Query: Describe the process of digestion and absorption of nutrients in the human body.
<bos><start_of_turn>user
Based on the following context items, please answer the query.
Give yourself room to think by extracting relevant passages from the context 
before answering the query.
Don't return the thinking, only return the answer.
Make sure your answers are as explanatory as possible.
Use the following examples as reference for the ideal answer style.

Example 1:
Query: What are the fat-soluble vitamins?
Answer: The fat-soluble vitamins include Vitamin A, Vitamin D, Vitamin E, and 
Vitamin K. These vitamins are absorbed along with fats in the diet and can be 
stored in the body's fatty tissue and liver for later use. Vitamin A is 
important for vision, immune function, and skin health. Vitamin D plays a 
critical role in calcium absorption and bone health. Vitamin E acts as an 
antioxidant, protecting cells from damage. Vitamin K is essential for blood 
clotting and bone metabolism.

E
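
The generation cell that follows samples with `do_sample=True` and `temperature=0.7`. As a rough illustration of what the temperature does, the sketch below divides a set of toy logits by the temperature before applying softmax. Values below 1.0 sharpen the distribution towards the most likely token. The function name and logits are made up for the example, and the library's sampling pipeline may apply further processing.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens

for temperature in (1.0, 0.7):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(f"temperature={temperature}: {[round(p, 3) for p in probs]}")
```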

In [54]:
input_ids = tokenizer(prompt, return_tensors="pt").to(device)

outputs = llm_model.generate(
    **input_ids,
    temperature=0.7,
    do_sample=True,
    max_new_tokens=512,
)

output_text = tokenizer.decode(outputs[0])

print(f"Query: {query}\n")
print_wrapped(f"RAG answer:\n{format_output(output_text)}")

Query: Describe the process of digestion and absorption of nutrients in the human body.

RAG answer:
Sure, here's a summary of the process of digestion and absorption of nutrients 
in the human body:

**The process of digestion begins even before you put food into your mouth.** 
When you feel hungry, your body sends a message to your brain that it is time 
to eat. The brain then tells the mouth to get ready, and you start to salivate 
in preparation for a meal. Once you have eaten, the digestive system starts the
 process that breaks down the components of food into smaller components that 
can be absorbed and taken into the body.

**The digestive system is one of the eleven organ systems of the human body, 
and it is composed of several hollow tube-shaped organs including the mouth, 
pharynx, esophagus, stomach, small intestine, large intestine (colon), rectum, 
and anus.** The digestive system functions on two levels: mechanically to move 
and mix ingested food and chemically to brea