<a href="https://colab.research.google.com/github/wcmay/deeplearningAIT/blob/main/Carl_May_AIT_RAG_Assessment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG + LLM Assessment

Your task is to create a Retrieval-Augmented Generation (RAG) system using a Large Language Model (LLM). The RAG system should be able to retrieve relevant information from a knowledge base and generate coherent and informative responses to user queries.

Steps:

1. Choose a domain and collect a suitable dataset of documents (at least 5 documents - PDFs or HTML pages) to serve as the knowledge base for your RAG system. Select one of the following topics:
   * latest scientific papers from arxiv.org,
   * fiction books released,
   * legal documents or,
   * social media posts.

   Make sure that the documents are newer than the training dataset of the applied LLM. (20 points)

2. Create three relevant prompts to the dataset, and one irrelevant prompt. (20 points)

3. Load an LLM with at least 5B parameters. (10 points)

4. Test the LLM with your prompts. The goal should be that without the collected dataset your model is unable to answer the question. If it gives you a good answer, select another question to answer and maybe a different dataset. (10 points)

5. Create a LangChain-based RAG system by setting up a vector database from the documents. (20 points)

6. Provide your three relevant and one irrelevant prompts to your RAG system. For the relevant prompts, your RAG system should return relevant answers, and for the irrelevant prompt, an empty answer. (20 points)


In [None]:
# Quote version specifiers so the shell does not treat ">=" as a redirect,
# and send stdout to /dev/null instead of a file named "null".
!pip install "transformers>=4.32.0" "optimum>=1.12.0" > /dev/null
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ > /dev/null
!pip install langchain > /dev/null
!pip install chromadb > /dev/null
!pip install sentence_transformers > /dev/null # ==2.2.2
!pip install unstructured > /dev/null
!pip install pdf2image > /dev/null
!pip install pdfminer.six > /dev/null
!pip install unstructured-pytesseract > /dev/null
!pip install unstructured-inference > /dev/null
!pip install faiss-gpu > /dev/null
!pip install pikepdf > /dev/null
!pip install pypdf > /dev/null
!pip install accelerate > /dev/null
!pip install pillow_heif > /dev/null
!pip install -i https://pypi.org/simple/ bitsandbytes > /dev/null

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy 3.7.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.
weasel 0.3.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
imageio 2.31.6 requires pillow<10.1.0,>=8.3.2, but you have pillow 10.3.0 which is incompatible.

In [None]:
import os
# Kill the current process so Colab restarts the runtime and picks up the newly installed packages
os.kill(os.getpid(), 9)

In [None]:
from huggingface_hub import login
from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline, BitsAndBytesConfig
from textwrap import fill
from langchain.prompts import PromptTemplate
import locale
from langchain.document_loaders import UnstructuredURLLoader
from langchain.vectorstores.utils import filter_complex_metadata # 'filter_complex_metadata' removes complex metadata that are not in str, int, float or bool format
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain.chains import RetrievalQA

locale.getpreferredencoding = lambda: "UTF-8"

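# You need a private Hugging Face User Access Token
# to access models that require an accepted licence.
# Do not hard-code the token in the notebook: read it from the environment or prompt for it.
import os
from getpass import getpass

HUGGINGFACE_UAT = os.environ.get("HF_TOKEN") or getpass("Hugging Face token: ")
login(HUGGINGFACE_UAT)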

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
books = ["https://www.gutenberg.org/cache/epub/73442/pg73442-images.html",
         "https://www.gutenberg.org/cache/epub/73508/pg73508-images.html",
         "https://www.gutenberg.org/cache/epub/73504/pg73504-images.html",
         "https://www.gutenberg.org/cache/epub/73428/pg73428-images.html",
         "https://www.gutenberg.org/cache/epub/72181/pg72181-images.html"
         ]

prompts = ["Describe some of the alien creatures that appear in Edward Hamilton's short stories",
           "Describe the encounter with the fire fountain in \"Sunfire!\" by Edward Hamilton.",
           "In \"The Star-Stealers\" by Edward Hamilton, who is Hurus Hol?",
           "What is the book \"The Adventures of Huckleberry Finn\" by Mark Twain about?"]

In [None]:
model_name = "microsoft/Phi-3-mini-128k-instruct"

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             quantization_config=quantization_config,
                                             trust_remote_code=True)

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

gen_cfg = GenerationConfig.from_pretrained(model_name)
gen_cfg.max_new_tokens=512
gen_cfg.temperature=0.0000001 # 0.0 # For RAG we would like to have deterministic answers
gen_cfg.return_full_text=True
gen_cfg.do_sample=True
gen_cfg.repetition_penalty=1.11

pipe=pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=gen_cfg
)

llm = HuggingFacePipeline(pipeline=pipe)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
template = """
<|user|>
{text}<|end|>
<|assistant|>
"""

prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)

result = llm(prompt.format(text=prompts[0]))
print(fill(result.strip(), width=100))

#The LLM's response is not accurate

<|user|> Describe some of the alien creatures that appear in Edward Hamilton's short stories<|end|>
<|assistant|>  Edward Hamilton, a prolific writer known for his science fiction and fantasy works,
has created an array of intriguing extraterrestrial beings across his numerous short stories. Here
are a few examples:   1. **The Sphinx-like Creature** - In one story, Hamilton introduces us to a
creature with the body of a lion but the head of a sphinx. This being is characterized by its
riddling speech patterns and enigmatic nature, often leaving readers pondering over their true
intentions or origins.   2. **The Crystal Entity** - Another notable creation is a sentient
crystalline entity from another dimension. It communicates through reflections within its own
structure, creating complex images and scenes which serve as messages or puzzles for those who dare
to interact with it.   3. **The Shapeshifting Beast** - A particularly versatile creature featured
in several tales can alter its

In [None]:
template = """
<|system|>
Use the following context to answer the question at the end. Do not use any other information. Don't try to make up an answer. If the question is not relevant to the context, DO NOT RESPOND AND SAY NOTHING.

{context}

Answer the following question only if it relates to the context above. If the question is not relevant to the context, DO NOT RESPOND AND SAY NOTHING. <|end|>
<|user|>
{question}<|end|>
<|assistant|>
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],  # must match the placeholders in the template
    template=template,
)
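
Both templates above follow the same tag pattern: each turn is `<|role|>`, a newline, the content, then `<|end|>`, and the prompt finishes with an open `<|assistant|>` tag. A minimal sketch of that pattern as a helper, assuming only the tags that appear in the templates in this notebook. `build_phi3_prompt` is a hypothetical name, not part of LangChain or transformers.

```python
def build_phi3_prompt(messages):
    """Render a list of {"role", "content"} dicts in the tag format used above.

    Hypothetical helper for illustration; it only mirrors the hand-written
    templates in this notebook.
    """
    # One "<|role|>\ncontent<|end|>\n" segment per message
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    # Leave the assistant turn open so the model writes the answer
    return "".join(parts) + "<|assistant|>\n"


example = build_phi3_prompt([
    {"role": "system", "content": "Use only the context to answer."},
    {"role": "user", "content": "Who is Hurus Hol?"},
])
print(example)
```

In transformers, `tokenizer.apply_chat_template` can build the prompt from the chat template shipped with the model, which avoids maintaining the tags by hand.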

In [None]:
web_loader = UnstructuredURLLoader(
    urls=books, mode="elements", strategy="fast",
    )
web_doc = web_loader.load()
updated_web_doc = filter_complex_metadata(web_doc)

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=512)
chunked_web_doc = text_splitter.split_documents(updated_web_doc)
len(chunked_web_doc)

1083
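
With `chunk_size=2048` and `chunk_overlap=512`, neighbouring chunks can share up to 512 characters, so a sentence cut at one chunk boundary is still intact in the next chunk. The sketch below shows the idea with a plain sliding window over characters. It is a simplification: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and line breaks, so its chunks will not match these exactly. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size=2048, chunk_overlap=512):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only; not how RecursiveCharacterTextSplitter
    chooses its split points.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Tiny example: window of 4 characters, 2 of which repeat in the next chunk
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```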

In [None]:
embeddings = HuggingFaceEmbeddings() # default model_name="sentence-transformers/all-mpnet-base-v2"

db_web = FAISS.from_documents(chunked_web_doc, embeddings)



In [None]:
prompt = PromptTemplate(template=template, input_variables=["context", "question"])
Chain_web = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    # retriever=db.as_retriever(search_type="similarity_score_threshold", search_kwargs={'k': 5, 'score_threshold': 0.8})
    # Similarity Search is the default way to retrieve documents relevant to a query, but we can use MMR by setting search_type = "mmr"
    # k defines how many documents are returned; defaults to 4.
    # score_threshold sets a minimum relevance for documents returned by the retriever when using the "similarity_score_threshold" search type.
    # return_source_documents=True, # Optional parameter, returns the source documents used to answer the question
    retriever=db_web.as_retriever(search_type="similarity_score_threshold", search_kwargs={'k': 10, 'score_threshold': 0.1}),
    chain_type_kwargs={"prompt": prompt},
)
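
The comments above mention MMR (maximal marginal relevance) as an alternative to plain similarity search. A rough sketch of the selection rule in pure Python, assuming cosine similarity on small toy vectors: each step picks the document that is most relevant to the query while being least similar to the documents already picked. `mmr_select` and the toy vectors are hypothetical and are not LangChain's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, doc_vecs, k=4, lambda_mult=0.5):
    """Return the indices of up to k documents chosen by maximal marginal relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values favour diversity.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.9, 0.12], [0.7, -0.7]]  # docs 0 and 1 are near-duplicates
print(mmr_select(query, docs, k=2, lambda_mult=0.5))
print(mmr_select(query, docs, k=2, lambda_mult=1.0))
```

In LangChain the equivalent switch is `as_retriever(search_type="mmr", ...)`, which can help when several retrieved chunks are near-identical, such as the repeated `Author: Edmond Hamilton` elements.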

In [None]:
result = Chain_web.invoke(prompts[0])
result

{'query': "Describe some of the alien creatures that appear in Edward Hamilton's short stories",
 'result': '\n<|system|>\nUse the following context to answer the question at the end. Do not use any other information. Don\'t try to make up an answer. If the question is not relevant to the context, DO NOT RESPOND AND SAY NOTHING.\n\nAuthor: Edmond Hamilton\n\nAuthor: Edmond Hamilton\n\nAuthor: Edmond Hamilton\n\nAuthor: Edmond Hamilton\n\nAuthor: Edmond Hamilton\n\nFor in these creatures was no single point of resemblance to anything\r\nhuman, nothing which the appalled intelligence could seize upon as\r\nfamiliar. Imagine an upright cone of black flesh, several feet in\r\ndiameter and three or more in height, supported by a dozen or more\r\nsmooth long tentacles which branched from its lower end—supple,\r\nboneless octopus-arms which held the cone-body upright and which served\r\nboth as arms and legs. And near the top of that cone trunk were the\r\nonly features, the twin tiny orifice
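
The `result` string above starts with the system prompt and the retrieved context, so the model's own answer only begins after the last `<|assistant|>` tag. A minimal sketch of a helper that keeps just that part, assuming the tag format from the template in this notebook. `extract_answer` is a hypothetical name, not a LangChain or transformers function.

```python
def extract_answer(generated, marker="<|assistant|>"):
    """Return the text after the last assistant tag, with whitespace stripped.

    If the marker is missing, the whole string is returned (stripped).
    """
    # rpartition splits on the LAST occurrence of the marker
    _, _, answer = generated.rpartition(marker)
    return answer.strip()


raw = "\n<|system|>\nContext...<|end|>\n<|user|>\nQuestion?<|end|>\n<|assistant|>\n The answer text. "
print(extract_answer(raw))
```

It could be applied as `extract_answer(result["result"])` on the dictionaries returned by `Chain_web.invoke`.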

In [None]:
result = Chain_web.invoke(prompts[1])
result

{'query': 'Describe the encounter with the fire fountain in "Sunfire!" by Edward Hamilton.',
 'result': '\n<|system|>\nUse the following context to answer the question at the end. Do not use any other information. Don\'t try to make up an answer. If the question is not relevant to the context, DO NOT RESPOND AND SAY NOTHING.\n\nThe five bright things had flashed down toward the great fire-fountain.\r\nThey plunged into it, out of it, climbed swift as the eye could follow,\r\nracing up its mighty geyser, frolicking in it joyously. The fountain\r\nraved higher and the five sped up and whirled and danced upon its\r\nrising plume, and Kellard thought that they were laughing.\n\nIn the gray wall to his right, miles away, was a great, slitlike\r\nopening near the roof, an opening through which there poured down a\r\nmighty torrent of blazing, liquid fire, a colossal Niagara of molten\r\nflame whose crimson, blazing radiance shot out a quivering glare which\r\nlit luridly the whole mighty cav

In [None]:
result = Chain_web.invoke(prompts[2])
result

{'query': 'In "The Star-Stealers" by Edward Hamilton, who is Hurus Hol?',
 'result': '\n<|system|>\nUse the following context to answer the question at the end. Do not use any other information. Don\'t try to make up an answer. If the question is not relevant to the context, DO NOT RESPOND AND SAY NOTHING.\n\nHurus Hol ceased, intently scanning my face. A moment I sat silent,\r\nthen rose and stepped to the great open window at the room\'s far side.\r\nOutside stretched the greenery of gardens, and beyond them the white\r\nroofs of buildings, gleaming beneath the faint sunlight. Instinctively\r\nmy eyes went up to the source of that light, the tiny sun, small and\r\nfaint and far, here, but still—the sun. A long moment I gazed up\r\ntoward it, and then turned back to Hurus Hol.\n\nAt once Hurus Hol led the way directly down the street toward the\r\nheart of the city, and as we hastened on beside him he answered to my\r\nquestion, "We must get to the city\'s center. There\'s something t

In [None]:
result = Chain_web.invoke(prompts[3])
result

{'query': 'What is the book "The Adventures of Huckleberry Finn" by Mark Twain about?',
 'result': '\n<|system|>\nUse the following context to answer the question at the end. Do not use any other information. Don\'t try to make up an answer. If the question is not relevant to the context, DO NOT RESPOND AND SAY NOTHING.\n\nUntil late that night the town\'s bright-lighted streets remained\r\ncrowded with unaccustomed throngs of citizens arguing the matter,\r\nsometimes heatedly, or exchanging jests concerning it with passing\r\nfriends. By most, indeed, the matter was treated more as an elaborate\r\njoke than anything else, yet one might have sensed also among those\r\nshifting throngs an unspoken elation, a curious pride. Whatever was\r\nbehind the thing, they felt, it was at least bringing fame to Brinton.\r\nNorth and south and east and west, they knew, the wires would be\r\nflashing the story. All the nation would read of it, in the morning.\r\nAnd in the morning, too, the swamp wou
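
For the irrelevant prompt the retriever still returned passages from the collected stories, so with `score_threshold` at 0.1 the empty answer depends entirely on the LLM obeying the system prompt. One possible alternative, sketched below under stated assumptions, is to check the retrieval scores first and skip the LLM call when nothing is relevant enough. `answer_or_empty`, its `search` and `generate` callables, and the 0.5 cut-off are all hypothetical; the cut-off would need tuning on real scores.

```python
from typing import Callable, List, Tuple


def answer_or_empty(
    query: str,
    search: Callable[[str], List[Tuple[str, float]]],
    generate: Callable[[str, str], str],
    min_score: float = 0.5,
) -> str:
    """Return a generated answer, or "" when no passage reaches min_score.

    search(query) is assumed to return (passage_text, relevance_score) pairs
    with higher scores meaning more relevant; generate(context, query) is
    assumed to wrap the LLM call.
    """
    hits = [(text, score) for text, score in search(query) if score >= min_score]
    if not hits:
        return ""  # nothing relevant retrieved: do not call the LLM at all
    context = "\n\n".join(text for text, _ in hits)
    return generate(context, query)


# Stand-in functions so the sketch runs without a model or vector store
def fake_search(query):
    if "Star-Stealers" in query:
        return [("relevant passage", 0.8), ("weak match", 0.2)]
    return [("weak match", 0.2)]


def fake_generate(context, query):
    return f"Answer using: {context}"


print(repr(answer_or_empty("Who appears in The Star-Stealers?", fake_search, fake_generate)))
print(repr(answer_or_empty("What is Huckleberry Finn about?", fake_search, fake_generate)))
```

With the FAISS store from this notebook, `search` could wrap `db_web.similarity_search_with_relevance_scores(query, k=10)`, which returns `(Document, score)` pairs, and `generate` could wrap the prompt formatting plus the `llm` call.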