<a href="https://colab.research.google.com/github/yojuna/local_llm_RAG/blob/main/schrodinger_what_is_life_mistral_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Notebook running a local Mistral 7b Instruct Model, chained with Retrieval Augmented Generation (RAG), for conversing with the legendary collection of essays, in Erwin Schrödinger's What Is Life?

In [1]:
# colab autoreload

%load_ext autoreload
%autoreload 2

In [None]:
## Installation/setup

# Reading in PDF Files
!pip install -q -U pypdf
# Setting Up Vector Store
!pip install -q -U chromadb
# Using Llama-7b-GPTQ LLM model in HuggingFace
!pip install q -U torch auto-gptq transformers optimum
# LangChain - Loading PDFs, Text Chunking, BGE Embeddings, Retrieval QA Chain
!pip install -q -U langchain sentence_transformers

!pip install -q -U torch datasets transformers tensorflow langchain playwright html2text sentence_transformers faiss-cpu
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 trl==0.4.7

In [3]:
import os

# Import torch
import torch

# Import for loading PDFs from Google Drive.
# Note: Not needed if GDrive is already mounted or we are using wget to get files from Web.
# from google.colab import drive

# Imports for LLM
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline
)

from langchain.llms import HuggingFacePipeline
# from langchain import PromptTemplate  #, LLMChain
from langchain.prompts import PromptTemplate
from langchain.schema.runnable import RunnablePassthrough

# Imports to read PDFs and other types of documents
from langchain.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_transformers import Html2TextTransformer
from langchain.document_loaders import AsyncChromiumLoader

from peft import LoraConfig, PeftModel

## vector stores: Chroma and FAISS
from langchain.vectorstores import Chroma
from langchain.vectorstores import FAISS

# Import the embeddings
from langchain.embeddings import HuggingFaceBgeEmbeddings

# Imports for QA Retrieval Chain
from langchain.chains import RetrievalQA

# Import to Clenup LLM Output
import textwrap
from pprint import pprint

In [5]:
# import os
# import torch
# from transformers import (
#     AutoModelForCausalLM,
#     AutoTokenizer,
#     BitsAndBytesConfig,
#     pipeline
# )
# from datasets import load_dataset
# from peft import LoraConfig, PeftModel

# from langchain.text_splitter import CharacterTextSplitter
# from langchain.document_transformers import Html2TextTransformer
# from langchain.document_loaders import AsyncChromiumLoader

# from langchain.embeddings.huggingface import HuggingFaceEmbeddings
# from langchain.vectorstores import FAISS

# # Imports to read PDF and setup Chroma Vector Store
# from langchain.document_loaders import PyPDFLoader, DirectoryLoader
# from langchain.vectorstores import Chroma
# from langchain.text_splitter import RecursiveCharacterTextSplitter

# from langchain.prompts import PromptTemplate
# from langchain.schema.runnable import RunnablePassthrough
# from langchain.llms import HuggingFacePipeline

## Setup the LLM

In [5]:
#################################################################
# The Model Id
#################################################################

model_name='mistralai/Mistral-7B-Instruct-v0.2'

In [6]:
#################################################################
# Tokenizer
#################################################################

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

#################################################################
# bitsandbytes parameters
#################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

#################################################################
# Set up quantization config
#################################################################
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

#################################################################
# Load pre-trained config
#################################################################
mistral_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [8]:
# check the number of trainable parameters

def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(mistral_model))

trainable model parameters: 262410240
all model parameters: 3752071168
percentage of trainable model parameters: 6.99%


In [9]:
# setup the LLM pipelines

standalone_query_generation_pipeline = pipeline(
 model=mistral_model,
 tokenizer=tokenizer,
 task="text-generation",
 temperature=0.0,
 repetition_penalty=1.1,
 return_full_text=True,
 max_new_tokens=1000,
)
standalone_query_generation_llm = HuggingFacePipeline(pipeline=standalone_query_generation_pipeline)

response_generation_pipeline = pipeline(
 model=mistral_model,
 tokenizer=tokenizer,
 task="text-generation",
 temperature=0.2,
 repetition_penalty=1.1,
 return_full_text=True,
 max_new_tokens=1000,
)
response_generation_llm = HuggingFacePipeline(pipeline=response_generation_pipeline)

## Get the documents/ data

In [11]:
# Get the Document / Textbook

## Need to run only once

# ## Feynman lectures on physics
# ## note: ^^ this pdf does not have extractable text for the whole textbook; need to use an OCR version

! mkdir -p docs

! wget https://archive.org/download/feynman-lectures-on-physics-volumes-1-2-3-feynman-leighton-and-sands/Feynman%20Lectures%20on%20Physics%20Volumes%201%2C2%2C3%20-%20Feynman%2C%20Leighton%20and%20Sands.pdf -O docs/archive-feynman-lectures.pdf
# or # ! wget https://antilogicalism.com/wp-content/uploads/2018/04/feynman-lectures.pdf -O docs/feynman-lectures.pdf

! wget http://strangebeautiful.com/other-texts/schrodinger-what-is-life-mind-matter-auto-sketches.pdf -O docs/what-is-life.pdf

## add more .pdf files to the docs folder as needed

--2024-01-29 20:16:03--  https://archive.org/download/feynman-lectures-on-physics-volumes-1-2-3-feynman-leighton-and-sands/Feynman%20Lectures%20on%20Physics%20Volumes%201%2C2%2C3%20-%20Feynman%2C%20Leighton%20and%20Sands.pdf
Resolving archive.org (archive.org)... 207.241.224.2
Connecting to archive.org (archive.org)|207.241.224.2|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://ia802508.us.archive.org/29/items/feynman-lectures-on-physics-volumes-1-2-3-feynman-leighton-and-sands/Feynman%20Lectures%20on%20Physics%20Volumes%201%2C2%2C3%20-%20Feynman%2C%20Leighton%20and%20Sands.pdf [following]
--2024-01-29 20:16:03--  https://ia802508.us.archive.org/29/items/feynman-lectures-on-physics-volumes-1-2-3-feynman-leighton-and-sands/Feynman%20Lectures%20on%20Physics%20Volumes%201%2C2%2C3%20-%20Feynman%2C%20Leighton%20and%20Sands.pdf
Resolving ia802508.us.archive.org (ia802508.us.archive.org)... 207.241.228.198
Connecting to ia802508.us.archive.org (ia802508.u

In [12]:
# load document from directory
loader = DirectoryLoader('docs/', glob="./*.pdf", loader_cls=PyPDFLoader)

documents = loader.load()

In [13]:
# number of pages in the pdfs
len(documents)

1572

In [14]:
# LLM Token Chunksize varies based on Context Window. LLaMA2 Context Window is 4096 tokens.
# For QA want to pick larger chunk size with some overlap to get context.
CHUNK_SIZE, CHUNK_OVERLAP = 1000, 200


text_splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE,
                                               chunk_overlap=CHUNK_OVERLAP)
texts = text_splitter.split_documents(documents)

len(texts)

764

In [15]:
# random sample

text_ID = 700
texts[text_ID]

Document(page_content='MindandMatter\n(notably intheZeeman andStarkeffects)someofthespectral\nlinesarepolarized. Tocomplete thephysical description in\nthisrespect, inwhichthehumaneyeisentirely insensitive,\nyouputapolarizer (aNicolprism)inthepathofthebeam,\nbeforedecomposing it;onslowlyrotating theNicolaroundits\naxiscertain linesareextinguished orreduced tominimal\nbrightness forcertainorientations oftheNicol,whichindicate\nthedirection (orthogonal tothebeam)oftheirtotalorpartial\npolariza tion.\nOncethiswholetechnique isdeveloped, itcanbeextended\nfarbeyond thevisibleregion.Thespectral linesofglowing\nvapours arebynomeansrestricted tothevisibleregion,which\nisnotdistinguished physically. Thelinesformlong,theoret\xad\nicallyinfinite series.Thewave-lengths ofeachseriesare\nconnected byarelatively simplemathematical law,peculiar\ntoit,thatholdsuniformly throughout theserieswithno\ndistinctionofthatpartoftheseriesthathappens tolieinthe\nvisibleregion.Theseseriallawswerefirstfoundempiric

### Alternative Data sources

#### Alternative: Extract data from News APIs

In [None]:
## Optional: If extracting News data using Google News API
# google news api

!pip install GoogleNews
!pip install newspaper3k

In [17]:
from GoogleNews import GoogleNews
from newspaper import Article
import pandas as pd

In [19]:
START_DATE = '05/01/2024'
END_DATE = '28/01/2024'

SEARCH_TERM = 'Finance'

googlenews = GoogleNews(start=START_DATE, end=END_DATE)
googlenews.search(SEARCH_TERM)

result = googlenews.result()

# save a dataframe with the results and display top rows
df = pd.DataFrame(result)
df.head()

Unnamed: 0,title,media,date,datetime,desc,link,img
0,Stock market today: US stocks drift higher to ...,Yahoo Finance,0 mins ago,2024-01-29 20:21:33.762034,,https://finance.yahoo.com/news/stock-market-to...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."
1,panel urges Treasury Dept to make tech firms c...,Yahoo Finance,0 mins ago,2024-01-29 20:21:33.765791,,https://finance.yahoo.com/news/panel-urges-tre...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."
2,January jobs report: One thing that may be pro...,Yahoo Finance,5 mins ago,2024-01-29 20:16:33.768993,,https://finance.yahoo.com/video/january-jobs-r...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."
3,Finance & Restructuring Partner Jeremy Finkels...,The Joplin Globe,1 hour ago,2024-01-29 19:21:33.785958,,https://www.joplinglobe.com/region/national_bu...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."
4,"Oil drops with OPEC+ output steady, traders aw...",Yahoo Finance,1 hour ago,2024-01-29 19:21:33.803468,,https://finance.yahoo.com/news/oil-jumps-attac...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."


In [20]:
# can extract more articles from the API by looping through

NUM_SEARCH_PAGES = 10

for i in range(2, NUM_SEARCH_PAGES):
    googlenews.getpage(i)
    result = googlenews.result()
    df = pd.DataFrame(result)

In [21]:
df

Unnamed: 0,title,media,date,datetime,desc,link,img
0,Stock market today: US stocks drift higher to ...,Yahoo Finance,0 mins ago,2024-01-29 20:21:33.762034,,https://finance.yahoo.com/news/stock-market-to...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."
1,panel urges Treasury Dept to make tech firms c...,Yahoo Finance,0 mins ago,2024-01-29 20:21:33.765791,,https://finance.yahoo.com/news/panel-urges-tre...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."
2,January jobs report: One thing that may be pro...,Yahoo Finance,5 mins ago,2024-01-29 20:16:33.768993,,https://finance.yahoo.com/video/january-jobs-r...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."
3,Finance & Restructuring Partner Jeremy Finkels...,The Joplin Globe,1 hour ago,2024-01-29 19:21:33.785958,,https://www.joplinglobe.com/region/national_bu...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."
4,"Oil drops with OPEC+ output steady, traders aw...",Yahoo Finance,1 hour ago,2024-01-29 19:21:33.803468,,https://finance.yahoo.com/news/oil-jumps-attac...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."
...,...,...,...,...,...,...,...
85,Three Ways Equipment Finance Will Evolve In 2024,Forbes,5 hours ago,2024-01-29 15:23:18.885905,,https://www.forbes.com/sites/forbesbusinesscou...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."
86,Big Tech earnings: What markets are expecting ...,Yahoo Finance,5 hours ago,2024-01-29 15:23:18.906066,,https://finance.yahoo.com/video/big-tech-earni...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."
87,Egypt considers issuing UAE dirham-denominated...,Ahram Online,5 hours ago,2024-01-29 15:23:18.925983,,https://english.ahram.org.eg/NewsContent/3/12/...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."
88,The Zacks Analyst Blog Highlights Simmons Firs...,Yahoo Finance,6 hours ago,2024-01-29 14:23:18.951672,,https://finance.yahoo.com/news/zacks-analyst-b...,"data:image/gif;base64,R0lGODlhAQABAIAAAP//////..."


#### Alternative: Extract web pages/ blog articles

In [None]:
## uncomment if smooth browser functionality is required for using chrome/firefox web drivers

# !playwright install
# !playwright install-deps

In [None]:
# Alternative
# Inference over URL, using chromium driver

import nest_asyncio
nest_asyncio.apply()

# Articles to index
## Andrej Karpathy: Software 2.0 Article
articles = ["https://karpathy.medium.com/software-2-0-a64152b37c35",]

# Scrapes the blogs above
loader = AsyncChromiumLoader(articles)
docs = loader.load()

## Create embeddings and vector db

Create Retriever Embeddings - HF BGE Embeddings

BGE Embeddings are at the top of the leader board on Hugging Face (https://huggingface.co/spaces/mteb/leaderboard).


In [22]:
# BGE Embedding Model for Retrieval. Embedding Size is 768.
model_name = "BAAI/bge-base-en"
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity

model_embedding = HuggingFaceBgeEmbeddings(
                    model_name=model_name,
                    model_kwargs={'device': 'cuda'},
                    encode_kwargs=encode_kwargs
                  )

Create the Vector DB Store Using Chroma DB

In [23]:
%%time
# Embed and store the texts
# Supplying a persist_directory will store the embeddings on disk
# Creating Vector Store takes ~ 2 mins

persist_directory = 'db'

## Here is the nmew embeddings being used
embedding = model_embedding

vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embedding,
                                 persist_directory=persist_directory)

CPU times: user 15.6 s, sys: 88.3 ms, total: 15.7 s
Wall time: 16.5 s


In [24]:
# Returns the Top-k chunks from vectordb. Set to 2 to check.
retriever = vectordb.as_retriever(search_kwargs={"k": 2})

### Check retrieval from Chroma db

In [25]:
# # Approach 1: Use Query to do Similarity Search

query = "What is The Physical Basis of Consciousness?"


docs = vectordb.similarity_search(query)

print("# of results: ", len(docs))

# First page
print(docs[0].page_content)
print("\n")
print(docs[0].metadata)

# Last page of results
print(docs[-1].page_content)
print("\n")
print(docs[-1].metadata)

# of results:  4
CHAPTER I
ThePhysicalBasisofConsciousness
THEPROBLEM
Theworldisaconstruct ofoursensations, perceptions,
memories. Itisconvenient toregarditasexisting objectively
onitsown.Butitcertainly doesnotbecome manifest byits
mereexistence. Itsbecoming manifest isconditional onvery
specialgoings-on inveryspecialpartsofthisveryworld,
namelyoncertaineventsthathappen inabrain.Thatisan
inordinately peculiar kindofimplication, whichprompts the
question: Whatparticular properties distinguish thesebrain
processes andenablethemtoproduce themanifestation? Can
weguesswhichmaterial processes havethispower,which
not?Orsimpler: Whatkindofmaterial process isdirectly
associated withconsciousness?
Arationalist maybeinclined todealcurtlywiththis
question, roughly asfollows.Fromourownexperience, andas
regards thehigheranimals fromanalogy, consciousness is
linkedupwithcertainkindsofeventsinorganized, living
matter,namely, withcertainnervous functions. Howfarback
or'down'intheanimalkingdom thereisst

In [26]:
# # Approach 2: Use Embedding Vector to do Similarity Search
embedding_vector = embedding.embed_query(query)

docs = vectordb.similarity_search_by_vector(embedding_vector)

print("# of results: ", len(docs))

# First page
print(docs[0].page_content)
print("\n")
print(docs[0].metadata)

# Last page of results
print(docs[-1].page_content)
print("\n")
print(docs[-1].metadata)

# of results:  4
CHAPTER I
ThePhysicalBasisofConsciousness
THEPROBLEM
Theworldisaconstruct ofoursensations, perceptions,
memories. Itisconvenient toregarditasexisting objectively
onitsown.Butitcertainly doesnotbecome manifest byits
mereexistence. Itsbecoming manifest isconditional onvery
specialgoings-on inveryspecialpartsofthisveryworld,
namelyoncertaineventsthathappen inabrain.Thatisan
inordinately peculiar kindofimplication, whichprompts the
question: Whatparticular properties distinguish thesebrain
processes andenablethemtoproduce themanifestation? Can
weguesswhichmaterial processes havethispower,which
not?Orsimpler: Whatkindofmaterial process isdirectly
associated withconsciousness?
Arationalist maybeinclined todealcurtlywiththis
question, roughly asfollows.Fromourownexperience, andas
regards thehigheranimals fromanalogy, consciousness is
linkedupwithcertainkindsofeventsinorganized, living
matter,namely, withcertainnervous functions. Howfarback
or'down'intheanimalkingdom thereisst

## Run LLM on data

### Prompt template setup

In [27]:
prompt = "What is Consciousness?"

prompt_template=f'''[INST] <>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<>
{prompt}[/INST]

'''

In [31]:
print("\n\n*** Generate:")

pprint(response_generation_pipeline(prompt_template)[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




*** Generate:
('[INST] <>\n'
 'You are a helpful, respectful and honest assistant. Always answer as '
 'helpfully as possible, while being safe.  Your answers should not include '
 'any harmful, unethical, racist, sexist, toxic, dangerous, or illegal '
 'content. Please ensure that your responses are socially unbiased and '
 'positive in nature. If a question does not make any sense, or is not '
 'factually coherent, explain why instead of answering something not correct. '
 "If you don't know the answer to a question, please don't share false "
 'information.\n'
 '<>\n'
 'What is Consciousness?[/INST]\n'
 '\n'
 "Consciousness refers to an individual's subjective experience of the world "
 'around them. It involves the ability to perceive, process, and respond to '
 'information from the environment, as well as having self-awareness and '
 'introspective abilities. The exact nature of consciousness and how it arises '
 'from physical processes in the brain is still a topic of ongoing

## Setup RAG Chain

RAG Chain = LLM + Retriever + Query Prompt

In [32]:
retriever = vectordb.as_retriever(search_kwargs={"k": 5})

In [33]:
qa_chain = RetrievalQA.from_chain_type(llm=response_generation_llm,
                                  chain_type="stuff",
                                  retriever=retriever,
                                  return_source_documents=True)

### Check RAG responses

In [34]:
query = "What are the author's thoughts on Determinism and Free Will?"

llm_response = qa_chain(query)
llm_response['result'].split('\n')

  warn_deprecated(
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[" Schrödinger expresses his concern about the Western world's lack of appreciation for the importance of the mind and soul, which he believes have been overshadowed by the focus on external objects of cognizance. He acknowledges that psychologists like Jung may be more sensitive to this issue due to their field of study. Schrödinger also recognizes the danger of a rapid withdrawal from the ancient position that the mind and soul are interconnected with matter. However, he does not directly discuss determinism and free will in this passage. Instead, he emphasizes the need for the relatively new science of psychology to have living-space and reconsider the initial gambit, which implies a consideration of both material and non-material aspects of reality."]

In [35]:
query = "Is Life Based on the Laws of Physics? Give your own thoughts about this in the end."

llm_response = qa_chain(query)
llm_response['result'].split('\n')

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[' According to Erwin Schrödinger, life is based on the laws of physics but it requires a new type of physical law to explain its behavior. He believes that living matter has a unique structure that cannot be fully understood through the ordinary laws of physics alone. While the foundation of biology may not have been absolutely essential to dig up the deep roots and find the picture on quantum mechanics, Schrödinger strongly insists on the quantum-mechanical point of view because he believes that it provides a more complete understanding of the biological world. However, he also acknowledges that this new type of physical law is not yet fully understood and may require non-physical or super-physical elements to explain certain aspects of life. In summary, while the laws of physics provide a foundation for understanding life, they do not fully explain its complex behaviors and require a new type of physical law to provide a complete understanding.',
 '',
 'My Thoughts: I agree with Sch

## RAG with better prompting

In [37]:
## Default LLaMA-2 prompt style // taken from example that used Llama2 and not Mistral-7b
## refs:
## https://github.com/jai-llm/RAG_Docs_LLaMA2/blob/main/RAG_HastieBooks_chromaDB_V3.ipynb
## model_id = "TheBloke/Llama-2-7b-Chat-GPTQ"

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<>\n", "\n<>\n\n"
DEFAULT_SYSTEM_PROMPT = """\
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."""

def get_prompt(instruction, new_system_prompt=DEFAULT_SYSTEM_PROMPT ):
    SYSTEM_PROMPT = B_SYS + new_system_prompt + E_SYS
    prompt_template =  B_INST + SYSTEM_PROMPT + instruction + E_INST
    return prompt_template

In [38]:
sys_prompt = """You are a helpful, respectful and honest assistant. Always answer as helpfully as possible using the context text provided. Your answers should only answer the question once and not have any text after the answer is done.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. """

instruction = """CONTEXT:/n/n {context}/n

Question: {question}"""

get_prompt(instruction, sys_prompt)

"[INST]<>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible using the context text provided. Your answers should only answer the question once and not have any text after the answer is done.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. \n<>\n\nCONTEXT:/n/n {context}/n\n\nQuestion: {question}[/INST]"

In [39]:
prompt_template = get_prompt(instruction, sys_prompt)

mistral_prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

In [40]:
chain_type_kwargs = {"prompt": mistral_prompt}

In [41]:
retriever = vectordb.as_retriever(search_kwargs={"k": 5})

In [42]:
# create the chain to answer questions
qa_chain = RetrievalQA.from_chain_type(llm=response_generation_llm,
                                       chain_type="stuff",
                                       retriever=retriever,
                                       chain_type_kwargs=chain_type_kwargs,
                                       return_source_documents=True)

In [43]:
## Cite sources
def wrap_text_preserve_newlines(text, width=110):
    # Split the input text into lines based on newline characters
    lines = text.split('\n')

    # Wrap each line individually
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]

    # Join the wrapped lines back together using newline characters
    wrapped_text = '\n'.join(wrapped_lines)

    return wrapped_text

def process_llm_response(llm_response):
    print(wrap_text_preserve_newlines(llm_response['result']))
    print('\n\nSources:')
    for source in llm_response["source_documents"]:
        print(source.metadata['source'])


#### Use Prompted RAG to Answer Some contextual Questions

RAG with better prompting gives us good responses. It figures out we are talking about specifically.

It also provides the source chunks we used to provide the answer which makes it easier to verify the response.


In [44]:
# Example 1

query = "What is Entropy? Explain in detail what Schrodinger is talking about in this context."

llm_response = qa_chain(query)
process_llm_response(llm_response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 Entropy is a physical concept described as the measure of disorder or randomness in a system. In this
context, Erwin Schrödinger is discussing the statistical meaning of entropy, which was revealed through the
investigations of Ludwig Boltzmann and Gibbs in statistical physics. Entropy is quantitatively expressed as
klogD, where k is the Boltzmann constant and D is a quantitative measure of atomistic disorder of the body in
question. The disorder it indicates is partly that of heat, such as when a solid melts and its entropy
increases due to the heat absorbed. However, the striking difference between an organism and other systems
lies in their behavior at absolute zero temperature, where molecular disorder has no bearing on physical
events. This fact was discovered experimentally by Walther Nernst and is known as Nernst's Theorem or the
Third Law of Thermodynamics.


Sources:
docs/what-is-life.pdf
docs/what-is-life.pdf
docs/what-is-life.pdf
docs/what-is-life.pdf
docs/what-is-life.pdf


In [45]:
# Example 2
## relevant question for the Feynman Lectures textbook

query = "What is Conservation of Energy?"

llm_response = qa_chain(query)
process_llm_response(llm_response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 The Conservation of Energy is a fundamental law of physics stating that the total energy of a closed system
remains constant, regardless of the changes that occur within the system. It is a mathematical principle that
can be expressed as a numerical quantity, which does not change when something happens. Energy can exist in
various forms such as gravitational, kinetic, heat, elastic, electrical, chemical, radiant, nuclear, and mass
energy. The total energy of a system is calculated by adding up the formulas for each type of energy, with the
exception of energy going in or out of the system.


Sources:
docs/feynman-lectures.pdf
docs/archive-feynman-lectures.pdf
docs/archive-feynman-lectures.pdf
docs/feynman-lectures.pdf
docs/archive-feynman-lectures.pdf


## Alternative approach

Create PromptTemplate and LLMChain

refs:

[blog/guide: (Part 2) Build a Conversational RAG with Mistral-7B and LangChain: Madhav Thaker](https://medium.com/@thakermadhav/part-2-build-a-conversational-rag-with-langchain-and-mistral-7b-6a4ebe497185)

[github: madhavthaker1 / llm/rag/conversational_rag.ipynb](https://github.com/madhavthaker1/llm/blob/main/rag/conversational_rag.ipynb)

In [46]:
# imports

from langchain.schema import format_document
from langchain_core.messages import get_buffer_string
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain.memory import ConversationBufferMemory
from langchain.prompts.prompt import PromptTemplate
from langchain_core.prompts.chat import ChatPromptTemplate

from operator import itemgetter

### Prompt template for instructing with follow up questions

In [47]:
_template = """
[INST]
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language, that can be used to query a Chroma DB vector index. This query will be used to retrieve documents with additional context.

Let me share a couple examples that will be important.

If you do not see any chat history, you MUST return the "Follow Up Input" as is:

```
Chat History:

Follow Up Input: What is Entropy?
Standalone Question:
What is Entropy?
```

If this is the second question onwards, you should properly rephrase the question like this:

```
Chat History:
Human: What is Entropy?
AI:
Entropy is a physical property measured in calories per degree Celsius (cal/oC) that quantifies the disorder or randomness of a system.

Follow Up Input: How is it measured?
Standalone Question:
How is Entropy measured?
```

Now, with those examples, here is the actual chat history and input question.

Chat History:
{chat_history}

Follow Up Input: {question}
Standalone question:
[your response here]
[/INST]
"""

CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

In [48]:
template = """
[INST]
Answer the question based only on the following context:
{context}

Question: {question}
[/INST]
"""

ANSWER_PROMPT = ChatPromptTemplate.from_template(template)

In [49]:
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")


def _combine_documents(
    docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"
):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)

Chain with conversational memory, and Standalone Questions Generation Chain

In [50]:
# Instantiate ConversationBufferMemory
memory = ConversationBufferMemory(
 return_messages=True, output_key="answer", input_key="question"
)

# First we add a step to load memory
# This adds a "memory" key to the input object
loaded_memory = RunnablePassthrough.assign(
    chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)
# Now we calculate the standalone question
standalone_question = {
    "standalone_question": {
        "question": lambda x: x["question"],
        "chat_history": lambda x: get_buffer_string(x["chat_history"]),
    }
    | CONDENSE_QUESTION_PROMPT
    | standalone_query_generation_llm,
}
# Now we retrieve the documents
retrieved_documents = {
    "docs": itemgetter("standalone_question") | retriever,
    "question": lambda x: x["standalone_question"],
}
# Now we construct the inputs for the final prompt
final_inputs = {
    "context": lambda x: _combine_documents(x["docs"]),
    "question": itemgetter("question"),
}
# And finally, we do the part that returns the answers
answer = {
    "answer": final_inputs | ANSWER_PROMPT | response_generation_llm,
    "question": itemgetter("question"),
    "context": final_inputs["context"]
}
# And now we put it all together!
final_chain = loaded_memory | standalone_question | retrieved_documents | answer

In [51]:
def call_conversational_rag(question, chain, memory):
    """
    Calls a conversational RAG (Retrieval-Augmented Generation) model to generate an answer to a given question.

    This function sends a question to the RAG model, retrieves the answer, and stores the question-answer pair in memory
    for context in future interactions.

    Parameters:
    question (str): The question to be answered by the RAG model.
    chain (LangChain object): An instance of LangChain which encapsulates the RAG model and its functionality.
    memory (Memory object): An object used for storing the context of the conversation.

    Returns:
    dict: A dictionary containing the generated answer from the RAG model.
    """

    # Prepare the input for the RAG model
    inputs = {"question": question}

    # Invoke the RAG model to get an answer
    result = chain.invoke(inputs)

    # Save the current question and its answer to memory for future context
    memory.save_context(inputs, {"answer": result["answer"]})

    # Return the result
    return result

### Ask yer' questions

In [52]:
# Initial Question

question = "What is the Hereditary Mechanism?"

result = call_conversational_rag(question, final_chain, memory)

pprint(result)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


{'answer': 'The hereditary mechanism referred to in genetics is the '
           'chromosomes, which carry genetic information from parents to '
           'offspring. However, the text discusses how behavior and skills can '
           'also be inherited and contribute to adaptive evolution, even if '
           'they are not directly transmitted through physical changes in the '
           'chromosomes. Instead, these behaviors and skills can be learned '
           'and passed down through teaching or other means, opening the door '
           'for future inheritable mutations to be selected and integrated '
           'into the species.',
 'context': 'bythehereditary substance, thechromosomes. Atfirst,\n'
            'therefore, itiscertainly notfixedgenetically anditisdifficult\n'
            'toseehowitshouldevercometobeincorporated inthe\n'
            'hereditary treasure. Thisisanimportant problem initself.\n'
            'Forwedoknowthathabitsareinherited as,forinstance,\n'
 

In [54]:
# follow up question with generic mention of parents to check if model understands context

question = "What is the siginificance of the chromosomes of the parents?"

pprint(call_conversational_rag(question, final_chain, memory))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


{'answer': 'The text discusses the role of chromosomes in heredity and how '
           'they are inherited from parents to offspring during reproduction. '
           'The author explains that each parent contributes half of their '
           'genetic material to their child through the process of meiosis, '
           'where chromosomes are separated and mixed randomly. This results '
           'in a unique combination of genetic information in each offspring. '
           'The text also mentions the concept of crossing-over, where '
           'segments of chromosomes can exchange between homologous '
           'chromosomes during meiosis, further adding to the genetic '
           'diversity of offspring. The author emphasizes that the important '
           'event in heredity is not fertilization but meiosis, as the '
           'specific combination of chromosomes an individual receives is '
           'determined at this stage and cannot be altered afterwards.',
 'context': "

## Ask questions with user input

In [59]:
print("Ask what you seek...")
user_input_question = input()
print("You asked: ", user_input_question)

print("Generated Response: \n")
pprint(call_conversational_rag(user_input_question, final_chain, memory))

Ask what you seek...
What are the authors thoughts on science and religion?


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


You asked:  What are the authors thoughts on science and religion?
Generated Response: 



Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


{'answer': 'You are correct that the provided chat history does not directly '
           'discuss the relationship between science and religion. The passage '
           "primarily focuses on the author's autobiographical experiences and "
           'his philosophical conversations with his friend. While there are '
           'references to religious topics and the influence they had on the '
           "author's life, there is no explicit discussion about how science "
           'relates to these beliefs or if science provides any insights into '
           'religious matters.',
 'context': 'Autobiographical Sketches\n'
            'Ilivedfarapartfrommybestfriend,actually theonlyclose\n'
            'friendIeverhad,forthegreaterpartofmylife.(Maybe that\n'
            'iswhyIhaveoftenbeenaccused offlirtatiousness insteadof\n'
            'truefriendship.) Hestudied biology (botany tobeexact);I\n'
            'physics. Andmanyanightwewouldstrollbackandforth\n'
            'between G

## Credits

references:

### Code:


https://github.com/jai-llm/RAG_Docs_LLaMA2/blob/main/RAG_HastieBooks_chromaDB_V3.ipynb


https://github.com/madhavthaker1/llm/blob/main/rag/conversational_rag.ipynb
- https://medium.com/@thakermadhav/part-2-build-a-conversational-rag-with-langchain-and-mistral-7b-6a4ebe497185


https://blog.llamaindex.ai/introducing-rags-your-personalized-chatgpt-experience-over-your-data-2b9d140769b1

### Data:

http://strangebeautiful.com/other-texts/schrodinger-what-is-life-mind-matter-auto-sketches.pdf

https://archive.org/details/feynman-lectures-on-physics-volumes-1-2-3-feynman-leighton-and-sands

https://karpathy.medium.com/software-2-0-a64152b37c35
