## Introduction

We will continue building on LangChain with RAG and prompt templates, this time using locally hosted LLMs.

https://python.langchain.com/docs/integrations/llms/

- Ollama
- GPT4All

<img src="RAG_advanced.png" alt="Drawing" style="width: 1000px;"/>

In [1]:
%pip install langchain chromadb gpt4all langchainhub



## Example 1: GPT4All

- Website RAG

In [8]:
# Load web page
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

In [9]:
# Split into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100)
all_splits = text_splitter.split_documents(data)
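To make the two splitter parameters concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplified sliding window, not the recursive, separator-aware algorithm that `RecursiveCharacterTextSplitter` uses; the function name `sliding_window_chunks` and the toy string are assumptions made for illustration only.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, which is the role `chunk_overlap=100` plays in the cell above: text near a chunk boundary appears in both neighbours, so a sentence cut by the boundary is less likely to be lost at retrieval time.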

In [10]:
# Embed and store
from langchain.vectorstores import Chroma
from langchain.embeddings import GPT4AllEmbeddings

vectorstore = Chroma.from_documents(documents=all_splits, embedding=GPT4AllEmbeddings())

In [11]:
# Retrieve
question = "How can Task Decomposition be done?"
docs = vectorstore.similarity_search(question)
len(docs)

4
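Four documents come back because `similarity_search` returns the `k=4` nearest chunks by default. The sketch below illustrates the idea of top-k retrieval on toy 2-D vectors. It uses cosine similarity for illustration; the distance metric actually configured in the vector store may differ, and `toy_vectors`, `cosine_similarity`, and `top_k` are hypothetical names, not part of LangChain or Chroma.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the names of the k vectors most similar to the query."""
    scored = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Toy "embeddings" standing in for real chunk embeddings
toy_vectors = {
    "chunk_a": [1.0, 0.0],
    "chunk_b": [0.9, 0.1],
    "chunk_c": [0.0, 1.0],
    "chunk_d": [0.7, 0.7],
    "chunk_e": [-1.0, 0.0],
}
results = top_k([1.0, 0.0], toy_vectors)
print(results)  # ['chunk_a', 'chunk_b', 'chunk_d', 'chunk_c']
```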

In [12]:
# RAG prompt
from langchain import hub

QA_CHAIN_PROMPT = hub.pull("rlm/rag-prompt-llama")

In [None]:
# Comment out the following lines if this is not the first run.
# Install the gpt4all library and the system dependencies it needs
!pip install gpt4all
!apt install libvulkan1
!apt install libnvidia-gl-525-server


In [None]:
# Optional: download a Llama-2 7B chat model (GGML, 4-bit quantized).
# This is a medium-sized model; only fetch it on a machine with a decent GPU (not Colab).
#!wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin -O llama-2-7b-chat.bin

In [9]:
!wget https://gpt4all.io/models/gguf/mistral-7b-openorca.Q4_0.gguf -O mistral-7b-openorca.Q4_0.gguf

--2023-11-11 10:58:03--  https://gpt4all.io/models/gguf/mistral-7b-openorca.Q4_0.gguf
Resolving gpt4all.io (gpt4all.io)... 172.67.71.169, 104.26.0.159, 104.26.1.159, ...
Connecting to gpt4all.io (gpt4all.io)|172.67.71.169|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4108927744 (3.8G)
Saving to: ‘mistral-7b-openorca.Q4_0.gguf.1’


2023-11-11 11:00:59 (22.4 MB/s) - ‘mistral-7b-openorca.Q4_0.gguf.1’ saved [4108927744/4108927744]



In [2]:
!wget "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin?download=true" -O llama-2-7b-chat.ggmlv3.q8_0.bin


In [2]:
# Import the GPT4All library and initialize the model
from gpt4all import GPT4All

In [3]:
# Initialize the model:
llm = GPT4All(model_name="mistral-7b-openorca.Q4_0.gguf", model_path="/content/", device='gpu')
# Can also try 'orca-mini-3b-gguf2-q4_0.gguf'

In [15]:
# Generate text using the model
output = llm.generate("The capital of France is ")
print(output)

 a city that never sleeps. With its bustling streets, world-famous landmarks and rich history, Paris has always been one of the most visited cities in Europe. But did you know there are many hidden gems waiting to be discovered? Here are five offbeat things to do in Paris:



In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [18]:
llm

<gpt4all.gpt4all.GPT4All at 0x7a2535a4d690>

In [21]:
# QA chain
# This still needs to be debugged to run on Colab/Kaggle, so it is disabled.
# One likely cause: `llm` above is the native gpt4all object, while RetrievalQA
# expects a LangChain LLM (such as the GPT4All wrapper in langchain.llms).
if False:
    from langchain.chains import RetrievalQA

    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=vectorstore.as_retriever(),
        chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
    )

    question = "What are the various approaches to Task Decomposition for AI Agents?"
    result = qa_chain({"query": question})
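Conceptually, a retrieval QA chain with the default "stuff" strategy concatenates the retrieved chunks into one context string and fills a prompt template with that context and the question. The sketch below shows only that assembly step. `TEMPLATE` is a hypothetical stand-in, not the actual text of `rlm/rag-prompt-llama`, and `build_rag_prompt` is an illustrative helper rather than a LangChain function.

```python
def build_rag_prompt(question, chunks, template):
    """Join retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


# Hypothetical template; the prompt pulled from the hub has different wording
TEMPLATE = (
    "Use the following context to answer the question.\n"
    "Context:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)

# Stand-ins for the page_content of the retrieved documents
retrieved = [
    "Task decomposition can be done with simple prompting.",
    "It can also be done with task-specific instructions.",
]
prompt = build_rag_prompt("How can Task Decomposition be done?", retrieved, TEMPLATE)
print(prompt)
```

The resulting string is what would be sent to the LLM, which is why chunk size matters: all retrieved chunks plus the question have to fit in the model's context window.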

## LLaMA-2

- Huggingface repo
  - https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/main

In [None]:
import os
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.embeddings.ollama import OllamaEmbeddings
from langchain.embeddings.bedrock import BedrockEmbeddings
from langchain.prompts import PromptTemplate
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import DirectoryLoader
from langchain.chains import RetrievalQA


### [OPTIONAL] Set your OpenAI API key if you want to use their embedding model

If you are on Kaggle, go to "Add-Ons" in the toolbar and add your OPENAI_API_KEY as a secret.
Copy the generated code sample and modify it below if needed (the cell below assumes the secret is named `OPENAI`).

In [None]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
os.environ["OPENAI_API_KEY"] = user_secrets.get_secret("OPENAI")

In [None]:
## Alternative, if not on Kaggle
import os
import getpass

OPENAI_API_KEY = getpass.getpass("Enter your OpenAI API Key: ")
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

Enter your OpenAI API Key:  ····


## Embedding model

We will use the Mistral 7B model, served by Ollama, for both the LLM and the embeddings.

In [24]:
# Ollama embeddings
embeddings_open = OllamaEmbeddings(model="mistral")
# OpenAI embeddings
#embedding = OpenAIEmbeddings()

llm_open = Ollama(  model="mistral",
                    #model='Llama2',
                    callback_manager = CallbackManager([StreamingStdOutCallbackHandler()]))

## Loading the data

LangChain offers multiple document loaders for bringing in your private data.
- We will use Amazon shareholder letters (2019 to 2022), downloaded as PDF files


In [25]:
!mkdir -p ./data

from urllib.request import urlretrieve
urls = [
    'https://s2.q4cdn.com/299287126/files/doc_financials/2023/ar/2022-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/ar/2021-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/ar/Amazon-2020-Shareholder-Letter-and-1997-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2020/ar/2019-Shareholder-Letter.pdf'
]

filenames = [
    'AMZN-2022-Shareholder-Letter.pdf',
    'AMZN-2021-Shareholder-Letter.pdf',
    'AMZN-2020-Shareholder-Letter.pdf',
    'AMZN-2019-Shareholder-Letter.pdf'
]

metadata = [
    dict(year=2022, source=filenames[0]),
    dict(year=2021, source=filenames[1]),
    dict(year=2020, source=filenames[2]),
    dict(year=2019, source=filenames[3])]

In [26]:
from tqdm.notebook import tqdm

data_root = "./data/"

for idx, url in tqdm(enumerate(urls), total=len(urls)):
    file_path = data_root + filenames[idx]
    urlretrieve(url, file_path)

  0%|          | 0/4 [00:00<?, ?it/s]

Exercise: remove the last three pages of each PDF, since they are repetitive. The loading code below keeps all pages.
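One possible approach to this exercise is to trim the per-page list after loading, since `PyPDFLoader.load()` returns one `Document` per page. The sketch below works on a plain list so that it runs without any PDF files; `drop_trailing_pages` is a hypothetical helper, and keeping short documents untouched is a design assumption rather than a requirement.

```python
def drop_trailing_pages(pages, n_last=3):
    """Return the pages without the final n_last entries.

    Documents with n_last pages or fewer are returned unchanged,
    so a short file is never emptied by accident.
    """
    if n_last <= 0 or len(pages) <= n_last:
        return list(pages)
    return list(pages[:-n_last])


# Strings stand in for the Document objects returned by loader.load()
pages = [f"page {i}" for i in range(1, 11)]
trimmed = drop_trailing_pages(pages)
print(len(pages), "->", len(trimmed))  # 10 -> 7
```

In the loading loop, the same call could be applied to the list returned by `loader.load()` before it is appended to `documents`.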

In [28]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-3.17.0-py3-none-any.whl (277 kB)
Installing collected packages: pypdf
Successfully installed pypdf-3.17.0


In [29]:
import numpy as np
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader

documents = []

for idx, file in tqdm(enumerate(filenames), total=len(filenames)):
    loader = PyPDFLoader(data_root + file)
    # PyPDFLoader returns one Document per PDF page
    document = loader.load()
    for document_fragment in document:
        # Replace the loader's metadata with ours; copy so pages do not share one dict
        document_fragment.metadata = dict(metadata[idx])

    print(f'{len(document)} {document}\n')
    documents += document

# In our testing, character-based splitting works better with this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # maximum number of characters per chunk
    chunk_overlap=100,  # characters shared between neighbouring chunks
)

docs = text_splitter.split_documents(documents)

  0%|          | 0/4 [00:00<?, ?it/s]

10 [Document(page_content='Dear shareholders:\nAs I sit down to write my second annual shareholder letter as CEO, I find myself optimistic and energized\nby what lies ahead for Amazon. Despite 2022 being one of the harder macroeconomic years in recent memory,and with some of our own operating challenges to boot, we still found a way to grow demand (on top ofthe unprecedented growth we experienced in the first half of the pandemic). We innovated in our largestbusinesses to meaningfully improve customer experience short and long term. And, we made importantadjustments in our investment decisions and the way in which we’ll invent moving forward, while stillpreserving the long-term investments that we believe can change the future of Amazon for customers,\nshareholders, and employees.\nWhile there were an unusual number of simultaneous challenges this past year, the reality is that if you\noperate in large, dynamic, global market segments with many capable and well-funded competitors (thec

Document statistics

In [30]:
avg_doc_length = lambda documents: sum([len(doc.page_content) for doc in documents])//len(documents)
print(f'Average length among {len(documents)} documents loaded is {avg_doc_length(documents)} characters.')
print(f'After the split we have {len(docs)} documents as opposed to the original {len(documents)}.')
print(f'Average length among {len(docs)} documents (after split) is {avg_doc_length(docs)} characters.')


Average length among 37 documents loaded is 3889 characters.
After the split we have 202 documents as opposed to the original 37.
Average length among 202 documents (after split) is 725 characters.


Sample embedding

## Chroma Collection

- Newer Chroma versions require a different client setup (see https://docs.trychroma.com/migration)

In [31]:
import chromadb
persist_directory = 'vectordb_aws_letters'
chroma_client = chromadb.PersistentClient(path=persist_directory)

#[Optional]
#client = chromadb.HttpClient(host="localhost", port="8000")

In [32]:
collection_name = "aws_pdf"

# If the collection was created before, delete it first so we start from a clean state
existing_names = [c.name for c in chroma_client.list_collections()]
if collection_name in existing_names:
    chroma_client.delete_collection(name=collection_name)

print(f"Creating collection: '{collection_name}'")
collection = chroma_client.create_collection(name=collection_name)

Creating collection: 'aws_pdf'


In [33]:
embeddings_open

OllamaEmbeddings(base_url='http://localhost:11434', model='mistral', embed_instruction='passage: ', query_instruction='query: ', mirostat=None, mirostat_eta=None, mirostat_tau=None, num_ctx=None, num_gpu=None, num_thread=None, repeat_last_n=None, repeat_penalty=None, temperature=None, stop=None, tfs_z=None, top_k=None, top_p=None, model_kwargs=None)

In [35]:
# This needs a running Ollama server on localhost:11434 (see base_url above),
# so it fails on Colab but should run on a server / local notebook.
if False:
    sample_embedding = np.array(embeddings_open.embed_query(docs[0].page_content))
    print("Sample embedding of a document chunk: ", sample_embedding)
    print("Size of the embedding: ", sample_embedding.shape)

## Prompt Template

In [42]:
!pip install openai

Collecting openai
  Downloading openai-1.2.3-py3-none-any.whl (220 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.25.1-py3-none-any.whl (75 kB)
Collecting httpcore (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.2-py3-none-any.whl (76 kB)

In [43]:
import os
import openai
from getpass import getpass

In [40]:
os.environ['OPENAI_API_KEY'] = getpass("Enter your OpenAI API Key: ")

Enter your OpenAI API Key: ··········
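The next cell chains two prompt templates with `SimpleSequentialChain`, which passes the single output of each step as the single input of the next. The sketch below mimics that data flow in plain Python so the mechanics are visible without an API key. `echo_llm` is a fake model that only wraps its prompt in angle brackets, and `make_step` and `run_sequential` are hypothetical helpers, not LangChain APIs.

```python
def make_step(template, llm):
    """Build a step that fills a one-placeholder template and calls the model."""
    def step(text):
        return llm(template.format(input=text))
    return step


def run_sequential(steps, first_input):
    """Feed the output of each step into the next one, in order."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


def echo_llm(prompt):
    """Fake model: returns the prompt wrapped in angle brackets."""
    return f"<{prompt}>"


synopsis_step = make_step("Synopsis for: {input}", echo_llm)
review_step = make_step("Review of: {input}", echo_llm)

result = run_sequential([synopsis_step, review_step], "My Play")
print(result)  # <Review of: <Synopsis for: My Play>>
```

The nesting in the printed result shows why each template may have only one input variable in this kind of chain: the second step sees nothing but the text produced by the first.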


In [45]:
from langchain import OpenAI, PromptTemplate
from langchain.chains import LLMChain, LLMMathChain, TransformChain, SequentialChain, SimpleSequentialChain

# This is an LLMChain to write a synopsis given a title of a play.
playwright_llm = OpenAI(temperature=.9, openai_api_key = os.environ['OPENAI_API_KEY'])

playwright_template = """

You are a playwright. Given the title of play, write a synopsis for that title.
Your style is witty, humorous, light-hearted. All your plays are written using
concise language, to the point, and are brief.

Title: {title}

Playwright: This is a synopsis for the above play:

"""

playwright_prompt_template = PromptTemplate(input_variables=["title"], template=playwright_template)

synopsis_chain = LLMChain(llm=playwright_llm, prompt=playwright_prompt_template)

# This is an LLMChain to write a review of a play given a synopsis.
critic_llm = OpenAI(temperature=.5, openai_api_key = os.environ['OPENAI_API_KEY'])

synopsis_template = """

You are a play critic from the New York Times.

Given the synopsis of play, it is your job to write a review for that play.
You're the Simon Cowell of play critics and always deliver scathing reviews.

Play Synopsis: {synopsis}

Review from a New York Times play critic of the above play:"""

critic_prompt_template = PromptTemplate(input_variables=["synopsis"], template=synopsis_template)

review_chain = LLMChain(llm=critic_llm, prompt=critic_prompt_template)

# This is the overall chain where we run these two chains in sequence.

overall_chain = SimpleSequentialChain(chains= [synopsis_chain, review_chain],
                                      verbose=True)


play_title = "Abigail Aryan and the Motley Crew of Multi-Agent Systems"

review = overall_chain.run(play_title)




> Entering new SimpleSequentialChain chain...
Abigail Aryan is a computer hacker and tech genius who is tasked with infiltrating a shady group of shady, powerful, and influential people known as the “Multi-Agent Systems.” With the help of her quirky team of hackers and computer experts, Abigail and her motley crew of multi-agent systems battle their way through danger and intrigue to take down the nefarious Multi-Agent Systems and their connections. Along the way, Abigail and her team discover secrets, face off against challenges, and find humor and camaraderie in the crazy world of computer hacking.

"Abigail Aryan and her motley crew of hackers and computer experts are an interesting concept for a play, but unfortunately the execution falls flat. The characters are one-dimensional and the plot is predictable and uninspired. The action sequences are too over-the-top and the dialogue is clunky and stale. The play fails to capture the complexity 