<a target="_blank" href="https://colab.research.google.com/github/vanderbilt-data-science/ai_summer/blob/main/2_2-solns-langchain-rag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# AI Solutions with Langchain and RAG
> For Vanderbilt University AI Summer 2024<br>Prepared by Dr. Charreau Bell

_Code versions applicable: May 14, 2024_

## Learning Outcomes:
* Participants will be able to articulate the essential steps and components of a retrieval-augmented generation (RAG) system and implement a standard RAG system using langchain.
* Participants will gain familiarity with inspecting the execution pathways of LLM-based systems.
* Participants will gain familiarity with approaches for the evaluation of LLM-based systems.

### Computing Environment Setup

In [None]:
! pip install langchain==0.1.20 langchain_openai grandalf sentence-transformers
! pip install pypdf chromadb faiss-cpu

In [None]:
# Best practice is to do all imports at the beginning of the notebook, but we have separated them here for learning purposes.
import os

In [None]:
# auth replicated here for reference just in case you choose to do something similar
from google.colab import userdata
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

## Langchain Introduction

### Overview of System

[Overview of Langchain](https://python.langchain.com/v0.1/docs/get_started/introduction/)

<figure>
<img src='https://python.langchain.com/v0.1/svg/langchain_stack.svg' height=600/>
    <figcaption>
        Langchain Overview, from <a href=https://python.langchain.com/v0.1/docs/get_started/introduction>Langchain Introduction</a>
    </figcaption>
</figure>

### Quick Start
To get started quickly, use the [Quick Start](https://python.langchain.com/v0.1/docs/get_started/quickstart/) page.

### Details of Individual Composition Components
To learn more about any of the individual components used below, use the [Components Page](https://python.langchain.com/v0.1/docs/modules/).

## Review of python formatted strings
To prepare ourselves for langchain, we'll first review formatted strings.

In [None]:
# basic functionality of print
print('Tell me a story about cats')

# with variables
prompt_string = 'Tell me a story about cats'
print('As string ', prompt_string)

# as formatted string
prompt_string = 'Tell me a story about cats'
print(f"With formatted string: {prompt_string}")

Tell me a story about cats
As string  Tell me a story about cats
With formatted string: Tell me a story about cats


Motivating example: you are building a GPT that tells stories. The user just needs to provide the topic.

In [None]:
# as a template string: a plain (non-f) string keeps {topic} as a placeholder
string_prompt_template = "Tell me a story about {topic}"
string_prompt_template

'Tell me a story about {topic}'

In [None]:
# you can fill in the template at a later time
string_prompt_template.format(topic='cats')

'Tell me a story about cats'
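A template can hold more than one placeholder, and the standard library can list the placeholder names before anything is filled in. The sketch below uses only built-in Python; the `story_template` string and its `style` placeholder are illustrative additions, not part of the notebook's running example.

```python
from string import Formatter

# hypothetical template with two placeholders (illustration only)
story_template = "Tell me a {style} story about {topic}"

# Formatter().parse yields (literal_text, field_name, format_spec, conversion) tuples
placeholder_names = [field for _, field, _, _ in Formatter().parse(story_template) if field]
print(placeholder_names)

# fill in every placeholder at once
filled = story_template.format(style="short", topic="cats")
print(filled)
```

Knowing the placeholder names up front is what lets a template check that all required inputs were supplied before it is sent anywhere.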

## Langchain Prompt Templates
> Formatting and arranging prompt strings

Langchain prompt templates work just like this, but with additional functionality targeted towards LLM interaction. There are lots of different prompt templates, but here, we'll focus on two: `PromptTemplate`, and `ChatPromptTemplate`.

**Additional resources**: [Guide on Prompt Templates](https://python.langchain.com/v0.1/docs/modules/model_io/prompts/quick_start/)

In [None]:
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser

In [None]:
# create system message for shorter responses
brief_system_message = 'You are a helpful assistant. Be brief, succinct, and clear in your responses. Only answer what is asked.'

### PromptTemplate

In [None]:
# create a langchain prompt template
lc_prompt = PromptTemplate.from_template("tell me a story about {topic}")

# has standard behavior of f-strings
lc_prompt.format(topic='cats')

# but also has additional functionality
lc_prompt.invoke({'topic':'cats'})

StringPromptValue(text='tell me a story about cats')

### ChatPromptTemplate

In [None]:
# create prompt template
lc_chat_prompt_template = ChatPromptTemplate.from_template("tell me a story about {topic}")

# has invocation functionality resulting in chat-style messages
lc_chat_prompt_template.invoke({'topic':'cats'})

ChatPromptValue(messages=[HumanMessage(content='tell me a story about cats')])

In [None]:
# create message-based chat prompt template
lc_chat_prompt_template = ChatPromptTemplate.from_messages(
    [
        ('system', brief_system_message),
        ('user', 'Tell me a story about {topic}'),
    ]
)

# invoke the chat prompt template
lc_chat_prompt_template.invoke({'topic':'cats'})

ChatPromptValue(messages=[SystemMessage(content='You are a helpful assistant. Be brief, succinct, and clear in your responses. Only answer what is asked.'), HumanMessage(content='Tell me a story about cats')])

## Langchain Expression Language (LCEL)
**Resource:** [LCEL Overview](https://python.langchain.com/v0.1/docs/expression_language/)

Main Points:
* Runnable Protocol
* Known inputs and outputs on invoke
* Flexibility in chain assembly
* [Standard Interface](https://python.langchain.com/v0.1/docs/expression_language/interface/)
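To make the "known inputs and outputs on invoke" and "flexibility in chain assembly" points concrete, here is a toy sketch of how `|` composition can work in plain Python. The `Step` class and the `fake_model` function are hypothetical stand-ins written for this illustration; they are not langchain's implementation, only the same idea in miniature.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new Step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


# three hypothetical steps: fill a prompt, fake a model reply, post-process the text
fill_prompt = Step(lambda inputs: f"tell me a joke about {inputs['topic']}")
fake_model = Step(lambda prompt_text: f"(model reply to: {prompt_text})")
shout = Step(str.upper)

toy_chain = fill_prompt | fake_model | shout
result = toy_chain.invoke({"topic": "cats"})
print(result)
```

Because every step exposes the same `invoke` method, any step can be swapped out or extended without touching the others.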

# Basic Model Chains/ Model I/O

**Resource**: [Detailed Guide](https://python.langchain.com/v0.1/docs/modules/)

## Basic Prompt/Model Chain
See [Prompt+LLM](https://python.langchain.com/docs/expression_language/cookbook/prompt_llm_parser) for more information.

In [None]:
from langchain_openai import ChatOpenAI

In [None]:
prompt = ChatPromptTemplate.from_template("tell me a joke about {foo}")
model = ChatOpenAI()
chain = prompt | model

In [None]:
# Observe what the prompt looks like when we substitute words into it
prompt.invoke({'foo':"cats"})

ChatPromptValue(messages=[HumanMessage(content='tell me a joke about cats')])

In [None]:
# Now, actually call the entire chain on it
res = chain.invoke({'foo':"cats"})
print(res)

content='Why was the cat sitting on the computer?\n\nBecause it wanted to keep an eye on the mouse!' response_metadata={'token_usage': {'completion_tokens': 20, 'prompt_tokens': 13, 'total_tokens': 33}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-a30ceed4-66f1-41f7-8ef3-df9501ed0158-0'


A little helper visualization:

In [None]:
chain.get_graph().print_ascii()

    +-------------+    
    | PromptInput |    
    +-------------+    
           *           
           *           
           *           
+--------------------+ 
| ChatPromptTemplate | 
+--------------------+ 
           *           
           *           
           *           
    +------------+     
    | ChatOpenAI |     
    +------------+     
           *           
           *           
           *           
 +------------------+  
 | ChatOpenAIOutput |  
 +------------------+  


## Even more simplified prompt chains

In [None]:
# Create total user prompt chain
prompt = ChatPromptTemplate.from_template("{text}")

# Add output parser
chain = prompt | model | StrOutputParser()

In [None]:
# Now, the user can submit literally whatever
res = chain.invoke({'text':"Briefly and succinctly summarize Episodes 4-6 of Star Wars."})
print(res)

Episode 4: A New Hope - Luke Skywalker joins forces with Princess Leia, Han Solo, and Obi-Wan Kenobi to defeat the evil Galactic Empire and destroy the Death Star.

Episode 5: The Empire Strikes Back - The Empire launches an attack on the Rebel Alliance, leading to a climactic battle on the ice planet Hoth. Luke trains with Yoda to become a Jedi, while Darth Vader reveals he is Luke's father.

Episode 6: Return of the Jedi - Luke confronts Darth Vader and Emperor Palpatine in an attempt to turn his father back to the light side of the Force. The Rebel Alliance defeats the Empire in the Battle of Endor, leading to the downfall of the Sith and the redemption of Anakin Skywalker.


## What just happened? Inspecting model behavior
Several ways to do this:
* `langchain` verbosity/debugging
* `langsmith`

### Langchain
Resource: [Guides -> Langchain Debugging](https://python.langchain.com/v0.1/docs/guides/development/debugging/)

In [None]:
from langchain.globals import set_debug, set_verbose

In [None]:
set_debug(True)
set_verbose(True)

In [None]:
# Basic prompt -> model -> parser chain
chain = prompt | model | StrOutputParser()
chain.invoke('What is a python f-string?')

[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence] Entering Chain run with input:
[0m{
  "input": "What is a python f-string?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > prompt:ChatPromptTemplate] Entering Prompt run with input:
[0m{
  "input": "What is a python f-string?"
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > prompt:ChatPromptTemplate] [0ms] Exiting Prompt run with output:
[0m[outputs]
[32;1m[1;3m[llm/start][0m [1m[chain:RunnableSequence > llm:ChatOpenAI] Entering LLM run with input:
[0m{
  "prompts": [
    "Human: What is a python f-string?"
  ]
}
[36;1m[1;3m[llm/end][0m [1m[chain:RunnableSequence > llm:ChatOpenAI] [2.19s] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": "An f-string is a way to format strings in Python, introduced in Python 3.6. It allows for easier string interpolation by embedding variables directly into the string itself. To create an f-string, simply prefix the str

'An f-string is a way to format strings in Python, introduced in Python 3.6. It allows for easier string interpolation by embedding variables directly into the string itself. To create an f-string, simply prefix the string with \'f\' or \'F\' and place the variables inside curly braces within the string. Here\'s an example:\n\n```python\nname = "Alice"\nage = 30\nf_string = f"My name is {name} and I am {age} years old."\nprint(f_string)\n```\n\nThis will output:\n```\nMy name is Alice and I am 30 years old.\n```'

### Langsmith
Resource: [Tracing Langchain with Langsmith](https://docs.smith.langchain.com/how_to_guides/tracing/trace_with_langchain)

Don't have a langsmith API Key yet? You'll need a user account on [Langsmith](https://smith.langchain.com/). Then, follow these [instructions provided by langsmith](https://docs.smith.langchain.com/#2-create-an-api-key).

In [None]:
# turn debug and verbose output back off
set_debug(False)
set_verbose(False)

In [None]:
# tracing is off by default; the project name is set below
os.environ['LANGCHAIN_TRACING_V2'] = "false"

# uncomment the following two lines before running the cell if you have a Langchain/Langsmith API Key
#os.environ['LANGCHAIN_API_KEY'] = userdata.get('LANGCHAIN_API_KEY')
#os.environ['LANGCHAIN_TRACING_V2'] = "true"

# set langchain project
os.environ['LANGCHAIN_PROJECT'] = 'May15'

In [None]:
# use the basic chain from above
chain = (prompt | model | StrOutputParser()).with_config(run_name = 'Basic LLM Chain')
response = chain.invoke("What is a python f-string?")
response

"An f-string is a formatted string literal in Python, introduced in Python 3.6. It allows for easy interpolation of variables and expressions inside a string by prefixing the string with 'f' or 'F' and using curly braces {} to insert the variables or expressions. This makes it easier to create readable and concise strings with embedded variables."


#### View langsmith traces
We can take a look at this trace on [Langsmith](https://smith.langchain.com)

## Adding Memory
Adapted from: [LCEL Adding Message History](https://python.langchain.com/v0.1/docs/expression_language/how_to/message_history/)
Also see:
- [Langchain -> Use Cases -> Chatbots -> Memory Management](https://python.langchain.com/v0.1/docs/use_cases/chatbots/memory_management/)
- [Components -> More -> Memory](https://python.langchain.com/v0.1/docs/modules/memory/)

In [None]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

In [None]:
# create chat template with standard elements
model = ChatOpenAI(model_name='gpt-3.5-turbo')
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", brief_system_message),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{most_recent_user_message}"),
    ]
)

turns_chain = prompt | model | StrOutputParser()

In [None]:
# quickly try out chain, pretending we've already said something to the system
first_chat_turn_messages = [("human", "Tell me a joke about cats"),
                            ("ai", "Cats jump on beds")]

next_user_message = "What was funny about that joke?"
turns_chain.invoke({'most_recent_user_message': next_user_message,
                    'chat_history': first_chat_turn_messages})

'The humor in that joke comes from the unexpected behavior of cats jumping on beds, which is a common occurrence for cat owners.'

In [None]:
# all saved conversations
chat_conversation_threads = {}

# define function to create new conversation or load old one based on session_id
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in chat_conversation_threads:
        chat_conversation_threads[session_id] = ChatMessageHistory()
    return chat_conversation_threads[session_id]

# create chat history enabled chain
chat_with_message_history = RunnableWithMessageHistory(
    turns_chain,
    get_session_history,
    input_messages_key="most_recent_user_message",
    history_messages_key="chat_history", 
).with_config(run_name = 'Chat with Message History')
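The bookkeeping behind a history-enabled chain can be sketched without any model at all. In the toy version below, `toy_threads` and `toy_chat` are hypothetical names invented for this illustration, and the reply is a placeholder string rather than a real model call; the point is only that each `session_id` accumulates its own list of turns.

```python
from collections import defaultdict

# session_id -> list of (role, text) turns; a toy stand-in for the message-history store
toy_threads = defaultdict(list)


def toy_chat(session_id, user_message):
    """Record the user turn, produce a placeholder reply, and record that too."""
    history = toy_threads[session_id]
    history.append(("human", user_message))
    reply = f"reply #{len(history) // 2 + 1}"  # placeholder for a real model call
    history.append(("ai", reply))
    return reply


toy_chat("convo_1", "Tell me a joke about cats")
toy_chat("convo_1", "I don't get it - can you explain?")
toy_chat("convo_2", "Hello")

print(len(toy_threads["convo_1"]))
print(len(toy_threads["convo_2"]))
```

Two conversations with different session IDs never see each other's turns, which is why the session ID is passed in the `config` on every call.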

Let's try it!

In [None]:
# send first message
user_message_1 = "Tell me a joke about cats"
session_id_1 = "convo_1"
chat_with_message_history.invoke({'most_recent_user_message': user_message_1},
                                config={"configurable": {"session_id": session_id_1}})        

Parent run 6cb7b7e8-a725-4f9c-9937-37106c5b25cf not found for run 65b5cfa7-c7e1-4676-a3c3-6311973eed63. Treating as a root run.


'Sure! Why was the cat sitting on the computer? Because it wanted to keep an eye on the mouse!'

In [None]:
# send second message
chat_with_message_history.invoke({'most_recent_user_message': "I don't get it - can you explain?"},
                                    config={"configurable": {"session_id": session_id_1}})

Parent run 1330fff0-1590-4365-aa1d-135c05d3f2b8 not found for run 3eb0b3a7-258d-4083-b5da-a109a10ee9fd. Treating as a root run.


'In the joke, the cat sits on the computer to watch the "mouse," which can refer to both the computer mouse and the small rodent. It\'s a play on words that adds humor to the situation.'

#### View langsmith traces
We can take a look at this trace on [Langsmith](https://smith.langchain.com)

# Retrieval Augmented Generation (RAG)
## Review
* Conceptual and step-by-step guide about [RAG](https://python.langchain.com/v0.1/docs/use_cases/question_answering/)
* Learn more about implementing [RAG](https://python.langchain.com/docs/expression_language/cookbook/retrieval)

**Data Ingestion (Creating a Vector Store of Documents)**
<figure>
<img src='https://python.langchain.com/v0.1/assets/images/rag_indexing-8160f90a90a33253d0154659cf7d453f.png' height=300/>
    <figcaption>
        Source: Data Ingestion (Preparing Embeddings), from <a href=https://python.langchain.com/v0.1/docs/use_cases/question_answering/>Langchain Use Case: Q&A with RAG</a>
    </figcaption>
</figure>

**Retrieval and Generation**
<figure>
<img src='https://python.langchain.com/v0.1/assets/images/rag_retrieval_generation-1046a4668d6bb08786ef73c56d4f228a.png' height=300/>
    <figcaption>
        Source: Retrieval and Generation, from <a href=https://python.langchain.com/v0.1/docs/use_cases/question_answering/>Langchain Use Case: Q&A with RAG</a>
    </figcaption>
</figure>



## Document Loaders and Splitters
[Data Ingestion/Vector Store Preparation Guide](https://python.langchain.com/docs/modules/data_connection/)
<figure>
<img src='https://python.langchain.com/v0.1/assets/images/data_connection-95ff2033a8faa5f3ba41376c0f6dd32a.jpg' height=300/>
    <figcaption>
        Langchain Retrieval Component, from <a href=https://python.langchain.com/docs/modules/data_connection/>Langchain Components</a>
    </figcaption>
</figure>

**Other extremely useful resources**:
* **[Components -> Retrieval -> Document Loaders](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/)**: Use the sidebar to navigate through different types of document loaders. For all integrations available through langchain, see [Components -> Integrations -> Components](https://python.langchain.com/v0.1/docs/integrations/document_loaders/).
* **[Components -> Retrieval -> Text Splitters](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/)**: Use the sidebar to navigate through different types of text splitters. For all integrations available through langchain, see [Components -> Integrations -> Components](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/).

In [None]:
# example pdf links
doc_1 = 'https://registrar.vanderbilt.edu/documents/Undergraduate_School_Catalog_2023-24_UPDATED2.pdf'
doc_2 = 'https://www.tnmd.uscourts.gov/sites/tnmd/files/Pro%20Se%20Nonprisoner%20Handbook.pdf'
doc_3 = 'https://www.uscis.gov/sites/default/files/document/guides/M-654.pdf'

### Example: pdfloader and recursive character splitter

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [None]:
loader = PyPDFLoader(doc_3)

# Add the kind of text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=250,
)

# use the text splitter to split the document
chunks = loader.load_and_split(text_splitter=text_splitter)
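To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified sliding-window chunker in plain Python. It is an assumption-laden sketch, not the splitter used above: `RecursiveCharacterTextSplitter` prefers to break at separators such as paragraph and line breaks and treats `chunk_size` as an upper bound, whereas the hypothetical `sliding_window_chunks` function below cuts at fixed character positions.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows; each window repeats the last
    `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample_text = "abcdefghijklmnopqrstuvwxyz"
toy_chunks = sliding_window_chunks(sample_text, chunk_size=10, chunk_overlap=4)
print(toy_chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.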

In [None]:
# see how many chunks were made
print(len(chunks))

138


In [None]:
# inspect a single chunk
chunks[0]

Document(page_content='The Declaration of \nIndependence \n& the  \nConstitution  \nof the United States\nM-654 (rev. 07/08)', metadata={'source': 'https://www.uscis.gov/sites/default/files/document/guides/M-654.pdf', 'page': 0})

In [None]:
# view first 3 chunks
for chunk_index, chunk in enumerate(chunks[:3]):
    print(f'****** Chunk {chunk_index} ******\n{chunk.page_content}\n')

****** Chunk 0 ******
The Declaration of 
Independence 
& the  
Constitution  
of the United States
M-654 (rev. 07/08)

****** Chunk 1 ******
The Declaration of 
Independence 
& the  
Constitution  
of the United States

****** Chunk 2 ******
“The sacred rights of mankind are not to be 
rummaged for, among old parchments, or musty 
records. They are written, as with a sun beam 
in the whole volume of human nature, by the 
hand of the divinity itself; and can never be 
erased or obscured by mortal power. ” 
— Alexander Hamilton, 1775
“The basis of our political systems is the 
right of the people to make and to alter 
their Constitutions of Government. But the 
Constitution which at any time exists, ‘till 
changed by an explicit and authentic act of the 
whole People is sacredly obligatory upon all. ”
— George Washington, 1796
“The Declaration of Independence...[is the] 
declaratory charter of our rights, and of the 
rights of man. ” 
— Thomas Jefferson, 1819MeSSage froM The DIreCT or 


### Example: Loading website data and splitting

In [None]:
from bs4 import SoupStrainer
from langchain_community.document_loaders import WebBaseLoader

In [None]:
constitution_website = "https://constitutioncenter.org/the-constitution/full-text"

# load using WebBaseLoader
web_loader = WebBaseLoader(constitution_website,
                       bs_kwargs = {'parse_only':SoupStrainer(['article'])})

# read the document from the website (without splitting)
web_document = web_loader.load()

In [None]:
# only the first few characters
print(web_document[0].page_content[:330])


We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.




Now, we'll split in a slightly different way. Since we've already scraped the website, we will just directly use the splitter. Note that after we load the website, we have a data type of (list of) `Document`.

In [None]:
website_splitter = RecursiveCharacterTextSplitter(chunk_size=330, chunk_overlap=100, add_start_index=True)
website_chunks = website_splitter.split_documents(web_document)
len(website_chunks)

210

In [None]:
website_chunks[:3]

[Document(page_content='We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.', metadata={'source': 'https://constitutioncenter.org/the-constitution/full-text', 'start_index': 1}),
 Document(page_content='Section 1: Congress\nAll legislative Powers herein granted shall be vested in a Congress of the United States, which shall consist of a Senate and House of Representatives.', metadata={'source': 'https://constitutioncenter.org/the-constitution/full-text', 'start_index': 330}),
 Document(page_content='Section 2: The House of Representatives\nThe House of Representatives shall be composed of Members chosen every second Year by the People of the several States, and the Electors in each State shall have the Qualificatio

If you know less about the Constitution and more about Star Wars (or another topic available on Wikipedia), the cell below loads that text instead. It creates the `web_chunks` variable, which the remaining cells use. You may need to adjust the `chunk_size` and `chunk_overlap` options.

In [None]:
# alternate data
webloader = WebBaseLoader('https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope',
                       bs_kwargs = {'parse_only':SoupStrainer('div', id='bodyContent')})
web_chunks = webloader.load_and_split(RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100, add_start_index=True))
print('Number of chunks generated: ', len(web_chunks))
print('\n\nSample: ')
web_chunks[:5]

Number of chunks generated:  17


Sample: 


[Document(page_content='From Simple English Wikipedia, the free encyclopedia', metadata={'source': 'https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope', 'start_index': 4}),
 Document(page_content='Star Wars: Episode IV -A New HopeSpecial Edition LogoDirected byGeorge LucasWritten byGeorge LucasProduced byGary KurtzRick McCallum(Special Edition)StarringMark HamillHarrison FordCarrie FisherPeter CushingAlec GuinnessCinematographyGilbert Taylor, BSCEdited byRichard ChewPaul HirschMarcia LucasMusic byJohn WilliamsDistributed by20th Century FoxLucasfilmRelease datesMay 25, 1977 (USA)October 27, 1977 (Australia)December 27, 1977 (UK)Running time121 minutes (original)125 minutes (Special', metadata={'source': 'https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope', 'start_index': 59}),
 Document(page_content='27, 1977 (Australia)December 27, 1977 (UK)Running time121 minutes (original)125 minutes (Special Edition)CountryUnited StatesLanguageEnglish Budget$11,000,00

## Vector Stores: A way to store embeddings (hidden states) of your data
The choice of vector store influences how "relevant" documents can be identified, speed of document retrieval, and organization.

Helpful resources:
* **[Brief Langchain Reference](https://python.langchain.com/v0.1/docs/modules/data_connection/vectorstores/)**
* **[Vector Store Integrations](https://python.langchain.com/v0.1/docs/integrations/vectorstores/)**
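As a rough picture of what a vector store does, the sketch below embeds a few sentences, embeds a query the same way, and ranks the sentences by cosine similarity. Everything here is a toy assumption: `toy_embed` is a bag-of-words count rather than a learned embedding model, and `toy_index` is a plain list rather than a real vector index.

```python
import math
from collections import Counter


def toy_embed(text):
    """Bag-of-words counts; a crude stand-in for a learned embedding model."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    dot = sum(vec_a[word] * vec_b[word] for word in vec_a)
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


toy_docs = [
    "the movie was released in 1977",
    "luke skywalker lives on a desert planet",
    "the death star is a space station",
]

# "ingestion": store each document next to its vector
toy_index = [(doc, toy_embed(doc)) for doc in toy_docs]


def toy_search(query, k=1):
    """Return the k stored documents most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(toy_index, key=lambda pair: cosine_similarity(query_vec, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


top_hit = toy_search("when was the movie released")
print(top_hit)
```

A real embedding model places texts with similar meaning close together even when they share no words, which this word-count version cannot do.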

In [None]:
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

In [None]:
# create the vector store
db = Chroma.from_documents(web_chunks, OpenAIEmbeddings())

### Similarity Search for Documents

In [None]:
# query the vector store
query = 'When was a new hope released?'

# use a similarity search between the vectors
relevant_docs = db.similarity_search(query)
relevant_docs


[Document(page_content="Star Wars: Episode IV: A New Hope (initially named Star Wars) is a science fiction movie. It is the first film made in the Star Wars saga, but is the fourth film in the story's timeline. The movie was released in 1977 and also incorporates adventure, action and drama.", metadata={'source': 'https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope', 'start_index': 614}),
 Document(page_content='Star Wars: Episode IV -A New HopeSpecial Edition LogoDirected byGeorge LucasWritten byGeorge LucasProduced byGary KurtzRick McCallum(Special Edition)StarringMark HamillHarrison FordCarrie FisherPeter CushingAlec GuinnessCinematographyGilbert Taylor, BSCEdited byRichard ChewPaul HirschMarcia LucasMusic byJohn WilliamsDistributed by20th Century FoxLucasfilmRelease datesMay 25, 1977 (USA)October 27, 1977 (Australia)December 27, 1977 (UK)Running time121 minutes (original)125 minutes (Special', metadata={'source': 'https://simple.wikipedia.org/wiki/Star_Wars_Episode_I

In [None]:
# get distance scores alongside results (lower means more similar)
relevant_docs = db.similarity_search_with_score(query)
relevant_docs

[(Document(page_content="Star Wars: Episode IV: A New Hope (initially named Star Wars) is a science fiction movie. It is the first film made in the Star Wars saga, but is the fourth film in the story's timeline. The movie was released in 1977 and also incorporates adventure, action and drama.", metadata={'source': 'https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope', 'start_index': 614}),
  0.2804918587207794),
 (Document(page_content='Star Wars: Episode IV -A New HopeSpecial Edition LogoDirected byGeorge LucasWritten byGeorge LucasProduced byGary KurtzRick McCallum(Special Edition)StarringMark HamillHarrison FordCarrie FisherPeter CushingAlec GuinnessCinematographyGilbert Taylor, BSCEdited byRichard ChewPaul HirschMarcia LucasMusic byJohn WilliamsDistributed by20th Century FoxLucasfilmRelease datesMay 25, 1977 (USA)October 27, 1977 (Australia)December 27, 1977 (UK)Running time121 minutes (original)125 minutes (Special', metadata={'source': 'https://simple.wikipedia.org

In [None]:
# another query, but this time use normalized relevance scores (0 to 1, higher means more relevant)
query = 'What is the plot of A New Hope?'
relevant_docs = db.similarity_search_with_relevance_scores(query)
relevant_docs

[(Document(page_content="Star Wars: Episode IV: A New Hope (initially named Star Wars) is a science fiction movie. It is the first film made in the Star Wars saga, but is the fourth film in the story's timeline. The movie was released in 1977 and also incorporates adventure, action and drama.", metadata={'source': 'https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope', 'start_index': 614}),
  0.8113249121939496),
 (Document(page_content="Plot[change | change source]\nThe film takes place years after Revenge of the Sith, which was made in 2005 as a prequel. The Rebellion has stolen the secret plans for the Galactic Empire's superweapon known as the Death Star. Darth Vader and his stormtroopers capture Princess Leia Organa, a leader of the Rebellion but she secretly gives the Death Star plans to two droids (robots), C-3PO and R2-D2.", metadata={'source': 'https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope', 'start_index': 885}),
  0.8017252298757446),
 (Docu

## Retrievers: How we select the most relevant data

In [None]:
# or use the db as a retriever with lcel
retriever = db.as_retriever()
retrieved_docs = retriever.invoke(query)
retrieved_docs

[Document(page_content="Star Wars: Episode IV: A New Hope (initially named Star Wars) is a science fiction movie. It is the first film made in the Star Wars saga, but is the fourth film in the story's timeline. The movie was released in 1977 and also incorporates adventure, action and drama.", metadata={'source': 'https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope', 'start_index': 614}),
 Document(page_content="Plot[change | change source]\nThe film takes place years after Revenge of the Sith, which was made in 2005 as a prequel. The Rebellion has stolen the secret plans for the Galactic Empire's superweapon known as the Death Star. Darth Vader and his stormtroopers capture Princess Leia Organa, a leader of the Rebellion but she secretly gives the Death Star plans to two droids (robots), C-3PO and R2-D2.", metadata={'source': 'https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope', 'start_index': 885}),
 Document(page_content='Wikiquote has a collection of 

## RAG
Use RAG when we want the model to generate a response that draws on retrieved documents. For this, we'll switch to a different embedding model, which is downloaded and run locally (on your machine, or on the Colab runtime if you are using Google Colab).

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

from langchain_core.runnables import RunnableLambda, RunnablePassthrough, RunnableSequence, RunnableParallel

In [None]:
# use different embedding model
embeddings_fn = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2") #, model_kwargs={"device":'mps'})
hf_db = FAISS.from_documents(web_chunks, embeddings_fn)
hf_retriever = hf_db.as_retriever(search_kwargs={"k":1})

# make sure it works
query = 'What is the plot of A New Hope?'
hf_retriever.invoke(query)

[Document(page_content="Star Wars: Episode IV: A New Hope (initially named Star Wars) is a science fiction movie. It is the first film made in the Star Wars saga, but is the fourth film in the story's timeline. The movie was released in 1977 and also incorporates adventure, action and drama.", metadata={'source': 'https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope', 'start_index': 614})]

### Default RAG: Question Answering

In [None]:
# Basic question answering template
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

# compose prompt
rag_prompt = ChatPromptTemplate.from_template(template)

# create model (so we don't have to depend on the model definition at the top of the notebook)
model = ChatOpenAI(model_name='gpt-3.5-turbo')

In [None]:
# Format the retrieved documents so each passage is followed by its citation info
def format_docs(docs):
    return "\n\n".join([f'Reference text:\n{doc.page_content}\nCitation Info: {doc.metadata}' for doc in docs])

In [None]:
# inspect behavior of format_docs
format_docs(web_chunks[:3])

"Reference text:\nFrom Simple English Wikipedia, the free encyclopedia\n\\Citation Info: {'source': 'https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope', 'start_index': 4}\n\nReference text:\nStar Wars: Episode IV -A New HopeSpecial Edition LogoDirected byGeorge LucasWritten byGeorge LucasProduced byGary KurtzRick McCallum(Special Edition)StarringMark HamillHarrison FordCarrie FisherPeter CushingAlec GuinnessCinematographyGilbert Taylor, BSCEdited byRichard ChewPaul HirschMarcia LucasMusic byJohn WilliamsDistributed by20th Century FoxLucasfilmRelease datesMay 25, 1977 (USA)October 27, 1977 (Australia)December 27, 1977 (UK)Running time121 minutes (original)125 minutes (Special\n\\Citation Info: {'source': 'https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope', 'start_index': 59}\n\nReference text:\n27, 1977 (Australia)December 27, 1977 (UK)Running time121 minutes (original)125 minutes (Special Edition)CountryUnited StatesLanguageEnglish Budget$11,000,000\n\

In [None]:
# compose the chain
rag_chain = (
    {"context": hf_retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | model
    | StrOutputParser()
)

In [None]:
# run the chain
rag_chain.with_config(run_name = 'basic_rag_chain').invoke('What is the plot of a new hope')

'The plot of "A New Hope" follows the journey of Luke Skywalker, a young farm boy who discovers his destiny as a Jedi Knight after obtaining a message from Princess Leia asking for help in the rebellion against the evil Galactic Empire. With the help of Han Solo and Obi-Wan Kenobi, Luke sets out to rescue Princess Leia and ultimately destroy the Empire\'s powerful weapon, the Death Star.'
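To make the data flow of the chain above easier to follow, here is a minimal plain-Python sketch of the same four steps (build the input dict, fill the prompt, call the model, parse to a string). It does not use LangChain: `fake_retriever`, `fake_model`, `CORPUS`, and `run_rag` are hypothetical stand-ins introduced only for illustration, and the toy retriever ranks by word overlap rather than by embeddings.

```python
import re

# Same wording as the notebook's prompt template
TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""

# Hypothetical two-document corpus standing in for the vector store
CORPUS = [
    {"page_content": "Darth Vader captures Princess Leia.", "metadata": {"source": "doc-a"}},
    {"page_content": "Luke Skywalker destroys the Death Star.", "metadata": {"source": "doc-b"}},
]


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def fake_retriever(question, k=1):
    """Stand-in retriever: rank documents by word overlap with the question."""
    q = tokens(question)
    ranked = sorted(CORPUS, key=lambda d: len(q & tokens(d["page_content"])), reverse=True)
    return ranked[:k]


def format_docs(docs):
    """Join retrieved chunks into one context string with citation info."""
    return "\n\n".join(
        f"Reference text:\n{d['page_content']}\nCitation Info: {d['metadata']}" for d in docs
    )


def fake_model(prompt):
    """Stand-in for the chat model: reports the prompt size instead of answering."""
    return {"content": f"(model saw {len(prompt)} characters of prompt)"}


def run_rag(question):
    # Step 1: the question feeds the retriever and is also passed through unchanged
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    # Step 2: fill the prompt template
    prompt = TEMPLATE.format(**inputs)
    # Step 3: call the model
    message = fake_model(prompt)
    # Step 4: keep only the text of the reply
    return prompt, message["content"]


prompt, answer = run_rag("Who captures Princess Leia?")
print(prompt)
print(answer)
```

Printing the filled prompt is a quick way to check what the model would actually see before spending tokens on a real call.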

### RAG with Sources
Resource: [Langchain: Returning Sources](https://python.langchain.com/v0.1/docs/use_cases/question_answering/sources/)

In [None]:
from langchain_core.runnables import RunnableParallel

In [None]:
# Basic prompt -> model -> parser chain
single_turn_chain = (
    rag_prompt
    | model
    | StrOutputParser()
)

# Break previous chain in half to access context and question in returned response
rag_chain_with_source = RunnableParallel(
    {"context": hf_retriever | format_docs, "question": RunnablePassthrough()}
).assign(answer=single_turn_chain)

In [None]:
# invoke
response = rag_chain_with_source.with_config(run_name = 'sources_rag_chain').invoke("What happened to Princess Leia in a New Hope?")

# print full response
for key, value in response.items():
    print(f"{key}: {value}\n")

context: Reference text:
Plot[change | change source]
The film takes place years after Revenge of the Sith, which was made in 2005 as a prequel. The Rebellion has stolen the secret plans for the Galactic Empire's superweapon known as the Death Star. Darth Vader and his stormtroopers capture Princess Leia Organa, a leader of the Rebellion but she secretly gives the Death Star plans to two droids (robots), C-3PO and R2-D2.
\Citation Info: {'source': 'https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope', 'start_index': 885}

question: What happened to Princess Leia in a New Hope?

answer: Princess Leia was captured by Darth Vader and his stormtroopers, but she secretly gave the Death Star plans to two droids, C-3PO and R2-D2.
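The response above contains `context`, `question`, and `answer` because the parallel step first builds a dict and the assign step then adds a key computed from that dict. The sketch below imitates this behavior in plain Python. The helpers `parallel`, `assign`, `toy_retrieve_and_format`, and `toy_single_turn_chain` are hypothetical illustrations, not LangChain functions.

```python
def parallel(steps, value):
    """Run every step on the same input and collect the results in a dict."""
    return {key: step(value) for key, step in steps.items()}


def assign(state, **steps):
    """Return a copy of state with extra keys computed from the state itself."""
    return {**state, **{key: step(state) for key, step in steps.items()}}


def toy_retrieve_and_format(question):
    """Stand-in for `hf_retriever | format_docs`."""
    return "Reference text:\nDarth Vader captures Princess Leia."


def toy_single_turn_chain(state):
    """Stand-in for prompt -> model -> parser; it only reads the state dict."""
    return f"Answered {state['question']!r} using {len(state['context'])} characters of context."


question = "What happened to Princess Leia?"

# First half: build the dict that the prompt needs
state = parallel({"context": toy_retrieve_and_format, "question": lambda q: q}, question)

# Second half: add the answer while keeping context and question in the result
response = assign(state, answer=toy_single_turn_chain)

for key, value in response.items():
    print(f"{key}: {value}\n")
```

Because the answer is added to the dict rather than replacing it, the caller can show the retrieved context next to the answer.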



### RAG with Chat History?

The RAG chains above are single-turn: each question is answered without any memory of earlier exchanges. How do we add chat memory? See the implementation guide below:
- [Use cases: Q&A with RAG: Add Chat History](https://python.langchain.com/v0.1/docs/use_cases/question_answering/chat_history/). This guide builds on a RAG system, so it applies directly to the chains above.
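One common idea for adding chat memory to RAG is to rewrite each follow-up question into a standalone question before retrieval, so that references like "her" can still be matched against the documents. The sketch below shows only that control flow in plain Python. `condense_question`, `toy_rewrite`, and `chat_turn` are hypothetical names; in a real system the rewrite would be an LLM call and `answer_fn` would be a RAG chain.

```python
def condense_question(history, question, rewrite):
    """Return a standalone question; with no history the question is used as-is."""
    if not history:
        return question
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    return rewrite(transcript, question)


def toy_rewrite(transcript, question):
    """Hypothetical stand-in for an LLM rewrite: appends the first human turn as context."""
    first_human = next(line for line in transcript.splitlines() if line.startswith("human: "))
    return f"{question} (follow-up to: {first_human[len('human: '):]})"


def chat_turn(history, question, answer_fn):
    """Answer one turn and record both sides of the exchange in history."""
    standalone = condense_question(history, question, toy_rewrite)
    answer = answer_fn(standalone)  # in the notebook this could be a RAG chain's invoke
    history.append(("human", question))
    history.append(("ai", answer))
    return standalone, answer


def echo(q):
    """Toy answer function so the sketch runs without a model."""
    return f"[answer to: {q}]"


history = []
q1, _ = chat_turn(history, "Who is Princess Leia?", echo)
q2, _ = chat_turn(history, "Who captured her?", echo)
print(q1)
print(q2)
```

The history stores the original user wording, while retrieval sees the rewritten question; keeping the two separate makes traces easier to read.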

## LLM System Metrics
Resource: [Guides -> Evaluation](https://python.langchain.com/v0.1/docs/guides/productionization/evaluation/)

In [None]:
from langchain.evaluation import load_evaluator

In [None]:
# configure what we want to evaluate
rs_question = "What happened to Princess Leia in a New Hope?"
rs_answer = rag_chain.with_config(run_name = 'basic_rag_chain').invoke(rs_question)

In [None]:
# load an evaluator that uses the conciseness criterion
evaluator = load_evaluator("criteria", criteria="conciseness", llm=ChatOpenAI(model_name='gpt-4-turbo'))

# evaluate whether our model was concise or not
eval_result = evaluator.evaluate_strings(
    prediction = rs_answer,
    input = rs_question,
)

# print result
eval_result



> Entering new CriteriaEvalChain chain...
Prompt after formatting:
You are assessing a submitted answer on a given task or input based on a set of criteria. Here is the data:
[BEGIN DATA]
***
[Input]: What happened to Princess Leia in a New Hope?
***
[Submission]: Princess Leia was captured by Darth Vader and his stormtroopers in A New Hope.
***
[Criteria]: conciseness: Is the submission concise and to the point?
***
[END DATA]
Does the submission meet the Criteria? First, write out in a step by step manner your reasoning about each criterion to be sure that your conclusion is correct. Avoid simply stating the correct answers at the outset. Then print only the single character "Y" or "N" (without quotes or punctuation) on its own line corresponding to the correct answer of whether the submission meets all criteria. At the end, repeat just the letter again by itself on a new line.

> Finished chain.


{'reasoning': '1. **Review the Submission**: The submission states, "Princess Leia was captured by Darth Vader and his stormtroopers in A New Hope."\n\n2. **Understanding the Criterion - Conciseness**: The criterion of conciseness demands that the response should be direct, with no unnecessary details or filler words. It should directly address the question without veering off-topic.\n\n3. **Analyzing the Submission Against the Criterion**:\n   - The phrase "Princess Leia was captured by Darth Vader and his stormtroopers" is a succinct description of Leia\'s primary situation in "A New Hope."\n   - The submission does not include any excessive details or extraneous information that deviates from the answer required by the question.\n   - The answer is directly related to the question asked about Leia\'s fate in the movie.\n\n4. **Conclusion**: The submission directly answers what happened to Princess Leia in "A New Hope" by stating the key event (her capture) without adding unnecessary
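The grading prompt above asks the evaluator LLM to reason first and then print `Y` or `N` on its own line, repeated once. The sketch below shows one way such free text could be turned into a structured result. It is a simplified illustration, not LangChain's actual parser. The `value` and `score` keys are assumptions made for this sketch, since the recorded output above is truncated after `reasoning`, and `parse_verdict` is a hypothetical helper.

```python
def parse_verdict(text):
    """Split grader output into reasoning plus a Y/N verdict and a 1/0 score."""
    # Keep non-empty lines only (blank lines in the reasoning are dropped)
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    verdict = lines[-1].upper() if lines else ""
    if verdict not in ("Y", "N"):
        # No usable verdict: return the text unscored rather than guessing
        return {"reasoning": text.strip(), "value": None, "score": None}
    # The prompt asks for the letter twice, so strip every trailing copy
    while lines and lines[-1].upper() == verdict:
        lines.pop()
    return {
        "reasoning": "\n".join(lines),
        "value": verdict,
        "score": 1 if verdict == "Y" else 0,
    }


sample = "The answer states the key event without filler.\n\nY\nY"
result = parse_verdict(sample)
print(result)
```

A numeric score makes it straightforward to average a criterion over many question/answer pairs instead of reading each explanation by hand.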

View other criteria available through LangChain

In [None]:
from langchain.evaluation import Criteria
list(Criteria)

[<Criteria.CONCISENESS: 'conciseness'>,
 <Criteria.RELEVANCE: 'relevance'>,
 <Criteria.CORRECTNESS: 'correctness'>,
 <Criteria.COHERENCE: 'coherence'>,
 <Criteria.HARMFULNESS: 'harmfulness'>,
 <Criteria.MALICIOUSNESS: 'maliciousness'>,
 <Criteria.HELPFULNESS: 'helpfulness'>,
 <Criteria.CONTROVERSIALITY: 'controversiality'>,
 <Criteria.MISOGYNY: 'misogyny'>,
 <Criteria.CRIMINALITY: 'criminality'>,
 <Criteria.INSENSITIVITY: 'insensitivity'>,
 <Criteria.DEPTH: 'depth'>,
 <Criteria.CREATIVITY: 'creativity'>,
 <Criteria.DETAIL: 'detail'>]

# Homework
The following exercises are designed to help you gain depth in what you've learned about RAG today.

## [Required] Learning more about RAG
### Splitting Text (Conceptual)
There are many ways to split text, and each affects the resulting RAG system. Read over the resource below (with sidebar dropdown), then answer the following question for each text splitting approach relevant to your application:
* What is the proposed value in adopting this text splitting approach? What are some drawbacks?

### Splitting Text (Programmatic)
Above, we have adopted specific chunk sizes and splitting approaches. Choose one of the documents (or use your own) and:
* Modify the chunk size. How does this impact the resulting RAG performance? The cost?
* Implement a different type of text splitter (as applicable, i.e., not code text splitters if you're not splitting code). How does this impact the resulting RAG performance? The cost?
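As a starting point for the chunk-size experiment, the sketch below counts how many chunks and how many total characters result from different chunk sizes. It uses a deliberately simple fixed-width character splitter written for illustration; `split_text` is a hypothetical helper, not one of the LangChain splitters, and character counts are only a rough proxy for embedding and prompt cost.

```python
def split_text(text, chunk_size, chunk_overlap=0):
    """Fixed-width character splitter with optional overlap between neighbors."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Toy document; replace with the text of one of your own documents
text = "Star Wars is a science fiction movie. " * 50

for chunk_size in (100, 200, 400):
    chunks = split_text(text, chunk_size, chunk_overlap=20)
    total_chars = sum(len(chunk) for chunk in chunks)
    print(f"chunk_size={chunk_size}: {len(chunks)} chunks, {total_chars} characters to embed")
```

Smaller chunks mean more embeddings to compute and store, while larger chunks put more text into the prompt for each retrieved result, so both ends of the range carry a cost.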

### Customizing RAG
There are many, many, many ways to improve results with RAG. Read over the resources below, then answer the following for each:
1. What is the proposed value in adopting this approach? In other words, what is the expected performance improvement by using this method?
2. How might it apply to your work?

* [**Query Analysis**](https://python.langchain.com/v0.1/docs/use_cases/query_analysis/). Make sure to peruse subtopics.
* [**Synthetic Data Generation**](https://python.langchain.com/v0.1/docs/use_cases/data_generation/). 
* [**Tagging**](https://python.langchain.com/v0.1/docs/use_cases/tagging/).
* [**Routing Chain Logic Based on Inputs**](https://python.langchain.com/v0.1/docs/expression_language/how_to/routing/)\*\*
* [**Chain Composition**](https://python.langchain.com/v0.1/docs/modules/chains/). Of particular interest here are the Legacy chains. Although they will probably be completely removed in the future, consider their behavior. In what cases might these behaviors be useful?

** This is highly recommended reading, but may not be suitable for novice programmers. Although there is explanatory text, the code is what demonstrates the ideas concretely. Novices may find it helpful to copy/paste and run the code to understand its behavior, although this may be outside the time constraints of some participants.

## [Required] Learning more about Evaluation
Read the following text and answer these questions:
1. What is the purpose of each individual criterion? Does it require an external LLM for evaluation?
2. In what cases might this criterion be useful?

Depth Text: [**Evaluation, by Langchain**](https://python.langchain.com/v0.1/docs/guides/productionization/evaluation/)

## [Highly recommended] Learning more about the LLM System Lifecycle
There is more to an LLM-based system than a user interface and the LLM chain. There is a whole framework around inspecting, testing, and evaluating these systems. Read the following and answer the questions below:
1. Summarize the purpose of the individual components of the LangSmith system (they generalize to all LLM systems).
2. Consider your favorite LLM UI (e.g., ChatGPT, Gemini, Claude). Describe how you think these components are utilized in that LLM system.

Depth Text: [**LangSmith User Guide**](https://docs.smith.langchain.com/old/user_guide)

## [Recommended] Practicing with RAG and Langchain
### Exercise 1: Modify the RAG system
Modify or create a new chain which:
1. Uses a different LLM than the one used in this notebook.
2. Uses a different document loader than the one used in this notebook.
3. Uses a different splitter than the one used in this notebook.
4. Uses a different vector store/retriever than the one used in this notebook.

Use the resources provided in the relevant sections of the notebook for other options.

### Exercise 2: Implement a new RAG system
1. [More challenging] Add chat history to one of your RAG chains. Make sure to enable tracing and inspect LangSmith to ensure that the chat history is used.
2. Create a Gradio user interface to use your chain in a more user-friendly way.
3. [Challenging] Implement an additional chain which uses one of the strategies you read about in the "Learning more about RAG" section.




