# LangChain

## What is LangChain
LangChain is an open-source framework that helps developers connect LLMs, data sources, and other functionality under a single, unified syntax. With LangChain, developers can create scalable, modular LLM applications with greater ease. This course will cover LangChain in Python, but libraries also exist for JavaScript.

LangChain encompasses an entire ecosystem of tools, but in this tutorial, we'll focus on the core components of the LangChain library: 
* LLMs, including open-source and proprietary models, 
* prompts, 
* chains, 
* agents, and 
* document retrievers. 
 
To install LangChain, run the following command in a terminal: `conda install conda-forge::langchain`

Once installed, you can import it using the following import statement:

In [None]:
import langchain

# 1. Prompt Templates

Prompt templates help to translate user input and parameters into instructions for a language model. This can be used to guide a model's response, helping it understand the context and generate relevant and coherent language-based output.

Prompt Templates take as input a dictionary, where each key represents a variable in the prompt template to fill in.

Prompt Templates output a `PromptValue`. This `PromptValue` can be passed to an LLM or a ChatModel, and can also be cast to a string or a list of messages. `PromptValue` exists to make it easy to switch between strings and messages.

There are a few different types of prompt templates:

* StringPromptTemplates
* ChatPromptTemplates
* MessagesPlaceholder

## 1.1. String Prompt Templates

These prompt templates are used to format a single string, and generally are used for simpler inputs. For example, a common way to construct and use a PromptTemplate is as follows:

## 1.2. ChatPromptTemplates

These prompt templates are used to format a list of messages. These "templates" consist of a list of templates themselves. For example, a common way to construct and use a `ChatPromptTemplate` is as follows:

In the above example, this `ChatPromptTemplate` will construct two messages when called. The first is a system message that has no variables to format. The second is a `HumanMessage`, which is formatted with the `topic` variable the user passes in.

## 1.3. MessagesPlaceholder

This prompt template is responsible for adding a list of messages in a particular place. In the above `ChatPromptTemplate`, we saw how we could format two messages, each one a string. But what if we wanted the user to pass in a list of messages that we would slot into a particular spot? This is how you use `MessagesPlaceholder`.

This will produce a list of two messages, the first one being a system message, and the second one being the HumanMessage we passed in. If we had passed in 5 messages, then it would have produced 6 messages in total (the system message plus the 5 passed in). This is useful for letting a list of messages be slotted into a particular spot.

# 2. How to use few-shot examples

You can create a simple prompt template that provides the model with example inputs and outputs when generating. Providing the LLM with a few such examples is called few-shotting, and is a simple yet powerful way to guide generation and in some cases drastically improve model performance.

> Few-shotting is a technique in LLM prompting where you provide a small number of examples within the prompt itself to guide the model towards the desired output. Essentially, you're demonstrating the task you want the model to perform by showing it a few input-output pairs. This helps the LLM understand the pattern and generate more accurate and relevant responses.

Below is an example of a few-shot prompt:

```
Generate a creative product name for a new line of sunglasses.

Examples:

Input: Sleek, futuristic design with polarized lenses
Output: "Solaris Eclipse"

Input: Classic aviator style with a modern twist
Output: "Skybound Voyager" 

Input: Oversized frames with vibrant colors
Output: "Chromatic Dream" 
```

We will cover few-shotting with string prompt templates.

Create a formatter that will format the few-shot examples into a string. This formatter should be a `PromptTemplate` object.


Next, we'll create a list of few-shot examples. Each example should be a dictionary representing an example input to the formatter prompt we defined above.

In [None]:
examples = [
    {
        "question": "Who lived longer, Muhammad Ali or Alan Turing?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
""",
    },
    {
        "question": "When was the founder of craigslist born?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
""",
    },
    {
        "question": "Who was the maternal grandfather of George Washington?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
""",
    },
    {
        "question": "Are both the directors of Jaws and Casino Royale from the same country?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No
""",
    },
]

Let's test the formatting prompt with one of our examples:

Finally, create a `FewShotPromptTemplate` object. This object takes in the few-shot examples and the formatter for the few-shot examples. When this `FewShotPromptTemplate` is formatted, it formats the passed examples using the `example_prompt`, then adds them to the final prompt before the suffix:

`FewShotPromptTemplate` is a more general-purpose template. It simply formats the examples and appends the user input as a string. This is suitable for models that don't necessarily expect a chat-like interaction. The formatted prompt is a single string.

`FewShotChatMessagePromptTemplate` is designed specifically for chat-based language models. It structures the prompt as a conversation, with alternating "human" and "AI" messages. This mimics the way users interact with chatbots, making the prompt more natural and potentially leading to better responses. The formatted prompt is a list of chat messages.

Let's rewrite the code above with `FewShotChatMessagePromptTemplate`:



And we can pass this few-shot chat message prompt template into another chat prompt template:

Then, we can use the few shot examples in chat models. Let's first create a simple model:

In [None]:
from langchain_huggingface import HuggingFaceEndpoint

HF_API = 'xyz'  # placeholder: replace with your Hugging Face API token
llm = HuggingFaceEndpoint(
    repo_id='tiiuae/falcon-7b-instruct',
    huggingfacehub_api_token=HF_API,
)

Finally, you can connect your model to the few-shot prompt.

In [None]:
print(response)

Let's try this example with Groq:

In [None]:
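from dotenv import load_dotenv, find_dotenv
import os

# Read GROQ_API_KEY from a local .env file; avoid echoing the key in notebook output
load_dotenv(find_dotenv())

GROQAPI = os.getenv("GROQ_API_KEY")

In [None]:
from langchain_groq import ChatGroq

llm = ChatGroq(
    model="llama-3.1-70b-versatile",  # hosted model names change; check Groq's current model list
    temperature=0.0,
    api_key=GROQAPI,
)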

In [None]:
response

# 3. Managing Chat Model Memory

Managing memory is important for conversations with chat models; it opens up the possibility of asking follow-up questions, building and iterating on model responses, and adapting to the user's preferences and behaviors. Although LangChain allows us to customize and optimize in-conversation chatbot memory, it is still limited by the model's context window. An LLM's context window is the amount of input text the model can consider at once when generating a response, and the length of this window varies for different models.

ChatMessageHistory stores the full history of messages between the user and model. By providing this to the model, we can provide follow-up questions and iterate on the response message. 

Let's add this message history to the chat model defined above. We first import the `ChatMessageHistory` class. To begin the conversation history, instantiate `ChatMessageHistory` and store it as a variable.

A conversation can open with an AI message, added using the `.add_ai_message()` method, which can help set the tone and direction of the conversation. We add user messages to the history with the `.add_user_message()` method. To provide these messages to the model, invoke the model on the `.messages` attribute of the history. The text response is stored under the `.content` attribute.

In [None]:
from langchain_community.chat_message_histories import ChatMessageHistory

history = ChatMessageHistory()
history.add_user_message("Hello! Who are you?")

# Invoke the chat model on the stored messages; the reply text is in .content
response = llm.invoke(history.messages)
print(response.content)

When additional user messages are provided, the model bases its response on the full context stored in the conversation history.

ChatMessageHistory provides a simple way to store and manage conversation history. It's handy if you just need a straightforward way to keep track of past messages. You have less direct control over how the history is formatted within the prompt.

You can use `ChatPromptTemplate` to explicitly define how the conversation history and the current user input are presented to the language model. This separation makes the prompt engineering process more transparent.

You can provide a `SystemMessage` to guide the LLM's overall behavior (e.g., "You are a helpful assistant"). This helps set the context for the conversation.

Also, you have direct control over the history list. You can manipulate it (add, remove, modify messages) before passing it to the LLM, giving you fine-grained control over what the model sees.

Often, you need to handle multiple sessions or users. With the above approach, you'd have to manage the history list yourself, potentially saving it to a database or file. You can implement session-based history management using `RunnableWithMessageHistory`.

In [None]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    # This function retrieves the chat history for a given session_id.
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]



The `get_session_history` function allows you to manage history on a per-session basis. This means you can have separate conversation histories for different users or different conversation threads.

`RunnableWithMessageHistory` handles the logic of:

* Retrieving the correct history for a session.
* Adding new messages to the history.
* Passing the history to your chain.

Let's continue the conversation in the same session:

Let's ask something new in a new session:

# 4. Sequential Chains
Some problems can only be solved sequentially. Consider a chatbot used to create a travel itinerary. We need to tell the chatbot our destination, receive suggestions on what to see on our trip, and tell the model which activities to select to compile the itinerary.

This is a sequential problem, as it requires more than one user input: one to specify the destination, and another to select the activities. Let's code this out!

In sequential chains, the output from one chain becomes the input to another. We'll create two prompt templates: one to generate suggestions for activities from the input destination, and another to create an itinerary for one day of activities from the model's top three suggestions.


In [None]:
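from langchain_core.prompts import PromptTemplate

# First prompt: ask for activity suggestions at the destination
destination_prompt = PromptTemplate(
    input_variables=["destination"],
    template="I am planning a trip to {destination}. Can you suggest some activities to do there?"
)

# Second prompt: turn the suggested activities into a one-day itinerary
activities_prompt = PromptTemplate(
    input_variables=["activities"],
    template="I only have one day, so can you create an itinerary from your top three activities: {activities}."
)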

We define our model and begin our sequential chain. We start by defining a dictionary that passes our destination prompt template to the LLM and parses the output to a string, all using the **LangChain Expression Language (LCEL)** pipe operator. This gets assigned to the "activities" key, which is important, as this is the input variable to the second prompt template. We pipe the first chain into the second prompt template, then into the LLM, and again, parse to a string. We also wrap the sequential chain in parentheses so we can split this code across multiple lines.

In [None]:
from langchain_core.output_parsers import StrOutputParser


To summarize: the `destination_prompt` is passed to the LLM to generate the activity suggestions, and the output is parsed to a string and assigned to "activities". This is passed to the second `activities_prompt`, which is passed to the LLM to generate the itinerary, which is parsed as a string.

Let's invoke the chain, passing Rome as our input destination. The model considered that we only had one day to explore, and wove in its top suggestions of the Colosseum and Vatican City.

# 5. Gentle Introduction to LangChain Agents

In LangChain, agents use language models to determine actions. Agents often use tools, which are functions called by the agent to interact with the system. These tools can be high-level utilities to transform inputs, or they can be task-specific. Agents can even use chains and other agents as tools. In this tutorial, we'll discuss a type of agent called **ReAct** agents.

ReAct stands for *reasoning and acting*, and this is exactly how the agent operates. It prompts the model using a repeated loop of thinking, acting, and observing. If we were to ask a ReAct agent that had access to a weather tool, "What is the weather like in Kingston, Jamaica?", it would start by thinking about the task and which tool to call, call that tool using the information, and observe the results from the tool call.

To implement agents, we'll be using **LangGraph**, which is a branch of the LangChain ecosystem specifically for designing agentic systems, or systems including agents. Like LangChain's core library, it's built to provide a unified, tool-agnostic syntax.

We'll create a ReAct agent that can solve math problems - something most LLMs struggle with. We import `create_react_agent` from `langgraph` and the `load_tools()` function. We initialize our LLM, and load the `llm-math` tool using the `load_tools()` function. To create the agent, we pass the LLM and tools to `create_react_agent()`. Just like chains, agents can be executed with the `.invoke()` method. Here, we pass the chat model a message to find the square root of 101, which isn't a whole number. Let's see how the agent approaches the problem!


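In [None]:
from langgraph.prebuilt import create_react_agent
from langchain_community.agent_toolkits.load_tools import load_tools

# llm-math gives the agent a calculator tool driven by the LLM
tools = load_tools(["llm-math"], llm=llm)
agent = create_react_agent(llm, tools)

# The agent takes and returns a dictionary with a "messages" list
messages = agent.invoke({"messages": [("human", "What is the square root of 101?")]})
print(messages)


In [None]:
# Extract the final answer
print(messages["messages"][-1].content)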


In [None]:
# Define the tools

# Define the agent

# Invoke the agent

# 6. Custom Tools for Agents

Now that we've created our first agent, let's take a closer look at tools so we can design our own. **Tools** in LangChain must be formatted in a specific way to be compatible with agents. They must have a name, accessible via the `.name` attribute, and a description, accessible via the `.description` attribute, which is used by the LLM to determine when to call the tool. For the `llm-math` tool loaded earlier, if the LLM interprets the task as a math problem, it will likely call the tool based on its description.

Let's say we want to define a Python function to generate a financial report for a company. It takes three arguments: the company_name, revenue, and expenses, and outputs a string containing the net_income. We make the use of this function clear in the docstring, defined using triple quotes.

In [None]:
def financial_report(company_name: str, revenue: int, expenses: int) -> str:
    """Generate a financial report for a company that calculates net income."""

    net_income = revenue - expenses

    report = f"Financial Report for {company_name}:\n"
    report += f"Revenue: {revenue}\n"
    report += f"Expenses: {expenses}\n"
    report += f"Net Income: {net_income}\n"

    return report

Here's what the report looks like. 
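A self-contained sketch of calling the function directly. The company name and figures are made-up inputs for illustration.

```python
def financial_report(company_name: str, revenue: int, expenses: int) -> str:
    """Generate a financial report for a company that calculates net income."""
    net_income = revenue - expenses

    report = f"Financial Report for {company_name}:\n"
    report += f"Revenue: {revenue}\n"
    report += f"Expenses: {expenses}\n"
    report += f"Net Income: {net_income}\n"
    return report

# Hypothetical inputs
report = financial_report(company_name="LemonadeStand", revenue=100, expenses=50)
print(report)
# Financial Report for LemonadeStand:
# Revenue: 100
# Expenses: 50
# Net Income: 50
```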

Let's convert this function into a tool our agent can call. To do this, we import the `@tool` decorator and add it before the function definition. Don't worry if you're not familiar with Python decorators; `@tool` modifies the function so it's in the correct format to be used by an agent.

In [None]:
from langchain_core.tools import tool

@tool
def financial_report(company_name: str, revenue: int, expenses: int) -> str:
    """Generate a financial report for a company that calculates net income."""

    net_income = revenue - expenses

    report = f"Financial Report for {company_name}:\n"
    report += f"Revenue: {revenue}\n"
    report += f"Expenses: {expenses}\n"
    report += f"Net Income: {net_income}\n"

    return report

Like with the built-in tool we were looking at, we can now examine the various attributes of our tool. These include its name, which is the function name by default, its description, which is the function's docstring, and return_direct, which is set to False by default.

When a tool is called with `return_direct=True`, the agent immediately returns the tool's output to the user as the final answer. You should use this option mostly when a tool's output is already in a suitable format for the user and doesn't require further processing or interpretation by the agent.


 
We can also print the tool's arguments, which lay out the argument names and expected data types.

In [None]:
print(financial_report.name)
print(financial_report.description)
print(financial_report.return_direct)
print(financial_report.args)


Let's put our tool into action! We'll again use a ReAct agent, combining the chat LLM with a list of tools to use, containing our new custom tool. We invoke the agent with an input containing the required information: a company name, revenue, and expenses. The response from the agent starts with our input, then determines that the financial_report tool should be called, which returns a tool message containing the output from our function, and finally, the output is passed to the LLM, which responds to us. Let's zoom in on this final message.



Compare the final output from the LLM with the output from the tool, which is based on the function we defined. Notice that there are slight formatting differences between the two; the LLM received the tool output and put its own slight spin on it, which you may need to watch out for.

# 7. Introduction to RAG in LangChain

## Integrating Document Loaders

Pre-trained language models don't have access to external data sources - their understanding comes purely from their training data. This means that if we require our model to have knowledge that goes beyond its training data, which could be company data or knowledge of more recent world events, we need a way of integrating that data. In RAG, a user query is embedded and used to retrieve the most relevant documents from the database. Then, these documents are added to the model's prompt so that the model has extra context to inform its response.

<img src="rag-diagram.png" width=500>

There are three primary steps to RAG development in LangChain. The first is loading the documents into LangChain with document loaders. The next is splitting the documents into chunks. Chunks are units of information that we can index and process individually. The last step is encoding and storing the chunks for retrieval, which could utilize a vector database if that meets the needs of the use case.

LangChain document loaders are classes designed to load and configure documents for integration with AI systems. LangChain provides document loader classes for common file types such as CSV and PDFs. There are also additional loaders provided by 3rd parties for managing unique document formats, including Amazon S3 files, Jupyter notebooks, audio transcripts, and many more. We will practice loading data from three common formats: PDFs, CSVs, and HTML. LangChain has excellent documentation on all of its document loaders, and there's a lot of overlap in syntax, so explore at your leisure! https://python.langchain.com/docs/integrations/document_loaders

There are a few different types of PDF loaders in LangChain, and there is documentation available online for each. In this tutorial, we'll use the `PyPDFLoader`. We instantiate the `PyPDFLoader` class, passing in the path to the PDF file we're loading. Finally, we use the `.load()` method to load the document into memory, and assign the resulting object to the `data` variable. We can then check the output to confirm that we have loaded it. Note that this document loader requires the `pypdf` package as a dependency, so install it using `pip install pypdf` or `conda install pypdf` in a terminal.

In [None]:
print(data[0])

When loading CSVs, the syntax is very similar, but instead we use the CSVLoader class. 

Finally, we can load HTML files using the UnstructuredHTMLLoader class. We can access the document's contents, again, with subsetting, and extract the document's metadata with the metadata attribute.

In [None]:
import nltk
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')

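In [None]:
from langchain_community.document_loaders import UnstructuredHTMLLoader

# File name taken from the splitting cell at the end of this notebook
loader = UnstructuredHTMLLoader(file_path="metu-regulations.html")
data = loader.load()
print(data[0])           # first document's contents
print(data[0].metadata)  # metadata such as the source path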

## Splitting external data for retrieval

Now that we've loaded documents from different sources, let's learn how to parse the information. Document splitting splits the loaded document into smaller parts, which are also called chunks. Chunking is particularly useful for breaking up long documents so that they fit within an LLM's context window.

Let's examine the introduction from an academic paper, which is saved as a PDF. One naive splitting option would be to separate the document line by line. This would be simple to implement, but because sentences are often split over multiple lines, and because those lines are processed separately, key context might be lost.

To counteract lost context during chunk splitting, a chunk overlap is often implemented. We've selected two chunks and a chunk overlap shown in green. Having this extra overlap present in both chunks helps retain context. If a model shows signs of losing context and misunderstanding information when answering from external sources, we may need to increase this chunk overlap.

<img src="chank-overlap.webp" width=500>
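The idea of chunk overlap can be shown with a toy fixed-width splitter in plain Python. This only illustrates what overlap means; it is not how LangChain's splitters work internally, and `chunk_with_overlap` is a hypothetical helper.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width chunks whose edges share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new chunk starts after the previous one
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]

chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

The last character of each chunk reappears as the first character of the next, which is the shared context an overlap is meant to preserve.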

There isn't one document splitting strategy that works for all situations. We should experiment with multiple methods, and see which one strikes the right balance between retaining context and managing chunk size. We will compare two document splitting methods: `CharacterTextSplitter` and `RecursiveCharacterTextSplitter`. Optimizing this document splitting is an active area of research, so keep an eye out for new developments.

As an example, let's split this quote by Elbert Hubbard, which contains 108 characters, into chunks. We'll compare how the two methods perform on this quote with a chunk_size of 24 characters and a small chunk_overlap of three.

Let's start with `CharacterTextSplitter`. This method splits on the separator first, then checks the resulting chunks against `chunk_size` and `chunk_overlap`. We call `CharacterTextSplitter`, passing the `separator` to split on, along with the `chunk_size` and `chunk_overlap`. Applying the splitter to the quote with the `.split_text()` method, and printing the output, we can see that we have a problem: each of these chunks contains more characters than our specified `chunk_size`. `CharacterTextSplitter` splits on the separator in an attempt to make chunks smaller than `chunk_size`, but in this case, splitting on the separator was unable to return chunks below our `chunk_size`. Let's take a look at a more robust splitting method!

In [None]:
quote = '''One machine can do the work of fifty ordinary humans.\nNo machine can do the work of one extraordinary human.'''

len(quote)

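In [None]:
chunk_size = 24
chunk_overlap = 3

from langchain_text_splitters import CharacterTextSplitter

# The separator value is an assumption; the original cell stops after the import
ct_splitter = CharacterTextSplitter(separator=".", chunk_size=chunk_size, chunk_overlap=chunk_overlap)

docs = ct_splitter.split_text(quote)
print(docs)
print([len(doc) for doc in docs])  # compare each chunk's length with chunk_size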

`RecursiveCharacterTextSplitter` takes a list of separators to split on, and it works through the list from left to right, splitting the document using each separator in turn, and seeing if these chunks can be combined while remaining under `chunk_size`. Let's split the quote using the same `chunk_size` and `chunk_overlap`.

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

rc_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)

docs = rc_splitter.split_text(quote)
print(docs)

Notice how the length of each chunk varies. The class split by paragraphs first, and found that the chunk size was too big; likewise for sentences. It got to the third separator: splitting words using the space separator, and found that words could be combined into chunks while remaining under the chunk_size character limit. However, some of these chunks are too small to contain meaningful context, but this recursive implementation may work better on larger documents.

In [None]:
chunk_size = 24
chunk_overlap = 3

from langchain_community.document_loaders import UnstructuredHTMLLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = UnstructuredHTMLLoader(file_path="metu-regulations.html")
data = loader.load()

rc_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    separators=["."],
)

docs = rc_splitter.split_documents(data)
print(docs[0])

We can also split other file formats, like HTML. Recall that we can load HTML using `UnstructuredHTMLLoader`. Defining the splitter is the same, but for splitting documents, we use the `.split_documents()` method instead of `.split_text()` to perform the split.