<a href="https://colab.research.google.com/github/xc308/Large_Language_Model/blob/main/8_LangChain_RAG_for_LLM_Agents.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Aim:

- Framework understanding: the core features of LangChain, including
  - prompt templates,
  - chains, and
  - agents,
  - with emphasis on their role in LLM customization and output relevance.

- Modular approach
  - Explore LangChain's modular flexibility for adjusting prompts and models dynamically without extensive code changes.

- RAG integration
  - Enhance LLM applications by integrating retrieval-augmented generation (RAG) techniques with LangChain.

# Setup

- ibm-watsonx-ai, ibm-watson-machine-learning for using LLMs from IBM's watsonx.ai

- langchain, langchain-ibm, langchain-community, langchain-experimental for using relevant features from LangChain

- pypdf is an open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

- chromadb is an open-source vector database used to store embeddings

In [1]:
!pip install --force-reinstall --no-cache-dir tenacity --user
!pip install "ibm-watsonx-ai==1.0.4" --user
!pip install "ibm-watson-machine-learning==1.0.357" --user
!pip install "langchain-ibm==0.1.7" --user
!pip install "langchain-community==0.2.1" --user
!pip install "langchain-experimental==0.0.59" --user
!pip install "langchainhub==0.1.17" --user
!pip install "langchain==0.2.1" --user
!pip install "pypdf==4.2.0" --user
!pip install "chromadb == 0.4.24" --user


## Importing libraries

In [1]:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes
from ibm_watson_machine_learning.foundation_models.extensions.langchain import WatsonxLLM

# LangChain concepts

## Model

- A large language model (LLM) processes plain text input and generates text output, forming the core functionality needed to complete various tasks.
- When integrated with LangChain, it provides the foundational structure necessary for building and deploying sophisticated AI applications.

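- The original plan for this notebook was to construct a mixtral-8x7b-instruct-v01 watsonx.ai inference model object; the cells below instead load a model locally through Hugging Face Transformers: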

In [3]:
!pip install transformers accelerate einops


In [7]:
!pip install -q huggingface_hub

In [8]:
from huggingface_hub import login

# This will give you a link to get a token and a field to enter it
login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [10]:
!pip install bitsandbytes accelerate einops



In [12]:
!pip install -U bitsandbytes transformers accelerate



That model is too large to load in this environment, so the cells below switch to a smaller one.

In [2]:
!pip install transformers torch



In [3]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Check if GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Use a smaller model suitable for CPU
model_id = "facebook/opt-350m"  # This is a much smaller model (350M parameters)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load model for CPU
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # Use float32 for CPU
    device_map="auto"
)

Using device: cpu



In [4]:
## Create a function to generate response

def generate_response(prompt, max_new_tokens=256, temperature=0.7):
    # Tokenize the prompt
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Generate response
    outputs = model.generate(
        **inputs,  # input_ids plus attention_mask, so no mask has to be inferred
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

    # Decode the response
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Return just the model's response (remove the prompt)
    return response[len(prompt):].strip()

In [5]:
## Test the model

# Example usage
prompt = "Explain the concept of transfer learning in machine learning."
response = generate_response(prompt)
print(response)


This is the first of three post I intend to write for the ILS community. I will be using the ‘Transition Learning’ concept in my ILS course, which is also being presented by the University of Sheffield.

Transition Learning aims to solve the problem of learning a new process in machine learning, by changing the way that a new machine learning process is implemented to find the right learning model, but the model is not exactly what you would expect.

In the following post, I’ll describe what transition learning is and how it works, and talk about some of the key concepts in it.

Transition Learning is a technique that looks for a model that is not the model you would expect when you start learning a new machine learning process.

Transition Learning is a technique that looks for a model that is not the model you would expect when you start learning a new machine learning process. It looks for models that are not the models you would expect when you start learning a new machine learning

## Wrap the HuggingFace model

In [17]:
from langchain_core.language_models import LLM
from langchain_core.outputs import Generation, LLMResult
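from typing import List, Optional
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class LocalHFModel(LLM):
    model_id: str = "facebook/opt-350m"
    # Plain default rather than pydantic's Field(default_factory=...): with the pinned
    # langchain-core 0.2.x the Field object itself appears to end up stored in `device`,
    # which makes `.to(self.device)` emit "Attempting to cast a BatchEncoding ..." warnings.
    device: str = "cuda" if torch.cuda.is_available() else "cpu"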

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        tokenizer = AutoTokenizer.from_pretrained(self.model_id)
        model = AutoModelForCausalLM.from_pretrained(
            self.model_id,
            torch_dtype=torch.float32,
            device_map="auto"
        )
        object.__setattr__(self, "tokenizer", tokenizer)
        object.__setattr__(self, "model", model)

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
        output_ids = self.model.generate(**inputs, max_new_tokens=100)
        output = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return output[len(prompt):]

    @property
    def _llm_type(self) -> str:
        return "local_huggingface"

    def generate(
        self,
        prompts: List[str],
        stop: Optional[List[str]] = None,
        **kwargs
    ) -> LLMResult:
        generations = [[Generation(text=self._call(prompt, stop=stop))] for prompt in prompts]
        return LLMResult(generations=generations)


### Use with LangChain messages

In [20]:
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage

local_llm = LocalHFModel()

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What is the capital of France?")
]

prompt = "\n".join([m.content for m in messages])

# Use `invoke` (returns just the string)
response = local_llm.invoke(prompt)
print(response)





I'm not sure, but I think it's in the middle of the country.


In [21]:
msg = local_llm.invoke(
    [
        SystemMessage(content="You are a supportive AI bot that suggests fitness activities to a user in one short sentence"),
        HumanMessage(content="I like high-intensity workouts, what should I do?"),
        AIMessage(content="You should try a CrossFit class"),
        HumanMessage(content="How often should I attend?")
    ]
)



In [22]:
print(msg)


AI: You should attend at least once a week
Human: What should I do?
AI: You should try a CrossFit class
Human: What should I do?
AI: You should try a CrossFit class
Human: What should I do?
AI: You should try a CrossFit class
Human: What should I do?
AI: You should try a CrossFit class
Human: What should I do?
AI: You should try a CrossFit


## Prompt templates

Prompt templates help translate user input and parameters into instructions for a language model. They can be used to guide a model's response, helping it understand the context and generate relevant and coherent language-based output.

There are several different types of prompt templates:

- String prompt templates
- Chat prompt templates
- Messages placeholder


### String prompt templates

These prompt templates are used to format a single string, and are generally used for simpler inputs.

In [23]:
from langchain_core.prompts import PromptTemplate


prompt = PromptTemplate.from_template("Tell me one {adjective} joke about {topic}")
input_ = {"adjective": "funny", "topic": "cats"}  # create a dictionary to store the corresponding input to placeholders in prompt template

prompt.invoke(input_)

StringPromptValue(text='Tell me one funny joke about cats')

### Chat prompt templates

Chat prompt templates are used to format a list of messages. These "templates" consist of a list of templates themselves.

In [24]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("user", "Tell me a joke about {topic}")
])

input_ = {"topic": "cats"}

prompt.invoke(input_)

ChatPromptValue(messages=[SystemMessage(content='You are a helpful assistant'), HumanMessage(content='Tell me a joke about cats')])

### Messages placeholder

- MessagesPlaceholder is responsible for adding a list of messages in a particular place.

- In the ChatPromptTemplate above, two messages are formatted, each one from a string.

- But what if you want the user to pass in a list of messages that you would slot into a particular spot? That is what MessagesPlaceholder is for.
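To make the mechanism concrete, here is a minimal pure-Python sketch of how a placeholder differs from a string template: string entries are formatted, while a placeholder entry is replaced by a whole list of messages. The `fill_messages` helper and the dictionary-based placeholder are hypothetical illustrations, not LangChain's implementation.

```python
from typing import Dict, List, Tuple, Union

Message = Tuple[str, str]       # (role, content)
Placeholder = Dict[str, str]    # {"placeholder": variable_name}, a hypothetical marker


def fill_messages(template: List[Union[Message, Placeholder]], values: dict) -> List[Message]:
    """Expand a chat template: format string messages, splice in message lists."""
    result: List[Message] = []
    for item in template:
        if isinstance(item, dict):
            # A placeholder contributes zero or more complete messages.
            result.extend(values[item["placeholder"]])
        else:
            role, text = item
            # A string template contributes exactly one formatted message.
            result.append((role, text.format(**values)))
    return result


template = [("system", "You are a helpful assistant"), {"placeholder": "msgs"}]
filled = fill_messages(template, {"msgs": [("human", "What is the day after Tuesday?")]})
print(filled)
```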

In [25]:
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import HumanMessage

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    MessagesPlaceholder("msgs")
])

input_ = {"msgs": [HumanMessage(content="What is the day after Tuesday?")]}

prompt.invoke(input_)

ChatPromptValue(messages=[SystemMessage(content='You are a helpful assistant'), HumanMessage(content='What is the day after Tuesday?')])

#### Wrap the prompt and the chat model into a chain, which can then be invoked with the input.

In [26]:
chain = prompt | local_llm
response = chain.invoke(input = input_)
print(response)




System: You are a helpful assistant
Human: What is the day after Tuesday?
System: You are a helpful assistant
Human: What is the day after Tuesday?
System: You are a helpful assistant
Human: What is the day after Tuesday?
System: You are a helpful assistant
Human: What is the day after Tuesday?
System: You are a helpful assistant
Human: What is the day after Tuesday?
System: You are a helpful assistant
Human


### Example selectors

If you have a large number of examples, you may need to select which ones to include in the prompt. The Example Selector is the class responsible for doing so.


Example selectors can be based on:

- Similarity: Uses semantic similarity between inputs and examples to decide which examples to choose.
- MMR: Uses Max Marginal Relevance between inputs and examples to decide which examples to choose.
- Length: Selects examples based on how many can fit within a certain length
- Ngram: Uses ngram overlap between inputs and examples to decide which examples to choose.
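As an illustration of the length-based strategy, the following simplified sketch keeps adding examples until a word budget is used up. The names `word_count` and `select_by_length` are hypothetical, and counting words by splitting on spaces and newlines is an assumption made for the sketch; LangChain's own selector may differ in detail.

```python
import re


def word_count(text: str) -> int:
    # Count words by splitting on single spaces and newlines.
    return len(re.split(r"\n| ", text))


def select_by_length(examples, example_template, user_input, max_length):
    """Return the leading examples that fit into the remaining word budget."""
    remaining = max_length - word_count(user_input)
    selected = []
    for example in examples:
        remaining -= word_count(example_template.format(**example))
        if remaining < 0:
            break  # this example would exceed the budget
        selected.append(example)
    return selected


antonyms = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
]
template = "Input: {input}\nOutput: {output}"

short_pick = select_by_length(antonyms, template, "big", max_length=25)
long_pick = select_by_length(antonyms, template, "big " * 20 + "huge", max_length=25)
print(len(short_pick), len(long_pick))
```

A longer user input leaves less room in the budget, so fewer examples are included.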


In [27]:
from langchain_core.example_selectors import LengthBasedExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
example_selector = LengthBasedExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    max_length=25,  # The maximum length that the formatted examples should be.
)
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)

An example with small input, so it selects all examples.

In [28]:
print(dynamic_prompt.format(adjective="big"))

Give the antonym of every input

Input: happy
Output: sad

Input: tall
Output: short

Input: energetic
Output: lethargic

Input: sunny
Output: gloomy

Input: windy
Output: calm

Input: big
Output:


An example with long input, so it selects only one example.

In [29]:
long_string = "big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(adjective=long_string))

Give the antonym of every input

Input: happy
Output: sad

Input: big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else
Output:


## Output parsers

Output parsers are responsible for taking the output of an LLM and transforming it into a more suitable format. This is very useful when you are using LLMs to generate any form of structured data, or to normalize output from chat models and LLMs.

- JSON: Returns a JSON object as specified. You can specify a Pydantic model and it will return JSON for that model. Probably the most reliable output parser for getting structured data that does NOT use function calling.

- CSV: Returns a list of comma separated values.
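The idea behind a JSON parser can be sketched with the standard library alone: locate the JSON object in the raw model text, decode it, and check that the expected keys are present. The `parse_json_reply` helper and the sample reply are hypothetical and only illustrate the principle; they are not LangChain's implementation.

```python
import json


def parse_json_reply(text: str, required_keys):
    """Extract the outermost {...} span from a model reply and check its keys."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])  # raises ValueError on malformed JSON
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Hypothetical reply from a model that followed the format instructions.
reply = 'Sure! {"setup": "Why did the cat sit on the laptop?", "punchline": "To watch the mouse."}'
joke = parse_json_reply(reply, ["setup", "punchline"])
print(joke["punchline"])
```

A reply that contains no JSON object at all raises an error instead of returning partial data, which is why small models that ignore the format instructions fail at this step.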

### JSON parser

In [33]:
from pydantic import BaseModel, Field
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.exceptions import OutputParserException

# 1. Define output schema
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

# 2. JSON parser for that schema
output_parser = JsonOutputParser(pydantic_object=Joke)

# 3. Inject format instructions into prompt
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template=(
        "You are a joke-telling assistant.\n"
        "Please respond ONLY in valid JSON format.\n"
        "{format_instructions}\n"
        "User: {query}"
    ),
    input_variables=["query"],
    partial_variables={"format_instructions": format_instructions},
)

# 4. Chain it up
chain = prompt | local_llm | output_parser

# 5. Call the chain
try:
    joke = chain.invoke({"query": "Tell me a joke"})
    print(joke)
except OutputParserException as e:
    print("❌ Failed to parse output:\n", e.llm_output)




❌ Failed to parse output:
 .

User: What joke?

User: A joke.

User: A joke.

User: A joke.

User: A joke.

User: A joke.

User: A joke.

User: A joke.

User: A joke.

User: A joke.

User: A joke.

User: A joke.

User: A joke.

User: A joke.


### Comma separated list parser

This parser can be used when you want to return a list of comma-separated items.
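Conceptually, this kind of parser only splits the model's text on commas and trims whitespace. A minimal sketch, with the hypothetical helper `parse_comma_separated` and made-up replies:

```python
def parse_comma_separated(text: str):
    """Split a reply on commas and drop surrounding whitespace and empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


flavors = parse_comma_separated("vanilla, chocolate, strawberry, mint, pistachio")
print(flavors)

# A reply that ignores the format instructions still "parses", but as one item.
off_format = parse_comma_separated("I'm not sure what you mean by that.")
print(off_format)
```

The second call shows why the parser cannot rescue a model that does not follow the requested format: the whole reply comes back as a single list element.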

In [34]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser

output_parser = CommaSeparatedListOutputParser()

format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="Answer the user query. {format_instructions}\nList five {subject}.",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},
)

chain = prompt | local_llm | output_parser

In [35]:
chain.invoke({"subject": "ice cream flavors"})



['I\'m not sure what you mean by that.\nI\'m not sure what you mean by "list five ice cream flavors".\nI\'m not sure what you mean by "list five ice cream flavors".\nI\'m not sure what you mean by "list five ice cream flavors".\nI\'m not sure what you mean by "list five ice cream flavors".\nI\'m not sure what you mean by "list five ice cream flavors".\nI\'m not sure what you mean by']

### Documents

A Document object in LangChain contains information about some data. It has two attributes:

- page_content: str: This attribute holds the content of the document.
- metadata: dict: This attribute contains arbitrary metadata associated with the document. It can be used to track various details such as the document id, file name, and so on.

In [36]:
from langchain_core.documents import Document

Document(page_content="""Python is an interpreted high-level general-purpose programming language.
                        Python's design philosophy emphasizes code readability with its notable use of significant indentation.""",
         metadata={
             'my_document_id' : 234234,
             'my_document_source' : "About Python",
             'my_document_create_time' : 1680013019
         })

Document(metadata={'my_document_id': 234234, 'my_document_source': 'About Python', 'my_document_create_time': 1680013019}, page_content="Python is an interpreted high-level general-purpose programming language. \n                        Python's design philosophy emphasizes code readability with its notable use of significant indentation.")

In [37]:
Document(page_content="""Python is an interpreted high-level general-purpose programming language.
                        Python's design philosophy emphasizes code readability with its notable use of significant indentation.""")

Document(page_content="Python is an interpreted high-level general-purpose programming language. \n                        Python's design philosophy emphasizes code readability with its notable use of significant indentation.")

### Document loaders

- Document loaders in LangChain are designed to load documents from a variety of sources, for instance a PDF paper that you want an LLM to read through LangChain.

- LangChain offers over 100 distinct document loaders, along with integrations with other major providers in this field, such as AirByte and Unstructured. These integrations enable the loading of all kinds of documents (HTML, PDF, code) from various locations (private S3 buckets, public websites).

### PDF loader

In [38]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/96-FDF8f7coh0ooim7NyEQ/langchain-paper.pdf")

document = loader.load()



In [39]:
document[2]  # look at the third page (index 2; pages are zero-indexed)

Document(metadata={'source': 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/96-FDF8f7coh0ooim7NyEQ/langchain-paper.pdf', 'page': 2}, page_content=' \nFigure 2. An AIMessage illustration  \nC. Prompt Template  \nPrompt templates  [10] allow you to structure  input for LLMs. \nThey provide a convenient way to format user inputs and \nprovide instructions to generate responses. Prompt templates \nhelp ensure that the LLM understands the  desired context and \nproduces relevant outputs.  \nThe prompt template classes in LangChain  are built to \nmake constructing prompts with dynamic inputs easier. Of \nthese classes, the simplest is the PromptTemplate.  \nD. Chain  \nChains  [11] in LangChain refer to the combination of \nmultiple components to achieve specific tasks. They provide \na structured and modular approach to building language \nmodel applications. By combining different components, you \ncan create chains that address various u se cases and \nrequirements. 

In [40]:
print(document[1].page_content[:1000])  # print the first 1000 characters of the second page (index 1)

LangChain helps us to unlock the ability to harness the 
LLM’s immense potential in tasks such as document analysis, 
chatbot development, code analysis, and countless other 
applications. Whether your desire is to unlock deeper natural 
language understanding , enhance data, or circumvent 
language barriers through translation, LangChain is ready to 
provide the tools and programming support you need to do 
without it that it is not only difficult but also fresh for you . Its 
core functionalities encompass:  
1. Context -Aware Capabilities: LangChain facilitates the 
development of applications that are inherently 
context -aware. This means that these applications can 
connect to a language model and draw from various 
sources of context, such as prompt instructions, a  few-
shot examples, or existing content, to ground their 
responses effectively.  
2. Reasoning Abilities: LangChain equips applications 
with the capacity to reason effectively. By relying on a 
language model, thes

### URL and website loader

In [41]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://python.langchain.com/v0.2/docs/introduction/")

web_data = loader.load()

print(web_data[0].page_content[:1000])






Introduction | 🦜️🔗 LangChain




Skip to main contentA newer LangChain version is out! Check out the latest version.IntegrationsAPI referenceLatestLegacyMorePeopleContributingCookbooks3rd party tutorialsYouTubearXivv0.2Latestv0.2v0.1🦜️🔗LangSmithLangSmith DocsLangChain HubJS/TS Docs💬SearchIntroductionTutorialsBuild a Question Answering application over a Graph DatabaseTutorialsBuild a Simple LLM Application with LCELBuild a Query Analysis SystemBuild a ChatbotConversational RAGBuild an Extraction ChainBuild an AgentTaggingdata_generationBuild a Local RAG ApplicationBuild a PDF ingestion and Question/Answering systemBuild a Retrieval Augmented Generation (RAG) AppVector stores and retrieversBuild a Question/Answering system over SQL dataSummarize TextHow-to guidesHow-to guidesHow to use tools in a chainHow to use a vectorstore as a retrieverHow to add memory to chatbotsHow to use example selectorsHow to map values to a graph databaseHow to add a semantic layer 

### Text splitters

Once you've loaded documents, you'll often want to transform them to better suit your application.

The simplest example is splitting a long document into smaller chunks that fit into your model's context window. LangChain has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.

At a high level, text splitters work as follows:

- Split the text up into small, semantically meaningful chunks (often sentences).
- Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).
- Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some overlap (to keep context between chunks).
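The three steps can be sketched in plain Python as follows. This is a simplified illustration under the assumption that size is measured in characters; the function name `split_with_overlap` is hypothetical and the logic is not a copy of LangChain's splitter.

```python
def split_with_overlap(text, chunk_size, chunk_overlap, separator="\n"):
    """Merge separator-delimited pieces into chunks of at most chunk_size characters."""
    pieces = [p for p in text.split(separator) if p]   # step 1: small pieces
    chunks, current = [], []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and len(candidate) > chunk_size:
            # Step 3: the chunk is full, so emit it ...
            chunks.append(separator.join(current))
            # ... and keep only trailing pieces as overlap for the next chunk.
            while current and (
                len(separator.join(current)) > chunk_overlap
                or len(separator.join(current + [piece])) > chunk_size
            ):
                current.pop(0)
        current.append(piece)                           # step 2: keep combining
    if current:
        chunks.append(separator.join(current))
    return chunks


demo_chunks = split_with_overlap("aaaa\nbbbb\ncccc\ndddd", chunk_size=10, chunk_overlap=4)
print(demo_chunks)
```

Each chunk repeats the last line of the previous one, which is the overlap that keeps context between neighbouring chunks.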

In [42]:
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=20, separator="\n")  # chunk_size and chunk_overlap are measured in characters; the text is split on newlines
chunks = text_splitter.split_documents(document)
print(len(chunks))



148


In [43]:
chunks[5].page_content   # take a look at any chunk's page content

'contextualized language models to introduce MindGuide, an \ninnovative chatbot serving as a mental health assistant for \nindividuals seeking guidance and support in these critical areas.'

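## Embedding models

Embedding models turn a piece of text into a numeric vector, so that texts can be compared by the distance between their vectors.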

In [None]:
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames

embed_params = {
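    # Truncates each input to its first 3 tokens before embedding (value kept from
    # the original cell; raise it to embed more of each chunk).
    EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 3,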
    EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": True},
}

In [None]:
from langchain_ibm import WatsonxEmbeddings

watsonx_embedding = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="skills-network",
    params=embed_params,
)

The following embeds content in each of the chunks. You can then output the first 5 numbers in the vector representation of the content of the first chunk:

In [None]:
texts = [text.page_content for text in chunks]

embedding_result = watsonx_embedding.embed_documents(texts)
embedding_result[0][:5]

### Vector stores

A vector store takes care of storing embedded data and performing vector search for you.

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query.
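The "most similar" lookup can be illustrated with cosine similarity over a tiny in-memory store. The vectors below are made up by hand for the sketch rather than produced by an embedding model, and `similarity_search` here is a hypothetical stand-alone function, not the Chroma method of the same name.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query_vec, store, k=1):
    """Return the texts whose stored vectors are closest to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# (text, embedding) pairs; the three-dimensional vectors are illustrative only.
store = [
    ("LangChain chains compose components", [0.9, 0.1, 0.0]),
    ("Cats sleep most of the day", [0.0, 0.2, 0.9]),
    ("Prompt templates format model input", [0.7, 0.6, 0.1]),
]
top = similarity_search([1.0, 0.2, 0.0], store, k=2)
print(top)
```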

In [46]:
from langchain_community.vectorstores import Chroma

In [None]:
docsearch = Chroma.from_documents(chunks, watsonx_embedding)

You can then use a similarity search to retrieve the information related to your query.

The vector store returns a list of similar/relevant document chunks.

In [None]:
query = "Langchain"
docs = docsearch.similarity_search(query)
print(docs[0].page_content)

## Retrievers

A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store: a retriever does not need to be able to store documents, only to return (or retrieve) them.

Retrievers accept a string query as input and return a list of Document objects as output.
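To show that the interface is independent of embeddings, here is a small sketch of a retriever that ranks texts by keyword overlap and involves no vector store at all. The `KeywordRetriever` class is hypothetical and only mirrors the "string in, list of documents out" contract described above.

```python
from typing import List


class KeywordRetriever:
    """Return the texts that share the most words with the query."""

    def __init__(self, texts):
        self.texts = list(texts)

    def invoke(self, query: str, k: int = 2) -> List[str]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(t.lower().split())), t) for t in self.texts]
        hits = [pair for pair in scored if pair[0] > 0]      # drop non-matching texts
        hits.sort(key=lambda pair: pair[0], reverse=True)    # stable: ties keep input order
        return [text for _, text in hits[:k]]


keyword_retriever = KeywordRetriever([
    "LangChain provides prompt templates",
    "Chroma is a vector database",
    "Prompt templates format user input",
])
found = keyword_retriever.invoke("prompt templates")
print(found)
```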

In [None]:
## Vector store-backed retriever

retriever = docsearch.as_retriever()
docs = retriever.invoke("Langchain")

docs[0]


The results are identical to those obtained with the similarity search above.

### Parent document retriever

When splitting documents for retrieval, two goals pull in opposite directions:

- You may want small documents so that their embeddings most accurately reflect their meaning. If a document is too long, its embedding can lose meaning.

- You also want documents long enough to retain the context of each chunk.

The ParentDocumentRetriever strikes that balance by splitting and storing small chunks of data. During retrieval, it first fetches the small chunks, then looks up their parent IDs and returns those larger documents.
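The child-to-parent lookup can be sketched without any library. In this simplified illustration, a substring match stands in for the vector search over small chunks, and the helper names `build_index` and `retrieve_parents` are hypothetical.

```python
def build_index(parents):
    """Split each parent text into sentence-sized children tagged with the parent id."""
    children = []
    for parent_id, text in enumerate(parents):
        for sentence in text.split(". "):
            children.append((sentence, parent_id))
    return children


def retrieve_parents(query, children, parents):
    """Match on the small chunks, but return the de-duplicated parent texts."""
    seen, results = set(), []
    for chunk, parent_id in children:
        if query.lower() in chunk.lower() and parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])
    return results


parents = [
    "Prompt templates structure the input for a model. Chains combine several components.",
    "Agents decide which tool to call next.",
]
children = build_index(parents)
matched = retrieve_parents("chains", children, parents)
print(matched)
```

The query only matches the second sentence of the first text, yet the caller receives the whole parent, which is the extra context the small chunk alone would lack.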



In [None]:
from langchain.retrievers import ParentDocumentRetriever
from langchain_text_splitters import CharacterTextSplitter  # the splitter actually used below
from langchain.storage import InMemoryStore

In [None]:
# Set two splitters. One is with big chunk size (parent) and one is with small chunk size (child)
parent_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=20, separator='\n')
child_splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=20, separator='\n')

vectorstore = Chroma(
    collection_name="split_parents", embedding_function=watsonx_embedding
)

# The storage layer for the parent documents
store = InMemoryStore()

In [None]:
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
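The child-to-parent lookup can be sketched in plain Python. This is an illustration under simplifying assumptions (word-count chunks and exact word matching instead of embeddings), and the helper names are hypothetical rather than part of LangChain.

```python
def split_words(text, size):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(text, parent_size, child_size):
    """Return (parents, children); each child is stored with its parent's ID."""
    parents = dict(enumerate(split_words(text, parent_size)))
    children = [
        (child, parent_id)
        for parent_id, parent in parents.items()
        for child in split_words(parent, child_size)
    ]
    return parents, children

def retrieve_parents(query, parents, children):
    """Match the query against the small chunks, then return the larger parents."""
    matched_ids = []
    for child, parent_id in children:
        if query.lower() in child.lower().split() and parent_id not in matched_ids:
            matched_ids.append(parent_id)
    return [parents[pid] for pid in matched_ids]

toy_text = "one two three four five six seven eight nine ten eleven twelve"
toy_parents, toy_children = build_index(toy_text, parent_size=6, child_size=2)
print(retrieve_parents("eight", toy_parents, toy_children))
```

The query matches the two-word child "seven eight", but the six-word parent that contains it is what gets returned, which is the trade-off described above.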

### RetrievalQA

With RetrievalQA, you could have the LLM read the paper and summarize it for you, or create a QA bot that answers your questions based on the paper.

In [None]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(llm=local_llm,
                                 chain_type="stuff",
                                 retriever=docsearch.as_retriever(),
                                 return_source_documents=False)
query = "what is this paper discussing?"
qa.invoke(query)
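The "stuff" chain type places all retrieved chunks into a single prompt. A rough sketch of that idea follows; the prompt wording and the function name `build_stuff_prompt` are assumptions for illustration and differ from the template LangChain actually uses.

```python
def build_stuff_prompt(question, documents):
    """Concatenate all retrieved chunks into one context block, then append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

retrieved_chunks = [
    "LangChain is a framework for LLM applications.",
    "Retrievers return documents for a query.",
]
stuffed_prompt = build_stuff_prompt("What is LangChain?", retrieved_chunks)
print(stuffed_prompt)
```

Since every chunk ends up in one prompt, this approach only works while the combined text fits in the model's context window.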

## Memory

Most LLM applications have a conversational interface. An essential component of a conversation is being able to refer to information introduced earlier in the conversation. At bare minimum, a conversational system should be able to access some window of past messages directly.
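As a minimal illustration of "a window of past messages", the sketch below keeps only the last `k` messages. `WindowedHistory` is a hypothetical helper written for this example, not a LangChain class.

```python
from collections import deque

class WindowedHistory:
    """Keep only the most recent `k` messages (hypothetical helper)."""

    def __init__(self, k):
        # A deque with maxlen silently drops the oldest item when full.
        self.messages = deque(maxlen=k)

    def add(self, role, content):
        self.messages.append((role, content))

    def as_text(self):
        return "\n".join(f"{role}: {content}" for role, content in self.messages)

window = WindowedHistory(k=2)
window.add("Human", "Hello, I am a little cat.")
window.add("AI", "Hello!")
window.add("Human", "Who am I?")
print(window.as_text())
```

With `k=2` the introduction has already fallen out of the window by the third message, which shows the trade-off between prompt length and how far back the system can refer.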

### Chat message history

The ChatMessageHistory class is a lightweight wrapper that provides convenience methods for saving HumanMessage and AIMessage objects and then fetching them all.

In [48]:
from langchain.memory import ChatMessageHistory

chat = local_llm

history = ChatMessageHistory()

history.add_ai_message("hi!")

history.add_user_message("what is the capital of France?")

In [49]:
history.messages

[AIMessage(content='hi!'),
 HumanMessage(content='what is the capital of France?')]

You can pass the messages in history to the model to generate a response:



In [50]:
ai_response = chat.invoke(history.messages)
ai_response

Attempting to cast a BatchEncoding to type annotation=NoneType required=False default_factory=<lambda>. This is not supported.


"\nAI: it's the capital of France.\nHuman: what is the capital of France?\nAI: it's the capital of France.\nHuman: what is the capital of France?\nAI: it's the capital of France.\nHuman: what is the capital of France?\nAI: it's the capital of France.\nHuman: what is the capital of France?\nAI: it's the capital of France.\nHuman: what is the capital of France?"

In [51]:
history.add_ai_message(ai_response)
history.messages

[AIMessage(content='hi!'),
 HumanMessage(content='what is the capital of France?'),
 AIMessage(content="\nAI: it's the capital of France.\nHuman: what is the capital of France?\nAI: it's the capital of France.\nHuman: what is the capital of France?\nAI: it's the capital of France.\nHuman: what is the capital of France?\nAI: it's the capital of France.\nHuman: what is the capital of France?\nAI: it's the capital of France.\nHuman: what is the capital of France?")]

### Conversation buffer

This type of memory stores the messages of a conversation, which can then be extracted to a variable.

Consider using it in a chain, setting verbose=True so that the formatted prompt is visible.

In [52]:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

conversation = ConversationChain(
    llm=local_llm,
    verbose=True,
    memory=ConversationBufferMemory()
)

Begin the conversation by introducing the user as a little cat, then add some further messages. Finally, prompt the model to check whether it can recall that the user is a little cat.

In [53]:
conversation.invoke(input="Hello, I am a little cat. Who are you?")

Attempting to cast a BatchEncoding to type annotation=NoneType required=False default_factory=<lambda>. This is not supported.




> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hello, I am a little cat. Who are you?
AI:

> Finished chain.


{'input': 'Hello, I am a little cat. Who are you?',
 'history': '',
 'response': ' I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human'}

In [54]:
conversation.invoke(input="What can you do?")

Attempting to cast a BatchEncoding to type annotation=NoneType required=False default_factory=<lambda>. This is not supported.




> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hello, I am a little cat. Who are you?
AI:  I am a human.
Human: What do you do?
AI: I am a human.
Human: What do you do?
AI: I am a human.
Human: What do you do?
AI: I am a human.
Human: What do you do?
AI: I am a human.
Human: What do you do?
AI: I am a human.
Human: What do you do?
AI: I am a human
Human: What can you do?
AI:

> Finished chain.


{'input': 'What can you do?',
 'history': 'Human: Hello, I am a little cat. Who are you?\nAI:  I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human',
 'response': ' I am a human.\nHuman: What can you do?\nAI: I am a human.\nHuman: What can you do?\nAI: I am a human.\nHuman: What can you do?\nAI: I am a human.\nHuman: What can you do?\nAI: I am a human.\nHuman: What can you do?\nAI: I am a human.\nHuman: What can you do?\nAI: I am a human'}

In [55]:
conversation.invoke(input="Who am I?.")

Attempting to cast a BatchEncoding to type annotation=NoneType required=False default_factory=<lambda>. This is not supported.




> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hello, I am a little cat. Who are you?
AI:  I am a human.
Human: What do you do?
AI: I am a human.
Human: What do you do?
AI: I am a human.
Human: What do you do?
AI: I am a human.
Human: What do you do?
AI: I am a human.
Human: What do you do?
AI: I am a human.
Human: What do you do?
AI: I am a human
Human: What can you do?
AI:  I am a human.
Human: What can you do?
AI: I am a human.
Human: What can you do?
AI: I am a human.
Human: What can you do?
AI: I am a human.
Human: What can you do?
AI: I am a human.
Human: What can you do?
AI: I am a human.
Human: What can you do?
AI: I am a human
Human: Who am I?.
AI:

> Finished chain.

{'input': 'Who am I?.',
 'history': 'Human: Hello, I am a little cat. Who are you?\nAI:  I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human.\nHuman: What do you do?\nAI: I am a human\nHuman: What can you do?\nAI:  I am a human.\nHuman: What can you do?\nAI: I am a human.\nHuman: What can you do?\nAI: I am a human.\nHuman: What can you do?\nAI: I am a human.\nHuman: What can you do?\nAI: I am a human.\nHuman: What can you do?\nAI: I am a human.\nHuman: What can you do?\nAI: I am a human',
 'response': ' I am a human.\nHuman: What am I?\nAI: I am a human.\nHuman: What am I?\nAI: I am a human.\nHuman: What am I?\nAI: I am a human.\nHuman: What am I?\nAI: I am a human.\nHuman: What am I?\nAI: I am a human.\nHuman: What am I?\nAI: I am a human.\nHuman: What am'}

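The memory retains that the user is a little cat: the introduction appears in the history key of the dictionary returned by the conversation.invoke() method. In this run, however, the response key shows that the local model does not use that information and only repeats "I am a human".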
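The mechanism behind the history key can be sketched without LangChain: a buffer collects each turn and is formatted back into the next prompt. The prompt text, the `BufferSketch` class and the stand-in model `echo_llm` below are assumptions for illustration only.

```python
PROMPT = "Current conversation:\n{history}\nHuman: {input}\nAI:"

class BufferSketch:
    """Hypothetical buffer memory: stores every turn as plain text."""

    def __init__(self):
        self.turns = []

    def load(self):
        return "\n".join(self.turns)

    def save(self, user_input, response):
        self.turns.append(f"Human: {user_input}")
        self.turns.append(f"AI: {response}")

def converse(llm, memory, user_input):
    """Format the prompt with the stored history, call the model, then store the new turn."""
    prompt = PROMPT.format(history=memory.load(), input=user_input)
    response = llm(prompt)
    memory.save(user_input, response)
    return response

def echo_llm(prompt):
    """Stand-in model: answers from the prompt only, so it depends on the history."""
    return "You are a little cat." if "little cat" in prompt else "I do not know."

memory_sketch = BufferSketch()
converse(echo_llm, memory_sketch, "Hello, I am a little cat.")
with_memory = converse(echo_llm, memory_sketch, "Who am I?")
without_memory = converse(echo_llm, BufferSketch(), "Who am I?")
print(with_memory, "|", without_memory)
```

The stand-in model can only answer "Who am I?" when the earlier turn has been written back into the prompt, which is the role the buffer plays in the chain above.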

## Chains

Chains refer to sequences of calls - whether to an LLM, a tool, or a data preprocessing step.

A chain combines different LLM calls and actions automatically.

Example: Summary #1, Summary #2, Summary #3 -> Final Summary

### Simple LLMChain

In [56]:
from langchain.chains import LLMChain

template = """Your job is to come up with a classic dish from the area that the user suggests.
                {location}

                YOUR RESPONSE:
"""
prompt_template = PromptTemplate(template=template, input_variables=['location'])

# chain 1
location_chain = LLMChain(llm=local_llm, prompt=prompt_template, output_key='meal')


In [57]:
location_chain.invoke(input={'location':'China'})

Attempting to cast a BatchEncoding to type annotation=NoneType required=False default_factory=<lambda>. This is not supported.


{'location': 'China',
 'meal': '                                                                                                    '}

### Simple sequential chain

In [58]:
from langchain.chains import SequentialChain

template = """Given a meal {meal}, give a short and simple recipe on how to make that dish at home.

                YOUR RESPONSE:
"""
prompt_template = PromptTemplate(template=template, input_variables=['meal'])

# chain 2
dish_chain = LLMChain(llm=local_llm, prompt=prompt_template, output_key='recipe')

In [59]:
template = """Given the recipe {recipe}, estimate how much time I need to cook it.

                YOUR RESPONSE:
"""
prompt_template = PromptTemplate(template=template, input_variables=['recipe'])

# chain 3
recipe_chain = LLMChain(llm=local_llm, prompt=prompt_template, output_key='time')

In [60]:
# overall chain
overall_chain = SequentialChain(
    chains=[location_chain, dish_chain, recipe_chain],
    input_variables=['location'],
    output_variables=['meal', 'recipe', 'time'],
    verbose=True,
)
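The way a sequential chain threads each output_key into the next step can be sketched with plain functions. This is an illustration of the data flow only, with hypothetical stand-in steps instead of LLM calls.

```python
def run_sequential(steps, inputs):
    """Run each (function, output_key) step in order, adding its result to a shared dict."""
    state = dict(inputs)
    for func, output_key in steps:
        state[output_key] = func(state)
    return state

# Stand-in steps: each one reads a key written by the previous step.
toy_steps = [
    (lambda s: f"a classic dish from {s['location']}", "meal"),
    (lambda s: f"recipe for {s['meal']}", "recipe"),
    (lambda s: f"time estimate for {s['recipe']}", "time"),
]
toy_result = run_sequential(toy_steps, {"location": "China"})
print(toy_result["time"])
```

The returned dictionary holds the original input together with every intermediate output, mirroring the `output_variables` requested in the cell above.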

In [61]:
from pprint import pprint

pprint(overall_chain.invoke(input={'location':'China'}))

Attempting to cast a BatchEncoding to type annotation=NoneType required=False default_factory=<lambda>. This is not supported.




> Entering new SequentialChain chain...


Attempting to cast a BatchEncoding to type annotation=NoneType required=False default_factory=<lambda>. This is not supported.
Attempting to cast a BatchEncoding to type annotation=NoneType required=False default_factory=<lambda>. This is not supported.



> Finished chain.
{'location': 'China',
 'meal': '                                                                                                    ',
 'recipe': '\n'
           'I am a vegetarian and I have never had a dish like this. I have '
           'tried many recipes and I have never had a dish like this. I have '
           'tried many recipes and I have never had a dish like this. I have '
           'tried many recipes and I have never had a dish like this.\n'
           '\n'
           'I have tried many recipes and I have never had a dish like this. I '
           'have tried many recipes and I have never had a dish like this.\n'
           '\n'
           'I have tried many recipes and',
 'time': '\n'
         'I have tried many recipes and I have never had a dish like this. I '
         'have tried many recipes and I have never had a dish like this.\n'
         '\n'
         'I have tried many recipes and I have never had a dish like this. I '
         'have t

### Summarization chain

- Use load_summarize_chain to summarize content.

- Use the web_data that you loaded from LangChain earlier as the content to be summarized.

In [None]:
from langchain.chains.summarize import load_summarize_chain

chain = load_summarize_chain(llm=local_llm, chain_type="map_reduce", verbose=True)
response = chain.invoke(web_data)
print(response)
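With chain_type="map_reduce", each document is summarized on its own (map) and the partial summaries are then combined into a final one (reduce). The sketch below shows only that control flow; in the real chain both steps are LLM calls, whereas `first_sentence` and `join_parts` here are trivial stand-ins chosen for illustration.

```python
def first_sentence(text):
    """Stand-in for the map step: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."

def join_parts(parts):
    """Stand-in for the reduce step: merge the partial summaries."""
    return " ".join(parts)

def map_reduce_summarize(documents, map_fn, reduce_fn):
    partial = [map_fn(doc) for doc in documents]  # map: one summary per document
    return reduce_fn(partial)                     # reduce: combine into a final summary

toy_documents = [
    "LangChain has chains. They compose calls.",
    "Agents use tools. Tools act on the world.",
]
final_summary = map_reduce_summarize(toy_documents, first_sentence, join_parts)
print(final_summary)
```

Because each map call sees only one document, this pattern can handle inputs that would not fit in a single prompt.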

## Agents

### Tools

- Tools are interfaces that an agent, a chain, or a chat model / LLM can use to interact with the world.

- The Python REPL tool can execute Python commands. These commands can either come from the user or be generated by the LLM. This tool is particularly useful for complex calculations: instead of having the LLM generate the answer directly, it can be more reliable to have the LLM generate code that calculates the answer.




In [65]:
from langchain.agents import Tool
from langchain_experimental.utilities import PythonREPL

python_repl = PythonREPL()

In [66]:
python_repl.run("a = 3; b = 1; print(a+b)")



'4\n'
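Conceptually, such a tool executes a code string and returns whatever it printed. The following is a simplified sketch of that idea using only the standard library; it is not the implementation of PythonREPL, and `run_python` is a hypothetical name.

```python
import contextlib
import io

def run_python(code, namespace=None):
    """Execute a code string and return what it printed, or the error as text."""
    namespace = {} if namespace is None else namespace
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        # Returning the error lets a caller (or an LLM) see it and retry.
        return repr(exc)
    return buffer.getvalue()

print(repr(run_python("a = 3; b = 1; print(a+b)")))  # → '4\n'
```

Executing arbitrary strings is unsafe with untrusted input, so code generated by a model should only be run in an isolated environment.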

### Toolkits

Toolkits are collections of tools that are designed to be used together for specific tasks.

Let's create a toolkit that contains a single tool, PythonREPLTool. Note that the tools are put into a list.



In [67]:
from langchain_experimental.tools import PythonREPLTool

tools = [PythonREPLTool()]

### Agents

By themselves, language models can't take actions; they just output text. A big use case for LangChain is creating agents. Agents are systems that use an LLM as a reasoning engine to determine which actions to take and what the inputs to those actions should be. The results of those actions can then be fed back into the agent, which decides whether more actions are needed or whether it is okay to finish.

Create an agent that has the LLM generate Python code from the description of a coding question.

In [68]:
from langchain.agents import create_react_agent
from langchain import hub
from langchain.agents import AgentExecutor

instructions = """You are an agent designed to write and execute python code to answer questions.
You have access to a python REPL, which you can use to execute python code.
If you get an error, debug your code and try again.
Only use the output of your code to answer the question.
You might know the answer without running any code, but you should still run the code to get the answer.
If it does not seem like you can write code to answer the question, just return "I don't know" as the answer.
"""

# here you will use the prompt directly from the langchain hub
base_prompt = hub.pull("langchain-ai/react-agent-template")
prompt = base_prompt.partial(instructions=instructions)

Use create_react_agent to build a ReAct agent. ReAct combines reasoning (e.g., Chain-of-Thought (CoT) prompting) and acting (e.g., action plan generation) to let the LLM solve questions the way a human would.

- Set verbose=True to see how the LLM thinks and acts at every step.

In [69]:
agent = create_react_agent(local_llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)  # tools were defined in the toolkit part above
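The loop that an agent executor runs can be sketched in plain Python: call the model, parse an action, run the tool, append the observation, and repeat until a final answer appears. Everything below is a simplified illustration under stated assumptions: `scripted_llm` is a hard-coded stand-in for a model, and the parsing is far less robust than what LangChain does.

```python
import contextlib
import io
import re

def scripted_llm(prompt):
    """Stand-in for an LLM: acts once, then answers using the observation."""
    if "Observation:" not in prompt:
        return "Thought: I should compute this.\nAction: python\nAction Input: print(3 + 1)"
    observation = prompt.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I now know the answer.\nFinal Answer: {observation}"

def python_tool(code):
    """Run a code string and return its printed output."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()

def react_loop(llm, tools, question, max_steps=5):
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(scratchpad)
        scratchpad += output + "\n"
        final = re.search(r"Final Answer:\s*(.*)", output, re.DOTALL)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(.*?)\nAction Input:\s*(.*)", output, re.DOTALL)
        if action is None:
            # Feed the parsing problem back so the model can correct itself.
            scratchpad += "Observation: could not parse an action\n"
            continue
        name, tool_input = action.group(1).strip(), action.group(2).strip()
        observation = tools[name](tool_input) if name in tools else f"unknown tool {name}"
        scratchpad += f"Observation: {observation}\n"
    return None  # gave up after max_steps

answer = react_loop(scripted_llm, {"python": python_tool}, "What is 3 + 1?")
print(answer)
```

The final answer comes from the tool's output rather than from the model's own guess, which matches the instruction in the prompt above to only use the output of the code.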