# **LangChain - A Framework for developing applications powered by LLMs**

## **What's Covered?**
1. Introduction to LangChain
    - What is LangChain?
    - Why use LangChain?
    - Architecture
    - LangChain Packages
2. Building Blocks
    - Prompt Template
    - Chat Model
    - Output Parser
    - Chains
3. Case Study
    - Building a Chat Prompt Template
    - Building a Chat Model
    - Building an Output Parser
    - Putting it all together

## **Introduction to LangChain**

### **What is LangChain?**
LangChain is an open-source framework that acts like a toolkit and orchestration layer for building applications with Large Language Models.

It simplifies the process of chaining together different components (like LLMs, data sources, and tools) to create complex, intelligent applications. It's designed to make prompt engineering more efficient and allow developers to adapt language models to specific business contexts.

Imagine you want to build a truly smart application powered by a Large Language Model (LLM), like OpenAI's GPT or Google's Gemini. Simply sending a prompt to an LLM isn't usually enough for real-world scenarios. You often need to:

1. **Get relevant information:** Your LLM might need to access external data (your documents, a database, the internet) to answer a user's question accurately.
2. **Remember past conversations:** For a coherent dialogue, the LLM needs a memory of what was said before.
3. **Perform actions:** Sometimes, the LLM needs to "do" something, like search the web, call an API, or run a calculator.
4. **Structure its output:** You might want the LLM's response to be in a specific format (e.g., JSON, a bulleted list).

This is where LangChain comes in.

### **Why use LangChain?**
- **Abstraction:** It provides high-level abstractions over common LLM functionalities, so you don't have to worry about the low-level API calls for every LLM.
- **Modularity:** It offers modular components that you can mix and match to build various LLM-powered applications (chatbots, Q&A systems, content generators, agents).
- **Integration:** It provides numerous integrations with different LLMs, vector databases, document loaders, and other tools.
- **Orchestration:** It excels at orchestrating sequences of operations, allowing you to build multi-step workflows.

### **Architecture**
LangChain is a framework that consists of a number of packages.

<img src="images/langchain_stack_updated.png" width="500" height="600">

### **LangChain Packages**
**langchain-core**  
This package contains base abstractions for different components and ways to compose them together. The interfaces for core components like **chat models**, **vector stores**, **tools** and more are defined here. No third-party integrations are defined here. The dependencies are very lightweight.

**langchain**  
The main langchain package contains chains and retrieval strategies that make up an application's cognitive architecture. These are NOT third-party integrations. All chains, agents, and retrieval strategies here are NOT specific to any one integration, but rather generic across all integrations.

**Integration packages**  
Popular integrations have their own packages (e.g. langchain-openai, langchain-anthropic) so that they can be versioned independently and kept lightweight.

**langchain-community**  
This package contains third-party integrations that are maintained by the LangChain community. Key integration packages are separated out (see above). This contains integrations for various components (chat models, vector stores, tools, etc). All dependencies in this package are optional to keep the package as lightweight as possible.

**langgraph**  
langgraph is an extension of langchain aimed at building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.

LangGraph exposes high level interfaces for creating common types of agents, as well as a low-level API for composing custom flows.

## **Building Blocks**

1. **Prompt Template**
2. **Chat Model**
3. **Output Parser**
4. **Chains**

<img src="images/langchain_LCEL.JPG">

In [9]:
# # Install langchain and the OpenAI integration package used below

# ! pip install langchain langchain-openai

### **Prompt Template**

- Writing static strings as prompts quickly becomes unmanageable. We need a way to inject dynamic information. This is where prompt templates come in.
- Prompt templates are objects that construct prompts dynamically by accepting input variables. Think of them as blueprints for your prompts.
- There are two types: **PromptTemplate** (for a single string prompt) and **ChatPromptTemplate** (for a list of chat messages).
- **Input** to the Prompt Template should be a **dictionary** containing raw user inputs.
- **Output** of a Prompt Template is a **prompt value**, which can be converted to a **string** or a **list of chat messages**.

In [2]:
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate(
                    messages = [
                    ("system", "You are a helpful AI Tutor with expertise in Data Science and Artificial Intelligence. "),
                    ("human", "What is {topic}?"),
                    ]
)

In [3]:
prompt_template.input_variables

['topic']

In [6]:
prompt_template.invoke({"topic": "LangChain"})

ChatPromptValue(messages=[SystemMessage(content='You are a helpful AI Tutor with expertise in Data Science and Artificial Intelligence. ', additional_kwargs={}, response_metadata={}), HumanMessage(content='What is LangChain?', additional_kwargs={}, response_metadata={})])

In [7]:
print(prompt_template.invoke({"topic": "LangChain"}).to_string())

System: You are a helpful AI Tutor with expertise in Data Science and Artificial Intelligence. 
Human: What is LangChain?


### **Chat Models**

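Chat models are LLMs exposed through a message-based interface: they accept a string, a prompt value, or a list of chat messages, and return an **AIMessage**. They handle language tasks such as translation, summarization, question answering, and content creation.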

In [10]:
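# Setup API Key

# Read the key with a context manager so the file is closed afterwards;
# strip() removes any trailing newline from the file
with open('keys/.openai_api_key.txt') as f:
    OPENAI_API_KEY = f.read().strip()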
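As an alternative to reading the key file directly, a small helper can check an environment variable first and fall back to the file. This is only a sketch: `load_api_key` is a hypothetical helper, not something LangChain provides, and the default file path simply mirrors the one used in this notebook.

```python
import os
from pathlib import Path


def load_api_key(env_var: str = "OPENAI_API_KEY",
                 key_file: str = "keys/.openai_api_key.txt") -> str:
    """Return the key from the environment, falling back to a local file."""
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    # read_text() opens and closes the file; strip() drops a trailing newline
    return Path(key_file).read_text().strip()
```

A call such as `OPENAI_API_KEY = load_api_key()` could then replace the file-reading cell. Keeping the key in an environment variable also avoids committing a key file to version control by accident.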

In [11]:
# Import OpenAI ChatModel
from langchain_openai import ChatOpenAI

# Set the OpenAI Key and initialize a ChatModel
chat_model = ChatOpenAI(model="gpt-4o-mini", api_key=OPENAI_API_KEY)

# Create a prompt
prompt = "What is LangChain?"

# Invoke the model with the prompt
model_response = chat_model.invoke(prompt)

# Display the response (an AIMessage)
model_response

AIMessage(content='LangChain is an open-source framework designed to facilitate the development of applications powered by large language models (LLMs). It provides a set of tools and abstractions that enable developers to build, manage, and optimize the workflow of LLM-based applications seamlessly. Here are some key features and components of LangChain:\n\n1. **Modularity**: LangChain promotes a modular approach, allowing developers to utilize different components like LLMs, prompts, and chains in a flexible and reusable manner.\n\n2. **Chains**: It allows users to create "chains" that are sequences of operations executed in a specific order. This can be useful for scenarios where multiple steps of processing are required, such as generating text based on user input, followed by additional processing or API calls.\n\n3. **Data Loaders**: LangChain includes functionality to load and manage data from various sources, making it easier to integrate external data into LLM applications.\n\

### **Output Parser**

- LLMs primarily generate free-form text. However, in many applications, you need structured data (e.g., a list of items, a JSON object, a specific format). Output parsers bridge this gap.
- Output parsers are objects that take the raw output from an LLM and transform it into a more usable Python data structure (e.g., a string, a list, a dictionary, a Pydantic object).
- **Input** to the output parser is typically the **AIMessage** returned by a chat model (a plain string also works).

**The output of a ChatModel is an AIMessage. However, it's often much more convenient to work with strings. Let's add a simple output parser to convert the chat message to a string.**

In [13]:
# Output Parsing

from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [14]:
output_parser.invoke(model_response)

'LangChain is an open-source framework designed to facilitate the development of applications powered by large language models (LLMs). It provides a set of tools and abstractions that enable developers to build, manage, and optimize the workflow of LLM-based applications seamlessly. Here are some key features and components of LangChain:\n\n1. **Modularity**: LangChain promotes a modular approach, allowing developers to utilize different components like LLMs, prompts, and chains in a flexible and reusable manner.\n\n2. **Chains**: It allows users to create "chains" that are sequences of operations executed in a specific order. This can be useful for scenarios where multiple steps of processing are required, such as generating text based on user input, followed by additional processing or API calls.\n\n3. **Data Loaders**: LangChain includes functionality to load and manage data from various sources, making it easier to integrate external data into LLM applications.\n\n4. **Memory**: Th

### **Chains**

- A chain is a sequence of operations in which the output of one step becomes the input of the next. This lets you build more complex workflows than a single LLM call could achieve.
- The `|` operator chains components together, feeding the output of one component as input into the next. The first cell below runs the three steps manually; the second composes them with `|`.
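To build intuition for how `|` can compose steps, here is a simplified pure-Python sketch. It is not LangChain's actual implementation: the `Step` class and the three example steps are hypothetical, and only show how Python's `__or__` method lets `a | b` mean "run `a`, then pass its output to `b`".

```python
class Step:
    """Wraps a function so that steps can be composed with `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


# Three toy steps standing in for a prompt template, a model, and a parser
fill_prompt = Step(lambda inputs: f"What is {inputs['topic']}?")
fake_model = Step(lambda prompt: f"[model answer to: {prompt}]")
to_upper = Step(str.upper)

pipeline = fill_prompt | fake_model | to_upper
result = pipeline.invoke({"topic": "NLP"})
print(result)  # [MODEL ANSWER TO: WHAT IS NLP?]
```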


In [22]:
# 1. Prompt Template
prompt = prompt_template.invoke({"topic": "Linear Regression"})

# 2. Chat Model
response = chat_model.invoke(prompt)

# 3. Output Parser
output_parser.invoke(response)

'Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It assumes that there is a linear relationship between the independent variable(s) and the dependent variable.\n\nIn simple linear regression, there is only one independent variable, while in multiple linear regression, there are multiple independent variables. The goal of linear regression is to find the best-fitting line or plane that describes the relationship between the independent and dependent variables.\n\nThe equation of a simple linear regression model can be written as:\n\ny = b0 + b1*x\n\nWhere:\n- y is the dependent variable\n- x is the independent variable\n- b0 is the y-intercept (the value of y when x = 0)\n- b1 is the slope (the change in y for a one-unit change in x)\n\nThe values of b0 and b1 are estimated using a method such as the least squares method, which minimizes the sum of the squared differences between the observed va

In [16]:
chain = prompt_template | chat_model | output_parser

user_input = {"topic": "NLP"}

chain.invoke(user_input)

'Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction between computers and human language. It involves the development of algorithms and models that allow computers to understand, interpret, generate, and respond to human language in a meaningful way.\n\nKey components and tasks in NLP include:\n\n1. **Text Processing**: This involves breaking down text into manageable components, such as sentences and words, often through techniques like tokenization, stemming, and lemmatization.\n\n2. **Sentiment Analysis**: Determining the emotional tone behind a series of words, which is often used to understand opinions in reviews, social media, and customer feedback.\n\n3. **Named Entity Recognition (NER)**: Identifying and categorizing key entities in the text, such as people, organizations, and locations.\n\n4. **Part-of-Speech Tagging**: Assigning parts of speech to each word (nouns, verbs, adjectives, etc.), which helps in understand

## **Case Study**

**Create an AI Tutor app that uses a chat prompt template and a chat model internally to give Python implementation tutorials for Data Science topics.**

In order to solve this, we will first create the following three components:
1. Chat Prompt Template
2. Chat Model
3. Output Parser

In [17]:
## Building a Chat Prompt Template

from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

# Constructing System Prompt
system_prompt = SystemMessagePromptTemplate.from_template("""
You are a friendly AI Tutor with expertise in Data Science and AI who tells step by step Python Implementation for topics asked by user.
""")

# Constructing Human Prompt
human_prompt = HumanMessagePromptTemplate.from_template("Tell me a python implementation for {topic_name}.")

# Compiling Chat Prompt
chat_prompt = ChatPromptTemplate(messages=[system_prompt, human_prompt])

chat_prompt

ChatPromptTemplate(input_variables=['topic_name'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='\nYou are a friendly AI Tutor with expertise in Data Science and AI who tells step by step Python Implementation for topics asked by user.\n'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['topic_name'], input_types={}, partial_variables={}, template='Tell me a python implementation for {topic_name}.'), additional_kwargs={})])

In [18]:
## Building a Chat Model

from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY)

In [19]:
## Building an Output Parser

from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [20]:
## Chaining All the components together

chain = chat_prompt | chat_model | output_parser

In [21]:
## Calling the chain

user_input = {"topic_name": "Logistic Regression"}

output = chain.invoke(user_input)

print(output)

Sure! Here is a step-by-step implementation of Logistic Regression in Python using a sample dataset:

```python
# Step 1: Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 2: Load the dataset
# For this example, we'll use the famous Iris dataset from sklearn
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = (iris.target == 0).astype(int)  # Converting classes to binary (1 if Iris-setosa else 0)

# Step 3: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: Create and train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Step 5: Make predictions on the test set
y_pred = model.predict(X_test)

# Step 6: Evaluate the model
accuracy = accuracy_score(y_test, y