# **Chat Models**

## **What's Covered?**
1. Introduction to Chat Models
    - What are Chat Models?
    - Capabilities
    - Integrations
2. Building Chat Model - GoogleAI & OpenAI
    - Installing the libraries
    - Setting up the API Key
    - Instantiating the Chat Model and Standard Parameters
    - Key Methods
    - invoke() Method
    - stream() Method
    - batch(list_of_message_lists) Method
3. HuggingFace Chat Models
    - What is HuggingFace?
    - Why use HuggingFace Chat Models with LangChain?
    - Installation
    - Hugging Face Local Pipelines
    - Huggingface Endpoints (COMING SOON)
    - ChatHuggingFace (COMING SOON)

## **Introduction to Chat Models**

LangChain provides a unified interface for interacting with various chat models from different providers (OpenAI, Google, Anthropic, Cohere, etc.).

LangChain chat models are named with a convention that prefixes "Chat" to their class names (e.g., ChatOllama, ChatAnthropic, ChatOpenAI, etc.).


### **What are Chat Models?**
At their core, chat models are a specialized type of Large Language Model (LLM) designed and fine-tuned to engage in conversational interactions. Unlike older "text completion" models (which simply predict the next word given a string of text), chat models understand and operate on a concept of "messages" with associated "roles."

Modern LLMs are typically accessed through a chat model interface that takes a list of messages as input and returns an AI message as output. **[Click Here](https://python.langchain.com/docs/integrations/chat/)** for the complete list of chat models that can be used with LangChain.

**Chat Models Input: A list of BaseMessage objects (typically SystemMessage, HumanMessage, AIMessage)**
```
[SystemMessage(content="You are a helpful assistant."), HumanMessage(content="What is the capital of France?")]
```
**Chat Models Output: A single AIMessage object**
```
AIMessage(content="The capital of France is Paris.")
```
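
To make the role/content structure concrete, here is a minimal plain-Python sketch that does not use LangChain. The `Message` dataclass and `to_payload` helper are hypothetical names for illustration. The role strings `system`, `user`, and `assistant` are an assumption based on the common OpenAI-style wire format; other providers may use different names.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a chat message: a role plus its text."""
    role: str      # assumed role names: "system", "user", "assistant"
    content: str


def to_payload(messages: list[Message]) -> list[dict[str, str]]:
    """Convert messages into the list-of-dicts shape many chat APIs accept."""
    return [{"role": m.role, "content": m.content} for m in messages]


conversation = [
    Message("system", "You are a helpful assistant."),
    Message("user", "What is the capital of France?"),
]
payload = to_payload(conversation)
print(payload)
```

LangChain's message classes play the same part as `Message` here: each class carries the role, so you only supply the content.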

### **Capabilities**

The newest generation of chat models offers additional capabilities:
1. [Tool calling](https://python.langchain.com/docs/concepts/tool_calling/): Many popular chat models offer a native tool calling API. This API allows developers to build rich applications that enable LLMs to interact with external services, APIs, and databases. Tool calling can also be used to extract structured information from unstructured data and perform various other tasks.
2. [Structured output](https://python.langchain.com/docs/concepts/structured_outputs/): A technique to make a chat model respond in a structured format, such as JSON that matches a given schema.
3. [Multimodality](https://python.langchain.com/docs/concepts/multimodality/): The ability to work with data other than text; for example, images, audio, and video.
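
As a rough illustration of what happens after a model requests a tool, the sketch below dispatches a tool call to a Python function. No model is called: the `tool_call` dict is hand-written, shaped like the `name`/`args`/`id` entries that LangChain exposes on `AIMessage.tool_calls`, and `get_weather`, `TOOLS`, and `run_tool_call` are hypothetical names.

```python
def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"It is sunny in {city}."


# Registry mapping tool names to callables (illustrative only).
TOOLS = {"get_weather": get_weather}

# Hand-written stand-in for a tool call requested by a model.
tool_call = {"name": "get_weather", "args": {"city": "Paris"}, "id": "call_1"}


def run_tool_call(call: dict) -> dict:
    """Look up the requested tool, run it, and package the result."""
    tool = TOOLS[call["name"]]
    result = tool(**call["args"])
    return {"tool_call_id": call["id"], "content": result}


tool_result = run_tool_call(tool_call)
print(tool_result)
```

In an application, the result would typically be sent back to the model as a tool message so that it can compose its final answer.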

### **Integrations**
LangChain has many chat model integrations that allow you to use a wide variety of models from different providers. These integrations are one of two types:

1. **Official models:** These are models that are officially supported by LangChain and/or the model provider. You can find these models in the **`langchain-<provider>`** packages.
2. **Community models:** These are models that are mostly contributed and supported by the community. You can find these models in the **`langchain-community`** package.

## **Building Chat Model - GoogleAI & OpenAI**

### **Installing the libraries**

```python
! pip install --upgrade --quiet langchain-google-genai
! pip install --upgrade --quiet langchain-openai
```

In [1]:
# ! pip install --upgrade --quiet langchain-google-genai
# ! pip install --upgrade --quiet langchain-openai

### **Setting up the API Key**

In [2]:
# Setup API Key

# Use a context manager so the file is closed, and strip the trailing newline
with open('keys/.gemini.txt') as f:
    GOOGLE_API_KEY = f.read().strip()

In [3]:
# Setup API Key

# Use a context manager so the file is closed, and strip the trailing newline
with open('keys/.openai_api_key.txt') as f:
    OPENAI_API_KEY = f.read().strip()
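
When reading keys from a file, two details are easy to miss: the file handle should be closed, and a trailing newline in the file becomes part of the key unless it is stripped. The helper below is one possible pattern, not part of LangChain. `load_api_key` is a hypothetical name, and the sketch assumes you want an environment variable to take precedence over the key file. The demo uses a temporary file and a placeholder key so it runs anywhere.

```python
import os
import tempfile
from pathlib import Path


def load_api_key(env_var: str, key_file: str) -> str:
    """Return the key from the environment if set, else from a file."""
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    # read_text() opens and closes the file; strip() drops the trailing newline
    return Path(key_file).read_text().strip()


# Demo with a throwaway file and a placeholder value (not a real key).
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo_key.txt"
    demo_file.write_text("not-a-real-key\n")
    os.environ.pop("DEMO_API_KEY", None)
    demo_key = load_api_key("DEMO_API_KEY", str(demo_file))

print(repr(demo_key))
```

In this notebook the call might look like `load_api_key("GOOGLE_API_KEY", "keys/.gemini.txt")`, assuming `GOOGLE_API_KEY` is the environment variable name you choose to use.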

### **Instantiating the Chat Model and Standard Parameters**

Standard parameters are currently enforced only on integrations that have their own packages (e.g. `langchain-openai`, `langchain-anthropic`); they are not enforced on models in `langchain-community`.

| Parameter      | Description |
|--------------|-------------|
| model        | The name or identifier of the specific AI model you want to use (e.g., "gpt-3.5-turbo" or "gpt-4"). |
| temperature  | Controls the randomness of the model's output. A higher value (e.g., 1.0) makes responses more creative, while a lower value (e.g., 0.0) makes them more deterministic and focused. |
| timeout      | The maximum time (in seconds) to wait for a response from the model before canceling the request. Ensures the request doesn’t hang indefinitely. |
| max_tokens   | Limits the number of tokens (sub-word units, so not exactly words) in the response. This controls how long the output can be. |
| stop         | Specifies stop sequences that indicate when the model should stop generating tokens. For example, you might use specific strings to signal the end of a response. |
| max_retries  | The maximum number of attempts the system will make to resend a request if it fails due to issues like network timeouts or rate limits. |
| api_key      | The API key required for authenticating with the model provider. This is usually issued when you sign up for access to the model. |
| base_url     | The URL of the API endpoint where requests are sent. This is typically provided by the model's provider and is necessary for directing your requests. |
| rate_limiter | An optional BaseRateLimiter to space out requests to avoid exceeding rate limits. |
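
To see why temperature changes randomness, it helps to look at how sampling is commonly described: the model's raw scores (logits) are divided by the temperature before a softmax turns them into probabilities. The sketch below uses made-up logits for three candidate tokens. Providers may implement sampling differently, and a temperature of 0 is usually handled as a special case (pick the top token), which this sketch does not cover.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Scale logits by 1/temperature, then normalize with a softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, which is why outputs become more deterministic; higher temperatures flatten the distribution.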

In [4]:
# Import ChatModel
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI

# Pass the standard parameters during initialization
google_chat_model = ChatGoogleGenerativeAI(api_key=GOOGLE_API_KEY, 
                                           model="gemini-2.0-flash", 
                                           temperature=1)

openai_chat_model = ChatOpenAI(api_key=OPENAI_API_KEY, 
                               model="gpt-4o-mini", 
                               temperature=1)



### **Key Methods**
The key methods of a chat model are:

1. **invoke:** The primary method for interacting with a chat model. It takes a list of messages as input and returns a single AI message as output.
2. **stream:** A method that allows you to stream the output of a chat model as it is generated, token by token. This is crucial for building responsive user interfaces.
3. **batch:** A method that allows you to batch multiple requests to a chat model together for more efficient processing.
4. **bind_tools:** A method that allows you to bind a tool to a chat model for use in the model's execution context.
5. **with_structured_output:** A wrapper around the invoke method for models that natively support structured output.

### **invoke() Method**

The primary method to send a list of messages to the model and get a single response.

In [5]:
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

messages = [
    SystemMessage(content="You are a polite assistant."),
    HumanMessage(content="Hello!"),
]

response = google_chat_model.invoke(messages)

print(response.content)

Hello! How can I help you today?


### **stream() Method** 

Allows you to receive the model's response incrementally, token by token. This is crucial for building responsive user interfaces.

In [6]:
print("\n--- Streaming Response ---")

for chunk in openai_chat_model.stream(messages):
    print(chunk.content, end="", flush=True)
    
print("\n--- End Streaming ---")


--- Streaming Response ---
Hello! How can I assist you today?
--- End Streaming ---
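
A common pattern when streaming is to show each chunk immediately while also keeping the full text for later use. The sketch below demonstrates that pattern without calling any model: `fake_stream` is a hypothetical generator standing in for a model's `stream()` call, and it yields plain strings rather than message chunks.

```python
from typing import Iterator


def fake_stream(text: str, chunk_size: int = 5) -> Iterator[str]:
    """Hypothetical stand-in for a model stream: yields text in small pieces."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


pieces = []
for chunk in fake_stream("Hello! How can I assist you today?"):
    print(chunk, end="", flush=True)  # display as soon as it arrives
    pieces.append(chunk)              # keep it for the final result
print()

full_text = "".join(pieces)
```

With a real chat model, the loop would iterate over `openai_chat_model.stream(messages)` and read `chunk.content` from each chunk, as in the cell above.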


### **batch(list_of_message_lists) Method** 

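Sends several independent message lists in one call and returns one response per input, in the same order. By default, LangChain runs the requests in parallel rather than combining them into a single provider API call, which is usually faster than calling `invoke()` in a loop.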

In [7]:
batch_messages = [
    [HumanMessage(content="What is 1+1?")],
    [HumanMessage(content="What is the capital of India?")],
]
responses = google_chat_model.batch(batch_messages)

for res in responses:
    print(res.content)

1 + 1 = 2
The capital of India is **New Delhi**.
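
One way to picture batching is as several independent requests running concurrently, with the results returned in input order. The sketch below illustrates that idea with a thread pool. It is not LangChain's implementation: `fake_invoke` is a hypothetical stand-in for a network call to a model, and `fake_batch` is an illustrative name.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_invoke(question: str) -> str:
    """Hypothetical stand-in for a model call; sleeps to mimic network latency."""
    time.sleep(0.1)
    return f"Answer to: {question}"


def fake_batch(questions: list[str], max_concurrency: int = 4) -> list[str]:
    """Run the calls in parallel; executor.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(fake_invoke, questions))


questions = ["What is 1+1?", "What is the capital of India?"]
answers = fake_batch(questions)
for answer in answers:
    print(answer)
```

With a real model, parallelism can be capped by passing a config to `batch`, for example `google_chat_model.batch(batch_messages, config={"max_concurrency": 2})`; a sensible value depends on your provider's rate limits.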


## **HuggingFace Chat Models**

HuggingFace is an incredibly popular platform and community that has democratized access to state-of-the-art machine learning models, especially in Natural Language Processing (NLP). When it comes to chat models, HuggingFace hosts a vast array of models, many of which can be used for conversational AI.

### **What is HuggingFace?**

HuggingFace is a giant online library and community for machine learning models. 

They provide:
1. **Transformers Library:** A powerful Python library that makes it easy to download, train, and use pre-trained NLP models (including chat models).
2. **HuggingFace Hub:** A platform where anyone can share and discover models, datasets, and demos. It's like GitHub, but for ML models.
3. **Tools & Ecosystem:** A rich set of tools for fine-tuning, deploying, and evaluating models.

Many LLMs, including those capable of chat, are available on the HuggingFace Hub. These can range from smaller, open-source models that you can run locally to larger models that might require more significant computational resources.

### **Why use HuggingFace Chat Models with LangChain?**

While cloud-based LLMs like OpenAI's GPT models or Google's Gemini are powerful, HuggingFace offers distinct advantages:

1. **Open Source & Flexibility:** Many models on HuggingFace are open-source, giving you more control, transparency, and the ability to fine-tune them for very specific tasks.
2. **Cost-Effectiveness (Potentially):** If you can run models locally or on your own infrastructure, you can potentially reduce API costs associated with commercial LLM providers.
3. **Privacy/Security:** For sensitive data, running models locally or on your private cloud can offer better privacy and security controls.
4. **Experimentation:** A vast playground for trying out different model architectures and sizes.
5. **Community Support:** A very active and helpful community.

LangChain acts as a crucial bridge here. It provides a consistent interface (`ChatHuggingFace` class) that allows you to easily integrate models from the HuggingFace ecosystem into your LangChain applications, abstracting away much of the underlying complexity of the transformers library.

### **Installation**
```python
! pip install --upgrade --quiet langchain-huggingface transformers huggingface_hub
```

In [7]:
# ! pip install --upgrade --quiet langchain-huggingface transformers huggingface_hub

### **Hugging Face Local Pipelines**

Hugging Face models can be run locally through the **HuggingFacePipeline** class.

When we use the HuggingFacePipeline, it downloads the complete model from the Hugging Face Hub to the local machine and runs inference locally.

There are two ways in which you can use the HuggingFacePipeline:
- **Way 1:** Models can be loaded by specifying the model parameters using the `from_model_id()` method.
- **Way 2:** Models can also be loaded by passing in an existing transformers `pipeline()` directly.

**Note: HuggingFacePipeline currently supports only the text-generation, text2text-generation, image-text-to-text, summarization and translation tasks. (As of 26 June 2025)**


In [9]:
# Way 1
from langchain_huggingface import HuggingFacePipeline

hf = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 512},
)

hf.invoke("What should I study to become a data scientist?")

Device set to use mps:0


"What should I study to become a data scientist?\n\nI've worked at the Department of Energy, where I've been responsible for research on energy efficiency and climate change for more than 15 years. That's why I have a broad definition of what I do and what I think about.\n\nWhat I do is work on the ground floor of the energy transition, where we can deliver a new kind of energy, that will serve as a catalyst for the next generation of clean energy."

In [10]:
# Way 2
from langchain_huggingface import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline(
    task="text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_new_tokens=512
)

hf = HuggingFacePipeline(pipeline=pipe)

hf.invoke("What should I study to become a data scientist?")

Device set to use mps:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


'What should I study to become a data scientist?\n\nThis is a very difficult question. It is very difficult to define exactly what one should study. It is also very difficult to define what one should study. Many different types of research require a special set of skills, and many different types of research require a wide range of skills.\n\nThe main thing is to understand what is expected of the researcher. This is very important in the field of data science. There are many different types of research to address the questions you should be asking.\n\nThe types of research that you should study are:\n\nHuman Factors\n\nAnalytical Methods\n\nPhysical Studies\n\nData Mining\n\nStatistical Methods\n\nPsychological Methods\n\nA major task of most researchers is to understand how information is stored. The question of how information is stored can be very confusing. The main problem with data is that it cannot always be made available. The best way to understand how information is stored 

### **Huggingface Endpoints (COMING SOON)**

The `HuggingFaceEndpoint` class works with any model that supports the text generation (i.e. text completion) task.

To use this class, do the following first:
1. Install the `huggingface_hub` package using this command: `! pip install huggingface_hub`
2. Set the environment variable `HUGGINGFACEHUB_API_TOKEN` to your API token, or pass the token as a named parameter to the constructor.

#### **Understanding the Inference Providers**
Hugging Face’s Inference Providers give developers streamlined, unified access to hundreds of machine learning models, powered by its serverless inference partners.
[Click here](https://huggingface.co/settings/inference-providers) to get the list of all the inference providers.

In [1]:
# # Setup API Key

# f = open('keys/.hf_api_key.txt')

# HF_API_KEY = f.read()

In [2]:
# import os

# os.environ["HUGGINGFACEHUB_API_TOKEN"] = HF_API_KEY
# os.environ["HF_TOKEN"] = HF_API_KEY

Here is an example of how to access the free Serverless Endpoints API through the `HuggingFaceEndpoint` integration.

In [3]:
# from langchain_huggingface import HuggingFaceEndpoint

# repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

# llm = HuggingFaceEndpoint(
#     repo_id=repo_id,
#     max_length=128,
#     temperature=0.5,
#     huggingfacehub_api_token=HF_API_KEY,
# )

In [4]:
# llm.invoke("What did foo say about bar?")

In [5]:
# from langchain_huggingface import HuggingFaceEndpoint

# repo_id = "deepseek-ai/DeepSeek-R1"
# inference_provider = "nebius"

# llm = HuggingFaceEndpoint(
#     repo_id=repo_id,
#     provider=inference_provider,
#     max_length=128,
#     temperature=0.5,
#     task="conversational"
# )

In [6]:
# from langchain_huggingface import ChatHuggingFace

# chat_llm = ChatHuggingFace(
#     llm=llm,
#     repo_id=repo_id,
#     provider=inference_provider,
#     max_length=128,
#     temperature=0.5,
#     task="conversational"
# )

### **ChatHuggingFace (COMING SOON)**

Works with HuggingFaceTextGenInference, HuggingFaceEndpoint, HuggingFaceHub, and HuggingFacePipeline LLMs.

When this class is instantiated, the model_id is resolved from the URL provided to the LLM, and the appropriate tokenizer is loaded from the HuggingFace Hub.

#### **Setup**

```python
from huggingface_hub import login
login() # You will be prompted for your HF key, which will then be saved locally
```

#### **ChatHuggingFace() Args**
- llm: HuggingFaceTextGenInference, HuggingFaceEndpoint, HuggingFaceHub, or HuggingFacePipeline LLM to be used.

In [7]:
# from huggingface_hub import login
# login()

In [8]:
# from langchain_huggingface import ChatHuggingFace

# chat_llm = ChatHuggingFace(llm=llm)