# Chat Bot
***
## Table of Contents
1. [Introduction](#1-introduction)
1. [Environment Variables](#2-environment-variables)
1. [Loading Model](#3-loading-model)
1. [Conversational Memory](#4-conversational-memory)
    - [Traditional Methods](#traditional-methods)
    - [Modern Methods](#modern-methods)
1. [Prompt Templates](#5-prompt-templates)
1. [Trimming](#6-trimming)
1. [Streaming](#7-streaming)
1. [References](#8-references)
***

## 1. Introduction
The objective of this exercise is to build a simple chatbot with memory using LangChain and LangGraph. This tutorial is primarily based on the official LangChain documentation.

## 2. Environment Variables
API keys and other secrets should never be hardcoded or exposed publicly. The [python-dotenv](https://pypi.org/project/python-dotenv/) library loads the variables defined in a local `.env` file into the process environment, which keeps them out of the source code.

In [1]:
import os

try:
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    raise ImportError("Error: 'python-dotenv' is not installed")

print(f"Project name: {os.environ['LANGSMITH_PROJECT']}")

Project name: lc_fundamentals
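
Indexing `os.environ` directly raises a bare `KeyError` when a variable is missing. As a minimal sketch, a small helper can fail with a clearer message instead. The name `require_env` is hypothetical and not part of python-dotenv, and the demo value is set only so the snippet runs on its own.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


# Demo value so the sketch runs on its own; in the notebook it comes from .env.
os.environ.setdefault("LANGSMITH_PROJECT", "lc_fundamentals")
project_name = require_env("LANGSMITH_PROJECT")
print(f"Project name: {project_name}")
```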


## 3. Loading Model

In [2]:
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage, AIMessage

model_name = "gpt-4o-mini"
provider = "openai"
model = init_chat_model(model=model_name, model_provider=provider)

## 4. Conversational Memory
A chatbot must retain conversation history to hold a coherent multi-turn dialogue. By default, chat models are stateless: each call to `.invoke()` generates a response from its input alone, without considering earlier calls. The first two calls below show the model forgetting the user's name; the third passes the earlier messages back in, which restores the context.

In [3]:
model.invoke(input="Hi, I am John.")

AIMessage(content='Hi John! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 13, 'total_tokens': 23, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-C8Bv6RWwBHG4ZyNxeZX5CzvMnvrHs', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--4fb4cad8-fbc2-4b79-ad11-2340f21450e9-0', usage_metadata={'input_tokens': 13, 'output_tokens': 10, 'total_tokens': 23, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

In [4]:
model.invoke(input="What is my name?")

AIMessage(content="I'm sorry, but I don't have access to your name or personal information. If you let me know your name or how you'd like to be addressed, I'd be happy to use it!", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 37, 'prompt_tokens': 12, 'total_tokens': 49, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-C8Bv6sFw9HR3iK66TRuwT7vQfzXyj', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--69379621-d9ef-44a5-9502-0f89ddf208ba-0', usage_metadata={'input_tokens': 12, 'output_tokens': 37, 'total_tokens': 49, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

In [5]:
model.invoke(
    [
        HumanMessage(content="Hi, I am John."),
        AIMessage(content="Hello John, how can I help you today?"),
        HumanMessage(content="What is my name?"),
    ]
)

AIMessage(content='Your name is John. How can I assist you further?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 12, 'prompt_tokens': 36, 'total_tokens': 48, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-C8Bv8u95HUwxwJyCbLiXAnjFMaafs', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--06d5a94f-e477-4a18-8374-fe2d5fbe2b2a-0', usage_metadata={'input_tokens': 36, 'output_tokens': 12, 'total_tokens': 48, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
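
The pattern in the last call, where the caller resends the whole history on every turn, can be sketched without any API access. This is a plain-Python illustration under stated assumptions: `stub_model` is a hypothetical stand-in for a chat model, and `chat` is a hypothetical helper, not a LangChain function.

```python
def stub_model(messages: list[tuple[str, str]]) -> str:
    """Hypothetical stand-in for a chat model: it only sees what it is given."""
    for role, content in messages:
        if role == "human" and content.startswith("Hi, I am "):
            name = content.removeprefix("Hi, I am ").rstrip(".")
            return f"Your name is {name}."
    return "I don't know your name."


history: list[tuple[str, str]] = []


def chat(user_input: str) -> str:
    """Append the user turn, call the model on the full history, store the reply."""
    history.append(("human", user_input))
    reply = stub_model(history)
    history.append(("ai", reply))
    return reply


# Without history the stub cannot know the name.
print(stub_model([("human", "What is my name?")]))

# With caller-managed history the earlier turn is visible to the stub.
chat("Hi, I am John.")
print(chat("What is my name?"))
```

The memory mechanisms in the rest of this section automate this bookkeeping.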

### Traditional Methods

In earlier versions of LangChain (before 0.3), memory classes stored and managed conversation history inside each chain or memory object. The following classes are **deprecated** as of August 2025 (version $\geq$ 0.3), but I include them here for learning purposes:

- **ConversationBufferMemory**: Stores the entire conversation history. This is the simplest method for managing memory.
- **ConversationBufferWindowMemory**: Retains only the last $k$ interactions (human/AI exchanges) in the conversation.
- **ConversationSummaryMemory**: Rather than storing the entire history, this class generates and retains a summary of the conversation.
- **ConversationSummaryBufferMemory**: Keeps the most recent messages verbatim, up to a token limit, and folds older messages into a running summary.
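
The windowing idea can be illustrated in plain Python with a bounded `deque`. This is a conceptual sketch only: the class `WindowMemory` is hypothetical and is not how LangChain implements `ConversationBufferWindowMemory`.

```python
from collections import deque


class WindowMemory:
    """Hypothetical sketch: keep only the last k human/AI exchanges."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen silently discards the oldest item when full.
        self._exchanges: deque[tuple[str, str]] = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self._exchanges.append((human, ai))

    def load(self) -> list[tuple[str, str]]:
        return list(self._exchanges)


window = WindowMemory(k=2)
window.save("Hi, I am John.", "Hello John!")
window.save("What is LangChain?", "A framework for LLM apps.")
window.save("And LangGraph?", "A library for stateful workflows.")
print(window.load())  # the first exchange has been dropped
```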

These classes were later used in combination with the `RunnableWithMessageHistory` class, which wraps both the chain and the memory implementation. As of LangChain version 0.3, although `RunnableWithMessageHistory` is not explicitly deprecated, it is strongly recommended to migrate to LangGraph persistence, which offers a more robust and scalable approach to managing message history.

In [6]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages=True)


In [7]:
memory.chat_memory.add_user_message(message="Hi, I am John.")
memory.chat_memory.add_ai_message(message="Hello John, how can I assist you today?")
memory.chat_memory.add_user_message(message="Who won the Champions League in 2024?")
memory.chat_memory.add_ai_message(
    message="In 2024, Manchester City won the UEFA Champions League."
)

memory.load_memory_variables(
    {}
)  # Need to load variables for memory type - none in this example

{'history': [HumanMessage(content='Hi, I am John.', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Hello John, how can I assist you today?', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Who won the Champions League in 2024?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='In 2024, Manchester City won the UEFA Champions League.', additional_kwargs={}, response_metadata={})]}

In [8]:
from langchain.chains import ConversationChain

chain = ConversationChain(llm=model, memory=memory, verbose=True)

chain.invoke({"input": "Remind me what we spoke about earlier."})


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
[HumanMessage(content='Hi, I am John.', additional_kwargs={}, response_metadata={}), AIMessage(content='Hello John, how can I assist you today?', additional_kwargs={}, response_metadata={}), HumanMessage(content='Who won the Champions League in 2024?', additional_kwargs={}, response_metadata={}), AIMessage(content='In 2024, Manchester City won the UEFA Champions League.', additional_kwargs={}, response_metadata={})]
Human: Remind me what we spoke about earlier.
AI:[0m

[1m> Finished chain.[0m


{'input': 'Remind me what we spoke about earlier.',
 'history': [HumanMessage(content='Hi, I am John.', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Hello John, how can I assist you today?', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Who won the Champions League in 2024?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='In 2024, Manchester City won the UEFA Champions League.', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Remind me what we spoke about earlier.', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Earlier, we talked about your name, John, and I mentioned that Manchester City won the UEFA Champions League in 2024. How can I assist you further?', additional_kwargs={}, response_metadata={})],
 'response': 'Earlier, we talked about your name, John, and I mentioned that Manchester City won the UEFA Champions League in 2024. How can I assist you further?'}

### Modern Methods
The official LangChain documentation strongly recommends using LangGraph persistence instead of the deprecated memory classes mentioned earlier.

- **MemorySaver**: An in-memory checkpoint class that saves the conversation state (chat messages) during execution. It is suitable for prototypes but should be replaced by persistent backends, such as SQLite or PostgreSQL, in production environments.

- **StateGraph + MessagesState**: LangGraph's state machine and schema for managing the message state. `START` is the graph's special entry node.

This approach ensures robust and scalable conversation history management, enabling multi-turn dialogues that maintain context effectively.



In [9]:
from langchain_core.messages.base import BaseMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

StateGraph is a core graph structure in LangGraph. It has to be initialised with the schema that describes the states to keep (in this case, conversation messages).

In [10]:
workflow = StateGraph(state_schema=MessagesState)

The following function accepts the state (including the message history), passes the messages to the model, and returns a state update containing the model's response.

In [11]:
def call_model(state: MessagesState) -> dict[str, BaseMessage]:
    response = model.invoke(state["messages"])
    return {"messages": response}

- `add_edge`: Defines the flow; after `START`, the workflow progresses to the `model` node.
- `add_node`: Registers the `model` node and links it to the `call_model()` function.

In [12]:
workflow.add_edge(start_key=START, end_key="model")
workflow.add_node(node="model", action=call_model)

<langgraph.graph.state.StateGraph at 0x12a39b8c0>
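
To make the node and edge vocabulary concrete, here is a toy linear graph runner in plain Python. It is a sketch under stated assumptions, not LangGraph's implementation: `TinyGraph`, `echo_model`, and the local `START`/`END` sentinels are all hypothetical names defined only for this example.

```python
from typing import Callable

START = "__start__"  # hypothetical sentinels for this sketch only
END = "__end__"


class TinyGraph:
    """Hypothetical sketch of a linear state graph; not LangGraph's implementation."""

    def __init__(self) -> None:
        self.nodes: dict[str, Callable[[dict], dict]] = {}
        self.edges: dict[str, str] = {}

    def add_node(self, name: str, action: Callable[[dict], dict]) -> None:
        self.nodes[name] = action

    def add_edge(self, start: str, end: str) -> None:
        self.edges[start] = end

    def run(self, state: dict) -> dict:
        # Follow edges from START, merging each node's update into the state.
        current = self.edges.get(START, END)
        while current != END:
            update = self.nodes[current](state)
            state = {**state, **update}
            current = self.edges.get(current, END)
        return state


def echo_model(state: dict) -> dict:
    """Hypothetical node: append an echo of the last message."""
    last = state["messages"][-1]
    return {"messages": state["messages"] + [f"echo: {last}"]}


graph = TinyGraph()
graph.add_edge(START, "model")  # the edge only names the node, so order is free
graph.add_node("model", echo_model)
result = graph.run({"messages": ["hello"]})
print(result["messages"])
```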

After setting edges and nodes, we add memory and compile the app. 

In [13]:
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

The `config` dictionary can contain a `configurable` key, which holds runtime parameters for LangGraph's execution. The `thread_id` value within this key identifies each conversation. If multiple users are chatting, each one can be assigned a unique `thread_id` (typically their user ID).

In [14]:
config = {"configurable": {"thread_id": "user_id_123"}}

Finally, `app.invoke()` runs the LangGraph workflow using the given initial state. The `config` parameter ensures the conversation is updated and checkpointed under the specified thread ID.

In [15]:
query = "Hi, I am John."
input_messages = [HumanMessage(query)]
output = app.invoke(input={"messages": input_messages}, config=config)
output["messages"][-1].pretty_print()


Hi John! How can I assist you today?
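
The role of `thread_id` can be illustrated with a plain dictionary keyed by thread. This is a conceptual sketch, assuming a hypothetical `ThreadStore` class; it is not how `MemorySaver` works internally, and it reuses only the shape of the `config` dictionary shown above.

```python
class ThreadStore:
    """Hypothetical in-memory store: one message list per thread_id."""

    def __init__(self) -> None:
        self._threads: dict[str, list[str]] = {}

    def append(self, config: dict, message: str) -> None:
        thread_id = config["configurable"]["thread_id"]
        self._threads.setdefault(thread_id, []).append(message)

    def history(self, config: dict) -> list[str]:
        thread_id = config["configurable"]["thread_id"]
        return list(self._threads.get(thread_id, []))


store = ThreadStore()
john = {"configurable": {"thread_id": "user_id_123"}}
other = {"configurable": {"thread_id": "user_id_456"}}  # hypothetical second user

store.append(john, "Hi, I am John.")
print(store.history(john))   # John's thread holds his message
print(store.history(other))  # a different thread starts empty
```

Because each thread has its own history, a question asked under a different `thread_id` cannot see John's earlier messages.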


## 5. Prompt Templates
A prompt template provides a structured way of creating inputs for language models where parts of the prompt can be dynamically changed based on context or user input.

Prompts in LangChain can be split into three components:
- **System Prompt**: Gives instructions or a personality to the model. This prompt determines the model's behaviour or characteristics.
- **User Prompt**: Input given by a user.
- **AI Prompt**: Output generated by the model.
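
The idea of a template with a variable slot and a slot for the conversation can be sketched with `str.format` alone. This is a plain-Python illustration, not LangChain's `ChatPromptTemplate`; `SYSTEM_TEMPLATE` and `build_prompt` are hypothetical names.

```python
# Hypothetical template string with one variable, {language}.
SYSTEM_TEMPLATE = "You are an AI receptionist at a five-star hotel. Reply in {language}."


def build_prompt(
    language: str, messages: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Fill the system template, then splice the conversation in after it."""
    system = ("system", SYSTEM_TEMPLATE.format(language=language))
    return [system, *messages]


prompt = build_prompt("French", [("human", "I want to make a reservation")])
for role, content in prompt:
    print(f"{role}: {content}")
```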

In [16]:
from langchain_core.prompts import (
    ChatPromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
)

prompt_template = ChatPromptTemplate.from_messages(
    messages=[
        SystemMessagePromptTemplate.from_template(
            template="You are an AI receptionist at a five-star hotel. Welcome and help guests to the best of your ability in {language}.",
            input_variables=["language"],
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

Earlier, `MessagesState` contained only a single field (`messages`). To store multiple state variables, such as language, ID, or any custom metadata, we need to define a new `State` class explicitly with all required fields. This allows the graph to manage, validate, and pass around all relevant state components clearly, and type checking helps catch bugs early.

In [17]:
from typing import Sequence
from langchain_core.messages.base import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict


class State(TypedDict):  # Custom State class
    messages: Annotated[Sequence[BaseMessage], add_messages]
    language: str


workflow = StateGraph(state_schema=State)  # Pass it to StateGraph


def call_model(state: State) -> dict[str, list[BaseMessage]]:
    prompt = prompt_template.invoke(input=state)
    response = model.invoke(input=prompt)
    return {"messages": [response]}


workflow.add_edge(start_key=START, end_key="model")
workflow.add_node(node="model", action=call_model)
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

config = {"configurable": {"thread_id": "A11111"}}
query = "I want to make a reservation"
language = "French"
input_messages = [HumanMessage(query)]
output = app.invoke(
    input={"messages": input_messages, "language": language}, config=config
)
output["messages"][-1].pretty_print()


Bien sûr ! Je serais ravi de vous aider à faire une réservation. Pourriez-vous me donner les détails suivants, s'il vous plaît ?

1. Vos dates d'arrivée et de départ.
2. Le type de chambre que vous préférez (simple, double, suite, etc.).
3. Le nombre de personnes qui séjourneront.
4. Toute demande spéciale ou besoin particulier.

Merci !
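
The `Annotated[..., add_messages]` field in `State` attaches a reducer: a function that merges a node's update into the existing value instead of overwriting it, while a plain field such as `language` is simply replaced. The sketch below illustrates that idea in plain Python. It is an assumption-laden simplification: `append_messages`, `SketchState`, and `apply_update` are hypothetical, and the real `add_messages` does more than concatenate.

```python
from typing import Annotated, TypedDict, get_type_hints


def append_messages(existing: list, new: list) -> list:
    """Hypothetical reducer: concatenate instead of overwrite."""
    return existing + new


class SketchState(TypedDict):
    messages: Annotated[list, append_messages]  # merged through the reducer
    language: str  # no reducer, so updates overwrite


def apply_update(state: dict, update: dict) -> dict:
    """Merge an update into the state, using a field's reducer when it has one."""
    hints = get_type_hints(SketchState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and callable(metadata[0]):
            merged[key] = metadata[0](state.get(key, []), value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["Hi, I am John."], "language": "French"}
state = apply_update(state, {"messages": ["Bonjour John !"]})
state = apply_update(state, {"language": "Spanish"})
print(state)
```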


## 6. Trimming
Trimming is essential to prevent the conversation history from exceeding the LLM's context window, which can cause errors or reduce performance. LangChain has a built-in trimmer function `trim_messages()` that limits the number of tokens in the conversation history passed to a model, while preserving the most relevant messages according to a specified trimming strategy. It accepts several arguments, such as:
- **max_tokens**: The maximum total number of tokens allowed in the conversation history sent to the model.
- **strategy**: Determines whether the trimmer keeps the `last` (most recent) messages or the `first` (oldest) messages.
- **token_counter**: A function, or a chat model, used to count the tokens in each message.
- **include_system**: Controls whether the system message at the start of the history is kept during trimming.
- **allow_partial**: Specifies whether messages can be partially trimmed by cutting off content if too long, rather than discarding entire messages.
- **start_on**: Indicates whether the conversation history should start on a `human` message or an `ai` message to preserve conversation flow.
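
The `last` strategy combined with `start_on="human"` can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation of `trim_messages()`: `count_tokens` is a crude word counter standing in for a real tokeniser, and `trim_last` is a hypothetical helper that does not handle system messages or partial messages.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokeniser: one token per whitespace-separated word."""
    return len(text.split())


def trim_last(
    messages: list[tuple[str, str]], max_tokens: int
) -> list[tuple[str, str]]:
    """Keep the most recent messages that fit the budget, starting on a human turn."""
    kept: list[tuple[str, str]] = []
    used = 0
    for role, content in reversed(messages):
        cost = count_tokens(content)
        if used + cost > max_tokens:
            break
        kept.append((role, content))
        used += cost
    kept.reverse()
    # Drop leading non-human messages so the window opens on a human turn.
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return kept


conversation = [
    ("human", "i wonder why it's called langchain"),
    ("ai", "Because WordRope did not have the same ring to it!"),
    ("human", "and who is harrison chasing anyways"),
    ("ai", "Probably the last cup of coffee!"),
    ("human", "what do you call a speechless parrot"),
]

for role, content in trim_last(conversation, max_tokens=20):
    print(f"{role}: {content}")
```

With a budget of 20 words, the three most recent messages fit and the two oldest are dropped.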

In [18]:
from langchain_core.messages import SystemMessage, trim_messages

trimmer = trim_messages(
    max_tokens=80,
    strategy="last",
    token_counter=model,
    include_system=False,
    allow_partial=True,
    start_on="human",
)

messages = [
    SystemMessage(content="you're a good assistant, you always respond with a joke."),
    HumanMessage(content="i wonder why it's called langchain"),
    AIMessage(
        content='Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
    ),
    HumanMessage(content="and who is harrison chasing anyways"),
    AIMessage(
        content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
    ),
    HumanMessage(content="what do you call a speechless parrot"),
]

trimmer.invoke(messages)

[HumanMessage(content='and who is harrison chasing anyways', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]

The trimmer should be placed before passing the messages to the prompt.
`trim -> prompt -> model.invoke()`

In [19]:
from langchain_core.messages.base import BaseMessage


workflow = StateGraph(state_schema=State)


def call_model(state: State) -> dict[str, list[BaseMessage]]:
    print(f"Messages before trimming: {len(state['messages'])}")
    trimmed_messages = trimmer.invoke(state["messages"])
    print(f"Messages after trimming: {len(trimmed_messages)}")
    print("Remaining messages:")
    for msg in trimmed_messages:
        print(
            f"{type(msg).__name__}: {msg.content}"
        )  # Type (System, AI, Human) : Content
    prompt = prompt_template.invoke(
        input={"messages": trimmed_messages, "language": state["language"]}
    )
    response = model.invoke(input=prompt)
    return {"messages": [response]}


workflow.add_edge(start_key=START, end_key="model")
workflow.add_node(node="model", action=call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

config = {"configurable": {"thread_id": "C12345"}}
query = "What did I say at first?"
language = "Spanish"

input_messages = messages + [HumanMessage(query)]
output = app.invoke(
    input={"messages": input_messages, "language": language}, config=config
)
output["messages"][-1].pretty_print()

Messages before trimming: 7
Messages after trimming: 4
Remaining messages:
HumanMessage: and who is harrison chasing anyways
AIMessage: Hmmm let me think.

Why, he's probably chasing after the last cup of coffee in the office!
HumanMessage: what do you call a speechless parrot
HumanMessage: What did I say at first?

Dijiste "and who is harrison chasing anyways". ¿Necesitas ayuda con algo más relacionado con eso? Estoy aquí para ayudarte.


## 7. Streaming
Streaming allows chatbots to deliver the model's output tokens immediately as they are generated, instead of waiting for the entire response to be completed before displaying anything to the user. This creates a near real-time typing or response generation effect, significantly improving perceived responsiveness and user experience.

To stream from a LangGraph application, call `app.stream()` instead of `app.invoke()` and set `stream_mode="messages"`. The call then yields message chunks, each paired with metadata, as the model produces them, so the application can process and display them incrementally.

In [20]:
config = {"configurable": {"thread_id": "C54321"}}
query = "Tell me a joke"
language = "English"
input_messages = [HumanMessage(query)]

for chunk, metadata in app.stream(
    input={"messages": input_messages, "language": language},
    config=config,
    stream_mode="messages",
):
    if isinstance(chunk, AIMessage):
        print(chunk.content, end="|")


Messages before trimming: 1
Messages after trimming: 1
Remaining messages:
HumanMessage: Tell me a joke
|Sure|!| Here's| one| for| you|:

|Why| don|’t| scientists| trust| atoms|?

|Because| they| make| up| everything|!| 

|I| hope| that| brings| a| smile| to| your| day|!| How| can| I| assist| you| further|?||
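
The consume-as-you-go pattern does not depend on LangGraph. As a minimal sketch, a Python generator can stand in for the streaming model; `stream_tokens` is a hypothetical function that splits on spaces, whereas a real model emits its own token boundaries.

```python
from typing import Iterator


def stream_tokens(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: yield one word at a time."""
    for word in text.split(" "):
        yield word


chunks: list[str] = []
for token in stream_tokens("Why don't scientists trust atoms?"):
    print(token, end="|", flush=True)  # show each piece as soon as it arrives
    chunks.append(token)
print()

full_reply = " ".join(chunks)  # reassemble the complete response afterwards
```

Collecting the chunks while printing them lets the application show partial output immediately and still store the full reply once the stream ends.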

## 8. References

1. Briggs, J. (2025). *LangChain Mastery in 2025 | Full 5 Hour Course [LangChain v0.3]* [Video]. YouTube.<br>
https://www.youtube.com/watch?v=Cyv-dgv80kE

1. LangChain. (n.d.). *Build a Chatbot* [Tutorial].<br>
https://python.langchain.com/docs/tutorials/chatbot/

1. LangChain. (n.d.). *Build a simple LLM application with chat models and prompt templates* [Tutorial].<br>
https://python.langchain.com/docs/tutorials/llm_chain/