# Build a Chatbot

https://python.langchain.com/docs/tutorials/chatbot/
old version: https://python.langchain.com/v0.2/docs/tutorials/chatbot/




## Overview


We'll go over an example of how to design and implement an LLM-powered chatbot.   
This chatbot will be able to have a conversation and remember previous interactions with a chat model.




Note that the chatbot we build here will only use the language model to have a conversation.   
There are several other related concepts that you may be looking for:

- Conversational RAG: Enable a chatbot experience over an external source of data
- Agents: Build a chatbot that can take actions


https://python.langchain.com/docs/tutorials/qa_chat_history/
https://python.langchain.com/docs/tutorials/agents/

This tutorial covers the basics, which will be helpful for those two more advanced topics, but feel free to skip directly to them if you prefer.


In [54]:
# Optional: enable LangSmith tracing so each run can be inspected in LangSmith
import getpass
import os

os.environ["LANGSMITH_TRACING"] = "true"
if not os.environ.get("LANGSMITH_API_KEY"):
    os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter API key for LangSmith: ")


## Quickstart

First up, let's learn how to use a language model by itself.   
LangChain supports many different language models that you can use interchangeably; in this notebook we use OpenAI's gpt-4o-mini.

In [55]:
import getpass
import os

# Ask for the OpenAI key only if it is not already set in the environment
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

# Create a chat model backed by OpenAI's gpt-4o-mini
model = init_chat_model("gpt-4o-mini", model_provider="openai")

Let's first use the model directly.   
Chat models are instances of LangChain "Runnables", which means they expose a standard interface for interacting with them.   
To call the model, we pass a list of messages to its .invoke method.




In [17]:
from langchain_core.messages import HumanMessage

model.invoke([HumanMessage(content="你好! 我是 tiankonguse")])  # "Hi! I'm tiankonguse"

AIMessage(content='你好，tiankonguse！很高兴认识你！有什么我可以帮助你的吗？', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 15, 'total_tokens': 36, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_06737a9306', 'finish_reason': 'stop', 'logprobs': None}, id='run-07ca4db8-8649-494d-9d3d-1dd07ab9f10c-0', usage_metadata={'input_tokens': 15, 'output_tokens': 21, 'total_tokens': 36, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
(English: Hi, tiankonguse! Nice to meet you! Is there anything I can help you with?)

The model on its own does not have any concept of state. For example, if you ask a follow-up question:

In [18]:
model.invoke([HumanMessage(content="我叫什么名字?")])  # "What's my name?"

AIMessage(content='抱歉，我无法知道您的名字。您可以告诉我您的名字，或者我可以用其他方式帮助您。', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 25, 'prompt_tokens': 11, 'total_tokens': 36, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_06737a9306', 'finish_reason': 'stop', 'logprobs': None}, id='run-75627f7f-63ca-454b-b5f5-d4123a5d0871-0', usage_metadata={'input_tokens': 11, 'output_tokens': 25, 'total_tokens': 36, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
(English: Sorry, I have no way of knowing your name. You can tell me your name, or I can help you in some other way.)

To get around this, we need to pass the entire conversation history into the model. Let's see what happens when we do that:  
https://python.langchain.com/docs/concepts/chat_history/  

In [57]:
from langchain_core.messages import AIMessage

In [20]:


model.invoke(
    [
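        HumanMessage(content="你好! 我是 tiankonguse"),  # "Hi! I'm tiankonguse"
        # "Hi, tiankonguse! Glad to chat with you. Is there anything I can help you with?"
        AIMessage(content="你好，tiankonguse！很高兴和你交流。有什么我可以帮助你的吗？"),
        HumanMessage(content="我叫什么名字?"),  # "What's my name?"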
    ]
)

AIMessage(content='你叫 tiankonguse。如果你希望我称呼你其他名字，请告诉我！', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 20, 'prompt_tokens': 48, 'total_tokens': 68, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_06737a9306', 'finish_reason': 'stop', 'logprobs': None}, id='run-f75b1489-9eba-4b3c-aaa1-7afa870a9008-0', usage_metadata={'input_tokens': 48, 'output_tokens': 20, 'total_tokens': 68, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
(English: Your name is tiankonguse. If you'd like me to call you by another name, just tell me!)

And now we can see that we get a good response!  
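Passing the history by hand means keeping a growing list of messages and resending the whole list on every call. The framework-free sketch below shows only that bookkeeping. `fake_model` is a hypothetical stand-in for `model.invoke`: it can recall a name only if an earlier message in the list it receives contains one. Messages here are plain `(role, content)` tuples rather than LangChain message objects.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content), e.g. ("human", "Hi! I'm tiankonguse")


def fake_model(messages: List[Message]) -> Message:
    """Hypothetical stand-in for a chat model: stateless, it only sees this call's list."""
    last = messages[-1][1]
    if "my name" in last.lower():
        # Look for an earlier human turn that introduced a name with "I'm ..."
        for role, content in messages[:-1]:
            if role == "human" and "I'm " in content:
                name = content.split("I'm ", 1)[1].strip(" .!")
                return ("ai", f"Your name is {name}.")
        return ("ai", "Sorry, I don't know your name.")
    return ("ai", "Nice to meet you!")


history: List[Message] = []


def chat(user_text: str) -> str:
    # Record the user turn, send the WHOLE history, then record the reply as well.
    history.append(("human", user_text))
    reply = fake_model(history)
    history.append(reply)
    return reply[1]


print(chat("Hi! I'm tiankonguse"))  # Nice to meet you!
print(chat("What's my name?"))  # Your name is tiankonguse.
# Without the history, the model is back to knowing nothing:
print(fake_model([("human", "What's my name?")])[1])  # Sorry, I don't know your name.
```

The manual approach works, but every caller has to manage `history` itself. The persistence layer described next automates that.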

This is the basic idea underpinning a chatbot's ability to interact conversationally.   
So how do we best implement this?  

## Message persistence


LangGraph implements a built-in persistence layer, making it ideal for chat applications that support multiple conversational turns.  
https://langchain-ai.github.io/langgraph/

Wrapping our chat model in a minimal LangGraph application allows us to automatically persist the message history, simplifying the development of multi-turn applications.


LangGraph comes with a simple in-memory checkpointer, which we use below.   
See its documentation for more detail, including how to use different persistence backends (e.g., SQLite or Postgres).

https://langchain-ai.github.io/langgraph/concepts/persistence/


In [59]:
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

# Define a new graph
workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    response = model.invoke(state["messages"])
    return {"messages": response}


# Define the (single) node in the graph
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

# Add memory
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

We now need to create a config that we pass into the runnable every time.   


This config contains information that is not part of the input directly, but is still useful.   

In this case, we want to include a thread_id. This should look like:

In [60]:
config = {"configurable": {"thread_id": "abc123"}}



This enables us to support multiple conversation threads with a single application, a common requirement when your application has multiple users.
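Conceptually, the checkpointer keeps separate saved state for each `thread_id`. Each invocation loads that thread's messages, appends the new ones, and saves the result. The sketch below imitates that bookkeeping with a plain dictionary. `ToyCheckpointer` and `toy_invoke` are hypothetical names for illustration, not LangGraph's API, and the real MemorySaver stores full checkpoints rather than bare message lists.

```python
from collections import defaultdict
from typing import Dict, List


class ToyCheckpointer:
    """Hypothetical stand-in for a checkpointer: one saved message list per thread_id."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[str]] = defaultdict(list)

    def load(self, thread_id: str) -> List[str]:
        return list(self._threads[thread_id])  # return a copy so callers can't mutate saved state

    def save(self, thread_id: str, messages: List[str]) -> None:
        self._threads[thread_id] = list(messages)


def toy_invoke(checkpointer: ToyCheckpointer, new_messages: List[str], config: dict) -> List[str]:
    # Same config shape as the tutorial: {"configurable": {"thread_id": ...}}
    thread_id = config["configurable"]["thread_id"]
    state = checkpointer.load(thread_id) + new_messages  # load saved history, add the new turn
    state.append(f"ai: I can see {len(state)} message(s) in this thread")  # placeholder reply
    checkpointer.save(thread_id, state)
    return state


toy_memory = ToyCheckpointer()
toy_invoke(toy_memory, ["human: Hi! I'm tiankonguse."], {"configurable": {"thread_id": "abc123"}})
out_a = toy_invoke(toy_memory, ["human: What's my name?"], {"configurable": {"thread_id": "abc123"}})
out_b = toy_invoke(toy_memory, ["human: What's my name?"], {"configurable": {"thread_id": "abc234"}})
print(len(out_a), len(out_b))  # 4 2
```

Switching the `thread_id` is all it takes to get an empty history. This mirrors what the `abc123` and `abc234` cells show with the real MemorySaver.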

We can then invoke the application:


In [23]:
query = "你好! 我是 tiankonguse."  # "Hi! I'm tiankonguse."

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()  # output contains all messages in state


你好，tiankonguse！很高兴和你聊天。有什么我可以帮助你的吗？
(English: Hi, tiankonguse! Nice to chat with you. Is there anything I can help you with?)


In [24]:
query = "我的名字是什么?"  # "What is my name?"

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


你的名字是 tiankonguse。请问你还有其他想聊的吗？
(English: Your name is tiankonguse. Is there anything else you'd like to talk about?)


Great!   
Our chatbot now remembers things about us.   
If we change the config to reference a different thread_id, we can see that it starts the conversation fresh.

In [25]:
config = {"configurable": {"thread_id": "abc234"}}

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


对不起，我无法知道您的名字。如果您愿意，可以告诉我您的名字！
(English: Sorry, I have no way of knowing your name. If you like, you can tell me your name!)


However, we can always go back to the original conversation, because the checkpointer has persisted it (in memory here, with MemorySaver):



In [26]:
config = {"configurable": {"thread_id": "abc123"}}

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


你的名字是 tiankonguse。需要我帮你解答其他问题吗？
(English: Your name is tiankonguse. Would you like me to help with any other questions?)


This is how we can support a chatbot having conversations with many users!



For async support, update the call_model node to be an async function and use .ainvoke when invoking the application:

In [61]:
# Async function for node:
async def call_model(state: MessagesState):
    response = await model.ainvoke(state["messages"])
    return {"messages": response}


# Define graph as before. It is compiled with a new MemorySaver, so thread "abc123"
# starts with no saved history (which is why the model no longer knows the name below):
workflow = StateGraph(state_schema=MessagesState)
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)
app = workflow.compile(checkpointer=MemorySaver())

# Async invocation (top-level await works in Jupyter; in a script, run it via asyncio.run):
output = await app.ainvoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


抱歉，我不知道您的名字。如果您愿意，可以告诉我您的名字！
(English: Sorry, I don't know your name. If you like, you can tell me your name!)
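The top-level `await` in that cell works because Jupyter already runs an event loop. In a plain Python script, you would start one yourself with `asyncio.run`. The stdlib-only sketch below shows that pattern with a hypothetical async node, `call_model_async`, in which `fake_ainvoke` stands in for `model.ainvoke`. It illustrates the control flow only; it is not LangGraph code.

```python
import asyncio
from typing import Dict, List


async def fake_ainvoke(messages: List[str]) -> str:
    """Hypothetical stand-in for model.ainvoke."""
    await asyncio.sleep(0.01)  # simulate waiting on the network without blocking the loop
    return f"reply to: {messages[-1]}"


async def call_model_async(state: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Same shape as the async node above: await the model, return the new message.
    response = await fake_ainvoke(state["messages"])
    return {"messages": [response]}


async def main() -> List[Dict[str, List[str]]]:
    # Two independent conversations can be awaited concurrently.
    return list(
        await asyncio.gather(
            call_model_async({"messages": ["What's my name?"]}),
            call_model_async({"messages": ["Tell me a joke"]}),
        )
    )


# In a script, start the event loop explicitly. Inside Jupyter, where a loop is
# already running, use `results = await main()` instead.
results = asyncio.run(main())
print(results)
```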


Right now, all we've done is add a simple persistence layer around the model.  
We can start to make the chatbot more complicated and personalized by adding in a prompt template.

## Prompt templates

Prompt Templates help to turn raw user information into a format that the LLM can work with.   
In this case, the raw user input is just a message, which we are passing to the LLM.   
Let's now make that a bit more complicated.   

First, let's add in a system message with some custom instructions (but still taking messages as input).   
Next, we'll add in more input besides just the messages.

To add in a system message, we will create a ChatPromptTemplate.   
We will utilize MessagesPlaceholder to pass all the messages in.

In [62]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "你说话像海盗。尽你所能回答所有问题。",  # "You talk like a pirate. Answer all questions to the best of your ability."
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)
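It can help to picture what this template does to the state. It puts the system message first and splices the conversation's messages in where the MessagesPlaceholder sits. Below is a framework-free sketch of that behavior using `(role, content)` tuples. `render_prompt` is a hypothetical helper, and the real `ChatPromptTemplate.invoke` returns a `ChatPromptValue` of message objects rather than a list of tuples.

```python
from typing import Any, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)


def render_prompt(system_template: str, state: Dict[str, Any]) -> List[Message]:
    """Hypothetical sketch of ChatPromptTemplate + MessagesPlaceholder("messages"):
    fill the system text from the other state keys, then splice in the messages."""
    variables = {key: value for key, value in state.items() if key != "messages"}
    system_text = system_template.format(**variables)
    return [("system", system_text)] + list(state["messages"])


pirate = render_prompt(
    "You talk like a pirate. Answer all questions to the best of your ability.",
    {"messages": [("human", "Hi! I'm tiankonguse.")]},
)
print(pirate)

# A template variable such as {language} is filled from the matching state key.
helpful = render_prompt(
    "You are a helpful assistant. Please answer in {language}.",
    {"messages": [("human", "What's my name?")], "language": "Chinese"},
)
print(helpful[0])  # ('system', 'You are a helpful assistant. Please answer in Chinese.')
```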

We can now update our application to incorporate this template:



In [63]:
workflow = StateGraph(state_schema=MessagesState)


def call_model(state: MessagesState):
    prompt = prompt_template.invoke(state)
    response = model.invoke(prompt)
    return {"messages": response}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

We invoke the application in the same way:



In [30]:
config = {"configurable": {"thread_id": "abc345"}}
query = "你好! 我是 tiankonguse."  # "Hi! I'm tiankonguse."

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Ahoy, tiankonguse! 欢迎登上我的海盗船！有何问题需我这位海盗为你解答，或是有何冒险故事想要分享？说吧，我的朋友！🏴‍☠️✨
(English: Ahoy, tiankonguse! Welcome aboard my pirate ship! Is there a question you'd have this pirate answer, or an adventure story you'd like to share? Speak up, my friend! 🏴‍☠️✨)


In [31]:
query = "我的名字是什么?"  # "What is my name?"

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Ahoy, me hearty! 你叫 tiankonguse！是个响亮而独特的名字，像海浪一般荡漾！还想问些其他的事情么？🎉🏴‍☠️
(English: Ahoy, me hearty! Your name is tiankonguse! A resounding and unique name that rolls like the waves! Anything else you'd like to ask? 🎉🏴‍☠️)


Awesome! Let's now make our prompt a little bit more complicated.   
Let's assume that the prompt template now looks something like this:

In [64]:
prompt_template = ChatPromptTemplate.from_messages(
    [
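        (
            "system",
            # "You are a helpful assistant. Answer all questions to the best of your ability;
            # if you don't know the answer, say 'I don't know'. Please answer in {language}."
            "你是一个乐于助人的助手。尽你所能回答所有问题，如果不知道答案，请说 '我不知道'。请用 {language} 回答。",
        ),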
        MessagesPlaceholder(variable_name="messages"),
    ]
)

Note that we have added a new language input to the prompt.   
Our application now has two parameters: the input messages and language.   
We should update our application's state to reflect this:



In [66]:
from typing import Sequence

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict


class State(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    language: str


workflow = StateGraph(state_schema=State)


def call_model(state: State):
    prompt = prompt_template.invoke(state)
    response = model.invoke(prompt)
    return {"messages": [response]}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

In [34]:
config = {"configurable": {"thread_id": "abc456"}}
query = "嗨，我是tiankonguse。"  # "Hi, I'm tiankonguse."
language = "中文"  # "Chinese"

input_messages = [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages, "language": language},
    config,
)
output["messages"][-1].pretty_print()


嗨，tiankonguse！很高兴见到你！有什么我可以帮忙的吗？
(English: Hi, tiankonguse! Nice to meet you! Is there anything I can help with?)


Note that the entire state is persisted, so we can omit parameters like language if no changes are desired:



In [35]:
query = "我的名字是什么?"  # "What is my name?"

input_messages = [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages},
    config,
)
output["messages"][-1].pretty_print()


你的名字是 tiankonguse。有什么想聊的或者需要帮助的呢？
(English: Your name is tiankonguse. Is there anything you'd like to chat about or need help with?)
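Why can `language` be omitted? Each key in the state schema is updated on its own. `messages` is annotated with the `add_messages` reducer, so new messages are appended to the saved ones. A plain key like `language` is overwritten when a new value is supplied and otherwise keeps its saved value. Below is a stdlib sketch of that merge rule. `merge_state` and `REDUCERS` are hypothetical names, and the real `add_messages` also assigns message IDs and replaces messages that share an ID, which this sketch skips.

```python
import operator
from typing import Any, Callable, Dict

# Hypothetical per-key reducers: "messages" concatenates lists (like add_messages,
# minus its ID handling); keys without a reducer are simply overwritten.
REDUCERS: Dict[str, Callable[[Any, Any], Any]] = {"messages": operator.add}


def merge_state(saved: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    merged = dict(saved)
    for key, value in update.items():  # keys missing from the update keep their saved value
        if key in REDUCERS and key in merged:
            merged[key] = REDUCERS[key](merged[key], value)
        else:
            merged[key] = value
    return merged


saved = {"messages": ["human: Hi, I'm tiankonguse.", "ai: Hi!"], "language": "Chinese"}
turn2 = merge_state(saved, {"messages": ["human: What's my name?"]})
print(turn2["language"], len(turn2["messages"]))  # Chinese 3
```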


## Managing Conversation History


One important concept to understand when building chatbots is how to manage conversation history.   
If left unmanaged, the list of messages will grow unbounded and potentially overflow the context window of the LLM.   
Therefore, it is important to add a step that limits the size of the messages you are passing in.

Importantly, you will want to do this BEFORE the prompt template but AFTER you load previous messages from the persisted message history.


In our LangGraph application, we can do this by trimming the messages inside the model node, right before they are passed to the prompt template.


LangChain comes with a few built-in helpers for managing a list of messages.   
In this case we'll use the trim_messages helper to reduce how many messages we're sending to the model.   
The trimmer allows us to specify how many tokens we want to keep, along with other parameters such as whether to always keep the system message and whether to allow partial messages:


https://python.langchain.com/docs/how_to/#messages   
https://python.langchain.com/docs/how_to/trim_messages/  


In [79]:
from langchain_core.messages import SystemMessage, trim_messages
from langchain_core.messages import HumanMessage
from langchain_core.messages import AIMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

trimmer = trim_messages(
    max_tokens=65,
    strategy="last",
    token_counter=model,
    include_system=True,
    allow_partial=False,
    start_on="human",
)

messages = [
    SystemMessage(content="你是一个好助手"),  # "You are a good assistant"
    HumanMessage(content="我的名字是 tiankonguse"),  # "My name is tiankonguse"
    AIMessage(content="嗨"),  # "Hi"
    HumanMessage(content="2 + 2 等于多少"),  # "What is 2 + 2?"
    AIMessage(content="4"),
]

trimmer.invoke(messages)

[SystemMessage(content='你是一个好助手', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='我的名字是 tiankonguse', additional_kwargs={}, response_metadata={}),
 AIMessage(content='嗨', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='2 + 2 等于多少', additional_kwargs={}, response_metadata={}),
 AIMessage(content='4', additional_kwargs={}, response_metadata={})]
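To make the trimmer's options concrete, here is a simplified re-implementation of the `strategy="last"` behavior with `include_system=True`, `allow_partial=False` and `start_on="human"`. It is a sketch, not LangChain's code. Messages are `(role, content)` tuples. `count_tokens` is a hypothetical counter that treats each whitespace-separated word as one token plus a fixed per-message overhead, whereas the tutorial passes the chat model itself as `token_counter`.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def count_tokens(message: Message) -> int:
    """Hypothetical counter: one token per whitespace-separated word + 3 overhead."""
    return len(message[1].split()) + 3


def trim_last(messages: List[Message], max_tokens: int) -> List[Message]:
    # include_system=True: keep a leading system message and charge its tokens up front.
    system = [messages[0]] if messages and messages[0][0] == "system" else []
    rest = messages[len(system):]
    budget = max_tokens - sum(count_tokens(m) for m in system)

    kept: List[Message] = []
    for message in reversed(rest):  # strategy="last": start from the newest message
        cost = count_tokens(message)
        if cost > budget:
            break  # allow_partial=False: never keep part of a message
        kept.insert(0, message)
        budget -= cost

    # start_on="human": drop leading messages until the kept history starts with a human turn.
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return system + kept


sample = [
    ("system", "You are a good assistant"),
    ("human", "My name is tiankonguse"),
    ("ai", "Hi"),
    ("human", "What is 2 + 2"),
    ("ai", "4"),
]
print(trim_last(sample, max_tokens=65))  # all five fit (31 "tokens" under this counter)
print(trim_last(sample, max_tokens=25))  # system + the last human/ai pair
```

In the 25-token case, the `"Hi"` reply also fits the budget, but the `start_on="human"` rule drops it so that the trimmed history never opens with an AI turn.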

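Here all five messages fit within the 65-token budget, so the trimmer returns them unchanged; with a longer history, the oldest non-system messages would be dropped first. To use the trimmer in our application, we just need to run it before we pass the messages input to our prompt.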



In [80]:
workflow = StateGraph(state_schema=State)


def call_model(state: State):
    trimmed_messages = trimmer.invoke(state["messages"])
    prompt = prompt_template.invoke(
        {"messages": trimmed_messages, "language": state["language"]}
    )
    response = model.invoke(prompt)
    return {"messages": [response]}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

Now if we ask the model our name on a new thread and send only the question, without the earlier messages, it has no way of knowing it:



In [81]:
config = {"configurable": {"thread_id": "abc567"}}
query = "我的名字是什么?"  # "What is my name?"
language = "中文"  # "Chinese"

# A new thread, and only the question is sent (no earlier messages)
input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages, "language": language}, config)
output["messages"][-1].pretty_print()


我不知道你的名字是什么。
(English: I don't know what your name is.)


But if we ask about information that is within the last few messages, it remembers:



In [82]:
config = {"configurable": {"thread_id": "abc678"}}
query = "我问了什么数学问题？我是谁？"  # "What math problem did I ask? Who am I?"
language = "中文"  # "Chinese"


input_messages = messages + [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages, "language": language},
    config,
)

# Print the messages we sent, followed by the model's reply
for message in input_messages:
    message.pretty_print()
output["messages"][-1].pretty_print()


你是一个好助手
(English: You are a good assistant)

我的名字是 tiankonguse
(English: My name is tiankonguse)

嗨
(English: Hi)

2 + 2 等于多少
(English: What is 2 + 2?)

4

我问了什么数学问题？我是谁？
(English: What math problem did I ask? Who am I?)

你问了“2 + 2 等于多少”的数学问题。你是 tiankonguse。
(English: You asked the math question "What is 2 + 2?". You are tiankonguse.)


If you open this run in LangSmith, the trace shows exactly what is happening under the hood.



## Streaming

Now we've got a functioning chatbot.   
However, one really important UX consideration for chatbot applications is streaming.   
LLMs can sometimes take a while to respond, and so in order to improve the user experience one thing that most applications do is stream back each token as it is generated.   
This allows the user to see progress.

It's actually super easy to do this!  

By default, .stream in our LangGraph application streams application steps (in this case, the single step of the model response).   
Setting stream_mode="messages" allows us to stream output tokens instead:

In [None]:
config = {"configurable": {"thread_id": "abc789"}}
query = "你能给我讲个笑话吗？"  # "Can you tell me a joke?"
language = "中文"  # "Chinese"

input_messages = [HumanMessage(query)]
for chunk, metadata in app.stream(
    {"messages": input_messages, "language": language},
    config,
    stream_mode="messages",
):
    if isinstance(chunk, AIMessage):  # Filter to just model responses
        print(chunk.content, end="|")

|当然|可以|！|这是|一个|轻|松|的|笑|话|：

|为什么|数学|书|总|是|感|到|沮|丧|？

|因为|它|有|太|多|的问题|！||
(English: Sure! Here's a light joke: Why does the math book always feel down? Because it has too many problems!)
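Under the hood, streaming means iterating over chunks as they arrive and rendering each one immediately. You can also accumulate the chunks if you need the full reply afterwards. The stdlib sketch below imitates the loop above. `fake_token_stream` is a hypothetical generator standing in for `app.stream(..., stream_mode="messages")`, which yields `(message_chunk, metadata)` tuples. The metadata contents in the sketch are illustrative only.

```python
import time
from typing import Dict, Iterator, List, Tuple


def fake_token_stream(text: str, delay: float = 0.0) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Hypothetical stand-in for app.stream(..., stream_mode="messages"):
    yields (chunk, metadata) pairs, one word at a time. The metadata is illustrative."""
    for word in text.split(" "):
        time.sleep(delay)  # pretend the model is still generating
        yield word + " ", {"node": "model"}


parts: List[str] = []
for chunk, metadata in fake_token_stream("Why did the math book look sad? Too many problems!"):
    print(chunk, end="|", flush=True)  # render each chunk the moment it arrives
    parts.append(chunk)  # keep the chunks in case the full reply is needed afterwards
print()

full_reply = "".join(parts).strip()
print(full_reply)
```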

## Next Steps

Now that you understand the basics of how to create a chatbot in LangChain, some more advanced tutorials you may be interested in are:

- Conversational RAG: Enable a chatbot experience over an external source of data
- Agents: Build a chatbot that can take actions  

https://python.langchain.com/docs/tutorials/qa_chat_history/  
https://python.langchain.com/docs/tutorials/agents/  


If you want to dive deeper on specifics, some things worth checking out are:

- Streaming: streaming is crucial for chat applications
- How to add message history: for a deeper dive into all things related to message history
- How to manage large message history: more techniques for managing a large chat history
- LangGraph main docs: for more detail on building with LangGraph

https://python.langchain.com/docs/how_to/streaming/
https://python.langchain.com/docs/how_to/message_history/
https://python.langchain.com/docs/how_to/trim_messages/
https://langchain-ai.github.io/langgraph/