#### LangChain Essentials Course

# Streaming With Langchain

LangChain is one of the most popular open source libraries for AI Engineers. It's goal is to abstract away the complexity in building AI software, provide easy-to-use building blocks, and make it easier when switching between AI service providers.

In this example, we will introduce LangChain's Async Streaming, collecting data gradually and understanding how it will work. We'll provide examples for both OpenAI's `gpt-4o-mini` *and* Meta's `llama3.2` via Ollama!

## Choosing your Model

---

> ⚠️ We will be using OpenAI for this example allowing us to run everything via API. If you would like to use Ollama instead, please see the [Ollama version](https://github.com/aurelio-labs/langchain-course/blob/main/notebooks/ollama/06-lcel-ollama.ipynb) of this example.

---

For the LLM, we'll start by initializing our connection to the OpenAI API. We do need an OpenAI API key, which you can get from the [OpenAI platform](https://platform.openai.com/api-keys).

We will use the `gpt-4o-mini` model with a `temperature` of `0.0`:

In [2]:
import os
from getpass import getpass
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") \
    or getpass("Enter your OpenAI API key: ")

llm = ChatOpenAI(
    model_name="gpt-4o-mini",
    temperature=0.0,
    streaming=True
)

In [3]:
llm_out = llm.invoke("Hello there")
llm_out

AIMessage(content='Hello! How can I assist you today?', additional_kwargs={}, response_metadata={'finish_reason': 'stop', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_bd83329f63'}, id='run-3e8340ed-f353-4bf9-b3b0-12431bb6e993-0')

### Streaming Chunks

We will start by creating a aysnc stream from our LLM, we will do this within a for loop so that we can print each chunk as it is being generated. Whilst we do this we are also printing a pipe or '|' to the end of each chunk, and by setting 'flush' equal to True, it ensures that the output is immediately written to the console.

In [70]:
chunks = []
async for chunk in llm.astream("What does NLP mean?"):
    chunks.append(chunk)
    print(chunk.content, end="|", flush=True)

|N|LP| stands| for| Natural| Language| Processing|.| It| is| a| field| of| artificial| intelligence| (|AI|)| that| focuses| on| the| interaction| between| computers| and| humans| through| natural| language|.| The| goal| of| NLP| is| to| enable| computers| to| understand|,| interpret|,| and| generate| human| language| in| a| way| that| is| both| meaningful| and| useful|.| This| involves| various| tasks| such| as| language| translation|,| sentiment| analysis|,| text| summar|ization|,| speech| recognition|,| and| more|.| NLP| combines| techniques| from| lingu|istics|,| computer| science|,| and| machine| learning| to| process| and| analyze| large| amounts| of| natural| language| data|.||

Since we appended each chunk to the 'chunks' list, we can also see what is inside each and every chunk. 

In [5]:
chunks[0]

AIMessageChunk(content='', additional_kwargs={}, response_metadata={}, id='run-5763c463-7173-429a-8d35-b26af354a507')

In [6]:
chunks[0] + chunks[1] + chunks[2] + chunks[3] + chunks[4]

AIMessageChunk(content="Hello! I'm an", additional_kwargs={}, response_metadata={}, id='run-5763c463-7173-429a-8d35-b26af354a507')

In [71]:
chunks[0] + chunks[3]

AIMessageChunk(content=' stands', additional_kwargs={}, response_metadata={}, id='run-1893abaf-281c-4ce5-9c92-112c8064c3eb')

Now we will make some tools to use on a async agent executor to show that functions do not stream the process and only the output, meaning if you are doing an entire LLM search within these functions, the entire output will be shown rather then chunks on an output.

In [7]:
from langchain_core.tools import tool

@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y'."""
    return x + y

@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' and 'y'."""
    return x * y

@tool
def exponentiate(x: float, y: float) -> float:
    """Raise 'x' to the power of 'y'."""
    return x ** y

@tool
def subtract(x: float, y: float) -> float:
    """Subtract 'x' from 'y'."""
    return y - x

We will create a prompt to use.

In [8]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "you're a helpful assistant"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

Now we will generating outputs for the tools to the agent, this is also where we connect the llm and the prompt.

In [9]:
from langchain.agents import create_tool_calling_agent

tools = [add, multiply, exponentiate, subtract]

agent = create_tool_calling_agent(
    llm=llm, tools=tools, prompt=prompt
)

Now we want to connect the tool exectutional functionality to the overseer agent.

In [10]:
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
)

Now we can use the astream function as opposed to the invoke function, this allows the output of the function to be shown to us as chunks

In [65]:
import pprint

chunks = []

async for chunk in agent_executor.astream(
    {"input": "in the following order, what is 5+5, * 10, to the power of 3, minus 99?"}
):
    chunks.append(chunk)
    print("------")
    pprint.pprint(chunk, depth=1)

------
{'actions': [...], 'messages': [...]}
------
{'actions': [...], 'messages': [...]}
------
{'actions': [...], 'messages': [...]}
------
{'actions': [...], 'messages': [...]}
------
{'messages': [...], 'steps': [...]}
------
{'messages': [...], 'steps': [...]}
------
{'messages': [...], 'steps': [...]}
------
{'messages': [...], 'steps': [...]}
------
{'actions': [...], 'messages': [...]}
------
{'actions': [...], 'messages': [...]}
------
{'actions': [...], 'messages': [...]}
------
{'messages': [...], 'steps': [...]}
------
{'messages': [...], 'steps': [...]}
------
{'messages': [...], 'steps': [...]}
------
{'actions': [...], 'messages': [...]}
------
{'messages': [...], 'steps': [...]}
------
{'messages': [...],
 'output': "Let's break down the calculations step by step:\n"
           '\n'
           '1. **5 + 5 = 10**\n'
           '2. **10 * 10 = 100**\n'
           '3. **10 to the power of 3 = 1000**\n'
           '4. **1000 - 99 = 901**\n'
           '\n'
           'So, t

Inside of each chunk we can see different attributes credited to each chunk.

In [66]:
chunks[0]["actions"]

[ToolAgentAction(tool='add', tool_input={'x': 5, 'y': 5}, log="\nInvoking: `add` with `{'x': 5, 'y': 5}`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_5LY91QynPiCypmkWL6t95m8p', 'function': {'arguments': '{"x": 5, "y": 5}', 'name': 'add'}, 'type': 'function'}, {'index': 1, 'id': 'call_e7O8LAZf32zNU5ThM6uUMwx3', 'function': {'arguments': '{"x": 10, "y": 10}', 'name': 'multiply'}, 'type': 'function'}, {'index': 2, 'id': 'call_ZVGzvK1x2egCZRAA08ewHPK7', 'function': {'arguments': '{"x": 10, "y": 3}', 'name': 'exponentiate'}, 'type': 'function'}, {'index': 3, 'id': 'call_xO1zz7Jsl6PATzJW5ttMLh0O', 'function': {'arguments': '{"x": 99, "y": 99}', 'name': 'subtract'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_bd83329f63'}, id='run-0a931944-a303-48a2-9cbe-dfe323593d0c', tool_calls=[{'name': 'add', 'args': {'x': 5, 'y': 5}, 'id': 'call_5LY9

Within the 'messages' attribute we can see the content of the output.

In [67]:
for chunk in chunks:
    print(chunk["messages"])

[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_5LY91QynPiCypmkWL6t95m8p', 'function': {'arguments': '{"x": 5, "y": 5}', 'name': 'add'}, 'type': 'function'}, {'index': 1, 'id': 'call_e7O8LAZf32zNU5ThM6uUMwx3', 'function': {'arguments': '{"x": 10, "y": 10}', 'name': 'multiply'}, 'type': 'function'}, {'index': 2, 'id': 'call_ZVGzvK1x2egCZRAA08ewHPK7', 'function': {'arguments': '{"x": 10, "y": 3}', 'name': 'exponentiate'}, 'type': 'function'}, {'index': 3, 'id': 'call_xO1zz7Jsl6PATzJW5ttMLh0O', 'function': {'arguments': '{"x": 99, "y": 99}', 'name': 'subtract'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_bd83329f63'}, id='run-0a931944-a303-48a2-9cbe-dfe323593d0c', tool_calls=[{'name': 'add', 'args': {'x': 5, 'y': 5}, 'id': 'call_5LY91QynPiCypmkWL6t95m8p', 'type': 'tool_call'}, {'name': 'multiply', 'args': {'x': 10, 'y': 10}, 'id': 'call_e7O8LAZf32zNU5ThM6u

Here we can also see the astream events in action, this shows the chunks within each function, when it starts, ends, produces an output etc...

In [68]:
import asyncio
from typing import Any, Dict, List, Optional

from langchain.callbacks.base import AsyncCallbackHandler, BaseCallbackHandler
from langchain_core.messages import HumanMessage, BaseMessage
from langchain_core.outputs import LLMResult
from langchain_openai import ChatOpenAI

import uuid

class MyCustomSyncHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"Sync handler being called in a `thread_pool_executor`: token: {token}")


class MyCustomAsyncHandler(AsyncCallbackHandler):
    """Async callback handler that can be used to handle callbacks from langchain."""

    async def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> None:
        """Run when chain starts running."""
        await asyncio.sleep(0.3)
        class_name = serialized["name"]
        yield "Hi! I just woke up. Your llm is starting"

    async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Run when chain ends running."""
        await asyncio.sleep(0.3)
        yield "Hi! I just woke up. Your llm is ending"


# To enable streaming, we pass in `streaming=True` to the ChatModel constructor
# Additionally, we pass in a list with our custom handler
chat = ChatOpenAI(
    max_tokens=25,
    streaming=True,
    callbacks=[MyCustomAsyncHandler()],
)

response = chat.astream([HumanMessage(content="Tell me a joke")])

async for token in response:
    print(token)


content='' additional_kwargs={} response_metadata={} id='run-53e12577-999c-491e-937f-26a9444a3f94'
content='Why' additional_kwargs={} response_metadata={} id='run-53e12577-999c-491e-937f-26a9444a3f94'
content=' don' additional_kwargs={} response_metadata={} id='run-53e12577-999c-491e-937f-26a9444a3f94'
content="'t" additional_kwargs={} response_metadata={} id='run-53e12577-999c-491e-937f-26a9444a3f94'
content=' scientists' additional_kwargs={} response_metadata={} id='run-53e12577-999c-491e-937f-26a9444a3f94'
content=' trust' additional_kwargs={} response_metadata={} id='run-53e12577-999c-491e-937f-26a9444a3f94'
content=' atoms' additional_kwargs={} response_metadata={} id='run-53e12577-999c-491e-937f-26a9444a3f94'
content='?\n\n' additional_kwargs={} response_metadata={} id='run-53e12577-999c-491e-937f-26a9444a3f94'
content='Because' additional_kwargs={} response_metadata={} id='run-53e12577-999c-491e-937f-26a9444a3f94'
content=' they' additional_kwargs={} response_metadata={} id='run