# [How to stream runnables](https://python.langchain.com/docs/how_to/streaming/)

Streaming is critical in making applications based on LLMs feel responsive to end-users.

Important LangChain primitives like chat models, output parsers, prompts, retrievers, and agents implement the LangChain `Runnable` interface.

This interface provides two general approaches to stream content:

1. sync `stream` and async `astream`: a default implementation of streaming that streams the final output from the chain.
2. async `astream_events` and async `astream_log`: these provide a way to stream both intermediate steps and the final output from the chain.

Let's take a look at both approaches, and try to understand how to use them.

## Using Stream
All `Runnable` objects implement a sync method called `stream` and an async variant called `astream`.

These methods are designed to stream the final output in chunks, yielding each chunk as soon as it is available.

Streaming is only possible if all steps in the program know how to process an input stream; i.e., process an input chunk one at a time, and yield a corresponding output chunk.

The complexity of this processing can vary, from straightforward tasks like emitting tokens produced by an LLM, to more challenging ones like streaming parts of JSON results before the entire JSON is complete.
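To make the "one chunk in, one chunk out" requirement concrete, here is a plain-Python sketch (not LangChain code). Each step is a generator that consumes an input stream and yields output as soon as it can. `fake_model` and `uppercase` are hypothetical stand-ins for a model and a transform-style step. The shared `log` records the order in which the steps run.

```python
from typing import Iterator, List

log: List[str] = []


def fake_model(text: str) -> Iterator[str]:
    """Hypothetical stand-in for an LLM: emits one word-sized chunk at a time."""
    for i, word in enumerate(text.split()):
        chunk = word if i == 0 else " " + word
        log.append(f"model -> {chunk!r}")
        yield chunk


def uppercase(chunks: Iterator[str]) -> Iterator[str]:
    """Transform-style step: handles each chunk as soon as it arrives."""
    for chunk in chunks:
        out = chunk.upper()
        log.append(f"upper -> {out!r}")
        yield out


for piece in uppercase(fake_model("the sky is blue")):
    print(piece, end="|", flush=True)
print()

# The log alternates between the two steps: each chunk flows through the
# whole pipeline before the model produces the next one.
print(log[:4])
```

If any step in such a pipeline had to read its entire input before yielding, the consumer would see nothing until that step finished.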

The best place to start exploring streaming is with the single most important component in LLM apps: the LLMs themselves!

## LLMs and Chat Models
Large language models and their chat variants are the primary bottleneck in LLM-based apps.

Large language models can take several seconds to generate a complete response to a query. This is far slower than the ~200-300 ms threshold at which an application feels responsive to an end user.

The key strategy to make the application feel more responsive is to show intermediate progress; viz., to stream the output from the model token by token.
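One rough way to see why streaming helps is to compare time-to-first-chunk with total latency. The sketch below does not call a real API. It uses a hypothetical `slow_model` generator that sleeps before each token, so the printed numbers depend only on the chosen delay.

```python
import time
from typing import Iterator


def slow_model(n_tokens: int = 5, delay_s: float = 0.05) -> Iterator[str]:
    """Hypothetical model that takes `delay_s` seconds to produce each token."""
    for i in range(n_tokens):
        time.sleep(delay_s)  # simulated generation / network latency
        yield f"token{i} "


start = time.perf_counter()
first_chunk_at = None
for chunk in slow_model():
    if first_chunk_at is None:
        # Latency until the user sees *something* on screen
        first_chunk_at = time.perf_counter() - start
total = time.perf_counter() - start

print(f"first chunk after {first_chunk_at:.2f}s, full response after {total:.2f}s")
```

Streaming does not shorten the total time. It only moves the first visible output much earlier.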

```python
from dotenv import load_dotenv

# Load API keys (e.g. OPENAI_API_KEY) from a local .env file
load_dotenv()

from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")
```

Let's start with the sync `stream` API:

```python
chunks = []
for chunk in model.stream("what color is the sky?"):
    chunks.append(chunk)
    print(chunk.content, end="|", flush=True)
```

```
|The| color| of| the| sky| can| vary| depending| on| several| factors|,| including| the| time| of| day|,| weather| conditions|,| and| atmospheric| particles|.| During| a| clear| day|,| the| sky| typically| appears| blue| due| to| Ray|leigh| scattering|,| where| shorter| blue| wavelengths| of| sunlight| are| scattered| more| than| other| colors|.| At| sunrise| and| sunset|,| the| sky| can| display| a| range| of| colors|,| including| orange|,| pink|,| and| red|,| due| to| the| longer| path| of| sunlight| through| the| atmosphere|,| which| scat|ters| shorter| wavelengths| and| allows| longer| wavelengths| to| become| more| prominent|.| On| cloudy| or| over|cast| days|,| the| sky| may| appear| gray|.||
```

Alternatively, if you're working in an async environment, you can use the async `astream` API:

```python
chunks = []
async for chunk in model.astream("what color is the sky?"):
    chunks.append(chunk)
    print(chunk.content, end="|", flush=True)
```

```
|The| color| of| the| sky| appears| blue| during| the| day| due| to| a| phenomenon| called| Ray|leigh| scattering|,| where| shorter| wavelengths| of| light| (|blue| and| violet|)| scatter| more| than| longer| wavelengths| (|red| and| yellow|).| However|,| the| sky| can| also| appear| in| different| colors| at| sunrise| and| sunset|,| displaying| hues| of| orange|,| pink|,| and| red|.| At| night|,| the| sky| can| appear| black| or| dark| blue|,| dotted| with| stars|.| Weather| conditions|,| pollution|,| and| other| factors| can| also| affect| the| sky|'s| color|.||
```
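The cell above uses `async for` at the top level. That works in Jupyter because the notebook already runs an event loop. In a plain Python script you would typically wrap the loop in a coroutine and start it with `asyncio.run`. The sketch below shows that pattern. `fake_astream` is a hypothetical async generator standing in for `model.astream`, and it ignores its prompt.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_astream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for model.astream(prompt); the prompt is ignored."""
    for word in ["The", " sky", " is", " blue", "."]:
        await asyncio.sleep(0)  # hand control back to the event loop, like a network read would
        yield word


async def main() -> List[str]:
    chunks = []
    async for chunk in fake_astream("what color is the sky?"):
        chunks.append(chunk)
        print(chunk, end="|", flush=True)
    print()
    return chunks


chunks = asyncio.run(main())
```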

```python
chunks
```

```
[AIMessageChunk(content='', additional_kwargs={}, response_metadata={}, id='run-aba33435-f21d-46a4-88e8-a00ae701d4ad'),
 AIMessageChunk(content='The', additional_kwargs={}, response_metadata={}, id='run-aba33435-f21d-46a4-88e8-a00ae701d4ad'),
 AIMessageChunk(content=' color', additional_kwargs={}, response_metadata={}, id='run-aba33435-f21d-46a4-88e8-a00ae701d4ad'),
 AIMessageChunk(content=' of', additional_kwargs={}, response_metadata={}, id='run-aba33435-f21d-46a4-88e8-a00ae701d4ad'),
 AIMessageChunk(content=' the', additional_kwargs={}, response_metadata={}, id='run-aba33435-f21d-46a4-88e8-a00ae701d4ad'),
 AIMessageChunk(content=' sky', additional_kwargs={}, response_metadata={}, id='run-aba33435-f21d-46a4-88e8-a00ae701d4ad'),
 AIMessageChunk(content=' appears', additional_kwargs={}, response_metadata={}, id='run-aba33435-f21d-46a4-88e8-a00ae701d4ad'),
 AIMessageChunk(content=' blue', additional_kwargs={}, response_metadata={}, id='run-aba33435-f21d-46a4-88e8-a00ae701d4ad'),
 ...
```

We got back something called an `AIMessageChunk`. This chunk represents a part of an `AIMessage`.

Message chunks are additive by design: you can simply add them up to get the state of the response so far!

```python
chunks[0] + chunks[1] + chunks[2] + chunks[3] + chunks[4]
```

```
AIMessageChunk(content='The color of the', additional_kwargs={}, response_metadata={}, id='run-aba33435-f21d-46a4-88e8-a00ae701d4ad')
```
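This works because message chunks define `+`. The sketch below is a minimal model of the idea. It uses a hypothetical `TextChunk` dataclass that tracks only `content`. The real `AIMessageChunk` also merges other fields, such as `additional_kwargs` and `response_metadata`, which this sketch leaves out.

```python
import operator
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class TextChunk:
    """Hypothetical, content-only stand-in for AIMessageChunk."""

    content: str

    def __add__(self, other: "TextChunk") -> "TextChunk":
        # Adding two chunks concatenates their content
        return TextChunk(self.content + other.content)


chunks = [TextChunk(t) for t in ["", "The", " color", " of", " the"]]
so_far = reduce(operator.add, chunks)
print(so_far)  # TextChunk(content='The color of the')
```

Because addition is associative here, you can fold chunks in as they arrive instead of waiting for all of them.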

## Chains
Virtually all LLM applications involve more steps than just a call to a language model.

Let's build a simple chain using LangChain Expression Language (LCEL) that combines a prompt, a model, and a parser, and verify that streaming works.

We will use `StrOutputParser` to parse the output from the model. This is a simple parser that extracts the `content` field from an `AIMessageChunk`, giving us the token returned by the model.

#### note
LCEL is a declarative way to specify a "program" by chaining together different LangChain primitives. Chains created using LCEL benefit from an automatic implementation of `stream` and `astream`, allowing streaming of the final output. In fact, chains created with LCEL implement the entire standard Runnable interface.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
parser = StrOutputParser()
chain = prompt | model | parser

async for chunk in chain.astream({"topic": "parrot"}):
    print(chunk, end="|", flush=True)
```

```
|Why| did| the| par|rot| wear| a| rain|coat|?

|Because| it| wanted| to| be| a| poly|-uns|aturated| bird|!| 🦜|☔|️||
```

Note that we're getting streaming output even though we're using `parser` at the end of the chain above. The `parser` operates on each streaming chunk individually. Many of the LCEL primitives also support this kind of transform-style passthrough streaming, which can be very convenient when constructing apps.

Custom functions can be designed to return generators, which are able to operate on streams.
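Here is a plain-Python imitation of a transform-style parser followed by a custom generator step. It is not LangChain's implementation. Like `StrOutputParser`, `str_parser` pulls `content` out of each chunk as it arrives. `Chunk` is a hypothetical namedtuple standing in for `AIMessageChunk`. `running_text` is a hypothetical custom step that keeps state but still yields once per input chunk.

```python
from typing import Iterator, NamedTuple


class Chunk(NamedTuple):
    """Hypothetical stand-in for AIMessageChunk (content only)."""

    content: str


def str_parser(chunks: Iterator[Chunk]) -> Iterator[str]:
    """Transform-style parser: emits one string per incoming chunk."""
    for chunk in chunks:
        yield chunk.content


def running_text(texts: Iterator[str]) -> Iterator[str]:
    """Custom generator step: yields the text accumulated so far after each chunk."""
    so_far = ""
    for text in texts:
        so_far += text
        yield so_far


model_chunks = iter([Chunk("Why"), Chunk(" did"), Chunk(" the"), Chunk(" parrot")])
for piece in running_text(str_parser(model_chunks)):
    print(piece)
```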

Certain runnables, like prompt templates and chat models, cannot process individual chunks and instead aggregate all previous steps. Such runnables can interrupt the streaming process.
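For contrast, here is a plain-Python sketch of a step that must see its whole input before producing anything. This roughly models a runnable that cannot process individual chunks. The hypothetical `join_all` step consumes the entire upstream stream first, so everything downstream gets a single chunk at the very end.

```python
from typing import Iterator, List

trace: List[str] = []


def tokens() -> Iterator[str]:
    """Hypothetical streaming source."""
    for tok in ["Why", " did", " the", " parrot"]:
        trace.append(f"produced {tok!r}")
        yield tok


def join_all(chunks: Iterator[str]) -> Iterator[str]:
    """Aggregating step: emits nothing until its input is exhausted."""
    text = "".join(chunks)  # consumes the entire upstream generator first
    trace.append("aggregated")
    yield text


for out in join_all(tokens()):
    trace.append(f"emitted {out!r}")

print("\n".join(trace))
```

All four `produced` entries appear before anything is emitted, which is what "interrupting the streaming process" looks like from the consumer's side.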


The LangChain Expression Language allows you to separate the construction of a chain from the mode in which it is used (e.g., sync/async, batch/streaming, etc.). If this is not relevant to what you're building, you can also rely on a standard imperative programming approach: call `invoke`, `batch`, or `stream` on each component individually, assign the results to variables, and then use them downstream as you see fit.

## Working with Input Streams
What if you wanted to stream JSON from the output as it was being generated?

If you were to rely on `json.loads` to parse the partial JSON, the parsing would fail, because partial JSON isn't valid JSON.

You might be tempted to conclude that it isn't possible to stream JSON.

Well, it turns out there is a way to do it: the parser needs to operate on the input stream and attempt to "auto-complete" the partial JSON into a valid state.

Let's see such a parser in action to understand what this means.

```python
from langchain_core.output_parsers import JsonOutputParser

# Due to a bug in older versions of LangChain, JsonOutputParser did not stream results from some models
chain = model | JsonOutputParser()

async for text in chain.astream(
    "output a list of the countries france, spain and japan and their populations in JSON format. "
    'Use a dict with an outer key of "countries" which contains a list of countries. '
    "Each country should have the key `name` and `population`"
):
    print(text, flush=True)
```

```
{}
{'countries': []}
{'countries': [{}]}
{'countries': [{'name': ''}]}
{'countries': [{'name': 'France'}]}
{'countries': [{'name': 'France', 'population': 652}]}
{'countries': [{'name': 'France', 'population': 652735}]}
{'countries': [{'name': 'France', 'population': 65273511}]}
{'countries': [{'name': 'France', 'population': 65273511}, {}]}
{'countries': [{'name': 'France', 'population': 65273511}, {'name': ''}]}
{'countries': [{'name': 'France', 'population': 65273511}, {'name': 'Spain'}]}
{'countries': [{'name': 'France', 'population': 65273511}, {'name': 'Spain', 'population': 467}]}
{'countries': [{'name': 'France', 'population': 65273511}, {'name': 'Spain', 'population': 467547}]}
{'countries': [{'name': 'France', 'population': 65273511}, {'name': 'Spain', 'population': 46754778}]}
{'countries': [{'name': 'France', 'population': 65273511}, {'name': 'Spain', 'population': 46754778}, {}]}
{'countries': [{'name': 'France', 'population': 65273511}, {'name': 'Spain', 'population': 467
...
```
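The auto-completion idea can be sketched in plain Python. The helper tracks which strings, objects, and arrays are still open, appends the matching closing characters, and only then calls `json.loads`. `complete_partial_json` is a hypothetical, simplified illustration and not how `JsonOutputParser` is implemented internally. It gives up (returns `None`) on fragments such as a dangling key or a trailing comma instead of repairing them.

```python
import json
from typing import Any, List, Optional


def complete_partial_json(text: str) -> Optional[Any]:
    """Close any open strings/objects/arrays, then try to parse."""
    closers: List[str] = []  # expected closing characters, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    completed = text + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        return json.loads(completed)
    except json.JSONDecodeError:
        return None  # e.g. a dangling key or a trailing comma


# Feed every prefix of a full document, printing only when the parsed value changes
full = '{"countries": [{"name": "France", "population": 65273511}]}'
last = None
for end in range(1, len(full) + 1):
    parsed = complete_partial_json(full[:end])
    if parsed is not None and parsed != last:
        print(parsed)
        last = parsed
```

The printed sequence grows in the same way as the parser output above: `{}`, then an empty list, a partial name, and a population that fills in digit by digit.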

Now, let's break streaming. We'll use the previous example and append an extraction function at the end that extracts the country names from the finalized JSON.

#### Warning
Any steps in the chain that operate on finalized inputs rather than on input streams can break streaming functionality via `stream` or `astream`.

Later, we will discuss the `astream_events` API, which streams results from intermediate steps even if the chain contains steps that only operate on finalized inputs.



```python
from langchain_core.output_parsers import JsonOutputParser


# A function that operates on finalized inputs
# rather than on an input stream
def _extract_country_names(inputs):
    """A function that does not operate on input streams and breaks streaming."""
    if not isinstance(inputs, dict):
        return ""

    if "countries" not in inputs:
        return ""

    countries = inputs["countries"]

    if not isinstance(countries, list):
        return ""

    country_names = [
        country.get("name") for country in countries if isinstance(country, dict)
    ]
    return country_names


chain = model | JsonOutputParser() | _extract_country_names

async for text in chain.astream(
    "output a list of the countries france, spain and japan and their populations in JSON format. "
    'Use a dict with an outer key of "countries" which contains a list of countries. '
    "Each country should have the key `name` and `population`"
):
    print(text, end="|", flush=True)
```

```
['France', 'Spain', 'Japan']|
```

### Generator Functions
Let's fix the streaming using a generator function that can operate on the input stream.

#### tip
A generator function (a function that uses `yield`) allows writing code that operates on input streams.

```python
from langchain_core.output_parsers import JsonOutputParser


async def _extract_country_names_streaming(input_stream):
    """A function that operates on input streams."""
    country_names_so_far = set()

    async for chunk in input_stream:
        if not isinstance(chunk, dict):
            continue

        if "countries" not in chunk:
            continue

        countries = chunk["countries"]

        if not isinstance(countries, list):
            continue

        for country in countries:
            # Skip list items that are not (yet) dicts
            if not isinstance(country, dict):
                continue
            name = country.get("name")
            if not name:
                continue
            if name not in country_names_so_far:
                yield name
                country_names_so_far.add(name)


chain = model | JsonOutputParser() | _extract_country_names_streaming

async for text in chain.astream(
    "output a list of the countries france, spain and japan and their populations in JSON format. "
    'Use a dict with an outer key of "countries" which contains a list of countries. '
    "Each country should have the key `name` and `population`",
):
    print(text, end="|", flush=True)
```

```
France|Spain|Japan|
```

#### note
Because the code above relies on JSON auto-completion, you may see partial country names (e.g., `Sp` and `Spain`), which is not what you would want from an extraction result!

We're focusing on streaming concepts, not necessarily the results of the chains.
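One possible way to avoid leaking partial names, sketched here in plain Python, relies on list order. Emit a country's name only once a later element has appeared in the list, because the earlier object must then be complete. Flush the last element when the stream ends. `extract_finalized_names` and the hand-written `partial_objects` stream are hypothetical. They mimic the kind of partial dicts shown earlier rather than calling `JsonOutputParser`, and the sketch is synchronous for brevity.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional


def _name_of(country: Any) -> Optional[str]:
    """Return a non-empty name from a (possibly partial) country dict."""
    if isinstance(country, dict):
        name = country.get("name")
        if isinstance(name, str) and name:
            return name
    return None


def extract_finalized_names(partial_objects: Iterable[Dict[str, Any]]) -> Iterator[str]:
    seen = set()
    last_countries: List[Any] = []
    for obj in partial_objects:
        countries = obj.get("countries") if isinstance(obj, dict) else None
        if not isinstance(countries, list):
            continue
        last_countries = countries
        # Every element except the last one is already closed, so its name is final
        for country in countries[:-1]:
            name = _name_of(country)
            if name and name not in seen:
                seen.add(name)
                yield name
    # Once the stream ends, the last element is complete as well
    if last_countries:
        name = _name_of(last_countries[-1])
        if name and name not in seen:
            yield name


partial_objects = [
    {},
    {"countries": []},
    {"countries": [{}]},
    {"countries": [{"name": "Fr"}]},
    {"countries": [{"name": "France"}]},
    {"countries": [{"name": "France"}, {}]},
    {"countries": [{"name": "France"}, {"name": "Sp"}]},
    {"countries": [{"name": "France"}, {"name": "Spain"}]},
]
print(list(extract_finalized_names(partial_objects)))  # ['France', 'Spain']
```

The trade-off is a small delay: each name arrives one list element later than with the naive version, but `Fr` and `Sp` never leak out.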

## Non-streaming components
Some built-in components like Retrievers do not offer any streaming. What happens if we try to stream them? 🤨

```python
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

vectorstore = FAISS.from_texts(
    ["harrison worked at kensho", "harrison likes spicy food"],
    embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

chunks = [chunk for chunk in retriever.stream("where did harrison work?")]
chunks
```

```
[[Document(metadata={}, page_content='harrison worked at kensho'),
  Document(metadata={}, page_content='harrison likes spicy food')]]
```

`stream` just yielded the final result from that component.

This is OK 🥹! Not all components have to implement streaming: in some cases streaming is unnecessary, difficult, or simply doesn't make sense.

#### tip
An LCEL chain constructed using non-streaming components will still be able to stream in many cases, with streaming of partial output starting after the last non-streaming step in the chain.
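Here is a plain-Python sketch of that behavior. `retrieve` is a hypothetical non-streaming step that yields its complete result exactly once, much like the single list of `Document`s above. `answer` is a hypothetical streaming step. Nothing reaches the consumer until retrieval finishes, but after that the answer arrives chunk by chunk.

```python
from typing import Iterator, List

timeline: List[str] = []


def retrieve(question: str) -> Iterator[List[str]]:
    """Non-streaming step: yields its complete result exactly once."""
    docs = ["harrison worked at kensho", "harrison likes spicy food"]
    timeline.append("retrieval finished")
    yield docs


def answer(doc_batches: Iterator[List[str]]) -> Iterator[str]:
    """Streaming step: starts emitting as soon as its (single) input arrives."""
    for docs in doc_batches:
        for word in docs[0].split():
            timeline.append(f"answer chunk {word!r}")
            yield word


chunks = list(answer(retrieve("where did harrison work?")))
print(timeline)
```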

```python
retrieval_chain = (
    {
        "context": retriever.with_config(run_name="Docs"),
        "question": RunnablePassthrough(),
    }
    | prompt
    | model
    | StrOutputParser()
)
```

```python
for chunk in retrieval_chain.stream(
    "Where did harrison work? " "Write 3 made up sentences about this place."
):
    print(chunk, end="|", flush=True)
```

```
|H|arrison| worked| at| Kens|ho|.| Kens|ho| is| a| vibrant| tech| company| known| for| its| innovative| approach| to| data| analysis|.| The| office| is| filled| with| bright| colors| and| open| spaces|,| fostering| creativity| and| collaboration| among| the| team|.| Employees| often| enjoy| team| lunches| featuring| diverse| cuisines|,| reflecting| their| appreciation| for| global| flavors|.||
```

Now that we've seen how `stream` and `astream` work, let's venture into the world of streaming events.

## Using Stream Events

```python
import langchain_core

langchain_core.__version__
```

```
'0.3.10'
```

For the `astream_events` API to work properly:

* Use `async` throughout the code to the extent possible (e.g., async tools, etc.)
* Propagate callbacks if defining custom functions / runnables
* Whenever using runnables without LCEL, make sure to call `.astream()` on LLMs rather than `.ainvoke()` to force the LLM to stream tokens.

### Event Reference
Below is a reference table that shows some events that might be emitted by the various Runnable objects.

#### note
When streaming is implemented properly, the inputs to a runnable will not be known until after the input stream has been entirely consumed. This means that inputs will often be included only in end events rather than in start events.
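To see why inputs tend to show up only in end events, consider the plain-Python sketch below. It models a transform step that reports start, stream, and end events. The dictionaries borrow the `event`/`data` keys used in the real output, but `run_with_events` is a hypothetical helper and not LangChain's implementation. When the start event fires, nothing has been read from the input stream yet, so the full input can only be attached to the end event.

```python
from typing import Any, Dict, Iterator, List


def run_with_events(name: str, input_stream: Iterator[str]) -> Iterator[Dict[str, Any]]:
    """Hypothetical event emitter for an uppercasing transform step."""
    # At start time the input stream has not been consumed, so no input is known yet
    yield {"event": f"on_{name}_start", "data": {}}
    seen: List[str] = []
    outputs: List[str] = []
    for chunk in input_stream:
        seen.append(chunk)
        out = chunk.upper()
        outputs.append(out)
        yield {"event": f"on_{name}_stream", "data": {"chunk": out}}
    # Only now is the complete input available
    yield {
        "event": f"on_{name}_end",
        "data": {"input": "".join(seen), "output": "".join(outputs)},
    }


events = list(run_with_events("parser", iter(["hel", "lo"])))
for e in events:
    print(e)
```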

## Chat Model
Let's start off by looking at the events produced by a chat model.

```python
events = []
async for event in model.astream_events("hello", version="v2"):
    events.append(event)
```

```python
events[:3]
```

```
[{'event': 'on_chat_model_start',
  'data': {'input': 'hello'},
  'name': 'ChatOpenAI',
  'tags': [],
  'run_id': '2153b57f-1920-41a7-99fc-6b4fa1a3ff75',
  'metadata': {'ls_provider': 'openai',
   'ls_model_name': 'gpt-4o-mini',
   'ls_model_type': 'chat',
   'ls_temperature': 0.7},
  'parent_ids': []},
 {'event': 'on_chat_model_stream',
  'run_id': '2153b57f-1920-41a7-99fc-6b4fa1a3ff75',
  'name': 'ChatOpenAI',
  'tags': [],
  'metadata': {'ls_provider': 'openai',
   'ls_model_name': 'gpt-4o-mini',
   'ls_model_type': 'chat',
   'ls_temperature': 0.7},
  'data': {'chunk': AIMessageChunk(content='', additional_kwargs={}, response_metadata={}, id='run-2153b57f-1920-41a7-99fc-6b4fa1a3ff75')},
  'parent_ids': []},
 {'event': 'on_chat_model_stream',
  'run_id': '2153b57f-1920-41a7-99fc-6b4fa1a3ff75',
  'name': 'ChatOpenAI',
  'tags': [],
  'metadata': {'ls_provider': 'openai',
   'ls_model_name': 'gpt-4o-mini',
   'ls_model_type': 'chat',
   'ls_temperature': 0.7},
  'data': {'chunk': ...
```

## Chain
Let's revisit the example chain that parsed streaming JSON to explore the streaming events API.

```python
# Due to a bug in older versions of LangChain, JsonOutputParser did not stream results from some models
chain = model | JsonOutputParser()

events = [
    event
    async for event in chain.astream_events(
        "output a list of the countries france, spain and japan and their populations in JSON format. "
        'Use a dict with an outer key of "countries" which contains a list of countries. '
        "Each country should have the key `name` and `population`",
        version="v2",
    )
]
```

If you examine the first few events, you'll notice that there are 3 different start events rather than 2.

The three start events correspond to:

1. The chain (model + parser)
2. The model
3. The parser
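To check this programmatically, a small helper can filter the collected events by their `event` key. The sample list below is hand-written to mirror the shape of the real output, keeping only the `event` and `name` keys. `start_events` is a hypothetical helper name. With the real `events` list from the cell above you would call it in the same way.

```python
from typing import Any, Dict, List


def start_events(events: List[Dict[str, Any]]) -> List[str]:
    """Return the names of runnables that emitted a *_start event, in order."""
    return [e["name"] for e in events if e["event"].endswith("_start")]


sample_events = [
    {"event": "on_chain_start", "name": "RunnableSequence"},
    {"event": "on_chat_model_start", "name": "ChatOpenAI"},
    {"event": "on_parser_start", "name": "JsonOutputParser"},
    {"event": "on_chat_model_stream", "name": "ChatOpenAI"},
]
print(start_events(sample_events))
# ['RunnableSequence', 'ChatOpenAI', 'JsonOutputParser']
```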

```python
events[:3]
```

```
[{'event': 'on_chain_start',
  'data': {'input': 'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`'},
  'name': 'RunnableSequence',
  'tags': [],
  'run_id': '36450748-7210-4ed4-8ebc-7ace2eb9a299',
  'metadata': {},
  'parent_ids': []},
 {'event': 'on_chat_model_start',
  'data': {'input': {'messages': [[HumanMessage(content='output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of countries. Each country should have the key `name` and `population`', additional_kwargs={}, response_metadata={})]]}},
  'name': 'ChatOpenAI',
  'tags': ['seq:step:1'],
  'run_id': '29e5e39e-d534-4b9c-92ca-0a490f531071',
  'metadata': {'ls_provider': 'openai',
   'ls_model_name': 'gpt-4o-mini',
   'ls_model_type': 'chat',
   ...
```

```python
events[-2:]
```

```
[{'event': 'on_parser_end',
  'data': {'output': {'countries': [{'name': 'France', 'population': 65273511},
     {'name': 'Spain', 'population': 46754778},
     {'name': 'Japan', 'population': 126476461}]},
   'input': AIMessageChunk(content='Here is the JSON representation of the countries France, Spain, and Japan along with their populations:\n\n```json\n{\n  "countries": [\n    {\n      "name": "France",\n      "population": 65273511\n    },\n    {\n      "name": "Spain",\n      "population": 46754778\n    },\n    {\n      "name": "Japan",\n      "population": 126476461\n    }\n  ]\n}\n```\n\nPlease note that the population figures are based on estimates and may vary over time.', additional_kwargs={}, response_metadata={'finish_reason': 'stop', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_0ba0d124f1'}, id='run-29e5e39e-d534-4b9c-92ca-0a490f531071')},
  'run_id': '644158af-b75a-494c-ad3d-02391a38c679',
  'name': 'JsonOutputParser',
  'tags': ['seq:step:2'],
  ...
```

What do you think you'd see if you looked at the last 3 events? What about the middle?

Let's use this API to output the stream events from the model and the parser. We're ignoring start events, end events, and events from the chain.

```python
num_events = 0

async for event in chain.astream_events(
    "output a list of the countries france, spain and japan and their populations in JSON format. "
    'Use a dict with an outer key of "countries" which contains a list of countries. '
    "Each country should have the key `name` and `population`",
    version="v2",
):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        print(
            f"Chat model chunk: {repr(event['data']['chunk'].content)}",
            flush=True,
        )
    if kind == "on_parser_stream":
        print(f"Parser chunk: {event['data']['chunk']}", flush=True)
    num_events += 1
    if num_events > 50:
        # Truncate the output
        print("...")
        break
```

```
Chat model chunk: ''
Chat model chunk: 'Here'
Chat model chunk: ' is'
Chat model chunk: ' the'
Chat model chunk: ' JSON'
Chat model chunk: ' representation'
Chat model chunk: ' of'
Chat model chunk: ' the'
Chat model chunk: ' countries'
Chat model chunk: ' France'
Chat model chunk: ','
Chat model chunk: ' Spain'
Chat model chunk: ','
Chat model chunk: ' and'
Chat model chunk: ' Japan'
Chat model chunk: ' along'
Chat model chunk: ' with'
Chat model chunk: ' their'
Chat model chunk: ' populations'
Chat model chunk: ':\n\n'
Chat model chunk: '```'
Chat model chunk: 'json'
Chat model chunk: '\n'
Chat model chunk: '{\n'
Parser chunk: {}
Chat model chunk: ' '
Chat model chunk: ' "'
Chat model chunk: 'countries'
Chat model chunk: '":'
Chat model chunk: ' [\n'
Parser chunk: {'countries': []}
Chat model chunk: '   '
Chat model chunk: ' {\n'
Parser chunk: {'countries': [{}]}
Chat model chunk: '     '
Chat model chunk: ' "'
Chat model chunk: 'name'
Chat model chunk: '":'
Chat model chunk: ' "'
...
```
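You can also write the `if kind == ...` checks in the loop above as a dispatch table, which scales better as you handle more event types. This is a plain-Python sketch. The handler names are hypothetical, and the `sample` events are hand-written in the same shape as the real ones, with plain strings in place of `AIMessageChunk` objects (so there is no `.content` access here).

```python
from typing import Any, Callable, Dict, List

lines: List[str] = []


def on_model_chunk(event: Dict[str, Any]) -> None:
    lines.append(f"Chat model chunk: {event['data']['chunk']!r}")


def on_parser_chunk(event: Dict[str, Any]) -> None:
    lines.append(f"Parser chunk: {event['data']['chunk']}")


HANDLERS: Dict[str, Callable[[Dict[str, Any]], None]] = {
    "on_chat_model_stream": on_model_chunk,
    "on_parser_stream": on_parser_chunk,
}

sample = [
    {"event": "on_chain_start", "data": {}},
    {"event": "on_chat_model_stream", "data": {"chunk": "{\n"}},
    {"event": "on_parser_stream", "data": {"chunk": {}}},
    {"event": "on_chat_model_end", "data": {}},
]

for event in sample:
    handler = HANDLERS.get(event["event"])
    if handler is not None:  # start/end events and chain events are ignored
        handler(event)

print("\n".join(lines))
```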