# [How to track token usage in ChatModels](https://python.langchain.com/v0.2/docs/how_to/chat_token_usage_tracking/)

## Using AIMessage.usage_metadata
A number of model providers return token usage information as part of the chat generation response. When available, this information will be included on the AIMessage objects produced by the corresponding model.

LangChain AIMessage objects include a usage_metadata attribute. When populated, this attribute will be a UsageMetadata dictionary with standard keys (e.g., "input_tokens" and "output_tokens").

Examples:

OpenAI:

In [1]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
openai_response = llm.invoke("hello")
openai_response.usage_metadata

{'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17}
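Because usage_metadata is only populated when the provider reports usage, code that totals tokens across several responses should tolerate a missing value. The following is a minimal sketch rather than part of LangChain: `sum_usage` is a hypothetical helper, and plain dictionaries stand in for the usage_metadata of real responses so that the example runs without an API key.

```python
from typing import Iterable, Optional

# Keys shown in the usage_metadata output above.
USAGE_KEYS = ("input_tokens", "output_tokens", "total_tokens")


def sum_usage(usages: Iterable[Optional[dict]]) -> dict:
    """Add up usage dictionaries, skipping responses with no usage (None)."""
    totals = {key: 0 for key in USAGE_KEYS}
    for usage in usages:
        if usage is None:  # the provider did not report usage for this call
            continue
        for key in USAGE_KEYS:
            totals[key] += usage.get(key, 0)
    return totals


# Stand-ins for response.usage_metadata from three calls; the second has none.
collected = [
    {"input_tokens": 8, "output_tokens": 9, "total_tokens": 17},
    None,
    {"input_tokens": 8, "output_tokens": 9, "total_tokens": 17},
]
print(sum_usage(collected))
```

With real responses, the input would be something like `[r.usage_metadata for r in responses]`.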

## Using AIMessage.response_metadata
Metadata from the model response is also included in the AIMessage response_metadata attribute. These data are typically not standardized. Note that different providers adopt different conventions for representing token counts:

In [2]:
print(f'OpenAI: {openai_response.response_metadata["token_usage"]}\n')

OpenAI: {'completion_tokens': 9, 'prompt_tokens': 8, 'total_tokens': 17}
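If you have to work from response_metadata, one option is to map the provider-specific keys onto the standard ones yourself. The sketch below does this for the OpenAI-style dictionary printed above. `normalize_openai_usage` is a hypothetical helper, not a LangChain function, and it assumes only the three keys shown in that output.

```python
def normalize_openai_usage(token_usage: dict) -> dict:
    """Map OpenAI-style token_usage keys onto the standard usage_metadata keys."""
    input_tokens = token_usage.get("prompt_tokens", 0)
    output_tokens = token_usage.get("completion_tokens", 0)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        # Fall back to the sum if no total was reported.
        "total_tokens": token_usage.get("total_tokens", input_tokens + output_tokens),
    }


# Same values as the response_metadata["token_usage"] output above.
raw = {"completion_tokens": 9, "prompt_tokens": 8, "total_tokens": 17}
print(normalize_openai_usage(raw))
```

When usage_metadata is populated, prefer it: the keys are already standardized and no mapping is needed.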



In [3]:
openai_response

AIMessage(content='Hello! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 8, 'total_tokens': 17}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_25624ae3a5', 'finish_reason': 'stop', 'logprobs': None}, id='run-403cf831-0fde-4d96-a403-37c8c47d9cd3-0', usage_metadata={'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17})

### Streaming
Some providers support token count metadata in a streaming context.

#### OpenAI
For example, OpenAI will return a message chunk at the end of a stream with token usage information. This behavior is supported by langchain-openai >= 0.1.9 and can be enabled by passing stream_usage=True to the stream call. The same attribute can also be set when ChatOpenAI is instantiated.

In [4]:
llm = ChatOpenAI(model="gpt-4o")

aggregate = None
for chunk in llm.stream("hello", stream_usage=True):
    print(chunk)
    aggregate = chunk if aggregate is None else aggregate + chunk

content='' id='run-ea334cc7-6875-4925-8871-5a403fb1c69d'
content='Hello' id='run-ea334cc7-6875-4925-8871-5a403fb1c69d'
content='!' id='run-ea334cc7-6875-4925-8871-5a403fb1c69d'
content=' How' id='run-ea334cc7-6875-4925-8871-5a403fb1c69d'
content=' can' id='run-ea334cc7-6875-4925-8871-5a403fb1c69d'
content=' I' id='run-ea334cc7-6875-4925-8871-5a403fb1c69d'
content=' assist' id='run-ea334cc7-6875-4925-8871-5a403fb1c69d'
content=' you' id='run-ea334cc7-6875-4925-8871-5a403fb1c69d'
content=' today' id='run-ea334cc7-6875-4925-8871-5a403fb1c69d'
content='?' id='run-ea334cc7-6875-4925-8871-5a403fb1c69d'
content='' response_metadata={'finish_reason': 'stop', 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_25624ae3a5'} id='run-ea334cc7-6875-4925-8871-5a403fb1c69d'
content='' id='run-ea334cc7-6875-4925-8871-5a403fb1c69d' usage_metadata={'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17}


In [5]:
aggregate

AIMessageChunk(content='Hello! How can I assist you today?', response_metadata={'finish_reason': 'stop', 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_25624ae3a5'}, id='run-ea334cc7-6875-4925-8871-5a403fb1c69d', usage_metadata={'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17})

In [6]:
print(aggregate.content)
print(aggregate.usage_metadata)

Hello! How can I assist you today?
{'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17}


To disable streaming token counts for OpenAI, set stream_usage to False, or omit it from the parameters:

In [7]:
# stream_usage is omitted here, so no chunk carries usage_metadata.
for chunk in llm.stream("hello"):
    print(chunk)

content='' id='run-154157e1-21d7-4e61-94a1-08c689cd0f99'
content='Hello' id='run-154157e1-21d7-4e61-94a1-08c689cd0f99'
content='!' id='run-154157e1-21d7-4e61-94a1-08c689cd0f99'
content=' How' id='run-154157e1-21d7-4e61-94a1-08c689cd0f99'
content=' can' id='run-154157e1-21d7-4e61-94a1-08c689cd0f99'
content=' I' id='run-154157e1-21d7-4e61-94a1-08c689cd0f99'
content=' assist' id='run-154157e1-21d7-4e61-94a1-08c689cd0f99'
content=' you' id='run-154157e1-21d7-4e61-94a1-08c689cd0f99'
content=' today' id='run-154157e1-21d7-4e61-94a1-08c689cd0f99'
content='?' id='run-154157e1-21d7-4e61-94a1-08c689cd0f99'
content='' response_metadata={'finish_reason': 'stop', 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_25624ae3a5'} id='run-154157e1-21d7-4e61-94a1-08c689cd0f99'


You can also enable streaming token usage by setting stream_usage=True when instantiating the chat model. In the example below, the output is structured to a desired schema, and token usage is still reported by the intermediate chat model step.

In [9]:
from langchain_core.pydantic_v1 import BaseModel, Field


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")


llm = ChatOpenAI(
    model="gpt-4o-mini",
    stream_usage=True,
)
# Under the hood, .with_structured_output binds tools to the
# chat model and appends a parser.
structured_llm = llm.with_structured_output(Joke)


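# Top-level `async for` works in a notebook or another async context.
# In a plain script, wrap this loop in an async function and run it with asyncio.run().
async for event in structured_llm.astream_events("Tell me a joke", version="v2"):
    if event["event"] == "on_chat_model_end":
        # The chat model step carries the token usage.
        print(f'Token usage: {event["data"]["output"].usage_metadata}\n')
    elif event["event"] == "on_chain_end":
        # The end of the chain carries the parsed, structured output.
        print(event["data"]["output"])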

Token usage: {'input_tokens': 74, 'output_tokens': 26, 'total_tokens': 100}

setup='Why did the scarecrow win an award?' punchline='Because he was outstanding in his field!'


## Using callbacks
There are also some API-specific callback context managers that allow you to track token usage across multiple calls. They are currently implemented only for the OpenAI API and the Bedrock Anthropic API.

### OpenAI
Let's first look at an extremely simple example of tracking token usage for a single Chat model call.

In [10]:
# !pip install -qU langchain-community wikipedia

from langchain_community.callbacks.manager import get_openai_callback

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    stream_usage=True,
)

with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    print(cb)

Tokens Used: 28
	Prompt Tokens: 11
	Completion Tokens: 17
Successful Requests: 1
Total Cost (USD): $1.185e-05


Anything inside the context manager will get tracked. Here's an example of using it to track multiple calls in sequence.

In [11]:
with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    result2 = llm.invoke("Tell me a joke")
    print(cb.total_tokens)

58
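The accumulate-inside-a-context-manager pattern can be sketched in a few lines of standard-library Python. This is an illustration of the pattern only, not how get_openai_callback is implemented: `UsageTotals`, `track_usage`, and `record` are hypothetical names, usage is recorded by hand instead of by a callback handler, and the token counts are made up.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator


@dataclass
class UsageTotals:
    """Running totals, loosely mirroring the fields printed by `cb` above."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    successful_requests: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Add one call's usage to the running totals."""
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.successful_requests += 1


@contextmanager
def track_usage() -> Iterator[UsageTotals]:
    """Yield a fresh accumulator that lives for the duration of the block."""
    totals = UsageTotals()
    yield totals


with track_usage() as totals:
    # Illustrative numbers only; a real handler would be fed by the model calls.
    totals.record(prompt_tokens=10, completion_tokens=20)
    totals.record(prompt_tokens=10, completion_tokens=15)
    print(totals.total_tokens, totals.successful_requests)
```

Each `with` block starts from a fresh accumulator, so separate blocks report separate totals.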


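The same context manager tracks streamed calls. The model above was created with stream_usage=True, so the stream includes the chunk that carries token usage:

In [12]: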
with get_openai_callback() as cb:
    for chunk in llm.stream("Tell me a joke"):
        pass
    print(cb)

Tokens Used: 29
	Prompt Tokens: 11
	Completion Tokens: 18
Successful Requests: 1
Total Cost (USD): $1.2449999999999998e-05
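The Total Cost figures printed above are consistent with simple per-token arithmetic. The sketch below reproduces both of them, assuming prices of $0.15 per million prompt tokens and $0.60 per million completion tokens for gpt-4o-mini. These rates are an assumption chosen because they match the printed output, not values read from the callback, and `estimate_cost` is a hypothetical helper.

```python
import math

# Assumed USD prices per 1M tokens; chosen because they reproduce the
# gpt-4o-mini costs printed above. They are not read from LangChain.
PRICE_PER_MILLION = {"prompt": 0.15, "completion": 0.60}


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated USD cost for the given token counts."""
    return (
        prompt_tokens * PRICE_PER_MILLION["prompt"]
        + completion_tokens * PRICE_PER_MILLION["completion"]
    ) / 1_000_000


invoke_cost = estimate_cost(prompt_tokens=11, completion_tokens=17)
stream_cost = estimate_cost(prompt_tokens=11, completion_tokens=18)

# Compare with a tolerance rather than ==, since these are binary floats.
assert math.isclose(invoke_cost, 1.185e-05)
assert math.isclose(stream_cost, 1.245e-05)

# Fixed-point formatting avoids output such as 1.2449999999999998e-05.
print(f"${invoke_cost:.8f}")
print(f"${stream_cost:.8f}")
```

Provider prices change over time, so treat the callback's reported cost as an estimate and check it against current pricing.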


When a chain or agent with multiple steps is run inside the context manager, the callback tracks token usage across all of those steps.

In [14]:
from langchain.agents import AgentExecutor, create_tool_calling_agent, load_tools
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You're a helpful assistant"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)
tools = load_tools(["wikipedia"])
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

In [15]:
with get_openai_callback() as cb:
    response = agent_executor.invoke(
        {
            "input": "What's a hummingbird's scientific name and what's the fastest bird species?"
        }
    )
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")



> Entering new AgentExecutor chain...

Invoking: `wikipedia` with `{'query': 'Hummingbird'}`


Page: Hummingbird
Summary: Hummingbirds are birds native to the Americas and comprise the biological family Trochilidae. With approximately 366 species and 113 genera, they occur from Alaska to Tierra del Fuego, but most species are found in Central and South America. As of 2024, 21 hummingbird species are listed as endangered or critically endangered, with numerous species declining in population.
Hummingbirds have varied specialized characteristics to enable rapid, maneuverable flight: exceptional metabolic capacity, adaptations to high altitude, sensitive visual and communication abilities, and long-distance migration in some species. Among all birds, male hummingbirds have the widest diversity of plumage color, particularly in blues, greens, and purples. Hummingbirds are the smallest mature birds, measuring 7.5–13 cm (3–5 in) in length. The smallest