
Clarify and Specify measureInputUsage Token Counting Behavior vs. Actual Quota Consumption #164

@isaacahouma

Description


Chromium Issue Tracker
https://issuetracker.google.com/460808117

There's an observed difference in token counting between LanguageModelSession.measureInputUsage(), LanguageModelSession.prompt(), and the values reported in QuotaExceededError exceptions.

Observations

measureInputUsage(input) appears to return a token count that includes not only the input itself but also tokens representing a model response prefix/delimiter (e.g., <model>). This matches the token count used when the same input is passed to prompt(). For example, measureInputUsage("hello") might return 9.
append(input) adds to the session context, and the resulting session.inputUsage reflects the size of the input with user-role delimiters (e.g., <user>hello<end>) but without the model response prefix. This might be 6 tokens for "hello".
QuotaExceededError.requested seems to be based on the context size before the model response prefix is added, aligning with the append() behavior rather than the measureInputUsage()/prompt() execution count.
This means measureInputUsage() currently predicts the cost of a prompt() call, but quota errors are triggered based on a smaller token count.

Example

const lm = await LanguageModel.create();

await lm.measureInputUsage("hello");
// Returns 9 - includes a response delimiter like <model>

await lm.append("hello");
lm.inputUsage;
// Is 6 - represents <user>hello<end>

// prompt("hello") will execute with 9 tokens: <user>hello<end><model>

// If "hello" (6 tokens) was under quota, but adding <model> (3 tokens) exceeds it,
// an error might occur in prompt().

// If initialPrompts caused an error, QuotaExceededError.requested seems to reflect
// the size *without* the <model> delimiter.

Impact

This inconsistency makes it difficult for developers to predict accurately when a QuotaExceededError will occur. Checking measureInputUsage() against inputQuota before calling prompt(), or when supplying initialPrompts, is not reliable, because the error is thrown based on a different tokenization count.
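The mismatch can be sketched as a small, self-contained simulation of the two bookkeeping paths (all token counts and delimiter names such as <user>, <end>, and <model> are illustrative; the Prompt API itself is not invoked here):

```javascript
// Hypothetical token counts matching the "hello" example above.
const USER_WRAPPED_TOKENS = 6;  // <user>hello<end> - what append() records
const MODEL_PREFIX_TOKENS = 3;  // <model> - the response prefix

// What measureInputUsage() and prompt() count: input plus the response prefix.
function promptExecutionCost() {
  return USER_WRAPPED_TOKENS + MODEL_PREFIX_TOKENS; // 9
}

// What QuotaExceededError.requested appears to be based on: no response prefix.
function quotaCheckCost() {
  return USER_WRAPPED_TOKENS; // 6
}

const remainingQuota = 7; // hypothetical remaining quota in the session

console.log(quotaCheckCost() <= remainingQuota);      // true  - quota check passes
console.log(promptExecutionCost() <= remainingQuota); // false - prompt() still overflows
```

With a remaining quota between the two counts (here 7), the smaller count clears the quota check while the actual prompt() execution cost exceeds it, which is exactly the boundary where the inconsistency surfaces.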

Request for Specification

The Prompt API specification should clearly define:

1. What exactly is measureInputUsage() intended to measure? Should it include model-specific response delimiters?
2. Should the token count used for triggering QuotaExceededError include response delimiters or not? Ideally, the two counts should be consistent.
3. Should measureInputUsage() offer an option to include or exclude the response delimiter tokens, allowing developers to check against either the append() context size or the prompt() execution size?

Resolving this would ensure developers can reliably use measureInputUsage() to manage context size and avoid quota errors.
