Description
Chromium Issue Tracker: https://issuetracker.google.com/460808117
There's an observed difference in token counting between LanguageModelSession.measureInputUsage(), LanguageModelSession.prompt(), and the values reported in QuotaExceededError exceptions.
Observations
measureInputUsage(input) appears to return a token count that includes not only the input itself but also tokens representing a model response prefix/delimiter (e.g., <model>). This matches the token count used when the same input is passed to prompt(). For example, measureInputUsage("hello") might return 9.
append(input) adds to the session context, and the resulting session.inputUsage reflects the size of the input with user-role delimiters (e.g., <user>hello<end>), but without the model response prefix. This might be 6 tokens for "hello".
QuotaExceededError.requested seems to be based on the context size before the model response prefix is added, aligning with the append() behavior, not the measureInputUsage() or prompt() execution count.
This means measureInputUsage() currently predicts the cost of a prompt() call, but quota errors are triggered based on a smaller token count.
Example
const lm = await LanguageModel.create();

await lm.measureInputUsage("hello");
// Returns 9 - includes a response delimiter like <model>

await lm.append("hello");
lm.inputUsage;
// Is 6 - represents <user>hello<end>

// prompt("hello") will execute with 9 tokens: <user>hello<end><model>
// If "hello" (6 tokens) was under quota, but adding <model> (3 tokens) exceeds it,
// an error might occur in prompt().
// If initialPrompts caused an error, QuotaExceededError.requested seems to reflect
// the size *without* the <model> delimiter.
Impact
This inconsistency could make it difficult for developers to accurately predict when a QuotaExceededError will occur. Using measureInputUsage() to check against inputQuota before calling prompt() or when using initialPrompts might not be reliable, as the error is thrown based on a different tokenization count.
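For instance, a pre-flight check along the following lines can be hard to reason about. This is a minimal sketch, not a recommended pattern; the token counts in the comments reuse the observed "hello" values from above, and the way the mismatch surfaces in the catch block is an assumption based on the observations.

const session = await LanguageModel.create();

async function promptIfItFits(input) {
  // Predicted cost per measureInputUsage() - e.g. 9 for "hello" (with <model>).
  const predicted = await session.measureInputUsage(input);
  const remaining = session.inputQuota - session.inputUsage;

  if (predicted > remaining) {
    return null; // Skip: the developer assumes prompt() would exceed the quota.
  }

  try {
    return await session.prompt(input);
  } catch (e) {
    if (e.name === "QuotaExceededError") {
      // Per the observations above, e.requested reflects the smaller,
      // delimiter-free count (e.g. 6), so it cannot be reconciled directly
      // with the `predicted` value the check was based on.
      console.warn("QuotaExceededError.requested =", e.requested);
    }
    throw e;
  }
}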
Request for Specification
The Prompt API specification should clearly define:
What exactly measureInputUsage() is intended to measure. Should it include model-specific response delimiters?
Should the token count used for triggering QuotaExceededError include response delimiters or not? Ideally, this should be consistent.
Whether measureInputUsage() should have an option to include/exclude the response delimiter tokens, allowing developers to check against either the append context size or the prompt execution size.
Resolving this will ensure developers can reliably use measureInputUsage() to manage context size and avoid quota errors.
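For illustration, one possible shape for the option suggested in the third point is sketched below. The includeResponsePrefix name and its behavior are hypothetical and not part of the current API.

const session = await LanguageModel.create();

// Hypothetical: predict what prompt("hello") would consume, including the
// model response prefix (e.g. 9 tokens).
const promptCost = await session.measureInputUsage("hello", { includeResponsePrefix: true });

// Hypothetical: predict what append("hello") would add to the context,
// without the model response prefix (e.g. 6 tokens).
const appendCost = await session.measureInputUsage("hello", { includeResponsePrefix: false });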