Description
Chromium Issue Tracker: https://issuetracker.google.com/460808117
There's an observed difference in token counting between LanguageModelSession.measureInputUsage(), LanguageModelSession.prompt(), and the values reported in QuotaExceededError exceptions.
Observations
measureInputUsage(input) appears to return a token count that includes not only the input itself but also tokens representing a model response prefix/delimiter (e.g., <model>). This matches the token count used when the same input is passed to prompt(). For example, measureInputUsage("hello") might return 9.
append(input) adds to the session context, and the resulting session.inputUsage reflects the size of the input with user-role delimiters (e.g., <user>hello<end>), but without the model response prefix. This might be 6 tokens for "hello".
QuotaExceededError.requested seems to be based on the context size before the model response prefix is added, aligning with the append() behavior, not the measureInputUsage() or prompt() execution count.
This means measureInputUsage() currently predicts the cost of a prompt() call, but quota errors are triggered based on a smaller token count.
Example
const lm = await LanguageModel.create();

await lm.measureInputUsage("hello");
// Returns 9 - includes a response delimiter like <model>

await lm.append("hello");
lm.inputUsage;
// Is 6 - represents <user>hello<end>

// prompt("hello") will execute with 9 tokens: <user>hello<end><model>
// If "hello" (6 tokens) was under quota, but adding <model> (3 tokens) exceeds it,
// an error might occur in prompt().
// If initialPrompts caused an error, QuotaExceededError.requested seems to reflect
// the size *without* the <model> delimiter.
Impact
This inconsistency could make it difficult for developers to accurately predict when a QuotaExceededError will occur. Using measureInputUsage() to check against inputQuota before calling prompt() or when using initialPrompts might not be reliable, as the error is thrown based on a different tokenization count.
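For instance, a pre-flight check along the following lines can be hard to reason about. This is a minimal sketch, not a recommended pattern; the token counts in the comments reuse the observed "hello" values from above, and the way the mismatch surfaces in the catch block is an assumption based on the observations.

const session = await LanguageModel.create();

async function promptIfItFits(input) {
  // Predicted cost per measureInputUsage() - e.g. 9 for "hello" (with <model>).
  const predicted = await session.measureInputUsage(input);
  const remaining = session.inputQuota - session.inputUsage;

  if (predicted > remaining) {
    return null; // Skip: the developer assumes prompt() would exceed the quota.
  }

  try {
    return await session.prompt(input);
  } catch (e) {
    if (e.name === "QuotaExceededError") {
      // Per the observations above, e.requested reflects the smaller,
      // delimiter-free count (e.g. 6), so it cannot be reconciled directly
      // with the `predicted` value the check was based on.
      console.warn("QuotaExceededError.requested =", e.requested);
    }
    throw e;
  }
}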
Request for Specification
The Prompt API specification should clearly define:
What exactly measureInputUsage() is intended to measure. Should it include model-specific response delimiters?
Should the token count used for triggering QuotaExceededError include response delimiters or not? Ideally, this should be consistent.
Whether measureInputUsage() should have an option to include/exclude the response delimiter tokens, allowing developers to check against either the append context size or the prompt execution size.
Resolving this will ensure developers can reliably use measureInputUsage() to manage context size and avoid quota errors.
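For illustration, one possible shape for the option suggested in the third point is sketched below. The includeResponsePrefix name and its behavior are hypothetical and not part of the current API.

const session = await LanguageModel.create();

// Hypothetical: predict what prompt("hello") would consume, including the
// model response prefix (e.g. 9 tokens).
const promptCost = await session.measureInputUsage("hello", { includeResponsePrefix: true });

// Hypothetical: predict what append("hello") would add to the context,
// without the model response prefix (e.g. 6 tokens).
const appendCost = await session.measureInputUsage("hello", { includeResponsePrefix: false });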