Cap max_output tokens to 15%-20% of input tokens #251927

Closed
@isidorn

As proposed by Logan: "if we think 64k is too much we should come up with some heuristics as to what to reserve for output. Maybe output should be at max 20% input?"

15%-20% are good numbers.
For example, Sonnet 4 has 200k input and 16k output = 8%
Gemini 2.5 Pro has 1M input and 64K output = 6.4%

Thus my recommendation is that we cap max_output at 15% of input.
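
A minimal sketch of how such a cap could be computed, assuming "input" means the model's input token limit and that the cap is applied on top of the model's own output limit. The function name `cap_max_output_tokens` and its parameters are hypothetical and are not taken from any existing codebase.

```python
def cap_max_output_tokens(input_tokens: int,
                          model_max_output: int,
                          cap_percent: int = 15) -> int:
    """Return the output budget, capped at cap_percent of the input budget.

    Hypothetical helper: the model's own output limit still applies, so the
    result is the smaller of that limit and the percentage-based cap.
    """
    if input_tokens < 0 or model_max_output < 0:
        raise ValueError("token counts must be non-negative")
    if not 0 < cap_percent <= 100:
        raise ValueError("cap_percent must be in (0, 100]")
    # Integer arithmetic avoids floating-point rounding on large token counts.
    percent_cap = input_tokens * cap_percent // 100
    return min(model_max_output, percent_cap)


if __name__ == "__main__":
    # 200k input, 16k model output limit: the 15% cap (30,000) does not bind.
    print(cap_max_output_tokens(200_000, 16_000))
    # 128k input, 64k model output limit: the 15% cap (19,200) does bind.
    print(cap_max_output_tokens(128_000, 64_000))
```

Under these assumptions the cap only changes behavior when a model's output limit is large relative to its input limit, which is the 64k case the quote above is concerned with.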

fyi @roblourens
