Closed
Description
As proposed by Logan: "if we think 64k is too much we should come up with some heuristics as to what to reserve for output. Maybe output should be at max 20% input?"
15%-20% are good numbers.
For example, sonnet 4 has 200k, 16k = 12.5%
Gemini 2.5pro has 1M input, 64K output = 15%
Thus my recommendation is that we cap max_output at 15% of input.
fyi @roblourens