
[MOSH-2632]: Add OpenAPI support for judge max_tokens#268

Merged
jli-together merged 1 commit into main from jli/eval-worker
May 8, 2026

Conversation

jli-together (Contributor) commented May 7, 2026

Expose three optional fields on EvaluationJudgeModelConfig:

  • max_tokens: lets users override the default (32768) for judge models. Critical for reasoning models (e.g. Gemini, o-series) that consume output token budget on chain-of-thought before emitting visible content, causing truncated JSON and parse failures at the default limit.
  • temperature: lets users override the judge sampling temperature (default 0.05).
  • num_workers: concurrent workers for judge inference requests, useful for proxy endpoints like OpenRouter.

Also add num_workers and max_tokens to EvaluationModelRequest for the models being evaluated.
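As a rough sketch, a request body using the new fields might look like the following. Only the field names (max_tokens, temperature, num_workers) and the stated defaults come from this PR; the payload nesting, key names like "judge" and "model_to_evaluate", and the model names are illustrative assumptions, not the actual API schema.

```python
# Hypothetical evaluation payload; shape and model names are assumptions.
judge_config = {
    "model": "example-judge-model",  # placeholder judge model
    # Raise the output budget above the 32768 default so reasoning judges
    # don't exhaust it on chain-of-thought and return truncated JSON.
    "max_tokens": 65536,
    "temperature": 0.05,  # the stated default judge sampling temperature
    "num_workers": 4,     # concurrent judge inference requests
}

model_request = {
    "model": "example-candidate-model",  # placeholder model under evaluation
    "max_tokens": 2048,
    "num_workers": 8,
}

evaluation_payload = {
    "judge": judge_config,
    "model_to_evaluate": model_request,
}

print(evaluation_payload["judge"]["max_tokens"])  # 65536
```

All three judge fields are optional, so omitting them falls back to the documented defaults.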

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
jli-together requested review from VProv and connermanuel on May 7, 2026 at 16:58

github-actions Bot commented May 7, 2026

✱ Stainless preview builds

This PR will update the togetherai SDKs with the following commit messages.

go

feat(api): add max_tokens and temperature parameters to eval judge

openapi

feat(api): add max_tokens and temperature fields to evaluation model

python

feat(api): add max_tokens and temperature to eval judge parameters

terraform

chore(internal): regenerate SDK with no functional changes

typescript

feat(api): add max_tokens and temperature params to evals judge model config

togetherai-openapi studio · code

Your SDK build had at least one "note" diagnostic.
generate ✅

⚠️ togetherai-go studio · code

Your SDK build had a failure in the test CI job, which is a regression from the base state.
generate ✅ · build ⏭️ · lint ✅ · test ❗

go get github.com/stainless-sdks/togetherai-go@c6194463afd28dac5df67be10ed2d60f195572cf
⚠️ togetherai-python studio · code

Your SDK build had at least one "warning" diagnostic.
generate ⚠️ · build ✅ · lint ✅ · test ⏭️

pip install https://pkg.stainless.com/s/togetherai-python/d35fb643b2cd5eff5ccb2b8b2c0eb4fbc8d30734/together-2.12.0-py3-none-any.whl
⚠️ togetherai-typescript studio · conflict

Your SDK build had at least one "warning" diagnostic.

togetherai-terraform studio · code

Your SDK build had at least one "note" diagnostic.
generate ✅ · lint ✅ · test ✅


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-05-08 02:31:11 UTC

@jli-together jli-together merged commit ed132d4 into main May 8, 2026
6 checks passed