Only require max_tokens when token rate limits apply #3771
…fix/issue-3648-max-tokens-rate-limit
Pull Request Overview
This PR fixes issue #3648 by making the max_tokens parameter only required when token rate limits are actually active. The change introduces conditional validation that checks which rate limit resources are configured before requiring specific parameters.
Key changes:
- Added resource-aware validation for rate limiting requirements
- Replaced fixed resource usage calculation with conditional estimation based on active rate limits
- Updated the trait interface to pass rate-limited resources to estimation methods
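The key changes above can be sketched roughly as follows. This is a hypothetical illustration of the resource-aware estimation, assuming simplified types; the names are illustrative and not the actual tensorzero-core API.

```rust
// Hypothetical sketch: all names are illustrative, not the real crate API.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum RateLimitResource {
    ModelInference,
    Token,
}

#[derive(Debug)]
pub enum EstimationError {
    MissingMaxTokens,
}

pub struct ModelInferenceRequest {
    pub max_tokens: Option<u32>,
}

impl ModelInferenceRequest {
    /// Estimate usage only for the resources that actually have limits.
    /// `max_tokens` is required only when `Token` is among them.
    pub fn estimated_resource_usage(
        &self,
        resources: &[RateLimitResource],
    ) -> Result<Vec<(RateLimitResource, u64)>, EstimationError> {
        let mut usage = Vec::new();
        for resource in resources {
            match resource {
                // One inference per request.
                RateLimitResource::ModelInference => usage.push((*resource, 1)),
                // Token limits need an upper bound, so max_tokens must be set.
                RateLimitResource::Token => {
                    let max = self
                        .max_tokens
                        .ok_or(EstimationError::MissingMaxTokens)?;
                    usage.push((*resource, u64::from(max)));
                }
            }
        }
        Ok(usage)
    }
}
```

With this shape, a request without `max_tokens` only fails when a token limit is actually in the `resources` slice.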
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tensorzero-core/src/rate_limiting/mod.rs | Adds method to get active rate-limited resources and updates resource usage calculation to be conditional |
| tensorzero-core/src/model.rs | Adds test coverage for max_tokens validation with different rate limiting configurations |
| tensorzero-core/src/inference/types/mod.rs | Updates ModelInferenceRequest to conditionally estimate token usage only when token rate limits are active |
| tensorzero-core/src/embeddings.rs | Updates EmbeddingRequest to conditionally estimate resource usage based on active rate limits |
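The `rate_limiting/mod.rs` change described in the table can be sketched as below. This is a minimal illustration assuming a flat list of rules; the field and type names are guesses, not the real config schema.

```rust
// Hypothetical sketch of a get_rate_limited_resources-style helper;
// field and type names are illustrative, not the real config schema.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum RateLimitResource {
    ModelInference,
    Token,
}

pub struct RateLimitRule {
    pub resource: RateLimitResource,
    pub capacity: u64,
}

pub struct RateLimitingConfig {
    pub rules: Vec<RateLimitRule>,
}

impl RateLimitingConfig {
    /// Return each distinct resource that has at least one configured limit,
    /// so callers can skip estimating usage for everything else.
    pub fn get_rate_limited_resources(&self) -> Vec<RateLimitResource> {
        let mut resources = Vec::new();
        for rule in &self.rules {
            if !resources.contains(&rule.resource) {
                resources.push(rule.resource);
            }
        }
        resources
    }
}
```

Deduplicating here means validation only ever sees each resource once, even when several rules target the same resource at different intervals.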
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…into gb/fix-3648
Fix #3648
Important
Modify rate limiting to require `max_tokens` only when token rate limits apply, updating `RateLimitedRequest` implementations and adding relevant tests.
- The `RateLimitedRequest` trait's `estimated_resource_usage` method now takes a `resources` parameter to determine whether `max_tokens` is required.
- The `EmbeddingRequest` and `ModelInferenceRequest` implementations are updated to conditionally require `max_tokens` based on the presence of `RateLimitResource::Token`.
- Added the `RateLimitingConfig::get_rate_limited_resources` method to determine the active rate-limited resources.
- Added `test_model_provider_infer_max_tokens_check` in `model.rs` to validate behavior when `max_tokens` is missing or provided.
- Added `test_max_tokens_validation_with_rate_limits` in `rate_limiting/mod.rs` to ensure correct resource inclusion based on rate limits.
- Renamed `RateLimitResourceUsage` to `EstimatedRateLimitResourceUsage` in several places for clarity.

This description was created by … for d7da1ef. It will automatically update as commits are pushed.
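The validation behavior the new tests exercise can be condensed to a single check. A minimal sketch, assuming illustrative names (not the actual test or function names in the PR):

```rust
// Hypothetical sketch of the conditional validation under test;
// names are illustrative, not the actual tensorzero-core code.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum RateLimitResource {
    ModelInference,
    Token,
}

/// Require max_tokens only when a token rate limit is active.
pub fn validate_max_tokens(
    max_tokens: Option<u32>,
    active_resources: &[RateLimitResource],
) -> Result<(), String> {
    if active_resources.contains(&RateLimitResource::Token) && max_tokens.is_none() {
        return Err("max_tokens is required when a token rate limit is configured".to_string());
    }
    Ok(())
}
```

Before this PR, the equivalent check rejected a missing `max_tokens` unconditionally; the fix scopes the error to configurations where a token limit is actually present.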