feat(reranker): per-provider HTTP timeout env vars #1810
Merged
Closes #1807. The HTTP-based rerankers (cohere, openrouter, zeroentropy, siliconflow, alibaba, litellm proxy/SDK, google) all hardcoded a 60s timeout, forcing users with slower self-hosted models or large batches to patch the source. Each provider now reads its own HINDSIGHT_API_RERANKER_<PROVIDER>_TIMEOUT env var (default 60.0s, so unset envs keep current behavior). TEI already had its own knob.
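The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `reranker_timeout` and `DEFAULT_RERANKER_TIMEOUT` are hypothetical names, the real values are read through `HindsightConfig.from_env()`, and treating an empty string as unset is an assumption. The exact variable names for the litellm proxy/SDK variants are not spelled out here, so the sketch only follows the `<PROVIDER>` pattern.

```python
import os
from typing import Mapping

# The value that used to be hardcoded in every HTTP-based reranker.
DEFAULT_RERANKER_TIMEOUT = 60.0


def reranker_timeout(provider: str, env: Mapping[str, str] = os.environ) -> float:
    """Hypothetical helper: resolve HINDSIGHT_API_RERANKER_<PROVIDER>_TIMEOUT.

    Returns the default when the variable is unset (or empty, by assumption),
    so existing deployments keep the previous 60-second behavior.
    """
    name = f"HINDSIGHT_API_RERANKER_{provider.upper()}_TIMEOUT"
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return DEFAULT_RERANKER_TIMEOUT
    return float(raw)  # raises ValueError on a non-numeric value
```

For example, with `HINDSIGHT_API_RERANKER_COHERE_TIMEOUT=180` set, `reranker_timeout("cohere")` would return `180.0`, while any provider whose variable is unset falls back to `60.0`.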
r266-tech added a commit to r266-tech/hindsight that referenced this pull request on May 28, 2026.
Summary
Closes #1807.
Every HTTP-based reranker provider in `cross_encoder.py` (cohere, openrouter, zeroentropy, siliconflow, alibaba, litellm proxy/SDK, google) had a hardcoded 60.0s timeout on its `httpx.AsyncClient` / Cohere SDK / `litellm.arerank` call. The reporter was patching the source via a Docker volume mount to lift it; this PR exposes the knob properly.

- Each provider now reads `HINDSIGHT_API_RERANKER_<PROVIDER>_TIMEOUT` (defaults to `60.0` so unset envs keep current behavior).
- The values are passed from `HindsightConfig.from_env()` into `create_cross_encoder_from_env()`.
- TEI already had `HINDSIGHT_API_RERANKER_TEI_HTTP_TIMEOUT`, which is unchanged.
- Documented in `hindsight-docs/docs/developer/configuration.md`.

Test plan
- `uv run pytest tests/test_reranker_timeouts.py`: new parametrized test asserts each provider's instance (or its inner HTTP client) ends up with the configured timeout; a second test asserts unset envs still yield 60.0.
- `uv run pytest tests/test_cohere_cross_encoder.py tests/test_google_cross_encoder.py tests/test_litellm_sdk_cross_encoder.py tests/test_tei_cross_encoder.py tests/test_reranker_error_handling.py tests/test_reranker_score_normalization.py`: 76 passed, 6 skipped, no regressions.
- `./scripts/hooks/lint.sh`: all lints pass.
- `uv run ty check hindsight_api/`: clean.
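
For readers who want the shape of the two timeout checks without opening the test file, here is a rough stdlib-only sketch. It does not reproduce `tests/test_reranker_timeouts.py`: `StubReranker` and `build_stub` are hypothetical stand-ins for the real provider classes and factory, and the litellm variants are left out because their exact variable names are not stated in this description.

```python
import os
from dataclasses import dataclass
from unittest import mock

# Providers whose env var name follows the <PROVIDER> pattern directly.
PROVIDERS = ["cohere", "openrouter", "zeroentropy", "siliconflow", "alibaba", "google"]


@dataclass
class StubReranker:
    """Stand-in for a provider class; the real ones live in cross_encoder.py."""
    provider: str
    timeout: float


def build_stub(provider: str) -> StubReranker:
    # Stand-in for the real factory: read the provider's env var, default 60.0.
    raw = os.environ.get(f"HINDSIGHT_API_RERANKER_{provider.upper()}_TIMEOUT")
    return StubReranker(provider, float(raw) if raw else 60.0)


def check_configured(provider: str) -> None:
    # With the variable set, the instance should carry the configured timeout.
    var = f"HINDSIGHT_API_RERANKER_{provider.upper()}_TIMEOUT"
    with mock.patch.dict(os.environ, {var: "123.5"}):
        assert build_stub(provider).timeout == 123.5


def check_default(provider: str) -> None:
    # With the variable unset, the previous 60.0s behavior should be kept.
    var = f"HINDSIGHT_API_RERANKER_{provider.upper()}_TIMEOUT"
    with mock.patch.dict(os.environ):  # restores the environment on exit
        os.environ.pop(var, None)
        assert build_stub(provider).timeout == 60.0


for name in PROVIDERS:
    check_configured(name)
    check_default(name)
```

In a pytest suite the final loop would more naturally be a `@pytest.mark.parametrize` over the provider names, with `monkeypatch.setenv` and `monkeypatch.delenv` in place of `mock.patch.dict`.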