Switch Vertex AI provider to native genai SDK #242

Merged: nicoloboschi merged 1 commit into main from feat/vertexai-native-sdk on Jan 30, 2026

Conversation

@cdbartholomew
Contributor

Summary

  • Replace the OpenAI-compatible endpoint with the native google-genai SDK (genai.Client(vertexai=True)) for the vertexai LLM provider
  • Delete vertexai_token_refresher.py — the SDK handles credential refresh internally
  • Remove the 8192 max output token limitation that the OpenAI-compatible endpoint enforced
  • Strip markdown code fences in consolidation JSON parsing (Flash Lite wraps JSON in ```json blocks)
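The fence-stripping fix can be sketched as a small helper (the function name is an assumption for illustration; the actual implementation lives in consolidator.py and may differ):

```python
import json
import re


def parse_consolidation_json(raw: str) -> dict:
    """Parse a model reply that may be wrapped in ```json fences.

    Hypothetical helper illustrating the fix: Flash Lite sometimes wraps
    its JSON output in markdown code fences, which json.loads rejects,
    so the fences are stripped before parsing.
    """
    text = raw.strip()
    fenced = re.match(r"^```(?:json)?\s*\n(.*?)\n?```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    return json.loads(text)
```

The helper accepts both fenced and bare JSON, so it is safe to apply to replies from models that do not add fences.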

Motivation

The vertexai provider originally used the Vertex AI OpenAI-compatible endpoint to reuse the AsyncOpenAI client code path. This required a custom TokenInjectingTransport, a background token refresher with async lifecycle management, and hit an undocumented 8192 max output token cap on the endpoint. The native genai SDK (already a dependency for the gemini provider) handles auth automatically and doesn't have the output token cap.

Changes

  • llm_wrapper.py: vertexai now creates genai.Client(vertexai=True, project=..., location=...) and routes through _call_gemini/_call_with_tools_gemini. Service account key auth preserved via credentials parameter. Model names with google/ prefix are auto-stripped.
  • vertexai_token_refresher.py: Deleted (no longer needed)
  • consolidator.py: Strip ``` code fences before JSON parsing
  • test_vertexai_provider.py: Rewritten for native SDK (mock genai.Client instead of token refresher)
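The model-name normalization mentioned above can be sketched as follows (the function name is an assumption; in the PR this logic is part of llm_wrapper.py):

```python
def normalize_vertex_model(name: str) -> str:
    # The OpenAI-compatible endpoint used prefixed names such as
    # "google/gemini-2.0-flash-lite", while the native genai SDK expects
    # the bare model name, so a leading "google/" is stripped.
    return name.removeprefix("google/")
```

Bare names pass through unchanged, so existing configurations keep working whether or not they carry the prefix.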

Test plan

  • pytest tests/test_vertexai_provider.py — 6 passed, 1 skipped (integration)
  • Local end-to-end: retain + recall + consolidation working with vertexai provider
  • Verified output tokens are no longer capped at 8192
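The rewritten test approach — mocking genai.Client instead of patching a token refresher — might look roughly like this (the wrapper function and its signature are assumptions for illustration; the real code routes through _call_gemini):

```python
from unittest.mock import MagicMock


def call_model(client, model: str, prompt: str) -> str:
    # Stand-in for the wrapper's _call_gemini path; real names may differ.
    response = client.models.generate_content(model=model, contents=prompt)
    return response.text


# Mock the genai.Client object directly; no background refresher lifecycle
# to manage, since the native SDK handles credentials internally.
mock_client = MagicMock()
mock_client.models.generate_content.return_value.text = "ok"

assert call_model(mock_client, "gemini-2.0-flash-lite", "hello") == "ok"
mock_client.models.generate_content.assert_called_once_with(
    model="gemini-2.0-flash-lite", contents="hello"
)
```

Because MagicMock auto-creates attributes, no real network or credential setup is needed in the unit tests.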

Replace the OpenAI-compatible endpoint approach with the native
google-genai SDK for Vertex AI. This eliminates the custom token
refresher, TokenInjectingTransport, and async lifecycle complexity
while also removing the 8192 output token cap that the OpenAI
endpoint enforced.

Changes:
- Use genai.Client(vertexai=True) in the vertexai provider instead of
  AsyncOpenAI with a token-injecting transport
- Route through the existing _call_gemini/_call_with_tools_gemini paths
- Strip the google/ prefix from model names (the native SDK uses bare
  names)
- Preserve service account key auth via the credentials parameter
- Delete vertexai_token_refresher.py (no longer needed)
- Strip markdown code fences in consolidator JSON parsing
- Rewrite the vertexai tests for the native SDK integration
@nicoloboschi merged commit 49ae55a into main on Jan 30, 2026
23 of 26 checks passed