Run Claude Code with Gemma 4, Gemini, or any Google GenAI model through a local proxy — no Anthropic API key needed.
Drop-in local proxy that lets you use Gemma 4, Gemini, or any Google GenAI model with Claude Code. No changes to your Claude Code workflow — just point it at the proxy.
- Anthropic Messages API compatible — works transparently with Claude Code and any Anthropic SDK client
- Dynamic model passthrough — use any model you specify in the request (Gemma 4, Gemini, etc.)
- Streaming support — full SSE streaming with proper block indexing
- Tool / function calling — Claude Code's built-in tools (read, write, edit, bash) work via function calling translation
- Google Search grounding — optional web search via Google Search (enabled by default)
- Thinking config — control reasoning depth via `thinking.budget_tokens`
- Retry logic — automatic retries with exponential backoff on transient failures
- Connection handling — detects client disconnects mid-stream, avoids wasted API calls
- Python 3.10+
- Google AI Studio API key — get one free
```bash
# 1. Clone the repo
git clone https://github.com/yourusername/genai-code.git
cd genai-code

# 2. Install dependencies
pip install -r requirements.txt

# 3. Set your API key
cp .env.example .env
# Edit .env and replace with your key: GEMINI_API_KEY=your_key_here

# 4. Start the proxy
uvicorn main:app --host 0.0.0.0 --port 8000
```

Verify it's running:

```bash
curl http://localhost:8000/health
# {"status":"ok","gemini_api_key_set":true,"model":"gemma-4-26b-a4b-it",...}
```

Create a `.claude/settings.json` file in your project root (or global `~/.claude/settings.json`):
```json
{
  "$schema": "https://json.schemastore.org/claude-code-settings.json",
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "sk-my-local-proxy-key",
    "ANTHROPIC_BASE_URL": "http://localhost:8000",
    "ANTHROPIC_MODEL": "gemma-4-26b-a4b-it",
    "ANTHROPIC_SMALL_FAST_MODEL": "gemma-4-26b-a4b-it"
  }
}
```

Then run `claude` normally — all requests route through Gemma 4.
Tip: Change `ANTHROPIC_MODEL` to any GenAI model (e.g., `gemini-2.5-flash`, `gemma-4-29b-a4b-it`). The proxy forwards the model name directly to the GenAI API.
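If you prefer not to edit a settings file, Claude Code also reads these values from the environment. A sketch of the equivalent shell setup (the token value is arbitrary, matching the settings example; the proxy does not appear to validate it):

```shell
# Same configuration as .claude/settings.json, as shell exports
export ANTHROPIC_BASE_URL=http://localhost:8000
export ANTHROPIC_AUTH_TOKEN=sk-my-local-proxy-key
export ANTHROPIC_MODEL=gemma-4-26b-a4b-it
export ANTHROPIC_SMALL_FAST_MODEL=gemma-4-26b-a4b-it
```

Run `claude` from the same shell and requests route through the proxy.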
All endpoints are Anthropic Messages API compatible.
| Method | Path | Description |
|---|---|---|
| GET | `/` | Status check |
| GET | `/health` | Health + API key status |
| GET | `/v1/models` | List available models |
| POST | `/v1/messages` | Chat endpoint (supports `stream: true/false`) |
| POST | `/v1/messages/stream` | Always-streaming alias |
| GET | `/v1/tools` | List proxy built-in tools |
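For clients not using an Anthropic SDK, the streaming endpoints emit server-sent events. A minimal parser sketch for the wire format (event names follow the Anthropic Messages streaming convention; the proxy's exact payloads may differ):

```python
import json


def parse_sse(lines):
    """Yield (event_name, payload_dict) pairs from 'event:'/'data:' lines."""
    event = None
    for line in lines:
        line = line.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            yield event, payload


# Example: one text delta as Claude Code would receive it
raw = [
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,'
    '"delta":{"type":"text_delta","text":"Hi"}}',
    "",
]
events = list(parse_sse(raw))
# events[0] is ("content_block_delta", {...}) with delta text "Hi"
```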
```json
{
  "model": "gemma-4-26b-a4b-it",
  "messages": [
    { "role": "user", "content": "Write a Python function to sort a list." }
  ],
  "max_tokens": 4096,
  "stream": false,
  "system": "You are an expert Python developer. Write clean, typed code.",
  "temperature": 0.2,
  "enable_google_search": true,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  },
  "tools": [
    {
      "name": "get_weather",
      "description": "Get weather for a location",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": { "type": "string", "description": "City name" }
        },
        "required": ["location"]
      }
    }
  ]
}
```

| Field | Type | Default | Description |
|---|---|---|---|
| `model` | string | `gemma-4-26b-a4b-it` | Passed through to GenAI — use any supported model |
| `enable_google_search` | bool | `true` | Enables Google Search grounding for factual queries |
| `thinking` | object | — | Maps `budget_tokens` to GenAI thinking level: <2k → LOW, 2k–8k → MEDIUM, ≥8k → HIGH |
| `tools` | array | — | Custom function-calling tools in Anthropic tool format |
Note: When `tools` are provided, Google Search is automatically disabled to avoid SDK conflicts.
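The budget-to-level mapping documented for `thinking` can be expressed as a small function. A sketch of the stated thresholds (not the proxy's actual code):

```python
def thinking_level(budget_tokens: int) -> str:
    """Map an Anthropic thinking budget to a GenAI thinking level.

    Thresholds per the field table: <2k -> LOW, 2k-8k -> MEDIUM, >=8k -> HIGH.
    """
    if budget_tokens < 2000:
        return "LOW"
    if budget_tokens < 8000:
        return "MEDIUM"
    return "HIGH"
```

So the example request above, with `budget_tokens: 10000`, would run at HIGH.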
Non-streaming:

```bash
curl http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: dummy" \
  -d '{
    "model": "gemma-4-26b-a4b-it",
    "messages": [{"role":"user","content":"What is 2+2?"}],
    "max_tokens": 256
  }'
```

Streaming:

```bash
curl http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-26b-a4b-it",
    "messages": [{"role":"user","content":"Tell me a joke"}],
    "stream": true
  }'
```

| Variable | Required | Description |
|---|---|---|
| `GEMINI_API_KEY` | Yes | Your Google AI Studio API key |

Set via the `.env` file or export directly:

```bash
export GEMINI_API_KEY=your_key_here
```

Any model available through the Google GenAI SDK should work, including:

- `gemma-4-26b-a4b-it` (default)
- `gemma-4-29b-a4b-it`
- `gemini-2.5-flash`
- `gemini-2.5-pro`

Pass the model name in your request or via `ANTHROPIC_MODEL` in Claude Code settings.
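Building a request body for any of these models is the same shape as the curl examples. A sketch (`build_payload` is a hypothetical helper for illustration, not part of the proxy):

```python
def build_payload(model: str, prompt: str, *, stream: bool = False,
                  max_tokens: int = 1024) -> dict:
    """Build a minimal Anthropic Messages-format request body for the proxy."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": stream,
    }


payload = build_payload("gemini-2.5-flash", "What is 2+2?")
# POST this dict as JSON to http://localhost:8000/v1/messages
```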
```
Claude Code ──POST /v1/messages──▶ FastAPI Proxy ──generateContent──▶ Google GenAI API
     ▲                                │      ▲                              │
     │        SSE stream / JSON       │      │       Gemma 4 / Gemini       │
     └────────────────────────────────┘      └──────────────────────────────┘
```
- Claude Code sends Anthropic-format requests to the proxy
- Proxy converts messages/tools to GenAI `Content` and `Tool` formats
- GenAI API response is converted back to Anthropic format (`tool_use`/`text` blocks)
- Responses include proper `anthropic-version` headers and error shapes
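The message-conversion step amounts to mapping Anthropic roles and content onto GenAI `Content` parts. A simplified, text-only sketch (the actual proxy additionally translates `tool_use`/`tool_result` blocks):

```python
def to_genai_contents(messages: list[dict]) -> list[dict]:
    """Convert Anthropic-format messages into GenAI Content-style dicts.

    Anthropic's "assistant" role becomes GenAI's "model"; content may be a
    plain string or a list of typed blocks.
    """
    contents = []
    for msg in messages:
        role = "model" if msg["role"] == "assistant" else "user"
        content = msg["content"]
        if isinstance(content, str):
            parts = [{"text": content}]
        else:
            # keep only text blocks in this sketch
            parts = [{"text": b["text"]} for b in content
                     if b.get("type") == "text"]
        contents.append({"role": role, "parts": parts})
    return contents
```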
| Symptom | Likely Cause | Fix |
|---|---|---|
| `GEMINI_API_KEY env var not set` | Missing API key | Set `GEMINI_API_KEY` in `.env` or export it |
| 502 error in Claude Code | GenAI API failure | Check logs for the exception type; retry logic handles most transient errors |
| Tools not being called | Model doesn't support function calling well | Try `gemini-2.5-flash`, which has stronger tool-use capabilities |
| Empty responses | `max_tokens` too low or model error | Increase `max_tokens` (capped at 8192) |
| Slow responses | Thinking config or large context | Reduce `budget_tokens` or use a faster model like `gemini-2.5-flash` |
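The retry behavior referenced above is the usual exponential-backoff pattern. A generic sketch (attempt count and delays are illustrative, not the proxy's actual values):

```python
import time


def with_retries(fn, *, attempts: int = 3, base_delay: float = 0.5,
                 retryable=(TimeoutError, ConnectionError)):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error (a 502 upstream)
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```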
The proxy logs to stdout. Set the log level via uvicorn:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --log-level debug
```

Issues and pull requests welcome. Areas that could use help:
- Better tool schema translation edge cases (`anyOf`, `oneOf`)
- Token usage extraction from GenAI responses
- Multi-model `/v1/models` listing
- Docker image for easier deployment
MIT