```
 |  |  |  |
  \ |  | /
   \|  |/
    +----------->
```
High-performance proxy server converting OpenAI API requests to Vertex AI (Anthropic Claude) format.
Many models → One unified interface.
ModelMux is a production-ready, async Rust proxy that acts as a drop-in replacement for the OpenAI API. It translates OpenAI-compatible requests into Google Vertex AI (Anthropic Claude) calls while preserving streaming, tool/function calling, and error semantics. Designed for performance, safety, and clean architecture, ModelMux is ideal for teams standardizing on OpenAI APIs while running on Vertex AI infrastructure.
ModelMux v1.0.0 adds service management and Linux packaging:

- 🍺 Brew services: `brew services start modelmux` — run as a background service (macOS)
- 🐧 systemd daemon: Linux system and user service units — see `packaging/systemd/`
- 📦 .deb packages: Install on Ubuntu/Debian with `dpkg -i modelmux_*.deb`
- 🏗️ Multi-layered configuration: CLI args > env vars > user config > system config > defaults
- 📝 TOML configuration: Human-readable config files; `modelmux config init` for quick setup

Quick setup: `modelmux config init` creates your configuration interactively!
Installation • Quick Start • Features • Configuration • Roadmap
"The internet is like a vast electronic library. But someone has scattered all the books on the floor." — Lao Tzu
ModelMux is a high-performance Rust proxy server that seamlessly converts OpenAI-compatible API requests to Vertex AI (Anthropic Claude) format. Built with Rust Edition 2024 for maximum performance and type safety.
- 🔁 Drop-in OpenAI replacement — zero client changes
- ⚡ High performance — async Rust with Tokio
- 🧠 Full tool/function calling support
- 📡 Streaming (SSE) compatible
- 🛡 Strong typing & clean architecture
- ☁️ Built for Vertex AI (Claude)
Use ModelMux to standardize on the OpenAI API while keeping full control over your AI backend.
Stop rewriting API glue code. Start muxing.
- 🔌 OpenAI-Compatible API: Drop-in replacement for OpenAI API endpoints
- 🛠️ Tool/Function Calling: Full support for OpenAI tool calling format
- 📡 Smart Streaming: Server-Sent Events (SSE) with intelligent client detection
- 🎯 Client Detection: Automatically adjusts behavior for IDEs, browsers, and CLI tools
- ⚡ High Performance: Async Rust with Tokio for maximum concurrency
- 🔒 Type Safety: Leverages Rust's type system for compile-time guarantees
- 🔄 Retry Logic: Configurable retry mechanisms with exponential backoff
- 📊 Observability: Structured logging and health monitoring
- 🧩 Clean Architecture: SOLID principles with modular design
- ⚙️ Professional Config: Multi-layered configuration with CLI management tools
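The retry behavior above follows a standard exponential backoff pattern: the delay doubles on each attempt up to a cap. As a minimal sketch (the base delay, cap, and function name are illustrative, not ModelMux's actual internals):

```rust
// Illustrative exponential backoff schedule: the delay doubles on
// each retry attempt and is capped at a maximum. Names and values
// are hypothetical, not taken from the ModelMux codebase.
fn backoff_ms(attempt: u32, base_ms: u64, cap_ms: u64) -> u64 {
    base_ms.saturating_mul(1u64 << attempt.min(16)).min(cap_ms)
}

fn main() {
    let delays: Vec<u64> = (0..5).map(|a| backoff_ms(a, 250, 4_000)).collect();
    println!("{:?}", delays); // [250, 500, 1000, 2000, 4000]
}
```

With `max_retry_attempts = 3`, only the first few delays in such a schedule would ever be used.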
Homebrew (macOS):

```shell
brew tap yarenty/tap
brew install modelmux
```

From crates.io:

```shell
cargo install modelmux
```

From source:

```shell
git clone https://github.com/yarenty/modelmux
cd modelmux
cargo build --release
./target/release/modelmux
```

Add to your Cargo.toml:

```toml
[dependencies]
modelmux = "1.0"
```

Use the interactive configuration wizard:

```shell
modelmux config init
```

Or create a configuration file manually. On macOS: `~/Library/Application Support/com.SkyCorp.modelmux/config.toml` (or `~/.config/modelmux/config.toml` on Linux):
```toml
[server]
port = 3000
log_level = "info"
enable_retries = true
max_retry_attempts = 3

[auth]
# Path to Google Cloud service account JSON file
service_account_file = "~/Library/Application Support/com.SkyCorp.modelmux/service-account.json"
# Or inline JSON for containers:
# service_account_json = '{"type": "service_account", ...}'

[vertex]
# Vertex AI provider - set these OR use env vars (.env supported)
project = "{your-project}"
region = "{your-region}"
location = "{your-region}"
publisher = "anthropic"
model = "{your-model}"

[streaming]
mode = "auto" # auto, never, standard, buffered, always
buffer_size = 65536
chunk_timeout_ms = 5000
```

Note: You can also use a .env file or environment variables (`VERTEX_PROJECT`, `VERTEX_REGION`, etc.) for provider config.
```shell
modelmux
# or
cargo run --release
```

Homebrew (macOS): Run as a background service with `brew services start modelmux` (start/stop/restart like PostgreSQL or Redis).

Linux (systemd): Run as a daemon with systemd — see `packaging/systemd/README.md`.
```shell
# Validate your configuration
modelmux config validate

# Start the server
modelmux
```

```shell
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello, ModelMux!"}],
    "stream": false
  }'
```

That's it! Your OpenAI code now talks to Vertex AI.
ModelMux uses a modern, professional configuration system with multiple sources:
Create ~/.config/modelmux/config.toml:
```toml
# ModelMux Configuration
# Platform-specific locations:
#   Linux:   ~/.config/modelmux/config.toml
#   macOS:   ~/Library/Application Support/modelmux/config.toml
#   Windows: %APPDATA%/modelmux/config.toml

[server]
port = 3000
log_level = "info" # trace, debug, info, warn, error
enable_retries = true
max_retry_attempts = 3

[auth]
# Recommended: Use service account file
service_account_file = "~/.config/modelmux/service-account.json"
# Alternative: Inline JSON (for containers)
# service_account_json = '{"type": "service_account", ...}'

[vertex]
# Vertex AI provider (config file OR env vars / .env)
project = "{your-project}"
region = "{your-region}"
location = "{your-region}"
publisher = "{your-publisher}"
model = "{your-model}"

[streaming]
mode = "auto" # auto, never, standard, buffered, always
buffer_size = 65536
chunk_timeout_ms = 5000
```

```shell
# Interactive setup wizard
modelmux config init

# Display current configuration
modelmux config show

# Validate configuration
modelmux config validate

# Edit configuration file
modelmux config edit
```

Supported for backward compatibility. Place a .env file in your project directory or current working directory:
```shell
# Provider configuration
LLM_PROVIDER=vertex
VERTEX_PROJECT=my-gcp-project
VERTEX_REGION=europe-west1
VERTEX_LOCATION=europe-west1
VERTEX_PUBLISHER=anthropic
VERTEX_MODEL_ID=claude-3-5-sonnet@20241022

# Configuration overrides (use MODELMUX_ prefix)
MODELMUX_SERVER_PORT=3000
MODELMUX_SERVER_LOG_LEVEL=info
MODELMUX_AUTH_SERVICE_ACCOUNT_FILE=/path/to/key.json
```

The .env file is loaded automatically when modelmux starts (from the current working directory).
ModelMux intelligently adapts its streaming behavior based on the client:

- `auto` (default): Automatically detects client capabilities and chooses the best streaming mode
  - Forces non-streaming for IDEs (RustRover, IntelliJ, VS Code) and CLI tools (goose, curl)
  - Uses buffered streaming for web browsers
  - Uses standard streaming for API clients
- `never`: Forces complete (non-streaming) JSON responses for all clients
- `standard`: Word-by-word streaming as received from Vertex AI
- `buffered`: Accumulates chunks for better client compatibility
ModelMux automatically detects problematic clients:
Non-streaming clients:
- JetBrains IDEs (RustRover, IntelliJ, PyCharm, etc.)
- CLI tools (goose, curl, wget, httpie)
- API testing tools (Postman, Insomnia, Thunder Client)
- Clients that don't accept `text/event-stream`
Buffered streaming clients:
- Web browsers (Chrome, Firefox, Safari, Edge)
- VS Code and similar editors
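The detection described above boils down to inspecting request headers such as `User-Agent` and `Accept`. A simplified, hypothetical sketch of that decision (the heuristics and names are illustrative, not ModelMux's code):

```rust
// Simplified sketch of header-based client detection. The matching
// rules mirror the behavior described above, but the function and
// its heuristics are illustrative only.
#[derive(Debug, PartialEq)]
enum StreamMode { NonStreaming, Buffered, Standard }

fn detect_mode(user_agent: &str, accept: &str) -> StreamMode {
    let ua = user_agent.to_lowercase();
    let ide_or_cli = ["intellij", "rustrover", "goose", "curl", "wget", "postman"];
    if ide_or_cli.iter().any(|c| ua.contains(c)) || !accept.contains("text/event-stream") {
        StreamMode::NonStreaming
    } else if ua.contains("mozilla") {
        // Browsers get buffered streaming for compatibility.
        StreamMode::Buffered
    } else {
        StreamMode::Standard
    }
}

fn main() {
    assert_eq!(detect_mode("curl/8.4.0", "*/*"), StreamMode::NonStreaming);
    assert_eq!(detect_mode("Mozilla/5.0 Chrome", "text/event-stream"), StreamMode::Buffered);
    assert_eq!(detect_mode("my-api-client/1.0", "text/event-stream"), StreamMode::Standard);
}
```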
POST /v1/chat/completions
OpenAI-compatible chat completions with full tool calling support.
GET /v1/models
List available models in OpenAI format.
GET /health
Service health and metrics endpoint.
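The API surface is small enough to sketch as a plain route table (handler names here are hypothetical; the real server wires these up with axum):

```rust
// Hypothetical route table mirroring the three endpoints above.
// The actual ModelMux server uses axum routing; this is just an
// illustration of the API surface.
fn route(method: &str, path: &str) -> Option<&'static str> {
    match (method, path) {
        ("POST", "/v1/chat/completions") => Some("chat_completions"),
        ("GET", "/v1/models") => Some("list_models"),
        ("GET", "/health") => Some("health"),
        _ => None,
    }
}

fn main() {
    assert_eq!(route("POST", "/v1/chat/completions"), Some("chat_completions"));
    assert_eq!(route("GET", "/v1/embeddings"), None); // not proxied
}
```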
Use ModelMux programmatically in your Rust applications:
```rust
use modelmux::{Config, create_app};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load configuration from environment
    let config = Config::from_env()?;

    // Create the application
    let app = create_app(config).await?;

    // Start server
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
    axum::serve(listener, app).await?;

    Ok(())
}
```

```
OpenAI Client ──► ModelMux ──► Vertex AI (Claude)
      │              │                │
      │              │                │
  OpenAI API ──► Translation ──► Anthropic API
    Format          Layer           Format
```
Core Components:
- `config` - Configuration management and environment handling
- `auth` - Google Cloud authentication for Vertex AI
- `server` - HTTP server with intelligent routing
- `converter` - Bidirectional format translation
- `error` - Comprehensive error types and handling
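For intuition, the heart of the converter layer is reshaping messages between the two formats; for example, OpenAI carries the system prompt as a message in the array, while Anthropic's Messages API takes it as a top-level field. A simplified, hypothetical sketch (types and names are illustrative, not ModelMux's actual code):

```rust
// Simplified sketch of one converter concern: OpenAI puts the system
// prompt in the messages array; Anthropic expects it as a separate
// top-level field. Types and names here are illustrative only.
struct OpenAiMsg { role: String, content: String }

#[derive(Debug, PartialEq)]
struct AnthropicRequest { system: Option<String>, messages: Vec<(String, String)> }

fn to_anthropic(msgs: Vec<OpenAiMsg>) -> AnthropicRequest {
    let mut system = None;
    let mut messages = Vec::new();
    for m in msgs {
        if m.role == "system" {
            system = Some(m.content); // hoist out of the message list
        } else {
            messages.push((m.role, m.content));
        }
    }
    AnthropicRequest { system, messages }
}

fn main() {
    let req = to_anthropic(vec![
        OpenAiMsg { role: "system".into(), content: "Be terse.".into() },
        OpenAiMsg { role: "user".into(), content: "Hi".into() },
    ]);
    assert_eq!(req.system.as_deref(), Some("Be terse."));
    assert_eq!(req.messages.len(), 1);
}
```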
```
modelmux/
├── Cargo.toml              # Dependencies and metadata
├── README.md               # This file
├── LICENSE-MIT             # MIT license
├── LICENSE-APACHE          # Apache 2.0 license
├── docs/
└── src/
    ├── main.rs             # Application entry point
    ├── lib.rs              # Library interface
    ├── config.rs           # Configuration management
    ├── auth.rs             # Google Cloud authentication
    ├── error.rs            # Error types
    ├── server.rs           # HTTP server and routes
    └── converter/          # Format conversion modules
        ├── mod.rs
        ├── openai_to_anthropic.rs
        └── anthropic_to_openai.rs
```
```shell
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "List files in the current directory"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "list_directory",
          "description": "List files in a directory",
          "parameters": {
            "type": "object",
            "properties": {
              "path": {"type": "string"}
            },
            "required": ["path"]
          }
        }
      }
    ]
  }'
```

```shell
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Write a haiku about Rust"}],
    "stream": true
  }'
```

ModelMux is built for production workloads:
- Zero-copy JSON parsing where possible
- Async/await throughout for maximum concurrency
- Connection pooling for upstream requests
- Intelligent buffering for streaming responses
- Memory efficient request/response handling
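The "intelligent buffering" above can be pictured as accumulating streamed chunks until a size threshold before flushing, trading a little latency for fewer, larger writes. A toy sketch (the threshold and types are illustrative, not ModelMux internals):

```rust
// Toy sketch of chunk buffering for streaming responses: chunks are
// accumulated and flushed once a size threshold is reached.
struct ChunkBuffer { buf: String, threshold: usize }

impl ChunkBuffer {
    fn new(threshold: usize) -> Self {
        Self { buf: String::new(), threshold }
    }

    /// Push a chunk; returns a flushed batch once the threshold is hit.
    fn push(&mut self, chunk: &str) -> Option<String> {
        self.buf.push_str(chunk);
        if self.buf.len() >= self.threshold {
            Some(std::mem::take(&mut self.buf))
        } else {
            None
        }
    }
}

fn main() {
    let mut b = ChunkBuffer::new(10);
    assert_eq!(b.push("Hello, "), None);  // 7 bytes, held back
    assert_eq!(b.push("world!").as_deref(), Some("Hello, world!")); // flushed
}
```

In ModelMux the threshold role is played by the `buffer_size` setting in the `[streaming]` section.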
| Feature | Node.js | ModelMux (Rust) |
|---|---|---|
| Performance | Good | Excellent |
| Memory Usage | Higher | Lower |
| Type Safety | Runtime | Compile-time |
| Error Handling | Try/catch | Result types |
| Concurrency | Event loop | Async/await |
| Startup Time | Fast | Very Fast |
| Binary Size | Large | Small |
```shell
curl http://localhost:3000/health
```

Returns service metrics:

```json
{
  "status": "ok",
  "metrics": {
    "total_requests": 1337,
    "successful_requests": 1300,
    "failed_requests": 37,
    "quota_errors": 5,
    "retry_attempts": 42
  }
}
```

Configure log levels via environment:

```shell
export LOG_LEVEL=debug
export RUST_LOG=modelmux=trace
```

Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
```shell
git clone https://github.com/yarenty/modelmux
cd modelmux
cargo test
cargo run
```

See ROADMAP.md for detailed future plans.
✅ Completed in v1.0.0:
- ✅ Brew services and systemd daemon support
- ✅ .deb packages for Ubuntu/Debian (amd64, arm64)
- ✅ Professional configuration system with TOML files
- ✅ CLI configuration management (`modelmux config init/show/edit`)
Near term:
- Docker container images
- Enhanced metrics and monitoring (Prometheus, OpenTelemetry)
Future:
- Multiple provider support (OpenAI, Anthropic, Cohere, etc.)
- Intelligent request routing and load balancing
- Request/response caching layer
- Web UI for configuration and monitoring
- Advanced analytics and usage insights
```
 |  |  |  |
  \ |  | /
   \|  |/
    +----------->
```
Many models enter. One response leaves.
ModelMux - Because your AI shouldn't be tied to one vendor.
