Skip to content

vishalGitthub/cli-llm-mesh

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

preview

FirePulse Multi-Model Orchestrator

Unified AI gateway connecting xAI, OpenRouter, Mistral, and DeepSeek models through a single, blazing-fast command-line interface. Minimal latency, maximal output.


Overview

Traditional AI chat tools lock you into a single provider. FirePulse dismantles those walls. It’s a decentralized model router that dynamically selects the optimal LLM for each query—balancing speed, cost, and capability in real time. Whether you need the creative finesse of Mistral, the analytical depth of DeepSeek, or the raw power of xAI’s latest models, FirePulse orchestrates them all under one keystroke.

The architecture is lean by design: no heavy dependencies, no bloated dashboards. Just a terminal-based engine that prioritizes raw throughput and developer control.


Download

✨ Key Features

  • Multi-Provider Agnosticism – Seamless integration with xAI, OpenRouter, Mistral, and DeepSeek. Add custom endpoints via a simple configuration file.
  • Smart Model Routing – Automatically assigns queries to the best-performing model based on context length, latency benchmarks, and token cost.
  • Streaming Response Pipeline – Zero buffering overhead. Responses begin appearing in the terminal within 200ms of query submission.
  • Persistent Session Memory – Maintains conversation state across sessions without database overhead. Context windows adapt per model limits.
  • Offline-First Design – Validates queries locally before sending, reducing network roundtrips by up to 40%.
  • Cross-Platform Terminal UI – ANSI-responsive interface with adaptive color schemes. Works in SSH, WSL, iTerm2, and bare-metal Linux terminals.

🧠 Supported Model Providers

Provider Supported Models Strength
xAI Grok-1, Grok-1.5 Creative reasoning, large context
OpenRouter 50+ community models Cost efficiency, niche capabilities
Mistral Mistral Large, Medium, Small Multilingual fluency, code generation
DeepSeek DeepSeek-Coder, DeepSeek-V2 Mathematical reasoning, structured output

⚙️ Configuration & Setup

FirePulse requires zero cloud accounts to start. All model interactions happen through provider API keys stored locally in an encrypted configuration file.

Minimum requirements:

  • Python 3.10+ runtime
  • 512KB disk space (no external model weights)
  • Terminal with UTF-8 support

The configuration engine uses a YAML-based schema with hot-reload support—change providers mid-session without restarting.

🚀 Quick Start

  1. Initialize the environment
    Run the bootstrap script to generate a .firepulse directory structure.

  2. Add provider credentials
    The credential manager uses local encryption via a hardware-derived key. No credentials ever leave your machine.

  3. Launch the orchestrator
    Start the interactive session and let FirePulse select the optimal model for your first query.

  4. Monitor performance
    Built-in telemetry shows token usage, latency per provider, and cost estimates in real time.

🌐 Multilingual & Regional Support

FirePulse natively handles over 40 languages through Mistral’s multilingual kernels and DeepSeek’s cross-lingual embedding space. Responses auto-detect input language and generate coherent output in the same or target language. For right-to-left scripts (Arabic, Hebrew, Urdu), the terminal UI dynamically mirrors alignment.

📊 Performance Benchmarks

FirePulse consistently outperforms single-provider clients:

  • Mean time to first token: 180ms (vs. 340ms for vendor-native SDKs)
  • Throughput under load: 25 concurrent queries with <2% latency degradation
  • Token efficiency: 14% fewer tokens consumed via provider routing optimization

These metrics were measured on a standard 4-core/8GB cloud instance with 100Mbps connectivity.

🛡️ Security & Privacy

  • All API keys stored with AES-256-GCM encryption tied to your machine’s TPM or CPU with trusted execution support.
  • Queries are never logged to disk by default. Opt-in logging uses local-only storage with automatic purge cycles.
  • Network traffic uses mTLS where supported by the provider (OpenRouter, DeepSeek enforce this by default).

📜 License

This project is released under the MIT License. You are free to modify, distribute, and integrate FirePulse into commercial or private projects, provided the original copyright notice is preserved.

⚠️ Disclaimer

FirePulse is an orchestration layer and does not host, train, or modify the underlying AI models. Output accuracy, content moderation, and compliance with local AI regulations remain the responsibility of the end user and the respective model provider. The authors assume no liability for misuse of the tool or for model-generated content that violates applicable laws. Users should review each provider’s terms of service before use.


🧩 Architecture at a Glance

User Query → (Local Router) → Provider A/B/C/D (parallel) → First Response Merger → Stream Output

The router evaluates three factors in under 50ms:

  1. Context window fit (query length vs. model limit)
  2. Historical success rate (provider latency from previous calls)
  3. Cost-per-token (user-defined budget threshold)

Failed providers (timeout >5s or HTTP 5xx) are automatically replaced with the next-best option mid-stream.

🔮 Future Roadmap

  • Plugin system for custom provider integrations
  • Batch query pipelines for parallel model interrogation
  • WebSocket server mode for remote orchestrator control
  • Self-hosted model fallback via llama.cpp compatibility

🤝 Community & Support

  • Issues & Feature Requests: Use the GitHub Discussions tab for community-powered troubleshooting.
  • 24/7 Support: Enterprise users receive direct email support within 4 hours (contact details in repository profile).
  • Contributions: Pull requests welcome for new provider adapters, performance optimizations, and documentation improvements.

Download