What is this? Minions is a communication protocol that enables small on-device models to collaborate with frontier models in the cloud. By reading long contexts only locally, we can reduce cloud costs with minimal or no quality degradation. This repository provides a demonstration of the protocol. Get started below, or see our paper and blogpost for more information.
Paper: Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models
Blogpost: https://hazyresearch.stanford.edu/blog/2025-02-24-minions
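As a rough mental model, each round of the protocol keeps the long context on-device and sends only short messages to the cloud. The sketch below is purely conceptual (hypothetical helper methods, not this repository's actual API; the real protocol lives in minions/minion.py):

def minion_round(supervisor, worker, task, long_context):
    # Cloud model drafts a short question about the task; it never
    # receives the long context.
    question = supervisor.ask(task)
    # Local model reads the full context and answers the question.
    answer = worker.answer(question, long_context)
    # Cloud model synthesizes a final response from the short answer.
    return supervisor.synthesize(answer)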
We have tested the following setup on macOS and Ubuntu with Python 3.10-3.11 (note: Python 3.13 is not supported).
Optional: Create a virtual environment with your favorite package manager (e.g. conda, venv, uv)
conda create -n minions python=3.11
Step 1: Clone the repository and install the Python package.
git clone https://github.com/HazyResearch/minions.git
cd minions
pip install -e . # installs the minions package in editable mode
Note: for optional MLX-LM support, install the package with the following command:
pip install -e ".[mlx]"
Note: for optional Cartesia-MLX support, install the basic package as above and then follow the instructions below.
Step 2: Install a server for running the local model.
We support two servers for running local models: ollama and tokasaurus. You need to install at least one of these.
- You should use ollama if you do not have access to NVIDIA GPUs. Install ollama following the instructions at https://ollama.com/download. To enable Flash Attention, run launchctl setenv OLLAMA_FLASH_ATTENTION 1 and, if on a Mac, restart the Ollama app.
- You should use tokasaurus if you have access to NVIDIA GPUs and you are running the Minions protocol, which benefits from the high throughput of tokasaurus. Install tokasaurus with the following command:
uv pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ tokasaurus==0.0.1.post1
Optional: Install Cartesia-MLX (only available on Apple Silicon)
- Download Xcode
- Install the command line tools by running xcode-select --install
- Install nanobind by running the following command:
pip install nanobind@git+https://github.com/wjakob/nanobind.git@2f04eac452a6d9142dedb957701bdb20125561e4
- Install the Cartesia Metal backend by running the following command:
pip install git+https://github.com/cartesia-ai/edge.git#subdirectory=cartesia-metal
- Install the Cartesia-MLX package by running the following command:
pip install git+https://github.com/cartesia-ai/edge.git#subdirectory=cartesia-mlx
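To sanity-check the install, try importing the package. This is a minimal sketch that assumes the package imports under the name cartesia_mlx; adjust if your version differs:

# Verify the Cartesia-MLX install (the import name is an assumption).
import cartesia_mlx as cmx
print("cartesia_mlx is available")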
Step 3: Set your API key for at least one of the following cloud LLM providers.
If needed, create an OpenAI API key or Together AI API key for the cloud model.
# OpenAI
export OPENAI_API_KEY=<your-openai-api-key>
export OPENAI_BASE_URL=<your-openai-base-url> # Optional: Use a different OpenAI API endpoint
# Together AI
export TOGETHER_API_KEY=<your-together-api-key>
# OpenRouter
export OPENROUTER_API_KEY=<your-openrouter-api-key>
export OPENROUTER_BASE_URL=<your-openrouter-base-url> # Optional: Use a different OpenRouter API endpoint
# Perplexity
export PERPLEXITY_API_KEY=<your-perplexity-api-key>
export PERPLEXITY_BASE_URL=<your-perplexity-base-url> # Optional: Use a different Perplexity API endpoint
# Tokasaurus
export TOKASAURUS_BASE_URL=<your-tokasaurus-base-url> # Optional: Use a different Tokasaurus API endpoint
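Before launching the app, you can confirm which providers are configured with a small check (a sketch using the variable names above):

# Report which cloud-provider API keys are set in the environment.
import os

for var in ["OPENAI_API_KEY", "TOGETHER_API_KEY", "OPENROUTER_API_KEY", "PERPLEXITY_API_KEY"]:
    print(f"{var}: {'set' if os.environ.get(var) else 'missing'}")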
To try the Minion or Minions protocol, run the following command:
streamlit run app.py
If you see an error about the ollama client like the following:
An error occurred: Failed to connect to Ollama. Please check that Ollama is downloaded, running and accessible. https://ollama.com/download
try running the following command:
OLLAMA_FLASH_ATTENTION=1 ollama serve
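You can also verify that the Ollama server is reachable from Python (a sketch assuming Ollama's default address http://localhost:11434 and the requests package):

# Ping the local Ollama server before launching the app.
import requests

try:
    resp = requests.get("http://localhost:11434")
    resp.raise_for_status()
    print("Ollama is reachable:", resp.text.strip())
except requests.exceptions.RequestException:
    print("Could not reach Ollama. Is ollama serve running?")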
The following example uses an ollama local client and an openai remote client, with the minion protocol.
from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minion import Minion
local_client = OllamaClient(
    model_name="llama3.2",
)

remote_client = OpenAIClient(
    model_name="gpt-4o",
)
# Instantiate the Minion object with both clients
minion = Minion(local_client, remote_client)
context = """
Patient John Doe is a 60-year-old male with a history of hypertension. In his latest checkup, his blood pressure was recorded at 160/100 mmHg, and he reported occasional chest discomfort during physical activity.
Recent laboratory results show that his LDL cholesterol level is elevated at 170 mg/dL, while his HDL remains within the normal range at 45 mg/dL. Other metabolic indicators, including fasting glucose and renal function, are unremarkable.
"""
task = "Based on the patient's blood pressure and LDL cholesterol readings in the context, evaluate whether these factors together suggest an increased risk for cardiovascular complications."
# Execute the minion protocol for up to two communication rounds
output = minion(
    task=task,
    context=[context],
    max_rounds=2
)
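The call returns a dictionary of results; in our reading of the code, the final response is stored under the final_answer key (inspect the returned dict if your version differs):

# Print the supervisor's final response (key name assumed from minions/minion.py).
print(output["final_answer"])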
The following example uses an ollama local client and an openai remote client, with the minions protocol.
from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minions import Minions
from pydantic import BaseModel
class StructuredLocalOutput(BaseModel):
    explanation: str
    citation: str | None
    answer: str | None

local_client = OllamaClient(
    model_name="llama3.2",
    temperature=0.0,
    structured_output_schema=StructuredLocalOutput
)

remote_client = OpenAIClient(
    model_name="gpt-4o",
)
# Instantiate the Minions object with both clients
minion = Minions(local_client, remote_client)
context = """
Patient John Doe is a 60-year-old male with a history of hypertension. In his latest checkup, his blood pressure was recorded at 160/100 mmHg, and he reported occasional chest discomfort during physical activity.
Recent laboratory results show that his LDL cholesterol level is elevated at 170 mg/dL, while his HDL remains within the normal range at 45 mg/dL. Other metabolic indicators, including fasting glucose and renal function, are unremarkable.
"""
task = "Based on the patient's blood pressure and LDL cholesterol readings in the context, evaluate whether these factors together suggest an increased risk for cardiovascular complications."
# Execute the minions protocol for up to two communication rounds
output = minion(
    task=task,
    doc_metadata="Medical Report",
    context=[context],
    max_rounds=2
)
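The structured_output_schema argument constrains each local worker's reply to JSON matching StructuredLocalOutput. To see the shape the local model is asked to produce, you can instantiate the schema directly (hypothetical values; assumes Pydantic v2):

# Illustrate the JSON shape enforced on local-model outputs.
sample = StructuredLocalOutput(
    explanation="BP of 160/100 mmHg meets criteria for stage 2 hypertension.",
    citation="his blood pressure was recorded at 160/100 mmHg",
    answer="elevated cardiovascular risk",
)
print(sample.model_dump_json(indent=2))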
To run Minion/Minions in a notebook, check out minions.ipynb.
To run Minion/Minions in a CLI, check out minions_cli.py.
Set your choice of local and remote models by running the following commands. The format is <provider>/<model_name>. The available providers are ollama, openai, anthropic, together, perplexity, openrouter, groq, and mlx.
export MINIONS_LOCAL=ollama/llama3.2
export MINIONS_REMOTE=openai/gpt-4o
minions --help
minions --context <path_to_context> --protocol <minion|minions>
To use Azure OpenAI as the remote client, set the following environment variables:
# Azure OpenAI
export AZURE_OPENAI_API_KEY=your-api-key
export AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/
export AZURE_OPENAI_API_VERSION=2024-02-15-preview
Here's an example of how to use Azure OpenAI with the Minions protocol in your own code:
from minions.clients.ollama import OllamaClient
from minions.clients.azure_openai import AzureOpenAIClient
from minions.minion import Minion
local_client = OllamaClient(
    model_name="llama3.2",
)

remote_client = AzureOpenAIClient(
    model_name="gpt-4o",  # This should match your deployment name
    api_key="your-api-key",
    azure_endpoint="https://your-resource-name.openai.azure.com/",
    api_version="2024-02-15-preview",
)
# Instantiate the Minion object with both clients
minion = Minion(local_client, remote_client)
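From here, the minion object is invoked exactly as in the earlier Ollama + OpenAI example, for instance:

# Run the protocol against the Azure-hosted remote model
# (task and context are placeholders; reuse the pattern from above).
output = minion(
    task="Summarize the patient's key cardiovascular risk factors.",
    context=["<your document text here>"],
    max_rounds=2,
)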
Maintainers:
- Avanika Narayan (contact: avanika@cs.stanford.edu)
- Dan Biderman (contact: biderman@stanford.edu)
- Sabri Eyuboglu (contact: eyuboglu@cs.stanford.edu)