# Install the solo-server package using pip
pip install solo-server
# Run the solo server setup in simple mode
solo setup
- Seamless Setup: Manage your on-device AI with a simple CLI and HTTP servers
- Open Model Registry: Pull models from registries like Ollama & Hugging Face
- Cross-Platform Compatibility: Deploy AI models effortlessly on your hardware
- Configurable Framework: Auto-detects hardware (CPU, GPU, RAM) and sets configs
- 🐳 Docker: Required for containerization
Install 'uv' using these docs: https://docs.astral.sh/uv/getting-started/installation/
# Install uv
# On Windows (PowerShell)
iwr https://astral.sh/uv/install.ps1 -useb | iex
# If you run into permission issues, see: https://github.com/astral-sh/uv/issues/3116
powershell -ExecutionPolicy Bypass -c "pip install uv"
# On Unix/MacOS
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment
uv venv
# Activate the virtual environment
source .venv/bin/activate # On Unix/MacOS
# OR
.venv\Scripts\activate # On Windows
uv pip install solo-server
This creates an isolated environment using uv for performance and stability.
Run the interactive setup to configure Solo Server:
solo setup
✔️ Detects CPU, GPU, and RAM for hardware-optimized execution
✔️ Auto-configures solo.conf with optimal settings
✔️ Recommends a compute backend OCI image (CUDA, HIP, SYCL, Vulkan, CPU, Metal)
Example Output:
╭────────────────── System Information ──────────────────╮
│ Operating System: Windows                               │
│ CPU: AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD  │
│ CPU Cores: 8                                            │
│ Memory: 15.42GB                                         │
│ GPU: NVIDIA                                             │
│ GPU Model: NVIDIA GeForce GTX 1660 Ti                   │
│ GPU Memory: 6144.0MB                                    │
│ Compute Backend: CUDA                                   │
╰─────────────────────────────────────────────────────────╯
🔧 Starting Solo Server Setup...
📋 Available Server Options:
• Ollama
• vLLM
• Llama.cpp
✨ Ollama is recommended for your system
Choose server [ollama]:
solo serve -s ollama -m llama3.2
Command Options:
╭─ Options ───────────────────────────────────────────────────────────────────────────╮
│ --server  -s  TEXT     Server type (ollama, vllm, llama.cpp)  [default: ollama]     │
│ --model   -m  TEXT     Model name or path  [default: None]                          │
│ --port    -p  INTEGER  Port to run the server on  [default: None]                   │
│ --help                 Show this message and exit.                                  │
╰─────────────────────────────────────────────────────────────────────────────────────╯
You can now interact with the model through the API endpoints exposed by Solo Server. Send a POST request to http://localhost:11434/api/generate or http://localhost:11434/api/chat with a JSON payload containing the model name and the prompt or messages you want to send.
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt":"Why is the sky blue?"
}'
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
}'
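These endpoints follow the Ollama HTTP API, so any HTTP client can call them. Below is a minimal Python sketch (not part of Solo Server itself) that assumes the ollama backend is serving llama3.2 on the default port 11434; setting "stream": false returns a single JSON object instead of a streamed response.

```python
import requests

# Chat with the locally served model through the Ollama-compatible API.
# Assumes Solo Server is running the ollama backend on the default port 11434.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "stream": False,  # return one JSON object instead of a line-by-line stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```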
solo status
Example Output:
🔹 Running Models:
-------------------------------------------
| Name | Model | Backend | Port |
|----------|--------|---------|------|
| llama3 | Llama3 | CUDA | 8080 |
| gptj | GPT-J | CPU | 8081 |
-------------------------------------------
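If the ollama backend is the one serving, you can also ask it directly which models it has available. A small sketch, assuming the default port 11434:

```python
import requests

# List the models known to the Ollama backend behind Solo Server.
# /api/tags is part of Ollama's HTTP API; adjust the port if you changed it.
resp = requests.get("http://localhost:11434/api/tags", timeout=10)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])
```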
solo stop
Example Output:
🛑 Stopping Solo Server...
✅ Solo server stopped successfully.
After setup, all settings are stored in:
~/.solo_server/solo.json
Example:
# Solo Server Configuration
{
"hugging_face": {
"token": ""
},
"system_info": {
"os": "Windows",
"cpu_model": "AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD",
"cpu_cores": 8,
"memory_gb": 15.42,
"gpu_vendor": "NVIDIA",
"gpu_model": "NVIDIA GeForce GTX 1660 Ti",
"gpu_memory": 6144.0,
"compute_backend": "CUDA"
},
"starfish": {
"api_key": ""
},
"hardware": {
"use_gpu": true
}
}
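Since the file is plain JSON, other tools can read the detected settings directly. A minimal sketch, assuming the field names shown in the example above:

```python
import json
from pathlib import Path

# Read Solo Server's saved configuration and print the detected hardware.
config_path = Path.home() / ".solo_server" / "solo.json"
config = json.loads(config_path.read_text())

system = config["system_info"]
print(f"Compute backend: {system['compute_backend']}")
print(f"GPU model: {system.get('gpu_model', 'none')}")
print(f"Use GPU: {config['hardware']['use_gpu']}")
```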
Refer to example_apps for sample applications.
# Clone the repository
git clone https://github.com/GetSoloTech/solo-server.git
# Navigate to the directory
cd solo-server
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # Unix/MacOS
# OR
.venv\Scripts\activate # Windows
# Install in editable mode
pip install -e .
This project wouldn't be possible without the help of other projects like:
- uv
- llama.cpp
- ramalama
- ollama
- whisper.cpp
- vllm
- podman
- huggingface
- llamafile
- cog
If you like using Solo, consider leaving us a ⭐ on GitHub.