Fine-tune language models with LoRA using a simple, powerful API. Train models with composable primitives that give you full control over the training loop.
Signal exposes a clean, low-level training API based on four primitives:
- `forward_backward` - Compute gradients for a batch
- `optim_step` - Update model weights
- `sample` - Generate text from the current checkpoint
- `save_state` - Export a LoRA adapter or merged model
This design gives you full control over the training loop while we handle the infrastructure. Inspired by pioneering work from Thinking Machines on LoRA fine-tuning.
```python
from rewardsignal import SignalClient
# Initialize client
client = SignalClient(
    api_key="sk-...",  # Your API key
)
# List available models
models = client.list_models()
print(f"Available models: {models}")
# Create a training run
run = client.create_run(
base_model="meta-llama/Llama-3.2-3B",
lora_r=32,
lora_alpha=64,
learning_rate=3e-4,
)
# Prepare training data
batch = [
{"input": "The quick brown fox jumps over the lazy dog."},
{"output": "Machine learning is transforming technology."},
]
# Training loop
for step in range(10):
    # Forward-backward pass
    result = run.forward_backward(batch=batch)
    print(f"Step {step}: Loss = {result['loss']:.4f}")

    # Optimizer step
    run.optim_step()

    # Sample from model every 5 steps
    if step % 5 == 0:
        samples = run.sample(
            prompts=["The meaning of life is"],
            temperature=0.7,
        )
        print(f"Sample: {samples['outputs'][0]}")
# Save final model
artifact = run.save_state(mode="adapter", push_to_hub=False)
print(f"Saved to: {artifact['checkpoint_path']}")meta-llama/Llama-3.2-1Bmeta-llama/Llama-3.2-3Bmeta-llama/Llama-3.1-8Bmeta-llama/Llama-3.1-8B-Instructmeta-llama/Llama-3.1-70Bmeta-llama/Llama-3.3-70B-Instruct
google/gemma-2-2bgoogle/gemma-2-9bunsloth/gemma-2-9b-it-bnb-4bit
Qwen/Qwen2.5-3BQwen/Qwen2.5-7BQwen/Qwen2.5-14BQwen/Qwen2.5-32B
Want a specific model? Contact us and we'll add it!
Signal automatically allocates the optimal GPU resources based on your model size. You don't need to think about infrastructure - just specify your model and Signal handles the rest:
- < 1B parameters: L40S (single GPU)
- 1B - 7B parameters: L40S or A100 (single GPU)
- 7B - 13B parameters: A100-80GB (single GPU)
- 13B - 30B parameters: A100-80GB (2 GPUs)
- 30B - 70B parameters: A100-80GB (4 GPUs)
- > 70B parameters: A100-80GB (8 GPUs) or H100 (4 GPUs)
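For illustration, the tiering above amounts to a simple lookup from parameter count to GPU configuration. A minimal sketch (not Signal's actual allocation code; the helper name is hypothetical):

```python
def pick_gpu_config(num_params_billion: float) -> str:
    """Hypothetical helper restating the allocation table above."""
    if num_params_billion < 7:
        return "L40S:1"          # or "A100:1" for the 1B-7B tier
    if num_params_billion < 13:
        return "A100-80GB:1"
    if num_params_billion < 30:
        return "A100-80GB:2"
    if num_params_billion <= 70:
        return "A100-80GB:4"
    return "A100-80GB:8"         # or "H100:4"

print(pick_gpu_config(8))   # -> "A100-80GB:1" for an ~8B model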
Models under 7B parameters automatically use single-GPU training:
- Efficient LoRA fine-tuning
- Fast iteration speed
- Cost-effective for smaller models
- Optional quantization (4-bit/8-bit)
Models over 7B parameters automatically use multi-GPU training with DataParallel:
- Distributed across 2-8 GPUs
- Transparent to your training code
- Same API primitives work seamlessly
- Quantization disabled (incompatible with DataParallel)
```python
# Automatic allocation - no GPU config needed!
run = client.create_run(
base_model="Qwen/Qwen2.5-7B", # Automatically uses optimal GPU config
lora_r=32,
)
# Training loop is identical regardless of GPU count
result = run.forward_backward(batch=batch)
run.optim_step()
```

You can optionally override the automatic allocation for specific use cases:
```python
# Automatic allocation (recommended)
run = client.create_run(
base_model="Qwen/Qwen2.5-7B",
lora_r=32,
)
# Manual override - use 2x L40S instead
run = client.create_run(
base_model="Qwen/Qwen2.5-7B",
gpu_config="L40S:2", # Override auto-allocation
lora_r=32,
)
# Use H100 for maximum performance
run = client.create_run(
base_model="meta-llama/Llama-3.1-70B",
gpu_config="H100:4", # Use 4x H100 for fastest training
lora_r=32,
)
```

Supported GPU types:
- L40S - NVIDIA L40S (48GB) - Cost-effective for most models
- A100 - NVIDIA A100 (40GB) - Good balance of performance and cost
- A100-80GB - NVIDIA A100 (80GB) - For larger models and multi-GPU
- H100 - NVIDIA H100 (80GB) - Maximum performance
- T4 - NVIDIA T4 (16GB) - Budget option for small models
- A10G - NVIDIA A10G (24GB) - Good for small-medium models
Configure LoRA hyperparameters when creating a run:

```python
run = client.create_run(
base_model="meta-llama/Llama-3.1-8B",
    lora_r=64,             # Higher rank = more capacity
    lora_alpha=128,        # Usually 2x lora_r
    lora_dropout=0.05,
    lora_target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
)
```

Export the trained adapter, or push a merged model to the Hugging Face Hub:

```python
artifact = run.save_state(
mode="merged",
push_to_hub=True,
hub_model_id="your-username/your-model-name",
)
```
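Once exported, the artifact can be loaded with standard Transformers/PEFT tooling. A minimal sketch, assuming the merged model was pushed under the hub id above and using a placeholder adapter path:

```python
# Load a merged model pushed to the Hub (behaves like any Transformers checkpoint),
# or attach a saved LoRA adapter to the original base model with PEFT.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Option 1: merged model from the Hub
model = AutoModelForCausalLM.from_pretrained("your-username/your-model-name")
tokenizer = AutoTokenizer.from_pretrained("your-username/your-model-name")

# Option 2: LoRA adapter on top of the base model (adapter path is a placeholder)
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
model = PeftModel.from_pretrained(base, "./lora_adapters/step_10")
```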
Signal is designed to be self-hosted on your own infrastructure.
- Supabase Account - Sign up at supabase.com for auth and database
- Modal Account - Sign up at modal.com for GPU infrastructure
- HuggingFace Account - For model access tokens
- Python 3.12+ - Recommended version
```bash
# Clone the repository
git clone https://github.com/yourusername/signal.git
cd signal
# Install dependencies
pip install -r requirements.txt
# Setup Supabase (follow docs/SUPABASE_SETUP.md)
# 1. Create Supabase project
# 2. Run SQL migrations from docs/SUPABASE_SETUP.md
# 3. Configure Google OAuth
# Configure environment variables
cp .env.example .env
# Edit .env with your Supabase credentials
# Setup Modal
modal setup
# Deploy Modal functions
modal deploy modal_runtime/primitives.py
# Create API key in Supabase api_keys table or via script
# (See QUICKSTART.md for details)
```

See QUICKSTART.md for detailed setup instructions.
```bash
# Run locally
python api/main.py
# Or with uvicorn
uvicorn api.main:app --host 0.0.0.0 --port 8000
```

Then point the SDK at your local API:

```python
from rewardsignal import SignalClient
client = SignalClient(
api_key="sk-...", # Your API key
base_url="http://localhost:8000"
)
run = client.create_run(
base_model="meta-llama/Llama-3.2-3B",
lora_r=32,
)
```

Edit `config/models.yaml` to add or modify supported models:
```yaml
models:
  - name: "your-org/your-model"
    framework: "transformers"  # or "unsloth"
    gpu: "a100-80gb:4"         # Specify GPU type and count
    family: "llama"
```

GPU Configuration Format:
- Single GPU: `"l40s:1"`, `"a100-80gb:1"`, `"h100:1"`
- Multi-GPU: `"a100-80gb:4"`, `"h100:8"`
- The system automatically uses FSDP for multi-GPU setups
Training Infrastructure:
Signal uses a hybrid approach for maximum efficiency:
- Single-GPU (1 GPU): Direct PEFT with 8-bit quantization
  - Faster for small models (1-8B parameters)
  - Lower memory overhead
  - Uses `bitsandbytes` for quantization
- Multi-GPU (2-8 GPUs): Accelerate with FSDP
  - Required for large models (70B+ parameters)
  - Fully Sharded Data Parallel across all GPUs
  - Automatic model sharding and gradient synchronization
  - Mixed precision (bfloat16) training
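For reference, the single-GPU path corresponds roughly to the standard PEFT + bitsandbytes pattern. A minimal sketch (illustrative only, not Signal's runtime code; the model name and LoRA values are placeholders):

```python
# Rough sketch of the single-GPU setup: load the base model in 8-bit with
# bitsandbytes, then wrap it with a PEFT LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(load_in_8bit=True)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",         # placeholder base model
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
lora = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora)
model.print_trainable_parameters()
```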
Modal GPU Allocation:
- GPU resources are allocated dynamically per run
- Uses Modal's `with_options()` for runtime GPU selection
- Each training primitive (`forward_backward`, `optim_step`) runs on the same GPU configuration
- Inference always uses single GPU for efficiency
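As a rough illustration of this pattern (an assumption-heavy sketch, not Signal's deployment code: the app and function names are hypothetical, and it assumes Modal's `with_options()` accepts a `gpu` override as described above):

```python
# Hypothetical sketch: look up a deployed Modal function and override its GPU
# allocation at call time. App/function names and arguments are placeholders.
import modal

forward_backward = modal.Function.from_name("signal", "forward_backward")
fb_on_a100 = forward_backward.with_options(gpu="A100-80GB:4")  # runtime GPU selection
result = fb_on_a100.remote(run_id="run_123", batch=[{"input": "hello"}])
```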
How It Works:
When FRONTIER_BACKEND_URL is configured, the Signal API will:
- Before training: Validate user has sufficient credits via `/internal/validate-credits`
- Before training: Fetch user integrations (WandB, HuggingFace keys) via `/internal/get-integrations`
- After completion: Deduct credits via `/internal/deduct-credits`
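A minimal sketch of what such a backend call might look like (illustrative only; the payload fields, response shape, and auth header are assumptions, not the actual contract):

```python
# Illustrative only: hypothetical request/response shapes for the internal
# billing endpoints. Field names and the auth secret are assumptions.
import os
import httpx

backend = os.environ["FRONTIER_BACKEND_URL"]
headers = {"Authorization": f"Bearer {os.environ['FRONTIER_BACKEND_KEY']}"}  # hypothetical secret

resp = httpx.post(
    f"{backend}/internal/validate-credits",
    json={"user_id": "user_123", "estimated_cost": 1.50},  # assumed fields
    headers=headers,
)
resp.raise_for_status()
if not resp.json().get("has_credits", False):  # assumed response shape
    raise RuntimeError("Insufficient credits")
```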
Self-Hosting Without Billing:
Leave `FRONTIER_BACKEND_URL` empty to run Signal without credit management. All training operations will work normally, but credit validation and integration management will be disabled.
Database (Supabase PostgreSQL):
- User profiles and authentication
- API keys (hashed)
- Run metadata and configuration
- Training metrics and logs
Files (Modal Volumes):
```
/data/runs/{user_id}/{run_id}/
├── config.json           # Run configuration
├── lora_adapters/        # LoRA checkpoints by step
├── optimizer_state.pt    # Optimizer state
├── gradients/            # Saved gradients
└── checkpoints/          # Exported checkpoints
```

When setting up your Supabase database:
- Run the RLS migration: `supabase/migrations/20250110000001_rls_security.sql`
- Use `SUPABASE_ANON_KEY` in your `.env` (not the service role key)
- RLS policies automatically protect your data
See supabase/SETUP.md for detailed migration instructions.
See QUICKSTART.md for detailed setup instructions.
```bash
# 1. Setup Modal
modal setup
modal secret create secrets-hf-wandb HF_TOKEN=your_hf_token
# 2. Deploy functions
modal deploy modal_runtime/primitives.py
# 3. Start API (in another terminal)
uvicorn api.main:app --host 0.0.0.0 --port 8000
# 4. Use the SDK
```

We adore contributions!
See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE file for details.
- GitHub Issues: Bug reports and feature requests
- Documentation: See QUICKSTART.md and the `docs/` folder
Built with:
- Modal - Serverless GPU infrastructure
- PyTorch - Deep learning framework
- Transformers - Model library
- PEFT - LoRA implementation
- Unsloth - Fast fine-tuning
Made with ❤️ for the AI research and engineering community