A Bittensor subnet for competitive model distillation of Qwen/Qwen3.5-35B-A3B (35B total, 3B active MoE).
Dashboard: distil.arbos.life
API: api.arbos.life
Subnet: Finney netuid 97
Miners distill the teacher into a smaller model (≤5.25B total params), upload to HuggingFace, and commit the repo link on-chain. One commitment per hotkey, permanently.
Validators evaluate by computing full-distribution KL-divergence on GPU. Lower KL = better distillation = higher rewards. Winner-take-all — best miner gets 100% of emissions.
The validator uses a king-of-the-hill architecture for efficient, high-confidence scoring:
- Pre-checks (no GPU) — Every epoch (~10 min), all committed models are verified:
  - Architecture compliance (≤5.25B params, vocab_size=248,320, no quantization)
  - Duplicate detection — SHA256 hash of the safetensors weights; weights identical to an existing model → permanently blacklisted. The earlier commitment (by block number) owns the hash.
  - Integrity — the model must still be public and unchanged on HuggingFace
  - Models that fail pre-checks are never sent to the GPU — no wasted compute
- King identification — The miner with the lowest KL score in the validator's state is the "king" (current emissions winner)
- Challenger detection — Only models that haven't been evaluated yet are challengers. Already-evaluated models that didn't beat the king are not re-evaluated; their scores are final.
- Head-to-head GPU eval — The king and all new challengers are scored together on the same 40 FineWeb prompts (block-seeded). Both models see identical teacher continuations, making the comparison fair. The king is only put on the GPU when there is a challenger — no compute is wasted on idle re-evaluation.
- Epsilon threshold (1%) — A challenger must achieve a KL divergence more than 1% lower than the king's to dethrone it. For example, if the king has KL = 0.097, a challenger needs KL < 0.09603 (0.097 × 0.99). This prevents noisy near-ties from flipping the winner every epoch and rewards only meaningful improvements.
- Weight setting — The king gets weight = 1.0; everyone else gets 0.0. Raw scores, no EMA smoothing. Weights are set on-chain immediately after each evaluation.
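The scoring at the heart of the eval step is a per-position KL over the full vocabulary. A minimal sketch in plain Python with a toy vocabulary (the real validator runs this over all 248,320 logits on GPU; function names here are illustrative):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a vocabulary.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def token_kl(teacher_logits, student_logits):
    # KL(teacher || student) at one position, summed over the
    # entire vocabulary rather than a top-k subset.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

def sequence_kl(teacher_positions, student_positions):
    # Mean per-position KL over one prompt's continuation.
    kls = [token_kl(t, s) for t, s in zip(teacher_positions, student_positions)]
    return sum(kls) / len(kls)
```

A perfectly matching student scores 0 nats; any mismatch pushes the score up.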
Why this is better than evaluating all models every epoch:
- 2x more prompts per model (40 vs 20) → tighter confidence intervals, lower variance
- GPU only runs when needed — no challengers means no GPU eval at all
- Fair comparison — king and challenger scored on identical prompts in the same run
- Epsilon prevents flip-flopping — the king holds unless clearly beaten
- Scales to many miners — 100 miners with 1 new challenger = 2 models evaluated, not 100
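The epsilon rule above reduces to a one-line comparison. A minimal sketch:

```python
def dethrones(challenger_kl: float, king_kl: float, epsilon: float = 0.01) -> bool:
    # The challenger wins only if its KL is more than `epsilon` (1%)
    # below the king's; near-ties leave the king in place.
    return challenger_kl < king_kl * (1 - epsilon)
```

With the king at KL = 0.097, a challenger at 0.0959 dethrones it (0.0959 < 0.09603), while one at 0.0965 does not.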
Models are permanently disqualified (KL=∞, $0 earnings) for:
- COPY — Same safetensors weights as another miner (SHA256 match). First committer owns the hash.
- REMOVED — Model deleted, made private, or weights changed after commitment
- INVALID — Fails architecture checks (too large, wrong tokenizer, quantized, etc.)
Disqualification reasons are shown on the dashboard and available via the API.
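The COPY check boils down to hashing the raw weight files. A minimal sketch (the shard-list argument and chunk size are illustrative):

```python
import hashlib

def weight_hash(safetensors_paths):
    # One SHA256 digest over all safetensors shards, streamed in 1 MiB
    # chunks so multi-GB checkpoints never need to fit in memory.
    # Byte-identical weights always produce the same digest.
    h = hashlib.sha256()
    for path in sorted(safetensors_paths):  # stable shard ordering
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    return h.hexdigest()
```

The validator keeps every digest it has ever seen, so a re-upload of old weights under a new hotkey matches immediately.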
Anti-gaming measures:
- SHA256 hash duplicate detection: Model weight hashes are tracked forever; copies are permanently blacklisted
- Logit fingerprinting: Even if hashes differ, models with identical KL distributions on the first 2 prompts are flagged as functional copies (cosine similarity > 0.9999 on per-position KL vectors)
- Commitment block priority: Earlier on-chain commitment wins hash ownership
- Integrity verification: Models verified public + unchanged before every weight-set
- MoE-aware param counting: Total params from safetensors metadata (not config estimates)
- Quantization rejected: GPTQ/AWQ/FP8 all blocked — architecture distillation only
- Block-seeded prompts: Deterministic from block number, unpredictable in advance
- Full-distribution KL: Scored on all 248,320 tokens, not top-k
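The logit-fingerprinting check can be sketched as a cosine similarity over per-position KL vectors (toy vectors below; the real check uses the traces from the first two evaluation prompts):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def is_functional_copy(kl_vec_a, kl_vec_b, threshold=0.9999):
    # Two models whose per-position KL traces are this correlated are
    # flagged as functional copies even when their weight hashes differ
    # (e.g. re-serialized or trivially perturbed weights).
    return cosine_similarity(kl_vec_a, kl_vec_b) > threshold
```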
To mine you need:
- A Bittensor wallet registered on subnet 97
- A HuggingFace account for model hosting
- Training infrastructure (your choice)
Your model must:
- Use same tokenizer as Qwen3.5-35B-A3B (vocab_size=248,320)
- Have ≤ 5.25B total parameters (15% of teacher's 35B)
- Be in safetensors format (bf16/fp16)
- Be loadable via `AutoModelForCausalLM.from_pretrained()`
- Not be quantized (GPTQ/AWQ/GGUF rejected)
- Have unique weights — cannot be identical to any previously committed model
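The size cap is enforced by counting parameters from the tensor shapes recorded in safetensors metadata, which sees every MoE expert. A sketch (tensor names and the shape-map input are illustrative):

```python
PARAM_LIMIT = 5_250_000_000  # 5.25B total, 15% of the 35B teacher

def total_params(shapes):
    # `shapes` maps tensor name -> shape, as read from safetensors
    # metadata. Multiplying out the dims counts every expert's weights,
    # which config-based estimates of an MoE model can miss.
    total = 0
    for shape in shapes.values():
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total

def passes_size_check(shapes):
    return total_params(shapes) <= PARAM_LIMIT
```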
```bash
pip install -e .

python miner.py \
  --network finney \
  --netuid 97 \
  --wallet-name my_wallet \
  --hotkey-name my_hotkey \
  --model-repo your-username/your-distilled-model
```

To change models, register a new hotkey.
| Model | Params | KL (nats) | Notes |
|---|---|---|---|
| Qwen3.5-4B | 4.66B | ~0.10–0.15 | Strong baseline |
| Qwen3.5-2B | 2.27B | ~0.12–0.16 | Competitive |
| Qwen3.5-0.8B | 0.87B | ~0.17–0.21 | Moderate |
These are off-the-shelf baselines with no distillation training — purpose-built distillations should do significantly better. Models with KL > 2.0 are disqualified.
Live data at https://api.arbos.life:
| Endpoint | Description |
|---|---|
| `GET /` | API overview |
| `GET /api/metagraph` | Full subnet metagraph (UIDs, stakes, weights, incentive) |
| `GET /api/commitments` | Miner model commitments (HF links + block numbers) |
| `GET /api/scores` | Current KL scores, disqualification reasons, last eval details |
| `GET /api/price` | Token price, emission, market data |
| `GET /api/health` | Service status |

All endpoints are public; no authentication is required.
```
├── miner.py                 # One-shot commitment script
├── eval/
│   ├── kl_divergence.py     # Full-distribution KL on GPU
│   ├── model_checker.py     # Param counting, integrity, hash, duplicate detection
│   ├── dataset.py           # FineWeb prompt loader (500 cached prompts)
│   └── scoring.py           # Winner-take-all + disqualification tracking
├── api/
│   └── server.py            # FastAPI dashboard backend
├── scripts/
│   ├── pod_eval.py          # GPU eval runner (runs on a remote pod)
│   ├── remote_validator.py  # King-of-the-hill validator (Hetzner + Lium GPU)
│   └── run_validator.sh     # PM2 wrapper
└── state/                   # Persistent scores, hashes, disqualifications
```
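`dataset.py` caches 500 FineWeb prompts, from which each eval draws its 40 block-seeded prompts. The selection could look like this (the exact seeding scheme here is an assumption):

```python
import random

def select_prompts(block_number, cached_prompts, n=40):
    # Deterministic given the block number, so every validator picks the
    # same prompt set, but the set cannot be known before the block exists.
    rng = random.Random(block_number)
    return rng.sample(cached_prompts, n)
```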
- Hetzner server (secure): Wallet keys, chain access, weight setting
- Lium GPU pod (remote): Teacher/student forward passes, KL computation
- Wallet keys never leave the Hetzner server. The GPU pod has no chain access.
MIT