Turn any modern Android phone into a local AI inference server your laptop talks to like a GPU — no cloud, no GPU, no friction.
PhoneBrain is an Android foreground service that runs an OpenAI-compatible REST API over USB or WiFi. It lets developers run local LLMs for code completion, chat, and reasoning without sending data to cloud APIs or needing a dedicated GPU on their laptop.
┌──────────────────────────────┐ ┌──────────────────────────────────┐
│ LAPTOP │ │ ANDROID PHONE │
│ Any OpenAI-compatible client │◄──────┤ Ktor HTTP Server (port 11434) │
│ (Continue, Cursor, Open │ USB │ ┌────────────────────────────┐ │
│ WebUI, custom scripts) │ or │ │ Inference Router │ │
│ │ WiFi │ │ ┌──────┐ ┌──────────────┐ │ │
└──────────────────────────────┘ │ │ │MLC │ │Google LiteRT │ │ │
│ │ │LLM │ │-LM │ │ │
│ │ │(GGUF)│ │(.litertlm) │ │ │
│ │ └──────┘ └──────────────┘ │ │
│ │ + Thermal Governor │ │
│ │ + Session + KV Cache │ │
│ │ + Bearer Auth │ │
│ └────────────────────────────┘ │
└──────────────────────────────────┘
# Build
cd android && ./gradlew assembleDebug
# Install
adb install app/build/outputs/apk/debug/app-debug.apk
# Launch
adb shell am start -n com.phonebrain/.ui.onboarding.OnboardingActivity
# Port forward (USB mode)
adb forward tcp:11434 tcp:11434
# Test
curl http://localhost:11434/health- Dual-engine inference — MLC LLM (GGUF, OpenCL) + Google LiteRT-LM (.litertlm, NPU), with CPU fallback
- OpenAI-compatible API —
/v1/chat/completionswith SSE streaming - USB or WiFi — localhost-only over ADB, bearer auth over WiFi
- Thermal governor — 3-tier auto-management (green/yellow/red)
- Resumable downloads — SHA256-verified model downloads via Android DownloadManager
- Session management — multi-turn context with configurable expiry
- KV cache — system prompt prefix reuse for 2–4x speedup
- mDNS discovery — zero-config WiFi setup
- On-device privacy — no prompts or responses ever leave the device
android/ # Android app (Kotlin)
├── app/
│ ├── build.gradle.kts # Dependencies & build config
│ ├── proguard-rules.pro
│ └── src/main/
│ ├── AndroidManifest.xml
│ └── java/com/phonebrain/
│ ├── auth/ # Bearer token management
│ ├── download/ # Model downloads & verification
│ ├── engine/ # Inference router & engine wrappers
│ ├── model/ # Data model entities
│ ├── server/ # Ktor HTTP server & routes
│ ├── service/ # Foreground service
│ ├── session/ # Session manager
│ ├── telemetry/ # Firebase Crashlytics
│ ├── thermal/ # Thermal governor
│ └── ui/ # Activities & fragments
├── mlc_engine_pack/ # Play Asset Delivery module
├── litert_engine_pack/ # Play Asset Delivery module
├── build.gradle.kts
└── settings.gradle.kts
specs/001-phonebrain-app/ # Feature specification
├── spec.md # Requirements & scenarios
├── plan.md # Implementation plan
├── research.md # Technical research
├── data-model.md # Entity definitions
├── tasks.md # Task breakdown (57 tasks)
├── contracts/
│ └── openai-api.md # API contract
├── checklists/
│ ├── requirements.md # Spec quality checklist
│ └── spec-coverage.md # Domain coverage checklist
├── reports/
│ └── verification-guide.md # Manual verification steps
└── quickstart.md # Quick start guide
.specify/
├── memory/constitution.md # Project constitution
└── templates/ # Speckit workflow templates
- Android: API 26+ (min), API 34+ (target)
- Hardware: Snapdragon 8 Gen 1+ / Dimensity 9000+ / Google Tensor G2+ recommended for acceptable performance
- Build: Android Studio Hedgehog+, Java 17+, Gradle 8.2+
| Component | Technology |
|---|---|
| Language | Kotlin |
| Server | Ktor (Netty) |
| GGUF Engine | MLC LLM (OpenCL) |
| .litertlm Engine | Google LiteRT-LM |
| Downloads | Android DownloadManager + WorkManager |
| Crash Reporting | Firebase Crashlytics |
| Service Discovery | Android NsdManager (mDNS) |
| APK Delivery | Play Asset Delivery |
Apache 2.0