Skip to content

unn-Known1/PhoneBrain

Repository files navigation

PhoneBrain

Turn any modern Android phone into a local AI inference server your laptop talks to like a GPU — no cloud, no GPU, no friction.

Overview

PhoneBrain is an Android foreground service that runs an OpenAI-compatible REST API over USB or WiFi. It lets developers run local LLMs for code completion, chat, and reasoning without sending data to cloud APIs or needing a dedicated GPU on their laptop.

┌──────────────────────────────┐       ┌──────────────────────────────────┐
│         LAPTOP                │       │          ANDROID PHONE           │
│  Any OpenAI-compatible client │◄──────┤  Ktor HTTP Server (port 11434)  │
│  (Continue, Cursor, Open     │  USB  │  ┌────────────────────────────┐  │
│   WebUI, custom scripts)     │  or   │  │ Inference Router           │  │
│                              │  WiFi  │  │ ┌──────┐ ┌──────────────┐ │  │
└──────────────────────────────┘       │  │ │MLC   │ │Google LiteRT │ │  │
                                        │  │ │LLM   │ │-LM          │ │  │
                                        │  │ │(GGUF)│ │(.litertlm)  │ │  │
                                        │  │ └──────┘ └──────────────┘ │  │
                                        │  │ + Thermal Governor        │  │
                                        │  │ + Session + KV Cache      │  │
                                        │  │ + Bearer Auth             │  │
                                        │  └────────────────────────────┘  │
                                        └──────────────────────────────────┘

Quick Start

# Build
cd android && ./gradlew assembleDebug

# Install
adb install app/build/outputs/apk/debug/app-debug.apk

# Launch
adb shell am start -n com.phonebrain/.ui.onboarding.OnboardingActivity

# Port forward (USB mode)
adb forward tcp:11434 tcp:11434

# Test
curl http://localhost:11434/health

Features

  • Dual-engine inference — MLC LLM (GGUF, OpenCL) + Google LiteRT-LM (.litertlm, NPU), with CPU fallback
  • OpenAI-compatible API/v1/chat/completions with SSE streaming
  • USB or WiFi — localhost-only over ADB, bearer auth over WiFi
  • Thermal governor — 3-tier auto-management (green/yellow/red)
  • Resumable downloads — SHA256-verified model downloads via Android DownloadManager
  • Session management — multi-turn context with configurable expiry
  • KV cache — system prompt prefix reuse for 2–4x speedup
  • mDNS discovery — zero-config WiFi setup
  • On-device privacy — no prompts or responses ever leave the device

Project Structure

android/                          # Android app (Kotlin)
├── app/
│   ├── build.gradle.kts          # Dependencies & build config
│   ├── proguard-rules.pro
│   └── src/main/
│       ├── AndroidManifest.xml
│       └── java/com/phonebrain/
│           ├── auth/             # Bearer token management
│           ├── download/         # Model downloads & verification
│           ├── engine/           # Inference router & engine wrappers
│           ├── model/            # Data model entities
│           ├── server/           # Ktor HTTP server & routes
│           ├── service/          # Foreground service
│           ├── session/          # Session manager
│           ├── telemetry/        # Firebase Crashlytics
│           ├── thermal/          # Thermal governor
│           └── ui/               # Activities & fragments
├── mlc_engine_pack/              # Play Asset Delivery module
├── litert_engine_pack/           # Play Asset Delivery module
├── build.gradle.kts
└── settings.gradle.kts

specs/001-phonebrain-app/          # Feature specification
├── spec.md                        # Requirements & scenarios
├── plan.md                        # Implementation plan
├── research.md                    # Technical research
├── data-model.md                  # Entity definitions
├── tasks.md                       # Task breakdown (57 tasks)
├── contracts/
│   └── openai-api.md              # API contract
├── checklists/
│   ├── requirements.md            # Spec quality checklist
│   └── spec-coverage.md           # Domain coverage checklist
├── reports/
│   └── verification-guide.md      # Manual verification steps
└── quickstart.md                  # Quick start guide

.specify/
├── memory/constitution.md         # Project constitution
└── templates/                     # Speckit workflow templates

Requirements

  • Android: API 26+ (min), API 34+ (target)
  • Hardware: Snapdragon 8 Gen 1+ / Dimensity 9000+ / Google Tensor G2+ recommended for acceptable performance
  • Build: Android Studio Hedgehog+, Java 17+, Gradle 8.2+

Tech Stack

Component Technology
Language Kotlin
Server Ktor (Netty)
GGUF Engine MLC LLM (OpenCL)
.litertlm Engine Google LiteRT-LM
Downloads Android DownloadManager + WorkManager
Crash Reporting Firebase Crashlytics
Service Discovery Android NsdManager (mDNS)
APK Delivery Play Asset Delivery

License

Apache 2.0

About

Android AI inference server with OpenAI-compatible API. Turn your phone into a local LLM co-processor — runs MLC LLM (GGUF) + LiteRT-LM (.litertlm) with dual-engine routing, bearer auth, thermal governor, KV cache, and resumable model downloads. No cloud, no GPU, no friction.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages