A fully on-device AI assistant for Android that runs Qwen2.5-1.5B or Llama-3.2-3B (INT4) on the Snapdragon 8 Elite Hexagon NPU without requesting the INTERNET permission. It features a 4-tier inference router, an on-device tool layer, and kernel-enforced privacy: your data never leaves the device.
Demo video: `screen-20260607-182502.mp4`, a live run on a Snapdragon 8 Elite device showing tier routing, on-device tool calls, and the privacy dashboard.
MyAI uses a 4-tier inference router to deliver sub-second responses for common queries while reserving the full LLM for open-ended generation:
| Tier | Strategy | Latency | Description |
|---|---|---|---|
| Tier 0 | Keyword Trie | <1 ms | Prefix-tree lookup for greetings, FAQs, identity questions |
| Tier 1 | Learning Cache | <10 ms | LRU cache of previous LLM responses for repeated queries |
| Tier 1.5 | Deterministic Tools | <10 ms | Structured JSON tool-calls (calculator, datetime, calendar, etc.) |
| Tier 2 | INT4 LLM on NPU | ~30 tok/s | Qwen2.5-1.5B or Llama-3.2-3B (w4a16) via Qualcomm Genie SDK |
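
To make the two cheapest tiers concrete, here is a minimal sketch of a prefix-tree matcher and an LRU response cache. It is not the project's actual `KeywordTrie.kt` or `LearningCache.kt`: the class names, the query normalization, and the word-boundary rule are all assumptions chosen for illustration.

```kotlin
// Hypothetical sketch; names and matching rules are illustrative assumptions.

/** Tier 0: character trie mapping normalized phrases to canned responses. */
class KeywordTrieSketch {
    private class Node {
        val children = HashMap<Char, Node>()
        var response: String? = null
    }

    private val root = Node()

    private fun normalize(text: String): String =
        text.trim().lowercase().replace(Regex("\\s+"), " ")

    fun insert(phrase: String, response: String) {
        var node = root
        for (ch in normalize(phrase)) {
            node = node.children.getOrPut(ch) { Node() }
        }
        node.response = response
    }

    /** Returns the response of the longest stored phrase that prefixes the query at a word boundary, or null. */
    fun lookup(query: String): String? {
        val q = normalize(query)
        var node = root
        var best: String? = null
        for ((i, ch) in q.withIndex()) {
            node = node.children[ch] ?: break
            // Only accept a match that ends the query or is followed by a non-alphanumeric character.
            val atWordBoundary = i == q.lastIndex || !q[i + 1].isLetterOrDigit()
            if (atWordBoundary) node.response?.let { best = it }
        }
        return best
    }
}

/** Tier 1: LRU cache of earlier LLM responses, keyed by a normalized query. */
class LearningCacheSketch(private val capacity: Int) {
    // accessOrder = true reorders entries on get(), so the eldest entry is the least recently used.
    private val map = object : LinkedHashMap<String, String>(16, 0.75f, true) {
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<String, String>?): Boolean =
            size > capacity
    }

    private fun key(query: String) = query.trim().lowercase()

    fun get(query: String): String? = map[key(query)]

    fun put(query: String, response: String) {
        map[key(query)] = response
    }
}
```

With this sketch, inserting `"hello"` lets `lookup("Hello, how are you?")` hit, while `lookup("helloworld")` misses because the match does not end on a word boundary. The cache evicts the least recently read entry once `capacity` is exceeded.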
The router evaluates tiers top-down and short-circuits at the first match, minimizing NPU wake-ups and power draw.
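
The top-down, short-circuiting evaluation can be sketched as follows. This is an assumption-laden illustration rather than the contents of `TierRouter.kt`: the `Tier` interface, the `Routed` result type, and the `llm` callback (standing in for the NPU-backed model) are hypothetical names.

```kotlin
// Hypothetical sketch of top-down tier routing; all names are illustrative.

/** A tier either answers the query or returns null to pass it down. */
fun interface Tier {
    fun tryAnswer(query: String): String?
}

/** Which tier produced the response, plus the response itself. */
data class Routed(val tier: String, val response: String)

class TierRouterSketch(
    private val tiers: List<Pair<String, Tier>>, // cheap tiers, in priority order
    private val llm: (String) -> String          // Tier 2 fallback, only reached when every cheap tier misses
) {
    fun route(query: String): Routed {
        for ((name, tier) in tiers) {
            val answer = tier.tryAnswer(query)
            // Short-circuit: once a tier answers, later tiers and the LLM never run.
            if (answer != null) return Routed(name, answer)
        }
        return Routed("Tier 2", llm(query))
    }
}
```

Because the LLM is passed in as a plain function, a unit test can substitute a counter or a stub and assert that it was never invoked for queries that an earlier tier handles.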
- No INTERNET permission in AndroidManifest.xml -- the kernel blocks all network syscalls
- Gradle CI gate (`assertNoInternet` task) fails the build if INTERNET is ever added
- TrafficStats verification confirms zero bytes transmitted at runtime
- All inference, tool execution, and data storage happen entirely on-device
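
The core check behind a build gate like this can be sketched in plain Kotlin. The function below is hypothetical and is not the project's `assertNoInternet` task; it only shows one way to detect the permission in manifest XML. A real gate would probably inspect the merged manifest, since library dependencies can contribute permissions of their own.

```kotlin
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element
import org.xml.sax.InputSource

/**
 * Hypothetical helper: returns true if the given manifest XML declares
 * android.permission.INTERNET in a <uses-permission> element.
 */
fun declaresInternetPermission(manifestXml: String): Boolean {
    // The default factory is not namespace-aware, so attributes are looked up by qualified name.
    val doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(InputSource(StringReader(manifestXml)))
    val nodes = doc.getElementsByTagName("uses-permission")
    for (i in 0 until nodes.length) {
        val element = nodes.item(i) as Element
        if (element.getAttribute("android:name") == "android.permission.INTERNET") return true
    }
    return false
}
```

A Gradle task could read the manifest file, call a check of this kind, and throw when it returns true so that the build fails.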
The agent loop detects tool-call intent from the LLM's structured JSON output and dispatches to local tool implementations:
| Tool | Capability |
|---|---|
| `CalculatorTool` | Arithmetic evaluation via structured expressions |
| `DateTimeTool` | Current date, time, timezone queries |
| `CalendarTool` | Read/create calendar events via ContentResolver |
| `ContactsTool` | Contact lookup via ContentResolver |
| `AlarmTool` | Set alarms via AlarmManager intent |
Tools are registered in `ToolRegistry` and invoked by `AgentLoop`, which manages the detect-execute-format cycle.
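
The detect-execute-format cycle might look roughly like the sketch below. Everything here is an assumption: the interface shape, the `{"tool": ..., ...}` call format, and the flat string arguments are invented for illustration and may differ from the project's `Tool.kt`, `ToolRegistry.kt`, and `AgentLoop.kt`. The regex-based field extraction is a stand-in to keep the example dependency-free; real code would use a proper JSON parser.

```kotlin
// Hypothetical sketch; interface shape and JSON call format are assumptions.

interface ToolSketch {
    val name: String
    fun execute(args: Map<String, String>): String
}

class ToolRegistrySketch(tools: List<ToolSketch>) {
    private val byName = tools.associateBy { it.name }
    fun find(name: String): ToolSketch? = byName[name]
}

/** Example tool: applies one binary arithmetic operation to operands "a" and "b". */
class CalculatorSketch : ToolSketch {
    override val name = "calculator"
    override fun execute(args: Map<String, String>): String {
        val a = args["a"]?.toDoubleOrNull() ?: return "error: missing operand a"
        val b = args["b"]?.toDoubleOrNull() ?: return "error: missing operand b"
        return when (args["op"]) {
            "add" -> (a + b).toString()
            "sub" -> (a - b).toString()
            "mul" -> (a * b).toString()
            "div" -> if (b == 0.0) "error: division by zero" else (a / b).toString()
            else -> "error: unknown op"
        }
    }
}

class AgentLoopSketch(private val registry: ToolRegistrySketch) {
    // Matches "key": "value" pairs with flat string values only.
    private val field = Regex("\"(\\w+)\"\\s*:\\s*\"([^\"]*)\"")

    fun handle(llmOutput: String): String {
        val text = llmOutput.trim()
        // Detect: treat the output as a tool call only if it looks like a JSON object naming a tool.
        if (!text.startsWith("{") || !text.endsWith("}")) return text
        val fields = field.findAll(text).associate { it.groupValues[1] to it.groupValues[2] }
        val toolName = fields["tool"] ?: return text
        // Execute: dispatch to the registered local implementation.
        val tool = registry.find(toolName) ?: return "Unknown tool: $toolName"
        val result = tool.execute(fields - "tool")
        // Format: wrap the raw result for display.
        return "$toolName result: $result"
    }
}
```

Plain-text model output passes through unchanged, so the same entry point can serve both ordinary replies and tool calls.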
- ~30 tok/s sustained generation on Hexagon NPU v79 (Snapdragon 8 Elite)
- <1 ms Tier 0 keyword trie responses
- <10 ms Tier 1 cache hits and Tier 1.5 tool executions
- Zero network egress verified via TrafficStats and packet capture
- 43 passing JVM unit tests covering router logic, trie, cache, tools, and agent loop
- Android Studio Hedgehog or later
- Snapdragon 8 Elite device (SM8750) with USB debugging enabled
- Qualcomm Genie SDK (place in `android/genie-sdk/`)
- Quantized model files (place in `models/`; not tracked in git)
- JDK 17+, Gradle 8.x
# Place Genie SDK and model files
cp -r /path/to/genie-sdk android/genie-sdk/
cp /path/to/qwen2.5-1.5b-w4a16.bin models/
# Build and install
cd android
./gradlew assembleDebug
adb install app/build/outputs/apk/debug/app-debug.apk

android/app/src/main/java/com/sparq/myai/
├── MainActivity.kt # Chat UI and lifecycle
├── TierRouter.kt # 4-tier routing logic
├── KeywordTrie.kt # Tier 0: prefix-tree matcher
├── LearningCache.kt # Tier 1: LRU response cache
├── CannedResponses.kt # Static response templates
└── tools/
    ├── Tool.kt # Tool interface
    ├── ToolRegistry.kt # Tool discovery and dispatch
    ├── AgentLoop.kt # Tool-call detect-execute-format cycle
    ├── CalculatorTool.kt # Arithmetic evaluation
    ├── DateTimeTool.kt # Date/time queries
    ├── CalendarTool.kt # Calendar access
    ├── ContactsTool.kt # Contacts lookup
    └── AlarmTool.kt # Alarm scheduling
android/app/src/test/java/com/sparq/myai/
├── TierRouterTest.kt # Router unit tests
├── KeywordTrieTest.kt # Trie unit tests
├── CannedResponsesTest.kt # Response template tests
└── tools/
    ├── CalculatorToolTest.kt
    ├── ToolRegistryTest.kt
    └── AgentLoopTest.kt
This project was developed for the Qualcomm Sparq 2026 Hackathon (Edge + Cloud AI track).
