
zuza

License: MIT Android API 28+ llama.cpp Tests Models

a warm little assistant that lives on your phone.

zuza is a private AI chat app for Android. Every conversation runs entirely on your device — no cloud, no account, no telemetry, no data leaving your phone. Pick an open-weight model from a curated catalog, download it over Wi-Fi, and chat. Works on budget phones that most AI apps refuse to touch.

Built with llama.cpp vendored natively for ARM, Jetpack Compose for the UI, and Room for persistence. One APK, seven CPU backend variants, zero external services.


How it works

```mermaid
graph LR
    U[User] -->|types| C[Compose UI]
    C -->|prompt| Z[zuza::Engine<br/>C++ / llama.cpp]
    Z -->|tokens| C
    C -->|persist| R[(Room DB)]
    M[GitHub manifest] -.->|catalog refresh| C
    HF[HuggingFace CDN] -.->|model download| D[DownloadService]
    D -->|.gguf file| Z
```
  1. User types a message in the Compose UI
  2. The chat screen builds a prompt using the active model's chat template
  3. The prompt is fed through zuza::Engine (a C++ wrapper around llama.cpp's inference API) via JNI
  4. Tokens stream back one at a time into the UI — Compose recomposes on each token
  5. Conversations persist to a Room database; the model's KV cache is reused across turns
  6. On context overflow (~60% of the window), a background summarization condenses older turns into a prose paragraph so the model never loses context
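The token-streaming path in steps 3–5 can be sketched in plain Kotlin. This is an illustrative stand-in, not zuza's real API — `FakeEngine` and `TokenListener` are hypothetical names standing in for the JNI-backed engine:

```kotlin
// Illustrative stand-in for zuza::Engine's streaming path: the engine
// emits tokens one at a time through a callback, and the UI layer
// appends each token to the visible message (Compose would recompose
// on each append). FakeEngine/TokenListener are hypothetical names.
fun interface TokenListener {
    fun onToken(token: String)
}

class FakeEngine(private val reply: List<String>) {
    // Mimics the decode/sample loop: one callback per generated token.
    fun generate(prompt: String, listener: TokenListener) {
        for (tok in reply) listener.onToken(tok)
    }
}

fun main() {
    val visible = StringBuilder()
    val engine = FakeEngine(listOf("Hi", " there", "!"))
    engine.generate("hello") { visible.append(it) } // UI recomposes per token
    println(visible) // Hi there!
}
```

The callback shape is what lets Compose render partial answers: each `onToken` mutates observable state, so the bubble grows token by token instead of waiting for the full reply.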

Features

Inference

  • Fat APK with 7 CPU variants — ships libggml-cpu-android_armv8.0_1.so through libggml-cpu-android_armv9.2_2.so. Runtime dispatch picks the best one for the current SoC: baseline for Cortex-A53, DOTPROD/FP16 for A76-class, i8mm/SVE2 for A715+.
  • Dynamic context window — n_ctx scales with device RAM (1024 on 3 GB → 8192 on 12 GB+) so budget phones aren't OOM'd and flagships get the full window.
  • Multi-turn KV cache reuse — turn 2+ appends to the existing cache via continueConversation; only the new user turn is tokenized + decoded.
  • Background summarization — when the cache crosses 60% capacity, older turns are summarized into a prose paragraph during the idle window between turns. The summary replaces those turns in the next prompt rebuild. If the user outruns the background path, an inline fallback fires at 85%.
  • Qwen 3 think-tag parsing — <think>...</think> blocks are hidden; the UI shows a pulsing "thinking" indicator until the real answer starts streaming.
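The RAM → n_ctx mapping can be sketched as a simple breakpoint table. Only the endpoints (1024 at 3 GB, 8192 at 12 GB+) come from this README; the intermediate steps are illustrative guesses, not ContextBudget.kt's actual table:

```kotlin
// Sketch of dynamic n_ctx selection from device RAM. The 1024/8192
// endpoints are from the README; the 4096/2048 middle steps are
// assumed, not the app's real breakpoints.
const val GIB = 1024L * 1024 * 1024

fun contextBudget(totalRamBytes: Long): Int = when {
    totalRamBytes >= 12 * GIB -> 8192 // flagships get the full window
    totalRamBytes >= 8 * GIB  -> 4096 // assumed intermediate step
    totalRamBytes >= 6 * GIB  -> 2048 // assumed intermediate step
    else                      -> 1024 // 3 GB-class budget phones
}
```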

Models

  • Remote catalog — the app fetches a JSON manifest from zuza-chat/models on GitHub at startup. Adding a model = editing the JSON; every install picks it up on next refresh. Falls back to a bundled list if unreachable.
  • Resumable downloads — HTTP Range + ETag resume. Partial .part files survive app kills and network drops; re-tap Download to continue.
  • Foreground download service — downloads run in an Android foreground service with an ongoing notification. Lock the screen, minimize the app, use other apps — the download keeps going.
  • GGUF magic validation — every downloaded file is checked for the GGUF magic bytes before it's declared ready. Silent CDN corruption or zero-filled responses are caught, not silently loaded.
  • RAM fit warnings — the picker checks totalMem against each model's runtime footprint and shows tight / won't fit badges with confirmation dialogs.
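The GGUF magic check above amounts to reading the first four bytes of the downloaded file: every valid GGUF file begins with the ASCII bytes "GGUF". A minimal sketch (`hasGgufMagic` is an illustrative name, not the app's actual function):

```kotlin
import java.io.File

// Sketch of GGUF magic validation: a valid .gguf file starts with the
// four ASCII bytes "GGUF". A zero-filled or truncated download fails
// this check and is never handed to the engine.
fun hasGgufMagic(file: File): Boolean {
    val header = ByteArray(4)
    file.inputStream().use { input ->
        if (input.read(header) != 4) return false // too short to be GGUF
    }
    return header.contentEquals("GGUF".toByteArray(Charsets.US_ASCII))
}
```

Checking magic bytes rather than file size is what catches the zero-filled-response failure mode: a corrupt CDN reply can have exactly the right length while containing no model at all.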

Personality

  • Three-tier system prompt — tiny models (LFM 2 350M) get a one-sentence prompt; mid-size models (Gemma 3 1B, Llama 3.2 1B) get a standard prompt; large models (Qwen 3 1.7B, Gemma 3 4B, Gemma 4) get a rich personality with backstory, opinions, and anti-corporate-speak guards.
  • Name personalization — first-run onboarding asks "what should I call you?"; the system prompt weaves the name into its instructions. Changeable in Settings.
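The three-tier selection reduces to picking a prompt tier from model size. The tier names follow the list above; the boundary values and the `tierFor` shape are illustrative, not SystemPrompt.kt's real API:

```kotlin
// Sketch of three-tier system prompt selection by model size.
// Boundaries (500M / 1.5B) are assumed for illustration.
enum class PromptTier { TINY, STANDARD, RICH }

fun tierFor(paramsMillions: Int): PromptTier = when {
    paramsMillions < 500  -> PromptTier.TINY     // e.g. LFM 2 350M
    paramsMillions < 1500 -> PromptTier.STANDARD // e.g. Gemma 3 1B, Llama 3.2 1B
    else                  -> PromptTier.RICH     // e.g. Qwen 3 1.7B and up
}
```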

UI / UX

  • Onboarding — first-launch welcome screen with name input, skip option, and privacy tagline.
  • Dark UI — cerulean accent, Inter variable font, Lucide bot icon.
  • Markdown rendering — assistant bubbles render bold, italic, code, fenced blocks, lists, headers via compose-richtext.
  • Saved conversations — Room-backed persistence. Every chat is reachable from the drawer with title, message count, and timestamp.
  • Settings — thread count, temperature, max tokens, cellular data toggle, your name, catalog source URL, manual refresh.

Model catalog

| Model | Size | Template | Personality | RAM | Speed (budget) |
|---|---|---|---|---|---|
| LFM 2 350M | 219 MB | ChatML | tiny | 3 GB+ | ~8–12 tok/s |
| Gemma 3 1B | 769 MB | Gemma 3 | standard | 4 GB+ | ~3–4 tok/s |
| Llama 3.2 1B | 770 MB | Llama 3 | standard | 4 GB+ | ~3–4 tok/s |
| Qwen 3 1.7B | 1.19 GB | ChatML | rich | 4 GB+ | ~1–1.5 tok/s |
| Gemma 3 4B | 2.32 GB | Gemma 3 | rich | 6 GB+ | ~0.5–1 tok/s |
| Gemma 4 E2B | 3.22 GB | Gemma 4 | rich | 6 GB+ | ~0.3–0.6 tok/s |
| Gemma 4 E4B | 5.03 GB | Gemma 4 | rich | 8 GB+ | flagship only |

Templates are implemented in Kotlin (ChatTemplate.kt), not via llama.cpp's built-in Jinja engine. Each variant has full unit tests. Gemma 4 uses different turn markers (<|turn> / <turn|>) from Gemma 3 (<start_of_turn> / <end_of_turn>) — mixing them up causes the model to hallucinate fake dialogues, which is how we discovered the difference.
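The marker difference can be made concrete. The marker strings below come from the paragraph above; the enum shape and `wrap()` signature are illustrative, not ChatTemplate.kt's real interface:

```kotlin
// Sketch of the Gemma 3 vs Gemma 4 turn-marker difference. Mixing the
// two sets is the failure mode described above (hallucinated dialogues).
enum class TurnMarkers(val open: String, val close: String) {
    GEMMA3("<start_of_turn>", "<end_of_turn>"),
    GEMMA4("<|turn>", "<turn|>");

    // Illustrative wrapper: one turn = open marker, role, body, close marker.
    fun wrap(role: String, text: String): String = "$open$role\n$text$close\n"
}
```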

The remote manifest lives at zuza-chat/models. See docs/MODELS.md for the schema, per-model notes, and how to add entries.


Architecture

```
chat.zuza/
├── MainActivity.kt                    app entry point + top-level state
├── engine/
│   ├── Zuza.kt                        singleton over libzuza.so (load/generate/stop)
│   ├── ZuzaNative.kt                  JNI external fun declarations
│   ├── ZuzaParams.kt                  ZuzaLoadParams + ZuzaGenParams
│   ├── ContextBudget.kt               dynamic n_ctx from device RAM
│   ├── ContextSummarizer.kt           two-tier memory: summarize older turns
│   ├── BudgetChecker.kt               soft (60%) / hard (85%) threshold logic
│   └── TokenEstimator.kt              rough char-to-token heuristic
├── data/
│   ├── catalog/
│   │   ├── ChatTemplate.kt            LLAMA3 / CHATML / PHI3 / GEMMA3 / GEMMA4
│   │   ├── ModelInfo.kt               data class for a single catalog entry
│   │   ├── ModelCatalog.kt            active catalog (mutableStateOf, replaceable)
│   │   ├── CatalogJson.kt             JSON parser for the remote manifest
│   │   └── RemoteCatalog.kt           fetch + cache + bootstrap logic
│   ├── download/
│   │   ├── DownloadService.kt         foreground service for background downloads
│   │   ├── DownloadStateRegistry.kt   process-wide SnapshotStateMap of progress
│   │   ├── ModelRepository.kt         HTTP Range + ETag resume, .part → .gguf
│   │   ├── DownloadState.kt           sealed interface (NotDownloaded/Downloading/...)
│   │   └── ResumeStore.kt             per-model ETag + totalBytes persistence
│   ├── conversations/
│   │   ├── ConversationStore.kt       save / load / delete + legacy JSON migration
│   │   └── room/                      DAO, entities, migrations, ZuzaDatabase
│   └── preferences/
│       └── SettingsStore.kt           SharedPreferences-backed flows
├── ui/
│   ├── theme/Theme.kt                 Inter + cerulean palette
│   ├── common/                        DottedRule, ZuzaSeal, CircleIconButton
│   ├── onboarding/OnboardingScreen.kt first-run name input
│   ├── chat/                          ChatScreen + 8 focused siblings
│   ├── models/                        ModelsScreen + ModelPill + RamFit
│   ├── drawer/                        ZuzaDrawer + row composables
│   ├── settings/                      SettingsScreen + controls
│   └── about/                         AboutScreen
├── util/DeviceRam.kt                  Context.deviceTotalRamBytes()
└── cpp/
    ├── zuza_engine.h                  public C++ class (zuza::Engine)
    ├── zuza_engine.cpp                implementation (load/generate/poll/stop)
    ├── zuza_jni.cpp                   thin JNI marshalling (~95 lines)
    ├── CMakeLists.txt                 build config + multi-variant toggle
    └── llama/                         vendored llama.cpp (untouched)
```

```mermaid
graph TD
    subgraph "Kotlin"
        MA[MainActivity] --> CS[ChatScreen]
        MA --> MS[ModelsScreen]
        MA --> SS[SettingsScreen]
        MA --> OS[OnboardingScreen]
        CS --> Z[Zuza singleton]
        CS --> CSm[ContextSummarizer]
        MS --> DS[DownloadService]
        DS --> MR[ModelRepository]
        DS --> DSR[DownloadStateRegistry]
        Z --> CB[ContextBudget]
        Z --> BC[BudgetChecker]
        MA --> RC[RemoteCatalog]
        RC --> MC[ModelCatalog]
        RC --> CJ[CatalogJson]
    end
    subgraph "C++ / NDK"
        Z --> ZN[ZuzaNative]
        ZN --> ZE[zuza::Engine]
        ZE --> LC[llama.cpp]
    end
    subgraph "Storage"
        MR --> FS[(filesDir/models/*.gguf)]
        CS --> Room[(Room: zuza.db)]
        RC --> Cache[(filesDir/catalog.json)]
    end
    subgraph "Network"
        MR --> HF[HuggingFace CDN]
        RC --> GH[GitHub raw]
    end
```

No ViewModels, no DI framework. State is hoisted to MainActivity in idiomatic Compose fashion. Unit-testable logic lives in pure Kotlin files with zero Compose or Android dependencies, covered by 149 JVM tests.


Building

Requirements:

  • Android Studio Ladybug or newer (or just the command-line SDK tools)
  • Android NDK r27.1.12297006
  • CMake 3.22+
  • JDK 17
  • ~5 GB disk (llama.cpp compiles seven CPU variant .so files)
```sh
# Clone
git clone https://github.com/zuza-chat/zuza.git
cd zuza

# Build
./gradlew :app:assembleDebug

# Install on a connected device
./gradlew :app:installDebug

# Run the tests (149 JVM tests, ~3s)
./gradlew :app:testDebugUnitTest
```

First build takes 1–2 minutes (llama.cpp compiles once per variant). Incremental builds are seconds.


Tests

149 pure-JVM unit tests, no Robolectric, no emulator, ~3 seconds total.

| Area | Tests | What they cover |
|---|---|---|
| ChatTemplate | 12 | Every template variant × begin/continue/wrap; GEMMA4 marker regression |
| PromptBuilder | 9 | Double-user-turn regression, summary-aware rebuild, empty-turns guard |
| SystemPrompt | 12 | Anonymous, personalized, rich, tiny; format guards; length caps |
| ContextBudget | 15 | RAM breakpoints, boundary conditions, degenerate inputs |
| TokenEstimator | 5 | Scaling, overhead, conservative bias |
| BudgetChecker | 11 | Hard/soft thresholds, custom ratios, zero-nCtx |
| ContextSummarizer | 8 | Prompt builder purity, system prompt constant, gen params |
| ConversationStore | 15 | CRUD, ordering, summary round-trip, legacy JSON migration |
| ModelRepository | 9 | 200/206/200-fallback/404, GGUF magic, resume, delete |
| AssistantContent | 9 | Think-tag parsing, nested tags, incomplete streams |
| RamFit | 10 | Fine/Tight/WontFit thresholds |
| CatalogJson | 15 | Happy path, schema versions, per-entry validation, round-trip |
| RemoteCatalog | 12 | Remote success, network error, cache fallback, corrupt cache |

```sh
# Run a single test class
./gradlew :app:testDebugUnitTest --tests "chat.zuza.engine.ContextBudgetTest"

# Run all tests in a package
./gradlew :app:testDebugUnitTest --tests "chat.zuza.data.catalog.*"
```

Device support

| Tier | Example devices | Models that work well |
|---|---|---|
| Budget (3 GB) | Redmi 14, Galaxy A16, HONOR X5 | LFM 2 350M, Gemma 3 1B |
| Mid-range (6–8 GB) | Pixel 10a, Galaxy A56, Poco X7, Nothing Phone 2a | + Llama 3.2 1B, Qwen 3 1.7B, Gemma 3 4B |
| Upper mid (8–12 GB) | Pixel 10, Galaxy S25, OnePlus 13 | + Gemma 4 E2B |
| Flagship (12–16 GB) | Galaxy S26 Ultra, Pixel 10 Pro, OnePlus 14 | Full catalog including Gemma 4 E4B |

Privacy

zuza has exactly two network calls:

  1. Model downloads — HTTPS GET to HuggingFace CDN URLs listed in the manifest
  2. Catalog refresh — HTTPS GET to raw.githubusercontent.com to fetch the model list JSON

That's it. No analytics, no crash reporter, no telemetry, no feature flags, no account system, no server-side anything. The ACCESS_NETWORK_STATE permission detects metered connections for the cellular data warning. Inspect the source — there's nothing else to find.


Contributing

See docs/CONTRIBUTING.md for how to add a model, implement a new chat template, or extend the UI.


Third-party code

| Dependency | License | What it does |
|---|---|---|
| llama.cpp | MIT | Vendored under cpp/llama/, compiled natively for ARM |
| Inter | OFL 1.1 | Variable font for all text |
| Lucide | MIT | Bot icon (res/drawable/ic_robot.xml) |
| compose-richtext | Apache 2.0 | Markdown rendering in assistant bubbles |
| OkHttp MockWebServer | Apache 2.0 | Test-only: HTTP server for ModelRepository + RemoteCatalog tests |

License

MIT — do whatever you want with the code, keep the copyright notice intact.
