🧠 Offline.AI

Offline AI Chat for Android: run llama.cpp locally in Java with no internet, no cloud, and full privacy.



📖 Overview

Offline.AI is an Android app that runs llama.cpp models fully on-device, written in Java and integrated through JNI (Java Native Interface).
It provides an offline AI chat experience: no internet required, no cloud inference, and complete data privacy.

This project showcases how to embed llama.cpp inside an Android app, load GGUF models, and perform text generation locally.
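
The Java side of such a bridge boils down to a small class of native method declarations. Below is a minimal sketch, assuming a class named LlamaBridge and a native library named llama-android; the actual class, method, and library names in this repo may differ.

    // Hypothetical Java-side JNI bridge; all identifiers here are assumptions
    // for illustration, not the repo's actual code.
    public final class LlamaBridge {

        static {
            // Load the native library built from src/main/cpp.
            System.loadLibrary("llama-android");
        }

        // Loads a GGUF model from an absolute file path; returns true on success.
        public static native boolean loadModel(String modelPath);

        // Generates a completion for the prompt, invoking the callback once per
        // decoded token so the UI can stream partial text.
        public static native void generate(String prompt, TokenCallback callback);

        // Frees the native model and context.
        public static native void release();

        public interface TokenCallback {
            void onToken(String token);
        }
    }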

⚠️ Note: This is a proof-of-concept (POC) intended for learning and experimentation, not a production-ready app.


✨ Features

  • ✅ 100% offline LLM inference, with no network calls
  • ✅ Java + JNI bridge to native llama.cpp
  • ✅ Local model loader for GGUF models
  • ✅ Streaming chat interface built with RecyclerView (see the adapter sketch after this list)
  • ✅ Works on Android 12+ (API 31+)
  • ✅ Privacy-first design: your data never leaves your phone
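
To illustrate the streaming UI, here is a minimal RecyclerView adapter sketch; the class and method names are illustrative, not the app's actual code. Each streamed token is appended to the newest chat message and only that row is re-bound.

    import android.view.ViewGroup;
    import android.widget.TextView;
    import androidx.annotation.NonNull;
    import androidx.recyclerview.widget.RecyclerView;
    import java.util.ArrayList;
    import java.util.List;

    // Illustrative adapter: each chat message is a plain TextView row, and
    // streamed tokens are appended to the newest message as they arrive.
    public class ChatAdapter extends RecyclerView.Adapter<ChatAdapter.Holder> {

        static class Holder extends RecyclerView.ViewHolder {
            Holder(TextView view) { super(view); }
        }

        private final List<StringBuilder> messages = new ArrayList<>();

        // Start a new, empty assistant message before generation begins.
        public void startMessage() {
            messages.add(new StringBuilder());
            notifyItemInserted(messages.size() - 1);
        }

        // Append one streamed token; must be called on the UI thread.
        public void appendToken(String token) {
            messages.get(messages.size() - 1).append(token);
            notifyItemChanged(messages.size() - 1);
        }

        @NonNull @Override
        public Holder onCreateViewHolder(@NonNull ViewGroup parent, int viewType) {
            return new Holder(new TextView(parent.getContext()));
        }

        @Override
        public void onBindViewHolder(@NonNull Holder holder, int position) {
            ((TextView) holder.itemView).setText(messages.get(position));
        }

        @Override
        public int getItemCount() { return messages.size(); }
    }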

🚀 Quick Start (Alternative: Install APK)

If you don't want to build, you can simply download the APK from the repository's Releases section, install it on your device, and run it directly.

Settings → Security → allow installing apps from unknown sources (if prompted).


📦 Prerequisites

  1. Clone llama.cpp (native engine used by the app):
    git clone https://github.com/ggml-org/llama.cpp
  2. Android Studio (Electric Eel or newer) with Android NDK and CMake components installed.
  3. Android device running Android 12+ (API 31+), with 4–6 GB of RAM or more recommended.
  4. A GGUF model (e.g., Llama 3.2 1B).

🛠️ Build & Run (Step-by-step)

1) Clone this repository

git clone https://github.com/weaktogeek/llama.cpp-android-java.git
cd llama.cpp-android-java

2) Ensure you're on the main branch

git checkout main

3) Open the project in Android Studio

  • File → Open… → select the project folder.
  • Let Gradle sync and the NDK/CMake components finish downloading if prompted.

4) Configure the native build path (CMakeLists.txt)

  • Open the llama module → src/main/cpp/ → CMakeLists.txt.
  • At line 36, update the path that points to your local llama.cpp build directory (the repo you cloned in Prerequisites).
    • Example (adjust to your machine):
      # Example: point this to your local llama.cpp build dir
      set(LLAMA_BUILD_DIR "/absolute/path/to/llama.cpp/build-llama")
  • If build-llama does not exist yet, create it or adjust the path to the correct native source/build location within your cloned llama.cpp repo.

5) Sync & Build

  • Click Sync Project with Gradle Files.
  • Select a physical device (recommended) or compatible emulator (x86_64, plenty of RAM).
  • Click Run ▶ to build and install the app.

6) First Launch

When the app launches:

  1. Grant storage permission (used only to let you pick model files from device storage; a file-picker sketch follows below).
  2. Prepare a GGUF model (example: llama-3.2-1b-instruct.Q4_K_M.gguf).
    • Place it anywhere accessible on your device (e.g., Downloads/).
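
One common way to let the user pick a .gguf file is the Storage Access Framework; the sketch below is an assumption about how such a picker could look, not the app's actual selection flow or class names.

    import android.app.Activity;
    import android.content.Intent;
    import android.net.Uri;
    import android.os.Bundle;

    // Hypothetical model picker using ACTION_OPEN_DOCUMENT; the app's real
    // file-selection flow may differ.
    public class ModelPickerActivity extends Activity {

        private static final int PICK_MODEL_REQUEST = 42;

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            pickModelFile();
        }

        private void pickModelFile() {
            Intent intent = new Intent(Intent.ACTION_OPEN_DOCUMENT);
            intent.addCategory(Intent.CATEGORY_OPENABLE);
            // GGUF has no registered MIME type, so accept any file and check the name later.
            intent.setType("*/*");
            startActivityForResult(intent, PICK_MODEL_REQUEST);
        }

        @Override
        protected void onActivityResult(int requestCode, int resultCode, Intent data) {
            super.onActivityResult(requestCode, resultCode, data);
            if (requestCode == PICK_MODEL_REQUEST && resultCode == RESULT_OK && data != null) {
                Uri modelUri = data.getData();
                // Hand the URI (or a copied/resolved file path) to the native model loader.
            }
        }
    }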

▶️ How to Use

  1. Open the app and grant storage permission.
  2. Download a small GGUF model (e.g., Llama 3.2 1B) to your device.
  3. Tap Load Model and select the downloaded .gguf file.
  4. Wait for initialization; once the model is loaded, you're good to go.
  5. Enter your prompt and chat locally; all inference stays on-device.

โณ Initial load may take some time depending on device performance and model size.


โ“ Troubleshooting

  • Build fails: NDK/CMake not found
    Open Android Studio → SDK Manager → SDK Tools → install NDK, CMake, and LLDB.

  • CMakeLists.txt path error at line 36
    Make sure LLAMA_BUILD_DIR (or the equivalent variable) points to the actual local path of your llama.cpp build directory (e.g., /Users/you/dev/llama.cpp/build-llama).

  • App crashes on model load
    Use a smaller model (e.g., a 1B or 3B quantized GGUF), close background apps, and ensure 4–6 GB of free RAM (see the memory-check sketch after this list).

  • Very slow inference
    Smaller/quantized models run faster. Multi-threading and acceleration toggles may be limited in this POC.
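
As a rough guard against out-of-memory crashes on model load, an app could check available RAM before loading. This is a hedged sketch, not code from this repo; the helper name and the 2x headroom factor are assumptions.

    import android.app.ActivityManager;
    import android.content.Context;

    // Hypothetical helper: warn before loading if available RAM looks too low
    // for the chosen GGUF file.
    public final class MemoryCheck {

        // The 2x headroom factor is a rough allowance for the KV cache and
        // general app overhead.
        public static boolean hasEnoughMemory(Context context, long modelSizeBytes) {
            ActivityManager am =
                    (ActivityManager) context.getSystemService(Context.ACTIVITY_SERVICE);
            ActivityManager.MemoryInfo info = new ActivityManager.MemoryInfo();
            am.getMemoryInfo(info);
            return info.availMem > modelSizeBytes * 2;
        }
    }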


🧭 Roadmap

  • Add multi-threaded inference settings
  • Add token streaming with partial text updates
  • UI: voice input + TTS reply
  • Support model quantization selector
  • Optional Vulkan acceleration toggle

🔒 Privacy

  • No network calls
  • No analytics or telemetry
  • All prompts and generations stay on-device

🙏 Credits

  • llama.cpp by ggml-org (Georgi Gerganov and contributors): the native inference engine embedded in this app.