A minimal web UI is included with the project. Branding, colours and layout are intended to be customised per client.
A lightweight, privacy-focused framework for building small AI agents that run entirely on your own hardware. This project is designed to be modular, simple to extend, and suitable for self-hosted or offline environments.
- OpenMemory — provides long‑term memory for AI systems. Self‑hosted. Local‑first. Explainable. Scalable. A full cognitive memory engine — not a vector database.
Runtime configuration is provided via environment variables (optionally loaded from a .env file using shell tooling):
- `LAC_WEB_HOST` – bind address for the Flask web UI (default: `0.0.0.0`).
- `LAC_WEB_PORT` – port for the web UI (default: `5001`).
- `OPENMEMORY_URL` / `OM_BASE_URL` – base URL for the OpenMemory backend (default: `http://localhost:8080`).
- `OPENMEMORY_API_KEY` / `OM_API_KEY` – optional API key for authenticated OpenMemory instances.
See `.env.example` for a template and `.env` for a sample local dev setup.
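A rough sketch of how these variables might be resolved in code (the helper name and precedence are assumptions, not the project's actual implementation):

```python
import os

def _env(*names, default=None):
    """Return the first non-empty environment variable from `names`, else `default`."""
    for name in names:
        value = os.environ.get(name)
        if value:
            return value
    return default

# Hypothetical resolution mirroring the variables documented above.
WEB_HOST = _env("LAC_WEB_HOST", default="0.0.0.0")
WEB_PORT = int(_env("LAC_WEB_PORT", default="5001"))
OPENMEMORY_URL = _env("OPENMEMORY_URL", "OM_BASE_URL", default="http://localhost:8080")
OPENMEMORY_API_KEY = _env("OPENMEMORY_API_KEY", "OM_API_KEY")  # optional
```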
A shared Agent abstraction (used by both CLI and web UI) that:
- wraps routing, prompt construction, inference, and memory
- supports a single-model, multi-role pipeline (Interpreter/Planner + Narrator) for GPT‑OSS aliases via an "orchestrator" mode
- threads OpenMemory context and metadata (domains, channels, session kinds) through each turn
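A simplified sketch of what a single turn through this abstraction could look like (the collaborator interfaces and method names beyond `Agent` are illustrative assumptions):

```python
class Agent:
    """Shared abstraction used by both the CLI and the web UI (illustrative sketch)."""

    def __init__(self, router, prompts, runner, memory=None):
        self.router = router    # alias -> model/endpoint resolution
        self.prompts = prompts  # model-aware prompt construction
        self.runner = runner    # HTTP inference wrapper
        self.memory = memory    # optional OpenMemory-backed store

    def chat(self, text, alias="gpt-oss", domain="professional", channel="interactive"):
        route = self.router.resolve(alias)

        # Recall relevant context for this domain/channel when memory is enabled.
        context = []
        if self.memory is not None and route.get("memory_enabled"):
            context = self.memory.recall(text, domain=domain, channel=channel)

        prompt = self.prompts.build(route, text, context)
        reply = self.runner.complete(prompt, **route.get("defaults", {}))

        # Persist the turn with metadata so the dashboard can distinguish it later.
        if self.memory is not None and route.get("memory_enabled"):
            self.memory.store(text, reply, domain=domain, channel=channel)
        return reply
```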
A clean wrapper around a local or remote LLM HTTP endpoint.
Supports:
- synchronous inference
- streaming responses
- generic JSON `/completion`-style APIs
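A minimal sketch of such a wrapper, assuming a llama.cpp-style `/completion` API (field names and the lock placement are assumptions; the actual runner also serialises calls, as noted under the current status):

```python
import threading
import requests

class LLMRunner:
    """Thin HTTP wrapper around a local or remote completion endpoint (sketch)."""

    _lock = threading.Lock()  # serialise calls, e.g. on a single-GPU host

    def __init__(self, base_url, timeout=120.0):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def complete(self, prompt, **params):
        payload = {"prompt": prompt, **params}
        with self._lock:
            resp = requests.post(f"{self.base_url}/completion",
                                 json=payload, timeout=self.timeout)
        resp.raise_for_status()
        # Response field depends on the backend; llama.cpp-style servers return "content".
        return resp.json().get("content", "")
```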
Config-driven routing via `router.yaml` / `router.example.yaml`:
- map human-friendly aliases (e.g. `general`, `code-python`) to models
- resolve the correct base URL and `/completion` endpoint
- carry defaults like `speaker`, `stream`, and `system_prompt`
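For illustration, alias resolution over a loaded `router.yaml` might look roughly like this (the schema shown is an assumption based on the fields listed above; PyYAML is assumed as the loader):

```python
import yaml  # PyYAML, assumed as the config loader

def load_routes(path="router.yaml"):
    with open(path) as fh:
        return yaml.safe_load(fh)

def resolve(routes, alias):
    """Map a human-friendly alias (e.g. 'code-python') to a concrete route."""
    route = routes["aliases"][alias]
    return {
        "model": route["model"],
        "url": route["base_url"].rstrip("/") + route.get("endpoint", "/completion"),
        "speaker": route.get("speaker"),
        "stream": route.get("stream", False),
        "system_prompt": route.get("system_prompt", ""),
    }
```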
Model-aware prompt construction based on alias configuration.
Built-in formats:
- `llama-chat` – system/user/assistant header style
- `codellama` / `code` – instruction-style coding prompts
- `phind` – Alpaca/Vicuna-style sections
- `phi4` – `<|system|>` / `<|user|>` / `<|assistant|>` format
- `plain` – simple system + user concatenation
- `gpt-oss-harmony` – Harmony prompt format for OpenAI GPT‑OSS models, used by the GPT‑OSS orchestrator for interpreter + narrator phases
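As a rough sketch, a builder dispatching on the configured format might look like this for two of the formats (delimiter details such as the `<|end|>` token are assumptions, not the project's exact templates):

```python
def build_prompt(fmt, system, user):
    """Render a prompt for the configured format (illustrative subset)."""
    if fmt == "plain":
        # Simple system + user concatenation.
        return f"{system}\n\n{user}"
    if fmt == "phi4":
        # <|system|>/<|user|>/<|assistant|> style; end-of-turn token assumed.
        return (f"<|system|>\n{system}<|end|>\n"
                f"<|user|>\n{user}<|end|>\n"
                f"<|assistant|>\n")
    raise ValueError(f"unknown prompt format: {fmt}")
```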
- Optional `MemoryStore` abstraction, with `OpenMemoryStore` providing HTTP integration to a self-hosted OpenMemory backend.
- Per-alias controls to enable memory and tune recall depth.
- Retrieved memories are injected as a compact "Relevant past context" block, including domain/channel/session tags.
- Stored interactions are enriched with metadata and tags so the OpenMemory dashboard can distinguish professional vs social, interactive vs automation, and new vs continuation turns.
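The injected block might be assembled along these lines (a sketch; the heading text follows the description above, while the field names on recalled entries are assumptions):

```python
def format_context(memories):
    """Render recalled memories as a compact 'Relevant past context' block."""
    if not memories:
        return ""
    lines = ["Relevant past context:"]
    for m in memories:
        tags = "[{}/{}/{}]".format(
            m.get("memory_domain", "?"),
            m.get("channel", "?"),
            m.get("session_kind", "?"),
        )
        lines.append(f"- {tags} {m.get('text', '').strip()}")
    return "\n".join(lines)
```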
A simple text-to-speech hook is provided via `tts.speak_text`.
By default, this just logs the text that would be spoken. In a real deployment, this function can be swapped or extended to use a concrete TTS engine (e.g. `edge-tts`, ElevenLabs, local speech synthesis).
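For example, swapping in a concrete local engine could look like this (a sketch; `pyttsx3` is just one possible engine and is not a project dependency):

```python
import pyttsx3  # example local TTS engine (assumed, not bundled)

_engine = pyttsx3.init()

def speak_text(text):
    """Drop-in replacement for the default logging-only tts.speak_text hook."""
    _engine.say(text)
    _engine.runAndWait()
```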
A minimal Flask-based web interface for interacting with the agent:
- Chat-style interface
- Route alias selection (e.g. `gpt-oss` as default, `llama`, `code-python`)
- Professional / Social memory profile toggle for GPT‑OSS, mapping to `gpt-oss` (professional) and `gpt-oss-social` (social) aliases while sharing the same model backend
- Clear branding placeholders so client-specific themes can be applied
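A minimal sketch of the chat endpoint behind this UI (the request/response shape is an assumption, apart from the `text` field used by the n8n example below):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
agent = ...  # construct the shared Agent abstraction described above

@app.route("/chat", methods=["POST"])
def chat():
    data = request.get_json(force=True)
    reply = agent.chat(
        data["text"],
        alias=data.get("route", "gpt-oss"),
        domain=data.get("profile", "professional"),
    )
    return jsonify({"reply": reply})
```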
An example n8n workflow is included under `examples/n8n/local-agent-webhook.json`.
It exposes an HTTP webhook (`/local-agent`) that:
- accepts `{"text": "..."}` as input
- forwards the request to the local `/chat` endpoint
- returns the LLM response as either text or JSON
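Calling the webhook from a script might look like this (the n8n host, port, and response shape depend on your deployment):

```python
import requests

resp = requests.post(
    "http://localhost:5678/webhook/local-agent",  # default n8n port; adjust as needed
    json={"text": "Summarise yesterday's notes."},
    timeout=120,
)
resp.raise_for_status()
print(resp.text)  # plain text or JSON, depending on the workflow's response settings
```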
- Local-first, privacy-first design
- Minimal and transparent dependencies
- Modular architecture (LLM runner, routing, prompts, tools, TTS)
- Single-model, multi-role orchestration with long-term memory
- Professional / social personality tracks backed by tagged memory
- Clear, maintainable code
- Step-by-step expansion with tested milestones
- CLI and web UI both use a shared `Agent` abstraction.
- Long-term memory is fully integrated via OpenMemory, with per-alias opt-in and confirmed end-to-end wiring (including API key auth).
- Memory entries are tagged with `memory_domain` (e.g. professional vs social), `channel` (interactive vs automation), and `session_kind` metadata, and key fields are projected to OpenMemory `tags` for visibility in the dashboard.
- Web UI exposes GPT‑OSS as the default route with a professional/social profile switch that maps to the `gpt-oss` and `gpt-oss-social` aliases.
- GPT‑OSS / Harmony prompt format is supported via the `gpt-oss-harmony` builder, including a two-pass Interpreter/Planner + Narrator orchestrator for GPT‑OSS aliases.
- A global inference lock in `LLMRunner` serialises all LLM calls (CLI, web UI, automation), providing simple queueing semantics for a single-GPU deployment.
- Interpreter output is parsed into a structured `InterpreterResult`, and a skeleton `ToolPlan` is derived for future tool execution, but no tools are executed yet.
- All existing modules are fully functional and tested independently. Each commit represents a clean, working milestone that can be extended safely.
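The structured interpreter output might be modelled roughly like this (a sketch; all field names beyond the class names mentioned above are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class InterpreterResult:
    """Parsed output of the interpreter/planner pass (illustrative fields)."""
    intent: str
    summary: str
    needs_tools: bool = False
    raw: str = ""

@dataclass
class ToolPlan:
    """Skeleton plan derived from an InterpreterResult; nothing is executed yet."""
    steps: list = field(default_factory=list)
    channel: str = "interactive"
```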
- Tool/action layer
- Tool interface + registry
- Executor that uses `ToolPlan` and simple policies (e.g. execute only for automation channels)
- Optional background task runner for scheduled or long-running actions
- Speech input (STT) integration for voice-driven interactions
- Expanded TTS integration surfaced in the web UI
