TimeLayer is a local-first personal AI system that focuses on long-term conversation continuity and auditable memory.
It’s built around one simple idea:

> Keep an append-only timeline of what happened, then build higher-level memory layers (daily/weekly/monthly + structured facts) on top of it — all reconstructible.
- Append-only raw timeline (`logs/*.jsonl`): every user/assistant line is stored.
- Time-layered summaries (`*.daily.json`, `*.weekly.json`, `*.monthly.json`): compress history over time.
- Structured facts with a review workflow: pending → remember/reject → conflict → history.
- Semantic search over your own memory (summaries + facts).
- Optional rerank stage (HTTP call) with an explicit intent gate to avoid negative utility.
- Context audit endpoint: see exactly which blocks were injected into the prompt and why.
- CLI (`cmd/local-ai`): stdin chat + slash commands.
- Web UI (`cmd/local-ai-web`): browser chat (SSE streaming) + Facts Center + Context Audit.
Note: the diagram keeps some historical naming (e.g. “Gin”), while the current implementation uses Go `net/http` + `http.ServeMux`.
Also, `summaries/*.json` in the diagram is a logical layer — the current implementation writes summary JSON files under `logs/` (see “Storage layout”).
```mermaid
---
config:
theme: neo
layout: elk
---
flowchart TB
subgraph CLI[" "]
direction TB
CLI_T["Command Line (stdin)<br><br>"]
CLI_PAD[" "]
CLI_IN["Read single line / multiline fence"]
CLI_SLASH{"/ Command?"}
CLI_CMD["handleCommand()"]
CLI_CHAT["Chat() entry"]
end
subgraph WEB["WEB (Gin HTTP)"]
HTTP_REQ["HTTP request"]
HTTP_ROUTER["Router handler"]
WEB_CHAT["/api/chat/stream (SSE)"]
WEB_AUDIT["/api/context/audit"]
WEB_FACTS["/api/facts/*"]
WEB_STATIC["Static assets (app.js / css / html)"]
end
subgraph CHAT_TURN_BOX[" "]
direction TB
CHAT_T["Single chat turn (long-term conversation)<br><br>"]
CHAT_PAD[" "]
INTENT["ParseAutoFactsIntent()<br>remember / forget / none"]
REM_OP["ProposePendingRememberFact()<br>(write/update pending_facts)"]
FORGET_OP["ForgetFact()<br>(mark user_facts inactive + record history)"]
LWU["LogWriter.WriteUser()"]
SYS["Build system prompt<br>(+ rule: do NOT say 'remembered' in user-visible output)"]
CTX["BuildChatContext()"]
MSGS["Assemble context messages"]
STREAM["LLM streaming call"]
SAN["Sanitize final assistant text<br>(strip 'remembered' etc. if present)"]
IMPL["maybeAutoProposePendingFromUserInput()<br>(implicit facts: user + assistant pair)"]
LWA["LogWriter.WriteAssistant()"]
end
subgraph CTX_BUILD_BOX[" "]
direction TB
CTX_T["Context building (BuildChatContext)<br><br>"]
CTX_PAD[" "]
PVAL["Load policy config<br>(RecentMaxLines, SearchTopK, etc.)"]
F0["Load active user facts<br>(user_facts active)"]
F1["Load daily summary<br>(filesystem)"]
F2["SearchWithScore()<br>(embedding scan + optional rerank)"]
F3["Load recent raw dialog lines<br>(tail of jsonl by RecentMaxLines)"]
RESOLVE["ResolvePromptBlocks()<br>+ record debug STEPS"]
PB["PromptBlock list"]
end
subgraph SEARCH_BOX[" "]
direction TB
SEARCH_T["Retrieval (SearchWithScore)"]
EMB["Embed query"]
SCAN["Scan embeddings of summaries + facts"]
SORT1["Sort by embedding score"]
TOPN["Take Top N"]
GATE{"Rerank enabled<br>and conditions met?"}
SKIP["Skip rerank<br>(with reason)"]
RR_PROXY["HTTP call rerank-proxy<br>(/v1/rerank_text)"]
SORT2["Sort by rerank score"]
OUT["Return Top K hits"]
end
subgraph RERANK_TOOLS_BOX[" "]
direction TB
RERANK_T["Rerank toolchain (Python + C++)<br><br>"]
RPKG["rerank-proxy (Python / FastAPI)<br>API: /v1/rerank_text"]
TOK["HF tokenizer<br>(query + doc → input_ids / attention_mask / token_type_ids?)"]
CALL_CPP["Forward to C++ rerank_http<br>CPP_RERANK_URL=/v1/rerank"]
CPP["rerank_http (C++ / ONNX Runtime)<br>API: /v1/rerank"]
ORT["ORT Session.Run() inference<br>(CPU / CoreML EP optional)"]
SCORES["Return scores[]"]
end
subgraph ROLLUP_BOX[" "]
direction TB
ROLLUP_T["Daily rollup / summary"]
DAY{"Day changed?"}
NOOP["No rollup"]
RU["rollupAndArchive()<br>(archive yesterday logs)"]
DAILY["ensureDaily()<br>(generate daily summary json)"]
PEND_ING["EnsurePendingFactsFromDailyJSON()<br>(user_facts_explicit → pending_facts)"]
UPS_SUM["Upsert summaries table"]
UPS_EMB["Upsert embeddings table"]
LINT["Summary lint warnings<br>(e.g. 'possibly')"]
end
subgraph FACTS["Facts Center (API + operations)"]
COUNT["GET /api/facts/counts<br>(pending / conflicts / active / history)"]
LP["GET /api/facts/pending<br>(grouped + scored)"]
LC["GET /api/facts/conflicts"]
LA["GET /api/facts/active"]
LH["GET /api/facts/history"]
REMEMBER["POST /api/facts/pending/:id/remember"]
REJECT["POST /api/facts/pending/:id/reject"]
RESOLVE_CF["POST /api/facts/conflicts/:id/resolve<br>(keep / replace)"]
UF["Write/update user_facts (active)"]
CF["Write/update conflicts"]
HIST["Append user_fact_history"]
SYNC["Sync facts to search index<br>(upsertSummary + embedding)"]
end
subgraph AUDIT_BOX[" "]
direction TB
AUDIT_T["Context audit (Debug)"]
A1["Load policy values"]
A2["Load facts"]
A3["Load recent raw dialog"]
A4["Run SearchWithScore"]
A5["Return blocks + steps + hits"]
end
subgraph STORE_BOX[" "]
direction TB
STORE_T["Storage<br><br>"]
STORE_PAD[" "]
FS["Filesystem<br>logs/*.jsonl<br>summaries/*.json<br>archives/*"]
DB["SQLite<br>summaries, embeddings,<br>user_facts, pending_facts,<br>conflicts, history"]
end
subgraph UI_BOX[" "]
direction TB
UI_T["Web UI (app.js)"]
UI_STREAM["Render SSE tokens<br>(strip 'remembered' prefix during stream)"]
UI_REFRESH["fetchFactCounts()<br>(after stream & facts ops)<br>(update badges / indicators)"]
end
CLI_T ~~~ CLI_PAD
CLI_PAD ~~~ CLI_IN
U["User input"] --> CLI_IN & CLI_IN & HTTP_REQ
CLI_IN --> CLI_SLASH & CLI_SLASH
CLI_SLASH -- Yes --> CLI_CMD
CLI_SLASH -- No --> CLI_CHAT
HTTP_REQ --> HTTP_ROUTER
HTTP_ROUTER --> WEB_CHAT & WEB_AUDIT & WEB_FACTS & WEB_STATIC
CHAT_T ~~~ CHAT_PAD
CHAT_PAD ~~~ INTENT
INTENT -- remember --> REM_OP
REM_OP --> LWU
INTENT -- forget --> FORGET_OP
INTENT -- none --> LWU
LWU --> SYS & FS
SYS --> CTX
CTX --> MSGS & PVAL & F0 & F1 & F2 & F3
MSGS --> STREAM
STREAM --> SAN
SAN --> IMPL
IMPL --> LWA
CLI_CHAT --> INTENT
WEB_CHAT --> INTENT & UI_STREAM
CTX_T ~~~ CTX_PAD
CTX_PAD ~~~ PVAL
PVAL --> RESOLVE
F0 --> RESOLVE & DB
F1 --> RESOLVE & FS
F2 --> RESOLVE & EMB
F3 --> RESOLVE & FS
RESOLVE --> PB
PB --> MSGS
SEARCH_T ~~~ EMB
EMB --> SCAN
SCAN --> SORT1
SORT1 --> TOPN
TOPN --> GATE
GATE -- No --> SKIP
SKIP --> OUT
GATE -- Yes --> RR_PROXY
RR_PROXY --> SORT2
SORT2 --> OUT
%% Rerank toolchain wiring
RR_PROXY --> RPKG
RPKG --> TOK
TOK --> CALL_CPP
CALL_CPP --> CPP
CPP --> ORT
ORT --> SCORES
SCORES --> RR_PROXY
ROLLUP_T ~~~ DAY
LWA --> DAY & FS
DAY -- No --> NOOP
DAY -- Yes --> RU
RU --> DAILY & FS
DAILY --> LINT
LINT --> PEND_ING
PEND_ING --> UPS_SUM & DB
UPS_SUM --> UPS_EMB & DB
WEB_FACTS --> COUNT & LP & LC & LA & LH & UI_REFRESH
REMEMBER --> UF & HIST
UF --> SYNC & CF
REJECT --> HIST
RESOLVE_CF --> UF & HIST
CF --> HIST
AUDIT_T ~~~ A1
WEB_AUDIT --> A1
A1 --> A2
A2 --> A3
A3 --> A4
A4 --> A5
STORE_T ~~~ STORE_PAD
STORE_PAD ~~~ FS
UPS_EMB --> DB
LP --> DB
LC --> DB
LA --> DB
LH --> DB
SYNC --> DB
UI_T ~~~ UI_STREAM
UI_STREAM --> UI_REFRESH
UI_REFRESH --> COUNT
CLI_T:::sgtitle
CLI_PAD:::sgtitle
CHAT_T:::sgtitle
CHAT_PAD:::sgtitle
CTX_T:::sgtitle
CTX_PAD:::sgtitle
SEARCH_T:::sgtitle
ROLLUP_T:::sgtitle
AUDIT_T:::sgtitle
STORE_T:::sgtitle
STORE_PAD:::sgtitle
UI_T:::sgtitle
RERANK_T:::sgtitle
classDef sgtitle fill:transparent,stroke:transparent,stroke-width:0px
```
- Raw chat lines are written to `~/local-ai/logs/YYYY-MM-DD.jsonl`.
- This file is the source of truth. Everything else can be rebuilt from it.
- Daily summary: derived from today’s `jsonl` timeline.
- Weekly summary: derived from daily summaries.
- Monthly summary: derived from weekly summaries.

Each summary is stored twice:

- As a JSON file (`logs/<period>.daily.json`, etc.) for human inspection / backup.
- In SQLite (`summaries` table) for retrieval & embedding indexing.
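For orientation, one raw timeline line might look like the record below. This is purely illustrative: the struct and the JSON field names are assumptions, not the schema that `LogWriter` actually emits.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical shape of one line in logs/YYYY-MM-DD.jsonl.
// Field names here are assumptions, not the real schema.
type timelineLine struct {
	Time string `json:"time"`
	Role string `json:"role"` // "user" or "assistant"
	Text string `json:"text"`
}

func main() {
	b, _ := json.Marshal(timelineLine{"2026-01-11T09:00:00Z", "user", "hello"})
	fmt.Println(string(b)) // appended as a single line, never rewritten
}
```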
Facts are treated as structured, controlled memory, not free-form chat:

- `pending_facts`: candidates that are proposed (from explicit “remember” intents, or from summaries).
- `user_facts`: facts that are active and used in context injection.
- `conflicts`: when a new fact contradicts an existing active fact for the same subject/key.
- `user_fact_history`: full audit trail of remember/reject/forget/resolve operations.
This lets you keep long-term state stable, reviewable, and reversible.
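As a mental model, the lifecycle can be sketched like this. All names are invented for illustration; the real workflow lives in `pending_facts*.go` / `facts*.go` and persists to the SQLite tables listed above.

```go
package main

import "fmt"

// Illustrative in-memory model; the real schema spreads this state
// across pending_facts, user_facts, conflicts, and user_fact_history.
type fact struct {
	subject, key, value string
}

// remember promotes a pending fact to active, or routes it to conflicts
// when it contradicts an existing active fact for the same subject/key.
func remember(active map[string]fact, conflicts *[]fact, f fact) {
	id := f.subject + "/" + f.key
	if old, ok := active[id]; ok && old.value != f.value {
		*conflicts = append(*conflicts, f) // reviewed later via keep/replace
		return
	}
	active[id] = f
}

func main() {
	active := map[string]fact{}
	var conflicts []fact
	remember(active, &conflicts, fact{"user", "editor", "vim"})
	remember(active, &conflicts, fact{"user", "editor", "emacs"}) // contradicts -> conflict
	fmt.Println(len(active), "active,", len(conflicts), "in conflicts")
}
```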
A single chat turn follows this sequence (high-level; a condensed sketch follows the list):

- Parse intent from the user input (remember / forget / none)
- Write raw user line to JSONL (append-only)
- Build system prompt (includes safety rules and output constraints)
- Build context (`BuildChatContext`):
  - active facts
  - daily summary
  - retrieval hits (`SearchWithScore`)
  - recent raw dialog tail (`RecentMaxLines`)
- LLM call (streaming for Web; a single non-streaming call for CLI)
- Sanitize output (strip internal “remembered …” artifacts if present)
- Write assistant line to JSONL (append-only)
- Optional implicit pending proposal (heuristic: user+assistant pair can propose a pending fact)
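Condensed into code, the turn looks roughly like this. Every identifier is a stand-in (the real functions are `ParseAutoFactsIntent`, `LogWriter.WriteUser`, `BuildChatContext`, etc. in `internal/app`); treat it as pseudocode that happens to compile.

```go
package main

import "fmt"

// Stand-in stubs; the real implementations live in internal/app.
func parseIntent(s string) string              { return "none" } // remember / forget / none
func proposePending(s string)                  {}                // write/update pending_facts
func forgetFact(s string)                      {}                // mark user_facts inactive + history
func logLine(role, s string)                   { fmt.Printf("jsonl <- %s: %s\n", role, s) }
func buildChatContext(q string) []string       { return []string{"facts", "daily summary", "hits", "recent tail"} }
func callLLM(blocks []string, q string) string { return "answer" }
func sanitize(s string) string                 { return s } // strip internal "remembered ..." artifacts

func chatTurn(input string) string {
	switch parseIntent(input) {
	case "remember":
		proposePending(input)
	case "forget":
		forgetFact(input)
	}
	logLine("user", input) // append-only raw timeline
	out := sanitize(callLLM(buildChatContext(input), input))
	logLine("assistant", out) // append-only raw timeline
	return out
}

func main() { fmt.Println(chatTurn("hello")) }
```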
Search is purely over your own persisted memory:

- Embed the query via `TIMELAYER_EMBED_URL` (`POST {"input": "..."}`)
- Scan all stored embedding vectors (`embeddings` joined to `summaries`)
- Compute cosine similarity, filter by `SearchMinScore` (default `0.75`)
- Sort by embedding score
- Take top-N candidates (`RerankTopN`, default `20`)
- Optional rerank (precision pass, gated to keep latency down):
  - Enabled when `EnableRerank=true`
  - Hard override (testing/benchmarking): `TIMELAYER_RERANK_FORCE=1` (rerank whenever there are ≥2 hits)
  - Minimum candidate set: `len(hits) >= RerankMinBatch`
  - Gate mode: `TIMELAYER_RERANK_MODE=conservative|ambiguous|smart|always` (default `smart`)
    - `conservative`: rerank only when embedding already has a clear winner: `top1 >= SearchMinStrong` and `(top1 - top2) >= SearchMinGap`
    - `ambiguous`: rerank when embedding is unsure (top-2 are close): `top1 >= SearchMinStrong` and `(top1 - top2) < SearchMinGap` (and top2 is not too weak)
    - `smart`: rerank whenever the query looks strong enough: `top1 >= SearchMinStrong`
    - `always`: rerank whenever there are enough candidates (still requires `EnableRerank=true`)
  - ⚠️ Note: the gate only sees indexed content (summaries + active facts in SQLite). A newly typed `/remember ...` is stored as pending first, and won’t affect retrieval until you click “Remember” (or rollup promotes it). If you test immediately, you may match older items and see a small gap.
- Return top-K (`SearchTopK`, default `5`)
This design keeps rerank as a precision enhancer, not a mandatory dependency.
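For illustration, the embedding stage (cosine scan, `SearchMinScore` filter, sort, top-N cut) amounts to the standalone sketch below. This is not the actual `search.go` code, just the technique the list above describes.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine similarity between two equal-length vectors
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na)*math.Sqrt(nb) + 1e-12)
}

type hit struct {
	id    int
	score float64
}

func main() {
	minScore, topN := 0.75, 20 // SearchMinScore / RerankTopN defaults
	query := []float64{0.1, 0.9, 0.2}
	index := map[int][]float64{1: {0.1, 0.8, 0.3}, 2: {0.9, 0.1, 0.0}} // id -> stored embedding

	var hits []hit
	for id, vec := range index {
		if s := cosine(query, vec); s >= minScore { // drop weak matches
			hits = append(hits, hit{id, s})
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].score > hits[j].score })
	if len(hits) > topN {
		hits = hits[:topN] // candidate pool handed to the (optional) rerank gate
	}
	fmt.Println(hits)
}
```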
Default base directory: `~/local-ai/`

```
~/local-ai/
├── logs/
│   ├── 2026-01-11.jsonl        # raw timeline (append-only)
│   ├── 2026-01-11.daily.json   # daily summary
│   ├── 2026-W02.weekly.json    # weekly summary (example)
│   ├── 2026-01.monthly.json    # monthly summary (example)
│   └── archive/                # rotated/archived timelines
├── prompts/                    # prompt templates (daily/weekly/monthly)
└── memory/
    └── memory.sqlite           # structured memory + embeddings
```
Top-level:

- `cmd/local-ai/` — CLI entrypoint
- `cmd/local-ai-web/` — Web server entrypoint
- `internal/app/` — core engine (all business logic)
  - `web_server.go` — HTTP API + embedded Web UI (`internal/app/web/*`)
  - `http_middleware.go` — auth token check, loopback bypass, rate limit, streaming guards
  - `chat*.go` — chat orchestration, prompt assembly, context building, auditing
  - `summary_*.go` — daily/weekly/monthly summary generators
  - `search.go` — semantic search + rerank intent gate
  - `pending_facts*.go` / `facts*.go` — Facts Center workflow and conflict handling
  - `db*.go` — SQLite schema + migrations + helpers
- `tools/rerank-http/` — optional C++ ONNX Runtime reranker server (`POST /v1/rerank`)
- `tools/rerank-proxy/` — optional Python FastAPI proxy: text → tokens → `rerank-http` (`POST /v1/rerank_text`)
TimeLayer does not ship a model — it calls your services via HTTP:

- Chat endpoint (`TIMELAYER_CHAT_URL`): must be OpenAI-compatible chat completion, with streaming support for the Web UI.
- Embedding endpoint (`TIMELAYER_EMBED_URL`): accepts `POST {"input": "..."}`.
- Optional rerank endpoint (`TIMELAYER_RERANK_URL`): if enabled, used for reranking candidate hits.
If you run everything locally, llama-server can expose an OpenAI-compatible chat endpoint and an embedding endpoint.
Example (adjust model path and flags to your GPU/CPU):
```bash
# Chat + embeddings on :8080 (provides /v1/chat/completions and /embedding on recent llama.cpp builds)
llama-server \
  -m /path/to/Qwen3-8B-Q5_K_M.gguf \
  --port 8080 \
  --ctx-size 8192 \
  --embedding --pooling cls

export TIMELAYER_CHAT_URL='http://127.0.0.1:8080/v1/chat/completions'
export TIMELAYER_EMBED_URL='http://127.0.0.1:8080/embedding'
export TIMELAYER_CHAT_MODEL='Qwen3-8B-Q5_K_M.gguf'
```

If your embedding endpoint is `/v1/embeddings` instead of `/embedding`, just set `TIMELAYER_EMBED_URL` accordingly.
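A quick way to smoke-test the embedding endpoint from Go. The request body shape is the one TimeLayer sends; the response shape depends on your server, so this just prints the raw body.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	url := os.Getenv("TIMELAYER_EMBED_URL") // e.g. http://127.0.0.1:8080/embedding
	resp, err := http.Post(url, "application/json",
		bytes.NewReader([]byte(`{"input": "hello timelayer"}`)))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	raw, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(raw)) // response shape varies by server
}
```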
This repo includes a full local rerank stack under `tools/`:

- `tools/rerank-http` (C++): runs the ONNX model and exposes `POST /v1/rerank`
- `tools/rerank-proxy` (Python): tokenizes text and exposes `POST /v1/rerank_text` (what TimeLayer calls)
Quickstart (macOS/Linux, requires ONNX Runtime C++ SDK for your platform):
```bash
# 1) Build & run C++ rerank-http
export ORT_ROOT=/path/to/onnxruntime-<platform>-1.23.2
cmake -S tools/rerank-http -B build/rerank-http
cmake --build build/rerank-http -j
./build/rerank-http/rerank_http \
  --ep cpu \
  --model /path/to/model_fp16.onnx
# listens on http://127.0.0.1:8089 by default

# 2) Run Python rerank-proxy
python -m venv .venv && source .venv/bin/activate
pip install -r tools/rerank-proxy/requirements.txt
export RERANK_TOKENIZER_DIR=/path/to/tokenizer_dir
export CPP_RERANK_URL='http://127.0.0.1:8089/v1/rerank'
python tools/rerank-proxy/rerank_proxy.py
# listens on http://127.0.0.1:8090 by default

# 3) Point TimeLayer at the proxy
export TIMELAYER_ENABLE_RERANK=1
export TIMELAYER_RERANK_URL='http://127.0.0.1:8090/v1/rerank_text'
```

If you don’t want rerank, disable it:

```bash
export TIMELAYER_ENABLE_RERANK=0
```

TimeLayer intentionally does not rerank every query. Rerank is a cross-encoder pass: higher precision, higher latency.
The gate runs on the embedding-stage scores (cosine) of the current candidate set and is configurable via `TIMELAYER_RERANK_MODE`:

- `conservative`: rerank only when embedding already has a clear winner
- `ambiguous`: rerank when embedding is unsure (top-2 are close)
- `smart` (default): rerank whenever the query looks strong enough (top1 ≥ `TIMELAYER_SEARCH_MIN_STRONG`)
- `always`: rerank whenever there are enough candidates (still gated by `TIMELAYER_RERANK_MIN_BATCH`)

Common skip reasons:

- `weak_query`: top1 is below `TIMELAYER_SEARCH_MIN_STRONG`
- `not_enough_hits`: fewer than `TIMELAYER_RERANK_MIN_BATCH` hits
- `gap_too_small`: only in `conservative` mode (top-2 are too close)
- `gap_too_large`: only in `ambiguous` mode (embedding is already confident)

Tip: for many embedding models, the top-2 cosine scores are often close, so `conservative` can be very strict. If you want rerank to reflect its “disambiguation” nature, try `TIMELAYER_RERANK_MODE=ambiguous`.
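Put together, the gate reduces to roughly the decision function below. This is a sketch of the behavior as documented here, not the actual `search.go` code; the extra “top2 not too weak” check in `ambiguous` is omitted.

```go
package main

import "fmt"

// gate decides whether to rerank, given embedding scores sorted descending.
// Assumes minBatch >= 2, so scores[1] is always safe to read past the
// not_enough_hits check.
func gate(mode string, scores []float64, minStrong, minGap float64, minBatch int) (bool, string) {
	if len(scores) < minBatch {
		return false, "not_enough_hits"
	}
	if mode == "always" {
		return true, "" // only the batch-size check applies
	}
	top1, top2 := scores[0], scores[1]
	if top1 < minStrong {
		return false, "weak_query"
	}
	switch mode {
	case "conservative": // needs a clear winner
		if top1-top2 < minGap {
			return false, "gap_too_small"
		}
	case "ambiguous": // needs a near-tie
		if top1-top2 >= minGap {
			return false, "gap_too_large"
		}
	}
	return true, "" // smart: a strong top1 is enough
}

func main() {
	// TIMELAYER_SEARCH_MIN_STRONG=0.90 and TIMELAYER_SEARCH_MIN_GAP=0.06 defaults
	fmt.Println(gate("conservative", []float64{0.93, 0.91}, 0.90, 0.06, 2)) // false gap_too_small
	fmt.Println(gate("ambiguous", []float64{0.93, 0.91}, 0.90, 0.06, 2))   // true
}
```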
To make any query rerank whenever there are ≥2 candidates:

```bash
export TIMELAYER_ENABLE_RERANK=1
export TIMELAYER_RERANK_FORCE=1
```

If you want rerank to run more often in a large corpus, lower the gate thresholds:

```bash
# Common "always rerank unless only 0/1 hit" tuning:
export TIMELAYER_SEARCH_MIN_STRONG=0.0
export TIMELAYER_SEARCH_MIN_GAP=0.0
```

Or keep it conservative but less strict:

```bash
export TIMELAYER_SEARCH_MIN_STRONG=0.85
export TIMELAYER_SEARCH_MIN_GAP=0.02
```

If you prefer to keep the gate on but still want a single query that tends to pass it, use a very specific query that creates a clear top-1 winner.
Practical pattern:

- First write a unique, specific fact (via chat, so it goes into your timeline):

  ```
  /remember My unique anchor: "TimeLayer rerank smoke-test 2026-01-11 21:58:13".
  ```

- Then ask a question that repeats the anchor nearly verbatim:

  ```
  What is my unique anchor "TimeLayer rerank smoke-test 2026-01-11 21:58:13"?
  ```

This tends to push the top1 embedding score higher and widen the gap, which makes the gate pass more reliably.
Defaults are defined in `internal/app/config.go`. Common knobs:

| Env | Default | Meaning |
|---|---|---|
| `TIMELAYER_CHAT_URL` | `http://localhost:8080/v1/chat/completions` | Chat completion endpoint (OpenAI-compatible). |
| `TIMELAYER_EMBED_URL` | `http://localhost:8080/embedding` | Embedding endpoint. |
| `TIMELAYER_CHAT_MODEL` | `Qwen3-8B-Q5_K_M.gguf` | Sent as the model name in chat requests. |
| `TIMELAYER_HTTP_ADDR` | `127.0.0.1:3210` | Web listen addr. |
| `TIMELAYER_HTTP_AUTH_TOKEN` | empty | If set: `/api/*` requires a token (see Security). |
| `TIMELAYER_HTTP_ALLOW_INSECURE_REMOTE` | `false` | Allow binding to non-loopback without token (not recommended). |
| `TIMELAYER_HTTP_RATE_LIMIT_RPM` | `120` | Simple per-IP RPM limit for `/api/*` (0 disables). |
| `TIMELAYER_HTTP_MAX_CONCURRENT_STREAMS` | `4` | Limit concurrent `/api/chat/stream` sessions. |
| `TIMELAYER_HTTP_MAX_INPUT_BYTES` | `65536` | Max request input size. |
| `TIMELAYER_RECENT_MAX_LINES` | `20` | Tail lines injected as “recent raw dialog”. |
| `TIMELAYER_ENABLE_RERANK` | `true` | Enable rerank stage. |
| `TIMELAYER_RERANK_FORCE` | `false` | Force rerank whenever there are ≥2 candidates (testing/benchmarking). |
| `TIMELAYER_RERANK_MODE` | `smart` | `conservative` (clear winner), `ambiguous` (near-tie), `smart` (if strong), `always` (if enough hits). |
| `TIMELAYER_RERANK_URL` | `http://127.0.0.1:8090/v1/rerank_text` | Rerank endpoint. |
| `TIMELAYER_RERANK_TOPN` | `20` | Candidate pool size before rerank. |
| `TIMELAYER_RERANK_TIMEOUT_MS` | `15000` | Per-request rerank timeout. |
| `TIMELAYER_RERANK_MIN_BATCH` | `2` | Skip rerank if there are fewer hits. |
| `TIMELAYER_SEARCH_MIN_STRONG` | `0.90` | Gate threshold for rerank modes: top1 embedding score must be ≥ this value (except `always`). |
| `TIMELAYER_SEARCH_MIN_GAP` | `0.06` | Gap threshold used by `conservative` / `ambiguous` (see `TIMELAYER_RERANK_MODE`). |
| `TIMELAYER_SQLITE_JOURNAL_MODE` | `WAL` | SQLite journal mode. |
| `TIMELAYER_SQLITE_SYNCHRONOUS` | `NORMAL` | SQLite synchronous level. |
If you try to bind to a non-loopback address (e.g. `0.0.0.0:3210` or a LAN IP), TimeLayer refuses to start unless you either:

- set `TIMELAYER_HTTP_AUTH_TOKEN`, or
- explicitly allow insecure remote bind: `TIMELAYER_HTTP_ALLOW_INSECURE_REMOTE=1` (not recommended)

When `TIMELAYER_HTTP_AUTH_TOKEN` is set, all `/api/*` routes require either:

- `X-Auth-Token: <token>`, or
- `Authorization: Bearer <token>`

Requests coming from `127.0.0.1` / `::1` can access `/api/*` without a token, unless proxy-forwarding headers are present: `Forwarded`, `X-Forwarded-For`, `X-Real-IP`, `X-Forwarded-Proto`.
This prevents accidental exposure behind a reverse proxy.
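The check can be pictured like this — a sketch of the documented behavior, not the actual `http_middleware.go` code (the route and port in `main` are just example values):

```go
package main

import (
	"net"
	"net/http"
)

// authorized reports whether a request may reach /api/*: loopback clients
// bypass the token, unless proxy-forwarding headers suggest the real
// client is somewhere else.
func authorized(r *http.Request, token string) bool {
	host, _, _ := net.SplitHostPort(r.RemoteAddr)
	ip := net.ParseIP(host)
	proxied := false
	for _, h := range []string{"Forwarded", "X-Forwarded-For", "X-Real-IP", "X-Forwarded-Proto"} {
		if r.Header.Get(h) != "" {
			proxied = true
		}
	}
	if ip != nil && ip.IsLoopback() && !proxied {
		return true // direct local request: no token required
	}
	return r.Header.Get("X-Auth-Token") == token ||
		r.Header.Get("Authorization") == "Bearer "+token
}

func main() {
	http.HandleFunc("/api/ping", func(w http.ResponseWriter, r *http.Request) {
		if !authorized(r, "secret") {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		w.Write([]byte("pong"))
	})
	http.ListenAndServe("127.0.0.1:3211", nil)
}
```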
```bash
go run ./cmd/local-ai
```

Common commands:

- `/chat <message>`
- `/ask <question>`
- `/search <query>`
- `/daily` / `/weekly` / `/monthly`
- `/remember <fact>`
- `/forget <fact>`
- `/reindex daily|weekly|monthly|all`
```bash
go run ./cmd/local-ai-web
# then open http://127.0.0.1:3210/
```

- `GET /health` → `ok`
- `POST /api/chat`
  Body: `{"input":"hello"}`
  Response: `{"text":"..."}` (see the client sketch below)
- `POST /api/chat/stream`
  Body: `{"input":"hello"}`
  SSE events: `delta`, `done`, `error`, `notice` (see `internal/app/web/app.js` for client behavior).
- `POST /api/context/audit` (alias of `/api/debug/context`)
  Body: `{"input":"..."}`
  Response: includes injected blocks, steps, and retrieval hits.
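A minimal Go client for the non-streaming endpoint, using the request/response shapes documented above (add an `X-Auth-Token` header if you configured one):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]string{"input": "hello"})
	resp, err := http.Post("http://127.0.0.1:3210/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Documented response shape: {"text":"..."}
	var out struct {
		Text string `json:"text"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Text)
}
```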
- counts: `GET /api/facts/counts` (alias of `/api/facts/status/counts`)
- pending list: `GET /api/facts/pending`
- pending groups: `GET /api/facts/pending/groups`
- remember/reject:
  - JSON body: `POST /api/facts/remember` / `POST /api/facts/reject` (`{"id":123}`)
  - REST alias: `POST /api/facts/pending/123/remember` / `.../reject`
- conflicts: `GET /api/facts/conflicts`
  - resolve (JSON body): `POST /api/facts/conflicts/keep` or `/replace`
  - resolve (REST): `POST /api/facts/conflicts/123/resolve` with `{"action":"keep"}` or `{"action":"replace","replacement":"..."}`
- Single-user, local-first design (no user auth/multi-tenant model).
- No HTTPS/TLS built-in (put it behind your own proxy if needed).
- Rerank is optional and best-effort; failures do not break chat.
- Embedding schema assumes consistent embedding dimensionality across time (if you switch embedding models, reindex).
If you plan to expose it to a network, keep token auth enabled and add network controls (firewall / SSH tunnel / reverse proxy with TLS).
GPL-3.0-only. See LICENSE.