AI Private Workspace v0.2.2

github-actions released this 25 Jun 08:37
ebf5eef

Downloads: macOS, Windows

A focused Ask quality release: the same local, private RAG, but the answers
are better grounded and the model's context window is used more deliberately. All
improvements are deterministic or best-effort, require no new model downloads,
and keep the default fast-start experience (the reranker stays off). The new
source-aware chunks take effect after a reindex (scan, then build).

Added

  • Source-aware chunks. Every indexed chunk now carries a deterministic one-line header — [source: path › section · part N/M] — derived from its own structure (a heading or a definition name). The header is stored and shown with the chunk so the model grounds and cites more reliably and the path is keyword-searchable, while the dense embedding is computed on the clean body so similarity is unaffected. No LLM call at index time; takes effect after a reindex.
  • Hallucinated-citation guard. When an answer cites a file in backticks that wasn't in the retrieved context, Ask attaches a non-blocking "verify these" note instead of silently trusting it.
  • Optional query rewrite. Ask can distil a question into a compact search query with the loaded model before retrieval, so pronoun- or intent-phrased questions still find the right files. Off by default (one extra call per ask); opt in via AI_WORKSPACE_ASK_QUERY_REWRITE. Reuses the loaded model — no new download.
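
The chunk header and the citation guard described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function names, the handling of a missing section, and the "looks like a file" heuristic are all assumptions made for the example. Only the header format itself is taken from the notes.

```python
import re
from typing import Iterable, List, Optional, Tuple


def chunk_header(path: str, section: Optional[str], part: int, total: int) -> str:
    """Format the one-line provenance header quoted in the release notes."""
    # Assumption: when no heading or definition name is known, the section is omitted.
    location = f"{path} › {section}" if section else path
    return f"[source: {location} · part {part}/{total}]"


def stored_and_embedded_text(header: str, body: str) -> Tuple[str, str]:
    """Return (text stored and shown with the chunk, text passed to the embedder)."""
    # The header travels with the stored chunk, while the embedding sees only
    # the clean body, so similarity scores are unaffected by the header.
    return f"{header}\n{body}", body


# Hypothetical heuristic: a backticked span counts as a file citation when it
# looks like a path that ends in an extension.
_BACKTICKED = re.compile(r"`([^`\n]+)`")
_FILE_LIKE = re.compile(r"[\w./\\-]+\.\w+")


def unverified_citations(answer: str, retrieved_paths: Iterable[str]) -> List[str]:
    """List backticked file paths in the answer that were not in the retrieved context."""
    known = set(retrieved_paths)
    flagged: List[str] = []
    for span in _BACKTICKED.findall(answer):
        if _FILE_LIKE.fullmatch(span) and span not in known and span not in flagged:
            flagged.append(span)
    return flagged
```

For example, `chunk_header("src/app.py", "load_config", 2, 3)` yields `[source: src/app.py › load_config · part 2/3]`. With only `src/app.py` retrieved, an answer that cites `src/app.py` and `docs/setup.md` in backticks gets `docs/setup.md` flagged for a "verify these" note. The heuristic is deliberately loose, so a dotted identifier in backticks would also be flagged.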

Changed

  • Smarter use of the model's context window in Ask. The grounded prompt is now token-budgeted to the model's real context window — retrieved chunks are sized to what actually fits after reserving room for project memory, conversation history and the answer, so the engine never silently truncates an overflowing prompt. The prompt is also reordered (stable instructions/context first, the question last) and llama.cpp prompt caching is enabled, so multi-turn answers start faster.
  • Semantic project memory. Recalled memory is re-ranked by embedding similarity (when a local embedder is available), so a question about "production" surfaces a note that says "prod is called prd" even with no shared words. Best-effort, with the deterministic keyword + pin + recency selection as the fallback.
  • Conversation history fits the window, with rolling summarization. Recent turns are kept by token budget (not a fixed count), and older evicted turns are folded into a short running summary — so a long chat no longer forgets its earlier context after a handful of exchanges. Best-effort; falls back to just the recent turns.
  • More, and more diverse, context per answer. With the reranker off (the default), Ask now pulls a wider candidate pool and selects a relevant-but-diverse subset with Maximal Marginal Relevance, so a fixed context budget covers more of the codebase instead of being spent on near-duplicate top hits.
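
Two of the changes above reduce to short calculations: the token budget left for retrieved chunks, and Maximal Marginal Relevance (MMR) selection over a candidate pool. The sketch below assumes plain Python lists as embeddings and cosine similarity as the measure. The function names, the `lambda_` default, and the budget categories as parameters are illustrative choices, not the release's actual implementation.

```python
import math
from typing import List, Sequence


def chunk_token_budget(context_window: int, instructions: int, memory: int,
                       history: int, answer: int) -> int:
    """Tokens left for retrieved chunks after reserving room for everything else."""
    reserved = instructions + memory + history + answer
    return max(0, context_window - reserved)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query: Sequence[float], candidates: Sequence[Sequence[float]],
               k: int, lambda_: float = 0.5) -> List[int]:
    """Greedy MMR: return indices of up to k candidates, in selection order.

    Each step picks the candidate that maximizes
    lambda_ * relevance - (1 - lambda_) * (max similarity to anything already picked).
    """
    relevance = [cosine(query, c) for c in candidates]
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lambda_ * relevance[i] - (1.0 - lambda_) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lambda_=1.0` the selection reduces to plain top-k by relevance. Lower values penalize candidates that resemble chunks already chosen, which is how a near-duplicate of the top hit can lose its slot to a less similar but still relevant chunk.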

All commits in this release

  • release: 0.2.2
  • style: ruff format test_contextual_chunk
  • fix(index): embed clean chunk body, keep provenance header for display (B)
  • style: ruff format (post-merge)
  • style: ruff format
  • docs(changelog): note answer-quality batch (chunk headers, citation guard, MMR, query rewrite)
  • feat(ask): optional LLM query rewrite before retrieval (D)
  • feat(index): contextual chunk headers (source path + section) (B)
  • feat(ask): MMR diversification + wider candidate pool (A)
  • feat(ask): flag hallucinated source citations (C)
  • feat(ask): token-budgeted conversation history with rolling summarization
  • docs(changelog): note Ask context-budgeting + semantic memory (Unreleased)
  • feat(memory): semantic re-rank of project memory (synonyms/paraphrase)
  • perf(ask): token-budget the grounded prompt to the model window + KV-cache-friendly layout
  • ci(osv): apply root osv-scanner.toml via --config so triage ignores take effect
  • security(deps): floor idna/Pygments; triage unfixable OSV advisories

Full changelog: v0.2.1...v0.2.2