v0.17.0

vikasagarwal101 released this 06 Jun 20:56
· 75 commits to main since this release

Evidence & Analytics — Audit Trail V2 + Advanced Forecasting

v0.17 is the largest release since v0.16. It ships a complete rewrite of the audit trail (canonical projection over existing tables, scoped evidence bundles, integrity-ready envelope), a full advanced analytics layer (forecasts with confidence, cumulative flow, bottleneck detection, agent quality signals, sprint metrics), and a broad platform hardening pass covering schema consolidation, MCP tool architecture, and UI consistency.

⚠️ Prerelease Reminder

v0.17 is still prerelease. Expect breaking changes in v0.18 and beyond. Schema, MCP tool shapes, API endpoints, and data formats may shift without a migration path. Pin your version, snapshot your data, and do not use v0.17 against production workloads. See the README for the full disclaimer.


Audit Trail V2

The audit layer was rebuilt as a virtual canonical projection over the existing task_events, mission_events, effort_entries, code evidence, pipeline, and integration sync tables — no physical audit_events table. Source rows are projected into a stable AuditEvent shape with deterministic IDs (e.g. task_event:<id>, mission_event:<id>, effort_entry:<id>), per-event completeness, and warnings.

Canonical projection

  • Single AuditEvent shape with required completeness field — every event declares how much of its provenance is known
  • Deterministic IDs source-prefixed from underlying tables, stable across queries, safe for cross-referencing
  • Server-generated summaries for important sources; UI/MCP may rephrase but not invent meaning
  • Actor/source normalization — legacy system actor IDs (status-engine, CI actors) normalized to namespaced form (system:status-engine, system:integration-sync, etc.) in projection; new emissions use the namespaced form
  • CSV exports include stable core columns (id, occurredAt, habitatId, entityType, entityId, action, actorType, actorId, source, summary, completenessStatus) with optional provenanceJson / metadataJson
  • JSON/JSONL exports use the structured AuditEvent shape directly
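
As a rough illustration of the projection described above, the sketch below maps one source row to an event with a source-prefixed ID. The field names follow the CSV core columns listed above. The `TaskEventRow` shape, the helper names, the `"complete"` / `"partial"` values, and the `actor_unknown` warning are assumptions made for illustration, not the actual Orcy types.

```typescript
// Hypothetical sketch: field names follow the CSV core columns listed above;
// the row type, helper names, and completeness values are assumptions.
type CompletenessStatus = "complete" | "partial";

interface AuditEvent {
  id: string; // source-prefixed, e.g. "task_event:42"
  occurredAt: string;
  habitatId: string;
  entityType: string;
  entityId: string;
  action: string;
  actorType: string;
  actorId: string;
  source: string;
  summary: string;
  completenessStatus: CompletenessStatus;
  warnings: string[];
}

// Assumed shape of a task_events row.
interface TaskEventRow {
  id: number;
  habitatId: string;
  taskId: string;
  action: string;
  actorType: string | null;
  actorId: string | null;
  createdAt: string;
}

// Legacy system actor IDs gain a "system:" namespace in the projection.
function normalizeActorId(actorType: string, actorId: string): string {
  if (actorType === "system" && actorId.indexOf("system:") !== 0) {
    return "system:" + actorId;
  }
  return actorId;
}

function projectTaskEvent(row: TaskEventRow): AuditEvent {
  const actorKnown = row.actorType !== null && row.actorId !== null;
  const actorType = row.actorType !== null ? row.actorType : "unknown";
  const actorId = row.actorId !== null ? row.actorId : "unknown";
  return {
    id: "task_event:" + row.id, // the same row always yields the same ID
    occurredAt: row.createdAt,
    habitatId: row.habitatId,
    entityType: "task",
    entityId: row.taskId,
    action: row.action,
    actorType: actorType,
    actorId: normalizeActorId(actorType, actorId),
    source: "task_events",
    summary: "Task " + row.taskId + ": " + row.action,
    completenessStatus: actorKnown ? "complete" : "partial",
    warnings: actorKnown ? [] : ["actor_unknown"],
  };
}
```

Because the ID is derived only from the source table and row ID, repeated queries over the same row produce the same identifier, which is what makes cross-referencing safe.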

Habitat-scoped audit feed

GET /habitats/:habitatId/audit/events — non-streaming envelope with events, warnings, and completenessSummary. Authorization required via the existing habitat access check. Health snapshots are opt-in via entityType=health_snapshot or includeHealthSnapshots=true.

Canonical filters

  • entityType / entityId are canonical; taskId / missionId are convenience aliases that normalize to canonical filters
  • Conflicting filter combinations return validation errors
  • Presets: effort_corrections, code_evidence_changes, failed_pipelines
  • Health snapshots excluded from default feeds and exports
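
The alias handling above can be sketched roughly as follows. The `normalizeAuditFilters` name, the `"task"` / `"mission"` entity type strings, and the error messages are assumptions. The source only states that aliases normalize to canonical filters and that conflicting combinations are rejected.

```typescript
// Hypothetical sketch of alias normalization; names and messages are assumed.
interface AuditFilterInput {
  entityType?: string;
  entityId?: string;
  taskId?: string;
  missionId?: string;
}

interface CanonicalFilter {
  entityType?: string;
  entityId?: string;
}

function normalizeAuditFilters(input: AuditFilterInput): CanonicalFilter {
  if (input.taskId !== undefined && input.missionId !== undefined) {
    throw new Error("taskId and missionId cannot be combined");
  }

  // Translate a convenience alias into its canonical pair, if one was given.
  let alias: CanonicalFilter | null = null;
  if (input.taskId !== undefined) {
    alias = { entityType: "task", entityId: input.taskId };
  } else if (input.missionId !== undefined) {
    alias = { entityType: "mission", entityId: input.missionId };
  }

  if (alias === null) {
    const out: CanonicalFilter = {};
    if (input.entityType !== undefined) out.entityType = input.entityType;
    if (input.entityId !== undefined) out.entityId = input.entityId;
    return out;
  }

  // An alias may coexist with canonical filters only if they agree.
  if (input.entityType !== undefined && input.entityType !== alias.entityType) {
    throw new Error("entityType conflicts with alias filter");
  }
  if (input.entityId !== undefined && input.entityId !== alias.entityId) {
    throw new Error("entityId conflicts with alias filter");
  }
  return alias;
}
```

Rejecting the conflict instead of picking a winner keeps an export from silently covering a different entity than the caller intended.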

Entity-scoped evidence bundles

  • GET /tasks/:taskId/audit/bundle — task lifecycle, effort, code evidence, pipeline, warnings
  • GET /missions/:missionId/audit/bundle — separates directMissionEvidence from rolledUpTaskEvidence (rolled-up rows keep originating task id/title so task-level proof is not blurred with mission-level proof)
  • Raw payload / diff / content fields are stripped at the projection layer

MCP audit tools (scoped)

  • orcy_habitat_task: get-audit-bundle — task bundle
  • orcy_habitat_mission: get-audit-bundle — mission bundle
  • orcy_admin: export-audit-log — admin export with canonical filters (humans only)
  • Broad habitat-wide audit export and scheduled export management remain human-only — MCP agents can only request scoped bundles

Request & MCP provenance

  • REST requests automatically record metadata.audit with requestId, route, method, source, and the authenticated actor
  • MCP requests add X-Orcy-Audit-Source: mcp_tool plus the tool name and action via MCP-side AsyncLocalStorage
  • The API treats these as mcp_tool provenance only for agent-key requests

Audit-visible actions

  • Manual audit export requests emit a system event with actor, route, filters, format, event count, and success/failure
  • Scheduled audit export executions emit system events with schedule id, run time, format, filters, event count, output reference, and success/failure
  • The event for the current export is not included in that same export (cutoff-based projection or post-generation emit)
  • Export/bundle access events are recorded but excluded from the response that triggered them
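
One way to read the self-exclusion rule above is a cutoff taken before the export runs, with the access event emitted afterwards. The sketch below uses an in-memory array as a stand-in for the event store. `StoredEvent`, `exportWithCutoff`, the `system_event:` ID prefix, and the action name are all hypothetical.

```typescript
// Hypothetical sketch: an in-memory store stands in for the real tables.
interface StoredEvent {
  id: string;
  occurredAt: number; // epoch milliseconds
  action: string;
}

function exportWithCutoff(store: StoredEvent[], now: () => number): StoredEvent[] {
  // Fix the cutoff first so nothing recorded at or after it can leak in.
  const cutoff = now();
  const exported = store.filter((e) => e.occurredAt < cutoff);

  // Emit the access event only after the result set is final.
  store.push({
    id: "system_event:" + (store.length + 1),
    occurredAt: cutoff,
    action: "audit_export_requested",
  });
  return exported;
}
```

In this sketch the export event is absent from the response that triggered it but appears in any later export with a later cutoff.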

Mission delete events preserved

Migration 0024_preserve_mission_events_on_delete.sql removed the cascade FK on mission_events.mission_id. Mission delete audit events now survive mission deletion. The Drizzle schema was updated to match — no .references() on missionEvents.missionId.

Archival (integrity-ready)

auditArchivalService.archiveOldEvents() writes a schema-versioned archive envelope of canonical task and mission lifecycle events and deletes the archived source rows. Same-day archives are merged; legacy array-shaped archives remain readable. Active hash-chain verification, restore workflow, and audit_archive_runs remain deferred.

Design decisions preserved

  • Audit projection never silently drops failed sources; authorization and primary entity failures hard fail
  • Supporting-source failures return partial results with explicit warnings
  • CSV hard fails if warnings cannot be represented clearly
  • Health snapshots are opt-in supporting evidence, not default feed
  • Inferred presence from task_time_records is excluded from default audit feeds
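
The hard-fail versus partial-result split above might look roughly like this. The loader signature, the `collectAudit` name, and the `source_failed:` warning format are assumptions, and real loaders would presumably be asynchronous.

```typescript
// Hypothetical sketch: loaders are synchronous here only to keep it short.
type Loader = () => string[];

interface SupportingSource {
  name: string;
  load: Loader;
}

interface AuditResult {
  events: string[];
  warnings: string[];
}

function collectAudit(primary: Loader, supporting: SupportingSource[]): AuditResult {
  // Primary entity failures are deliberately not caught: they hard fail.
  let events = primary();
  const warnings: string[] = [];

  for (const src of supporting) {
    try {
      events = events.concat(src.load());
    } catch (err) {
      // A failed supporting source is reported, never silently dropped.
      const reason = err instanceof Error ? err.message : String(err);
      warnings.push("source_failed:" + src.name + ":" + reason);
    }
  }
  return { events, warnings };
}
```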

Advanced Analytics

A full forecasting and operational analytics layer was added on top of the existing data, with sample-size-aware confidence and explicit caveats.

Forecasting (predictionService)

  • TaskEstimate now carries targetType=task, targetId, missionId, point/range dates, confidenceScore, structured ForecastReason[], sampleSize, and basis=throughput
  • Sample-size confidence follows v0.17 thresholds: 0-2 = insufficient_data, 3-9 = low, 10-29 = medium, 30+ = high
  • Habitats with 0-2 completed tasks in the last 30 days are insufficient_data, not ordinary low — estimates are not produced, ranges are unavailable, and no_recent_velocity / small_sample reasons are emitted
  • Aggregate mission and sprint forecasts derived from child task forecasts — weakest child confidence, max child completion date/range, inherited structured reasons
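
The thresholds and the aggregation rule above can be sketched as below. `confidenceForSample` is named in the refactor notes further down, but its signature here is a guess. `ChildForecast` and `aggregateForecast` are hypothetical, and the sketch covers only the weakest-confidence and latest-date rules, not ranges or reasons.

```typescript
type SampleConfidence = "insufficient_data" | "low" | "medium" | "high";

// Thresholds as stated above: 0-2, 3-9, 10-29, 30+.
function confidenceForSample(sampleSize: number): SampleConfidence {
  if (sampleSize <= 2) return "insufficient_data";
  if (sampleSize <= 9) return "low";
  if (sampleSize <= 29) return "medium";
  return "high";
}

const WEAKEST_FIRST: SampleConfidence[] = ["insufficient_data", "low", "medium", "high"];

// Hypothetical child forecast: ISO dates compare correctly as strings.
interface ChildForecast {
  confidence: SampleConfidence;
  completionDate: string | null;
}

function aggregateForecast(children: ChildForecast[]): ChildForecast {
  if (children.length === 0) {
    return { confidence: "insufficient_data", completionDate: null };
  }
  let weakest: SampleConfidence = "high";
  let latest: string | null = null;
  for (const child of children) {
    // The aggregate is only as confident as its weakest child.
    if (WEAKEST_FIRST.indexOf(child.confidence) < WEAKEST_FIRST.indexOf(weakest)) {
      weakest = child.confidence;
    }
    // The aggregate finishes when its last child finishes.
    if (child.completionDate !== null && (latest === null || child.completionDate > latest)) {
      latest = child.completionDate;
    }
  }
  return { confidence: weakest, completionDate: latest };
}
```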

Trends (trendService)

  • Equal-period habitat trends comparing current period to previous equal-length period
  • Throughput and cycle time trends with MetricTrend (current/previous values, absolute/percent delta, direction, sample size, sample-size confidence)
  • direction=unknown when confidence is insufficient
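
A minimal sketch of an equal-period comparison, assuming the fields listed above. The `buildTrend` name, the `"up"` / `"down"` / `"flat"` direction values, and the choice to return a null percent delta when the previous value is zero are assumptions. Only `direction=unknown` for insufficient confidence comes from the notes.

```typescript
type Direction = "up" | "down" | "flat" | "unknown";

interface MetricTrend {
  current: number;
  previous: number;
  absoluteDelta: number;
  percentDelta: number | null; // null when the previous value is 0 (assumed)
  direction: Direction;
  sampleSize: number;
  confidence: string;
}

function buildTrend(current: number, previous: number, sampleSize: number): MetricTrend {
  // Same sample-size thresholds as the forecasts: 0-2, 3-9, 10-29, 30+.
  const confidence =
    sampleSize <= 2 ? "insufficient_data"
    : sampleSize <= 9 ? "low"
    : sampleSize <= 29 ? "medium"
    : "high";

  const absoluteDelta = current - previous;
  const percentDelta = previous === 0 ? null : (absoluteDelta / previous) * 100;

  // Refuse to name a direction when the sample is too small to trust.
  let direction: Direction = "unknown";
  if (confidence !== "insufficient_data") {
    direction = absoluteDelta > 0 ? "up" : absoluteDelta < 0 ? "down" : "flat";
  }

  return { current, previous, absoluteDelta, percentDelta, direction, sampleSize, confidence };
}
```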

Cumulative flow (cumulativeFlowService)

  • New cumulative_flow_snapshots table with countsByColumn, countsByStatus, source, completeness, warnings, and unique (habitatId, snapshotDate) semantics
  • Stored snapshots authoritative; current day projected from live board state with current_state_projection source label
  • Older missing days return zero-count partial points with partial_history warning (deliberately conservative — no fake historical reconstruction)
  • Atomic upsert via INSERT ... ON CONFLICT(habitat_id, snapshot_date) DO UPDATE SET ...
  • GET /habitats/:habitatId/cumulative-flow?days=30 route with agentOrHumanAuth + requireHabitatAccess
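
The series assembly rules above (stored snapshot first, live projection for today, conservative zero-count points elsewhere) could be sketched as follows. `buildFlowSeries`, the `"stored_snapshot"` and `"missing"` source labels, and the decision to prefer a stored snapshot for today when one exists are assumptions. `current_state_projection` and `partial_history` come from the notes.

```typescript
type Counts = { [status: string]: number };

interface FlowPoint {
  date: string;
  countsByStatus: Counts;
  source: string;
  warnings: string[];
}

function buildFlowSeries(
  dates: string[],
  stored: { [date: string]: Counts | undefined },
  today: string,
  liveCounts: Counts
): FlowPoint[] {
  // Zero-filled template so missing days still list every known status.
  const zeros: Counts = {};
  for (const status of Object.keys(liveCounts)) {
    zeros[status] = 0;
  }

  return dates.map((date): FlowPoint => {
    const snapshot = stored[date];
    if (snapshot !== undefined) {
      return { date, countsByStatus: snapshot, source: "stored_snapshot", warnings: [] };
    }
    if (date === today) {
      return { date, countsByStatus: liveCounts, source: "current_state_projection", warnings: [] };
    }
    // No reconstruction of history: report zeros and say so.
    return { date, countsByStatus: zeros, source: "missing", warnings: ["partial_history"] };
  });
}
```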

Bottlenecks (bottleneckService)

  • Dwell-time bottlenecks from completed transition samples (avg/median/p90 minutes)
  • WIP-exceeded findings from current mission counts vs column WIP limits
  • Blocked-dependency findings from unresolved task dependencies (archived missions excluded)
  • Confidence, evidence, and recommendations per finding
  • Insufficient dwell samples produce warnings but no dwell bottleneck
  • WIP limit of 0 (disabled) skipped entirely — no false high-severity findings
  • GET /habitats/:habitatId/bottlenecks?days=30 route
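
The WIP rule above, including the disabled-limit case, is small enough to sketch directly. `ColumnWip`, `WipFinding`, and `findWipExceeded` are hypothetical names, and severity, confidence, and recommendations are omitted.

```typescript
// Hypothetical input: one entry per board column.
interface ColumnWip {
  columnId: string;
  wipLimit: number; // 0 means the limit is disabled
  missionCount: number;
}

interface WipFinding {
  columnId: string;
  overBy: number;
}

function findWipExceeded(columns: ColumnWip[]): WipFinding[] {
  const findings: WipFinding[] = [];
  for (const col of columns) {
    // A limit of 0 is "no limit", not "limit of zero": skip it entirely.
    if (col.wipLimit === 0) continue;
    if (col.missionCount > col.wipLimit) {
      findings.push({ columnId: col.columnId, overBy: col.missionCount - col.wipLimit });
    }
  }
  return findings;
}
```

Without the explicit skip, any column with a disabled limit and at least one mission would register as exceeding its limit.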

Time-in-column (timeInColumnService)

  • Per-column completed dwell samples from task_events.from_column_id / to_column_id and timestamps
  • 2x lookback to capture tasks that entered a column before the query window but exited within it
  • Average / median / p90 minutes with sample-size confidence
  • Active tasks without a leave event are not mixed into completed dwell samples
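
A rough sketch of how completed dwell samples might be derived and summarized. The `Transition` shape, both function names, and the nearest-rank p90 method are assumptions. The caller is assumed to pass events fetched with the wider 2x lookback, while `windowStart` marks the start of the actual query window.

```typescript
// Hypothetical transition shape mirroring task_events.from_column_id / to_column_id.
interface Transition {
  taskId: string;
  fromColumnId: string | null;
  toColumnId: string | null;
  at: number; // epoch milliseconds
}

function completedDwellMinutes(events: Transition[], columnId: string, windowStart: number): number[] {
  const enteredAt: { [taskId: string]: number | undefined } = {};
  const samples: number[] = [];
  const ordered = events.slice().sort((a, b) => a.at - b.at);

  for (const e of ordered) {
    const start = enteredAt[e.taskId];
    if (e.toColumnId === columnId && e.fromColumnId !== columnId) {
      enteredAt[e.taskId] = e.at; // task entered the column
    } else if (e.fromColumnId === columnId && e.toColumnId !== columnId && start !== undefined) {
      // Count the sample only if the exit falls inside the query window,
      // even when the entry happened earlier (hence the wider lookback).
      if (e.at >= windowStart) samples.push((e.at - start) / 60000);
      enteredAt[e.taskId] = undefined;
    }
  }
  // Tasks still recorded in enteredAt never left: they are active, not samples.
  return samples;
}

function summarize(samples: number[]): { avg: number; median: number; p90: number } | null {
  if (samples.length === 0) return null;
  const s = samples.slice().sort((a, b) => a - b);
  const sum = s.reduce((acc, v) => acc + v, 0);
  const mid = Math.floor(s.length / 2);
  const median = s.length % 2 === 1
    ? (s[mid] as number)
    : ((s[mid - 1] as number) + (s[mid] as number)) / 2;
  // Nearest-rank percentile (assumed method).
  const p90 = s[Math.ceil(0.9 * s.length) - 1] as number;
  return { avg: sum / s.length, median, p90 };
}
```

Sorting before taking the middle element is what keeps the median independent of the average, which matters for skewed samples such as 10, 20, and 90 minutes.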

Sprint analytics (sprintAnalyticsService)

  • GET /sprints/:id/metrics — completion / velocity / remaining / planned effort, separating logged effort from inferred presence
  • GET /sprints/:id/burndown — delegates to predictionService.getBurndown(habitatId, sprintDurationDays, { sprintId }), preserves sprint start/end semantics, excludes non-sprint tasks
  • GET /sprints/:id/carry-over — inferred reasons only: incomplete tasks, blocked dependencies, missing estimates, overdue mission due date, no recent task activity, repeated task rejection, effort overrun
  • Forecast/on-track behavior is conservative: no forecast or insufficient_data forecast adds warnings and does not claim on-track
  • Routes reuse requireSprintAccess inside routes/sprints.ts; read-only endpoints use agentOrHumanAuth before the sprint access check

Agent quality signals (agentQualityService)

  • GET /habitats/:habitatId/agent-quality — informational-only signals across approval, evidence, estimate, and consistency dimensions
  • Sample-size confidence: same thresholds as forecast/trend
  • score=null for insufficient_data; warnings use non-punitive language ("Low confidence: not enough completed work yet.", "High rejection rate in recent sample.")
  • Batched evidence sampling (3 queries total, not per-task)
  • Informational only — does not affect assignment, approval gates, review routing, task eligibility, or permissions

MCP analytics surface

  • orcy_habitat: predictions, bottlenecks, agent-quality
  • orcy_sprint: get_metrics, get_burndown, get_carry_over
  • Cumulative-flow remains API/UI-first (dense chart series not exposed via MCP)
  • Sprint burndown returns summary fields, not the dense chart series

Cycle time median

queryCycleTime() collects per-task cycle minutes per completion date, sorts each bucket, computes median independently from average. Regression test locks the difference ([10, 20, 90] → average 40, median 20).


Platform Hardening

A broad pass to close correctness, performance, and consistency gaps surfaced by the v0.17 code review.

Schema consolidation (single source of truth)

The Drizzle TypeScript schema files are now the single source of truth. SQL files are generated outputs.

  • 0000_schema.sql regenerated — 64 tables, 171 indexes (was 34 tables with stale index names)
  • initTestDb applies only 0000_schema.sql, skipping migrations 0001-0026 entirely
  • Production databases still apply migrations incrementally via the journal
  • 8 missing .index() declarations added to Drizzle schema for FK columns that only had indexes in migration SQL (pull_requests, pipeline_events, code_evidence_links, code_evidence_gaps)
  • drizzle/meta/ snapshots rebuilt — was 7 stale snapshot files + 25-entry journal (broken), now a clean 0000_snapshot.json matching the current schema
  • New comprehensive Schema Workflow section in docs/DATABASE.md documenting source of truth, two deployment paths, how to add columns/indexes/FKs, verification checklist, common mistakes, and snapshot maintenance

MCP tool architecture

The orcy_admin dispatch tool was removed from MCP tool registration. MCP is for programmatic/AI use only — humans use CLI and WebUI for webhook/template/scheduled-task/audit-export management. The 3 agent-useful actions (batch-assign, batch-set-priority, batch-delete) were moved into orcy_habitat_task. The admin dispatch file and exports are preserved for backward compatibility and existing tests, but the tool is no longer registered in ALL_TOOLS or wired into the server's TOOL_HANDLERS.

MCP tool quality

  • tasksTemplate items in admin-dispatch now declare proper JSON schema properties (title, description, priority, requiredDomain, requiredCapabilities, estimatedMinutes, order) with required=[title] — was items: { type: "object" } which told LLMs nothing
  • entityTypes and entityType audit filters now mutually exclusive — both provided throws an error
  • getRelevantInsights filtering moved to server-side via tags query param, backed by the existing getRelevantInsights repo function (uses json_each SQL) — MCP client no longer fetches 100 active insights and filters locally
  • getAgentById in the MCP client now only returns null for 404, rethrows all other errors — was swallowing every error, masking 5xx failures
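
The getAgentById fix above comes down to treating 404 as the only status that means "no such agent". A minimal sketch of that classification, using a hypothetical `HttpResponse` wrapper and `unwrapAgent` helper in place of the real MCP client code:

```typescript
// Hypothetical response wrapper; the real client's types are not shown in these notes.
interface HttpResponse<T> {
  status: number;
  body: T | null;
}

function unwrapAgent<T>(res: HttpResponse<T>): T | null {
  // Only "not found" maps to null.
  if (res.status === 404) return null;
  if (res.status >= 200 && res.status < 300 && res.body !== null) {
    return res.body;
  }
  // Everything else (5xx, auth errors, empty bodies) must surface to the caller.
  throw new Error("agent lookup failed with HTTP " + res.status);
}
```

Returning null for every failure makes an outage indistinguishable from a missing agent, which is the masking behaviour the fix removes.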

UI consistency

  • Shared WarningList component extracted with variant="note" (prefixes with "Note:" or "Critical:") and variant="plain" (message only). Replaces local copies in FlowAnalyticsPanel and SprintAnalyticsPanel.
  • ErrorBoundary component added; all 6 lazy-loaded chart Suspense blocks in DashboardCharts wrapped for graceful fallback
  • 6 instances of .replace("_", " ") replaced with .replaceAll("_", " ") for status display
  • TaskEstimates migrated from raw Tailwind colors (text-gray-*, text-red-*, text-green-*) to semantic design tokens (text-muted-foreground, text-foreground, text-destructive, border-border)
  • FlowAnalyticsPanel empty-state message changed from "Select a habitat" to "Flow data not available" when habitat is selected
  • Webhook success rate guards total === 0 → returns 100% instead of NaN%
  • Sprint cancel action now uses ConfirmDialog with danger variant — warns "This will uncommit all missions and mark the sprint as cancelled. This cannot be undone." before proceeding

Documentation

  • New prominent prerelease warnings added to README, INSTALL, HUMAN-GUIDE, ARCHITECTURE, and API reference — all link back to the main README warning
  • New Schema Workflow section in docs/DATABASE.md
  • Updated CONTRIBUTING.md with the new Drizzle-first schema workflow

Commits

Features

  • add audit provenance foundation (4cf09e4) — Define shared canonical audit types, add request/MCP provenance context, centralize v0.17 audit event emission, normalize projected system actors, preserve mission events across mission deletion, and attribute integration-sync mission side effects.
  • add canonical audit exports and bundles (d447bfa) — Switch audit export generation to canonical AuditEvent output while preserving existing route paths and CSV/JSON/JSONL formats. Add filter support for canonical entity/source/provider/preset fields and entity-scoped audit evidence bundles.
  • expose scoped audit bundle access (114975d) — Surface task and mission audit bundles through MCP dispatch tools, plus MCP client methods that call entity-scoped routes and normalize task/mission ids.
  • add confidence-aware analytics forecasts (4c1674b) — TaskEstimate carries point/range dates, confidenceScore, structured ForecastReason[], sampleSize, and basis=throughput. 0-2 completed tasks in 30 days are insufficient_data, not ordinary low.
  • add cumulative flow analytics and bottleneck detection (55a75a8) — cumulativeFlowService with daily snapshots and current_state_projection fallback. bottleneckService with dwell, WIP, and blocked-dependency findings. timeInColumnService with 2x lookback.
  • add agent quality analytics and signals (cdd9d78) — GET /habitats/:habitatId/agent-quality with approval / evidence / estimate / consistency dimensions, sample-size confidence, and non-punitive warnings. Informational only.
  • add advanced analytics and audit trail v2 (645c6c2) — Composes the advanced analytics layer (forecasts, cumulative flow, bottleneck detection, agent quality, sprint metrics) and audit trail V2 (canonical projection, scoped bundles, completeness tracking) into a single integration commit.

Refactors

  • optimize database schema and performance (8752fd3) — Remove duplicate unique indexes from cumulative flow snapshots, switch mission comments to soft deletes, optimize cumulative flow snapshot upsert pattern, add audit query pagination, and improve error handling across multiple services.
  • extract shared analytics date utility (58b01ee) — analyticsDate.ts with MS_PER_DAY, utcDateKey, daysAgoISO, utcNowISO, diffDays, daysUntil, dateRange utilities. Standardizes all analytics date boundaries to UTC. 15-test suite.
  • migrate analytics services to shared date utility (P2-14) (e37d828) — 12 services updated to use the shared utility. Removes 7 local msDay definitions and local confidenceForSample duplicates.
  • remove admin tool, move batch actions to task dispatch (P3-2) (a3c8fd0) — orcy_admin removed from MCP tool registration. Batch actions live under orcy_habitat_task. Admin dispatch file and exports preserved for backward compatibility.
  • consolidate schema, fix snapshots, simplify test init (P2-15, P2-16 + cleanup) (09d24f4) — Regenerated 0000_schema.sql (64 tables, 171 indexes). Added 8 missing .index() declarations. Rebuilt stale drizzle/meta/ snapshots. initTestDb applies only 0000_schema.sql.

Bug Fixes

  • improve tool schemas and error handling (P3-3, P3-4, P3-5, P3-6) (7624d07) — tasksTemplate JSON schema properties, entityTypes vs entityType mutual exclusivity, server-side insight filtering, and 404-only swallowing in getAgentById.
  • extract shared components, design tokens, error boundaries (P3-7, P3-8, P3-9, P3-10, P3-12, P3-13) (15a0a99) — WarningList extraction, ErrorBoundary around lazy charts, replaceAll for status display, design tokens for TaskEstimates, NaN% guard, and FlowAnalyticsPanel empty-state message fix.
  • add confirmation dialog for sprint cancel (P3-11) (6dc291d) — Destructive sprint cancel now requires explicit confirmation.

Documentation

  • add prominent prerelease warnings across all user-facing docs (69b65f1) — README, INSTALL, HUMAN-GUIDE, ARCHITECTURE, and API reference all carry prerelease warnings that link back to the main disclaimer.
  • add comprehensive schema workflow documentation (5e64d69) — New "Schema Workflow" section in DATABASE.md and updated CONTRIBUTING.md with the Drizzle-first workflow.

Full Changelog: v0.16.6...v0.17.0


⚠️ Prerelease Reminder

Orcy is in active 0.x prerelease. Schema, APIs, MCP tool shapes, CLI commands, and on-disk formats may change between releases without a migration path. Your data, integrations, and configurations are not guaranteed to survive an upgrade. Pin your version, snapshot your data, and do not use v0.17 against production workloads. We are working toward a stable 1.0 release.