ci: add CI workflows for all services and standardize build triggers #9
- Add build-and-push workflows for crawler, platform, proxy, and search services
- Enable automatic builds on main branch with path-based triggers for all services
- Update existing db, graph-db, and rag workflows to use consistent path triggers
- Update db workflow to use new services/db path for Dockerfile
- Remove obsolete build-platform.yml in favor of new build-and-push-platform.yml
- Update docs to reflect SITE_URL usage instead of NEXT_PUBLIC_APP_URL
Caution: Review failed. The pull request is closed.

Walkthrough

This PR introduces a standardized multi-architecture Docker build-and-push workflow infrastructure across multiple services. It adds four new GitHub Actions workflows (crawler, platform, proxy, search) that build and push images to GHCR with multi-platform support (linux/amd64, linux/arm64) and automated testing. Existing workflows (db, graph-db, rag) are updated to include file-path-based triggers in addition to tag-based triggers. The legacy build-platform.yml workflow is removed in favor of the new standardized approach. Documentation is updated to reflect runtime URL derivation using SITE_URL instead of NEXT_PUBLIC_* variables.

Estimated code review effort: 3 (Moderate), ~22 minutes
The function now handles both seconds and milliseconds timestamps using a heuristic: timestamps < 1e11 are treated as seconds and converted to milliseconds. This prevents silent miscalculations when metadata contains seconds-based timestamps from sources like RAG indexing.

Addresses CodeRabbit review comment #9.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
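The heuristic in this commit can be sketched in Python as follows. This is a minimal illustration, not the repository's code: the name `to_milliseconds` and the constant `SECONDS_THRESHOLD` are hypothetical, and only the `1e11` cutoff comes from the commit message.

```python
# Hypothetical names; only the 1e11 cutoff is taken from the commit message.
SECONDS_THRESHOLD = 1e11


def to_milliseconds(timestamp: float) -> float:
    """Normalize a Unix timestamp to milliseconds.

    Values below the threshold are assumed to be seconds and are scaled up;
    anything at or above it is assumed to already be in milliseconds.
    """
    if timestamp < SECONDS_THRESHOLD:
        return timestamp * 1000
    return timestamp
```

One consequence of the cutoff: 1e11 ms falls in early 1973, so a genuine millisecond timestamp older than that would be misread as seconds. For metadata written by present-day indexing this is unlikely to matter.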
…cies

Bundle of round-2-confirmed cross-tenant fixes plus the dead-code delete of the semantic LLM response cache.

POLICY_TYPES drift (W6 #5)
- lib/shared/schemas/governance.ts now includes 'data_classification_notice' to match the Convex enum, killing the `as const` cast at use-data-classification-notice.ts:50.

documents/compare_documents.ts (W6 #8)
- Convex `_storage` is a global namespace; org membership alone was not enough to gate `ctx.storage.getUrl`. Adds a JOIN through fileMetadata via the new internal query verifyStorageIdsBelongToOrg to confirm both `baseStorageId` and `comparisonStorageId` are owned by the caller's org. Refuses with a clear error otherwise. Pattern copied from agent_tools/documents/helpers/retrieve_document.ts.

file_metadata/actions.ts::checkFileRagStatuses (W6 #9)
- Was an unauthenticated public action that could flip any org's fileMetadata.ragStatus to `failed` via expireStaleRagQueue (DoS, pre-existing on `main`). Now requires `getAuthUser` and filters storageIds to ones owned by an org the caller is a member of via the new file_metadata.internal_queries.filterStorageIdsByCallerOrg.

governance/queries.ts (W6 #11)
- getPolicy + listPolicies now apply a member-readable allow-list (data_classification_notice, feature_flags, pii_config, chat_filter, personalization, upload_policy, default_models). All other types (login_policy.trustedProxies, password_policy, two_factor_policy, model_access.rules, budgets, retention_policy, moderation_provider.endpoint, system_prompt) are admin-only. listPolicies silently filters those out for non-admins.

semantic LLM response cache: DELETE (W6 #12 + #13)
- Round-2 v05 confirmed the lookup is structurally cross-tenant (filters only on agent_name, model, expires_at, similarity; ignores user_id / organization_id even though they're stored). The platform helpers `lookupSemanticCache` / `storeSemanticCacheAsync` had ZERO callers in the monorepo, and the FastAPI router was mounted but unreachable from platform: a latent foot-gun primed for the next dev to wire up unaware.
- Deletes:
  - services/platform/convex/lib/response_cache/semantic_cache.ts
  - services/platform/convex/lib/response_cache/internal_actions.ts
  - services/rag/app/routers/llm_cache.py
  - services/rag/app/services/llm_response_cache.py
- Plus the corresponding imports in routers/__init__.py, main.py, rag_service.py. Also removes the two empty-catch violations in semantic_cache.ts (no longer applicable).
- The exact-key Convex `lib/response_cache/{internal_mutations, internal_queries}.ts` cache stays: it is the actually-wired one and is correctly org-scoped.
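The member-readable allow-list from W6 #11 can be sketched in Python as below. The real code lives in Convex TypeScript, so this is only an illustration of the filtering rule: `can_read_policy` and `list_policies` are hypothetical names, the policy rows are plain dicts with an assumed `type` key, and only the list of policy types is taken from the commit message.

```python
# Policy types that non-admin members may read (from the commit message).
MEMBER_READABLE_POLICY_TYPES = frozenset({
    "data_classification_notice",
    "feature_flags",
    "pii_config",
    "chat_filter",
    "personalization",
    "upload_policy",
    "default_models",
})


def can_read_policy(policy_type: str, is_admin: bool) -> bool:
    """Admins can read every type; members only the allow-listed ones."""
    return is_admin or policy_type in MEMBER_READABLE_POLICY_TYPES


def list_policies(policies: list, is_admin: bool) -> list:
    """Silently drop policies the caller may not read (hypothetical row shape)."""
    return [p for p in policies if can_read_policy(p["type"], is_admin)]
```

An allow-list means a policy type added later defaults to admin-only until someone explicitly marks it member-readable, which is the safer failure mode for a multi-tenant settings store.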
…eout

Round-2 v15 confirmed: /config unauthenticated, /openapi.json + /docs + /redoc unauthenticated, RAG container ran as root, default token baked into image ENV, strict-mode env name diverged across the wire, non-constant-time token compare, plus three SSRF-guard gaps.

services/rag/app/auth.py
- W7 #3: hmac.compare_digest replaces == on the bearer compare. Removes the dead-code EXEMPT_PATHS frozenset.

services/rag/app/routers/health.py
- W7 #1: split into public_router (`/`, `/health`) and protected_router (`/config`). main.py mounts the protected one under Depends(verify_internal_token). Old `router` re-export stays for backwards compat.

services/rag/app/main.py
- W7 #2: docs_url / redoc_url / openapi_url are None outside debug.
- W7 #4: CORS allow_credentials flipped to False (bearer rides Authorization, never cookies).
- W7 #1 wiring: mount health-public + health-protected separately.

services/rag/app/config.py
- W7 #8: require_custom_internal_token accepts BOTH RAG_REQUIRE_CUSTOM_INTERNAL_TOKEN and TALE_REQUIRE_CUSTOM_RAG_TOKEN via pydantic AliasChoices.

services/rag/Dockerfile + services/convex/Dockerfile
- W7 #5: RAG container runs as non-root (uid:gid 1001:1001 `app`). RAG ingests untrusted PDFs/DOCX through native parsers; biggest blast radius in the stack, now hardened.
- W7 #6: removed RAG_INTERNAL_TOKEN=tale-rag-dev-only ENV bake from both runtime + scratch-squash stages and the matching bake in services/convex/Dockerfile. Operators MUST supply via env / compose / k8s secret.

services/platform/convex/lib/helpers/rag_config.ts
- W7 #9 F1: `redirect: 'manual'` on every ragFetch.
- W7 #9 F2: added fc00::/7 (IPv6 ULA) to v6 blocklist (AWS IPv6 IMDSv2).
- W7 #9 F3: strip trailing `.` before hostname blocklist lookup.
- W7 #9 F4: re-validate URL per ragFetch invocation (DNS rebinding + env rotation mitigation).
- W7 #9 F9: deleted path.startsWith('http') override branch (future-bypass foot-gun).
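Two of the fixes above (W7 #3 and W7 #9 F2/F3) are small enough to sketch in Python. `hmac.compare_digest` and the `ipaddress` module are standard library; everything else here is an assumption for illustration: the function names, the exact contents of `BLOCKED_NETWORKS` and `BLOCKED_HOSTNAMES`, and the choice to treat non-IP hostnames as allowed. The actual SSRF guard is TypeScript in rag_config.ts and may be structured differently.

```python
import hmac
import ipaddress


def tokens_match(presented: str, expected: str) -> bool:
    """Constant-time bearer comparison (W7 #3), instead of `==`."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


# Illustrative blocklists; only fc00::/7 is named in the commit message.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("127.0.0.0/8", "10.0.0.0/8", "169.254.0.0/16", "::1/128", "fc00::/7")
]
BLOCKED_HOSTNAMES = frozenset({"localhost"})


def is_blocked_host(host: str) -> bool:
    """Return True for hosts the guard should refuse."""
    # F3: "localhost." resolves like "localhost", so strip the trailing dot first.
    normalized = host.rstrip(".").lower()
    if normalized in BLOCKED_HOSTNAMES:
        return True
    try:
        address = ipaddress.ip_address(normalized.strip("[]"))
    except ValueError:
        # Not an IP literal. Checking what the name resolves to is out of scope here.
        return False
    # F2: fc00::/7 covers IPv6 unique local addresses such as fd00:ec2::254.
    return any(address in network for network in BLOCKED_NETWORKS)
```

Calling `is_blocked_host` on every request rather than once at startup mirrors the intent of F4: a value that was safe when first validated may not be safe after DNS or environment changes.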
services/platform/convex/agent_tools/rag/helpers/fetch_document_chunks.ts
- W7 #10: pass timeoutMs=60_000 (default 10s was a regression).
- Plus MAX_ITERATIONS=30 cap and "cursor did not advance" break to defend against an adversarial RAG response.
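The two pagination defenses can be sketched in Python as below. Only `MAX_ITERATIONS = 30` and the "cursor did not advance" rule come from the commit message; `fetch_all_chunks`, the `fetch_page` callback, and its `(chunks, next_cursor)` return shape are hypothetical stand-ins for the real RAG client.

```python
from typing import Callable, List, Optional, Tuple

MAX_ITERATIONS = 30  # hard cap on pages, from the commit message

# Hypothetical page fetcher: takes a cursor, returns (chunks, next_cursor).
FetchPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def fetch_all_chunks(fetch_page: FetchPage) -> List[str]:
    """Collect pages until the server signals the end or misbehaves."""
    chunks: List[str] = []
    cursor: Optional[str] = None
    for _ in range(MAX_ITERATIONS):
        page, next_cursor = fetch_page(cursor)
        chunks.extend(page)
        if next_cursor is None:
            break  # normal end of results
        if next_cursor == cursor:
            break  # cursor did not advance: refuse to loop on the same page
        cursor = next_cursor
    return chunks
```

The cap bounds the damage from a response that always advances the cursor, while the equality check catches the cheaper attack of returning the same cursor forever.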
… 2FA pepper

P0-16: `scrubSubjectAuditLogs` doesn't clear `actorEmailHash` / `actorIpHash` (round-1 #8, round-2 V6).

Peppered hashes are pseudonymized PII per GDPR Art 4(5): they're still personal data and must be cleared on Art 17 erasure. Without this, a subject's audit-chain entries kept a stable identifier even after `scrubSubjectAuditLogs` "scrubbed" them; re-identification was possible by the controller (or anyone with the pepper) by hashing a known email. The signed `pii_scrub` checkpoint window already permits the row's hash to diverge from its original (verifier skips chain re-compute inside the window), so clearing these columns is chain-safe: just two added field clears in the patch.

P0-17: `notifications` table has no retention or erasure coverage (round-1 #8, round-2 V6).

In-app notifications carry the subject's peppered email + IP in `params` (lockout alerts, system messages). Without retention they accumulated indefinitely AND survived subject erasure. Fix:
- Added `'notifications'` to `RETENTION_CATEGORIES` + policy schema fields. Wired into bounds-proposal map + bounds validator + clamp.
- New `cleanupNotifications` action category: hard-delete on TTL (no two-pass trash, since admin telemetry isn't user-restorable), gated by org-wide hold only.
- New `listExpiredNotifications` query + `deleteExpiredNotification` mutation (cross-org guard + mid-flight org-hold re-check).
- New `eraseSubjectNotifications` for Art 17 cascade: matches params.email against plaintext OR peppered-hash form so rows written under either pepper state are covered. Wired into `processErasureRequest`.

P1-F: 2FA writes plaintext email/IP to audit chain (round-1 #9, round-2 V6).

Switched 2FA's recordFailure / clearOnSuccess / logEnrollmentEvent to splitEmailForAudit / splitIpForAudit shape; matches login_attempts so a single TALE_AUDIT_PEPPER env-var flip rotates the whole chain.

Verified: typecheck clean; 599 tests pass across affected dirs.
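The "plaintext OR peppered-hash" match used by the erasure cascade can be sketched in Python as follows. This is an assumption-heavy illustration: the commit does not say how the peppered hash is computed, so HMAC-SHA256 is used here as a stand-in, and `peppered_hash`, `matches_subject`, `erase_subject_notifications`, and the dict row shape are all hypothetical.

```python
import hashlib
import hmac


def peppered_hash(value: str, pepper: bytes) -> str:
    """Stand-in for the real peppered hash (assumed here to be HMAC-SHA256)."""
    return hmac.new(pepper, value.encode("utf-8"), hashlib.sha256).hexdigest()


def matches_subject(stored_email, subject_email: str, pepper: bytes) -> bool:
    """True if the stored value is the subject's email in either form."""
    if stored_email is None:
        return False
    return stored_email in (subject_email, peppered_hash(subject_email, pepper))


def erase_subject_notifications(rows: list, subject_email: str, pepper: bytes) -> list:
    """Return the rows that survive erasure (hypothetical in-memory table)."""
    return [
        row for row in rows
        if not matches_subject(row.get("params", {}).get("email"), subject_email, pepper)
    ]
```

The sketch also shows why the hash counts as personal data: anyone holding the pepper can recompute it from a known email, which is exactly how the match works.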
…heck

- audit_hash: add lifecycleStatus + statusChangedAt to EXCLUDED_FIELDS so retention soft-delete (markRowExpiredGeneric) patching audit log rows doesn't poison the chain hash recompute. Pre-fix, ANY soft-deleted audit row caused verifyIntegrity to fail valid=false from that row forward. Round-2 review CRITICAL #8.
- audit_logs/validators: declare lifecycleStatus + statusChangedAt on auditLogItemValidator so query-return validation accepts soft-deleted rows. Defense-in-depth alongside the EXCLUDED_FIELDS fix.
- verify_integrity: anchor candidate filter accepts subtype === undefined (legacy retention checkpoints written before subtype field existed). Strict equality dropped them, breaking verifyIntegrity for any deployment that ran retention pre-upgrade. Match canonicalCheckpointPayload's `?? 'retention'` fallback. Round-2 review CRITICAL #9.
- verify_integrity: add fromTimestamp arg for paged resumption + suppress isFirstEntry head-anchor when paging mid-chain. Pre-fix, the response promised "page from lastVerifiedTimestamp + 1" but the query had no such arg, so large-org chains could not be paged.
- verify_integrity: drop unsignedScrubSubjects Set (security-flavored dead code; unsignedScrubCount alone tracks the metric). The set was populated but never read; the actual gate is `!hasSigningKey`. Comment clarified.
- verify_integrity: type entries as Doc<'auditLogs'>[] instead of an open `[key: string]: unknown` index signature.
- audit_logs/internal_mutations: delete archiveOldLogs deprecated re-export (zero callers, AGENTS.md prohibits @deprecated tombstones).
- audit_logs/helpers (createAuditLog): introduce buildAuditRecordHashInput as single source of truth for the canonical record payload. Both writer and self-check call it, eliminating drift risk that schema additions could change the hash output across writes vs verify.
- audit_logs/helpers (createAuditLog): genesis sentinel. Read + patch the per-org auditLogChainGenesis row before each write. This forces OCC contention on a real document for the first audit write per org, closing the genesis-fork race where two concurrent first-writers both observe lastEntry=null and commit two roots with previousHash=''. Round-2 review CRITICAL #10.
- audit_logs/helpers (createAuditLog): inline self-check on every write recomputes the prior row's integrity hash and console.errors on mismatch. Catches naive scenario-1 tampering (field changed, hash not updated) at the next legitimate audit write, which is the only automated tamper detection today. Wrapped in try/catch and skips piiScrubbed rows so it cannot affect the legitimate write path. Round-2 review C.5.

Lint + typecheck clean. Convex codegen succeeded.
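The interaction between `EXCLUDED_FIELDS` and a single shared hash-input builder can be sketched in Python. This is a simplified model, not the Convex implementation: the field name `integrityHash`, the JSON canonicalization, the SHA-256 choice, and all function names are assumptions; `lifecycleStatus`, `statusChangedAt`, and `previousHash` are taken from the commit message.

```python
import hashlib
import json

# Fields patched after the fact must not feed the hash (assumed set).
EXCLUDED_FIELDS = frozenset({"integrityHash", "lifecycleStatus", "statusChangedAt"})


def build_hash_input(record: dict) -> str:
    """Single source of truth for the canonical payload, used by writer and verifier."""
    payload = {k: v for k, v in record.items() if k not in EXCLUDED_FIELDS}
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def compute_hash(record: dict) -> str:
    return hashlib.sha256(build_hash_input(record).encode("utf-8")).hexdigest()


def append_entry(chain: list, fields: dict) -> None:
    """Link a new record to the previous one; the first record uses previousHash=''."""
    previous = chain[-1]["integrityHash"] if chain else ""
    record = {**fields, "previousHash": previous}
    record["integrityHash"] = compute_hash(record)
    chain.append(record)


def verify_chain(chain: list) -> bool:
    """Recompute every hash and check each link to its predecessor."""
    previous = ""
    for record in chain:
        if record["previousHash"] != previous:
            return False
        if record["integrityHash"] != compute_hash(record):
            return False
        previous = record["integrityHash"]
    return True
```

Because both `append_entry` and `verify_chain` go through `build_hash_input`, a soft-delete that only sets `lifecycleStatus` leaves the chain valid, while a change to any hashed field is caught on the next verification.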
Closes #9, #10, #11, #12: cascade correctness + GC durability.

- `personalization_cascade.ts:cascadeOnOrgDeleted` swaps delete order: `db.delete` runs FIRST, then `storage.delete` inside the try/catch. Matches the documented contract in `tts/cascade_helpers.ts:55-62`. Convex `_storage` writes are out-of-band and not rolled back on tx abort, so the reverse order leaves a surviving row pointing at a dead storageId (404 on `/api/tts-audio`).
- `threads/cascade_helpers.ts` step 7c TTS cleanup gets the same swap, for the same reason.
- `cascadeOnTtsForMemberRemoved` per-mutation page cap lowered from 50 (~10K writes) to 30 (~6K writes) to stay under Convex's ~8K per-mutation write budget. `cascadeOnOrgDeleted` gets the same cap reduction. The hourly cron picks up whatever doesn't fit in a single pass, still well inside the 30-day Art 12(3) GDPR window.
- `gcOrgTtsChunks` now persists its org-cursor in a new singleton `ttsGcCursor` table between cron runs. A deployment with more orgs than `MAX_ORGS_PER_RUN` now advances through the full org list over successive hours instead of restarting from the lex-first org every time and starving lex-tail orgs forever. On reaching the end of the org list the cursor wraps to null and the next run starts over.
- `gcOrgTtsChunks` skip-empty: an org with no rows older than the retention cutoff no longer counts against `MAX_ORGS_PER_RUN`. Without this, a busy tail of stale orgs sandwiched behind quiet lex-leading orgs would never get reaped.
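The persisted cursor and the skip-empty rule for `gcOrgTtsChunks` can be modeled in Python as below. This is a sketch under stated assumptions: `gc_run`, its arguments, and the dict of stale-row counts are hypothetical; the real job reads Convex tables and stores the cursor in `ttsGcCursor`. Only the behavior (resume after the cursor, wrap to null at the end, empty orgs do not consume budget) comes from the commit message.

```python
from typing import Dict, List, Optional, Tuple

MAX_ORGS_PER_RUN = 2  # illustrative value; the real constant is not given


def gc_run(
    org_ids: List[str],
    stale_rows: Dict[str, int],
    cursor: Optional[str],
    max_orgs: int = MAX_ORGS_PER_RUN,
) -> Tuple[List[str], Optional[str]]:
    """One cron pass. Returns (orgs reaped, cursor to persist for the next pass)."""
    reaped: List[str] = []
    for org_id in sorted(org_ids):
        if cursor is not None and org_id <= cursor:
            continue  # visited in an earlier pass
        if len(reaped) >= max_orgs:
            return reaped, cursor  # budget spent; resume after `cursor`
        cursor = org_id
        if stale_rows.get(org_id, 0) > 0:
            reaped.append(org_id)  # only orgs with stale rows consume budget
    return reaped, None  # end of list: wrap so the next pass starts over
```

With orgs `a` (nothing stale), `b`, `c`, `d` and a budget of two, the first pass reaps `b` and `c` and persists `c`; the second pass reaps `d` and wraps the cursor to `None`.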