fix: post-audit P1 hardening (RLS, shutdown, CORS, secrets, cleanup) #10
Merged
yuzushi-dev merged 4 commits into main on Apr 22, 2026
These were never pytest test modules - they are smoke scripts with asyncio entrypoints used to probe provider factory, NIM timeout, and OpenRouter behaviour by hand. Having them under src/ meant they got included in the Docker image and cluttered the production package. Refs: docs/plans/2026-04-17-production-audit-fix-plan.md (P1-3)
…utdown

Neo4jClient and MilvusVectorStore both expose close(), not aclose() (only the Redis async client has aclose). The wrong method name raised AttributeError on every Platform.shutdown(), leaving SQLAlchemy pool connections unreturned and triggering the follow-up "non-checked-in connection ... will be terminated" errors from the GC.
Refs: docs/plans/2026-04-17-production-audit-fix-plan.md (P1-2)
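The failure mode above (one wrong method name aborting the rest of shutdown) can be illustrated with a defensive helper. This is a minimal sketch and not what the commit does: the commit simply calls close() on the two clients. The names close_resource, shutdown, FakeGraphClient and FakeCache are hypothetical stand-ins, and whether the real close() methods are sync or async is not stated in the PR, so the helper handles both.

```python
import asyncio
import inspect
import logging

logger = logging.getLogger("shutdown-sketch")


async def close_resource(name, resource):
    """Close a client via aclose() if present, otherwise close().

    Returns the method name that was called, or None if neither exists.
    """
    for method_name in ("aclose", "close"):
        method = getattr(resource, method_name, None)
        if method is None:
            continue
        try:
            result = method()
            # close() may be sync or async depending on the client.
            if inspect.isawaitable(result):
                await result
        except Exception as exc:
            # Log and carry on so later resources still get closed.
            logger.warning("Error closing %s: %s", name, exc)
        return method_name
    logger.warning("%s exposes neither aclose() nor close()", name)
    return None


class FakeGraphClient:
    """Hypothetical stand-in for a client that only has close()."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


class FakeCache:
    """Hypothetical stand-in for a client that only has aclose()."""

    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True


async def shutdown(resources):
    """Close every resource in order and report which method was used."""
    used = {}
    for name, resource in resources.items():
        used[name] = await close_resource(name, resource)
    return used
```

Calling asyncio.run(shutdown({"graph": FakeGraphClient(), "cache": FakeCache()})) closes both stand-ins without raising, even though they expose different method names.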
RLS policies on tenant tables read current_setting('app.current_tenant') without the missing_ok flag, so any request that reached get_db_session without a bound tenant raised "unrecognized configuration parameter" (SQLSTATE 42704) on the first query. The super-admin cross-tenant metrics path in admin/maintenance was the loudest symptom, logging "Failed to resolve document names" for every call. Fix: always emit set_config(), using an empty string when no tenant is bound. Behaviour for normal tenant-scoped requests is unchanged.
Refs: docs/plans/2026-04-17-production-audit-fix-plan.md (P1-1)
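The "always emit set_config()" rule can be sketched as a small statement builder. This is an illustration under assumptions, not the code from the commit: tenant_guc_statement is a hypothetical helper, the :name style bind parameters assume a driver or SQLAlchemy text() construct that accepts them, and the is_local flag (session-scoped versus transaction-scoped) is not specified in the PR, so it is left as a parameter.

```python
from typing import Dict, Optional, Tuple

TENANT_GUC = "app.current_tenant"


def tenant_guc_statement(
    tenant_id: Optional[str], is_local: bool = False
) -> Tuple[str, Dict[str, object]]:
    """Build the set_config() call that binds the tenant GUC.

    The statement is produced even when no tenant is bound; in that
    case the value is an empty string, so a later
    current_setting('app.current_tenant') finds a defined parameter.
    """
    # PostgreSQL signature: set_config(setting_name, new_value, is_local)
    sql = "SELECT set_config(:name, :value, :is_local)"
    params = {
        "name": TENANT_GUC,
        "value": tenant_id or "",  # assumption: None and "" both mean "no tenant"
        "is_local": is_local,
    }
    return sql, params
```

With SQLAlchemy this would typically be executed as session.execute(text(sql), params) at the start of each session, before any query that touches a tenant table.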
Two startup guardrails:
- CORS: a missing CORS_ORIGINS config used to silently fall back to "*". With DEBUG=false we now refuse to boot and raise RuntimeError, forcing the operator to set an explicit allow-list. DEBUG=true keeps the permissive default with a warning log.
- SECRET_KEY_OLD: log a warning when the rotation-fallback secret is one of the well-known non-entropic dev defaults (amber-dev-key-2024, default-insecure-key). Those values defeat the dual-key keyring and leave legacy hashes forgeable.
Refs: docs/plans/2026-04-17-production-audit-fix-plan.md (P1-4, P1-5)
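The CORS guardrail can be sketched as a pure function that is called once at startup. This is a hedged illustration, not the project's implementation: resolve_cors_origins is a hypothetical name, and the comma-separated format for CORS_ORIGINS is an assumption that the PR does not confirm.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("cors-sketch")


def resolve_cors_origins(raw: Optional[str], debug: bool) -> List[str]:
    """Return the CORS allow-list, refusing a wildcard fallback in production.

    raw   -- value of CORS_ORIGINS (assumed comma-separated), or None if unset
    debug -- value of the DEBUG flag
    """
    origins = [item.strip() for item in (raw or "").split(",") if item.strip()]
    if origins:
        return origins
    if not debug:
        # Fail fast at startup instead of silently allowing every origin.
        raise RuntimeError(
            "CORS_ORIGINS must be set to an explicit allow-list when DEBUG=false"
        )
    logger.warning("CORS_ORIGINS is not set; falling back to '*' because DEBUG=true")
    return ["*"]
```

Raising at startup turns a silent misconfiguration into a deployment failure, which is easier to notice than a permissive header in production.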
    "SECRET_KEY_OLD is set to a known dev default (%s). "
    "Complete the rotation and unset SECRET_KEY_OLD, or replace "
    "it with the actual previous secret being retired.",
    settings.secret_key_old[:12] + "…",
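For context, the warning fragment above could sit inside a check like the following. This is a sketch under assumptions: warn_if_dev_secret and KNOWN_DEV_DEFAULTS are hypothetical names, the logger is a placeholder, and only the two default values named in the commit message are listed. The message text and the 12-character truncation are taken from the diff fragment.

```python
import logging
from typing import Optional

logger = logging.getLogger("secrets-sketch")

# The two dev defaults named in the commit message.
KNOWN_DEV_DEFAULTS = frozenset({"amber-dev-key-2024", "default-insecure-key"})


def warn_if_dev_secret(secret_key_old: Optional[str]) -> bool:
    """Log a warning if SECRET_KEY_OLD is a known dev default.

    Returns True when a warning was logged, False otherwise.
    """
    if not secret_key_old or secret_key_old not in KNOWN_DEV_DEFAULTS:
        return False
    logger.warning(
        "SECRET_KEY_OLD is set to a known dev default (%s). "
        "Complete the rotation and unset SECRET_KEY_OLD, or replace "
        "it with the actual previous secret being retired.",
        # Log only a truncated prefix, as in the diff fragment.
        secret_key_old[:12] + "…",
    )
    return True
```

The check only warns and never blocks startup, which matches the commit message ("log a warning").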
Landing the P1 block of the 2026-04-17 production audit plan (docs/plans/2026-04-17-production-audit-fix-plan.md). Four independent fixes grouped by concern; each commit is self-contained and reviewable on its own.

fix(api): always set app.current_tenant GUC (P1-1)
RLS policies on tenant tables read current_setting('app.current_tenant') without the missing_ok flag, so requests reaching get_db_session without a bound tenant failed with 42704 unrecognized configuration parameter. The loudest symptom was the super-admin cross-tenant metrics endpoint logging "Failed to resolve document names" on every call. Fix: unconditionally emit set_config, using an empty string when no tenant is bound.

fix(platform): close() instead of aclose() on Neo4j/Milvus shutdown (P1-2)
Neo4jClient and MilvusVectorStore expose close(), not aclose() (only the Redis async client has aclose). The wrong method raised AttributeError on every Platform.shutdown(), leaving SQLAlchemy pool connections unreturned and triggering the downstream "non-checked-in connection ... will be terminated" errors from the GC.

chore: move src/test_*.py ad-hoc scripts → scripts/debug/ (P1-3)
Four asyncio smoke scripts (test_factory_chain, test_nim_timeout, test_or, test_timeout) were committed under the production Python package. Moved to scripts/debug/ so they no longer ship in the Docker image.

security(api): refuse wildcard CORS in prod + warn on dev SECRET_KEY_OLD (P1-4, P1-5)
- CORS: a missing CORS_ORIGINS used to silently fall back to "*". With DEBUG=false we now raise RuntimeError at startup, forcing the operator to declare an explicit allow-list. Debug mode keeps the permissive default with a warning log.
- SECRET_KEY_OLD: a warning is logged when it is set to a known dev default (amber-dev-key-2024, default-insecure-key); these values defeat the dual-key keyring and leave legacy hashes forgeable.

Test plan
- The following errors no longer appear in the logs:
  - Failed to resolve document names: ... unrecognized configuration parameter "app.current_tenant"
  - Error closing Neo4j: 'Neo4jClient' object has no attribute 'aclose'
  - Error closing Milvus store: 'MilvusVectorStore' object has no attribute 'aclose'
  - non-checked-in connection ... will be terminated
- /v1/admin/maintenance/metrics/queries shows document filenames (not doc_ids) on the default tenant at minimum
- With DEBUG=false and CORS_ORIGINS set, the API boots; with CORS_ORIGINS unset, the API refuses to start
- .env: remove SECRET_KEY_OLD once the rotation window is confirmed closed (or replace it with the actual retiring key)
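The log checks in the test plan could be automated with a small scan over captured log output. This is only a sketch: find_regressions and the pattern names are hypothetical, and the patterns are derived from the error strings quoted in this PR rather than from the project's actual log format.

```python
import re
from typing import List

# Signatures of the errors this PR is meant to eliminate.
REGRESSION_PATTERNS = {
    "rls_guc": re.compile(
        r'unrecognized configuration parameter "app\.current_tenant"'
    ),
    "neo4j_aclose": re.compile(
        r"'Neo4jClient' object has no attribute 'aclose'"
    ),
    "milvus_aclose": re.compile(
        r"'MilvusVectorStore' object has no attribute 'aclose'"
    ),
    "pool_leak": re.compile(r"non-checked-in connection .* will be terminated"),
}


def find_regressions(log_text: str) -> List[str]:
    """Return the names of the known error signatures found in log_text."""
    return [
        name
        for name, pattern in REGRESSION_PATTERNS.items()
        if pattern.search(log_text)
    ]
```

An empty result after a start, a few requests, and a shutdown would support the first item of the test plan; it does not replace the manual checks on the metrics endpoint and the CORS startup behaviour.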