Problem
During #324 showcase_rich validation, the knowledge phase fails with a 401/502 cascade when the embedding provider has a placeholder or invalid OpenAI key, instead of skipping gracefully (as it already does when the provider is unreachable).
Root cause
- The demo probe only checks key presence (bool(settings.openai_api_key), app/features/demo/pipeline.py), not validity → a placeholder key reports reachable=True.
- The indexing call then hits OpenAI and gets a 401, which OpenAIEmbeddingProvider._embed_batch catches in a generic except Exception and re-raises as a generic EmbeddingError (app/features/rag/embeddings.py). /rag/index/project-docs and /rag/retrieve map EmbeddingError to 502 (app/features/rag/routes.py), leaving the demo pipeline no way to tell an auth failure from a connection failure.
- The demo knowledge steps (rag_index_subset, rag_retrieve_probe) only skip on ctx.embedding_unreachable, so an auth failure surfaces as a hard fail.
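The chain above can be reproduced in miniature. This is an illustrative sketch only, not the code in pipeline.py or embeddings.py: probe_reachable, embed_batch, ProviderAuthError and VALID_KEYS are hypothetical stand-ins, and the fake provider simply rejects any key not in VALID_KEYS.

```python
class EmbeddingError(Exception):
    """Stand-in for the generic error raised by the embedding layer."""


class ProviderAuthError(Exception):
    """Hypothetical stand-in for the provider's 401 response."""

    status_code = 401


# Hypothetical: the only key the fake provider accepts.
VALID_KEYS = {"sk-valid"}


def probe_reachable(api_key):
    # Presence-only check: any non-empty string counts as reachable.
    return bool(api_key)


def embed_batch(texts, api_key):
    try:
        if api_key not in VALID_KEYS:
            raise ProviderAuthError("401 Unauthorized")
        return [[0.0] for _ in texts]
    except Exception as exc:
        # The broad handler discards the 401 classification: callers
        # only ever see a generic EmbeddingError.
        raise EmbeddingError(f"embedding failed: {exc}") from exc
```

With a placeholder key, probe_reachable returns True while embed_batch raises a plain EmbeddingError, so a caller that inspects only the exception type cannot distinguish a rejected key from an unreachable provider.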
Fix (PR1, PRP-42)
- Classify embedding auth failures distinctly (new EmbeddingAuthError); keep the public /rag 502 status stable but add a machine-readable problem marker.
- Make the showcase_rich knowledge steps skip gracefully on an auth-classified failure.
- Tighten tests/test_e2e_demo.py::test_run_demo_showcase_rich_full_epic so RAG knowledge steps may skip but must not fail.
- Update RUNBOOKS for the new behavior.
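One possible shape for the classification and the skip decision is sketched below. It is a sketch under assumptions, not the PR's implementation: it assumes the provider exception exposes a status_code attribute, that EmbeddingAuthError subclasses EmbeddingError, and that the marker is a "code" field in the 502 problem body. The helper names (classify_embedding_failure, problem_body, knowledge_step_outcome) and the marker value embedding_auth_failed are hypothetical.

```python
class EmbeddingError(Exception):
    """Generic embedding failure (stand-in for the existing class)."""


class EmbeddingAuthError(EmbeddingError):
    """Embedding failure caused by a rejected API key."""


# Hypothetical marker value; the real name is up to the PR.
AUTH_MARKER = "embedding_auth_failed"


def classify_embedding_failure(exc):
    # Assumption: the provider error carries an HTTP status_code.
    message = f"embedding request failed: {exc}"
    if getattr(exc, "status_code", None) == 401:
        return EmbeddingAuthError(message)
    return EmbeddingError(message)


def problem_body(err):
    # The public status stays 502 in both cases; only the marker differs.
    body = {"status": 502, "title": "Embedding provider error", "detail": str(err)}
    if isinstance(err, EmbeddingAuthError):
        body["code"] = AUTH_MARKER
    return body


def knowledge_step_outcome(body):
    # The demo step skips on an auth-classified 502; other 502s still fail.
    return "skipped" if body.get("code") == AUTH_MARKER else "failed"
```

Making EmbeddingAuthError a subclass means existing except EmbeddingError handlers, and therefore the 502 mapping, keep working unchanged, while the demo pipeline can branch on the marker alone.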
Out of scope
Manual dogfood + screenshots (PR2), hygiene/docs-drift (PR3), Playwright e2e.