fix(api): repair showcase safer promote cascade #325

Merged: w7-mgfcode merged 3 commits into dev from fix/showcase-safer-promote-artifact-uri on May 31, 2026

w7-mgfcode (Owner) commented May 31, 2026

Summary

Repairs the showcase_rich demo pipeline cascade where the planning phase failed on the safer-promote placeholder artifact. Fix is contained entirely in the app/features/demo slice (no scenarios/registry changes).

Closes #324.

Root cause

Two compounding issues in app/features/demo/pipeline.py:

  1. step_safer_promote_flow (PRP-39, decision phase) swapped the demo-production alias to a deliberately-worse run whose artifact_uri was the literal placeholder demo/safer-promote-placeholder.joblib.
  2. step_scenario_simulate_and_save (PRP-40, planning phase, runs after the swap) resolved that live alias, read the placeholder URI, and _parse_artifact_key() (regex model_([0-9a-f]+)(?:\.joblib)?$) raised Cannot parse artifact-key… → the step failed and the pipeline aborted at step 16.
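The parse failure in item 2 can be reproduced in isolation. A minimal sketch, assuming the regex quoted above is applied with `re.search`; `parse_artifact_key` and `try_parse` are hypothetical stand-ins for the real `_parse_artifact_key`, and the error message is illustrative, not the pipeline's actual text:

```python
import re
from typing import Optional

# Regex as quoted in the root-cause description.
_ARTIFACT_KEY_RE = re.compile(r"model_([0-9a-f]+)(?:\.joblib)?$")


def parse_artifact_key(artifact_uri: str) -> str:
    """Hypothetical stand-in: return the hex key or raise ValueError."""
    match = _ARTIFACT_KEY_RE.search(artifact_uri)
    if match is None:
        # Illustrative message only; the real wording is not reproduced here.
        raise ValueError(f"unparseable artifact_uri: {artifact_uri!r}")
    return match.group(1)


def try_parse(artifact_uri: str) -> Optional[str]:
    """Return the parsed key, or None when the URI does not match."""
    try:
        return parse_artifact_key(artifact_uri)
    except ValueError:
        return None
```

The placeholder contains no `model_` segment, so nothing can match; the new shape ends in `model_<hex>.joblib` and does.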

Aggravating factor: run_pipeline stops on the first failed step and step_cleanup is the last row, so a failure between the swap and cleanup left demo-production permanently pointing at the worse-WAPE run (the R15 restore never ran).

Fix

  • Resolve the champion via ctx.winning_run_id (recorded by step_register, never touched by the swap) instead of the live alias in step_scenario_simulate_and_save; fall back to the alias only when no champion was recorded. This routes replay to a run with a real, parseable, loadable artifact — fixing both the parse error and the latent "model not found" failure.
  • step_safer_promote_flow now writes a real-shape, parseable artifact_uri (demo/seasonal_naive-model_{hex12}.joblib) instead of the placeholder.
  • New _restore_demo_alias_after_failure safeguard wired into the orchestrator fail-branch — restores demo-production to the champion on any non-cleanup mid-run failure (best-effort, never raises), so the alias is never left on the worse run.
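The fail-branch safeguard from the last bullet can be sketched roughly as follows. This is an in-memory approximation, not the code in `pipeline.py`: `FakeRegistry`, `Ctx`, the step functions and the boolean step result are all hypothetical, and recording `original_demo_alias_run_id` inside the swap step is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Tuple


class FakeRegistry:
    """In-memory stand-in for the registry alias endpoint (hypothetical)."""

    def __init__(self) -> None:
        self.alias: Optional[str] = "champion"
        self.fail_on_set = False

    async def set_alias(self, run_id: str) -> None:
        if self.fail_on_set:
            raise OSError("registry unreachable")
        self.alias = run_id


@dataclass
class Ctx:
    original_demo_alias_run_id: Optional[str] = None


Step = Callable[[Ctx, FakeRegistry], Awaitable[bool]]  # True means "pass"


async def step_swap(ctx: Ctx, registry: FakeRegistry) -> bool:
    # Assumption: the swap step records the champion before repointing.
    ctx.original_demo_alias_run_id = registry.alias
    await registry.set_alias("worse")
    return True


async def step_fails(ctx: Ctx, registry: FakeRegistry) -> bool:
    return False


async def step_cleanup(ctx: Ctx, registry: FakeRegistry) -> bool:
    if ctx.original_demo_alias_run_id is not None:
        await registry.set_alias(ctx.original_demo_alias_run_id)
    return True


async def restore_alias_after_failure(ctx: Ctx, registry: FakeRegistry) -> None:
    """Best-effort restore: no-op if no swap happened, never raises."""
    if ctx.original_demo_alias_run_id is None:
        return
    try:
        await registry.set_alias(ctx.original_demo_alias_run_id)
    except OSError:
        pass  # must not mask the original step failure


async def run_pipeline(
    steps: List[Tuple[str, Step]], ctx: Ctx, registry: FakeRegistry
) -> List[str]:
    executed: List[str] = []
    for name, step in steps:
        executed.append(name)
        if not await step(ctx, registry):
            if name != "cleanup":
                await restore_alias_after_failure(ctx, registry)
            break  # stop on the first failed step, as before
    return executed


STEPS: List[Tuple[str, Step]] = [
    ("safer_promote_flow", step_swap),
    ("scenario_simulate_and_save", step_fails),
    ("cleanup", step_cleanup),
]
```

Running `asyncio.run(run_pipeline(STEPS, Ctx(), FakeRegistry()))` stops before `cleanup`, yet the alias ends up back on the champion because the fail-branch restore ran.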

Tests / validation

  • app/features/demo/tests/test_pipeline.py: updated the 3 scenario tests for the new champion-resolution path; added 5 regressions — _parse_artifact_key rejects the old placeholder / accepts the new shape, the step ignores the corrupted alias, and the alias-restore safeguard (repoint / no-op / swallows-errors).
  • tests/test_e2e_demo.py: removed the KNOWN_PREEXISTING_FAILURES = {"scenario_simulate_and_save"} tolerance; the step must now pass (only environment-dependent knowledge-phase steps are tolerated).
  • docs/_base/RUNBOOKS.md: entry 18 marked FIXED.

Gates run locally:

  • ruff check . — passed
  • ruff format --check . — passed
  • pytest -m "not integration" — 1807 passed, 7 skipped, 3 deselected
  • demo slice — 57 passed
  • live integration pytest -m integration tests/test_e2e_demo.py::test_run_demo_showcase_rich_full_epic — passed (pipeline reaches agents + ops + cleanup; cascade gone)
  • mypy app/ — only pre-existing xgboost stub errors; none in changed files
  • pyright — clean (0 errors) on the changed files

Notes

  • Pre-existing dirty working-tree files were intentionally not included in this commit/PR: .gitignore and docs/user-guide/showcase-manual-demo-guide.md.
  • Out of scope (surfaced, not fixed): the knowledge phase's rag_index_subset fails rather than skipping gracefully when the embedding provider returns HTTP 401 from a placeholder key — a separate pre-existing RAG-phase robustness gap.

Summary by Sourcery

Fix the showcase_rich demo pipeline cascade by resolving scenario simulation against the recorded champion run instead of the mutable demo-production alias, hardening alias handling, and tightening end-to-end expectations.

Bug Fixes:

  • Resolve the scenario_simulate_and_save step against ctx.winning_run_id instead of the potentially corrupted demo-production alias, preventing artifact URI parse failures.
  • Ensure safer_promote_flow writes a real, parseable artifact URI instead of an unparseable placeholder to avoid downstream model loading errors.
  • Add a failure-path safeguard that restores the demo-production alias to the original champion when the pipeline aborts mid-run so the alias is not left pointing at a worse run.

Enhancements:

  • Strengthen demo scenario tests and end-to-end checks to assert that scenario_simulate_and_save now passes and that only environment-dependent knowledge steps may legitimately fail.

Documentation:

  • Update the RUNBOOK entry for scenario_simulate_and_save artifact-key parse failures to mark the cascade as fixed and document the new behavior and safeguards.

Tests:

  • Extend demo pipeline unit tests with regressions for champion resolution, parsing of safer-promote artifact URIs, and the alias-restore safeguard behavior.
  • Tighten the demo end-to-end test to require scenario_simulate_and_save to succeed and to constrain allowable failing steps to specific environment-dependent knowledge-phase steps.

sourcery-ai Bot (Contributor) commented May 31, 2026

Reviewer's Guide

Fixes the showcase_rich demo pipeline cascade by resolving the scenario planning step against the recorded champion run instead of the live alias, giving safer_promote a parseable artifact URI, and adding a best-effort alias-restore safeguard, with tests and runbook docs updated accordingly.

Sequence diagram for updated demo pipeline failure handling and alias restore

sequenceDiagram
    participant run_pipeline
    participant step_safer_promote_flow
    participant step_scenario_simulate_and_save
    participant step_cleanup
    participant _restore_demo_alias_after_failure
    participant Registry

    run_pipeline->>step_safer_promote_flow: step_safer_promote_flow(ctx, client)
    step_safer_promote_flow->>Registry: client.request(POST /registry/aliases)
    Registry-->>step_safer_promote_flow: alias swapped to worse run
    step_safer_promote_flow-->>run_pipeline: StepResult

    run_pipeline->>step_scenario_simulate_and_save: step_scenario_simulate_and_save(ctx, client)
    alt ctx.winning_run_id is None
        step_scenario_simulate_and_save->>Registry: client.request(GET /registry/aliases/demo-production)
        Registry-->>step_scenario_simulate_and_save: alias_body(run_id)
    else ctx.winning_run_id is set
        step_scenario_simulate_and_save-->>step_scenario_simulate_and_save: winner_run_id = ctx.winning_run_id
    end
    step_scenario_simulate_and_save->>Registry: client.request(GET /registry/runs/{winner_run_id})
    Registry-->>step_scenario_simulate_and_save: run_body(artifact_uri)
    step_scenario_simulate_and_save-->>run_pipeline: StepResult

    alt some_step (not cleanup) fails
        run_pipeline->>_restore_demo_alias_after_failure: _restore_demo_alias_after_failure(ctx, client)
        _restore_demo_alias_after_failure->>Registry: client.request(POST /registry/aliases)
        Registry-->>_restore_demo_alias_after_failure: response
        _restore_demo_alias_after_failure-->>run_pipeline: return
        run_pipeline-->>run_pipeline: pipeline aborted
    else all steps succeed
        run_pipeline->>step_cleanup: step_cleanup(ctx, client)
        step_cleanup->>Registry: client.request(POST /registry/aliases)
        Registry-->>step_cleanup: response
        step_cleanup-->>run_pipeline: StepResult
    end

Flow diagram for champion resolution in scenario_simulate_and_save

flowchart TD
    A[start step_scenario_simulate_and_save] --> B{ctx.winning_run_id is not None}
    B -- Yes --> C[Set winner_run_id = ctx.winning_run_id]
    B -- No --> D["client.request(GET /registry/aliases/DEMO_ALIAS)"]
    D --> E{alias_run_id is str}
    E -- No --> F[return fail: DEMO_ALIAS alias has no run_id]
    E -- Yes --> G[Set winner_run_id = alias_run_id]
    C --> H["client.request(GET /registry/runs/{winner_run_id})"]
    G --> H
    H --> I[Continue with scenario_simulate_and_save]
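
The resolution order in the flowchart can be condensed into a small pure function. A sketch under assumptions: `resolve_champion_run_id` and `read_alias_run_id` are hypothetical names, and raising `LookupError` stands in for the step returning a failed result.

```python
from typing import Callable


def resolve_champion_run_id(
    winning_run_id: object,
    read_alias_run_id: Callable[[], object],
) -> str:
    """Prefer the recorded champion; read the live alias only as a fallback."""
    if isinstance(winning_run_id, str):
        return winning_run_id
    # Fallback path: only reached when no champion was recorded.
    alias_run_id = read_alias_run_id()
    if not isinstance(alias_run_id, str):
        # The real step reports a failed result instead of raising.
        raise LookupError("demo alias has no run_id")
    return alias_run_id
```

Passing the alias lookup as a callable makes the happy-path property easy to assert: when a champion is recorded, the alias is never read.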

File-Level Changes

Change: Scenario planning step now resolves the champion run via DemoContext.winning_run_id with a defensive fallback to the demo-production alias.
Details:
  • Updated step_scenario_simulate_and_save to use ctx.winning_run_id as the primary source of the champion run id.
  • Added defensive fallback to GET /registry/aliases/{DEMO_ALIAS} only when no champion was recorded, preserving previous alias-based behavior in that edge case.
  • Adjusted tests to mock run lookups by champion id instead of by alias and to assert that the alias is not read on the happy path.
Files: app/features/demo/pipeline.py, app/features/demo/tests/test_pipeline.py

Change: Safer-promote decision step now writes a real, regex-parseable artifact_uri instead of a placeholder file name.
Details:
  • Changed step_safer_promote_flow to construct an artifact_uri in the V1 shape demo/{model_type}-model_{KEY}.joblib.
  • Derived KEY from the worse_run_id, stripping dashes and truncating to 12 hex characters to satisfy the model_([0-9a-f]+) regex used by _parse_artifact_key.
  • Added a regression test ensuring the old placeholder artifact_uri is rejected while the new format parses correctly.
Files: app/features/demo/pipeline.py, app/features/demo/tests/test_pipeline.py

Change: New safeguard restores the demo-production alias to the original champion after mid-run failures, and the orchestrator now calls it on failure paths.
Details:
  • Introduced _restore_demo_alias_after_failure to POST /registry/aliases with ctx.original_demo_alias_run_id, logging and swallowing transport/step errors so it never masks the original failure.
  • Wired run_pipeline to call the safeguard whenever a step fails and the failing step is not cleanup, ensuring alias restoration even when cleanup is skipped.
  • Added tests for the safeguard’s success, no-op when no swap occurred, and error-swallowing behavior.
Files: app/features/demo/pipeline.py, app/features/demo/tests/test_pipeline.py

Change: End-to-end demo and runbook documentation updated to assert the cascade is fixed and to constrain tolerated failures to environment-dependent knowledge-phase steps.
Details:
  • Updated the e2e showcase_rich test to require scenario_simulate_and_save to pass and to only tolerate failures in rag_index_subset and rag_retrieve_probe as environment-dependent.
  • Replaced the previous KNOWN_PREEXISTING_FAILURES logic with explicit assertions about the scenario step’s success and acceptable failing steps.
  • Rewrote RUNBOOKS entry 18 to mark the artifact-key parse failure as FIXED, document the dual root causes, the implemented fixes, and remaining guidance for irregular artifact_uri shapes.
Files: tests/test_e2e_demo.py, docs/_base/RUNBOOKS.md

coderabbitai Bot commented May 31, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


sourcery-ai Bot (Contributor) left a comment

Hey - I've found 3 issues, and left some high level feedback:

  • In _restore_demo_alias_after_failure, the logger.warning call drops the underlying exception details; consider capturing and logging the exception (exc_info=True or including the error message) so alias-restore issues are diagnosable without changing behavior.
  • The explanatory comments in step_scenario_simulate_and_save, _restore_demo_alias_after_failure, and the test_run_demo_showcase_rich_full_epic assertions are quite long and tightly coupled to specific issue numbers; consider trimming or extracting the core invariants to reduce future drift while keeping the behavior-focused parts of the comments.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="app/features/demo/pipeline.py" line_range="1787-1788" />
<code_context>
+            # consumer that parses it via ``_parse_artifact_key`` does not choke
+            # on a placeholder. KEY is hex-only (dashes stripped) to satisfy the
+            # ``model_([0-9a-f]+)`` parser regex.
+            "artifact_uri": (
+                f"demo/seasonal_naive-model_{worse_run_id_raw.replace('-', '')[:12]}.joblib"
+            ),
             "artifact_hash": "0" * 64,
</code_context>
<issue_to_address>
**suggestion:** Consider centralizing artifact key formatting to avoid relying on an inline magic slice and keep it aligned with the parser.

This logic tightly couples URI formatting to the `model_([0-9a-f]+)` regex and hard-codes the `[:12]` truncation. To make future changes safer (e.g., regex or key length changes), consider extracting artifact key generation into a helper shared with `_parse_artifact_key` (or at least a small local helper), so the format/truncation rules and assumptions about `worse_run_id_raw` are defined in one place.

Suggested implementation:

```python
    artifact_key = _format_demo_artifact_key(worse_run_id_raw)

    run_body = await client.request(
        json_body={
            "status": "success",
            "metrics": {"wape": 99.0},
            # issue #324 — write a real-shape, parseable artifact_uri (V1 demo
            # shape ``demo/{model_type}-model_{KEY}.joblib``) so any downstream
            # consumer that parses it via ``_parse_artifact_key`` does not choke
            # on a placeholder. KEY is hex-only (dashes stripped) to satisfy the
            # ``model_([0-9a-f]+)`` parser regex.
            "artifact_uri": f"demo/seasonal_naive-model_{artifact_key}.joblib",
            "artifact_hash": "0" * 64,
            "artifact_size_bytes": 1,
        },
    )

```

```python
def _format_demo_artifact_key(run_id_raw: str) -> str:
    """Format the artifact key for demo runs.

    This keeps the formatting aligned with the ``model_([0-9a-f]+)`` parser
    regex by stripping dashes and truncating to the expected key length.
    """
    normalized = run_id_raw.replace("-", "")
    # NOTE: if the parser regex or expected key length changes, update this helper.
    return normalized[:12]


async def _restore_demo_alias_after_failure(ctx: DemoContext, client: _Client) -> None:
    """Best-effort restore of the demo-production alias after a mid-run failure.

    issue #324 — when a step fails the pipeline aborts before the trailing
    ``cleanup`` row runs, which would otherwise leave ``demo-production``

```

To fully centralize formatting with `_parse_artifact_key`, consider:
1. Updating `_parse_artifact_key` (wherever it is defined) to either:
   - Use `_format_demo_artifact_key` for constructing/validating keys, or
   - Move `_format_demo_artifact_key` into a shared module (e.g., a `utils` or `artifacts` helper) and have both this pipeline code and `_parse_artifact_key` import it.
2. If the `model_([0-9a-f]+)` regex or the required key length changes, adjust `_format_demo_artifact_key` accordingly and ensure `_parse_artifact_key`’s regex stays in sync.
</issue_to_address>
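
One way to keep the formatter and the parser regex from drifting apart is a round-trip check. A minimal sketch, assuming run ids are UUID-like strings; `format_demo_artifact_uri`, `round_trips` and `_KEY_LENGTH` are hypothetical names, not helpers from `pipeline.py`:

```python
import re
import uuid

_KEY_LENGTH = 12  # truncation length used in the PR
_ARTIFACT_KEY_RE = re.compile(r"model_([0-9a-f]+)(?:\.joblib)?$")


def format_demo_artifact_uri(run_id_raw: str, model_type: str = "seasonal_naive") -> str:
    """Build a URI in the V1 shape demo/{model_type}-model_{KEY}.joblib."""
    key = run_id_raw.replace("-", "")[:_KEY_LENGTH]
    return f"demo/{model_type}-model_{key}.joblib"


def round_trips(run_id_raw: str) -> bool:
    """True when the parser regex recovers exactly the key that was written."""
    expected_key = run_id_raw.replace("-", "")[:_KEY_LENGTH]
    match = _ARTIFACT_KEY_RE.search(format_demo_artifact_uri(run_id_raw))
    return match is not None and match.group(1) == expected_key


example_run_id = str(uuid.uuid4())  # lowercase hex with dashes
```

The check also surfaces a hidden assumption of the inline slice: the run id must be lowercase hex. An uppercase or empty id produces a URI the regex cannot parse.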

### Comment 2
<location path="app/features/demo/pipeline.py" line_range="1975-1980" />
<code_context>
+                "description": ("Restored by the showcase pipeline failure safeguard (#324)."),
+            },
+        )
+    except (_StepError, httpx.HTTPError, OSError):
+        # Best-effort — a restore failure must never mask the original failure.
+        logger.warning(
+            "demo.cleanup.alias_restore_safeguard_failed",
+            run_id=ctx.original_demo_alias_run_id,
+        )
</code_context>
<issue_to_address>
**suggestion:** Log the exception details when the alias restore safeguard fails.

The safeguard correctly avoids masking the original failure by swallowing `_StepError`, `httpx.HTTPError`, and `OSError`, but the warning only logs the event name and `run_id`, so you lose the exception details needed to debug intermittent restore issues. Please include the exception context (e.g., `exc_info=True` on `logger.warning`) so stack traces are captured while keeping the current failure semantics unchanged.

```suggestion
    except (_StepError, httpx.HTTPError, OSError):
        # Best-effort — a restore failure must never mask the original failure.
        logger.warning(
            "demo.cleanup.alias_restore_safeguard_failed",
            run_id=ctx.original_demo_alias_run_id,
            exc_info=True,
        )
```
</issue_to_address>
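
The effect of `exc_info=True` can be checked with the standard library alone. A sketch, assuming stdlib `logging` rather than the keyword-argument logger used in the PR; `restore_alias_best_effort` is a hypothetical name and only `OSError` is caught here to keep the example dependency-free:

```python
import logging
from typing import Callable

logger = logging.getLogger("demo.pipeline.sketch")


def restore_alias_best_effort(restore: Callable[[str], None], run_id: str) -> bool:
    """Try to restore the alias; log with traceback and swallow on failure."""
    try:
        restore(run_id)
    except OSError:
        # exc_info=True attaches the active exception (type, value, traceback)
        # to the log record without re-raising it.
        logger.warning(
            "demo.cleanup.alias_restore_safeguard_failed run_id=%s",
            run_id,
            exc_info=True,
        )
        return False
    return True
```

The failure semantics are unchanged: the caller still gets no exception, but the emitted record now carries the traceback.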

### Comment 3
<location path="tests/test_e2e_demo.py" line_range="503-512" />
<code_context>
+    scenario_step = by_name.get("scenario_simulate_and_save")
</code_context>
<issue_to_address>
**suggestion (testing):** Consider asserting that the overall pipeline status is pass when there are no env-dependent failures

You could add an assertion that when `failed` is empty, `result["overall_status"] == "pass"`. This would help catch future regressions where the overall status diverges from the per-step statuses and keep this e2e test aligned with the intended behavior.
</issue_to_address>
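
The suggested assertion could look roughly like this. A sketch only: the `result["steps"]` layout, the `"fail"` status value and the helper name `check_showcase_result` are assumptions; the tolerated step names and `result["overall_status"] == "pass"` come from this review.

```python
from typing import Any, Dict

# Environment-dependent knowledge-phase steps that may legitimately fail.
ENV_DEPENDENT_STEPS = {"rag_index_subset", "rag_retrieve_probe"}


def check_showcase_result(result: Dict[str, Any]) -> None:
    """Assert the per-step and overall expectations for showcase_rich."""
    by_name = {step["name"]: step for step in result["steps"]}

    scenario_step = by_name.get("scenario_simulate_and_save")
    assert scenario_step is not None, "scenario_simulate_and_save did not run"
    assert scenario_step["status"] == "pass", scenario_step

    failed = {name for name, step in by_name.items() if step["status"] == "fail"}
    assert failed <= ENV_DEPENDENT_STEPS, f"unexpected failed steps: {sorted(failed)}"

    # New check: with no failed steps, the overall status must agree.
    if not failed:
        assert result["overall_status"] == "pass", result["overall_status"]
```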


Comment thread app/features/demo/pipeline.py Outdated
Comment thread app/features/demo/pipeline.py
Comment thread tests/test_e2e_demo.py Outdated
Comment on lines +503 to +512
    scenario_step = by_name.get("scenario_simulate_and_save")
    assert scenario_step is not None, "scenario_simulate_and_save did not run on showcase_rich"
    assert scenario_step["status"] == "pass", (
        "scenario_simulate_and_save must pass after #324, got "
        f"status={scenario_step['status']!r} detail={scenario_step['detail']!r}"
    )

    # Any OTHER failed step must be an environment-dependent knowledge-phase step
    # (embedding provider unreachable / misconfigured key). Those are designed to
    # skip gracefully when the provider is absent (RUNBOOKS entry 20-22); a real
suggestion (testing): Consider asserting that the overall pipeline status is pass when there are no env-dependent failures

You could add an assertion that when failed is empty, result["overall_status"] == "pass". This would help catch future regressions where the overall status diverges from the per-step statuses and keep this e2e test aligned with the intended behavior.

@w7-mgfcode w7-mgfcode merged commit 6dd1708 into dev May 31, 2026
8 checks passed
@w7-mgfcode w7-mgfcode deleted the fix/showcase-safer-promote-artifact-uri branch May 31, 2026 18:54