fix(cloud-tests): auto-remediate null AWSServiceName + empty state diff by tofikwest · Pull Request #2885 · trycompai/comp

tofikwest · 2026-05-21T04:23:55Z

Summary

Fixes two coordinated bugs in the AWS Auto-Remediate dialog that customers hit on the "AWS Config recorder not configured" finding (and on other findings that require a service-linked role or a configure-only plan):

- Bug A, null required AWS SDK param: the AI sometimes generates CreateServiceLinkedRoleCommand without populating AWSServiceName. AWS rejects the call with the cryptic "Member must not be null". Affects Config, GuardDuty, Inspector, Macie, Access Analyzer, Security Hub, Detective, AWS Backup.
- Bug B, empty {} → {} diff in the Auto-Remediate dialog: enrichEmptyState only fired when fixSteps contained Create* commands. For plans whose actionable steps are mostly Put*/Start*/Update* (e.g., the Config recorder configure flow), the dialog showed no meaningful state diff.

Both fixes are systemic, not Config-recorder-specific. The change is contained entirely within the existing AI remediation pipeline: no schema changes, no migrations, no shared-infra mutations.

What changed (commit-by-commit)

- `feat(cloud-tests): add deterministic AWS plan normalizer for SLR params`: new pure module plan-normalizer.ts that runs after AI plan generation. Backfills AWSServiceName from the nearest non-IAM/STS neighbor service step. Idempotent, doesn't mutate input. 18 unit tests.
- `feat(cloud-tests): fail fast on missing required AWS command params`: extends validatePlanSteps with a narrow REQUIRED_PARAMS check (CreateServiceLinkedRole, PutConfigurationRecorder, PutDeliveryChannel, StartConfigurationRecorder, PutBucketPolicy, CreateTrail). Replaces AWS's cryptic null error with `Step N (CommandName): Required param "X" is missing or empty.` 11 unit tests.
- `fix(cloud-tests): show meaningful Auto-Remediate diff for configure-only plans`: broadens enrichEmptyState to emit configured:false → configured:true + willChange: [...] for plans whose only actionable steps are non-Create*. Wires normalizeFixPlan into all three plan-producing sites (generateFixPlan happy path, refineFixPlan happy path, refineFixPlan catch fallback). Existing Create* behavior preserved exactly.
- `docs(cloud-tests): add SLR AWSServiceName mapping to the remediation prompt`: adds an explicit mandatory mapping table to the AI system prompt for the 8 SLR-requiring services. Belt-and-suspenders on top of the deterministic normalizer.

Why this is safe

- No schema / migration changes. Pure code change inside the AI remediation path.
- Pure functions, idempotent. Running the normalizer twice produces the same result (test in place).
- Happy path preserved. When the AI gets the plan right, the normalizer is a no-op; when state is non-empty, enrichEmptyState short-circuits; when required params are present, validatePlanSteps adds no errors. Existing willCreate shape for Create-from-scratch plans is unchanged.
- Fail-closed defense. If the normalizer can't infer AWSServiceName, validatePlanSteps blocks the call with a clear error BEFORE we make a billed AWS request. Old behavior would surface AWS's cryptic message after the call.

Test plan

- `cd apps/api && npx jest src/cloud-security --passWithNoTests` → 215/215 tests pass (the one suite failure is the pre-existing remediation.controller.spec.ts Postgres-TLS env issue, identical on main, unrelated to this change)
- No new typecheck errors introduced (the existing failures in integration-platform, offboarding-checklist.service.spec, risks.controller.spec, timelines.service.spec, training-certificate-pdf.service.spec are pre-existing on main)
- Manual: click Fix on "AWS Config recorder not configured" → preview shows non-empty state, Fix succeeds (no aWSServiceName null error)
- Manual regression: click Fix on a known-working finding (e.g., S3 bucket versioning disabled) → behavior unchanged
- Manual: click Fix on other SLR-requiring findings (GuardDuty / Inspector / Macie) if available → same expectation
- DB sanity: confirm RemediationAction.status ends in success and the next scan writes a FindingResolution row

🤖 Generated with Claude Code

Summary by cubic

Fixes null AWSServiceName on service-linked roles and {} → {} preview diffs, and adds a one-shot repair that retries validation failures with corrected params before failing.

Bug Fixes
- Added normalizeFixPlan to backfill AWSServiceName for CreateServiceLinkedRoleCommand by inferring from nearby non‑IAM steps; supports Config, GuardDuty, Inspector, Macie, Access Analyzer, Security Hub, Detective, Backup; idempotent and applied in all plan paths.
- Made configure‑only plans show meaningful diffs: emit configured:false → configured:true with willChange:[...]; preserve existing exists:false → exists:true for create‑from‑scratch.
- Added REQUIRED_PARAMS checks in validatePlanSteps to fail fast with clear messages (step index + command) for missing top‑level params on SLR, Config recorder, S3 bucket policy, and CloudTrail commands; updated the remediation prompt with mandatory SLR mappings.
New Features
- Introduced one‑shot step repair on validation errors: the executor detects errors with looksLikeValidationError and uses a repairStep callback to retry once with corrected params; guardrails enforce same service/command; wired only for fix‑step execution.
- Implemented refineStepFromError to return a corrected step (same service + command) using the failing error and plan context; added focused tests for detection and repair flow alongside normalizer/validator tests.
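The one-shot repair flow summarized above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `PlanStep` shape, the `SendFn` type, the `acceptRepair` helper, and the exact `StepRepairFn` signature are hypothetical, the executor is shown synchronously for brevity, and the rules-based `tryAutoFixValidationError` stage that runs first is omitted.

```typescript
// Hypothetical step shape; the real types are not shown in this PR.
interface PlanStep {
  service: string;
  command: string;
  params: Record<string, unknown>;
}

// Hypothetical sender: performs the AWS call and throws on failure.
type SendFn = (step: PlanStep) => unknown;

// Assumed signature for the repair callback (failing step, error text, index).
type StepRepairFn = (step: PlanStep, error: string, index: number) => PlanStep | null;

// Guardrail: accept a refinement only if it keeps the same service and
// command and actually changes the params.
function acceptRepair(original: PlanStep, refined: PlanStep | null): PlanStep | null {
  if (!refined) return null;
  if (refined.service !== original.service) return null;
  if (refined.command !== original.command) return null;
  if (JSON.stringify(refined.params) === JSON.stringify(original.params)) return null;
  return refined;
}

// Runs a step; on a validation-looking failure, asks for one repair and
// retries exactly once. A failure of the retry propagates unchanged.
function executeWithOneRepair(
  step: PlanStep,
  index: number,
  send: SendFn,
  isValidationError: (message: string) => boolean,
  repair?: StepRepairFn,
): unknown {
  try {
    return send(step);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (!repair || !isValidationError(message)) throw err;
    const refined = acceptRepair(step, repair(step, message, index));
    if (!refined) throw err; // rejected or missing refinement: surface the original error
    return send(refined);
  }
}
```

Leaving `repair` undefined reproduces the old behavior, which matches wiring the callback only at the fix-step execution site.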

Written for commit 5cffaff. Summary will update on new commits.

Adds a pure post-processing step that runs after the AI generates a fix plan to backfill cross-step values the model does not reliably emit.

Today the normalizer handles a single concrete bug: when a plan needs a service-linked role (Config, GuardDuty, Inspector, Macie, Access Analyzer, Security Hub, Detective, Backup), the AI sometimes generates CreateServiceLinkedRoleCommand without populating AWSServiceName. AWS rejects the call with a cryptic "Member must not be null" error, leaving the customer stuck on the Auto-Remediate dialog with no actionable diagnosis. The normalizer infers the right principal from a nearest-neighbor scan across the plan's other steps (deterministic, idempotent, no mutation of input).

Wiring it into the AI remediation service happens in a separate commit so each phase is reviewable in isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
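A minimal sketch of the nearest-neighbor backfill this commit message describes. The `PlanStep` shape, the lowercase service keys, the tie-break (the following step is checked before the preceding one at equal distance), and the function names are assumptions for illustration; the principal map is a partial, illustrative subset, and the actual plan-normalizer.ts is not shown here.

```typescript
// Hypothetical step shape; the real FixPlan types are not shown in this PR.
interface PlanStep {
  service: string; // assumed lowercase service key, e.g. "iam", "config"
  command: string; // e.g. "CreateServiceLinkedRoleCommand"
  params: Record<string, unknown>;
}

// Illustrative, partial map from an assumed service key to its SLR principal.
const SLR_PRINCIPALS: Record<string, string> = {
  config: "config.amazonaws.com",
  guardduty: "guardduty.amazonaws.com",
  securityhub: "securityhub.amazonaws.com",
  backup: "backup.amazonaws.com",
};

// IAM and STS steps never identify the service the role is for.
const IGNORED_SERVICES = new Set(["iam", "sts"]);

function isBlank(value: unknown): boolean {
  return value === undefined || value === null ||
    (typeof value === "string" && value.trim() === "");
}

// Scan outward from `index` and return the principal of the closest step
// whose service is in the map.
function inferPrincipal(steps: readonly PlanStep[], index: number): string | undefined {
  for (let distance = 1; distance < steps.length; distance++) {
    for (const j of [index + distance, index - distance]) {
      const neighbor = steps[j];
      if (!neighbor) continue; // out of range
      const service = neighbor.service.toLowerCase();
      if (IGNORED_SERVICES.has(service)) continue;
      const principal = SLR_PRINCIPALS[service];
      if (principal) return principal;
    }
  }
  return undefined;
}

// Returns new step objects; the input array and its steps are not modified.
function normalizeSteps(steps: readonly PlanStep[]): PlanStep[] {
  return steps.map((step, i) => {
    const copy: PlanStep = { ...step, params: { ...step.params } };
    if (step.command !== "CreateServiceLinkedRoleCommand") return copy;
    if (!isBlank(step.params["AWSServiceName"])) return copy; // already set
    const principal = inferPrincipal(steps, i);
    if (principal) copy.params["AWSServiceName"] = principal;
    return copy;
  });
}
```

Because an already-populated AWSServiceName is left alone, a second pass over the output changes nothing, which is the idempotence property the commit message calls out.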

Extends validatePlanSteps with a REQUIRED_PARAMS check that runs before the executor makes the AWS SDK call.

Today AWS surfaces a cryptic "Member must not be null" when a required top-level param is missing (e.g., AWSServiceName on CreateServiceLinkedRoleCommand, ConfigurationRecorder on PutConfigurationRecorderCommand), which leaves the Auto-Remediate dialog with an unhelpful error and the operation already billed.

The check is intentionally narrow: it covers only commands where we've seen the AI omit a required param in practice and where the SDK's error is unhelpful. The error message includes the step index and command name so customers know exactly which step is broken.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
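A rough sketch of what such a pre-flight check could look like. The map contents below are an illustrative guess built from the required top-level inputs of the corresponding AWS SDK commands; the `PlanStep` shape, the 1-based step index, the `checkRequiredParams` name, and the definition of "empty" (null, blank string, empty array, or empty object) are assumptions, not the PR's code.

```typescript
// Hypothetical step shape; the real types are not shown in this PR.
interface PlanStep {
  service: string;
  command: string;
  params: Record<string, unknown>;
}

// Illustrative map: command name -> required top-level params.
const REQUIRED_PARAMS: Record<string, readonly string[]> = {
  CreateServiceLinkedRoleCommand: ["AWSServiceName"],
  PutConfigurationRecorderCommand: ["ConfigurationRecorder"],
  PutDeliveryChannelCommand: ["DeliveryChannel"],
  StartConfigurationRecorderCommand: ["ConfigurationRecorderName"],
  PutBucketPolicyCommand: ["Bucket", "Policy"],
  CreateTrailCommand: ["Name", "S3BucketName"],
};

// Assumed definition of "missing or empty".
function isMissingOrEmpty(value: unknown): boolean {
  if (value === undefined || value === null) return true;
  if (typeof value === "string") return value.trim() === "";
  if (Array.isArray(value)) return value.length === 0;
  if (typeof value === "object") return Object.keys(value as object).length === 0;
  return false;
}

// Returns one message per missing param; an empty array means the plan passes.
function checkRequiredParams(steps: readonly PlanStep[]): string[] {
  const errors: string[] = [];
  steps.forEach((step, i) => {
    for (const name of REQUIRED_PARAMS[step.command] ?? []) {
      if (isMissingOrEmpty(step.params[name])) {
        errors.push(
          `Step ${i + 1} (${step.command}): Required param "${name}" is missing or empty.`,
        );
      }
    }
  });
  return errors;
}
```

Commands that are not in the map are never flagged, which keeps the check narrow in the sense described above.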

…nly plans

Two coordinated fixes to the AI remediation pipeline:

1. Broadens enrichEmptyState to handle plans whose actionable steps are not Create*. Previously a plan composed of Put*/Start*/Update*/Enable*/Attach*/Set*/Modify* steps with both currentState and proposedState empty rendered as "{} → {}" in the Auto-Remediate dialog. The backstop now emits configured:false → configured:true with a willChange list for these plans. Pure Create* plans keep the existing exists:false → exists:true / willCreate language.
2. Wires the plan normalizer into all three sites that produce a FixPlan: generateFixPlan (happy path), refineFixPlan (happy path), and refineFixPlan (catch fallback). The normalizer's CreateServiceLinkedRoleCommand AWSServiceName backfill now runs on every plan that reaches the UI or the executor, so the bug-reported "Member must not be null" failure on the AWS Config recorder finding is resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
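The first fix can be illustrated with a small sketch. The `FixPlan` shape, the placement of willCreate/willChange inside proposedState, the stripping of the `Command` suffix, and the rule that any Create* step selects the create branch are assumptions made for this example; only the emitted keys and the verb list come from the commit message.

```typescript
// Hypothetical plan shape; the real FixPlan type is not shown in this PR.
interface FixStep {
  command: string;
}
interface FixPlan {
  currentState: Record<string, unknown>;
  proposedState: Record<string, unknown>;
  fixSteps: FixStep[];
}

// Verb prefixes the commit message lists for configure-only plans.
const CONFIGURE_VERBS = /^(Put|Start|Update|Enable|Attach|Set|Modify)/;

function isEmptyObject(obj: Record<string, unknown>): boolean {
  return Object.keys(obj).length === 0;
}

// Backstop for the "{} → {}" preview. Returns the plan untouched unless
// both states are empty; never mutates its input.
function enrichEmptyStateSketch(plan: FixPlan): FixPlan {
  if (!isEmptyObject(plan.currentState) || !isEmptyObject(plan.proposedState)) {
    return plan; // short-circuit: there is already something to show
  }
  const names = plan.fixSteps.map((s) => s.command.replace(/Command$/, ""));

  const creates = names.filter((n) => n.startsWith("Create"));
  if (creates.length > 0) {
    return {
      ...plan,
      currentState: { exists: false },
      proposedState: { exists: true, willCreate: creates },
    };
  }

  const changes = names.filter((n) => CONFIGURE_VERBS.test(n));
  if (changes.length > 0) {
    return {
      ...plan,
      currentState: { configured: false },
      proposedState: { configured: true, willChange: changes },
    };
  }
  return plan; // nothing actionable to describe
}
```

For a plan whose steps are PutConfigurationRecorderCommand and StartConfigurationRecorderCommand, this sketch yields configured:false → configured:true with both step names in willChange.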

…prompt

Adds a mandatory mapping table after the existing service-linked role guidance so the AI emits AWSServiceName values in the first pass for the eight AWS services we currently auto-fix. This is a belt-and-suspenders fix on top of the deterministic plan normalizer: the normalizer guarantees correctness; this reduces the rate at which the bug occurs in the first place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-21T04:24:01Z

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| app | Ready | Preview, Comment | May 21, 2026 5:00am |
| comp-framework-editor | Ready | Preview, Comment | May 21, 2026 5:00am |
| portal | Ready | Preview, Comment | May 21, 2026 5:00am |

cubic-dev-ai

No issues found across 7 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.


The per-command REQUIRED_PARAMS map and the SLR plan normalizer cover the bugs we know about, but the AI can omit a required param on ANY AWS command in ANY future plan, and we shouldn't ship a fix that needs a code change every time a new finding hits this class of bug. This adds a universal escape hatch:

1. New `looksLikeValidationError(message)` detector that pattern-matches AWS's standard validation wording ("Member must not be null", "failed to satisfy constraint", "ValidationException", etc.), with no per-command enumeration.
2. New `StepRepairFn` callback contract on `executePlanSteps`. When a step fails AND the rules-based fixer in `tryAutoFixValidationError` can't recover AND the error looks like a validation error, the executor calls the repair callback exactly once with the failing step, the AWS error, and the step index. If the callback returns a refined step with different params (same service + command), the executor retries with it.
3. New `AiRemediationService.refineStepFromError` that fulfills the contract: it sends the failing step, the AWS error verbatim, neighbor steps, and finding context to the model and asks for a corrected single step. Defensive guard: discards the refinement if the model swaps to a different service or command.
4. `RemediationService.executeRemediation` wires the repair callback at the fix-step execution site only (not reads or rollback) so the AI cost is paid only on user-initiated fix attempts that actually need it.

This is the universal fix the customer asked for: it uses AWS's own validation as ground truth, scales to any future AWS command without code changes, and pays the Opus cost only on the failure path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
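The detector in item 1 could be as small as the sketch below. The pattern list contains only the three phrases quoted in the commit message; the real list behind the "etc." is not shown, and the `VALIDATION_PATTERNS` name and the `Sketch` suffix mark this as an illustration rather than the PR's implementation.

```typescript
// Patterns taken from the phrases quoted in the commit message. None uses
// the `g` flag, so RegExp.prototype.test stays stateless across calls.
const VALIDATION_PATTERNS: readonly RegExp[] = [
  /member must not be null/i,
  /failed to satisfy constraint/i,
  /ValidationException/,
];

// True when an AWS error message reads like a parameter-validation failure
// rather than, say, a permissions or throttling error.
function looksLikeValidationErrorSketch(message: string): boolean {
  return VALIDATION_PATTERNS.some((pattern) => pattern.test(message));
}

// Usage: gate the (paid) AI repair call on the detector.
const sample =
  "1 validation error detected: Value null at 'aWSServiceName' " +
  "failed to satisfy constraint: Member must not be null";
const shouldAttemptRepair = looksLikeValidationErrorSketch(sample);
```

Matching on message wording rather than on command names is what lets the check apply to commands that have no entry in REQUIRED_PARAMS.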

tofikwest and others added 4 commits May 21, 2026 00:22

vercel Bot deployed to Preview – comp-framework-editor May 21, 2026 04:24 View deployment

cubic-dev-ai Bot reviewed May 21, 2026

vercel Bot temporarily deployed to Preview – portal May 21, 2026 04:49 Inactive

vercel Bot temporarily deployed to Preview – app May 21, 2026 04:49 Inactive

vercel Bot deployed to Preview – comp-framework-editor May 21, 2026 04:52 View deployment

Merge branch 'main' into tofik/fix-auto-remediate-null-params

5cffaff

tofikwest merged commit eb52c8d into main May 21, 2026
8 of 11 checks passed

tofikwest deleted the tofik/fix-auto-remediate-null-params branch May 21, 2026 04:58

vercel Bot deployed to Preview – portal May 21, 2026 04:59 View deployment

vercel Bot deployed to Preview – app May 21, 2026 04:59 View deployment

vercel Bot deployed to Preview – comp-framework-editor May 21, 2026 05:00 View deployment

tofikwest merged 6 commits into main from tofik/fix-auto-remediate-null-params
