Skip to content

fix(cloud-tests): auto-remediate null AWSServiceName + empty state diff#2885

Merged
tofikwest merged 6 commits into
mainfrom
tofik/fix-auto-remediate-null-params
May 21, 2026
Merged

fix(cloud-tests): auto-remediate null AWSServiceName + empty state diff#2885
tofikwest merged 6 commits into
mainfrom
tofik/fix-auto-remediate-null-params

Conversation

@tofikwest
Copy link
Copy Markdown
Contributor

@tofikwest tofikwest commented May 21, 2026

Summary

Fixes two coordinated bugs in the AWS Auto-Remediate dialog that customers hit on the AWS Config recorder not configured finding (and others requiring a service-linked role or a configure-only plan):

  • Bug A — Null required AWS SDK param: the AI sometimes generates CreateServiceLinkedRoleCommand without populating AWSServiceName. AWS rejects the call with the cryptic "Member must not be null". Affects Config, GuardDuty, Inspector, Macie, Access Analyzer, Security Hub, Detective, AWS Backup.
  • Bug B — Empty {} → {} diff in the Auto-Remediate dialog: enrichEmptyState only fired when fixSteps contained Create* commands. For plans whose actionable steps are mostly Put*/Start*/Update* (e.g., Config recorder configure flow), the dialog showed no meaningful state diff.

Both fixes are systemic, not Config-recorder-specific. The change is fully behind the existing AI remediation pipeline — no schema changes, no migrations, no shared-infra mutations.

What changed (commit-by-commit)

  1. feat(cloud-tests): add deterministic AWS plan normalizer for SLR params — new pure module plan-normalizer.ts that runs after AI plan generation. Backfills AWSServiceName from the nearest non-IAM/STS neighbor service step. Idempotent, doesn't mutate input. 18 unit tests.
  2. feat(cloud-tests): fail fast on missing required AWS command params — extends validatePlanSteps with a narrow REQUIRED_PARAMS check (CreateServiceLinkedRole, PutConfigurationRecorder, PutDeliveryChannel, StartConfigurationRecorder, PutBucketPolicy, CreateTrail). Replaces AWS's cryptic null error with Step N (CommandName): Required param "X" is missing or empty. 11 unit tests.
  3. fix(cloud-tests): show meaningful Auto-Remediate diff for configure-only plans — broadens enrichEmptyState to emit configured:false → configured:true + willChange: [...] for plans whose only actionable steps are non-Create*. Wires normalizeFixPlan into all three plan-producing sites (generateFixPlan happy, refineFixPlan happy, refineFixPlan catch). Existing Create* behavior preserved exactly.
  4. docs(cloud-tests): add SLR AWSServiceName mapping to the remediation prompt — adds an explicit mandatory mapping table to the AI system prompt for the 8 SLR-requiring services. Belt-and-suspenders on top of the deterministic normalizer.

Why this is safe

  • No schema / migration changes. Pure code change inside the AI remediation path.
  • Pure functions, idempotent. Running the normalizer twice produces the same result (test in place).
  • Happy path preserved. When the AI gets the plan right, normalizer is a no-op; when state is non-empty, enrichEmptyState short-circuits; when required params are present, validatePlanSteps adds no errors. Existing willCreate shape for Create-from-scratch plans is unchanged.
  • Fail-closed defense. If the normalizer can't infer AWSServiceName, validatePlanSteps blocks the call with a clear error BEFORE we make a billed AWS request. Old behavior would surface AWS's cryptic message after the call.

Test plan

  • cd apps/api && npx jest src/cloud-security --passWithNoTests → 215/215 pass (one suite failure is the pre-existing remediation.controller.spec.ts Postgres-TLS env issue, identical on main, unrelated to this change)
  • No new typecheck errors introduced (the existing failures in integration-platform, offboarding-checklist.service.spec, risks.controller.spec, timelines.service.spec, training-certificate-pdf.service.spec are pre-existing on main)
  • Manual: click Fix on AWS Config recorder not configured → preview shows non-empty state, Fix succeeds (no aWSServiceName null error)
  • Manual regression: click Fix on a known-working finding (e.g., S3 bucket versioning disabled) → behavior unchanged
  • Manual: click Fix on other SLR-requiring findings (GuardDuty / Inspector / Macie) if available → same expectation
  • DB sanity: confirm RemediationAction.status ends in success and the next scan writes a FindingResolution row

🤖 Generated with Claude Code


Summary by cubic

Fixes null AWSServiceName on service-linked roles and {}{} preview diffs, and adds a one-shot repair that retries validation failures with corrected params before failing.

  • Bug Fixes

    • Added normalizeFixPlan to backfill AWSServiceName for CreateServiceLinkedRoleCommand by inferring from nearby non‑IAM steps; supports Config, GuardDuty, Inspector, Macie, Access Analyzer, Security Hub, Detective, Backup; idempotent and applied in all plan paths.
    • Made configure‑only plans show meaningful diffs: emit configured:false → configured:true with willChange:[...]; preserve existing exists:false → exists:true for create‑from‑scratch.
    • Added REQUIRED_PARAMS checks in validatePlanSteps to fail fast with clear messages (step index + command) for missing top‑level params on SLR, Config recorder, S3 bucket policy, and CloudTrail commands; updated the remediation prompt with mandatory SLR mappings.
  • New Features

    • Introduced one‑shot step repair on validation errors: the executor detects errors with looksLikeValidationError and uses a repairStep callback to retry once with corrected params; guardrails enforce same service/command; wired only for fix‑step execution.
    • Implemented refineStepFromError to return a corrected step (same service + command) using the failing error and plan context; added focused tests for detection and repair flow alongside normalizer/validator tests.

Written for commit 5cffaff. Summary will update on new commits. Review in cubic

tofikwest and others added 4 commits May 21, 2026 00:22
Adds a pure post-processing step that runs after the AI generates a fix
plan to backfill cross-step values the model does not reliably emit.

Today the normalizer handles a single concrete bug: when a plan needs a
service-linked role (Config, GuardDuty, Inspector, Macie, Access
Analyzer, Security Hub, Detective, Backup), the AI sometimes generates
CreateServiceLinkedRoleCommand without populating AWSServiceName. AWS
rejects the call with a cryptic "Member must not be null" error, leaving
the customer stuck on the Auto-Remediate dialog with no actionable
diagnosis.

The normalizer infers the right principal from a nearest-neighbor scan
across the plan's other steps (deterministic, idempotent, no mutation of
input). Wiring it into the AI remediation service happens in a separate
commit so each phase is reviewable in isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends validatePlanSteps with a REQUIRED_PARAMS check that runs before
the executor makes the AWS SDK call. Today AWS surfaces a cryptic
"Member must not be null" when a required top-level param is missing
(e.g., AWSServiceName on CreateServiceLinkedRoleCommand,
ConfigurationRecorder on PutConfigurationRecorderCommand), which leaves
the Auto-Remediate dialog with an unhelpful error and the operation
already billed.

The check is intentionally narrow — only commands where we've seen the
AI omit a required param in practice and where the SDK's error is
unhelpful. The error message includes the step index and command name so
customers know exactly which step is broken.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nly plans

Two coordinated fixes to the AI remediation pipeline:

1. Broadens enrichEmptyState to handle plans whose actionable steps are
   not Create*. Previously a plan composed of Put*/Start*/Update*/
   Enable*/Attach*/Set*/Modify* steps with both currentState and
   proposedState empty rendered as "{} → {}" in the Auto-Remediate
   dialog. The backstop now emits configured:false → configured:true
   with a willChange list for these plans. Pure Create* plans keep the
   existing exists:false → exists:true / willCreate language.

2. Wires the plan normalizer into all three sites that produce a
   FixPlan: generateFixPlan (happy path), refineFixPlan (happy path),
   and refineFixPlan (catch fallback). The normalizer's
   CreateServiceLinkedRoleCommand AWSServiceName backfill now runs on
   every plan that reaches the UI or the executor, so the bug-reported
   "Member must not be null" failure on the AWS Config recorder finding
   is resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…prompt

Adds a mandatory mapping table after the existing service-linked role
guidance so the AI emits AWSServiceName values in the first pass for
the eight AWS services we currently auto-fix. This is a belt-and-
suspenders fix on top of the deterministic plan normalizer — the
normalizer guarantees correctness; this reduces the rate at which the
bug occurs in the first place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
app Ready Ready Preview, Comment May 21, 2026 5:00am
comp-framework-editor Ready Ready Preview, Comment May 21, 2026 5:00am
portal Ready Ready Preview, Comment May 21, 2026 5:00am

Request Review

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 7 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Re-trigger cubic

The per-command REQUIRED_PARAMS map and the SLR plan normalizer cover
the bugs we know about — but the AI can omit a required param on ANY
AWS command in ANY future plan, and we shouldn't ship a fix that needs
a code change every time a new finding hits this class of bug.

This adds a universal escape hatch:

1. New `looksLikeValidationError(message)` detector that pattern-matches
   AWS's standard validation wording ("Member must not be null",
   "failed to satisfy constraint", "ValidationException", etc.) — no
   per-command enumeration.

2. New `StepRepairFn` callback contract on `executePlanSteps`. When a
   step fails AND the rules-based fixer in `tryAutoFixValidationError`
   can't recover AND the error looks like a validation error, the
   executor calls the repair callback exactly once with the failing
   step, the AWS error, and the step index. If the callback returns a
   refined step with different params (same service + command), the
   executor retries with it.

3. New `AiRemediationService.refineStepFromError` that fulfills the
   contract — sends the failing step, AWS error verbatim, neighbor
   steps, and finding context to the model and asks for a corrected
   single step. Defensive guard: discards the refinement if the model
   swaps to a different service or command.

4. `RemediationService.executeRemediation` wires the repair callback
   at the fix-step execution site only (not reads or rollback) so the
   AI cost is paid only on user-initiated fix attempts that actually
   need it.

This is the universal fix the customer asked for: it uses AWS's own
validation as ground truth, scales to any future AWS command without
code changes, and pays the Opus cost only on the failure path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tofikwest tofikwest merged commit eb52c8d into main May 21, 2026
8 of 11 checks passed
@tofikwest tofikwest deleted the tofik/fix-auto-remediate-null-params branch May 21, 2026 04:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant