fix(cloud-tests): auto-remediate null AWSServiceName + empty state diff #2885
Merged
Adds a pure post-processing step that runs after the AI generates a fix plan to backfill cross-step values the model does not reliably emit. Today the normalizer handles a single concrete bug: when a plan needs a service-linked role (Config, GuardDuty, Inspector, Macie, Access Analyzer, Security Hub, Detective, Backup), the AI sometimes generates CreateServiceLinkedRoleCommand without populating AWSServiceName. AWS rejects the call with a cryptic "Member must not be null" error, leaving the customer stuck on the Auto-Remediate dialog with no actionable diagnosis. The normalizer infers the right principal from a nearest-neighbor scan across the plan's other steps (deterministic, idempotent, no mutation of input). Wiring it into the AI remediation service happens in a separate commit so each phase is reviewable in isolation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
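A minimal sketch of the nearest-neighbor backfill described above. The `PlanStep` shape, the service keys, the `SLR_PRINCIPALS` table, and the function names are assumptions made for illustration; the actual `plan-normalizer.ts` is not shown in this conversation and may differ.

```typescript
// Hypothetical step shape; the real FixPlan types are not shown in the PR text.
interface PlanStep {
  service: string; // assumed to be the SDK client name, e.g. "iam", "config-service"
  command: string; // e.g. "CreateServiceLinkedRoleCommand"
  params: Record<string, unknown>;
}

// Illustrative subset of service -> SLR principal mappings (assumed keys).
const SLR_PRINCIPALS = new Map<string, string>([
  ["config-service", "config.amazonaws.com"],
  ["guardduty", "guardduty.amazonaws.com"],
  ["securityhub", "securityhub.amazonaws.com"],
  ["backup", "backup.amazonaws.com"],
]);

// IAM/STS steps never identify the service the role is for, so they are skipped.
const IGNORED_SERVICES = new Set(["iam", "sts"]);

function isBlank(value: unknown): boolean {
  return value === undefined || value === null || value === "";
}

// Scan outward from `index` (distance 1, 2, ...) and return the principal of
// the closest step whose service is in the table. Ties prefer the later step.
function inferPrincipal(steps: readonly PlanStep[], index: number): string | undefined {
  for (let distance = 1; distance < steps.length; distance++) {
    for (const i of [index + distance, index - distance]) {
      const neighbor = steps[i];
      if (!neighbor || IGNORED_SERVICES.has(neighbor.service)) continue;
      const principal = SLR_PRINCIPALS.get(neighbor.service);
      if (principal) return principal;
    }
  }
  return undefined;
}

// Returns new step objects; the input array and its steps are never mutated.
function normalizeSteps(steps: readonly PlanStep[]): PlanStep[] {
  return steps.map((step, i) => {
    const copy: PlanStep = { ...step, params: { ...step.params } };
    const needsBackfill =
      step.command === "CreateServiceLinkedRoleCommand" &&
      isBlank(step.params.AWSServiceName);
    if (!needsBackfill) return copy;
    const principal = inferPrincipal(steps, i);
    if (principal) copy.params.AWSServiceName = principal;
    return copy;
  });
}
```

Because a step that already carries `AWSServiceName` is copied through untouched, running the function on its own output yields the same plan, which is the idempotence property the commit message calls out.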
Extends validatePlanSteps with a REQUIRED_PARAMS check that runs before the executor makes the AWS SDK call. Today AWS surfaces a cryptic "Member must not be null" when a required top-level param is missing (e.g., AWSServiceName on CreateServiceLinkedRoleCommand, ConfigurationRecorder on PutConfigurationRecorderCommand), which leaves the Auto-Remediate dialog with an unhelpful error and the operation already billed. The check is intentionally narrow: it covers only commands where we've seen the AI omit a required param in practice and where the SDK's error is unhelpful. The error message includes the step index and command name so customers know exactly which step is broken.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
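The fail-fast check could look roughly like the sketch below. Only the two command/param pairs named in the commit message are included; the `PlanStep` shape, the helper names, and the 1-based step numbering are assumptions, not the PR's actual `validatePlanSteps` code.

```typescript
// Hypothetical step shape for illustration.
interface PlanStep {
  service: string;
  command: string;
  params: Record<string, unknown>;
}

// Narrow table: only commands where a missing top-level param has been seen.
const REQUIRED_PARAMS = new Map<string, readonly string[]>([
  ["CreateServiceLinkedRoleCommand", ["AWSServiceName"]],
  ["PutConfigurationRecorderCommand", ["ConfigurationRecorder"]],
]);

// A param counts as missing when it is undefined, null, or a blank string.
function isMissing(value: unknown): boolean {
  if (value === undefined || value === null) return true;
  return typeof value === "string" && value.trim() === "";
}

// Collects one readable error per missing param, before any SDK call is made.
// Step numbers are 1-based here; that is an assumption about the real format.
function findMissingParams(steps: readonly PlanStep[]): string[] {
  const errors: string[] = [];
  steps.forEach((step, i) => {
    for (const name of REQUIRED_PARAMS.get(step.command) ?? []) {
      if (isMissing(step.params[name])) {
        errors.push(
          `Step ${i + 1} (${step.command}): Required param "${name}" is missing or empty`,
        );
      }
    }
  });
  return errors;
}
```

Commands absent from the table pass through unchecked, which keeps the validator from rejecting plans it has no specific knowledge about.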
…nly plans
Two coordinated fixes to the AI remediation pipeline:
1. Broadens enrichEmptyState to handle plans whose actionable steps are
not Create*. Previously a plan composed of Put*/Start*/Update*/
Enable*/Attach*/Set*/Modify* steps with both currentState and
proposedState empty rendered as "{} → {}" in the Auto-Remediate
dialog. The backstop now emits configured:false → configured:true
with a willChange list for these plans. Pure Create* plans keep the
existing exists:false → exists:true / willCreate language.
2. Wires the plan normalizer into all three sites that produce a
FixPlan: generateFixPlan (happy path), refineFixPlan (happy path),
and refineFixPlan (catch fallback). The normalizer's
CreateServiceLinkedRoleCommand AWSServiceName backfill now runs on
every plan that reaches the UI or the executor, so the bug-reported
"Member must not be null" failure on the AWS Config recorder finding
is resolved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
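The broadened backstop from item 1 can be sketched as follows. The `StateDiff` shape and the placement of `willCreate`/`willChange` inside `proposedState` are assumptions; the real `enrichEmptyState` signature is not shown here.

```typescript
// Hypothetical shapes for illustration only.
interface PlanStep {
  command: string;
}
interface StateDiff {
  currentState: Record<string, unknown>;
  proposedState: Record<string, unknown>;
}

// Prefixes listed in the commit message for configure-style steps.
const CONFIGURE_PREFIXES = ["Put", "Start", "Update", "Enable", "Attach", "Set", "Modify"];

function isEmptyObject(value: Record<string, unknown>): boolean {
  return Object.keys(value).length === 0;
}

function enrichEmptyStateSketch(diff: StateDiff, fixSteps: readonly PlanStep[]): StateDiff {
  // Only act as a backstop: leave any diff the model actually filled in.
  if (!isEmptyObject(diff.currentState) || !isEmptyObject(diff.proposedState)) return diff;

  // Plans containing Create* steps keep the pre-existing exists/willCreate wording.
  const creates = fixSteps.filter((s) => s.command.startsWith("Create")).map((s) => s.command);
  if (creates.length > 0) {
    return {
      currentState: { exists: false },
      proposedState: { exists: true, willCreate: creates },
    };
  }

  // Configure-only plans get configured:false -> configured:true plus willChange.
  const configures = fixSteps
    .filter((s) => CONFIGURE_PREFIXES.some((prefix) => s.command.startsWith(prefix)))
    .map((s) => s.command);
  if (configures.length > 0) {
    return {
      currentState: { configured: false },
      proposedState: { configured: true, willChange: configures },
    };
  }

  return diff; // nothing actionable recognized; keep the empty diff
}
```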
…prompt
Adds a mandatory mapping table after the existing service-linked role guidance so the AI emits AWSServiceName values in the first pass for the eight AWS services we currently auto-fix. This is a belt-and-suspenders fix on top of the deterministic plan normalizer: the normalizer guarantees correctness; this reduces the rate at which the bug occurs in the first place.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The per-command REQUIRED_PARAMS map and the SLR plan normalizer cover
the bugs we know about — but the AI can omit a required param on ANY
AWS command in ANY future plan, and we shouldn't ship a fix that needs
a code change every time a new finding hits this class of bug.
This adds a universal escape hatch:
1. New `looksLikeValidationError(message)` detector that pattern-matches
AWS's standard validation wording ("Member must not be null",
"failed to satisfy constraint", "ValidationException", etc.) — no
per-command enumeration.
2. New `StepRepairFn` callback contract on `executePlanSteps`. When a
step fails AND the rules-based fixer in `tryAutoFixValidationError`
can't recover AND the error looks like a validation error, the
executor calls the repair callback exactly once with the failing
step, the AWS error, and the step index. If the callback returns a
refined step with different params (same service + command), the
executor retries with it.
3. New `AiRemediationService.refineStepFromError` that fulfills the
contract — sends the failing step, AWS error verbatim, neighbor
steps, and finding context to the model and asks for a corrected
single step. Defensive guard: discards the refinement if the model
swaps to a different service or command.
4. `RemediationService.executeRemediation` wires the repair callback
at the fix-step execution site only (not reads or rollback) so the
AI cost is paid only on user-initiated fix attempts that actually
need it.
This is the universal fix the customer asked for: it uses AWS's own
validation as ground truth, scales to any future AWS command without
code changes, and pays the Opus cost only on the failure path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
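The detector and the one-shot repair contract from items 1 and 2 might be sketched like this. The pattern list uses only the wordings quoted in the commit message; the `PlanStep` shape, the `RunStep` type, and `runWithRepair` are hypothetical names, and the rules-based `tryAutoFixValidationError` stage is omitted for brevity.

```typescript
// Hypothetical shapes; the real executor types are not shown in the PR text.
interface PlanStep {
  service: string;
  command: string;
  params: Record<string, unknown>;
}
type StepRepairFn = (
  step: PlanStep,
  awsError: string,
  stepIndex: number,
) => Promise<PlanStep | null>;
type RunStep = (step: PlanStep) => Promise<void>;

// Wordings quoted in the commit message; no per-command enumeration.
const VALIDATION_PATTERNS: readonly RegExp[] = [
  /must not be null/i,
  /failed to satisfy constraint/i,
  /ValidationException/i,
];

function looksLikeValidationError(message: string): boolean {
  return VALIDATION_PATTERNS.some((pattern) => pattern.test(message));
}

// Runs a step; on a validation-looking failure, asks `repair` exactly once
// and retries only if the refined step targets the same service and command
// with different params. Any other outcome rethrows the original error.
async function runWithRepair(
  step: PlanStep,
  stepIndex: number,
  run: RunStep,
  repair?: StepRepairFn,
): Promise<PlanStep> {
  try {
    await run(step);
    return step;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (!repair || !looksLikeValidationError(message)) throw err;

    const refined = await repair(step, message, stepIndex);
    if (refined === null) throw err;
    // Guardrail: the repair may change params, never the target.
    if (refined.service !== step.service || refined.command !== step.command) throw err;
    if (JSON.stringify(refined.params) === JSON.stringify(step.params)) throw err;

    await run(refined); // a second failure propagates; there is no further retry
    return refined;
  }
}
```

Keeping `repair` optional mirrors item 4: call sites such as reads or rollback can simply omit the callback and retain the old fail-immediately behavior.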
Summary
Fixes two coordinated bugs in the AWS Auto-Remediate dialog that customers hit on the AWS Config recorder not configured finding (and others requiring a service-linked role or a configure-only plan):
1. Null `AWSServiceName` on service-linked roles: the AI sometimes generates `CreateServiceLinkedRoleCommand` without populating `AWSServiceName`. AWS rejects the call with the cryptic "Member must not be null". Affects Config, GuardDuty, Inspector, Macie, Access Analyzer, Security Hub, Detective, AWS Backup.
2. `{} → {}` diff in the Auto-Remediate dialog: `enrichEmptyState` only fired when `fixSteps` contained `Create*` commands. For plans whose actionable steps are mostly `Put*`/`Start*`/`Update*` (e.g., Config recorder configure flow), the dialog showed no meaningful state diff.
Both fixes are systemic, not Config-recorder-specific. The change is fully behind the existing AI remediation pipeline: no schema changes, no migrations, no shared-infra mutations.
What changed (commit-by-commit)
1. `feat(cloud-tests): add deterministic AWS plan normalizer for SLR params`: new pure module `plan-normalizer.ts` that runs after AI plan generation. Backfills `AWSServiceName` from the nearest non-IAM/STS neighbor service step. Idempotent, doesn't mutate input. 18 unit tests.
2. `feat(cloud-tests): fail fast on missing required AWS command params`: extends `validatePlanSteps` with a narrow `REQUIRED_PARAMS` check (CreateServiceLinkedRole, PutConfigurationRecorder, PutDeliveryChannel, StartConfigurationRecorder, PutBucketPolicy, CreateTrail). Replaces AWS's cryptic null error with `Step N (CommandName): Required param "X" is missing or empty`. 11 unit tests.
3. `fix(cloud-tests): show meaningful Auto-Remediate diff for configure-only plans`: broadens `enrichEmptyState` to emit `configured:false → configured:true` + `willChange: [...]` for plans whose only actionable steps are non-`Create*`. Wires `normalizeFixPlan` into all three plan-producing sites (`generateFixPlan` happy, `refineFixPlan` happy, `refineFixPlan` catch). Existing `Create*` behavior preserved exactly.
4. `docs(cloud-tests): add SLR AWSServiceName mapping to the remediation prompt`: adds an explicit mandatory mapping table to the AI system prompt for the 8 SLR-requiring services. Belt-and-suspenders on top of the deterministic normalizer.
Why this is safe
- The existing `willCreate` shape for Create-from-scratch plans is unchanged.
- If a step still lacks `AWSServiceName`, `validatePlanSteps` blocks the call with a clear error before we make a billed AWS request. The old behavior surfaced AWS's cryptic message after the call.
Test plan
- `cd apps/api && npx jest src/cloud-security --passWithNoTests` → 215/215 pass (one suite failure is the pre-existing `remediation.controller.spec.ts` Postgres-TLS env issue, identical on `main`, unrelated to this change)
- … (`integration-platform`, `offboarding-checklist.service.spec`, `risks.controller.spec`, `timelines.service.spec`, `training-certificate-pdf.service.spec` are pre-existing on `main`)
- `AWS Config recorder not configured` → preview shows non-empty state, Fix succeeds (no `aWSServiceName` null error)
- … (`S3 bucket versioning disabled`) → behavior unchanged
- `RemediationAction.status` ends in `success` and the next scan writes a `FindingResolution` row
🤖 Generated with Claude Code
Summary by cubic
Fixes null `AWSServiceName` on service-linked roles and `{}→{}` preview diffs, and adds a one-shot repair that retries validation failures with corrected params before failing.
Bug Fixes
- Added `normalizeFixPlan` to backfill `AWSServiceName` for `CreateServiceLinkedRoleCommand` by inferring it from nearby non-IAM steps; supports Config, GuardDuty, Inspector, Macie, Access Analyzer, Security Hub, Detective, Backup; idempotent and applied in all plan paths.
- Configure-only plans now show `configured:false → configured:true` with `willChange:[...]`; the existing `exists:false → exists:true` is preserved for create-from-scratch plans.
- Added `REQUIRED_PARAMS` checks in `validatePlanSteps` to fail fast with clear messages (step index + command) for missing top-level params on SLR, Config recorder, S3 bucket policy, and CloudTrail commands; updated the remediation prompt with mandatory SLR mappings.
New Features
- The executor detects validation errors with `looksLikeValidationError` and uses a `repairStep` callback to retry once with corrected params; guardrails enforce same service/command; wired only for fix-step execution.
- Added `refineStepFromError` to return a corrected step (same service + command) using the failing error and plan context; added focused tests for detection and repair flow alongside normalizer/validator tests.
Written for commit 5cffaff.