Skip to content

spec: multiline + command-keyed redaction (#10027)#10225

Open
lonexreb wants to merge 4 commits intowarpdotdev:masterfrom
lonexreb:spec/10027-multiline-redaction
Open

spec: multiline + command-keyed redaction (#10027)#10225
lonexreb wants to merge 4 commits intowarpdotdev:masterfrom
lonexreb:spec/10027-multiline-redaction

Conversation

@lonexreb
Copy link
Copy Markdown
Contributor

@lonexreb lonexreb commented May 6, 2026

Spec for #10027. Two new redaction mechanisms: command-keyed output rules (e.g. kubectl get secret redacts entire block output) and multiline-bracketed rules (e.g. PEM BEGIN/END blocks). Defaults for common DevOps cases shipped in V1.

Closes (spec-only) #10027.

@cla-bot cla-bot Bot added the cla-signed label May 6, 2026
@github-actions github-actions Bot added the external-contributor Indicates that a PR has been opened by someone outside the Warp team. label May 6, 2026
@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented May 6, 2026

@lonexreb

I'm starting a first review of this spec-only pull request.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Copy link
Copy Markdown
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This spec defines command-keyed output redaction and multiline delimiter redaction for agent context. The direction is aligned with the stated secret-redaction gaps, but several requirements are ambiguous enough to produce incomplete or unsafe implementations.

Concerns

  • Command matching is specified too narrowly for common kubectl get secret forms and does not define normalization or argument-order coverage for default rules.
  • Multiline redaction scope conflicts with the acceptance criterion for pasted private-key blocks versus the explicit output-side-only V1 scope.
  • Default override semantics are under-specified, including precedence, identity, and disable behavior.

Security

  • The buffer-cap behavior must explicitly keep redaction active after the cap is reached until the end delimiter or block flush; otherwise oversized secret blocks can leak lines after the capped buffer is replaced.

Verdict

Found: 0 critical, 4 important, 1 suggestions

Request changes

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Comment thread specs/GH10027/SPEC.md Outdated

```toml
[[redaction.command_output_rules]]
command_pattern = "^kubectl\\s+get\\s+secret\\b"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] The default command regex only covers literal kubectl get secret; specify normalization or default patterns for common secret reads like kubectl get secrets, kubectl -n prod get secret, and context/namespace flags before get.

Comment thread specs/GH10027/SPEC.md Outdated
start_pattern = "-----BEGIN [A-Z ]+PRIVATE KEY-----"
end_pattern = "-----END [A-Z ]+PRIVATE KEY-----"
replacement = "[redacted: PEM key block]"
inclusive = true # include the BEGIN/END lines in the redaction
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 [SUGGESTION] The config exposes inclusive, but the behavior only defines inclusive = true; either define exclusive-boundary behavior or omit the option from V1.

Comment thread specs/GH10027/SPEC.md Outdated
### B5 — Performance bound

Multiline buffering is capped at `WARP_REDACTION_MAX_BUFFER_BYTES`
(default 1 MiB). If the buffer fills before `end_pattern` matches,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] [SECURITY] After the buffer cap is reached, require the redactor to stay in a redacted/discarding state until end_pattern or block flush; replacing only the buffered bytes can leak subsequent lines from an oversized secret block.

Comment thread specs/GH10027/SPEC.md Outdated
- A1. With default rules: running `kubectl get secret foo -o yaml`
shows real output in the terminal but the agent context shows
the replacement string.
- A2. With default rules: pasting a private-key block redacts the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] This acceptance criterion conflicts with V1 being output-side only: pasting a key is input-path behavior unless the spec defines a rendered-output path; align the scope and acceptance test.

Comment thread specs/GH10027/SPEC.md Outdated
the replacement string.
- A2. With default rules: pasting a private-key block redacts the
entire BEGIN..END range, not just the first line.
- A3. User-defined rule in TOML overrides defaults at the same
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] Override semantics are not precise enough to implement: define precedence, whether equality is by rule name or exact pattern string, and how users disable a default rather than replace it.

@lonexreb
Copy link
Copy Markdown
Contributor Author

lonexreb commented May 8, 2026

Round-1 fixes for oz-for-oss review: (1) added Command Matching Normalization with argv-set matching and 12 default tool rules; (2) reconciled V1 scope to include PEM-style private-key delimiter redaction in both output AND pasted input; (3) added Default Override Semantics defining id-based merge, disable flag, and precedence; (4) fixed Buffer Cap Behavior to keep redaction state active post-cap until end delimiter or flush.

@lonexreb
Copy link
Copy Markdown
Contributor Author

lonexreb commented May 8, 2026

/oz-review

@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented May 8, 2026

@lonexreb

I'm re-reviewing this spec-only pull request in response to a review request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Copy link
Copy Markdown
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This spec defines command-keyed output redaction and multiline delimiter redaction for agent-bound context. The overall shape covers the main behavior, defaults, override semantics, buffer caps, and tests.

Concerns

  • The sample/default rule ids are inconsistent, which would break the documented override and disable workflow for copied configs.
  • The default command table uses glob-style wildcards even though command matching is defined as regex-based, leaving several default matches ambiguous.
  • Exposed config semantics are incomplete for inclusive, invalid user rules, and duplicate user ids.

Security

  • The spec should explicitly require the new redactors at the centralized agent-context boundary for every path that can send command output or user text to the agent, not only the active block/prompt examples.
  • Malformed user-supplied rules must fail closed per rule without disabling built-in protections.

Verdict

Found: 0 critical, 4 important, 1 suggestions

Request changes

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Comment thread specs/GH10027/SPEC.md Outdated

```toml
[[redaction.command_output_rules]]
id = "default-kubectl-secret"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] The example id default-kubectl-secret conflicts with the stable default id default-kubectl-get-secret used later for override/disable semantics; align the example so users copy the id that actually replaces or disables the default.

Comment thread specs/GH10027/SPEC.md
```

When the active block's command (as detected by Warp's existing
command parser) matches `command_pattern`, the entire block output
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] [SECURITY] This only states redaction for the active block output; require the command-keyed redactor to run at the centralized agent-context boundary for every path that can send block output to the agent, otherwise matched secret commands can bypass redaction through attachments, ambient context, action results, or passive triggers.

Comment thread specs/GH10027/SPEC.md Outdated

| id | subcommand path matched |
| ----------------------------------- | -------------------------------------------------------- |
| `default-kubectl-get-secret` | `kubectl get secret*` (incl. `secrets`, `secret/<name>`) |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] secret* is glob syntax, but B1.1 defines command_pattern as regex-matched; in regex this does not cover secrets or secret/<name>. Specify exact regexes or state that the table uses glob semantics for all starred defaults so implementation and tests agree.

Comment thread specs/GH10027/SPEC.md Outdated
start_pattern = "-----BEGIN [A-Z ]+PRIVATE KEY-----"
end_pattern = "-----END [A-Z ]+PRIVATE KEY-----"
replacement = "[redacted: PEM key block]"
inclusive = true # include the BEGIN/END lines in the redaction
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 [SUGGESTION] The config exposes inclusive, but the behavior contract only defines the inclusive case; add explicit semantics for inclusive = false, including whether delimiter lines are emitted, redacted separately, or unsupported for defaults.

Comment thread specs/GH10027/SPEC.md

#### B3.1 — Override semantics

User-supplied redaction rules merge with defaults by rule `id`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] [SECURITY] Merge semantics do not define invalid or duplicate user-supplied rules. Specify that invalid regexes or scopes are rejected or ignored per rule without disabling defaults, and define duplicate user-id resolution so malformed config cannot accidentally remove protections.

@lonexreb
Copy link
Copy Markdown
Contributor Author

lonexreb commented May 8, 2026

Round-2 fixes: (1) all rule ids harmonized to default-<tool>-<purpose> naming convention with full table listed; user rules cannot use default- prefix; (2) every default command pattern rewritten as Rust regex with anchors (replaces glob *); pattern syntax subsection added; per-rule regex tests; (3) inclusive field defined (true = include delimiters in redaction, false = between only; default false); invalid user rules drop individually with warning; duplicate ids drop subsequent; reserved prefix rejected; built-in defaults stay active even on full config load failure (fail-closed invariant); (4) added Centralized Agent-Context Boundary enumerating every agent-bound path (block output, prompt, paste, tab-config commands, history replay, tool-call inputs, file reads) with single redact_for_agent choke-point and clippy-lint enforcement.

@lonexreb
Copy link
Copy Markdown
Contributor Author

lonexreb commented May 8, 2026

/oz-review

@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented May 8, 2026

@lonexreb

I'm re-reviewing this spec-only pull request in response to a review request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Copy link
Copy Markdown
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This spec adds command-keyed output redaction and delimiter-based multiline redaction for agent-bound context. The overall structure covers goals, defaults, override semantics, fail-closed config handling, telemetry constraints, acceptance criteria, and tests.

Concerns

  • The built-in PEM regex does not match the bare -----BEGIN PRIVATE KEY----- form called out by the problem statement, so a core default would still leak a common private-key block.
  • The command matching contract is internally inconsistent about whether command_pattern is matched against only the normalized subcommand path or against a string that includes trailing resource-name positionals.

Security

  • The PEM default gap is security-relevant because implementing the spec as written would leave a common private-key format unredacted in both output and pasted input paths.

Verdict

Found: 1 critical, 1 important, 0 suggestions

Request changes

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Comment thread specs/GH10027/SPEC.md Outdated
Comment on lines +138 to +139
start_pattern = "-----BEGIN [A-Z ]+PRIVATE KEY-----"
end_pattern = "-----END [A-Z ]+PRIVATE KEY-----"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 [CRITICAL] [SECURITY] This default misses bare PKCS#8 blocks like -----BEGIN PRIVATE KEY-----, which the problem statement explicitly names, so those keys would still reach the agent.

Suggested change
start_pattern = "-----BEGIN [A-Z ]+PRIVATE KEY-----"
end_pattern = "-----END [A-Z ]+PRIVATE KEY-----"
start_pattern = "-----BEGIN (?:[A-Z ]+ )?PRIVATE KEY-----"
end_pattern = "-----END (?:[A-Z ]+ )?PRIVATE KEY-----"

Comment thread specs/GH10027/SPEC.md
V1 ships defaults covering the most common secret-bearing CLIs.
Each is argument-order independent per B1.1. Patterns are matched
against the normalized subcommand path. The `(\s|$)` trailing
boundary admits trailing positionals like resource names without
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] B1.1 says command_pattern is matched only against the normalized subcommand path and resource names live in normalized_argv_set, so this cannot also rely on command_pattern trailing boundaries to admit resource-name positionals. Specify one matching input contract or route resource-name matching through argv_contains/argv_excludes.

@lonexreb
Copy link
Copy Markdown
Contributor Author

lonexreb commented May 8, 2026

Addressed both concerns from latest oz-for-oss review. The CRITICAL is fully resolved.

CRITICAL — PEM bare-PKCS#8 unredacted (closed)
The round-2 start_pattern was -----BEGIN [A-Z ]+PRIVATE KEY-----[A-Z ]+ requires at least one char before PRIVATE, so the bare -----BEGIN PRIVATE KEY----- (PKCS#8, no algorithm tag) form did not match and a common private-key block was silently leaked to the agent on both output and pasted-input paths.

Fix: changed quantifier to [A-Z ]* (zero-or-more). Spec now contains an explicit algorithm-form coverage table proving every PEM private-key variant is matched: bare PKCS#8, RSA, EC, OPENSSH, ENCRYPTED, DSA. Added inline comment in the TOML example explaining why the quantifier matters.

New test T16 is the regression guard with six BEGIN..END fixtures (one per form), each tested on BOTH the output stream and the pasted-input path. The test MUST fail under [A-Z ]+ and MUST pass under [A-Z ]*. CI fails if any fixture leaks a single key byte. The bare PKCS#8 case is the explicit closure of this critical gap.

Important — Command-matching contract internal inconsistency (closed)
Rewrote B1.1 with a single canonical answer: command_pattern is matched ONLY against the normalized subcommand path string, NEVER against a string that includes trailing resource-name positionals. Added concrete (subcommand_path, argv_set) splits showing exactly what each input becomes:

  • kubectl get secret foo → path kubectl get secret + argv_set {foo}
  • kubectl -n ns get secret foo → path kubectl get secret + argv_set {-n, ns, foo}
  • aws secretsmanager get-secret-value --secret-id x → path aws secretsmanager get-secret-value + argv_set {--secret-id, x}

Added explicit clarification that the trailing (\\s|$) boundary in default patterns is for ZERO-OR-MORE FUTURE SUBCOMMAND TOKENS, not resource names — those are in the argv set, not in the matched string. Resource-name constraints use argv_contains / argv_excludes.

New test T_command_match_contract asserts the contract end-to-end with a recording test double on the matcher: it fails if the matcher EVER sees a string containing a trailing resource positional. Covers all 12 default rules in B1.2.

@lonexreb
Copy link
Copy Markdown
Contributor Author

lonexreb commented May 8, 2026

/oz-review

@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented May 8, 2026

@lonexreb

I'm re-reviewing this spec-only pull request in response to a review request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Copy link
Copy Markdown
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This spec defines command-keyed output redaction and multiline delimiter redaction for agent-bound context. The overall scope is clear, but several contract details still need to be resolved before implementation.

Concerns

  • The command-matching section still contradicts itself about whether resource-name positionals are part of the string matched by command_pattern.
  • Duplicate user-rule handling conflicts with the stated default override/disable semantics.

Security

  • Command-keyed redaction does not define how common shell prefixes and compound invocations are handled, which could leave secret-bearing CLI output outside the default protections unless the behavior is specified and tested.

Verdict

Found: 0 critical, 3 important, 0 suggestions

Request changes

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Comment thread specs/GH10027/SPEC.md
string `command_pattern` matches against — they are separated into
the argv set.

1. **Argv parse:** Split using shell-aware tokenization (the same
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] [SECURITY] Command-keyed redaction does not specify whether shell prefixes or compound invocations such as AWS_PROFILE=prod aws secretsmanager ..., env ... aws ..., or sudo kubectl ... are normalized or out of scope; without this, common secret-bearing commands may bypass the default redaction rules. Define the behavior and add acceptance/tests for it.

Comment thread specs/GH10027/SPEC.md
V1 ships defaults covering the most common secret-bearing CLIs.
Each is argument-order independent per B1.1. Patterns are matched
against the normalized subcommand path. The `(\s|$)` trailing
boundary admits trailing positionals like resource names without
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] This sentence contradicts B1.1/B1.1.4, which state resource-name positionals are never included in the string matched by command_pattern. Rewrite it so the trailing boundary only refers to subcommand-path tokens, not resource names.

Comment thread specs/GH10027/SPEC.md
| -------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| Invalid regex in any pattern field | Rule is DROPPED. Other user rules and ALL defaults remain active. Warning logged with rule id and parser error. Surfaced in Settings UI as `Rule '<id>' has invalid pattern: <error>`. |
| Missing required field (`id`, `pattern`, …) | Rule is DROPPED. Other rules unaffected. Warning surfaced in Settings UI. |
| Duplicate user-supplied id (same id twice in user config) | First occurrence is kept; subsequent duplicates are DROPPED with a warning. Defaults are never overridden by a duplicate-id user rule. |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] This conflicts with B3.1, where the first user rule with a default id replaces or disables that default. Clarify that subsequent duplicate user entries are dropped, while the first valid occurrence still follows the override semantics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed external-contributor Indicates that a PR has been opened by someone outside the Warp team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant