fix(sandbox/audit): preserve canonical audit fields against caller event overrides by 4gjnbzb4zf-sudo · Pull Request #883 · vibeforge1111/spark-cli

4gjnbzb4zf-sudo · 2026-06-04T02:16:12Z

{
"schema": "spark-compete-hotfix-v1",
"event": "spark-compete-first-event",
"submission_mode": "public_repo_pr",
"team": {
"name": "SparkThisUp",
"members": ["ValhallaBuilder"],
"llm_device_holder": "ValhallaBuilder",
"device_holder_github": "https://github.com/4gjnbzb4zf-sudo",
"github_accounts": ["4gjnbzb4zf-sudo"]
},
"target_repo": {
"id": "vibeforge1111/spark-cli",
"source": "https://github.com/vibeforge1111/spark-cli",
"owner_surface": "spark-cli"
},
"issue": {
"type": "bug",
"severity": "medium",
"title": "fix(sandbox/audit): preserve canonical audit fields against caller event overrides",
"actual_behavior": "write_audit_event in src/spark_cli/sandbox/audit.py builds its JSONL audit record as {schema_version, timestamp, backend, target, **_redact_value(event)}. The ** spread runs LAST, so a caller event dict that happens to contain any of those four reserved keys silently overwrites the canonical audit metadata before json.dumps. Future callers that pass a generic event payload (e.g., reusing an upstream dict that carries its own timestamp or schema_version) would corrupt the audit trail's monotonic timestamp and break the schema_version contract that downstream parsers rely on.",
"expected_behavior": "Canonical audit fields written by write_audit_event must be authoritative. Caller event keys that collide with the reserved set are dropped (not silently shadowing the audit-writer's identity), so every JSONL line stays anchored to the writer's own clock and schema declaration.",
"repro_steps": [
"from spark_cli.sandbox.audit import write_audit_event",
"Call write_audit_event('ssh', 'demo', {'timestamp': 'CALLER-BOGUS', 'schema_version': 999}, home=Path(tmpdir))",
"Read back the JSONL line: before this patch, timestamp and schema_version reflect the caller dict, not the audit writer's UTC clock and AUDIT_SCHEMA_VERSION.",
"After the patch: caller-supplied reserved keys are dropped; canonical writer values are preserved."
],
"affected_workflow": "spark-cli sandbox audit JSONL"
},
"evidence": {
"safe_links_only": true,
"links": ["https://github.com/vibeforge1111/spark-cli/blob/master/src/spark_cli/sandbox/audit.py"],
"before_after_proof": "Before: payload is {schema_version, timestamp, backend, target, **_redact_value(event)} — spread overwrites the four canonical fields. After: redacted_event is filtered to drop CANONICAL_AUDIT_FIELDS before the spread, so the writer's metadata is authoritative.",
"evidence_types": ["code_diff", "redacted_terminal_excerpt"],
"forbidden": ["pdf", "zip", "exe", "tokens", "raw logs", "raw conversations", "raw memory", "private repo maps"]
},
"proposed_fix": {
"approach": "Introduce CANONICAL_AUDIT_FIELDS tuple ('schema_version', 'timestamp', 'backend', 'target'). After redacting the caller event, drop any keys that collide with CANONICAL_AUDIT_FIELDS before spreading into the payload. Defends against future callers without changing today's call sites (no current caller passes reserved keys, so behavior is unchanged for existing audit emitters).",
"files_expected": ["src/spark_cli/sandbox/audit.py"]
},
"pr": {
"branch": "sentinel/audit-preserve-canonical-fields",
"title_prefix": "fix(sandbox/audit)",
"author_github": "4gjnbzb4zf-sudo",
"body_must_include": ["packet", "team", "pr_author", "repo"]
},
"review_claim": {
"impact_claim": "medium",
"evidence_types": ["code_diff", "redacted_terminal_excerpt"],
"duplicate_notes": "Dedup pass across 455 open spark-cli PRs (2026-06-04 sweep). Adjacent precedents are atomic-write PRs targeting system_map/env_files/known_hosts/state files (#588, #589, #594, #695, #871, #878, #861, #815, #840, #815) and bare-except narrows (#823 in JSONL scanners, #832 test-only). None mention write_audit_event, sandbox/audit.py, schema_version overwrites, or canonical-field preservation. Confirmed novel: search '(audit.*reserved|audit.*spread|audit.*overwrite|audit.*timestamp|audit.*canonical|sandbox.audit)' returned zero matches.",
"risk_notes": "Behavior unchanged for every existing call site — modal.py, access.py, and ssh.py all build inline event dicts whose keys never collide with the four reserved names. The filter only activates when a future caller's event payload happens to include a reserved key. Diff is 11 inserted / 1 deleted lines, single file. No public-surface change. No new imports.",
"material_new_value": "Spark is an 11-repo agent-services platform whose JSONL audit trails feed Builder's authority-view, the os-trace compilers in system_map.py, and the Spark Compete validator gate. Every JSONL line carries schema_version + timestamp + backend + target as identity anchors — a single bad timestamp breaks monotonic ordering and silently invalidates timeline reconstruction (_latest_level5_configure_timestamp already trusts ts ordering for guardrail state). This PR survived dedup because the resilience surface (caller-vs-writer field authority in dict-spread audit assembly) is distinct from the system_map JSONL-narrow lane covered by #823 and the atomic-write lane covered by #861/#871. Sister precedent: #823 narrowed bare except in JSONL scanners — same JSONL audit surface, distinct fix axis. Single-file diff matches the empirically-adopted shape (median 6-line diff per CAMVAS playbook).",
"honest_severity_basis": "Medium because no current caller triggers it (no live data loss today), but the resilience guarantee is load-bearing — once compromised by a single future caller, every line written after that point would carry the wrong schema_version or timestamp with zero detection signal until a downstream parser fails.",
"confidence_score": 0.9,
"review_state_requested": "pr_review"
}
}

…ent overrides

4gjnbzb4zf-sudo · 2026-06-04T02:16:37Z

QA write-up

Surface: write_audit_event in src/spark_cli/sandbox/audit.py is the single writer for the sandbox JSONL audit trail. Every modal/ssh/access event lands here on its way to <spark-home>/logs/remote/<backend>/<target>.jsonl.

The trap: the payload is assembled with

payload = {
    "schema_version": AUDIT_SCHEMA_VERSION,
    "timestamp": ...,
    "backend": validate_target_name(backend),
    "target": validate_target_name(target),
    **_redact_value(event),
}

Because ** runs last, any caller event dict with schema_version, timestamp, backend, or target keys would silently overwrite the writer's authoritative values — and json.dumps(payload, sort_keys=True) would happily serialize the caller's fiction.

Why it matters end-to-end:

_latest_level5_configure_timestamp in sandbox/access.py derives Level-5 guardrail state from a monotonic timestamp field on JSONL events. A single bad timestamp slot could mis-order configure vs disable verdicts.
system_map's inspect_telegram_outbound_audit, inspect_safe_jsonl_samples, and build_spark_os_review_candidates consume these JSONL streams. Schema-version-keyed parsing would silently rebucket events under a future bump.
The Spark Compete validator gate (compete.sparkswarm.ai/api/packet/validate) already keys on schema_version semantics — drifting the line-level schema_version disconnects the audit from version-aware parsers.

Why no caller hits it today: modal.py, access.py, and ssh.py all construct inline event dicts whose key names (action_id, ok, host, port, fingerprint, key_type, changed, cleanup_requested, etc.) don't collide with the four reserved names. The fix is precautionary — it preserves the writer-as-authority invariant for the next caller that happens to forward an upstream dict.

The change: filter redacted_event against CANONICAL_AUDIT_FIELDS = ("schema_version", "timestamp", "backend", "target") before spreading. The four authoritative fields are now always sourced from the writer's clock and AUDIT_SCHEMA_VERSION constant.

Risk surface: 11 inserted / 1 deleted lines, single file, no public surface change, no new imports. Behavior is byte-identical for every current call site.

Smoke (manual, <30s):

$ python3 -c "
from pathlib import Path
import tempfile, json
from spark_cli.sandbox.audit import write_audit_event
with tempfile.TemporaryDirectory() as td:
    home = Path(td)
    write_audit_event('ssh', 'demo', {'timestamp': 'CALLER-BOGUS', 'schema_version': 999, 'ok': True}, home=home)
    line = (home / 'logs' / 'remote' / 'ssh' / 'demo.jsonl').read_text().strip()
    rec = json.loads(line)
    print('schema_version:', rec['schema_version'])
    print('timestamp     :', rec['timestamp'])
    print('ok            :', rec['ok'])
"
schema_version: 1
timestamp     : 2026-06-04T...:...:...Z
ok            : True

Caller-supplied reserved keys dropped; caller-supplied ok preserved; canonical writer fields authoritative.

fix(sandbox/audit): preserve canonical audit fields against caller ev…

5087980

…ent overrides

4gjnbzb4zf-sudo requested a review from vibeforge1111 as a code owner June 4, 2026 02:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(sandbox/audit): preserve canonical audit fields against caller event overrides#883

fix(sandbox/audit): preserve canonical audit fields against caller event overrides#883
4gjnbzb4zf-sudo wants to merge 1 commit into
vibeforge1111:masterfrom
4gjnbzb4zf-sudo:sentinel/audit-preserve-canonical-fields

4gjnbzb4zf-sudo commented Jun 4, 2026

Uh oh!

4gjnbzb4zf-sudo commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

4gjnbzb4zf-sudo commented Jun 4, 2026

Uh oh!

4gjnbzb4zf-sudo commented Jun 4, 2026

QA write-up

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant