fix(cli): harden CSV formula-injection escape (sec)#119

Open
sroussey wants to merge 1 commit into main from claude/wonderful-hypatia-Y5xRl

@sroussey
Contributor

Summary

Hardens escapeCsvValue in src/cli/output/TableRenderer.ts against three classes of CSV formula-injection bypass that the previous /^[=+\-@\t\r]/ check missed. Follows the OWASP CSV Injection guidance for neutralizing untrusted cell content that sec query --format csv emits before a spreadsheet opens it.

Bypasses now closed

  1. Leading ASCII whitespace: Excel/Sheets/Numbers strip leading whitespace before parsing a cell as a formula, so " =cmd|'/c calc'!A0" (note the leading space) was reaching the spreadsheet unprefixed and executing.
  2. Leading U+00A0 NBSP: Same as above; NBSP is stripped by the parsers and was not covered by the previous check.
  3. Dangerous chars after embedded newlines in a multi-line cell: Quoted multi-line cells are re-parsed per physical line, so "safe\n=cmd" exposed the second line as a formula even when the first line was benign.
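
To make the three bypasses concrete, here is a minimal sketch of how the old check behaves on each payload. Only the regex is taken from this PR; the function name oldNeedsPrefix and the payload list are hypothetical and exist purely for illustration.

```typescript
// Hypothetical reconstruction of the pre-fix test; only the regex comes from the PR text.
const OLD_DANGEROUS_LEAD = /^[=+\-@\t\r]/;

function oldNeedsPrefix(value: string): boolean {
  return OLD_DANGEROUS_LEAD.test(value);
}

// One payload per bypass class described above, plus a control.
const payloads: Array<[string, string]> = [
  ["leading space", " =cmd|'/c calc'!A0"],
  ["leading NBSP", "\u00A0=cmd"],
  ["embedded newline", "safe\n=cmd"],
  ["control (bare =)", "=cmd"],
];

for (const [label, payload] of payloads) {
  // The three bypass payloads report false; only the control reports true.
  console.log(label, oldNeedsPrefix(payload));
}
```

The regex is anchored with ^ and has no m flag, so it only ever inspects the first character of the whole cell. That is why both the whitespace-prefixed payloads and the second line of a multi-line cell go unnoticed.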

Fix

  • New DANGEROUS_LEAD (/^[=+\-@\t\r]/) + LEADING_WS (/^[\s\u00A0]+/) constants.
  • needsFormulaPrefix(line) strips leading WS/NBSP before the dangerous-lead test.
  • defuseLine(line) returns "'" + line when dangerous.
  • escapeCsvValue now splits the value on /(\r?\n)/, defuses each data line independently, preserves the original separators, and only then applies RFC 4180 quoting (now also triggered by \r, not just \n).
  • Public API is unchanged; escapeCsvValue is re-exported via __testing purely for unit tests.
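
The steps above can be sketched roughly as follows. This is not the code from the diff, only a reconstruction from the bullet points: the constant and function names come from this description, but the bodies are assumptions. In particular, testing the raw line before stripping whitespace is my own assumption, made so that the \tcmd and \rcmd cases in the test plan are still caught even though \s would otherwise strip them.

```typescript
const DANGEROUS_LEAD = /^[=+\-@\t\r]/;
const LEADING_WS = /^[\s\u00A0]+/;

function needsFormulaPrefix(line: string): boolean {
  // Assumption: check the raw line first, then the line with leading WS/NBSP removed.
  return DANGEROUS_LEAD.test(line) || DANGEROUS_LEAD.test(line.replace(LEADING_WS, ""));
}

function defuseLine(line: string): string {
  return needsFormulaPrefix(line) ? "'" + line : line;
}

function escapeCsvValue(value: string): string {
  // split() with a capturing group keeps the separators:
  // even indices are data lines, odd indices are "\n" or "\r\n".
  const defused = value
    .split(/(\r?\n)/)
    .map((part, i) => (i % 2 === 0 ? defuseLine(part) : part))
    .join("");

  // RFC 4180 quoting: wrap when the cell holds a quote, comma, CR, or LF,
  // and double any embedded quotes.
  if (/[",\r\n]/.test(defused)) {
    return '"' + defused.replace(/"/g, '""') + '"';
  }
  return defused;
}

console.log(JSON.stringify(escapeCsvValue(" =cmd")));      // prefixed, not quoted
console.log(JSON.stringify(escapeCsvValue("safe\n=cmd"))); // second line prefixed, cell quoted
console.log(JSON.stringify(escapeCsvValue("abc")));        // unchanged
```

Defusing before quoting matters: the apostrophe has to land at the start of each physical line inside the cell, and quote doubling must run over the already-defused text.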

Test plan

New describe("escapeCsvValue") block in src/cli/output/TableRenderer.test.ts:

  • All six dangerous leads prefixed: =cmd, +cmd, -cmd, @cmd, \tcmd, \rcmd.
  • Leading ASCII space + =cmd → ' =cmd.
  • Leading NBSP + =cmd → '\u00A0=cmd.
  • Plain abc and 123 left unchanged.
  • Multi-line LF safe\n=cmd → "safe\n'=cmd".
  • Multi-line CRLF safe\r\n=cmd → "safe\r\n'=cmd".
  • Multi-line with multiple dangerous lines interleaved with safe ones — every dangerous line defused, safe ones untouched.
  • Cell with , wrapped in quotes.
  • Embedded " doubled inside wrapped cell.
  • escapeCsvValue("") returns "".
  • All existing renderTable tests (json/csv/table format) still pass — header/data rows, comma/quote/null handling, original 5-row formula-prefix test, benign leads, table padding/truncation/pagination.

Out of scope

Plans I and J (PR #118's branch) are intentionally not touched here; the PR author for that branch will apply them.


Generated by Claude Code

… formula injection (sec)

escapeCsvValue's `/^[=+\-@\t\r]/` check missed three classes of bypass:
leading ASCII whitespace, U+00A0 NBSP, and dangerous chars after embedded
newlines inside a quoted multi-line cell (Excel/Sheets re-parse each
physical line). A SEC-supplied issuer name like ` =cmd|'/c calc'!A0` passed
through unprefixed.

Now strips leading whitespace (incl. NBSP) before the dangerous-lead test
and defuses every line after `\n`/`\r\n` independently. Regression test
matrix covers all six leading chars (incl. TAB/CR), leading space, NBSP,
and LF/CRLF multi-line cases.
sroussey added a commit that referenced this pull request Jun 2, 2026
Layers two refinements on top of the line-by-line CSV defusing in #119:

1. Tighten LEADING_WS so it only strips space-like characters that
   spreadsheets silently ignore (ASCII space, NBSP, SHY, ZWSP, ZWNJ,
   ZWJ, LRM, RLM, BOM). \t and \r are themselves dangerous formula
   leads, so we no longer strip them away before the DANGEROUS_LEAD
   check — otherwise "\t=cmd" or "\r=cmd" would slip through as
   non-dangerous after the strip.

2. Split on /(\r\n|\r|\n)/ instead of /(\r?\n)/ so that a bare CR
   inside a multi-line cell is also a line boundary; the line after
   the CR is independently defused. Excel re-parses every physical
   line of a quoted cell, including lines separated by lone CR.

Tests cover the zero-width-prefix bypasses (ZWSP/ZWNJ/ZWJ/LRM/RLM/SHY/BOM
+ "=cmd"), mixed-WS bypasses ("ZWSP space =cmd"), bare-CR-followed-by-formula
("safe\r=cmd"), and a negative control to prove ZWSP-then-benign is left
alone. All 44 cases in TableRenderer.test.ts pass.
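
The two refinements in this commit message might look roughly like the sketch below. The character list follows the message (space, NBSP, SHY, ZWSP, ZWNJ, ZWJ, LRM, RLM, BOM), but the exact regex, the LINE_BREAK name, and the defuseCell helper are assumptions for illustration. RFC 4180 quoting is left out to keep the focus on the two changes.

```typescript
const DANGEROUS_LEAD = /^[=+\-@\t\r]/;

// Space, NBSP (U+00A0), SHY (U+00AD), ZWSP/ZWNJ/ZWJ/LRM/RLM (U+200B to U+200F), BOM (U+FEFF).
// \t and \r are deliberately absent: they are dangerous leads, not ignorable padding.
const LEADING_WS = /^[ \u00A0\u00AD\u200B-\u200F\uFEFF]+/;

// Hypothetical name. \r\n is listed first so a CRLF pair stays one separator.
const LINE_BREAK = /(\r\n|\r|\n)/;

function needsFormulaPrefix(line: string): boolean {
  return DANGEROUS_LEAD.test(line.replace(LEADING_WS, ""));
}

// Hypothetical helper: defuses every physical line and keeps separators as-is.
function defuseCell(value: string): string {
  return value
    .split(LINE_BREAK)
    .map((part, i) => (i % 2 === 0 && needsFormulaPrefix(part) ? "'" + part : part))
    .join("");
}

console.log(JSON.stringify(defuseCell("\u200B=cmd")));   // zero-width prefix: defused
console.log(JSON.stringify(defuseCell("safe\r=cmd")));   // bare CR boundary: second line defused
console.log(JSON.stringify(defuseCell("\u200Bhello")));  // negative control: unchanged
```

One edge this sketch does not settle: a value that starts with a bare CR splits into an empty first line followed by the separator, so the CR itself is never tested as a lead. The commit message does not say how the real code treats that case.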

https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc