fix: stricter JSON decoding for surrogate pairs and keywords #19807

Open
mattn wants to merge 5 commits into vim:master from mattn:fix-json-decode-strict

mattn commented Mar 24, 2026

Tighten json_decode() to reject invalid JSON that was previously accepted.

  1. Lone surrogate rejection: Fix the surrogate pair range check from 0xDFFF to 0xDBFF so that only high surrogates (U+D800-U+DBFF) trigger pair decoding. Additionally, reject lone surrogates (any codepoint in U+D800-U+DFFF that did not form a valid pair) instead of passing them through to utf_char2bytes(), which would produce invalid UTF-8.

  2. Case-sensitive keyword matching: json_decode() used STRNICMP (case-insensitive) for matching true, false, null, NaN, Infinity, and -Infinity. This meant that "True", "FALSE", "Null", etc. were all silently accepted. RFC 7159 requires these keywords to be lowercase. js_decode() retains the case-insensitive behavior.
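
For reviewers, the surrogate rule in item 1 can be sketched roughly as follows. This is an illustration only, not the code in this patch: `decode_utf16_escape` and its parameters are hypothetical names, and the real decoder works on the input text rather than on pre-parsed values. The point is that only a high surrogate (U+D800-U+DBFF) may start a pair, and any surrogate that does not end up in a valid pair is an error.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not Vim's actual code).
 * "first" is the value of a \uXXXX escape; "second" is the value of the
 * \uXXXX escape that immediately follows it, or -1 if there is none.
 * Returns the decoded code point, or -1 for a lone or invalid surrogate.
 * *used_second is set to 1 when "second" was consumed as the low half.
 */
static long decode_utf16_escape(long first, long second, int *used_second)
{
    *used_second = 0;

    /* Only high surrogates may start a pair (upper bound 0xDBFF, not 0xDFFF). */
    if (first >= 0xD800 && first <= 0xDBFF)
    {
        if (second >= 0xDC00 && second <= 0xDFFF)
        {
            *used_second = 1;
            return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
        }
        return -1;  /* high surrogate without a low surrogate */
    }

    /* A low surrogate that was not preceded by a high surrogate. */
    if (first >= 0xDC00 && first <= 0xDFFF)
        return -1;

    return first;   /* ordinary BMP code point */
}
```

With this rule, `\uD83D\uDE00` combines to U+1F600, while `\uD83D` followed by a non-surrogate, or `\uDE00` on its own, is rejected instead of being passed on to UTF-8 encoding.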

Both of these lenient behaviors were intentionally introduced by me in patch 7.4.1434, and I recall that Bram's position at the time was that Vim should not perform strict decoding. However, with the current landscape — LSP support, ch_listen(), and broader use of JSON for inter-process communication — silently accepting invalid JSON is a risk rather than a feature.

mattn added 5 commits March 25, 2026 09:34
1. Fix surrogate pair range check (0xDFFF -> 0xDBFF) so only high
   surrogates trigger pair decoding. Reject lone surrogates that do
   not form a valid pair instead of producing invalid UTF-8.

2. Use case-sensitive matching for JSON keywords (true, false, null,
   NaN, Infinity) in json_decode() per RFC 7159. js_decode() retains
   case-insensitive behavior.
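
The keyword change can be pictured with the sketch below. It is an assumption-laden illustration, not the patch itself: `match_keyword` and its `strict` flag are hypothetical, and standard `strncmp()` and POSIX `strncasecmp()` stand in for Vim's own string comparison macros.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() (POSIX) */

/*
 * Hypothetical helper (not Vim's actual code).
 * Returns non-zero when "input" starts with "keyword".
 * strict != 0 models json_decode(): the case must match exactly.
 * strict == 0 models js_decode(): the case is ignored.
 */
static int match_keyword(const char *input, const char *keyword, int strict)
{
    size_t len = strlen(keyword);

    if (strict)
        return strncmp(input, keyword, len) == 0;
    return strncasecmp(input, keyword, len) == 0;
}
```

Under these assumptions `match_keyword("True", "true", 1)` fails while `match_keyword("True", "true", 0)` succeeds, which mirrors the json_decode() versus js_decode() split described above.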
- Replace double ga_append() calls for escape sequences with single
  GA_CONCAT_LITERAL() calls, halving function call and buffer growth
  check overhead.
- Replace vim_snprintf_safelen() for blob byte encoding (0-255) with
  direct digit conversion.
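
As a rough idea of what "direct digit conversion" means for a value in 0-255, here is a minimal sketch. It assumes the byte is written in decimal, and `byte_to_decimal` is a hypothetical name rather than a function from this patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not Vim's actual code).
 * Writes the decimal digits of "value" into "buf", which must hold at
 * least 4 bytes (three digits plus NUL). Returns the number of digits.
 */
static size_t byte_to_decimal(unsigned char value, char *buf)
{
    size_t n = 0;

    if (value >= 100)
        buf[n++] = (char)('0' + value / 100);         /* hundreds */
    if (value >= 10)
        buf[n++] = (char)('0' + (value / 10) % 10);   /* tens */
    buf[n++] = (char)('0' + value % 10);              /* ones */
    buf[n] = '\0';
    return n;
}
```

Because the range is fixed, at most three digits are produced and no format string has to be parsed.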
Pre-allocate buffer space based on input string length before the
encoding loop, reducing repeated ga_grow() checks during character
processing.
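
The pre-allocation idea can be sketched with a simplified growable buffer. Everything here is hypothetical (`buf_T`, `buf_reserve`, `encode_string`) and only stands in for Vim's growarray; it escapes just `"` and `\` to keep the example short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, a stand-in for Vim's growarray. */
typedef struct { char *data; size_t len; size_t cap; } buf_T;

/* Make sure at least "extra" more bytes fit. Returns 0 on allocation failure. */
static int buf_reserve(buf_T *b, size_t extra)
{
    size_t need = b->len + extra;
    size_t new_cap;
    char *p;

    if (need <= b->cap)
        return 1;
    new_cap = b->cap * 2 > need ? b->cap * 2 : need;
    p = realloc(b->data, new_cap);
    if (p == NULL)
        return 0;
    b->data = p;
    b->cap = new_cap;
    return 1;
}

/* Append "s" as a quoted string, escaping only '"' and '\\' (simplified). */
static int encode_string(buf_T *b, const char *s)
{
    size_t n = strlen(s);
    size_t i;

    /* One reservation up front: two quotes, n bytes and the NUL. */
    if (!buf_reserve(b, n + 3))
        return 0;
    b->data[b->len++] = '"';
    for (i = 0; i < n; i++)
    {
        if (s[i] == '"' || s[i] == '\\')
        {
            /* Escapes exceed the estimate: backslash, this char,
             * the remaining chars, closing quote and NUL. */
            if (!buf_reserve(b, 2 + (n - i - 1) + 2))
                return 0;
            b->data[b->len++] = '\\';
        }
        b->data[b->len++] = s[i];
    }
    b->data[b->len++] = '"';
    b->data[b->len] = '\0';
    return 1;
}
```

For input without escapes the loop never has to grow the buffer; only characters that expand beyond the initial estimate trigger another capacity check.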
…code

Verify that json_decode() rejects mixed-case keywords like NULL, nan,
infinity (RFC 7159 requires lowercase), and that js_decode() accepts
them case-insensitively.
…gate handling

- Remove note about case-insensitive keywords in json_decode()
- Add note that js_decode() accepts keywords case-insensitively
- Update surrogate pair docs: lone/invalid surrogates now error
mattn force-pushed the fix-json-decode-strict branch from f567f70 to 32e6010 March 25, 2026 00:34
2 participants