fix: stricter JSON decoding for surrogate pairs and keywords #19807
Open
mattn wants to merge 5 commits into vim:master
chrisbra
reviewed
Mar 24, 2026
1. Fix surrogate pair range check (0xDFFF -> 0xDBFF) so only high surrogates trigger pair decoding. Reject lone surrogates that do not form a valid pair instead of producing invalid UTF-8.
2. Use case-sensitive matching for JSON keywords (true, false, null, NaN, Infinity) in json_decode() per RFC 7159. js_decode() retains case-insensitive behavior.
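The range check described in item 1 can be illustrated with a minimal sketch. This is not the code from the patch: `combine_utf16_escapes` is a hypothetical helper that takes the numeric values of one or two already-parsed `\uXXXX` escapes, and the use of -1 as "no second escape" and as the error return is an assumption made for the example.

```c
/* Hypothetical helper, not Vim's implementation.
 * first:  value of a \uXXXX escape.
 * second: value of the escape that immediately follows, or -1 if none.
 * Returns the decoded code point, or -1 for a lone or mismatched surrogate.
 * *used_second is set to 1 when the second escape was consumed. */
long combine_utf16_escapes(long first, long second, int *used_second)
{
    *used_second = 0;

    /* Only high surrogates (U+D800-U+DBFF) may start a pair.  Using
     * 0xDFFF as the upper bound here would also treat low surrogates
     * as the start of a pair. */
    if (first >= 0xD800 && first <= 0xDBFF) {
        if (second >= 0xDC00 && second <= 0xDFFF) {
            *used_second = 1;
            return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
        }
        return -1;  /* high surrogate without a low surrogate */
    }

    /* A low surrogate that was not consumed as part of a pair. */
    if (first >= 0xDC00 && first <= 0xDFFF)
        return -1;

    return first;   /* ordinary BMP code point */
}
```

With this sketch, the pair `\uD83D\uDE00` combines to U+1F600, while `\uD800` on its own, or `\uDC00` in first position, is reported as an error instead of being passed on to a UTF-8 encoder.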
- Replace double ga_append() calls for escape sequences with single GA_CONCAT_LITERAL() calls, halving function call and buffer growth check overhead.
- Replace vim_snprintf_safelen() for blob byte encoding (0-255) with direct digit conversion.
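As an illustration of what "direct digit conversion" for a byte value can look like, here is a minimal sketch. The function name `byte_to_decimal` and its buffer contract are assumptions for the example; the patch itself may structure this differently.

```c
#include <stddef.h>

/* Hypothetical helper, not the code from the patch.
 * Writes the decimal digits of a byte (0-255) into buf, which must have
 * room for at least 3 characters.  No NUL terminator is written.
 * Returns the number of characters written (1 to 3). */
size_t byte_to_decimal(unsigned char byte, char *buf)
{
    size_t n = 0;

    if (byte >= 100)
        buf[n++] = (char)('0' + byte / 100);        /* hundreds digit */
    if (byte >= 10)
        buf[n++] = (char)('0' + (byte / 10) % 10);  /* tens digit */
    buf[n++] = (char)('0' + byte % 10);             /* ones digit */

    return n;
}
```

Because the input range is bounded to three digits, the conversion needs no format-string parsing, which is presumably where the saving over a general snprintf-style call comes from.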
Pre-allocate buffer space based on input string length before the encoding loop, reducing repeated ga_grow() checks during character processing.
…code
Verify that json_decode() rejects mixed-case keywords like NULL, nan, infinity (RFC 7159 requires lowercase), and that js_decode() accepts them case-insensitively.
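The difference between the two matching modes can be sketched as follows. This is an illustrative stand-in, not Vim's parser: `match_keyword` and `keyword_eq` are hypothetical names, and the `ignore_case` flag merely models the strict json_decode() behavior (0) versus the lenient js_decode() behavior (1).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare the first n bytes of s with kw, optionally ignoring ASCII case.
 * Stops at the first mismatch, so a short input is never read past its NUL. */
static int keyword_eq(const char *s, const char *kw, size_t n, int ignore_case)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char a = (unsigned char)s[i];
        unsigned char b = (unsigned char)kw[i];

        if (ignore_case) {
            a = (unsigned char)tolower(a);
            b = (unsigned char)tolower(b);
        }
        if (a != b)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: return the length of the keyword at the start of s,
 * or 0 if none matches.  ignore_case == 0 models strict matching,
 * ignore_case != 0 models lenient matching. */
size_t match_keyword(const char *s, int ignore_case)
{
    static const char *const keywords[] = {
        "true", "false", "null", "NaN", "Infinity", "-Infinity"
    };

    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        size_t n = strlen(keywords[i]);

        if (keyword_eq(s, keywords[i], n, ignore_case))
            return n;
    }
    return 0;
}
```

In strict mode the sketch accepts `null` and `NaN` but rejects `NULL` and `nan`; in lenient mode all four are accepted.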
…gate handling
- Remove note about case-insensitive keywords in json_decode()
- Add note that js_decode() accepts keywords case-insensitively
- Update surrogate pair docs: lone/invalid surrogates now error
f567f70 to 32e6010
Tighten `json_decode()` to reject invalid JSON that was previously accepted.

Lone surrogate rejection: Fix the surrogate pair range check from `0xDFFF` to `0xDBFF` so that only high surrogates (U+D800-U+DBFF) trigger pair decoding. Additionally, reject lone surrogates (any codepoint in U+D800-U+DFFF that did not form a valid pair) instead of passing them through to `utf_char2bytes()`, which would produce invalid UTF-8.

Case-sensitive keyword matching: `json_decode()` used `STRNICMP` (case-insensitive) for matching `true`, `false`, `null`, `NaN`, `Infinity`, and `-Infinity`. This means "True", "FALSE", "Null", etc. were all silently accepted. RFC 7159 requires these keywords to be lowercase. `js_decode()` retains the case-insensitive behavior.

Both of these lenient behaviors were intentionally introduced by me in patch 7.4.1434, and I recall that Bram's position at the time was that Vim should not perform strict decoding. However, with the current landscape (LSP support, `ch_listen()`, and broader use of JSON for inter-process communication), silently accepting invalid JSON is a risk rather than a feature.