-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Improve deserialization of JSON primitives into JsonElement #116419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Improve deserialization of JSON primitives into JsonElement #116419
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves deserialization of JSON primitives into JsonElement by caching immutable MetadataDb instances for common literal values, strings, and numbers. Key changes include updating Parse methods to use a ref Utf8JsonReader and introducing token-specific caching in MetadataDb, along with a minor adjustment in the metadata buffer sizing logic.
- Updated Parse logic to pass reader by reference and select caching based on token type.
- Introduced new MetadataDb creation methods (for literal, string, and number values) and a locked cache for small primitives.
- Adjusted the condition for enlarging the MetadataDb buffer.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
File | Description |
---|---|
JsonDocument.Parse.cs | Adjusted parsing logic to use ref Utf8JsonReader and to call new CreateLockedFor* methods based on token type. |
JsonDocument.MetadataDb.cs | Introduced new caching methods for literals, strings, and numbers, and modified the buffer enlargement check. |
Comments suppressed due to low confidence (2)
src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.MetadataDb.cs:266
- Changing the condition from '>=' to '>' alters when the buffer is enlarged. Please verify that the new check correctly prevents buffer overflows when appending new rows.
if (Length > _data.Length - DbRow.Size)
src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.Parse.cs:771
- Subtracting 2 from the payload length assumes that the JSON string always includes both starting and ending quotes. Please confirm that this logic safely handles all edge cases.
MetadataDb database = MetadataDb.CreateLockedForString(utf8Json.Length - 2, reader.ValueIsEscaped);
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis |
src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.MetadataDb.cs
Show resolved
Hide resolved
src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.MetadataDb.cs
Show resolved
Hide resolved
private static readonly MetadataDb LockedNull = | ||
CreateLockedForNonStringPrimitiveImpl(JsonTokenType.Null, JsonConstants.NullValue.Length); | ||
|
||
// Index i is a singleton database for all numbers of length i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "all numbers of length i" mean exactly? Is it that we're caching all possible numbers that are up to 8 characters long? That would be ~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the metadata database doesn't store the actual content, only the index into the actual content. The document will still need to store the UTF-8. But for primitive JSON values (like string and number), the metadata is just "what is the start offset of the value" and "how long is the value" (and "is the value escaped" for strings, but I chose not to cache escaped strings). The start offset is always 0 for number and 1 for string (to skip the quote) so we just need to store "how long is the value". So there would only be 8 cached MetadataDb
s which represent 0 <= n <10^8 actual values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now. I take it these are only used if the strings or numbers don't have preceding or trailing whitespace or comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this is used only in the ParseValue path so the ReadOnlyMemory<byte>
that the JsonDocument
holds will only contain the single value without leading/trailing junk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check if we have relevant test coverage just in case?
When creating
JsonElement
there is an extra overhead of creating and storing theMetadataDb
in addition to the required UTF-8 payload. We can reduce this overhead by caching readonly databases for primitives of small length. This PR only affects deserialization ofJsonElement
when it is part of a larger deserialization, like extension data and dictionaries (if the value isobject
,JsonElement
, orJsonNode
). This should cover most places where aJsonElement
of a primitive is created, but there's nothing preventing us from extending it to top levelJsonElement
deserialization as well.Caching is based on the length in bytes of the UTF-8 JSON payload. The threshold was arbitrarily chosen - numbers have threshold of 8 bytes and strings 16 bytes.
The perf results show up to ~20% improvement in some cases.
Benchmarks
Benchmarking code