
Improve deserialization of JSON primitives into JsonElement #116419



Open: wants to merge 3 commits into main


PranavSenthilnathan (Member)

When creating a JsonElement, there is extra overhead from creating and storing the MetadataDb in addition to the required UTF-8 payload. We can reduce this overhead by caching read-only metadata databases for short primitive values. This PR only affects deserialization of JsonElement when it is part of a larger deserialization, such as extension data and dictionaries (when the value type is object, JsonElement, or JsonNode). This should cover most places where a JsonElement for a primitive is created, but nothing prevents us from extending it to top-level JsonElement deserialization as well.
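The affected code paths can be exercised through the public System.Text.Json API. The sketch below (the type and method names are hypothetical) deserializes a payload into Dictionary<string, object> and into a [JsonExtensionData] property; each primitive value comes back as its own small JsonElement, which is the kind of value the cached metadata is meant to make cheaper. It only shows observable behavior and does not touch the internal MetadataDb.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical model type: unmatched properties land in extension data as JsonElement values.
public class WithExtensionData
{
    [JsonExtensionData]
    public Dictionary<string, JsonElement>? Properties { get; set; }
}

public static class PrimitiveElementDemo
{
    // Deserializes into Dictionary<string, object>; by default each object-typed value
    // is materialized as a JsonElement, so we report its ValueKind.
    public static Dictionary<string, JsonValueKind> KindsViaObjectDictionary(string json)
    {
        var dict = JsonSerializer.Deserialize<Dictionary<string, object>>(json)!;
        var kinds = new Dictionary<string, JsonValueKind>();
        foreach (var (key, value) in dict)
        {
            kinds[key] = ((JsonElement)value).ValueKind;
        }
        return kinds;
    }

    // Deserializes into a type with [JsonExtensionData] and returns each value's raw JSON text.
    public static Dictionary<string, string> RawTextViaExtensionData(string json)
    {
        var model = JsonSerializer.Deserialize<WithExtensionData>(json)!;
        var raw = new Dictionary<string, string>();
        foreach (var (key, element) in model.Properties!)
        {
            raw[key] = element.GetRawText();
        }
        return raw;
    }

    public static void Main()
    {
        const string json = "{ \"Foo\": 42, \"Bar\": \"barValue\" }";
        foreach (var (key, kind) in KindsViaObjectDictionary(json))
            Console.WriteLine($"{key}: {kind}");
        foreach (var (key, text) in RawTextViaExtensionData(json))
            Console.WriteLine($"{key}: {text}");
    }
}
```

Note that a string element's raw text keeps its quotes, so its UTF-8 payload is two bytes longer than the value itself.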

Caching is based on the length in bytes of the UTF-8 JSON payload. The thresholds were chosen arbitrarily: 8 bytes for numbers and 16 bytes for strings.
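A length-indexed cache with these thresholds might look like the sketch below. Everything here is hypothetical: PrimitiveMetadata stands in for a read-only MetadataDb, and the assumption that the thresholds are compared against the full payload length (quotes included for strings) follows the wording above rather than the PR's actual code. Escaped strings are never cached, as the author notes later in the thread.

```csharp
#nullable enable
using System;

public enum PrimitiveKind { Number, String }

// Hypothetical stand-in for a read-only MetadataDb describing one primitive:
// where its value starts in the UTF-8 payload and how many bytes it spans.
public sealed record PrimitiveMetadata(PrimitiveKind Kind, int Start, int Length);

public static class PrimitiveMetadataCache
{
    // Thresholds from the PR description, measured on the UTF-8 payload length.
    public const int MaxCachedNumberBytes = 8;
    public const int MaxCachedStringBytes = 16;

    // Index i holds the shared instance for a payload of exactly i bytes.
    // Numbers need at least 1 byte; strings at least 2 (the two quotes).
    private static readonly PrimitiveMetadata?[] s_numbers = Build(PrimitiveKind.Number, 1, MaxCachedNumberBytes);
    private static readonly PrimitiveMetadata?[] s_strings = Build(PrimitiveKind.String, 2, MaxCachedStringBytes);

    private static PrimitiveMetadata?[] Build(PrimitiveKind kind, int minBytes, int maxBytes)
    {
        var table = new PrimitiveMetadata?[maxBytes + 1];
        for (int i = minBytes; i <= maxBytes; i++)
            table[i] = Create(kind, i);
        return table;
    }

    // Numbers start at offset 0; strings start at 1 and exclude both quotes.
    private static PrimitiveMetadata Create(PrimitiveKind kind, int payloadBytes) =>
        kind == PrimitiveKind.Number
            ? new PrimitiveMetadata(kind, 0, payloadBytes)
            : new PrimitiveMetadata(kind, 1, payloadBytes - 2);

    // Returns a shared instance for short, unescaped payloads; otherwise allocates a new one.
    public static PrimitiveMetadata Get(PrimitiveKind kind, int payloadBytes, bool valueIsEscaped)
    {
        PrimitiveMetadata?[] table = kind == PrimitiveKind.Number ? s_numbers : s_strings;
        if (!valueIsEscaped && payloadBytes < table.Length && table[payloadBytes] is { } cached)
            return cached;
        return Create(kind, payloadBytes);
    }

    public static void Main()
    {
        var a = Get(PrimitiveKind.Number, 2, false); // e.g. the payload 42
        var b = Get(PrimitiveKind.Number, 2, false); // e.g. the payload 17: same metadata
        Console.WriteLine(ReferenceEquals(a, b));
        Console.WriteLine(ReferenceEquals(Get(PrimitiveKind.Number, 9, false), Get(PrimitiveKind.Number, 9, false)));
        Console.WriteLine(Get(PrimitiveKind.String, 5, false)); // e.g. the payload "foo"
    }
}
```

Because each entry records only a kind, an offset, and a length, a handful of shared instances can describe every short primitive; the actual bytes still live in each document.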

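The perf results show mean-time improvements of up to ~22% (best ratio 0.78) and allocation reductions of up to ~16% (best alloc ratio 0.84) in the affected scenarios, while the top-level JsonElement Baseline is essentially unchanged, as expected.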

Benchmarks

BenchmarkDotNet v0.14.1-nightly.20250107.205, Windows 11 (10.0.26100.4061)
AMD Ryzen 9 9950X 4.30GHz, 1 CPU, 32 logical and 16 physical cores
.NET SDK 10.0.100-preview.3.25201.16
  [Host]     : .NET 10.0.0 (10.0.25.17105), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-TYDCSN : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-ODTNID : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms  MaxIterationCount=20  
MinIterationCount=15  WarmupCount=1  

| Method | Toolchain | Payload | Mean | Ratio | Allocated | Alloc Ratio |
|---|---|---|---:|---:|---:|---:|
| Baseline | main | { "Foo": "foo", "Bar": "barValue" } | 163.0 ns | 1.00 | 264 B | 1.00 |
| Baseline | PR | { "Foo": "foo", "Bar": "barValue" } | 163.2 ns | 1.00 | 264 B | 1.00 |
| DeserializeObjectDictionary | main | { "Foo": "foo", "Bar": "barValue" } | 197.3 ns | 1.00 | 656 B | 1.00 |
| DeserializeObjectDictionary | PR | { "Foo": "foo", "Bar": "barValue" } | 175.7 ns | 0.89 | 560 B | 0.85 |
| DeserializeJsonNodeDictionary | main | { "Foo": "foo", "Bar": "barValue" } | 152.1 ns | 1.00 | 464 B | 1.00 |
| DeserializeJsonNodeDictionary | PR | { "Foo": "foo", "Bar": "barValue" } | 149.7 ns | 0.98 | 464 B | 1.00 |
| DeserializeJsonElementDictionary | main | { "Foo": "foo", "Bar": "barValue" } | 189.2 ns | 1.00 | 616 B | 1.00 |
| DeserializeJsonElementDictionary | PR | { "Foo": "foo", "Bar": "barValue" } | 156.0 ns | 0.82 | 520 B | 0.84 |
| DeserializeExtensionObjectDictionary | main | { "Foo": "foo", "Bar": "barValue" } | 211.8 ns | 1.00 | 680 B | 1.00 |
| DeserializeExtensionObjectDictionary | PR | { "Foo": "foo", "Bar": "barValue" } | 175.7 ns | 0.83 | 584 B | 0.86 |
| DeserializeExtensionJsonElementDictionary | main | { "Foo": "foo", "Bar": "barValue" } | 206.0 ns | 1.00 | 640 B | 1.00 |
| DeserializeExtensionJsonElementDictionary | PR | { "Foo": "foo", "Bar": "barValue" } | 178.1 ns | 0.86 | 544 B | 0.85 |
| DeserializeExtensionJsonObject | main | { "Foo": "foo", "Bar": "barValue" } | 173.3 ns | 1.00 | 544 B | 1.00 |
| DeserializeExtensionJsonObject | PR | { "Foo": "foo", "Bar": "barValue" } | 176.9 ns | 1.02 | 544 B | 1.00 |
| Baseline | main | { "Foo": 42, "Bar": 3.14 } | 176.1 ns | 1.00 | 256 B | 1.00 |
| Baseline | PR | { "Foo": 42, "Bar": 3.14 } | 172.6 ns | 0.98 | 256 B | 1.00 |
| DeserializeObjectDictionary | main | { "Foo": 42, "Bar": 3.14 } | 208.5 ns | 1.00 | 632 B | 1.00 |
| DeserializeObjectDictionary | PR | { "Foo": 42, "Bar": 3.14 } | 165.3 ns | 0.79 | 552 B | 0.87 |
| DeserializeJsonNodeDictionary | main | { "Foo": 42, "Bar": 3.14 } | 212.2 ns | 1.00 | 664 B | 1.00 |
| DeserializeJsonNodeDictionary | PR | { "Foo": 42, "Bar": 3.14 } | 171.7 ns | 0.81 | 584 B | 0.88 |
| DeserializeJsonElementDictionary | main | { "Foo": 42, "Bar": 3.14 } | 207.8 ns | 1.00 | 592 B | 1.00 |
| DeserializeJsonElementDictionary | PR | { "Foo": 42, "Bar": 3.14 } | 162.1 ns | 0.78 | 512 B | 0.86 |
| DeserializeExtensionObjectDictionary | main | { "Foo": 42, "Bar": 3.14 } | 215.9 ns | 1.00 | 656 B | 1.00 |
| DeserializeExtensionObjectDictionary | PR | { "Foo": 42, "Bar": 3.14 } | 189.3 ns | 0.88 | 576 B | 0.88 |
| DeserializeExtensionJsonElementDictionary | main | { "Foo": 42, "Bar": 3.14 } | 214.4 ns | 1.00 | 616 B | 1.00 |
| DeserializeExtensionJsonElementDictionary | PR | { "Foo": 42, "Bar": 3.14 } | 175.4 ns | 0.82 | 536 B | 0.87 |
| DeserializeExtensionJsonObject | main | { "Foo": 42, "Bar": 3.14 } | 235.0 ns | 1.00 | 744 B | 1.00 |
| DeserializeExtensionJsonObject | PR | { "Foo": 42, "Bar": 3.14 } | 200.0 ns | 0.85 | 664 B | 0.89 |
Benchmarking code
```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Nodes;
using System.Text.Json.Serialization;
using BenchmarkDotNet.Attributes;
using MicroBenchmarks; // Categories: assumed to be the dotnet/performance micro-benchmarks helper class

[HideColumns("Job", "Min", "Max", "Median", "Error", "StdDev", "RatioSD", "Gen0")]
public class ExtensionJson
{
    private const string JsonStringValues = "{ \"Foo\": \"foo\", \"Bar\": \"barValue\" }";
    private const string JsonNumberValues = "{ \"Foo\": 42, \"Bar\": 3.14 }";

    [Params(JsonStringValues, JsonNumberValues)]
    public string Payload { get; set; }

    // Top-level JsonElement: not affected by this PR, serves as the control.
    [Benchmark]
    [BenchmarkCategory(Categories.Libraries, Categories.JSON)]
    public object Baseline() =>
        JsonSerializer.Deserialize<JsonElement>(Payload)!;

    [Benchmark]
    [BenchmarkCategory(Categories.Libraries, Categories.JSON)]
    public object DeserializeObjectDictionary() =>
        JsonSerializer.Deserialize<Dictionary<string, object>>(Payload)!;

    [Benchmark]
    [BenchmarkCategory(Categories.Libraries, Categories.JSON)]
    public object DeserializeJsonNodeDictionary() =>
        JsonSerializer.Deserialize<Dictionary<string, JsonNode>>(Payload)!;

    [Benchmark]
    [BenchmarkCategory(Categories.Libraries, Categories.JSON)]
    public object DeserializeJsonElementDictionary() =>
        JsonSerializer.Deserialize<Dictionary<string, JsonElement>>(Payload)!;

    [Benchmark]
    [BenchmarkCategory(Categories.Libraries, Categories.JSON)]
    public object DeserializeExtensionObjectDictionary() =>
        JsonSerializer.Deserialize<ExtensionObjectDictionary>(Payload)!;

    [Benchmark]
    [BenchmarkCategory(Categories.Libraries, Categories.JSON)]
    public object DeserializeExtensionJsonElementDictionary() =>
        JsonSerializer.Deserialize<ExtensionJsonElementDictionary>(Payload)!;

    [Benchmark]
    [BenchmarkCategory(Categories.Libraries, Categories.JSON)]
    public object DeserializeExtensionJsonObject() =>
        JsonSerializer.Deserialize<ExtensionJsonObject>(Payload)!;

    // Extension-data targets: every property in the payload is unmatched, so all values land here.
    public class ExtensionObjectDictionary
    {
        [JsonExtensionData]
        public Dictionary<string, object> Properties { get; set; }
    }

    public class ExtensionJsonElementDictionary
    {
        [JsonExtensionData]
        public Dictionary<string, JsonElement> Properties { get; set; }
    }

    public class ExtensionJsonObject
    {
        [JsonExtensionData]
        public JsonObject Properties { get; set; }
    }
}
```

PranavSenthilnathan self-assigned this Jun 8, 2025
Copilot AI review requested due to automatic review settings June 8, 2025 21:44
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR improves deserialization of JSON primitives into JsonElement by caching immutable MetadataDb instances for literal values and for short strings and numbers. Key changes include updating the Parse methods to take the Utf8JsonReader by ref, introducing token-specific caching in MetadataDb, and a minor adjustment to the metadata buffer sizing logic.

  • Updated Parse logic to pass reader by reference and select caching based on token type.
  • Introduced new MetadataDb creation methods (for literal, string, and number values) and a locked cache for small primitives.
  • Adjusted the condition for enlarging the MetadataDb buffer.
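The first bullet concerns passing the reader by reference. The overview does not state the motivation, so the sketch below only illustrates the general semantics: Utf8JsonReader is a mutable ref struct, so a by-value parameter is a copy, and any Read() in the callee is invisible to the caller. The class and method names are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class ReaderByRefDemo
{
    // Receives a copy of the reader: advancing it does not move the caller's reader.
    private static void ReadOnceByValue(Utf8JsonReader reader) => reader.Read();

    // Receives the caller's reader itself: advancing it is visible to the caller.
    private static void ReadOnceByRef(ref Utf8JsonReader reader) => reader.Read();

    public static (JsonTokenType AfterByValue, JsonTokenType AfterByRef) Run()
    {
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes("[42]"));
        reader.Read(); // positioned on StartArray

        ReadOnceByValue(reader);
        JsonTokenType afterByValue = reader.TokenType;

        ReadOnceByRef(ref reader);
        JsonTokenType afterByRef = reader.TokenType;

        return (afterByValue, afterByRef);
    }

    public static void Main()
    {
        var (afterByValue, afterByRef) = Run();
        Console.WriteLine($"after by-value call: {afterByValue}, after by-ref call: {afterByRef}");
    }
}
```

Passing by ref also avoids copying the reader's fairly large state on every call.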

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| JsonDocument.Parse.cs | Adjusted parsing logic to use ref Utf8JsonReader and to call new CreateLockedFor* methods based on token type. |
| JsonDocument.MetadataDb.cs | Introduced new caching methods for literals, strings, and numbers, and modified the buffer enlargement check. |
Comments suppressed due to low confidence (2)

src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.MetadataDb.cs:266

  • Changing the condition from '>=' to '>' alters when the buffer is enlarged. Please verify that the new check correctly prevents buffer overflows when appending new rows.
if (Length > _data.Length - DbRow.Size)
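A small model can check the boundary case raised here. Assuming, as the quoted line suggests, that Length counts bytes already used and a new row needs DbRow.Size bytes, the row fits exactly when Length + DbRow.Size <= _data.Length, which is the negation of Length > _data.Length - DbRow.Size. So `>` still enlarges before any overflow, while `>=` grows one row earlier than necessary. The RowBuffer class below is hypothetical and only mirrors that inequality; RowSize is an illustrative stand-in for DbRow.Size, and the real MetadataDb differs in detail.

```csharp
using System;

// Hypothetical append-only row buffer modeled on the quoted check.
public sealed class RowBuffer
{
    public const int RowSize = 12; // illustrative stand-in for DbRow.Size

    private byte[] _data;
    private readonly bool _useGreaterOrEqual; // true = '>=' check, false = '>' check

    public RowBuffer(int initialRows, bool useGreaterOrEqual)
    {
        _data = new byte[initialRows * RowSize];
        _useGreaterOrEqual = useGreaterOrEqual;
    }

    public int Length { get; private set; }
    public int Enlargements { get; private set; }
    public int Capacity => _data.Length;

    public void Append(byte value)
    {
        // A row fits iff Length + RowSize <= _data.Length.
        bool mustGrow = _useGreaterOrEqual
            ? Length >= _data.Length - RowSize
            : Length > _data.Length - RowSize;
        if (mustGrow)
        {
            Array.Resize(ref _data, Math.Max(_data.Length * 2, RowSize));
            Enlargements++;
        }
        // AsSpan throws ArgumentOutOfRangeException if the row did not fit.
        _data.AsSpan(Length, RowSize).Fill(value);
        Length += RowSize;
    }

    public static void Main()
    {
        foreach (bool useGreaterOrEqual in new[] { true, false })
        {
            var buffer = new RowBuffer(4, useGreaterOrEqual);
            for (int i = 0; i < 4; i++) buffer.Append((byte)i);
            string op = useGreaterOrEqual ? ">=" : ">";
            Console.WriteLine($"'{op}': rows=4, enlargements={buffer.Enlargements}, capacity={buffer.Capacity}");
        }
    }
}
```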

src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.Parse.cs:771

  • Subtracting 2 from the payload length assumes that the JSON string always includes both starting and ending quotes. Please confirm that this logic safely handles all edge cases.
MetadataDb database = MetadataDb.CreateLockedForString(utf8Json.Length - 2, reader.ValueIsEscaped);
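Whether `utf8Json.Length - 2` is safe depends on the payload holding exactly one string token with no surrounding whitespace, which the author confirms later for the ParseValue path. Under that assumption, the public Utf8JsonReader API shows the relationship: ValueSpan holds the raw bytes between the quotes (escape sequences not decoded), and it is always two bytes shorter than the payload, including for the empty string, escaped strings, and multi-byte UTF-8. The helper name is hypothetical.

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class StringPayloadDemo
{
    // For a payload holding exactly one JSON string token (no surrounding whitespace),
    // returns the payload length, the raw value length, and whether escapes are present.
    public static (int PayloadLength, int ValueLength, bool IsEscaped) Inspect(string json)
    {
        byte[] utf8Json = Encoding.UTF8.GetBytes(json);
        var reader = new Utf8JsonReader(utf8Json);
        if (!reader.Read() || reader.TokenType != JsonTokenType.String)
            throw new ArgumentException("Expected a single JSON string.", nameof(json));

        // ValueSpan excludes the quotes and keeps escape sequences as written.
        return (utf8Json.Length, reader.ValueSpan.Length, reader.ValueIsEscaped);
    }

    public static void Main()
    {
        // "foo", "", "a\nb" (escaped), "é" (2-byte UTF-8), "\u00e9" (escaped)
        foreach (string json in new[] { "\"foo\"", "\"\"", "\"a\\nb\"", "\"\u00e9\"", "\"\\u00e9\"" })
        {
            var (payload, value, escaped) = Inspect(json);
            Console.WriteLine($"{json}: payload={payload}, value={value}, payload-2={payload - 2}, escaped={escaped}");
        }
    }
}
```

The escaped cases also show why ValueIsEscaped is passed alongside the length: the raw length alone does not say whether the bytes need unescaping.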

Contributor

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

```csharp
private static readonly MetadataDb LockedNull =
    CreateLockedForNonStringPrimitiveImpl(JsonTokenType.Null, JsonConstants.NullValue.Length);

// Index i is a singleton database for all numbers of length i
```
Member


What does "all numbers of length i" mean exactly? Is it that we're caching all possible numbers that are up to 8 characters long? That would be ~ $10^8$ different numbers, not accounting for decimal points or exponentials.

Member Author


No, the metadata database doesn't store the actual content, only indexes into it; the document still needs to store the UTF-8 bytes. For primitive JSON values (like strings and numbers), the metadata is just "what is the start offset of the value" and "how long is the value" (plus "is the value escaped" for strings, but I chose not to cache escaped strings). The start offset is always 0 for numbers and 1 for strings (to skip the opening quote), so we only need to store "how long is the value". So there would be only 8 cached MetadataDbs, which cover the $10^8$ values $0 \le n < 10^8$.
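The claim that one entry per length suffices can be checked with the public reader: for a payload holding one primitive, the start offset is fixed by the token type and only the length varies with content. The helper below is hypothetical; it uses Utf8JsonReader.TokenStartIndex, which for strings points at the opening quote, so one byte is added to reach the value.

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class PrimitiveLayoutDemo
{
    // For a payload containing a single JSON number or string, returns the byte offset
    // where the value starts and the value's length in bytes.
    public static (int Start, int Length) Describe(string json)
    {
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));
        reader.Read(); // positioned on the single primitive token

        // TokenStartIndex points at the opening quote for strings, so skip one byte.
        int start = (int)reader.TokenStartIndex + (reader.TokenType == JsonTokenType.String ? 1 : 0);
        return (start, reader.ValueSpan.Length);
    }

    public static void Main()
    {
        foreach (string json in new[] { "12345678", "87654321", "3.141592", "\"foo\"", "\"bar\"" })
            Console.WriteLine($"{json}: {Describe(json)}");
    }
}
```

Different numbers of the same length yield identical (start, length) pairs, which is why their metadata can be shared while each document keeps its own bytes.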

Member


I see now. I take it these are only used if the strings or numbers don't have preceding or trailing whitespace or comments?

Member Author


Yep, this is used only in the ParseValue path, so the ReadOnlyMemory<byte> that the JsonDocument holds contains only the single value, without leading or trailing junk.
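This behavior is visible through the public API: when a value is read out of a larger payload, JsonElement.ParseValue keeps only that value's bytes, so whitespace and comments around it do not reach the element. A string keeps its quotes, which is what the `- 2` offset relies on. The wrapper method is hypothetical, and this only demonstrates the public entry point, not every internal path.

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class ParseValueDemo
{
    // Moves the reader onto the value of the first property (skipping comments) and
    // materializes just that value with JsonElement.ParseValue.
    public static JsonElement ParseFirstPropertyValue(string json)
    {
        // CommentHandling.Allow is rejected by ParseValue; Skip is supported.
        var options = new JsonReaderOptions { CommentHandling = JsonCommentHandling.Skip };
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json), options);
        reader.Read(); // StartObject
        reader.Read(); // PropertyName
        reader.Read(); // the value token
        return JsonElement.ParseValue(ref reader);
    }

    public static void Main()
    {
        JsonElement number = ParseFirstPropertyValue("{ \"Foo\" :  /* c */  42  /* d */ }");
        JsonElement text = ParseFirstPropertyValue("{ \"Bar\":   \"barValue\"   }");
        Console.WriteLine($"[{number.GetRawText()}] [{text.GetRawText()}]");
    }
}
```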

Member


Can you check if we have relevant test coverage just in case?
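One way to approach the coverage question is a boundary test along the lines of the sketch below. It cannot observe the cache directly (it is internal), so it checks observable correctness instead: values whose payload lengths straddle the 8-byte and 16-byte thresholds, values of equal length but different content (which would share cached metadata), an escaped string, and the literals all round-trip through Dictionary<string, JsonElement> intact. The case list and names are illustrative and not taken from the PR's test suite.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

public static class SmallPrimitiveCoverage
{
    // Cases straddling the thresholds from the PR description.
    private static readonly string[] s_rawValues =
    {
        "1234567", "12345678", "123456789",                              // numbers: 7, 8, 9 bytes
        "87654321", "-1.5e10",                                           // same lengths, different content
        "\"abcdefghijklm\"", "\"abcdefghijklmn\"", "\"abcdefghijklmno\"", // strings: 15, 16, 17 bytes
        "\"a\\nb\"",                                                     // escaped string (index 8)
        "true", "false", "null",
    };

    // Returns a description of every value that did not round-trip; empty means all passed.
    public static List<string> FindMismatches()
    {
        var json = new StringBuilder("{");
        for (int i = 0; i < s_rawValues.Length; i++)
        {
            if (i > 0) json.Append(',');
            json.Append($"\"v{i}\":{s_rawValues[i]}");
        }
        json.Append('}');

        var parsed = JsonSerializer.Deserialize<Dictionary<string, JsonElement>>(json.ToString())!;
        var mismatches = new List<string>();
        for (int i = 0; i < s_rawValues.Length; i++)
        {
            string actual = parsed[$"v{i}"].GetRawText();
            if (actual != s_rawValues[i])
                mismatches.Add($"v{i}: expected {s_rawValues[i]}, got {actual}");
        }

        // The escaped string must also decode correctly.
        if (parsed["v8"].GetString() != "a\nb")
            mismatches.Add("v8: escaped string decoded incorrectly");
        return mismatches;
    }

    public static void Main()
    {
        List<string> mismatches = FindMismatches();
        Console.WriteLine(mismatches.Count == 0 ? "all values round-tripped" : string.Join(Environment.NewLine, mismatches));
    }
}
```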

2 participants