-
-
Notifications
You must be signed in to change notification settings - Fork 0
⚡ optimize Encoder.Encode
#16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughRefactors internal encoder fast paths and key/value encoding in Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant Qs as Qs.Encode
participant Enc as Encoder
participant SC as SideChannel
participant OPT as EncodeOptions
Caller->>Qs: Encode(data, options)
Qs->>Enc: Encode(data, OPT)
activate Enc
alt value is leaf
note right of Enc #dff0d8: fast leaf path (skip cycle-check)
Enc->>Enc: IsLeaf -> ToInvariantString / encoder -> append
else container/sequence
Enc->>Enc: compute dotsAndEncode & list format
alt comma-list fast path
note right of Enc #f7f7d9: materialize & join, use AppendCommaEncodedValue when encoding
Enc->>Enc: materialize elements -> join -> return
else collection fast path
note right of Enc #d9edf7: per-parent mark, hoist childEncoder
Enc->>SC: mark parent once
Enc->>Enc: iterate IDictionary/Array/IList -> AddKv (dot-encode keys)
else general path
note right of Enc #f2dede: cycle detection per-node
Enc->>Enc: cycle detection -> recursive encode
end
end
Enc-->>Qs: encoded result
Qs-->>Caller: result
deactivate Enc
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16 +/- ##
==========================================
- Coverage 90.42% 85.47% -4.96%
==========================================
Files 15 15
Lines 1379 1687 +308
Branches 391 513 +122
==========================================
+ Hits 1247 1442 +195
- Misses 132 245 +113
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…ved performance and clarity
…performance and correctness
…eration, and side-channel marking; improve value encoding and allocation efficiency
Encoder.Encode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
QsNet/Internal/Encoder.cs (1)
214-229
: Handle DateTimeOffset at the top level (invariant “o” format).DateTimeOffset currently bypasses serializeDate and falls through to leaf handling with culture-dependent ToString(), causing inconsistent output vs DateTime and comma-array normalization.
Apply this diff:
- else if (obj is DateTime dt) + else if (obj is DateTime dt) obj = serializeDate is null ? dt.ToString("o") : serializeDate(dt); + else if (obj is DateTimeOffset dto) + obj = dto.ToString("o");
🧹 Nitpick comments (2)
benchmarks/QsNet.Benchmarks/EncodeBenchmarks.cs (1)
76-93
: Consider adding a nested list and an empty list case in the generated data.This will exercise AllowEmptyLists and nested array handling in the encoder fast paths.
Apply this diff in Setup to add edge cases:
_data = new Dictionary<string, object?> { ["a"] = list, + ["a_empty"] = new List<object?>(), + ["a_nested"] = new List<object?> { new List<object?> { MakeValue(ValueLen, NeedsEscPercent, rnd) } }, ["b"] = new Dictionary<string, object?> { ["x.y"] = MakeValue(ValueLen, NeedsEscPercent, rnd), ["inner"] = new Dictionary<string, object?> { ["z"] = MakeValue(ValueLen, NeedsEscPercent, rnd) } }, ["c"] = new DateTimeOffset(2024, 1, 2, 3, 4, 5, TimeSpan.Zero), ["d"] = true };QsNet/Internal/Encoder.cs (1)
180-182
: Delegate equality for comma generator: add a guard against factory delegates.If GetGenerator returns a new closure per call, equality may be false. Consider caching the Comma generator once or exposing a stable flag on ListFormat.
Apply this minor change:
- var isCommaGen = gen == ListFormat.Comma.GetGenerator(); + var commaGen = ListFormat.Comma.GetGenerator(); + var isCommaGen = gen == commaGen;And reuse commaGen if needed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
QsNet/Internal/Encoder.cs
(11 hunks)benchmarks/QsNet.Benchmarks/EncodeBenchmarks.cs
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
benchmarks/QsNet.Benchmarks/EncodeBenchmarks.cs (1)
QsNet/Qs.cs (1)
Qs
(20-280)
QsNet/Internal/Encoder.cs (2)
QsNet/Internal/SideChannelFrame.cs (3)
MethodImpl
(16-27)MethodImpl
(29-36)SideChannelFrame
(11-37)QsNet/Internal/Utils.cs (3)
Utils
(20-1227)IsNonNullishPrimitive
(755-766)Encode
(381-502)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (5)
benchmarks/QsNet.Benchmarks/EncodeBenchmarks.cs (1)
13-23
: Benchmark scaffolding looks solid and parameterized.Good use of Params, deterministic data generation, and realistic object graph. The DotMode mapping to EncodeOptions is clear and lets you probe the dot-handling paths.
Also applies to: 25-34, 67-107
QsNet/Internal/Encoder.cs (4)
192-210
: Cycle-detection: verify single mark-per-parent still prevents cycles in nested graphs.You removed per-iteration sideChannel.Set and now mark once per parent. Ensure self-referential lists/dictionaries still throw “Cyclic object value” in all paths (including comma arrays and fast path 1).
You can add targeted tests like:
// self list var a = new List<object?>(); a.Add(a); Assert.Throws<InvalidOperationException>(() => Qs.Encode(new Dictionary<string, object?> { ["x"] = a })); // self dict var d = new Dictionary<string, object?>(); d["self"] = d; Assert.Throws<InvalidOperationException>(() => Qs.Encode(d));If any path misses the mark, consider restoring per-child Set for non-leaf children only.
Also applies to: 582-586
481-513
: Comma-join pre-materialization: ToString vs invariant conversion.The non-encoder branch uses ToInvariantString for elements (good), but the earlier encoder branch previously used ToInvariantString via custom logic; after switching to encoder (see prior comment), both branches will align.
241-252
: Leaf path: rely on encoder for values-only when Encode=false.When encoder is null and encodeValuesOnly=true, keys are intentionally unencoded, but values still go through ToInvariantString. This matches current behavior; just confirming intent.
If you want parity with Encode=false everywhere, keep as-is. If not, route values through Utils.Encode(value, cs, format) when format-dependent transforms are required.
589-593
: Hoisted child encoder: good optimization and safe for comma arrays.Nulling the encoder for comma arrays prevents double-encoding of elements when joined. Combined with the prior fix (use encoder in the join loop), this stays coherent.
… and centralizing comma escaping
…ng gating logic with comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
QsNet/Internal/Encoder.cs (2)
186-197
: Inconsistent encoder usage: always pass charset/format.
Some paths pass (null, null) to encoder while others pass (cs, format). This can change escaping (e.g., RFC1738 space/+ and ISO‑8859‑1 fallbacks) depending on the encoder implementation.Apply:
- var keyPart = encodeValuesOnly ? keyPrefixStr : encoder(keyPrefixStr, null, null); - var valuePart = encoder(obj, null, null); + var keyPart = encodeValuesOnly ? keyPrefixStr : encoder(keyPrefixStr, cs, format); + var valuePart = encoder(obj, cs, format);
429-462
: Generic comma-join pre-normalization: pass charset/format to encoder for elements.
Otherwise element-encoding can diverge from other paths.Apply:
- foreach (var el in enumerable) - strings.Add(el is null ? "" : encoder(el, null, null)); + foreach (var el in enumerable) + strings.Add(el is null ? "" : encoder(el, cs, format));
♻️ Duplicate comments (2)
QsNet/Internal/Encoder.cs (2)
43-66
: Comma-element encoder now defers to the provided encoder — resolves prior divergence.
This addresses the earlier concern about reimplementing escaping; only the comma replacement remains post-encode, which is fine.
129-131
: Dot-encoding condition is inconsistent between prefixes and keys.
Prefixes gate on encodeDotInKeys; keys gate on allowDots && encodeDotInKeys (dotsAndEncode). This can yield partial encoding when AllowDots=false but EncodeDotInKeys=true.If intentional, add a brief comment near dotsAndEncode explaining the legacy contract and add tests for:
- AllowDots=false, EncodeDotInKeys=true (no partial surprises on nested prefixes)
- AllowDots=true, EncodeDotInKeys=true (both prefixes and keys encode '.')
Otherwise, switch prefix gating to dotsAndEncode.Also applies to: 214-229, 354-365, 514-524, 672-680
🧹 Nitpick comments (2)
QsNet/Internal/Encoder.cs (2)
67-72
: Propagate skipNulls into IsLeaf for correctness.
Empty strings should respect skipNulls semantics consistently with Utils.IsNonNullishPrimitive.Apply:
- return Utils.IsNonNullishPrimitive(v) || v is byte[]; + return Utils.IsNonNullishPrimitive(v, skipNulls) || v is byte[];
125-127
: Delegate equality for Comma generator may be brittle; cache once or compare Method/Target.
If GetGenerator() returns new delegate instances, gen == ListFormat.Comma.GetGenerator() can intermittently fail.Consider:
+ private static readonly ListFormatGenerator CommaGen = ListFormat.Comma.GetGenerator(); ... - var isCommaGen = gen == ListFormat.Comma.GetGenerator(); + var isCommaGen = gen == CommaGen || (gen.Method == CommaGen.Method && Equals(gen.Target, CommaGen.Target));Please verify behavior across .NET targets you support.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
QsNet/Internal/Encoder.cs
(11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
QsNet/Internal/Encoder.cs (4)
QsNet/Internal/SideChannelFrame.cs (3)
MethodImpl
(16-27)MethodImpl
(29-36)SideChannelFrame
(11-37)QsNet/Internal/Utils.cs (4)
Utils
(20-1227)Encode
(381-502)IsNonNullishPrimitive
(755-766)List
(695-723)QsNet/Qs.cs (1)
Encode
(126-279)QsNet/Models/Filter.cs (2)
IterableFilter
(36-51)IterableFilter
(42-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Consumer smoke (.NET Framework 4.6.1 & 4.8.1 on Windows)
- GitHub Check: Consumer smoke (net5.0; net6.0; net7.0; net8.0; net9.0)
- GitHub Check: Codacy Security Scan
- GitHub Check: Analyze (csharp)
🔇 Additional comments (4)
QsNet/Internal/Encoder.cs (4)
20-41
: Helper is good; invariant numeric/boolean formatting avoids culture bugs.
137-155
: Cycle detection logic: step-based compare risks false negatives/positives.
The Set uses step, but the upward scan resets step to 0 at root; equality (pos == step) depends on precise Set/scan pairing. Edge cases with shared parents and reused frames could slip through.Please add unit tests that must throw:
- Direct self-cycle: var a = new Dictionary<string, object?>(); a["x"]=a;
- Two-object cycle: a->b->a;
- Sibling re-reference: parent -> { x: child, y: child } (should not throw).
I can draft the tests if helpful.
533-537
: Good: mark side-channel once per parent to cut per-child overhead.
716-723
: Efficient aggregation of mixed return shapes.
Covers List<object?> and IEnumerable paths without allocations—looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
QsNet/Internal/Encoder.cs (1)
195-197
: Pass charset/format to encoder (avoids silent defaults).Calls to the injected encoder pass (null, null). This can ignore caller-provided Charset/Format and lead to inconsistent escaping vs other paths that pass (cs, format).
- var keyPart = encodeValuesOnly ? keyPrefixStr : encoder(keyPrefixStr, null, null); - var valuePart = encoder(obj, null, null); + var keyPart = encodeValuesOnly ? keyPrefixStr : encoder(keyPrefixStr, cs, format); + var valuePart = encoder(obj, cs, format);
♻️ Duplicate comments (2)
QsNet/Internal/Encoder.cs (2)
424-427
: Unify encoder parameter passing in comma path.These calls still pass (null, null). Use (cs, format) like other paths to honor options.
- var keyPartC = encoder(adjustedPrefixC, null, null); - var valuePartC = encoder(joinedC, null, null); + var keyPartC = encoder(adjustedPrefixC, cs, format); + var valuePartC = encoder(joinedC, cs, format);
540-545
: Values-only + comma lists: generic path emits unencoded values.Setting childEncoderForElements to null for (isCommaGen && encodeValuesOnly) causes nested leaves to bypass percent-encoding when fast-path (#2) doesn’t trigger (e.g., Sort set, IterableFilter). This breaks EncodeValuesOnly semantics.
- var childEncoderForElements = - isCommaGen && encodeValuesOnly && obj is IEnumerable and not string - ? null - : encoder; + var childEncoderForElements = encoder;Please add a test: ListFormat=Comma, EncodeValuesOnly=true, with Sort or IterableFilter present, verifying values are still encoded.
🧹 Nitpick comments (2)
QsNet/Internal/Encoder.cs (2)
221-230
: Dot-in-keys prefix handling inconsistent with key handling; consider using dotsAndEncode everywhere.Prefixes gate on encodeDotInKeys only, while keys gate on allowDots && encodeDotInKeys (dotsAndEncode). This can yield partial encoding when AllowDots=false but EncodeDotInKeys=true. If legacy behavior isn’t required, unify to dotsAndEncode for consistency.
- var encodedPrefixFast = encodeDotInKeys && keyPrefixStr.Contains('.', StringComparison.Ordinal) + var encodedPrefixFast = dotsAndEncode && keyPrefixStr.Contains('.', StringComparison.Ordinal)- var encodedPrefixC = encodeDotInKeys && keyPrefixStr.Contains('.', StringComparison.Ordinal) + var encodedPrefixC = dotsAndEncode && keyPrefixStr.Contains('.', StringComparison.Ordinal)- var encodedPrefix = encodeDotInKeys && keyPrefixStr.Contains('.', StringComparison.Ordinal) + var encodedPrefix = dotsAndEncode && keyPrefixStr.Contains('.', StringComparison.Ordinal)Also applies to: 362-366, 522-525
67-73
: IsLeaf may treat empty strings as non-leaves when SkipNulls=true.Utils.IsNonNullishPrimitive returns false for empty strings under skipNulls=true, causing unnecessary side-channel frames/visits. Consider treating string as leaf regardless of emptiness.
- return Utils.IsNonNullishPrimitive(v) || v is byte[]; + return v is string || v is byte[] || Utils.IsNonNullishPrimitive(v, skipNulls);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
QsNet/Internal/Encoder.cs
(11 hunks)benchmarks/QsNet.Benchmarks/EncodeBenchmarks.cs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- benchmarks/QsNet.Benchmarks/EncodeBenchmarks.cs
🧰 Additional context used
🧬 Code graph analysis (1)
QsNet/Internal/Encoder.cs (4)
QsNet/Internal/SideChannelFrame.cs (3)
MethodImpl
(16-27)MethodImpl
(29-36)SideChannelFrame
(11-37)QsNet/Internal/Utils.cs (4)
Utils
(20-1227)Encode
(381-502)IsNonNullishPrimitive
(755-766)List
(695-723)QsNet/Qs.cs (1)
Encode
(126-279)QsNet/Models/Filter.cs (2)
IterableFilter
(36-51)IterableFilter
(42-45)
🔇 Additional comments (2)
QsNet/Internal/Encoder.cs (2)
138-156
: Cycle-detection step logic: please verify invariants.The algorithm compares stored step with a walking step that resets at the root. Ensure Set(objKey, step) invariants align with this lookup, especially when fast-paths skip per-child Set calls.
Would you like a small unit test to assert detection across nested arrays/dictionaries with/without leaves?
352-376
: LGTM: comma fast-path short-circuiting and empty-list handling.Efficient materialization and early returns look correct and reduce allocations on hot paths.
…ss encoding formats and data shapes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
QsNet/Internal/Encoder.cs (1)
430-447
: Generic comma path: avoid re-encoding already-encoded values (values-only mode).You pre-encode elements into strings when encodeValuesOnly && encoder != null, but later still pass encoder down (childEncoderForElements), causing double-encoding.
Apply:
- List<object?> objKeys; + List<object?> objKeys; + var commaElementsAlreadyEncoded = false; @@ - if (encodeValuesOnly && encoder != null) + if (encodeValuesOnly && encoder != null) { foreach (var el in enumerable) strings.Add(el is null ? "" : encoder(el, cs, format)); + commaElementsAlreadyEncoded = true; } else foreach (var el in enumerable) strings.Add(el?.ToString() ?? ""); @@ - // For comma-joined arrays in values-only mode, do not re-encode children. - var childEncoderForElements = encoder; + // For comma-joined arrays in values-only mode, do not re-encode the joined string. + var childEncoderForElements = commaElementsAlreadyEncoded ? null : encoder;Add a test for options { ListFormat: Comma, EncodeValuesOnly: true } with Sort non-null or IterableFilter present to guard against regressions.
Also applies to: 539-542
♻️ Duplicate comments (2)
QsNet/Internal/Encoder.cs (2)
43-66
: AppendCommaEncodedValue now defers to Utils.Encode — good consistency.This keeps ISO‑8859‑1 numeric-entity fallback and RFC1738 parentheses behavior aligned with Utils.Encode.
352-427
: Comma fast-path encodes separator commas and can double-encode values.When encoder != null and EncodeValuesOnly == false, the code encodes the entire joined string, turning commas into %2C. Also, joining already-encoded elements and then re-encoding joinedC double-encodes percent sequences.
Apply:
string joinedC; - if (encodeValuesOnly && encoder != null) + if (encodeValuesOnly && encoder != null) { // Stream-encode each element and append literal commas between them. var sbJoined = new StringBuilder(listC.Count * 8); for (var i = 0; i < listC.Count; i++) { if (i > 0) sbJoined.Append(','); // separator comma is never re-encoded AppendCommaEncodedValue(sbJoined, listC[i], cs, format, encoder); } joinedC = sbJoined.ToString(); // Match legacy semantics: if the joined value is empty, treat it like `null`. if (!string.IsNullOrEmpty(joinedC)) return $"{fmt(adjustedPrefixC)}={fmt(joinedC)}"; if (skipNulls) return Array.Empty<object?>(); if (strictNullHandling) return !encodeValuesOnly ? fmt(encoder(adjustedPrefixC, cs, format)) : adjustedPrefixC; // not strict: fall through to return `key=` below // In values-only mode we do not encode the key via `encoder`. return $"{fmt(adjustedPrefixC)}={fmt(joinedC)}"; } - // Join raw string representations; apply encoder to the full result if provided. - var tmp = new List<string>(listC.Count); - foreach (var el in listC) - tmp.Add(ToInvariantString(el)); - joinedC = string.Join(",", tmp); + // Encode each element (if encoder provided) and keep separator commas literal. + if (encoder != null) + { + var sbJoined = new StringBuilder(listC.Count * 8); + for (var i = 0; i < listC.Count; i++) + { + if (i > 0) sbJoined.Append(','); + AppendCommaEncodedValue(sbJoined, listC[i], cs, format, encoder); + } + joinedC = sbJoined.ToString(); + } + else + { + var tmp = new List<string>(listC.Count); + foreach (var el in listC) + tmp.Add(ToInvariantString(el)); + joinedC = string.Join(",", tmp); + } // Match legacy semantics: if the joined value is empty, treat it like `null`. if (string.IsNullOrEmpty(joinedC)) { if (skipNulls) return Array.Empty<object?>(); if (strictNullHandling) return encoder != null && !encodeValuesOnly ? fmt(encoder(adjustedPrefixC, cs, format)) : adjustedPrefixC; // not strict: fall through to return `key=` below } - if (encoder == null) return $"{fmt(adjustedPrefixC)}={fmt(joinedC)}"; - var keyPartC = encoder(adjustedPrefixC, cs, format); - var valuePartC = encoder(joinedC, cs, format); - return $"{fmt(keyPartC)}={fmt(valuePartC)}"; + if (encoder == null) return $"{fmt(adjustedPrefixC)}={fmt(joinedC)}"; + var keyPartC = encoder(adjustedPrefixC, cs, format); + // Value has already been element-encoded; keep commas literal. + return $"{fmt(keyPartC)}={fmt(joinedC)}";
🧹 Nitpick comments (1)
benchmarks/QsNet.Benchmarks/UtilsEncodeBenchmarks.cs (1)
34-41
: Ensure Latin1 exercises the ISO‑8859‑1 fallback path in Utils.Encode.Utils.Encode checks Equals(encoding, Encoding.GetEncoding("ISO-8859-1")). Depending on framework, Encoding.Latin1 may not be reference-equal. Prefer GetEncoding to be safe.
Apply:
- _enc = EncName == "Latin1" ? Encoding.Latin1 : new UTF8Encoding(false); + _enc = EncName == "Latin1" ? Encoding.GetEncoding("ISO-8859-1") : new UTF8Encoding(false);If you want to keep Encoding.Latin1, please confirm the fallback branch is hit at runtime for the benchmark.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
QsNet/Internal/Encoder.cs
(10 hunks)benchmarks/QsNet.Benchmarks/UtilsEncodeBenchmarks.cs
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
benchmarks/QsNet.Benchmarks/UtilsEncodeBenchmarks.cs (1)
QsNet/Internal/Utils.cs (1)
Utils
(20-1227)
QsNet/Internal/Encoder.cs (4)
QsNet/Internal/SideChannelFrame.cs (3)
MethodImpl
(16-27)MethodImpl
(29-36)SideChannelFrame
(11-37)QsNet/Internal/Utils.cs (4)
Utils
(20-1227)Encode
(381-502)IsNonNullishPrimitive
(755-766)List
(695-723)QsNet/Qs.cs (1)
Encode
(126-279)QsNet/Models/Filter.cs (2)
IterableFilter
(36-51)IterableFilter
(42-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Consumer smoke (.NET Framework 4.6.1 & 4.8.1 on Windows)
- GitHub Check: Consumer smoke (net5.0; net6.0; net7.0; net8.0; net9.0)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: Codacy Security Scan
- GitHub Check: Analyze (csharp)
🔇 Additional comments (4)
QsNet/Internal/Encoder.cs (4)
125-131
: Delegate equality for comma detection looks good.Using “==” (structural delegate equality) is correct and fixes the ReferenceEquals pitfall.
215-350
: Direct-iteration fast path is solid; nice allocation cuts.Skipping objKeys and reusing the side channel per parent is a good win. No issues spotted.
221-230
: Dot-in-keys prefix gating differs from key gating; confirm intended behavior.Prefixes use encodeDotInKeys, while nested keys use allowDots && encodeDotInKeys (dotsAndEncode). This can yield partial encoding when AllowDots=false and EncodeDotInKeys=true.
If you want uniform behavior, flip prefix gating to dotsAndEncode; otherwise, consider a comment or test locking in the legacy behavior.
Also applies to: 357-366, 515-525
138-156
: Cycle detection skip for leaves is a safe micro-optimization.Good win; avoids walking the side-channel for primitives.
…hm with lazy allocation
…accurate percent-encoding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
QsNet/Internal/Utils.cs (1)
400-409
: Culture-sensitive number formatting in Encode()Numbers will be formatted with the current culture (e.g., 3,14 in some locales), which is a correctness bug for query strings. Emit invariant representations.
Apply:
- var str = value switch - { - bool b => b ? "true" : "false", - byte[] bytes => encoding.GetString(bytes), - _ => value?.ToString() - }; + var str = value switch + { + bool b => b ? "true" : "false", + byte[] bytes => encoding.GetString(bytes), + sbyte v => v.ToString(CultureInfo.InvariantCulture), + byte v => v.ToString(CultureInfo.InvariantCulture), + short v => v.ToString(CultureInfo.InvariantCulture), + ushort v => v.ToString(CultureInfo.InvariantCulture), + int v => v.ToString(CultureInfo.InvariantCulture), + uint v => v.ToString(CultureInfo.InvariantCulture), + long v => v.ToString(CultureInfo.InvariantCulture), + ulong v => v.ToString(CultureInfo.InvariantCulture), + float v => v.ToString(CultureInfo.InvariantCulture), + double v => v.ToString(CultureInfo.InvariantCulture), + decimal v => v.ToString(CultureInfo.InvariantCulture), + char ch => ch.ToString(), + _ => value?.ToString() + };
🧹 Nitpick comments (3)
QsNet/Internal/Utils.cs (3)
435-573
: Reduce duplication between RFC1738 and RFC3986 loops (optional)Both branches differ only by the unreserved predicate and the space-to-'+' rule. Consider extracting a small local static helper to DRY the switch body while preserving JIT inlining for hot paths.
Also applies to: 575-703
265-301
: Obsolete Escape/Unescape retained for Latin-1 path: document rationaleKeeping these for Latin-1 is fine; add a short comment linking to the tests/benchmarks that justify this path and its constraints (e.g., numeric entities for >0xFF).
383-705
: Would you like targeted tests for the changed encoding paths?I can add tests covering:
- RFC1738: "a b+c" => "a+b%2Bc"
- RFC3986: "a b+c" => "a%20b%2Bc"
- Latin-1 RFC1738: "ñ + ç" in 28591 with proper '+' and "%2B"
- Unpaired surrogate: "\uD800x" => "%EF%BF%BDx"
- Invariant numbers: 1.23m in cultures with comma decimal.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
QsNet/Internal/Utils.cs
(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
QsNet/Internal/Utils.cs (2)
QsNet/Internal/Encoder.cs (3)
MethodImpl
(20-41)MethodImpl
(45-65)MethodImpl
(67-72)QsNet/Constants/HexTable.cs (1)
HexTable
(6-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Consumer smoke (.NET Framework 4.6.1 & 4.8.1 on Windows)
- GitHub Check: Consumer smoke (net5.0; net6.0; net7.0; net8.0; net9.0)
- GitHub Check: Test 10.0.x on ubuntu-latest
- GitHub Check: Codacy Security Scan
- GitHub Check: Analyze (csharp)
🔇 Additional comments (5)
QsNet/Internal/Utils.cs (5)
369-374
: Helpers for unreserved checks look goodThe inlined predicates are correct for RFC3986 and RFC1738 (paren extension).
Also applies to: 376-382
706-740
: HttpUtility.UrlDecode exception handling: OKGraceful fallback to raw string is reasonable for malformed inputs.
1005-1100
: InterpretNumericEntities(): solid and allocation-awareThe span/Substring split is handled correctly across TFMs; boundary checks look good.
951-968
: IsNonNullishPrimitive(): consistent with Encoder.IsLeaf usageMatches library expectations; no issues.
717-731
: Plus-to-space behavior for Decode is intentional and covered by tests
Tests such as Qs.Decode("foo=c++") expect "c ", confirming '+' is always replaced by space regardless of encoding, and no callers depend on literal '+' preservation.
…ng fast ASCII scan for RFC1738 and RFC3986 formats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
QsNet/Internal/Utils.cs (1)
862-890
: Introduce Format-aware Decode overload
Add a new overload in QsNet/Internal/Utils.cs:public static string? Decode(string? str, Encoding? encoding) => Decode(str, encoding, Format.Rfc1738); public static string? Decode(string? str, Encoding? encoding, Format format) { encoding ??= Encoding.UTF8; var input = format == Format.Rfc1738 ? str?.Replace('+', ' ') : str; if (encoding.CodePage == 28591) try { /* existing ISO-8859-1 branch, using input */ } catch { return input; } try { /* existing UrlDecode branch, using input */ } catch { return input; } }Only replace
'+' → ' '
whenformat == Format.Rfc1738
. ImportQsNet.Enums
if missing.
♻️ Duplicate comments (3)
QsNet/Internal/Utils.cs (3)
422-504
: RFC1738: space must become '+', literal '+' must become '%2B' (Latin‑1 branch)Encode currently emits "%20" for spaces and may leave '+' literal. Implement the canonical form: ' ' → '+', '+' → "%2B". Mirrors Decode’s '+' handling and prevents data loss. (This reiterates prior feedback.)
if (!escapeHeavy) { var lastSafe = 0; for (var idx = 0; idx < len; idx++) { int c = s[idx]; var safeAscii = c <= 0x7F && IsSafeLatin1Ascii1738((char)c); if (safeAscii) continue; // flush preceding safe run if (idx > lastSafe) sb.Append(s, lastSafe, idx - lastSafe); - if (c <= 0xFF) - { - sb.Append(table[c]); // %XX for Latin-1 bytes - } + if (c <= 0xFF) + { + if (c == 0x20) // space + { + sb.Append('+'); + } + else if (c == 0x2B) // '+' + { + sb.Append(table[0x2B]); // "%2B" + } + else + { + sb.Append(table[c]); // %XX for Latin-1 bytes + } + } else { // For non-Latin1 code units, emit percent-encoded numeric entity: %26%23{code}%3B sb.Append("%26%23"); sb.Append(c); sb.Append("%3B"); }else { // Escape-heavy mode: no run bookkeeping if (i > 0) sb.Append(s, 0, i); for (var j = i; j < len; j++) { int c = s[j]; switch (c) { + case 0x20: // space + sb.Append('+'); + continue; case <= 0x7F when IsSafeLatin1Ascii1738((char)c): sb.Append((char)c); continue; case <= 0xFF: - sb.Append(table[c]); + if (c == 0x2B) sb.Append(table[0x2B]); else sb.Append(table[c]); break; default: sb.Append("%26%23"); sb.Append(c); sb.Append("%3B"); break; } } }
590-723
: RFC1738 (UTF‑8): map spaces to '+', keep literal '+' percent‑encodedSpaces currently become "%20". Make ' ' → '+' in both loops; '+' is already percent‑encoded via the table (correct). (Reiterates earlier feedback.)
switch (c) { + case 0x20: + sb.Append('+'); + break; case < 0x80: sb.Append(table[c]); break;switch (c) { + case 0x20: + sb.Append('+'); + continue; case <= 0x7F when IsUnreservedAscii1738((char)c): sb.Append((char)c); continue;
642-659
: Unpaired surrogate must encode as U+FFFD (EF BF BD), not as a 3‑byte sequence of the surrogate code unitCurrent branches percent‑encode the UTF‑8 for the surrogate code unit, yielding ill‑formed UTF‑8. Replace with the UTF‑8 for U+FFFD in all four places. (Reiterates earlier feedback.)
- sb.Append(table[0xE0 | (c >> 12)]); - sb.Append(table[0x80 | ((c >> 6) & 0x3F)]); - sb.Append(table[0x80 | (c & 0x3F)]); + // invalid surrogate => U+FFFD + sb.Append(table[0xEF]); + sb.Append(table[0xBF]); + sb.Append(table[0xBD]);- sb.Append(table[0xE0 | (c >> 12)]); - sb.Append(table[0x80 | ((c >> 6) & 0x3F)]); - sb.Append(table[0x80 | (c & 0x3F)]); + // invalid surrogate => U+FFFD + sb.Append(table[0xEF]); + sb.Append(table[0xBF]); + sb.Append(table[0xBD]);- sb.Append(table[0xE0 | (c >> 12)]); - sb.Append(table[0x80 | ((c >> 6) & 0x3F)]); - sb.Append(table[0x80 | (c & 0x3F)]); + // invalid surrogate => U+FFFD + sb.Append(table[0xEF]); + sb.Append(table[0xBF]); + sb.Append(table[0xBD]);- sb.Append(table[0xE0 | (c >> 12)]); - sb.Append(table[0x80 | ((c >> 6) & 0x3F)]); - sb.Append(table[0x80 | (c & 0x3F)]); + // invalid surrogate => U+FFFD + sb.Append(table[0xEF]); + sb.Append(table[0xBF]); + sb.Append(table[0xBD]);Also applies to: 710-714, 786-790, 840-844
🧹 Nitpick comments (1)
QsNet/Internal/Utils.cs (1)
507-514
: Remove stale comment: '+' is no longer safe in Latin‑1 RFC3986 pathTo avoid confusion after fixing '+' handling, drop the “treat '+' as safe” remark here.
- // Legacy behavior: in Latin-1 mode, treat '+' as safe (do not percent-encode) - // RFC3986 path (no parentheses allowed) + // RFC3986 path (no parentheses allowed)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
QsNet/Internal/Utils.cs
(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
QsNet/Internal/Utils.cs (2)
QsNet/Internal/Encoder.cs (3)
MethodImpl
(20-41)MethodImpl
(45-65)MethodImpl
(67-72)QsNet/Constants/HexTable.cs (1)
HexTable
(6-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Consumer smoke (net5.0; net6.0; net7.0; net8.0; net9.0)
- GitHub Check: Consumer smoke (.NET Framework 4.6.1 & 4.8.1 on Windows)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: Codacy Security Scan
- GitHub Check: Analyze (csharp)
…FFFD and improve clarity in encoding logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
QsNet/Internal/Utils.cs (1)
889-903
: Latin‑1 Decode should interpret numeric entities produced by Encode.
Encode (Latin‑1) emits “%26%23{code}%3B”, which decodes to “&#code;”. Finish by interpreting numeric entities so round‑trips restore code points.- try - { - return MyRegex() - .Replace(strWithoutPlus ?? string.Empty, + try + { + var decoded = MyRegex() + .Replace(strWithoutPlus ?? string.Empty, #pragma warning disable CS0618 - match => Unescape(match.Value) + match => Unescape(match.Value) #pragma warning restore CS0618 - ); + ); + return InterpretNumericEntities(decoded); }
♻️ Duplicate comments (3)
QsNet/Internal/Utils.cs (3)
368-387
: Fix '+' handling in Latin‑1 safe sets (lossy round‑trip with Decode).
Decode() always maps '+' → space; keeping '+' “safe” here makes literal '+' round‑trip as space. Remove '+' from both Latin‑1 safe sets so it percent‑encodes as “%2B”.Apply:
private static bool IsSafeLatin1Ascii1738(char ch) { - // Legacy Latin-1 encode behavior: - // - treat '+' as safe (do not encode) - // - treat '~' as unsafe (percent-encode) - // - RFC1738 adds '(' and ')' - return ch is '+' or '(' or ')' or '-' or '.' or '_' or >= '0' and <= '9' or >= 'A' and <= 'Z' + // Latin-1 encode behavior: + // - '+' is NOT safe (encode as %2B) + // - '~' is unsafe (percent-encode) + // - RFC1738 adds '(' and ')' + return ch is '(' or ')' or '-' or '.' or '_' or >= '0' and <= '9' or >= 'A' and <= 'Z' or >= 'a' and <= 'z'; } private static bool IsSafeLatin1Ascii3986(char ch) { - // Legacy Latin-1 encode behavior: - // - treat '+' as safe (do not encode) - // - treat '~' as unsafe (percent-encode) - return ch is '+' or '-' or '.' or '_' or >= '0' and <= '9' or >= 'A' and <= 'Z' or >= 'a' and <= 'z'; + // Latin-1 encode behavior: + // - '+' is NOT safe (encode as %2B) + // - '~' is unsafe (percent-encode) + return ch is '-' or '.' or '_' or >= '0' and <= '9' or >= 'A' and <= 'Z' or >= 'a' and <= 'z'; }
451-476
: RFC1738 (Latin‑1): output '+' for spaces and encode integer entities culture‑invariant.
Spaces must become '+', and numeric-entity codepoints must use InvariantCulture. This fixes RFC1738 compliance and prevents locale‑dependent digits.- if (c <= 0xFF) - { - sb.Append(table[c]); // %XX for Latin-1 bytes - } - else - { - // For non-Latin1 code units, emit percent-encoded numeric entity: %26%23{code}%3B - sb.Append("%26%23"); - sb.Append(c); - sb.Append("%3B"); - } + if (c == 0x20) + { + // RFC1738: space => '+' + sb.Append('+'); + } + else if (c <= 0xFF) + { + sb.Append(table[c]); // %XX for Latin-1 bytes + } + else + { + // For non-Latin1 code units, emit percent-encoded numeric entity: %26%23{code}%3B + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); + sb.Append("%3B"); + }- case <= 0x7F when IsSafeLatin1Ascii1738((char)c): + case <= 0x7F when IsSafeLatin1Ascii1738((char)c): sb.Append((char)c); continue; + case 0x20: // space => '+' + sb.Append('+'); + break; case <= 0xFF: sb.Append(table[c]); break; default: - sb.Append("%26%23"); - sb.Append(c); + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); sb.Append("%3B"); break;Also applies to: 485-503
616-677
: RFC1738 (UTF‑8): output '+' for spaces in both loops.
Space must be '+'. Literal '+' already becomes “%2B” via the unreserved set.- // fast UTF-8 encode, surrogate-aware - if ((uint)c < 0x80) + // fast UTF-8 encode, surrogate-aware + if (c == 0x20) + { + // RFC1738: space => '+' + sb.Append('+'); + } + else if ((uint)c < 0x80) { sb.Append(table[c]); }- if ((uint)c < 0x80) + if ((uint)c < 0x80) { + if (c == 0x20) { sb.Append('+'); continue; } // RFC1738 if (IsUnreservedAscii1738((char)c)) { sb.Append((char)c); continue; } sb.Append(table[c]); }Also applies to: 685-731
🧹 Nitpick comments (1)
QsNet/Internal/Utils.cs (1)
419-422
: Comment is misleading; no longer a “%u-fast-skip” branch.
This block no longer checks for %u; rephrase to “Latin‑1 (ISO‑8859‑1) path with run‑copy/escape‑heavy strategies.” Also, the WebName check is redundant given CodePage==28591.- // Latin-1 (ISO-8859-1) path with fast skip when no %u-escapes are present - if (encoding.CodePage == 28591 || - string.Equals(encoding.WebName, "iso-8859-1", StringComparison.OrdinalIgnoreCase)) + // Latin-1 (ISO-8859-1) path with run-copy / escape-heavy strategies + if (encoding.CodePage == 28591) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
QsNet/Internal/Utils.cs
(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
QsNet/Internal/Utils.cs (2)
QsNet/Internal/Encoder.cs (3)
MethodImpl
(20-41)MethodImpl
(45-65)MethodImpl
(67-72)QsNet/Constants/HexTable.cs (1)
HexTable
(6-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Consumer smoke (.NET Framework 4.6.1 & 4.8.1 on Windows)
- GitHub Check: Consumer smoke (net5.0; net6.0; net7.0; net8.0; net9.0)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: Codacy Security Scan
- GitHub Check: Analyze (csharp)
🔇 Additional comments (2)
QsNet/Internal/Utils.cs (2)
354-367
: Helpers look good; unreserved ASCII sets are correct.
The RFC3986 and RFC1738 unreserved predicates are accurate and fast-inlineable.
744-875
: UTF‑8 path surrogate handling is correct.
Unpaired surrogates emit U+FFFD (percent‑encoded), fixing ill‑formed UTF‑8. Loops are efficient with run‑copy vs escape‑heavy split.
…computed lookup tables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
QsNet/Internal/Utils.cs (3)
368-386
: Fix '+' handling in Latin‑1: remove from “safe” sets to avoid lossy Decode()Decode() unconditionally maps '+' → space, so keeping '+' “safe” in Latin‑1 makes round‑trips lossy. Encode should emit "%2B" for literal '+' in both 1738 and 3986. Remove '+' from both Latin‑1 safe sets.
private static bool IsSafeLatin1Ascii1738(char ch) { - // Legacy Latin-1 encode behavior: - // - treat '+' as safe (do not encode) - // - treat '~' as unsafe (percent-encode) - // - RFC1738 adds '(' and ')' - return ch is '+' or '(' or ')' or '-' or '.' or '_' or >= '0' and <= '9' or >= 'A' and <= 'Z' + // Legacy Latin-1 encode behavior: + // - '+' is not safe (encode as %2B to preserve round-trips) + // - '~' is unsafe (percent-encode) + // - RFC1738 adds '(' and ')' + return ch is '(' or ')' or '-' or '.' or '_' or >= '0' and <= '9' or >= 'A' and <= 'Z' or >= 'a' and <= 'z'; } private static bool IsSafeLatin1Ascii3986(char ch) { - // Legacy Latin-1 encode behavior: - // - treat '+' as safe (do not encode) - // - treat '~' as unsafe (percent-encode) - return ch is '+' or '-' or '.' or '_' or >= '0' and <= '9' or >= 'A' and <= 'Z' or >= 'a' and <= 'z'; + // Legacy Latin-1 encode behavior: + // - '+' is not safe (encode as %2B to preserve round-trips) + // - '~' is unsafe (percent-encode) + return ch is '-' or '.' or '_' or >= '0' and <= '9' or >= 'A' and <= 'Z' or >= 'a' and <= 'z'; }
489-501
: RFC1738 must emit '+' for spaces (both Latin‑1 and UTF‑8 paths)Per application/x-www-form-urlencoded, spaces => '+' and literal '+' => "%2B". Add explicit handling for 0x20 in both 1738 loops.
@@ - if (c <= 0xFF) + if (c == 0x20) // space => '+' + { + sb.Append('+'); + } + else if (c <= 0xFF) { sb.Append(table[c]); // %XX for Latin-1 bytes } @@ - switch (c) + switch (c) { + case 0x20: // space => '+' + sb.Append('+'); + break; case <= 0x7F when IsSafeLatin1Ascii1738((char)c): sb.Append((char)c); continue; @@ - if ((uint)c < 0x80) + if ((uint)c < 0x80) { - sb.Append(table[c]); + if (c == 0x20) // space => '+' + { + sb.Append('+'); + } + else + { + sb.Append(table[c]); + } } @@ - if ((uint)c < 0x80) + if ((uint)c < 0x80) { + if (c == 0x20) { sb.Append('+'); continue; } if (IsUnreservedAscii1738((char)c)) { sb.Append((char)c); continue; } sb.Append(table[c]); }Also applies to: 516-530, 663-704, 719-729
495-499
: Latin‑1 numeric-entity digits should be culture‑invariantAppend(int) is culture-sensitive. Use InvariantCulture when emitting “&#{code};”.
- sb.Append("%26%23"); - sb.Append(c); - sb.Append("%3B"); + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); + sb.Append("%3B"); @@ - sb.Append("%26%23"); - sb.Append(c); - sb.Append("%3B"); + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); + sb.Append("%3B"); @@ - sb.Append("%26%23"); - sb.Append(c); - sb.Append("%3B"); + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); + sb.Append("%3B"); @@ - sb.Append("%26%23"); - sb.Append(c); - sb.Append("%3B"); + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); + sb.Append("%3B");Also applies to: 524-528, 576-579, 604-607
🧹 Nitpick comments (1)
QsNet/Internal/Utils.cs (1)
923-951
: Optional: Decode() applies '+' → space for all formatsConsider adding a Decode option/overload to disable '+' → space when consuming RFC3986 data, or make it Format‑aware to match Encode.
Do we ever call Decode() on RFC3986 inputs from public APIs? If yes, I can draft a minimal, non‑breaking overload (e.g., Decode(str, encoding, treatPlusAsSpace: true)).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
QsNet/Internal/Utils.cs
(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
QsNet/Internal/Utils.cs (2)
QsNet/Internal/Encoder.cs (3)
MethodImpl
(20-41)MethodImpl
(45-65)MethodImpl
(67-72)QsNet/Constants/HexTable.cs (1)
HexTable
(6-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: Consumer smoke (net5.0; net6.0; net7.0; net8.0; net9.0)
- GitHub Check: Codacy Security Scan
- GitHub Check: Analyze (csharp)
🔇 Additional comments (2)
QsNet/Internal/Utils.cs (2)
673-698
: Good: invalid surrogates encoded as U+FFFDReplacing unpaired surrogates with percent-encoded EF BF BD matches .NET fallback behavior and avoids ill‑formed UTF‑8.
Also applies to: 820-844
388-416
: Good: precomputed ASCII tables and fast scanThe Unreserved*Ascii tables and run‑copy vs escape‑heavy split are clear and should reduce branches and allocations on hot paths.
… RFC1738 and RFC3986
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (4)
QsNet/Internal/Utils.cs (4)
363-373
: Fix '+' safety in Latin‑1: round‑trip loss (Decode maps '+' → space).Treating '+' as safe in Latin‑1 causes lossy round‑trips because Decode() unconditionally converts '+' to space. Encode literal '+' as "%2B" in both 1738 and 3986 Latin‑1 modes by removing '+' from the safe tables and updating the comment. This also aligns Latin‑1 behavior with the UTF‑8 paths here.
- // Legacy Latin-1 safe sets: - // - '+' is safe (NOT encoded) - // - '~' is NOT safe (WILL be encoded) + // Legacy Latin-1 safe sets: + // - '+' is NOT safe (must encode as %2B to avoid '+'→space lossy decode) + // - '~' is NOT safe (WILL be encoded) @@ - private static readonly bool[] Latin1SafeTable3986 = - CreateAsciiTable("+-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); + private static readonly bool[] Latin1SafeTable3986 = + CreateAsciiTable("-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); @@ - private static readonly bool[] Latin1SafeTable1738 = - CreateAsciiTable("()+-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); + private static readonly bool[] Latin1SafeTable1738 = + CreateAsciiTable("()-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
539-549
: Latin‑1 RFC3986: numeric‑entity digits must be culture‑invariant.Append(int) is culture‑sensitive; use InvariantCulture to ensure ASCII digits.
- else - { - sb.Append("%26%23"); - sb.Append(c); - sb.Append("%3B"); - } + else + { + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); + sb.Append("%3B"); + }
564-577
: Latin‑1 RFC3986 (escape‑heavy): numeric‑entity digits should be culture‑invariant.- default: - sb.Append("%26%23"); - sb.Append(c); - sb.Append("%3B"); - break; + default: + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); + sb.Append("%3B"); + break;
480-499
: RFC1738 (Latin‑1): output '+' for spaces in escape‑heavy path and use invariant digits.- switch (c) + switch (c) { + case 0x20: + sb.Append('+'); // RFC1738 space + break; case <= 0x7F when asciiSafe[c]: sb.Append((char)c); continue; case <= 0xFF: sb.Append(table[c]); break; default: sb.Append("%26%23"); - sb.Append(c); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); sb.Append("%3B"); break; }
🧹 Nitpick comments (2)
QsNet/Internal/Utils.cs (2)
887-900
: Decode Latin‑1 detection: mirror the Encode check.Encode accepts both CodePage 28591 and WebName "iso-8859-1"; Decode checks only CodePage. Make detection symmetric.
- if (encoding.CodePage == 28591) // ISO-8859-1 (Latin-1) + if (encoding.CodePage == 28591 || + string.Equals(encoding.WebName, "iso-8859-1", StringComparison.OrdinalIgnoreCase)) // ISO-8859-1 (Latin-1)
413-582
: DRY the numeric‑entity emission and RFC1738 space handling.The same logic appears in four loops. Extract tiny helpers (e.g., AppendNumericEntityPercent(sb, code) and AppendAsciiSpace1738(sb)) to reduce divergence risk and improve maintainability.
Also applies to: 585-873
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
QsNet/Internal/Utils.cs
(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
QsNet/Internal/Utils.cs (1)
QsNet/Constants/HexTable.cs (1)
HexTable
(6-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Consumer smoke (.NET Framework 4.6.1 & 4.8.1 on Windows)
- GitHub Check: Consumer smoke (net5.0; net6.0; net7.0; net8.0; net9.0)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: Codacy Security Scan
- GitHub Check: Analyze (csharp)
…ion for percent-encoded numeric entities
…ring conversion for percent-encoded numeric entities" This reverts commit 4a71214.
… for UTF-8 and Latin-1 modes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
QsNet/Internal/Utils.cs (4)
363-373
: Round-trip bug: '+' marked safe in Latin‑1 while Decode() maps '+' → space; encode must emit '%2B'.Leaving '+' unencoded in Latin‑1 causes lossy decoding. Remove '+' from both Latin‑1 safe sets so literal '+' becomes %2B. This also aligns Latin‑1 with UTF‑8 behavior. Regression from earlier fix.
Apply:
- private static readonly bool[] Latin1SafeTable3986 = - CreateAsciiTable("+-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); + private static readonly bool[] Latin1SafeTable3986 = + CreateAsciiTable("-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); - private static readonly bool[] Latin1SafeTable1738 = - CreateAsciiTable("()+-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); + private static readonly bool[] Latin1SafeTable1738 = + CreateAsciiTable("()-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
434-517
: RFC1738 (Latin‑1): output '+' for spaces; numeric‑entity digits must be culture‑invariant.Spaces should become '+', and numeric-entity digits must use InvariantCulture. Also fixes regression previously flagged.
Run‑copy path:
- if (c <= 0xFF) - { - sb.Append(table[c]); // %XX for Latin-1 bytes - } - else - { - // For non-Latin1 code units, emit percent-encoded numeric entity: %26%23{code}%3B - sb.Append("%26%23"); - sb.Append(c); - sb.Append("%3B"); - } + if (c == 0x20) + { + sb.Append('+'); // RFC1738 space + } + else if (c <= 0xFF) + { + sb.Append(table[c]); // %XX for Latin-1 bytes + } + else + { + // Percent-encoded numeric entity: %26%23{code}%3B + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); + sb.Append("%3B"); + }Escape‑heavy path:
- switch (c) + switch (c) { + case 0x20: + sb.Append('+'); // RFC1738 space + break; case <= 0x7F when asciiSafe[c]: sb.Append((char)c); continue; case <= 0xFF: sb.Append(table[c]); break; default: - sb.Append("%26%23"); - sb.Append(c); - sb.Append("%3B"); + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); + sb.Append("%3B"); break; }
520-597
: Latin‑1 RFC3986: numeric‑entity digits should be InvariantCulture.Avoid culture-dependent digits when emitting “%26%23{code}%3B”.
Run‑copy path:
- sb.Append("%26%23"); - sb.Append(c); - sb.Append("%3B"); + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); + sb.Append("%3B");Escape‑heavy path:
- sb.Append("%26%23"); - sb.Append(c); - sb.Append("%3B"); + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); + sb.Append("%3B");
604-747
: RFC1738 (UTF‑8): emit '+' for spaces in both fast and escape‑heavy loops.Spaces currently percent‑encode to “%20”. RFC1738 form semantics require '+'. Regression from earlier feedback.
Run‑copy path:
- // fast UTF-8 encode, surrogate-aware - if ((uint)c < 0x80) + // fast UTF-8 encode, surrogate-aware + if (c == 0x20) + { + sb.Append('+'); // RFC1738 space + } + else if ((uint)c < 0x80) { sb.Append(table[c]); }Escape‑heavy path (inside ASCII branch):
- if ((uint)c < 0x80) + if ((uint)c < 0x80) { + if (c == 0x20) + { + sb.Append('+'); // RFC1738 space + continue; + } if (asciiUnreserved[c]) { sb.Append((char)c); continue; }
🧹 Nitpick comments (2)
QsNet/Internal/Utils.cs (2)
906-926
: Decode always converts '+' to space; verify round‑trip with RFC3986 inputs.With the above encode fixes, literal '+' should always be "%2B". Please confirm no call sites rely on keeping '+' literal during decode. If needed, consider a format-aware Decode overload to gate '+'→space.
Would you like me to draft unit tests that assert round‑trip for:
- UTF‑8/RFC3986: "a+b c" → "a%2Bb%20c"
- Latin‑1/RFC1738: "a+b c" → "a%2Bb+c"
?
766-769
: Minor: extract shared capacity logic into a small helper to reduce duplication.The “escapeHeavy ? len*3 : len+16” calc is repeated across branches.
Example:
+ static int PreallocCap(int len, bool escapeHeavy) + => escapeHeavy ? (len >= int.MaxValue / 3 ? int.MaxValue : len * 3) : (len + 16); ... - var cap = escapeHeavy ? len >= int.MaxValue / 3 ? int.MaxValue : len * 3 : len + 16; + var cap = PreallocCap(len, escapeHeavy);Also applies to: 624-626, 538-540, 455-457
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
QsNet/Internal/Utils.cs
(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
QsNet/Internal/Utils.cs (3)
QsNet/Internal/Encoder.cs (1)
Encode
(98-730)QsNet/Constants/HexTable.cs (1)
HexTable
(6-41)QsNet/Models/DecodeOptions.cs (1)
Decode
(203-214)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Consumer smoke (.NET Framework 4.6.1 & 4.8.1 on Windows)
- GitHub Check: Consumer smoke (net5.0; net6.0; net7.0; net8.0; net9.0)
- GitHub Check: Codacy Security Scan
- GitHub Check: Analyze (csharp)
🔇 Additional comments (2)
QsNet/Internal/Utils.cs (2)
382-383
: Good fix: unpaired surrogates → percent‑encoded U+FFFD.Using a single precomputed "%EF%BF%BD" avoids ill‑formed UTF‑8. Nice.
407-410
: Non‑scalar early return makes sense for value‑encoder contract.Keeps Utils.Encode focused on scalar values. LGTM.
/// <param name="format">The encoding format to use. Defaults to RFC 3986.</param> | ||
/// <returns>The encoded string.</returns> | ||
/// <remarks> | ||
/// UTF‑8 mode uses precomputed ASCII lookups and a two‑strategy loop (copy runs of safe ASCII or escape‑heavy). | ||
/// Latin‑1 mode preserves legacy behavior: <c>'+'</c> is considered safe; <c>'~'</c> is not. | ||
/// Characters beyond 0xFF are emitted as percent‑encoded numeric entities (e.g., <c>%26%23{code}%3B</c>), | ||
/// which decode back to <c>&#{code};</c>. Use <see cref="InterpretNumericEntities" /> after decoding | ||
/// if you need those entities resolved to Unicode. | ||
/// </remarks> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Update remarks: Latin‑1 should not treat '+' as safe; RFC1738 uses '+' only for spaces.
Docs currently state “'+' is considered safe” for Latin‑1. Please update to reflect: encode literal '+' as "%2B"; only spaces become '+' under RFC1738.
- /// Latin‑1 mode preserves legacy behavior: <c>'+'</c> is considered safe; <c>'~'</c> is not.
+ /// Latin‑1 mode: literal <c>'+'</c> is NOT safe (encoded as <c>%2B</c> to preserve Decode’s '+'→space mapping);
+ /// <c>'~'</c> is not safe; under RFC1738, ASCII space is emitted as <c>'+'</c>.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/// <param name="format">The encoding format to use. Defaults to RFC 3986.</param> | |
/// <returns>The encoded string.</returns> | |
/// <remarks> | |
/// UTF‑8 mode uses precomputed ASCII lookups and a two‑strategy loop (copy runs of safe ASCII or escape‑heavy). | |
/// Latin‑1 mode preserves legacy behavior: <c>'+'</c> is considered safe; <c>'~'</c> is not. | |
/// Characters beyond 0xFF are emitted as percent‑encoded numeric entities (e.g., <c>%26%23{code}%3B</c>), | |
/// which decode back to <c>&#{code};</c>. Use <see cref="InterpretNumericEntities" /> after decoding | |
/// if you need those entities resolved to Unicode. | |
/// </remarks> | |
/// <param name="format">The encoding format to use. Defaults to RFC 3986.</param> | |
/// <returns>The encoded string.</returns> | |
/// <remarks> | |
/// UTF-8 mode uses precomputed ASCII lookups and a two-strategy loop (copy runs of safe ASCII or escape-heavy). | |
/// Latin-1 mode: literal <c>'+'</c> is NOT safe (encoded as <c>%2B</c> to preserve Decode’s '+'→space mapping); | |
/// <c>'~'</c> is not safe; under RFC1738, ASCII space is emitted as <c>'+'</c>. | |
/// Characters beyond 0xFF are emitted as percent-encoded numeric entities (e.g., <c>%26%23{code}%3B</c>), | |
/// which decode back to <c>&#{code};</c>. Use <see cref="InterpretNumericEntities" /> after decoding | |
/// if you need those entities resolved to Unicode. | |
/// </remarks> |
🤖 Prompt for AI Agents
In QsNet/Internal/Utils.cs around lines 392 to 400, update the XML remarks:
change the statement that Latin‑1 mode treats '+' as safe to reflect RFC1738
rules—literal '+' must be percent‑encoded as "%2B" and only spaces are encoded
as '+' under RFC1738; adjust the wording accordingly (and keep the rest of the
paragraph about '~' and numeric entity behavior unchanged).
…r and performance considerations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
QsNet/Internal/Encoder.cs (2)
279-281
: Consider extracting dot-encoding logic to a helper method.The dot-encoding logic is repeated multiple times throughout the code with platform-specific branching. Consider extracting this into a helper method to reduce duplication and improve maintainability.
Add a helper method at the top of the class:
+ [MethodImpl(MethodImplOptions.AggressiveInlining)] + private static string EncodeDotInKey(string key) + { +#if NETSTANDARD2_0 + return key.IndexOf('.') >= 0 ? key.Replace(".", "%2E") : key; +#else + return key.Contains('.', StringComparison.Ordinal) + ? key.Replace(".", "%2E", StringComparison.Ordinal) + : key; +#endif + }Then replace all occurrences with:
-#if NETSTANDARD2_0 - var encodedPrefixFast = encodeDotInKeys && keyPrefixStr.IndexOf('.') >= 0 - ? keyPrefixStr.Replace(".", "%2E") - : keyPrefixStr; -#else - var encodedPrefixFast = encodeDotInKeys && keyPrefixStr.Contains('.', StringComparison.Ordinal) - ? keyPrefixStr.Replace(".", "%2E", StringComparison.Ordinal) - : keyPrefixStr; -#endif + var encodedPrefixFast = encodeDotInKeys ? EncodeDotInKey(keyPrefixStr) : keyPrefixStr;Also applies to: 284-286, 311-312, 314-316, 415-417, 420-422, 582-584, 587-589, 738-739, 741-742
49-70
: Consider using switch expressions for numeric types.The
ToInvariantString
method could be slightly more concise using pattern matching for numeric types.- sbyte v => v.ToString(CultureInfo.InvariantCulture), - byte v => v.ToString(CultureInfo.InvariantCulture), - short v => v.ToString(CultureInfo.InvariantCulture), - ushort v => v.ToString(CultureInfo.InvariantCulture), - int v => v.ToString(CultureInfo.InvariantCulture), - uint v => v.ToString(CultureInfo.InvariantCulture), - long v => v.ToString(CultureInfo.InvariantCulture), - ulong v => v.ToString(CultureInfo.InvariantCulture), - float v => v.ToString(CultureInfo.InvariantCulture), - double v => v.ToString(CultureInfo.InvariantCulture), - decimal v => v.ToString(CultureInfo.InvariantCulture), + IFormattable f when f is sbyte or byte or short or ushort or int or uint + or long or ulong or float or double or decimal + => f.ToString(null, CultureInfo.InvariantCulture),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
QsNet/Internal/Encoder.cs
(10 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
QsNet/Internal/Encoder.cs (5)
QsNet/Enums/Format.cs (1)
Formatter
(38-46)QsNet/Internal/SideChannelFrame.cs (3)
MethodImpl
(16-27)MethodImpl
(29-36)SideChannelFrame
(11-37)QsNet/Internal/Utils.cs (4)
Utils
(20-1622)Encode
(401-889)IsNonNullishPrimitive
(1151-1162)List
(1088-1116)QsNet/Qs.cs (1)
Encode
(126-279)QsNet/Models/Filter.cs (2)
IterableFilter
(36-51)IterableFilter
(42-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Consumer smoke (net5.0; net6.0; net7.0; net8.0; net9.0)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: Codacy Security Scan
- GitHub Check: Analyze (csharp)
🔇 Additional comments (7)
QsNet/Internal/Encoder.cs (7)
182-183
: LGTM! Good optimization for value comparison.Using
==
operator comparison instead ofReferenceEquals
for the generator check is more robust and aligns well with value-based equality semantics.
187-188
: Good encapsulation of dot-encoding logic.The
dotsAndEncode
flag cleanly couples theallowDots
andencodeDotInKeys
conditions, making the intent clear and avoiding repeated compound checks throughout the code.
196-213
: LGTM! Smart optimization for cycle detection.The
IsLeaf
check effectively skips cycle detection for leaf values that cannot participate in cycles. This is a nice performance optimization that avoids unnecessary traversal for primitives and strings.
73-97
: LGTM! Well-documented comma encoding helper.The
AppendCommaEncodedValue
method correctly preserves the semantics of comma separators in list format by ensuring only commas inside values are percent-encoded as "%2C", while separator commas remain literal. The extensive documentation makes the intent crystal clear.
276-407
: Excellent fast path implementation for unsorted iteration.Fast path #1 elegantly avoids building
objKeys
and directly iterates over common container types. The implementation correctly:
- Handles dot-encoding consistently via
dotsAndEncode
- Marks side-channel once per parent (fast path #5)
- Uses
IsLeaf
to optimize child side-channel allocation- Supports all major collection types with optimized paths
This should significantly reduce allocations and improve performance for the common case.
409-487
: Well-optimized comma-joined arrays fast path.Fast path #2 correctly handles comma-joined arrays with proper semantics:
- Materializes the list once for efficient access
- Uses
AppendCommaEncodedValue
to properly encode elements while keeping separator commas literal- Handles both encoded and non-encoded value paths correctly
- Preserves legacy null/empty semantics
The StringBuilder approach for encoded values is a good optimization over repeated string concatenations.
604-607
: Good optimization: hoisting encoder decision out of the loop.Fast path #4 correctly identifies that the child encoder decision doesn't change per iteration and hoists it out of the loop. The
commaElementsAlreadyEncoded
flag properly tracks when elements were already encoded in the comma path to avoid double-encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (5)
QsNet/Internal/Utils.cs (5)
363-373
: Do not treat '+' as safe in Latin‑1; it breaks round‑trip (Decode maps '+' → space).This regresses the previously fixed behavior. Encode literal '+' as %2B in both Latin‑1 tables. Also update the comments accordingly.
- // Legacy Latin-1 safe sets: - // - '+' is safe (NOT encoded) - // - '~' is NOT safe (WILL be encoded) + // Legacy Latin-1 safe sets: + // - '+' is NOT safe (must encode as %2B to avoid '+'→space loss on Decode) + // - '~' is NOT safe (WILL be encoded) @@ - private static readonly bool[] Latin1SafeTable3986 = - CreateAsciiTable("+-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); + private static readonly bool[] Latin1SafeTable3986 = + CreateAsciiTable("-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); @@ - private static readonly bool[] Latin1SafeTable1738 = - CreateAsciiTable("()+-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); + private static readonly bool[] Latin1SafeTable1738 = + CreateAsciiTable("()-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
389-400
: Doc remark is inaccurate: '+' is not safe in Latin‑1; only spaces become '+' under RFC1738.Please align the XML docs with the intended behavior.
- /// Latin‑1 mode preserves legacy behavior: <c>'+'</c> is considered safe; <c>'~'</c> is not. + /// Latin‑1 mode: literal <c>'+'</c> is NOT safe (encoded as <c>%2B</c> so Decode’s '+'→space mapping is lossless); + /// <c>'~'</c> is not; under RFC1738, ASCII space is emitted as <c>'+'</c>.
503-517
: RFC1738 (Latin‑1 escape‑heavy): emit '+' for spaces and use InvariantCulture for numeric entities.Currently spaces become "%20" and numeric-entity digits are culture-sensitive.
switch (c) { + case 0x20: + sb.Append('+'); // RFC1738 space + break; case <= 0x7F when asciiSafe[c]: sb.Append((char)c); continue; case <= 0xFF: sb.Append(table[c]); break; default: sb.Append("%26%23"); - sb.Append(c); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); sb.Append("%3B"); break; }
558-567
: Latin‑1 RFC3986 run‑copy: numeric‑entity digits must be culture‑invariant.- else - { - sb.Append("%26%23"); - sb.Append(c); - sb.Append("%3B"); - } + else + { + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); + sb.Append("%3B"); + }
585-596
: Latin‑1 RFC3986 escape‑heavy: numeric‑entity digits must be culture‑invariant.- default: - sb.Append("%26%23"); - sb.Append(c); - sb.Append("%3B"); - break; + default: + sb.Append("%26%23"); + sb.Append(c.ToString(CultureInfo.InvariantCulture)); + sb.Append("%3B"); + break;
🧹 Nitpick comments (3)
QsNet/Internal/Utils.cs (3)
524-531
: Comment says '+' safe, but we must encode '+' as %2B.Remove/adjust this comment to avoid reintroducing the bug.
- // Legacy behavior: in Latin-1 mode, treat '+' as safe (do not percent-encode) - // RFC3986 path (no parentheses allowed) + // RFC3986 path (no parentheses allowed). In Latin‑1, '+' is NOT safe and must be encoded as %2B.
916-944
: Verify Decode’s unconditional '+'→space doesn’t surprise RFC3986 callers.Given Encode now percent‑encodes literal '+', this should be safe, but please add tests covering:
- "a+b" ↔ "a%2Bb"
- "a b" (RFC1738) ↔ "a+b"
I can add focused unit tests in the benchmarks or main test suite to lock this in. Want me to draft them?
1281-1293
: Optional: avoid producing invalid UTF‑16 when interpreting numeric entities in the surrogate range.Replace unpaired surrogate code points with U+FFFD to prevent ill‑formed strings.
- switch (code) - { - case <= 0xFFFF: - sb.Append((char)code); - break; + switch (code) + { + case <= 0xFFFF: + // Avoid emitting lone surrogate code units + if (code is >= 0xD800 and <= 0xDFFF) + sb.Append('\uFFFD'); + else + sb.Append((char)code); + break;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
QsNet/Internal/Utils.cs
(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
QsNet/Internal/Utils.cs (3)
QsNet/Internal/Encoder.cs (1)
Encode
(154-789)QsNet/Qs.cs (1)
Encode
(126-279)QsNet/Constants/HexTable.cs (1)
HexTable
(6-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Consumer smoke (.NET Framework 4.6.1 & 4.8.1 on Windows)
- GitHub Check: Consumer smoke (net5.0; net6.0; net7.0; net8.0; net9.0)
- GitHub Check: Codacy Security Scan
- GitHub Check: Analyze (csharp)
🔇 Additional comments (3)
QsNet/Internal/Utils.cs (3)
474-485
: RFC1738 (Latin‑1 run‑copy): space → '+' handling looks good.
646-651
: UTF‑8 RFC1738 run‑copy: correct '+' for space handling.
706-721
: UTF‑8 RFC1738 escape‑heavy: correct '+' for space handling and unreserved fast‑path.
This pull request introduces a new benchmark class to the
QsNet.Benchmarks
project for measuring the performance of theQs.Encode
method under various scenarios. The new benchmarks allow testing with different data sizes, value shapes, escaping needs, and encoding options, providing comprehensive coverage for the encoder's behavior.Benchmarking enhancements
EncodeBenchmarks
class tobenchmarks/QsNet.Benchmarks/EncodeBenchmarks.cs
, utilizing BenchmarkDotNet to measure encoding performance with configurable parameters for data size, value length, escaping percentage, list formatting, value encoding, and dot-handling options.Setup
method to create diverse test data, including lists, nested dictionaries with dotted keys, dates, and booleans, to thoroughly exercise the encoder.DotMode
enum and related benchmark parameters to test the effects of dot-handling and dot-encoding options on performance.Encode_Public
benchmark method to measure the performance of the publicQs.Encode
API with the generated data and selected options.Big picture
CommaLists=False
and/orEncodeValuesOnly=True
.Gen1
/Gen2
almost always lower or zero. A fewCommaLists=True
&EncodeValuesOnly=False
cases remain high-alloc.Representative cases (Mean time & alloc per op)
Comma=True
/EncodeValuesOnly=True
/Dots=None
4.876 μs → 2.684 μs (-45%, 1.82× faster), 8.20 KB → 6.09 KB (-25.7%)
Comma=True
/EncodeValuesOnly=True
/AllowDots
4.631 μs → 2.794 μs (-39.7%, 1.66×)
Comma=True
/EncodeValuesOnly=False
/AllowDots
4.745 μs → 3.129 μs (-34.1%, 1.52×)
Comma=True
/EncodeValuesOnly=False
/Dots=None
5.036 μs → 3.432 μs (-31.9%, 1.47×)
Comma=True
/EncodeValuesOnly=True
/Dots=None
9.622 μs → 4.340 μs (-54.9%, 2.22×), 30.88 KB → 17.53 KB (-43.2%)
Comma=True
/EncodeValuesOnly=True
/Dots=None
17.790 μs → 8.871 μs (-50.1%, 2.01×), 92.32 KB → 59.39 KB (-35.7%)
Comma=True
/EncodeValuesOnly=True
/Dots=None
57.491 μs → 19.703 μs (-65.7%, 2.92×), 257.65 KB → 131.79 KB (-48.8%)
Comma=True
/EncodeValuesOnly=True
/Dots=None
144.018 μs → 60.539 μs (-58.0%, 2.38×), 867.55 KB → 508.16 KB (-41.4%)
Comma=True
/EncodeValuesOnly=True
/Dots=None
413.890 μs → 276.463 μs (-33.2%, 1.50×), 1707.79 KB → 1295.30 KB (-24.2%)
Notable regression (time)
Comma=True
/EncodeValuesOnly=False
/Dots=None
230.543 μs → 257.340 μs (+11.6% slower).
(Alloc actually down a hair: 1355.48 KB → 1346.35 KB, -0.7%.)
The sibling variants here are flat or slightly better:
AllowDots
: 234.976 → 230.027 μs (-2.1%)AllowDotsAndEncode
: 240.866 → 240.766 μs (~equal)Takeaways
Comma=True
,EncodeValuesOnly=True
) now run in ~2.7–3.0 μs vs ~4.6–4.9 μs before.Count=100
) generally see ~45–55% speedups; large batches (Count=1000) withEncodeValuesOnly=True
see ~2–3× speedups and ~25–50% fewer allocations.Count=1000
,ValueLen=40
,NeedsEsc=50
,Comma=True
,EncodeValuesOnly=False
,Dots=None
(+11.6% time). Everything adjacent is OK.Summary by CodeRabbit
New Features
Refactor
Chores