Conversation
> Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings or use the review commands to resume.
📝 Walkthrough: Replaces recursive encode/merge implementations with iterative, frame-based state machines and clarifies `max_depth` semantics.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller as Caller
    participant Encoder as Encoder
    participant FrameStack as FrameStack
    participant CycleState as CycleState
    Caller->>Encoder: encode(value, options)
    Encoder->>FrameStack: push root _EncodeFrame
    loop while FrameStack not empty
        FrameStack->>Encoder: provide current frame
        Encoder->>CycleState: bootstrap/check ancestry
        alt primitive fast-path
            Encoder->>FrameStack: emit token -> parent accumulation
        else container
            Encoder->>FrameStack: iterate keys/indices -> push child _EncodeFrame
            FrameStack->>CycleState: push ancestry entry for child
        end
        Encoder->>FrameStack: compose child results into parent frame
        FrameStack->>CycleState: pop ancestry entry when child done
    end
    Encoder->>Caller: return final query string
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Coverage summary from Codacy: see diff coverage on Codacy.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```diff
@@           Coverage Diff           @@
##             main       #42    +/-  ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           17        17
  Lines         1247      1436   +189
=========================================
+ Hits          1247      1436   +189
```

☔ View full report in Codecov by Sentry.
Pull request overview
This PR significantly improves deep encode performance by converting recursive traversal to iterative stack-based implementations, addressing stack overflow issues and dramatically improving performance for deeply nested inputs.
Changes:
- Reworked encoder traversal (`_encode`) to use iterative frames with explicit state machine phases
- Reworked `Utils.merge` to use iterative frames to prevent stack overflow on deep conflicting merges
- Optimized cycle detection from O(n) parent-chain scans to O(1) ancestry state lookups
- Updated `max_depth` semantics: `None` is now unbounded, explicit values are enforced directly
- Added comprehensive deep nesting regression tests at depth 12,000
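The recursive-to-iterative conversion summarized above can be sketched in miniature. This is a hedged illustration only: `flatten` and its bracket syntax are stand-ins for the idea, not qs_codec's actual `_encode`/`_EncodeFrame` API.

```python
import typing as t

def flatten(data: t.Mapping[str, t.Any]) -> t.List[str]:
    """Depth-first key flattening without recursion: an explicit stack of
    (prefix, value) frames replaces the call stack, so traversal depth is
    bounded by heap memory rather than sys.getrecursionlimit()."""
    out: t.List[str] = []
    stack: t.List[t.Tuple[str, t.Any]] = [("", data)]
    while stack:
        prefix, value = stack.pop()
        if isinstance(value, dict):
            # Push children in reverse so output order matches the
            # recursive left-to-right traversal.
            for key in reversed(list(value)):
                child = f"{prefix}[{key}]" if prefix else str(key)
                stack.append((child, value[key]))
        else:
            out.append(f"{prefix}={value}")
    return out

# A nesting depth far beyond the default recursion limit works fine:
deep: t.Dict[str, t.Any] = {}
node = deep
for _ in range(50_000):
    node["a"] = {}
    node = node["a"]
node["leaf"] = "x"
assert flatten(deep)[0].endswith("=x")
```

The same pattern generalizes to the multi-phase frames the PR describes: each frame records where it is in its own iteration, so a child's completion simply resumes the parent frame on the next loop pass.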
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `src/qs_codec/encode.py` | Converted recursive `_encode` to iterative state machine with `_EncodeFrame`, added O(1) cycle detection via `_CycleState`, updated `_get_max_encode_depth` to return `sys.maxsize` when unbounded |
| `src/qs_codec/utils/utils.py` | Converted recursive merge to iterative state machine with `_MergeFrame`, preserves legacy key collision semantics by capturing existing keys before processing source |
| `src/qs_codec/models/encode_options.py` | Updated `max_depth` docstring to reflect new unbounded semantics when `None` |
| `tests/unit/encode_test.py` | Added deep nesting test (12K depth), updated recursion limit test, added cycle state compatibility tests, imported new internal APIs |
| `tests/unit/utils_test.py` | Added deep merge test (12K depth), added tests for key collision semantics preservation |
| `tests/unit/decode_test.py` | Added deep conflicting merge test (12K depth) |
| `README.rst` | Updated `max_depth` documentation to reflect unbounded semantics |
| `docs/README.rst` | Updated `max_depth` documentation to reflect unbounded semantics |
| `CHANGELOG.md` | Added comprehensive entry documenting fixes, optimizations, and behavioral changes |
@codex review
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/encode_test.py (1)
875-889: ⚠️ Potential issue | 🟡 Minor

`monkeypatch.setattr` of `getrecursionlimit` is a dead no-op and misleads readers.

`_get_max_encode_depth` (encode.py lines 168–171) no longer calls `sys.getrecursionlimit()`, so patching it to return `16` has zero effect on the test. The test passes identically without the monkeypatch block, which risks misleading future readers into thinking the function still reads `getrecursionlimit` somewhere. Consider either removing the monkeypatch entirely (with a comment explaining why) or strengthening the test to directly assert the uncapped value:
🧪 Proposed stronger assertion

```diff
-    limit = 16
-    monkeypatch.setattr(encode_module.sys, "getrecursionlimit", lambda: limit)
     data: t.Dict[str, t.Any] = {}
     current = data
     for _ in range(5):
         nxt: t.Dict[str, t.Any] = {}
         current["a"] = nxt
         current = nxt
     current["leaf"] = "x"
+    # _get_max_encode_depth no longer caps to sys.getrecursionlimit();
+    # it returns the explicit value unchanged.
+    assert encode_module._get_max_encode_depth(10_000) == 10_000
     with does_not_raise():
         result = encode(data, options=EncodeOptions(max_depth=10_000, encode=False))
     assert result.endswith("=x")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/encode_test.py` around lines 875 - 889, The test currently monkeypatches sys.getrecursionlimit but _get_max_encode_depth no longer reads that, so remove the misleading monkeypatch.setattr call (the lambda returning 16) and either add a short comment explaining it’s unnecessary or replace the monkeypatch with an explicit assertion that the encoder uses the provided max_depth (call encode(data, options=EncodeOptions(max_depth=10_000, encode=False)) and assert the result endswith("=x") as before) to demonstrate uncapped behavior; refer to the functions/_symbols _get_max_encode_depth, encode, and EncodeOptions to locate and update the test.
🧹 Nitpick comments (3)
src/qs_codec/utils/utils.py (1)
282-283: Ruff TRY003: consider suppressing or extracting inline `RuntimeError` messages.

Both guards are behind `# pragma: no cover` internal invariants that should never fire in production. Ruff TRY003 flags them because the message is inlined rather than in a custom exception class. The simplest fix is a per-line suppression; alternatively, define a private internal exception.

♻️ Minimal suppression approach

```diff
-        if merge_target is None:  # pragma: no cover - internal invariant
-            raise RuntimeError("merge target is not initialized")
+        if merge_target is None:  # pragma: no cover - internal invariant
+            raise RuntimeError("merge target is not initialized")  # noqa: TRY003
```

(Apply the same suppression to line 313.)
Also applies to: 312-313
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/qs_codec/utils/utils.py` around lines 282-283: The inline RuntimeError raise in the guard "if merge_target is None: raise RuntimeError('merge target is not initialized')" is flagged by Ruff TRY003; suppress it by adding a per-line Ruff noqa comment to the raise (e.g. append the appropriate "# noqa: TRY003" suppression to that raise statement) and apply the same suppression to the other identical guard further down (the other "merge_target is None" RuntimeError raise). Ensure you only add the noqa suppression to those raise lines and do not change exception types or surrounding logic.

src/qs_codec/encode.py (2)
568-572: Ruff BLE001: broad `except Exception` is intentional but could be narrowed.

The catch is deliberately wide to handle arbitrary `__getitem__` implementations that raise unexpected exceptions (tested by `test_exception_in_getitem` and `test_encode_handles_mapping_get_exception`). If narrowing is acceptable, using `except (AttributeError, TypeError, KeyError, LookupError, RuntimeError)` would still cover all known test cases while satisfying Ruff; alternatively, add `# noqa: BLE001` with a comment explaining the intent.

♻️ Suppression approach

```diff
-                except Exception:  # pylint: disable=W0718
+                except Exception:  # noqa: BLE001  # broad catch is intentional: caller __getitem__ may raise anything
```
Verify each finding against the current code and only fix it if needed. In `@src/qs_codec/encode.py` around lines 568 - 572, The broad except in the block that assigns _value from frame.obj[_key] (used in the encoding routine where _value_undefined is set) should be changed to avoid BLE001: replace the bare "except Exception:" with a narrowed tuple covering expected failures, e.g. "except (AttributeError, TypeError, KeyError, LookupError, RuntimeError):" to preserve the tested behavior (see tests test_exception_in_getitem and test_encode_handles_mapping_get_exception) and keep the intent clear; if you intentionally want to catch any Exception, instead add "# noqa: BLE001" on that except line with a short comment explaining that arbitrary __getitem__ implementations may raise unexpected exceptions and the broad catch is deliberate.
590-592: O(total_nodes) `WeakKeyDictionary` allocations for backward-compat side-channel chain.

A new `WeakKeyDictionary` is created for every child key processed in the iterate phase. For a tree with N total nodes, this produces N WKD instances — e.g., 12,000 for the depth-12,000 regression test. Each WKD is a Python dict under the hood, so this is non-trivial allocation even though the objects are small.

These WKDs exist purely to maintain the legacy side-channel chain structure for callers that call `_encode` directly and inspect the chain (e.g., the cycle-detection unit tests). The new `_CycleState` path handles all actual cycle detection without them.

A follow-up optimization could skip constructing `value_side_channel` entirely on the iterative path and only build it when `_encode` is invoked directly (detectable via an explicit flag or by checking that `cycle_state` was bootstrapped from a non-empty chain). This would further reduce allocation pressure on deep trees without breaking any public contracts.
Verify each finding against the current code and only fix it if needed. In `@src/qs_codec/encode.py` around lines 590 - 592, The per-child allocation of a new WeakKeyDictionary (value_side_channel) to thread the parent side-channel via _sentinel is causing O(total_nodes) allocations; change this to lazily create or reuse the side-channel mapping only when callers actually invoke _encode directly or when cycle detection needs the legacy chain: in the iterative path (where frame.side_channel is propagated) avoid instantiating value_side_channel at each child and instead defer creation until either an explicit new parameter/flag (e.g., force_legacy_side_channel) is set or when the incoming cycle_state/_CycleState indicates a non-empty legacy chain; update the code that currently does value_side_channel: WeakKeyDictionary = WeakKeyDictionary(); value_side_channel[_sentinel] = frame.side_channel to first check the condition and only then allocate and set _sentinel, leaving existing behavior for callers of _encode and for cycle detection intact.
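The lazy-allocation idea suggested above can be sketched like this. The names `child_side_channel`, `legacy_chain_active`, and the `_Sentinel` key are illustrative assumptions, not qs_codec's real internals.

```python
import typing as t
from weakref import WeakKeyDictionary

class _Sentinel:
    """Weak-referenceable stand-in for the chain-link key (illustrative)."""

_SENTINEL = _Sentinel()

def child_side_channel(
    parent: t.Optional[WeakKeyDictionary],
    legacy_chain_active: bool,
) -> t.Optional[WeakKeyDictionary]:
    """Allocate a per-child chain link only when the legacy chain is in use."""
    if not legacy_chain_active or parent is None:
        return None  # hot path: zero allocations per child
    link: WeakKeyDictionary = WeakKeyDictionary()
    link[_SENTINEL] = parent  # thread the parent into the legacy chain
    return link

root: WeakKeyDictionary = WeakKeyDictionary()
# Deep traversals with the legacy chain disabled allocate nothing per child:
assert child_side_channel(root, legacy_chain_active=False) is None
# Direct callers that need the chain still get the linked structure:
link = child_side_channel(root, legacy_chain_active=True)
assert link is not None and link[_SENTINEL] is root
```

The design point is that the branch condition is evaluated once per child, while the dictionary allocation only happens on the rarely-taken legacy path.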
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/qs_codec/encode.py`:
- Around line 168-171: Update the repository coding guideline and the CR
learning note to reflect the new behavior of max_depth: replace any assertion
that "max_depth is capped to the current recursion limit" with the new text
"`max_depth=None` results in unbounded depth (sys.maxsize); explicit values are
used directly without recursion-limit capping." Ensure references to the
function _get_max_encode_depth and the EncodeOptions.max_depth docstring are
consistent with this wording, and update any files matching src/qs_codec/**/*.py
and other **/*.py where the old guideline text appears.
In `@src/qs_codec/models/encode_options.py`:
- Around line 120-124: The docstring and behavior for max_depth must enforce the
project cap to the Python recursion limit: update the description for max_depth
in encode_options.py to state that None means "use sys.getrecursionlimit()" and
that any explicit value is capped to that recursion limit; implement the cap
where the option is normalized (e.g., EncodeOptions.__post_init__ or the code
path that reads encode_options.max_depth) by replacing the stored value with
min(provided_value, sys.getrecursionlimit()) (or setting it to
sys.getrecursionlimit() when None), and adjust any tests that assume an uncapped
None accordingly.
Codex Review: Didn't find any major issues. You're on a roll.
🧹 Nitpick comments (1)
.github/copilot-instructions.md (1)
16-16: LGTM — accurate retirement of the recursion-limit cap.

The updated invariant correctly reflects the iterative implementation: `max_depth=None` → `sys.maxsize` (unbounded) and explicit values enforced directly, with no recursion-limit margin needed. The "For encoding" qualifier properly scopes this to `EncodeOptions` so agents don't misapply it to decode-side depth semantics.

One optional improvement for symmetry with the cycle-detection note on line 15: you could append the expected error message so agents know to preserve it — e.g., "exceeding an explicit `max_depth` raises `ValueError('Maximum encoding depth exceeded')`." It's already documented in `encode_options.py` and the README, so this is purely a completeness nit.

📝 Optional: add the error-message note inline for agent completeness

```diff
-  Depth, list, and parameter limits are security/safety features: respect `depth`, `max_depth`, `list_limit`, `parameter_limit`, and `strict_depth` / `raise_on_limit_exceeded` exactly as tests assert. For encoding, `max_depth=None` means unbounded traversal (`sys.maxsize`); explicit values are enforced directly without recursion-limit capping.
+  Depth, list, and parameter limits are security/safety features: respect `depth`, `max_depth`, `list_limit`, `parameter_limit`, and `strict_depth` / `raise_on_limit_exceeded` exactly as tests assert. For encoding, `max_depth=None` means unbounded traversal (`sys.maxsize`); explicit values are enforced directly without recursion-limit capping—exceeding an explicit `max_depth` must raise `ValueError("Maximum encoding depth exceeded")`.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/copilot-instructions.md at line 16, Update the EncodeOptions documentation line to also state the exact error raised when an explicit max_depth is exceeded: mention that exceeding an explicit `max_depth` raises ValueError('Maximum encoding depth exceeded'); locate the wording around `EncodeOptions`, `max_depth`, `sys.maxsize` in the `.github/copilot-instructions.md` diff and mirror the message used in `encode_options.py` to keep docs and implementation consistent.
@coderabbitai resume

✅ Actions performed: Reviews resumed.
🧹 Nitpick comments (4)
src/qs_codec/utils/utils.py (1)
164-182: List mutation is intentional per guidelines but worth a brief inline note.

At Line 177, when `current_target` is a `list` (not a tuple), the code extends it in place. This is consistent with the project guideline that `Utils.merge` may reuse targets for performance. A one-line comment like `# Mutates in-place; see coding guidelines` would help future readers who aren't familiar with this contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/qs_codec/utils/utils.py` around lines 164-182: Add a brief inline comment clarifying intentional in-place mutation: in the merge logic where current_target is a list (inside Utils.merge or the surrounding function handling frame.list_target/frame.list_source), annotate the branch that calls mutable_target.extend(...) with a one-line comment such as "# Mutates in-place per utils.merge guidelines" to signal this is by design and not a bug; place the comment immediately above the mutable_target.extend(...) call so future readers understand the reuse-for-performance contract.

src/qs_codec/encode.py (3)
468-502: Inconsistent `is` vs `==` for generator identity checks (cosmetic).

Lines 412 and 603 use `is` to compare `frame.generate_array_prefix` against `ListFormat.COMMA.generator`, while Lines 470 and 510 use `==`. For plain function objects both are equivalent, but mixing styles hurts readability. Consider standardising on `is` throughout, since you're checking object identity of an enum-derived function.
Verify each finding against the current code and only fix it if needed. In `@src/qs_codec/encode.py` around lines 468 - 502, Standardize identity checks by replacing equality comparisons that compare frame.generate_array_prefix to ListFormat.COMMA.generator with identity checks: wherever you see "frame.generate_array_prefix == ListFormat.COMMA.generator" (e.g., in the encode logic that determines comma-mode array handling), change it to "frame.generate_array_prefix is ListFormat.COMMA.generator" so all generator comparisons use "is" consistently for the enum-derived generator function.
391-457: WeakWrapper is allocated for primitives before the fast-path exit.

`WeakWrapper(frame.value)` at Line 395 creates a proxy + wrapper for every node, including scalars and bytes that immediately exit via the fast path at Lines 434-457. For wide, shallow structures this adds unnecessary allocation overhead.

Moving the `WeakWrapper` creation (and the cycle-detection block) below the primitive/bytes fast path — or guarding it with an `isinstance` container check — would avoid the extra work for leaf values.

Suggested reordering sketch

```diff
         # Work with the original; we never mutate in place.
         obj: t.Any = frame.value

-        # --- Cycle detection via ancestry lookup state --------------------------------
-        obj_wrapper: WeakWrapper = WeakWrapper(frame.value)
-        if frame.cycle_state is None or frame.cycle_level is None:
-            frame.cycle_state, frame.cycle_level = _bootstrap_cycle_state_from_side_channel(frame.side_channel)
-        step = _compute_step_and_check_cycle(frame.cycle_state, obj_wrapper, frame.cycle_level)
-
-        frame.obj_wrapper = obj_wrapper
-        frame.step = step
-
         # --- Pre-processing: filter & datetime handling -------------------------------
         filter_opt = frame.filter_
         ...
+        # --- Fast path for primitives/bytes -------------------------------------------
+        if Utils.is_non_nullish_primitive(obj, frame.skip_nulls) or isinstance(obj, bytes):
+            ...  # (unchanged, pop + continue)
+
+        # --- Cycle detection (containers only) ----------------------------------------
+        obj_wrapper: WeakWrapper = WeakWrapper(obj)
+        if frame.cycle_state is None or frame.cycle_level is None:
+            frame.cycle_state, frame.cycle_level = _bootstrap_cycle_state_from_side_channel(frame.side_channel)
+        step = _compute_step_and_check_cycle(frame.cycle_state, obj_wrapper, frame.cycle_level)
+
+        frame.obj_wrapper = obj_wrapper
+        frame.step = step
```
Verify each finding against the current code and only fix it if needed. In `@src/qs_codec/encode.py` around lines 391 - 457, The WeakWrapper is being created for every node (WeakWrapper(frame.value)) before we hit the primitive/bytes fast-path in encode.py, causing unnecessary allocations for leaf values; move the WeakWrapper creation and the cycle-detection calls (_bootstrap_cycle_state_from_side_channel and _compute_step_and_check_cycle that set frame.cycle_state/frame.cycle_level/frame.obj_wrapper/frame.step) to after the fast-path check (i.e., after the Utils.is_non_nullish_primitive(...) or isinstance(obj, bytes) branch) or wrap that block in a guard that only runs for container types (list/tuple/dict/set/custom iterables) so primitives/bytes exit immediately without allocating WeakWrapper or running cycle logic. Ensure you update any later uses that expect frame.obj_wrapper/step to be set (compute them lazily when you first need them).
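For context, the ancestry-lookup style of cycle detection that this code path relies on can be sketched generically. This is a simplified assumption-laden illustration: `count_leaves` and its id-based set stand in for qs_codec's `_CycleState`, which tracks wrapped values rather than raw ids.

```python
import typing as t

def count_leaves(root: t.Any) -> int:
    """Iterative DFS that detects cycles with an O(1) set lookup instead of
    rescanning a parent chain on every visit."""
    on_path: t.Set[int] = set()  # ids of containers on the current DFS path
    leaves = 0
    stack: t.List[t.Tuple[str, t.Any]] = [("enter", root)]
    while stack:
        phase, node = stack.pop()
        if phase == "exit":
            on_path.discard(id(node))  # leaving this ancestor
            continue
        if isinstance(node, (dict, list)):
            if id(node) in on_path:  # O(1) ancestry check
                raise ValueError("Circular reference detected")
            on_path.add(id(node))
            stack.append(("exit", node))  # popped after all children finish
            children = node.values() if isinstance(node, dict) else node
            for child in children:
                stack.append(("enter", child))
        else:
            leaves += 1
    return leaves

assert count_leaves({"a": [1, 2], "b": 3}) == 3
cyclic: t.Dict[str, t.Any] = {}
cyclic["self"] = cyclic
try:
    count_leaves(cyclic)
except ValueError:
    pass  # cycle detected as expected
```

Because the "exit" marker removes a container from `on_path` once its subtree completes, a shared (non-cyclic) node reached via two siblings does not false-positive; only true ancestry repeats raise.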
174-209: Consider splitting constant options from per-frame mutable state.

The `_EncodeFrame` dataclass carries ~15 option fields that are identical across every frame in a traversal (e.g., `encoder`, `serialize_date`, `sort`, `filter_`, `formatter`, `charset`, `strict_null_handling`, …). For depth-12,000 nesting, that's 12,000 copies of the same references.

Extracting the immutable option fields into a single shared context object (passed by reference to each frame) would cut per-frame allocation roughly in half and improve cache locality. Not urgent given the reported perf wins, but worth keeping in mind if memory becomes a concern.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/qs_codec/encode.py` around lines 174 - 209, The _EncodeFrame dataclass currently stores many immutable option fields per frame; extract those constant fields into a new shared dataclass (e.g., EncodeOptions) and replace the duplicated fields in _EncodeFrame with a single reference like options: EncodeOptions; move fields such as encoder, serialize_date, sort, filter_, formatter, format, generate_array_prefix, charset, comma_round_trip, comma_compact_nulls, allow_empty_lists, strict_null_handling, skip_nulls, encode_dot_in_keys, allow_dots, encode_values_only, add_query_prefix, and any other config-like attributes into EncodeOptions; update all constructors/places that instantiate _EncodeFrame to supply the shared options instance and adjust usages in functions that read those fields to access frame.options.<field> instead of frame.<field>, ensuring behavior remains identical.
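The suggested split can be sketched as follows. The `_SharedOptions` and `_Frame` names and fields are hypothetical stand-ins; qs_codec's actual frame carries different attributes.

```python
import typing as t
from dataclasses import dataclass

@dataclass(frozen=True)
class _SharedOptions:
    """Traversal-wide configuration: created once, shared by reference."""
    delimiter: str
    encode_values: bool

@dataclass
class _Frame:
    """Only genuinely per-node state lives on the frame."""
    value: t.Any
    prefix: str
    options: _SharedOptions  # one reference instead of ~15 copied fields

opts = _SharedOptions(delimiter="&", encode_values=True)
frames = [_Frame(value=i, prefix=f"k{i}", options=opts) for i in range(12_000)]
# Every frame aliases the same options object; no per-frame duplication:
assert all(f.options is opts for f in frames)
```

Freezing the shared object also documents the invariant that options never change mid-traversal, which makes the aliasing safe.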
@coderabbitai resume

@codex review
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/qs_codec/encode.py`:
- Around line 168-171: The comment/guideline about capping max_depth to the
recursion limit is now stale given the iterative implementation; update any
documentation and inline comments referring to that behavior and ensure they
reflect that _get_max_encode_depth(max_depth) returns sys.maxsize when max_depth
is None (i.e., unbounded) and otherwise returns the explicit value; search for
references to "capped to the current recursion limit" or similar and replace
them with wording that max_depth=None means unbounded (sys.maxsize) and that no
recursion-limit-based cap is applied.
✅ Actions performed: Reviews resumed.
Codex Review: Didn't find any major issues. Nice work!
🧹 Nitpick comments (2)
src/qs_codec/utils/utils.py (1)
259-269: Simplify unreachable branch in `result_list` construction.

At line 260, `isinstance(current_target, (list, tuple))` is always `False` here: a list/tuple `current_target` would have been handled by the `isinstance` check at line 226 and already `continue`d. The ternary's first branch is dead code.

♻️ Proposed simplification

```diff
-            first_iterable = current_target if isinstance(current_target, (list, tuple)) else [current_target]
-            for element in first_iterable:
+            for element in [current_target]:
```
Verify each finding against the current code and only fix it if needed. In `@src/qs_codec/utils/utils.py` around lines 259-269: The branch that checks isinstance(current_target, (list, tuple)) is unreachable here because list/tuple targets were handled earlier; simplify the result_list construction in utils.py by removing the dead ternary and treating current_target as a single element (e.g., use a single-element iterable for current_target), then append non-Undefined elements from that and from current_source, pop the stack, and set last_result; update references to result_list, current_target, current_source, Undefined, stack.pop, and last_result accordingly to keep behavior identical but without the unreachable branch.

src/qs_codec/encode.py (1)
263-265: Add `# noqa: TRY003` to suppress Ruff warnings for consistency.

Lines 265 and 382 both trigger Ruff's `TRY003` (long message in `raise` outside an exception class). The equivalent locations in `utils.py` already carry `# noqa: TRY003` suppression comments; add them here too for consistency.

♻️ Proposed suppressions

- raise ValueError("Circular reference detected")
+ raise ValueError("Circular reference detected")  # noqa: TRY003

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/qs_codec/encode.py` around lines 263 - 265, Add a Ruff suppression comment "# noqa: TRY003" to the raise statements that use long literal messages in src/qs_codec/encode.py; specifically append the comment to the ValueError raised at the circular-reference check (the line with raise ValueError("Circular reference detected")) and to the other similar raise with a long message later in the file so they match the existing suppressions in utils.py.
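For context, the check behind this raise, ancestry-based cycle detection during iterative traversal, can be sketched stand-alone (illustrative only; qs_codec's actual frame and ancestry bookkeeping differs):

```python
def assert_acyclic(value):
    """Iteratively verify a nested structure has no self-references,
    tracking the ids of containers on the current traversal path."""
    # Stack entries: ("enter", obj) starts visiting a node, ("exit", obj) ends it.
    stack = [("enter", value)]
    ancestry = set()
    while stack:
        action, current = stack.pop()
        if action == "exit":
            ancestry.discard(id(current))
            continue
        if not isinstance(current, (dict, list)):
            continue
        if id(current) in ancestry:
            raise ValueError("Circular reference detected")
        ancestry.add(id(current))
        stack.append(("exit", current))
        children = current.values() if isinstance(current, dict) else current
        for child in children:
            stack.append(("enter", child))
```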
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/qs_codec/encode.py`:
- Around line 380-382: The duplicate-comment warning for the raise on the depth
check should be suppressed inline: locate the block that sets frame.max_depth
with _get_max_encode_depth and the line raising ValueError("Maximum encoding
depth exceeded") and append the project's preferred inline linter suppression
token (e.g., the configured "# noqa" / pragma or specific duplicate-comment
suppression) to that raise line so the duplicate_comment warning is ignored
without changing behavior.
---
Nitpick comments:
In `@src/qs_codec/encode.py`:
- Around line 263-265: Add a Ruff suppression comment "# noqa: TRY003" to the
raise statements that use long literal messages in src/qs_codec/encode.py;
specifically append the comment to the ValueError raised at the
circular-reference check (the line with raise ValueError("Circular reference
detected")) and to the other similar raise with a long message later in the file
so they match the existing suppressions in utils.py.
In `@src/qs_codec/utils/utils.py`:
- Around line 259-269: The branch that checks isinstance(current_target, (list,
tuple)) is unreachable here because list/tuple targets were handled earlier;
simplify the result_list construction in utils.py by removing the dead ternary
and treating current_target as a single element (e.g., use a single-element
iterable for current_target), then append non-Undefined elements from that and
from current_source, pop the stack, and set last_result; update references to
result_list, current_target, current_source, Undefined, stack.pop, and
last_result accordingly to keep behavior identical but without the unreachable
branch.
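The equivalence behind the proposed simplification can be checked in isolation with a toy stand-in (the `Undefined` class and `build_result_list` helper here are illustrative, not the real qs_codec code):

```python
class Undefined:
    """Toy stand-in for qs_codec's Undefined sentinel."""


def build_result_list(current_target, current_source):
    # current_target is known to be a scalar at this point (list/tuple
    # targets were handled earlier), so wrapping it in a one-element list
    # is equivalent to the original ternary with its dead first branch.
    result_list = [e for e in [current_target] if not isinstance(e, Undefined)]
    result_list.extend(e for e in current_source if not isinstance(e, Undefined))
    return result_list
```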
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/qs_codec/encode.py`:
- Around line 381-382: Extract the inline exception message into a module-level
constant (e.g., MAX_ENCODING_DEPTH_EXCEEDED = "Maximum encoding depth exceeded")
and replace the literal in the raise ValueError(...) expression so the check
that compares frame.depth > frame.max_depth now raises
ValueError(MAX_ENCODING_DEPTH_EXCEEDED); add the constant near other
module-level constants and keep the existing raise site (the line referencing
frame.depth and frame.max_depth) unchanged except for using the constant.
In `@src/qs_codec/utils/utils.py`:
- Around line 296-305: The checks against frame.merge_existing_keys can use a
stale merge_target value when a prior sibling merge updated pending_updates;
change the two branches that push a child _MergeFrame to look up the current
target from frame.pending_updates if present (e.g., target =
frame.pending_updates.get(key, merge_target[key])) before creating the
_MergeFrame (and do the same for normalized_key), so the child merges into the
updated pending_updates entry rather than the original merge_target entry.
- Around line 330-349: The branch inside list_iter that currently does "if
has_target and has_source: stack.append(_MergeFrame(...))" must short-circuit
when either frame.list_target[idx] or frame.list_source[idx] is an Undefined
sentinel: before creating the child _MergeFrame, read target_value =
frame.list_target[idx] and source_value = frame.list_source[idx]; if
target_value is an Undefined, append source_value to frame.list_merged (unless
source_value is Undefined), and if source_value is an Undefined, append
target_value to frame.list_merged (unless target_value is Undefined); only when
neither side is Undefined should you push the _MergeFrame (preserving existing
options and using list_wait_child/list_iter phases).
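The short-circuit rules described in the last item can be sketched with a stand-in sentinel (simplified; the real `Undefined` and `_MergeFrame` live in qs_codec, and a real merge would push a child frame for each collected pair rather than returning them):

```python
class Undefined:
    """Stand-in for qs_codec's Undefined sentinel."""


def merge_step(target, source):
    """One pass of the list-merge phase: skip Undefined holes on either
    side, and collect value pairs that would need a child merge frame."""
    merged, child_frames = [], []
    for t, s in zip(target, source):
        if isinstance(t, Undefined):
            if not isinstance(s, Undefined):
                merged.append(s)          # only the source side exists
        elif isinstance(s, Undefined):
            merged.append(t)              # only the target side exists
        else:
            child_frames.append((t, s))   # both present: push a child frame
    return merged, child_frames
```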
Refactors encode and Utils.merge to iterative traversal for deep-input safety and removes recursion-limit capping semantics for EncodeOptions.max_depth (None is unbounded, explicit limits enforced directly). Also includes encode hot-path optimizations and merge compatibility adjustments discovered during review.
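The recursion-to-iteration conversion follows the usual explicit-stack pattern; a minimal sketch of the idea (the frame layout is illustrative, not the actual `_EncodeFrame`):

```python
def iter_flatten(value, prefix=""):
    """Flatten nested dicts into bracketed key/value pairs without
    recursion, using an explicit stack of (key, value) frames."""
    pairs = []
    stack = [(prefix, value)]
    while stack:
        key, current = stack.pop()
        if isinstance(current, dict):
            # Push children in reverse so they pop in insertion order.
            for child_key in reversed(list(current)):
                new_key = f"{key}[{child_key}]" if key else str(child_key)
                stack.append((new_key, current[child_key]))
        else:
            pairs.append((key, current))
    return pairs
```

Because the stack lives on the heap rather than the call stack, nesting depth is bounded by memory instead of the interpreter's recursion limit.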
Adds deep stack-safety regressions (depth 12_000), direct merge/decoder deep-path checks, and internal cycle-state compatibility tests for _encode/Utils.merge behavior.
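A deep-nesting regression along these lines can be generated and checked iteratively (helper names here are illustrative, not the test suite's actual fixtures):

```python
def make_deep(depth):
    """Build a dict nested `depth` levels deep: {"a": {"a": ... 0 ...}}."""
    value = 0
    for _ in range(depth):
        value = {"a": value}
    return value


def measure_depth(value):
    """Walk the structure iteratively and return its nesting depth,
    staying well clear of the interpreter recursion limit."""
    depth = 0
    while isinstance(value, dict):
        value = value["a"]
        depth += 1
    return depth
```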
Updates README/docs and internal guidance files to reflect max_depth semantics and adds 1.4.1-wip changelog entries for iterative hardening/performance work.
Force-pushed: de7ea3b to 9695cb4
Summary
This PR hardens `qs_codec` against deep-input stack overflows and significantly improves deep encode performance, while preserving existing behavior.

Motivation

Deep conflicting decode merges and deep encode traversal could previously hit recursion limits. We aligned Python behavior with the iterative hardening approach used in other ports and added deep regression tests.
What Changed
- Converted encode traversal to an iterative, frame-based loop in `src/qs_codec/encode.py`.
- Converted `Utils.merge` traversal to iterative frames in `src/qs_codec/utils/utils.py`.
- Clarified `EncodeOptions.max_depth` semantics: `max_depth=None` means unbounded by this option; an explicit `max_depth` is enforced directly. `ValueError("Maximum encoding depth exceeded")` remains unchanged.
- Normalized key types (`'1'` vs `1`) in iterative map merge.
- Updated `README.rst`, `docs/README.rst`, and the `src/qs_codec/models/encode_options.py` docstring.
- Deep regression depth: `12_000`.

Behavioral Notes
- The `_encode(...)` signature remains unchanged.
- Behavior for existing `_encode` callers is preserved.
Local deep-encode benchmark (
EncodeOptions(encode=False)) improved substantially:(Exact values depend on machine/environment.)
Summary by CodeRabbit

- New Features
- Bug Fixes
- Documentation
- Tests
- Chores