
⚡ improve deep encode performance#42

Merged
techouse merged 3 commits into main from feat/iterative
Feb 22, 2026

Conversation


@techouse techouse commented Feb 21, 2026

Summary

This PR hardens qs_codec against deep-input stack overflows and significantly improves deep encode performance, while preserving existing behavior.

Motivation

Deep conflicting decode merges and deep encode traversal could previously hit Python's recursion limit. We aligned the Python behavior with the iterative hardening approach used in the other ports and added deep regression tests.

What Changed

  • Reworked encoder traversal to iterative frames in src/qs_codec/encode.py.
  • Reworked Utils.merge traversal to iterative frames in src/qs_codec/utils/utils.py.
  • Optimized encode cycle detection from parent-chain scans to O(1) ancestry state lookups.
  • Updated EncodeOptions.max_depth semantics:
    • max_depth=None means unbounded by this option.
    • Explicit max_depth is enforced directly.
    • ValueError("Maximum encoding depth exceeded") remains unchanged.
  • Restored legacy collision semantics for mixed keys (e.g. '1' vs 1) in the iterative map merge.
  • Updated docs:
    • README.rst
    • docs/README.rst
    • src/qs_codec/models/encode_options.py docstring
  • Added/updated deep regression tests at depth 12_000.
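The iterative-frame idea behind these changes can be illustrated with a minimal, self-contained sketch. The `flatten_iterative` helper below is hypothetical; the real `_EncodeFrame` machinery in `src/qs_codec/encode.py` carries far more per-frame state (prefixes, filters, side channels) than this toy version:

```python
import typing as t


def flatten_iterative(value: t.Any, max_depth: int = 2**31) -> t.List[str]:
    """Flatten nested dicts into qs-style key[path]=value tokens without recursion."""
    tokens: t.List[str] = []
    # Each frame is (prefix, value, depth); an explicit stack replaces the call stack.
    stack: t.List[t.Tuple[str, t.Any, int]] = [("", value, 0)]
    while stack:
        prefix, obj, depth = stack.pop()
        if depth > max_depth:
            raise ValueError("Maximum encoding depth exceeded")
        if isinstance(obj, dict):
            # Push children in reverse so they are emitted in insertion order.
            for key in reversed(list(obj)):
                child_prefix = f"{prefix}[{key}]" if prefix else str(key)
                stack.append((child_prefix, obj[key], depth + 1))
        else:
            tokens.append(f"{prefix}={obj}")
    return tokens
```

Because the stack is a plain list, input depth is bounded by memory rather than by `sys.getrecursionlimit()`, which is what makes the depth-12,000 regression tests feasible.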

Behavioral Notes

  • Public API remains unchanged.
  • Private _encode(...) signature remains unchanged.
  • Internal side-channel/cycle-detection compatibility for direct _encode callers is preserved.

Performance

Local deep-encode benchmark (EncodeOptions(encode=False)) improved substantially:

  • Depth 2000: ~1.46s -> ~0.022s
  • Depth 5000: ~8.86s -> ~0.054s
  • Depth 12000: ~50.35s -> ~0.146s

(Exact values depend on machine/environment.)
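A rough way to reproduce this kind of measurement (an illustrative harness only; `deep_dict` and `bench` are hypothetical names, and the exact encode call is whatever your installed qs_codec version exposes):

```python
import time
import typing as t


def deep_dict(depth: int) -> t.Any:
    """Build {'a': {'a': ... {'a': 'x'} ...}} nested `depth` levels deep."""
    leaf: t.Any = "x"
    for _ in range(depth):
        leaf = {"a": leaf}
    return leaf


def bench(encode_fn: t.Callable[[t.Any], t.Any], depth: int, repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) of `repeats` runs."""
    data = deep_dict(depth)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode_fn(data)
        best = min(best, time.perf_counter() - start)
    return best
```

With qs_codec installed, something along the lines of `bench(lambda d: encode(d, options=EncodeOptions(encode=False)), depth=12_000)` would approximate the table above; absolute numbers will vary by machine.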

Summary by CodeRabbit

  • New Features

    • Iterative, stack-safe processing for reliably handling extremely deep structures.
  • Bug Fixes

    • Preserved existing target slots on key collisions during merges.
    • Safer merge behavior and improved cycle-detection for complex nested objects.
  • Documentation

    • Simplified max_depth semantics: unset = unbounded; set = enforced directly.
  • Tests

    • Added tests for deep nesting, merge edge cases, and cycle scenarios.
  • Chores

    • Updated changelog with iterative fixes and optimizations.

@techouse techouse self-assigned this Feb 21, 2026
@techouse techouse added the enhancement New feature or request label Feb 21, 2026
@techouse techouse requested a review from Copilot February 21, 2026 17:05

coderabbitai bot commented Feb 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Replaces recursive encode/merge implementations with iterative, frame-based state machines; clarifies max_depth semantics (unset = unbounded, set = enforced directly); adds deep-nesting and cycle-related tests. No public API signature changes.

Changes

  • Changelog & Docs (CHANGELOG.md, README.rst, docs/README.rst, .github/copilot-instructions.md, AGENTS.md): Appended unreleased changelog entries and updated documentation to state that max_depth=None is unbounded (implemented as sys.maxsize) and that explicit max_depth values are enforced directly (no recursion-limit capping).
  • Encoder refactor (src/qs_codec/encode.py): Replaced recursive _encode with an iterative, frame-driven _EncodeFrame state machine; added _CycleState and cycle helpers, phase-based traversal, per-frame depth tracking, and bootstrap logic to map legacy side-channel data. Encoder public signature unchanged.
  • Merge refactor (src/qs_codec/utils/utils.py): Reimplemented merging as an iterative, multi-phase _MergeFrame workflow for maps/lists to avoid Python recursion; preserves merge semantics and pending-update handling.
  • Options docs (src/qs_codec/models/encode_options.py): Clarified max_depth documentation to reflect the new semantics (unset → unbounded; set → enforced directly).
  • Tests (tests/unit/decode_test.py, tests/unit/encode_test.py, tests/unit/utils_test.py): Added deep-nesting (12,000 depth) and stack-safety tests, cycle-state/bootstrap tests, and merge-collision/ordering tests; some tests import new internal helpers (_CycleState, _pop_current_node).
  • Misc docs (.github/..., AGENTS.md): Minor guidance and testing-note updates to align docs with the new depth semantics and iterative implementations.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller
    participant Encoder as Encoder
    participant FrameStack as FrameStack
    participant CycleState as CycleState

    Caller->>Encoder: encode(value, options)
    Encoder->>FrameStack: push root _EncodeFrame
    loop while FrameStack not empty
        FrameStack->>Encoder: provide current frame
        Encoder->>CycleState: bootstrap/check ancestry
        alt primitive fast-path
            Encoder->>FrameStack: emit token -> parent accumulation
        else container
            Encoder->>FrameStack: iterate keys/indices -> push child _EncodeFrame
            FrameStack->>CycleState: push ancestry entry for child
        end
        Encoder->>FrameStack: compose child results into parent frame
        FrameStack->>CycleState: pop ancestry entry when child done
    end
    Encoder->>Caller: return final query string
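The ancestry bookkeeping in the diagram can be sketched as follows. This `CycleState` is illustrative only; the real `_CycleState` tracks `WeakWrapper` identities with per-level state, and the actual error message may differ:

```python
import typing as t


class CycleState:
    """Set of object ids on the current traversal path.

    Membership is one hash lookup (O(1)) instead of an O(depth)
    walk up a parent chain.
    """

    def __init__(self) -> None:
        self._ancestors: t.Set[int] = set()

    def enter(self, obj: t.Any) -> None:
        # Called when a container frame is pushed onto the stack.
        if id(obj) in self._ancestors:
            raise ValueError("cycle detected")  # illustrative message
        self._ancestors.add(id(obj))

    def leave(self, obj: t.Any) -> None:
        # Called when the container's frame is popped.
        self._ancestors.discard(id(obj))
```

Pushing a child frame calls `enter`, popping calls `leave`; a self-referential container trips the membership check on its second `enter` without ever scanning the full ancestor chain.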

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

test

Poem

🐰 I hopped through frames both deep and wide,
Iteration sprung where recursion died.
Ancestors counted, tokens stitched in line,
Deep nests tamed, no stack to whine. 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 26.67%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Description check: ❓ Inconclusive. The description provides comprehensive context (summary, motivation, detailed changes, behavioral notes, performance metrics) but does not follow the provided template structure with Type of change, How Has This Been Tested, and Checklist sections. Resolution: complete the template by adding 'Type of change' checkbox selections, a 'How Has This Been Tested' section with test details, and marking the checklist items for code quality, tests, and documentation updates.
✅ Passed checks (1 passed)
  • Title check: ✅ Passed. The title accurately conveys the main objective (deep encode performance) in the clear, concise, emoji-prefixed format common to the project.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




codacy-production bot commented Feb 21, 2026

Coverage summary from Codacy

See diff coverage on Codacy

  • Coverage variation: +0.00% (target: -1.00%)
  • Diff coverage: 100.00%

Coverage variation details (coverable lines / covered lines / coverage):

  • Common ancestor commit (560c47f): 1247 / 1247 / 100.00%
  • Head commit (57ac26e): 1430 (+183) / 1430 (+183) / 100.00% (+0.00%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details (coverable lines / covered lines / diff coverage):

  • Pull request (#42): 401 / 401 / 100.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%



codecov bot commented Feb 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (560c47f) to head (9695cb4).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##              main       #42    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           17        17            
  Lines         1247      1436   +189     
==========================================
+ Hits          1247      1436   +189     



Copilot AI left a comment


Pull request overview

This PR significantly improves deep encode performance by converting recursive traversal to iterative stack-based implementations, addressing stack overflow issues and dramatically improving performance for deeply nested inputs.

Changes:

  • Reworked encoder traversal (_encode) to use iterative frames with explicit state machine phases
  • Reworked Utils.merge to use iterative frames to prevent stack overflow on deep conflicting merges
  • Optimized cycle detection from O(n) parent-chain scans to O(1) ancestry state lookups
  • Updated max_depth semantics: None is now unbounded, explicit values enforced directly
  • Added comprehensive deep nesting regression tests at depth 12,000

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Summary per file:

  • src/qs_codec/encode.py: Converted recursive _encode to an iterative state machine with _EncodeFrame, added O(1) cycle detection via _CycleState, and updated _get_max_encode_depth to return sys.maxsize when unbounded.
  • src/qs_codec/utils/utils.py: Converted recursive merge to an iterative state machine with _MergeFrame; preserves legacy key-collision semantics by capturing existing keys before processing the source.
  • src/qs_codec/models/encode_options.py: Updated the max_depth docstring to reflect the new unbounded semantics when None.
  • tests/unit/encode_test.py: Added a deep nesting test (12K depth), updated the recursion-limit test, added cycle-state compatibility tests, and imported new internal APIs.
  • tests/unit/utils_test.py: Added a deep merge test (12K depth) and tests for key-collision semantics preservation.
  • tests/unit/decode_test.py: Added a deep conflicting-merge test (12K depth).
  • README.rst: Updated max_depth documentation to reflect unbounded semantics.
  • docs/README.rst: Updated max_depth documentation to reflect unbounded semantics.
  • CHANGELOG.md: Added a comprehensive entry documenting fixes, optimizations, and behavioral changes.
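One plausible reading of the preserved key-collision semantics, as a standalone sketch (the `merge_maps` helper is hypothetical; the real Utils.merge additionally handles lists, undefined sentinels, and nested merges):

```python
import typing as t


def merge_maps(target: t.Dict[t.Any, t.Any], source: t.Dict[t.Any, t.Any]) -> t.Dict[t.Any, t.Any]:
    """Merge `source` into a copy of `target`, keeping existing slots on collisions.

    Keys are compared by string form, so '1' and 1 collide; the slot already
    present in `target` keeps its key object, and only its value is replaced.
    """
    result = dict(target)
    # Capture existing keys by string form before touching the source.
    by_str = {str(k): k for k in result}
    for key, value in source.items():
        existing = by_str.get(str(key))
        if existing is not None:
            result[existing] = value  # collision: reuse the target's key slot
        else:
            result[key] = value
            by_str[str(key)] = key
    return result
```

Capturing the target's keys up front, before iterating the source, is what keeps later source insertions from shadowing the original slots.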

@techouse
Owner Author

@codex review


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/encode_test.py (1)

875-889: ⚠️ Potential issue | 🟡 Minor

monkeypatch.setattr of getrecursionlimit is a dead no-op and misleads readers

_get_max_encode_depth (encode.py lines 168–171) no longer calls sys.getrecursionlimit(), so patching it to return 16 has zero effect on the test. The test passes identically without the monkeypatch block, which risks misleading future readers into thinking the function still reads getrecursionlimit somewhere.

Consider either removing the monkeypatch entirely (with a comment explaining why) or strengthening the test to directly assert the uncapped value:

🧪 Proposed stronger assertion
-        limit = 16
-        monkeypatch.setattr(encode_module.sys, "getrecursionlimit", lambda: limit)
-
         data: t.Dict[str, t.Any] = {}
         current = data
         for _ in range(5):
             nxt: t.Dict[str, t.Any] = {}
             current["a"] = nxt
             current = nxt
         current["leaf"] = "x"

+        # _get_max_encode_depth no longer caps to sys.getrecursionlimit();
+        # it returns the explicit value unchanged.
+        assert encode_module._get_max_encode_depth(10_000) == 10_000

         with does_not_raise():
             result = encode(data, options=EncodeOptions(max_depth=10_000, encode=False))

         assert result.endswith("=x")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/encode_test.py` around lines 875 - 889, The test currently
monkeypatches sys.getrecursionlimit but _get_max_encode_depth no longer reads
that, so remove the misleading monkeypatch.setattr call (the lambda returning
16) and either add a short comment explaining it’s unnecessary or replace the
monkeypatch with an explicit assertion that the encoder uses the provided
max_depth (call encode(data, options=EncodeOptions(max_depth=10_000,
encode=False)) and assert the result endswith("=x") as before) to demonstrate
uncapped behavior; refer to the functions/_symbols _get_max_encode_depth,
encode, and EncodeOptions to locate and update the test.
🧹 Nitpick comments (3)
src/qs_codec/utils/utils.py (1)

282-283: Ruff TRY003: consider suppressing or extracting inline RuntimeError messages

Both guards are behind # pragma: no cover internal invariants that should never fire in production. Ruff TRY003 flags them because the message is inlined rather than in a custom exception class. The simplest fix is a per-line suppression; alternatively, define a private internal exception.

♻️ Minimal suppression approach
-                if merge_target is None:  # pragma: no cover - internal invariant
-                    raise RuntimeError("merge target is not initialized")
+                if merge_target is None:  # pragma: no cover - internal invariant
+                    raise RuntimeError("merge target is not initialized")  # noqa: TRY003

(Apply the same suppression to line 313.)

Also applies to: 312-313

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/qs_codec/utils/utils.py` around lines 282 - 283, The inline RuntimeError
raise in the guard "if merge_target is None: raise RuntimeError('merge target is
not initialized')" is flagged by Ruff TRY003; suppress it by adding a per-line
Ruff noqa comment to the raise (e.g. append the appropriate "# noqa: TRY003"
suppression to that raise statement) and apply the same suppression to the other
identical guard further down (the other "merge_target is None" RuntimeError
raise). Ensure you only add the noqa suppression to those raise lines and do not
change exception types or surrounding logic.
src/qs_codec/encode.py (2)

568-572: Ruff BLE001: broad except Exception is intentional but could be narrowed

The catch is deliberately wide to handle arbitrary __getitem__ implementations that raise unexpected exceptions (tested by test_exception_in_getitem and test_encode_handles_mapping_get_exception). If narrowing is acceptable, using except (AttributeError, TypeError, KeyError, LookupError, RuntimeError) would still cover all known test cases while satisfying Ruff; alternatively, add # noqa: BLE001 with a comment explaining the intent.

♻️ Suppression approach
-                except Exception:  # pylint: disable=W0718
+                except Exception:  # noqa: BLE001  # broad catch is intentional: caller __getitem__ may raise anything
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/qs_codec/encode.py` around lines 568 - 572, The broad except in the block
that assigns _value from frame.obj[_key] (used in the encoding routine where
_value_undefined is set) should be changed to avoid BLE001: replace the bare
"except Exception:" with a narrowed tuple covering expected failures, e.g.
"except (AttributeError, TypeError, KeyError, LookupError, RuntimeError):" to
preserve the tested behavior (see tests test_exception_in_getitem and
test_encode_handles_mapping_get_exception) and keep the intent clear; if you
intentionally want to catch any Exception, instead add "# noqa: BLE001" on that
except line with a short comment explaining that arbitrary __getitem__
implementations may raise unexpected exceptions and the broad catch is
deliberate.

590-592: O(total_nodes) WeakKeyDictionary allocations for backward-compat side-channel chain

A new WeakKeyDictionary is created for every child key processed in the iterate phase. For a tree with N total nodes, this produces N WKD instances — e.g., 12,000 for the depth-12,000 regression test. Each WKD is a Python dict under the hood, so this is non-trivial allocation even though the objects are small.

These WKDs exist purely to maintain the legacy side-channel chain structure for callers that call _encode directly and inspect the chain (e.g., the cycle-detection unit tests). The new _CycleState path handles all actual cycle detection without them.

A follow-up optimization could skip constructing value_side_channel entirely on the iterative path and only build it when _encode is invoked directly (detectable via an explicit flag or by checking that cycle_state was bootstrapped from a non-empty chain). This would further reduce allocation pressure on deep trees without breaking any public contracts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/qs_codec/encode.py` around lines 590 - 592, The per-child allocation of a
new WeakKeyDictionary (value_side_channel) to thread the parent side-channel via
_sentinel is causing O(total_nodes) allocations; change this to lazily create or
reuse the side-channel mapping only when callers actually invoke _encode
directly or when cycle detection needs the legacy chain: in the iterative path
(where frame.side_channel is propagated) avoid instantiating value_side_channel
at each child and instead defer creation until either an explicit new
parameter/flag (e.g., force_legacy_side_channel) is set or when the incoming
cycle_state/_CycleState indicates a non-empty legacy chain; update the code that
currently does value_side_channel: WeakKeyDictionary = WeakKeyDictionary();
value_side_channel[_sentinel] = frame.side_channel to first check the condition
and only then allocate and set _sentinel, leaving existing behavior for callers
of _encode and for cycle detection intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/qs_codec/encode.py`:
- Around line 168-171: Update the repository coding guideline and the CR
learning note to reflect the new behavior of max_depth: replace any assertion
that "max_depth is capped to the current recursion limit" with the new text
"`max_depth=None` results in unbounded depth (sys.maxsize); explicit values are
used directly without recursion-limit capping." Ensure references to the
function _get_max_encode_depth and the EncodeOptions.max_depth docstring are
consistent with this wording, and update any files matching src/qs_codec/**/*.py
and other **/*.py where the old guideline text appears.

In `@src/qs_codec/models/encode_options.py`:
- Around line 120-124: The docstring and behavior for max_depth must enforce the
project cap to the Python recursion limit: update the description for max_depth
in encode_options.py to state that None means "use sys.getrecursionlimit()" and
that any explicit value is capped to that recursion limit; implement the cap
where the option is normalized (e.g., EncodeOptions.__post_init__ or the code
path that reads encode_options.max_depth) by replacing the stored value with
min(provided_value, sys.getrecursionlimit()) (or setting it to
sys.getrecursionlimit() when None), and adjust any tests that assume an uncapped
None accordingly.


@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
.github/copilot-instructions.md (1)

16-16: LGTM — accurate retirement of the recursion-limit cap.

The updated invariant correctly reflects the iterative implementation: max_depth=None → sys.maxsize (unbounded) and explicit values enforced directly, with no recursion-limit margin needed. The "For encoding" qualifier properly scopes this to EncodeOptions so agents don't misapply it to decode-side depth semantics.

One optional improvement for symmetry with the cycle-detection note on line 15: you could append the expected error message so agents know to preserve it — e.g., "exceeding an explicit max_depth raises ValueError('Maximum encoding depth exceeded')." It's already documented in encode_options.py and the README, so this is purely a completeness nit.

📝 Optional: add the error-message note inline for agent completeness
- Depth, list, and parameter limits are security/safety features: respect `depth`, `max_depth`, `list_limit`, `parameter_limit`, and `strict_depth` / `raise_on_limit_exceeded` exactly as tests assert. For encoding, `max_depth=None` means unbounded traversal (`sys.maxsize`); explicit values are enforced directly without recursion-limit capping.
+ Depth, list, and parameter limits are security/safety features: respect `depth`, `max_depth`, `list_limit`, `parameter_limit`, and `strict_depth` / `raise_on_limit_exceeded` exactly as tests assert. For encoding, `max_depth=None` means unbounded traversal (`sys.maxsize`); explicit values are enforced directly without recursion-limit capping—exceeding an explicit `max_depth` must raise `ValueError("Maximum encoding depth exceeded")`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/copilot-instructions.md at line 16, Update the EncodeOptions
documentation line to also state the exact error raised when an explicit
max_depth is exceeded: mention that exceeding an explicit `max_depth` raises
ValueError('Maximum encoding depth exceeded'); locate the wording around
`EncodeOptions`, `max_depth`, `sys.maxsize` in the
`.github/copilot-instructions.md` diff and mirror the message used in
`encode_options.py` to keep docs and implementation consistent.

@techouse
Owner Author

@coderabbitai resume


coderabbitai bot commented Feb 21, 2026

✅ Actions performed

Reviews resumed.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (4)
src/qs_codec/utils/utils.py (1)

164-182: List mutation is intentional per guidelines but worth a brief inline note.

At Line 177, when current_target is a list (not a tuple), the code extends it in place. This is consistent with the project guideline that Utils.merge may reuse targets for performance. A one-line comment like # Mutates in-place; see coding guidelines would help future readers who aren't familiar with this contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/qs_codec/utils/utils.py` around lines 164 - 182, Add a brief inline
comment clarifying intentional in-place mutation: in the merge logic where
current_target is a list (inside Utils.merge or the surrounding function
handling frame.list_target/frame.list_source), annotate the branch that calls
mutable_target.extend(...) with a one-line comment such as "# Mutates in-place
per utils.merge guidelines" to signal this is by design and not a bug; place the
comment immediately above the mutable_target.extend(...) call so future readers
understand the reuse-for-performance contract.
src/qs_codec/encode.py (3)

468-502: Inconsistent is vs == for generator identity checks (cosmetic).

Lines 412 and 603 use is to compare frame.generate_array_prefix against ListFormat.COMMA.generator, while Lines 470 and 510 use ==. For plain function objects both are equivalent, but mixing styles hurts readability. Consider standardising on is throughout, since you're checking object identity of an enum-derived function.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/qs_codec/encode.py` around lines 468 - 502, Standardize identity checks
by replacing equality comparisons that compare frame.generate_array_prefix to
ListFormat.COMMA.generator with identity checks: wherever you see
"frame.generate_array_prefix == ListFormat.COMMA.generator" (e.g., in the encode
logic that determines comma-mode array handling), change it to
"frame.generate_array_prefix is ListFormat.COMMA.generator" so all generator
comparisons use "is" consistently for the enum-derived generator function.

391-457: WeakWrapper is allocated for primitives before the fast-path exit.

WeakWrapper(frame.value) at Line 395 creates a proxy + wrapper for every node, including scalars and bytes that immediately exit via the fast path at Lines 434-457. For wide, shallow structures this adds unnecessary allocation overhead.

Moving the WeakWrapper creation (and the cycle-detection block) below the primitive/bytes fast path—or guarding it with an isinstance container check—would avoid the extra work for leaf values.

Suggested reordering sketch
             # Work with the original; we never mutate in place.
             obj: t.Any = frame.value
 
-            # --- Cycle detection via ancestry lookup state --------------------------------
-            obj_wrapper: WeakWrapper = WeakWrapper(frame.value)
-            if frame.cycle_state is None or frame.cycle_level is None:
-                frame.cycle_state, frame.cycle_level = _bootstrap_cycle_state_from_side_channel(frame.side_channel)
-            step = _compute_step_and_check_cycle(frame.cycle_state, obj_wrapper, frame.cycle_level)
-
-            frame.obj_wrapper = obj_wrapper
-            frame.step = step
-
             # --- Pre-processing: filter & datetime handling -------------------------------
             filter_opt = frame.filter_
             ...
 
+            # --- Fast path for primitives/bytes -----------------------------------------
+            if Utils.is_non_nullish_primitive(obj, frame.skip_nulls) or isinstance(obj, bytes):
+                ...  # (unchanged, pop + continue)
+
+            # --- Cycle detection (containers only) ------------------------------------
+            obj_wrapper: WeakWrapper = WeakWrapper(obj)
+            if frame.cycle_state is None or frame.cycle_level is None:
+                frame.cycle_state, frame.cycle_level = _bootstrap_cycle_state_from_side_channel(frame.side_channel)
+            step = _compute_step_and_check_cycle(frame.cycle_state, obj_wrapper, frame.cycle_level)
+
+            frame.obj_wrapper = obj_wrapper
+            frame.step = step
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/qs_codec/encode.py` around lines 391 - 457, The WeakWrapper is being
created for every node (WeakWrapper(frame.value)) before we hit the
primitive/bytes fast-path in encode.py, causing unnecessary allocations for leaf
values; move the WeakWrapper creation and the cycle-detection calls
(_bootstrap_cycle_state_from_side_channel and _compute_step_and_check_cycle that
set frame.cycle_state/frame.cycle_level/frame.obj_wrapper/frame.step) to after
the fast-path check (i.e., after the Utils.is_non_nullish_primitive(...) or
isinstance(obj, bytes) branch) or wrap that block in a guard that only runs for
container types (list/tuple/dict/set/custom iterables) so primitives/bytes exit
immediately without allocating WeakWrapper or running cycle logic. Ensure you
update any later uses that expect frame.obj_wrapper/step to be set (compute them
lazily when you first need them).
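
The suggested ordering generalizes to any iterative traversal: let leaf values exit before paying for per-node bookkeeping. A minimal, self-contained sketch of the pattern — not qs_codec's actual code; it uses a plain visited-id set, which is stricter than the library's ancestry-based cycle check:

```python
def walk(root):
    """Iteratively count (leaves, containers) in nested lists/dicts/tuples.

    Sketch only: scalars take the fast path before any cycle bookkeeping,
    and cycle detection uses a simple visited-id set (stricter than
    qs_codec's ancestry-based check).
    """
    leaves = containers = 0
    visited: set = set()
    stack = [root]
    while stack:
        obj = stack.pop()
        # Fast path: leaves exit before any wrapper/cycle-state allocation.
        if not isinstance(obj, (dict, list, tuple)):
            leaves += 1
            continue
        # Container path: only now pay for cycle detection.
        if id(obj) in visited:
            raise ValueError("Circular reference detected")
        visited.add(id(obj))
        containers += 1
        stack.extend(obj.values() if isinstance(obj, dict) else obj)
    return leaves, containers
```

In qs_codec itself the equivalent move is deferring the `WeakWrapper` allocation and `_compute_step_and_check_cycle` call until after the primitive/bytes branch, as the diff above shows.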

174-209: Consider splitting constant options from per-frame mutable state.

The _EncodeFrame dataclass carries ~15 option fields that are identical across every frame in a traversal (e.g., encoder, serialize_date, sort, filter_, formatter, charset, strict_null_handling, …). For depth-12,000 nesting, that's 12,000 copies of the same references.

Extracting the immutable option fields into a single shared context object (passed by reference to each frame) would cut per-frame allocation roughly in half and improve cache locality. Not urgent given the reported perf wins, but worth keeping in mind if memory becomes a concern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/qs_codec/encode.py` around lines 174 - 209, The _EncodeFrame dataclass
currently stores many immutable option fields per frame; extract those constant
fields into a new shared dataclass (e.g., EncodeOptions) and replace the
duplicated fields in _EncodeFrame with a single reference like options:
EncodeOptions; move fields such as encoder, serialize_date, sort, filter_,
formatter, format, generate_array_prefix, charset, comma_round_trip,
comma_compact_nulls, allow_empty_lists, strict_null_handling, skip_nulls,
encode_dot_in_keys, allow_dots, encode_values_only, add_query_prefix, and any
other config-like attributes into EncodeOptions; update all constructors/places
that instantiate _EncodeFrame to supply the shared options instance and adjust
usages in functions that read those fields to access frame.options.<field>
instead of frame.<field>, ensuring behavior remains identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/qs_codec/encode.py`:
- Around line 468-502: Standardize identity checks by replacing equality
comparisons that compare frame.generate_array_prefix to
ListFormat.COMMA.generator with identity checks: wherever you see
"frame.generate_array_prefix == ListFormat.COMMA.generator" (e.g., in the encode
logic that determines comma-mode array handling), change it to
"frame.generate_array_prefix is ListFormat.COMMA.generator" so all generator
comparisons use "is" consistently for the enum-derived generator function.

In `@src/qs_codec/utils/utils.py`:
- Around line 164-182: Add a brief inline comment clarifying intentional
in-place mutation: in the merge logic where current_target is a list (inside
Utils.merge or the surrounding function handling
frame.list_target/frame.list_source), annotate the branch that calls
mutable_target.extend(...) with a one-line comment such as "# Mutates in-place
per utils.merge guidelines" to signal this is by design and not a bug; place the
comment immediately above the mutable_target.extend(...) call so future readers
understand the reuse-for-performance contract.

@techouse
Owner Author

@coderabbitai resume

@techouse techouse requested a review from Copilot February 21, 2026 19:01
@techouse
Owner Author

@codex review

@codacy-production

codacy-production bot commented Feb 21, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation: +0.00% (target: -1.00%)
Diff coverage: 100.00%

Coverage variation details
  Commit                            Coverable lines  Covered lines  Coverage
  Common ancestor commit (560c47f)  1247             1247           100.00%
  Head commit (9695cb4)             1436 (+189)      1436 (+189)    100.00% (+0.00%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
  Commit              Coverable lines  Covered lines  Diff coverage
  Pull request (#42)  407              407            100.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%



Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.


@coderabbitai coderabbitai bot left a comment


🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/qs_codec/encode.py`:
- Around line 168-171: The comment/guideline about capping max_depth to the
recursion limit is now stale given the iterative implementation; update any
documentation and inline comments referring to that behavior and ensure they
reflect that _get_max_encode_depth(max_depth) returns sys.maxsize when max_depth
is None (i.e., unbounded) and otherwise returns the explicit value; search for
references to "capped to the current recursion limit" or similar and replace
them with wording that max_depth=None means unbounded (sys.maxsize) and that no
recursion-limit-based cap is applied.

@coderabbitai

coderabbitai bot commented Feb 21, 2026

✅ Actions performed

Reviews resumed.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Nice work!



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
src/qs_codec/utils/utils.py (1)

259-269: Simplify unreachable branch in result_list construction.

At line 260, isinstance(current_target, (list, tuple)) is always False here: a list/tuple current_target would have been handled by the isinstance check at line 226 and already continued. The ternary's first branch is dead code.

♻️ Proposed simplification
- first_iterable = current_target if isinstance(current_target, (list, tuple)) else [current_target]
- for element in first_iterable:
+ for element in [current_target]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/qs_codec/utils/utils.py` around lines 259 - 269, The branch that checks
isinstance(current_target, (list, tuple)) is unreachable here because list/tuple
targets were handled earlier; simplify the result_list construction in utils.py
by removing the dead ternary and treating current_target as a single element
(e.g., use a single-element iterable for current_target), then append
non-Undefined elements from that and from current_source, pop the stack, and set
last_result; update references to result_list, current_target, current_source,
Undefined, stack.pop, and last_result accordingly to keep behavior identical but
without the unreachable branch.
src/qs_codec/encode.py (1)

263-265: Add # noqa: TRY003 to suppress Ruff warnings for consistency.

Lines 265 and 382 both trigger Ruff's TRY003 (long message in raise outside an exception class). The equivalent locations in utils.py already carry # noqa: TRY003 suppression comments—add them here too for consistency.

♻️ Proposed suppressions
- raise ValueError("Circular reference detected")
+ raise ValueError("Circular reference detected")  # noqa: TRY003
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/qs_codec/encode.py` around lines 263 - 265, Add a Ruff suppression
comment "# noqa: TRY003" to the raise statements that use long literal messages
in src/qs_codec/encode.py; specifically append the comment to the ValueError
raised at the circular-reference check (the line with raise ValueError("Circular
reference detected")) and to the other similar raise with a long message later
in the file so they match the existing suppressions in utils.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/qs_codec/encode.py`:
- Around line 380-382: The duplicate-comment warning for the raise on the depth
check should be suppressed inline: locate the block that sets frame.max_depth
with _get_max_encode_depth and the line raising ValueError("Maximum encoding
depth exceeded") and append the project's preferred inline linter suppression
token (e.g., the configured "# noqa" / pragma or specific duplicate-comment
suppression) to that raise line so the duplicate_comment warning is ignored
without changing behavior.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/qs_codec/encode.py`:
- Around line 381-382: Extract the inline exception message into a module-level
constant (e.g., MAX_ENCODING_DEPTH_EXCEEDED = "Maximum encoding depth exceeded")
and replace the literal in the raise ValueError(...) expression so the check
that compares frame.depth > frame.max_depth now raises
ValueError(MAX_ENCODING_DEPTH_EXCEEDED); add the constant near other
module-level constants and keep the existing raise site (the line referencing
frame.depth and frame.max_depth) unchanged except for using the constant.

In `@src/qs_codec/utils/utils.py`:
- Around line 296-305: The checks against frame.merge_existing_keys can use a
stale merge_target value when a prior sibling merge updated pending_updates;
change the two branches that push a child _MergeFrame to look up the current
target from frame.pending_updates if present (e.g., target =
frame.pending_updates.get(key, merge_target[key])) before creating the
_MergeFrame (and do the same for normalized_key), so the child merges into the
updated pending_updates entry rather than the original merge_target entry.
- Around line 330-349: The branch inside list_iter that currently does "if
has_target and has_source: stack.append(_MergeFrame(...))" must short-circuit
when either frame.list_target[idx] or frame.list_source[idx] is an Undefined
sentinel: before creating the child _MergeFrame, read target_value =
frame.list_target[idx] and source_value = frame.list_source[idx]; if
target_value is an Undefined, append source_value to frame.list_merged (unless
source_value is Undefined), and if source_value is an Undefined, append
target_value to frame.list_merged (unless target_value is Undefined); only when
neither side is Undefined should you push the _MergeFrame (preserving existing
options and using list_wait_child/list_iter phases).
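
The Undefined short-circuit described above can be sketched in isolation. This is a standalone toy: `Undefined` stands in for qs_codec's absent-slot sentinel, and "source wins" below is a simplification of the real behavior, where both-present slots get a child merge frame instead:

```python
class Undefined:
    """Stand-in for qs_codec's absent-slot sentinel."""


def merge_lists(target, source):
    # Element-wise merge that short-circuits on Undefined: if one side is
    # absent, take the other side directly instead of pushing a child
    # merge frame for it.
    merged = []
    for i in range(max(len(target), len(source))):
        t = target[i] if i < len(target) else Undefined()
        s = source[i] if i < len(source) else Undefined()
        if isinstance(t, Undefined):
            if not isinstance(s, Undefined):
                merged.append(s)
        elif isinstance(s, Undefined):
            merged.append(t)
        else:
            merged.append(s)  # both present: "source wins" simplification
    return merged
```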

Refactors encode and Utils.merge to iterative traversal for deep-input safety and removes recursion-limit capping semantics for EncodeOptions.max_depth (None is unbounded, explicit limits enforced directly). Also includes encode hot-path optimizations and merge compatibility adjustments discovered during review.
Adds deep stack-safety regressions (depth 12_000), direct merge/decoder deep-path checks, and internal cycle-state compatibility tests for _encode/Utils.merge behavior.
Updates README/docs and internal guidance files to reflect max_depth semantics and adds 1.4.1-wip changelog entries for iterative hardening/performance work.
@techouse techouse merged commit c8067b6 into main Feb 22, 2026
7 checks passed
@techouse techouse deleted the feat/iterative branch February 22, 2026 09:38

Labels

enhancement New feature or request
