♻️ Introduce NeedLink structured internal representation for links #1670
Merged
chrisjsewell merged 4 commits into master (Mar 16, 2026)
# PR: Introduce `NeedLink` structured internal representation for links

## Motivation

This refactor introduces `NeedLink`, a frozen dataclass with `id` and `part` fields, as the internal representation for links and backlinks in `NeedItem`. This is a preparatory step for the **constrained link syntax** (`ADDRESS[filter_expr]`), where `NeedLink` will gain a `constraint` field to support inline validation of linked needs.

Previously, links were stored internally as `dict[str, list[str]]`: flat string lists with no structure. The id/part split (`"NEED-1.part"`) was re-parsed at every usage site via `split_need_id()`. This made it impossible to attach additional metadata (such as constraints or namespace prefixes) to individual link references without changing the string format everywhere.

## What changed

### New `NeedLink` dataclass

```python
from dataclasses import dataclass


@dataclass(slots=True, frozen=True, kw_only=True)
class NeedLink:
    id: str
    part: str | None = None
```

- `from_string("NEED-1.part")` → `NeedLink(id="NEED-1", part="part")`
- `to_filter_string()` → `"NEED-1.part"` (round-trips to the original string)

### Internal storage changed

| Before | After |
|--------|-------|
| `_links: dict[str, list[str]]` | `_links: dict[str, list[NeedLink]]` |
| `_backlinks: dict[str, list[str]]` | `_backlinks: dict[str, list[NeedLink]]` |

### External API unchanged

All public-facing access points (`__getitem__`, `items()`, `values()`, `get_links()`, `get_backlinks()`, `iter_links_items()`, `iter_backlinks_items()`, and the filter context via `{**need}`) continue to return `list[str]`, built through `to_filter_string()`. JSON serialization, filter evaluation, and directive processing see no change.

### `__setitem__` accepts both formats

`need["links"] = ["NEED-1", NeedLink(id="NEED-1")]`: both strings and `NeedLink` instances are accepted and normalized to `NeedLink` internally. The same applies to `add_backlink()`.
### Validation error messages updated

Internal validation messages now correctly reference `NeedLink` instances instead of strings.

### `parent_need` computed field fixed

`_recompute()` now calls `to_filter_string()` on the first `parent_needs` link, since the internal list is now `list[NeedLink]` rather than `list[str]`.

## Back-compatibility considerations

### 1. Link list mutation via `__getitem__` is now a no-op (MEDIUM risk)

**Before**: `need["links"]` returned the actual internal `list[str]`, so `need["links"].append("x")` mutated the stored data.

**After**: `need["links"]` returns a freshly constructed `list[str]` (via a list comprehension over the `NeedLink` objects), so `.append()` mutates a throwaway copy and the stored links are left unchanged.

**Impact assessment**: No internal sphinx-needs code relies on this pattern. All link mutations use one of:

- `__setitem__` with read-copy-write: `need[k] = [*need[k], ...]` (needextend)
- Typed methods: `add_backlink()`, `reset_backlinks()`
- Direct `NeedPartData.backlinks` dict mutation (for parts)

The only indexed mutation found (`utils.py:import_prefix_link_edit`) operates on plain dicts from JSON imports, not on `NeedItem` objects.

**External extensions** that relied on `need["links"].append("x")` would silently break. This is an inherent consequence of the structured internal storage.

**Possible mitigation** (not yet implemented): return a `tuple` instead of a `list` from `__getitem__` for link fields, turning the silent failure into a loud `AttributeError` and making the immutability contract explicit.

### 2. `get_links()` / `get_backlinks()` now return copies (LOW risk)

These previously returned the actual internal list. The same mutation concern applies, but these are newer APIs with no known external callers.

### 3. `iter_links_items()` / `iter_backlinks_items()` return string lists (LOW risk)

These previously yielded `(key, list[str])`, where the list was the internal reference. They now yield freshly constructed string lists. Mutating iteration results would be unusual.
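The first back-compatibility risk, and the proposed tuple mitigation, can be reproduced with a small sketch. Both classes are hypothetical: `CopyingStore` mimics the new copy-on-read behaviour, and `TupleStore` models the mitigation that is *not yet implemented*. Plain strings stand in for `NeedLink` objects to keep the example short.

```python
class CopyingStore:
    """Hypothetical: mimics the new __getitem__, which builds a new list per call."""

    def __init__(self) -> None:
        self._links: dict[str, list[str]] = {}

    def __setitem__(self, key: str, value: list[str]) -> None:
        self._links[key] = list(value)

    def __getitem__(self, key: str) -> list[str]:
        # Fresh list on every access (sphinx-needs builds it via to_filter_string())
        return list(self._links[key])


class TupleStore(CopyingStore):
    """Hypothetical mitigation: an immutable tuple makes .append() fail loudly."""

    def __getitem__(self, key: str) -> tuple[str, ...]:  # type: ignore[override]
        return tuple(self._links[key])


need = CopyingStore()
need["links"] = ["NEED-1"]

need["links"].append("NEED-2")  # mutates a throwaway copy
print(need["links"])  # ['NEED-1']: the append was silently lost

need["links"] = [*need["links"], "NEED-2"]  # read-copy-write, as needextend does
print(need["links"])  # ['NEED-1', 'NEED-2']

strict = TupleStore()
strict["links"] = ["NEED-1"]
try:
    strict["links"].append("NEED-2")
except AttributeError as exc:
    print(f"AttributeError: {exc}")  # 'tuple' object has no attribute 'append'
```

The read-copy-write idiom works with either return type, which is why the tuple change would only affect code that was already silently broken.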
## Design decisions and future direction

### Why `to_filter_string()` excludes constraints

When constraints are added (e.g. `NeedLink(id="REQ-1", constraint="status==approved")`), the filter string representation should **not** include the constraint. Filter expressions like `"REQ-1" in links` should match on the target ID, not on the constraint predicate. Constraints are a property of the link *declaration* and are validated separately.

This naturally leads to two serialization methods:

- `to_filter_string()` → `"ID"` or `"ID.part"`: for the filter context and backward compatibility
- `to_string()` (future) → `"ID[constraint]"`: the full round-trip representation

### `NeedLink` replaces `split_need_id()`

The `split_need_id()` utility (called in `update_back_links`, `needuml`, `need_ref`, and `need_outgoing`) parses `"NEED-1.part"` into `(id, part)`, which is exactly what `NeedLink.from_string()` does. Once `NeedLink` is propagated through the pipeline, `split_need_id()` can be removed.

### Equality semantics for deduplication

`add_backlink()` deduplicates via `if backlink not in self._backlinks[link_type]`, relying on `NeedLink.__eq__`. Because `NeedLink` is a dataclass with the generated `__eq__`, equality is structural: two `NeedLink(id="X", part=None)` instances compare equal. Once constraints are added, the generated `__eq__` would also compare the `constraint` field, but backlink deduplication should ignore constraints (backlinks don't carry forward-link constraints). This may require a custom `__eq__` or a separate dedup key.

### `from_string()` dot-splitting edge case

`NeedLink.from_string("A.B.C")` splits on the first `.` → `id="A"`, `part="B.C"`. This matches the current `split_need_id` behavior. If `id_regex` allows dots, there is inherent ambiguity: the same as today, but documented here for awareness.
**Codecov Report**: ✅ All modified and coverable lines are covered by tests.

| | `master` | #1670 | +/- |
|---|---|---|---|
| Coverage | 86.87% | 88.89% | +2.01% |
| Files | 56 | 71 | +15 |
| Lines | 6532 | 10027 | +3495 |
| Hits | 5675 | 8914 | +3239 |
| Misses | 857 | 1113 | +256 |
ubmarco approved these changes on Mar 16, 2026.