♻️ Introduce `NeedLink` structured internal representation for links by chrisjsewell · Pull Request #1670 · useblocks/sphinx-needs

chrisjsewell · 2026-03-16T09:47:15Z

Motivation

This refactor introduces NeedLink — a frozen dataclass with id and part fields — as the internal representation for links and backlinks in NeedItem. This is a preparatory step for the constrained link syntax (ADDRESS[filter_expr]), where NeedLink will gain a constraint field to support inline validation of linked needs.

Previously, links were stored internally as dict[str, list[str]] — flat string lists with no structure. The id/part split ("NEED-1.part") was re-parsed at every usage site via split_need_id(). This made it impossible to attach additional metadata (like constraints or namespace prefixes) to individual link references without changing the string format everywhere.

What changed

New `NeedLink` dataclass

@dataclass(slots=True, frozen=True, kw_only=True)
class NeedLink:
    id: str
    part: str | None = None

from_string("NEED-1.part") → NeedLink(id="NEED-1", part="part")
to_filter_string() → "NEED-1.part" (round-trips to the original string)

Internal storage changed

Before	After
`_links: dict[str, list[str]]`	`_links: dict[str, list[NeedLink]]`

External API unchanged

All public-facing access points (__getitem__, items(), values(), get_links(), get_backlinks(), iter_links_items(), iter_backlinks_items(), filter context via {**need}) continue to return list[str] through to_filter_string(). JSON serialization, filter evaluation, and directive processing see no change.

`setitem` accepts both formats

need["links"] = ["NEED-1", NeedLink(id="NEED-1")] — both strings and NeedLink instances are accepted and normalized to NeedLink internally. Same for add_backlink().

Validation error messages updated

Internal validation messages now correctly reference NeedLink instances instead of strings.

`parent_need` computed field fixed

_recompute() now calls to_filter_string() on the first parent_needs link, since the internal list is now list[NeedLink] rather than list[str].

Back-compatibility considerations

1. Link list mutation via `getitem` is now a no-op (MEDIUM risk)

Before: need["links"] returned the actual internal list[str], so need["links"].append("x") mutated the stored data.

After: need["links"] returns a freshly constructed list[str] (via list comprehension over NeedLink objects), so .append() mutates a throwaway copy.

Impact assessment: No internal sphinx-needs code relies on this pattern. All link mutations use either:

__setitem__ with read-copy-write: need[k] = [*need[k], ...] (needextend)
Typed methods: add_backlink(), reset_backlinks()
Direct NeedPartData.backlinks dict mutation (for parts)

The only indexed mutation found (utils.py:import_prefix_link_edit) operates on plain dicts from JSON imports, not NeedItem objects.

External extensions that relied on need["links"].append("x") would silently break. This is an inherent consequence of the structured internal storage.

Possible mitigation (not yet implemented): Return tuple instead of list from __getitem__ for link fields, turning silent failure into a loud AttributeError. This makes the immutability contract explicit.

2. `get_links()` / `get_backlinks()` now return copies (LOW risk)

Previously returned the actual internal list. Same mutation concern, but these are newer APIs with no known external callers.

3. `iter_links_items()` / `iter_backlinks_items()` return string lists (LOW risk)

Previously yielded (key, list[str]) where the list was the internal reference. Now yields freshly constructed string lists. Mutation on iteration results would be unusual.

Design decisions and future direction

Why `to_filter_string()` excludes constraints

When constraints are added (e.g. NeedLink(id="REQ-1", constraint="status==approved")), the filter string representation should not include the constraint. Filter expressions like "REQ-1" in links should match based on the target ID, not the constraint predicate. Constraints are a property of the link declaration that gets validated separately.

This naturally leads to two serialization methods:

to_filter_string() → "ID" or "ID.part" — for filter context, backward compat
to_string() (future) → "ID[constraint]" — full round-trip representation

`NeedLink` replaces `split_need_id()`

The split_need_id() utility (called in update_back_links, needuml, need_ref, need_outgoing) parses "NEED-1.part" into (id, part) — exactly what NeedLink.from_string() does. Once NeedLink is propagated through the pipeline, split_need_id() can be removed.

Equality semantics for deduplication

add_backlink() deduplicates via if backlink not in self._backlinks[link_type], relying on NeedLink.__eq__. Since NeedLink is a frozen dataclass, equality is structural — two NeedLink(id="X", part=None) are equal. When constraints are added, backlink deduplication should ignore constraints (backlinks don't carry forward-link constraints), which may require custom __eq__ or a separate dedup key.

`from_string()` dot-splitting edge case

NeedLink.from_string("A.B.C") splits on the first . → id="A", part="B.C". This matches current split_need_id behavior. If id_regex allows dots, there is inherent ambiguity — same as today, but documented here for awareness.

# PR: Introduce `NeedLink` structured internal representation for links ## Motivation This refactor introduces `NeedLink` — a frozen dataclass with `id` and `part` fields — as the internal representation for links and backlinks in `NeedItem`. This is a preparatory step for the **constrained link syntax** (`ADDRESS[filter_expr]`), where `NeedLink` will gain a `constraint` field to support inline validation of linked needs. Previously, links were stored internally as `dict[str, list[str]]` — flat string lists with no structure. The id/part split (`"NEED-1.part"`) was re-parsed at every usage site via `split_need_id()`. This made it impossible to attach additional metadata (like constraints or namespace prefixes) to individual link references without changing the string format everywhere. ## What changed ### New `NeedLink` dataclass ```python @DataClass(slots=True, frozen=True, kw_only=True) class NeedLink: id: str part: str | None = None ``` - `from_string("NEED-1.part")` → `NeedLink(id="NEED-1", part="part")` - `to_filter_string()` → `"NEED-1.part"` (round-trips to the original string) ### Internal storage changed | Before | After | |--------|-------| | `_links: dict[str, list[str]]` | `_links: dict[str, list[NeedLink]]` | | `_backlinks: dict[str, list[str]]` | `_backlinks: dict[str, list[NeedLink]]` | ### External API unchanged All public-facing access points (`__getitem__`, `items()`, `values()`, `get_links()`, `get_backlinks()`, `iter_links_items()`, `iter_backlinks_items()`, filter context via `{**need}`) continue to return `list[str]` through `to_filter_string()`. JSON serialization, filter evaluation, and directive processing see no change. ### `__setitem__` accepts both formats `need["links"] = ["NEED-1", NeedLink(id="NEED-1")]` — both strings and `NeedLink` instances are accepted and normalized to `NeedLink` internally. Same for `add_backlink()`. ### Validation error messages updated Internal validation messages now correctly reference `NeedLink` instances instead of strings. ### `parent_need` computed field fixed `_recompute()` now calls `to_filter_string()` on the first `parent_needs` link, since the internal list is now `list[NeedLink]` rather than `list[str]`. ## Back-compatibility considerations ### 1. Link list mutation via `__getitem__` is now a no-op (MEDIUM risk) **Before**: `need["links"]` returned the actual internal `list[str]`, so `need["links"].append("x")` mutated the stored data. **After**: `need["links"]` returns a freshly constructed `list[str]` (via list comprehension over `NeedLink` objects), so `.append()` mutates a throwaway copy. **Impact assessment**: No internal sphinx-needs code relies on this pattern. All link mutations use either: - `__setitem__` with read-copy-write: `need[k] = [*need[k], ...]` (needextend) - Typed methods: `add_backlink()`, `reset_backlinks()` - Direct `NeedPartData.backlinks` dict mutation (for parts) The only indexed mutation found (`utils.py:import_prefix_link_edit`) operates on plain dicts from JSON imports, not `NeedItem` objects. **External extensions** that relied on `need["links"].append("x")` would silently break. This is an inherent consequence of the structured internal storage. **Possible mitigation** (not yet implemented): Return `tuple` instead of `list` from `__getitem__` for link fields, turning silent failure into a loud `AttributeError`. This makes the immutability contract explicit. ### 2. `get_links()` / `get_backlinks()` now return copies (LOW risk) Previously returned the actual internal list. Same mutation concern, but these are newer APIs with no known external callers. ### 3. `iter_links_items()` / `iter_backlinks_items()` return string lists (LOW risk) Previously yielded `(key, list[str])` where the list was the internal reference. Now yields freshly constructed string lists. Mutation on iteration results would be unusual. ## Design decisions and future direction ### Why `to_filter_string()` excludes constraints When constraints are added (e.g. `NeedLink(id="REQ-1", constraint="status==approved")`), the filter string representation should **not** include the constraint. Filter expressions like `"REQ-1" in links` should match based on the target ID, not the constraint predicate. Constraints are a property of the link *declaration* that gets validated separately. This naturally leads to two serialization methods: - `to_filter_string()` → `"ID"` or `"ID.part"` — for filter context, backward compat - `to_string()` (future) → `"ID[constraint]"` — full round-trip representation ### `NeedLink` replaces `split_need_id()` The `split_need_id()` utility (called in `update_back_links`, `needuml`, `need_ref`, `need_outgoing`) parses `"NEED-1.part"` into `(id, part)` — exactly what `NeedLink.from_string()` does. Once `NeedLink` is propagated through the pipeline, `split_need_id()` can be removed. ### Equality semantics for deduplication `add_backlink()` deduplicates via `if backlink not in self._backlinks[link_type]`, relying on `NeedLink.__eq__`. Since `NeedLink` is a frozen dataclass, equality is structural — two `NeedLink(id="X", part=None)` are equal. When constraints are added, backlink deduplication should ignore constraints (backlinks don't carry forward-link constraints), which may require custom `__eq__` or a separate dedup key. ### `from_string()` dot-splitting edge case `NeedLink.from_string("A.B.C")` splits on the first `.` → `id="A", part="B.C"`. This matches current `split_need_id` behavior. If `id_regex` allows dots, there is inherent ambiguity — same as today, but documented here for awareness.

codecov · 2026-03-16T10:04:10Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.89%. Comparing base (4e10030) to head (8b7ca31).
⚠️ Report is 261 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1670      +/-   ##
==========================================
+ Coverage   86.87%   88.89%   +2.01%     
==========================================
  Files          56       71      +15     
  Lines        6532    10027    +3495     
==========================================
+ Hits         5675     8914    +3239     
- Misses        857     1113     +256

Flag	Coverage Δ
pytests	`88.89% <100.00%> (+2.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

ubmarco

LGTM

chrisjsewell added 2 commits March 16, 2026 10:46

Update test_need_item_api.py

b9914c4

Update api.rst

0938bd9

chrisjsewell requested a review from ubmarco March 16, 2026 13:31

ubmarco approved these changes Mar 16, 2026

View reviewed changes

Merge branch 'master' into NeedLink

8b7ca31

ubmarco approved these changes Mar 16, 2026

View reviewed changes

chrisjsewell merged commit 9748aea into master Mar 16, 2026
25 checks passed

chrisjsewell deleted the NeedLink branch March 16, 2026 14:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

♻️ Introduce `NeedLink` structured internal representation for links#1670

♻️ Introduce `NeedLink` structured internal representation for links#1670
chrisjsewell merged 4 commits intomasterfrom
NeedLink

chrisjsewell commented Mar 16, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Mar 16, 2026 •

edited

Loading

Uh oh!

ubmarco left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

chrisjsewell commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

What changed

New NeedLink dataclass

Internal storage changed

External API unchanged

__setitem__ accepts both formats

Validation error messages updated

parent_need computed field fixed

Back-compatibility considerations

1. Link list mutation via __getitem__ is now a no-op (MEDIUM risk)

2. get_links() / get_backlinks() now return copies (LOW risk)

3. iter_links_items() / iter_backlinks_items() return string lists (LOW risk)

Design decisions and future direction

Why to_filter_string() excludes constraints

NeedLink replaces split_need_id()

Equality semantics for deduplication

from_string() dot-splitting edge case

Uh oh!

codecov Bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ubmarco left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

chrisjsewell commented Mar 16, 2026 •

edited

Loading

New `NeedLink` dataclass

`setitem` accepts both formats

`parent_need` computed field fixed

1. Link list mutation via `getitem` is now a no-op (MEDIUM risk)

2. `get_links()` / `get_backlinks()` now return copies (LOW risk)

3. `iter_links_items()` / `iter_backlinks_items()` return string lists (LOW risk)

Why `to_filter_string()` excludes constraints

`NeedLink` replaces `split_need_id()`

`from_string()` dot-splitting edge case

codecov Bot commented Mar 16, 2026 •

edited

Loading