Skip to content

♻️ Introduce NeedLink structured internal representation for links#1670

Merged
chrisjsewell merged 4 commits intomasterfrom
NeedLink
Mar 16, 2026
Merged

♻️ Introduce NeedLink structured internal representation for links#1670
chrisjsewell merged 4 commits intomasterfrom
NeedLink

Conversation

@chrisjsewell
Copy link
Copy Markdown
Member

@chrisjsewell chrisjsewell commented Mar 16, 2026

Motivation

This refactor introduces NeedLink — a frozen dataclass with id and part fields — as the internal representation for links and backlinks in NeedItem. This is a preparatory step for the constrained link syntax (ADDRESS[filter_expr]), where NeedLink will gain a constraint field to support inline validation of linked needs.

Previously, links were stored internally as dict[str, list[str]] — flat string lists with no structure. The id/part split ("NEED-1.part") was re-parsed at every usage site via split_need_id(). This made it impossible to attach additional metadata (like constraints or namespace prefixes) to individual link references without changing the string format everywhere.

What changed

New NeedLink dataclass

@dataclass(slots=True, frozen=True, kw_only=True)
class NeedLink:
    id: str
    part: str | None = None
  • from_string("NEED-1.part")NeedLink(id="NEED-1", part="part")
  • to_filter_string()"NEED-1.part" (round-trips to the original string)

Internal storage changed

Before After
_links: dict[str, list[str]] _links: dict[str, list[NeedLink]]

External API unchanged

All public-facing access points (__getitem__, items(), values(), get_links(), get_backlinks(), iter_links_items(), iter_backlinks_items(), filter context via {**need}) continue to return list[str] through to_filter_string(). JSON serialization, filter evaluation, and directive processing see no change.

__setitem__ accepts both formats

need["links"] = ["NEED-1", NeedLink(id="NEED-1")] — both strings and NeedLink instances are accepted and normalized to NeedLink internally. Same for add_backlink().

Validation error messages updated

Internal validation messages now correctly reference NeedLink instances instead of strings.

parent_need computed field fixed

_recompute() now calls to_filter_string() on the first parent_needs link, since the internal list is now list[NeedLink] rather than list[str].

Back-compatibility considerations

1. Link list mutation via __getitem__ is now a no-op (MEDIUM risk)

Before: need["links"] returned the actual internal list[str], so need["links"].append("x") mutated the stored data.

After: need["links"] returns a freshly constructed list[str] (via list comprehension over NeedLink objects), so .append() mutates a throwaway copy.

Impact assessment: No internal sphinx-needs code relies on this pattern. All link mutations use either:

  • __setitem__ with read-copy-write: need[k] = [*need[k], ...] (needextend)
  • Typed methods: add_backlink(), reset_backlinks()
  • Direct NeedPartData.backlinks dict mutation (for parts)

The only indexed mutation found (utils.py:import_prefix_link_edit) operates on plain dicts from JSON imports, not NeedItem objects.

External extensions that relied on need["links"].append("x") would silently break. This is an inherent consequence of the structured internal storage.

Possible mitigation (not yet implemented): Return tuple instead of list from __getitem__ for link fields, turning silent failure into a loud AttributeError. This makes the immutability contract explicit.

2. get_links() / get_backlinks() now return copies (LOW risk)

Previously returned the actual internal list. Same mutation concern, but these are newer APIs with no known external callers.

3. iter_links_items() / iter_backlinks_items() return string lists (LOW risk)

Previously yielded (key, list[str]) where the list was the internal reference. Now yields freshly constructed string lists. Mutation on iteration results would be unusual.

Design decisions and future direction

Why to_filter_string() excludes constraints

When constraints are added (e.g. NeedLink(id="REQ-1", constraint="status==approved")), the filter string representation should not include the constraint. Filter expressions like "REQ-1" in links should match based on the target ID, not the constraint predicate. Constraints are a property of the link declaration that gets validated separately.

This naturally leads to two serialization methods:

  • to_filter_string()"ID" or "ID.part" — for filter context, backward compat
  • to_string() (future) → "ID[constraint]" — full round-trip representation

NeedLink replaces split_need_id()

The split_need_id() utility (called in update_back_links, needuml, need_ref, need_outgoing) parses "NEED-1.part" into (id, part) — exactly what NeedLink.from_string() does. Once NeedLink is propagated through the pipeline, split_need_id() can be removed.

Equality semantics for deduplication

add_backlink() deduplicates via if backlink not in self._backlinks[link_type], relying on NeedLink.__eq__. Since NeedLink is a frozen dataclass, equality is structural — two NeedLink(id="X", part=None) are equal. When constraints are added, backlink deduplication should ignore constraints (backlinks don't carry forward-link constraints), which may require custom __eq__ or a separate dedup key.

from_string() dot-splitting edge case

NeedLink.from_string("A.B.C") splits on the first .id="A", part="B.C". This matches current split_need_id behavior. If id_regex allows dots, there is inherent ambiguity — same as today, but documented here for awareness.

# PR: Introduce `NeedLink` structured internal representation for links

## Motivation

This refactor introduces `NeedLink` — a frozen dataclass with `id` and `part` fields — as the internal representation for links and backlinks in `NeedItem`. This is a preparatory step for the **constrained link syntax** (`ADDRESS[filter_expr]`), where `NeedLink` will gain a `constraint` field to support inline validation of linked needs.

Previously, links were stored internally as `dict[str, list[str]]` — flat string lists with no structure. The id/part split (`"NEED-1.part"`) was re-parsed at every usage site via `split_need_id()`. This made it impossible to attach additional metadata (like constraints or namespace prefixes) to individual link references without changing the string format everywhere.

## What changed

### New `NeedLink` dataclass

```python
@DataClass(slots=True, frozen=True, kw_only=True)
class NeedLink:
    id: str
    part: str | None = None
```

- `from_string("NEED-1.part")` → `NeedLink(id="NEED-1", part="part")`
- `to_filter_string()` → `"NEED-1.part"` (round-trips to the original string)

### Internal storage changed

| Before | After |
|--------|-------|
| `_links: dict[str, list[str]]` | `_links: dict[str, list[NeedLink]]` |
| `_backlinks: dict[str, list[str]]` | `_backlinks: dict[str, list[NeedLink]]` |

### External API unchanged

All public-facing access points (`__getitem__`, `items()`, `values()`, `get_links()`, `get_backlinks()`, `iter_links_items()`, `iter_backlinks_items()`, filter context via `{**need}`) continue to return `list[str]` through `to_filter_string()`. JSON serialization, filter evaluation, and directive processing see no change.

### `__setitem__` accepts both formats

`need["links"] = ["NEED-1", NeedLink(id="NEED-1")]` — both strings and `NeedLink` instances are accepted and normalized to `NeedLink` internally. Same for `add_backlink()`.

### Validation error messages updated

Internal validation messages now correctly reference `NeedLink` instances instead of strings.

### `parent_need` computed field fixed

`_recompute()` now calls `to_filter_string()` on the first `parent_needs` link, since the internal list is now `list[NeedLink]` rather than `list[str]`.

## Back-compatibility considerations

### 1. Link list mutation via `__getitem__` is now a no-op (MEDIUM risk)

**Before**: `need["links"]` returned the actual internal `list[str]`, so `need["links"].append("x")` mutated the stored data.

**After**: `need["links"]` returns a freshly constructed `list[str]` (via list comprehension over `NeedLink` objects), so `.append()` mutates a throwaway copy.

**Impact assessment**: No internal sphinx-needs code relies on this pattern. All link mutations use either:
- `__setitem__` with read-copy-write: `need[k] = [*need[k], ...]` (needextend)
- Typed methods: `add_backlink()`, `reset_backlinks()`
- Direct `NeedPartData.backlinks` dict mutation (for parts)

The only indexed mutation found (`utils.py:import_prefix_link_edit`) operates on plain dicts from JSON imports, not `NeedItem` objects.

**External extensions** that relied on `need["links"].append("x")` would silently break. This is an inherent consequence of the structured internal storage.

**Possible mitigation** (not yet implemented): Return `tuple` instead of `list` from `__getitem__` for link fields, turning silent failure into a loud `AttributeError`. This makes the immutability contract explicit.

### 2. `get_links()` / `get_backlinks()` now return copies (LOW risk)

Previously returned the actual internal list. Same mutation concern, but these are newer APIs with no known external callers.

### 3. `iter_links_items()` / `iter_backlinks_items()` return string lists (LOW risk)

Previously yielded `(key, list[str])` where the list was the internal reference. Now yields freshly constructed string lists. Mutation on iteration results would be unusual.

## Design decisions and future direction

### Why `to_filter_string()` excludes constraints

When constraints are added (e.g. `NeedLink(id="REQ-1", constraint="status==approved")`), the filter string representation should **not** include the constraint. Filter expressions like `"REQ-1" in links` should match based on the target ID, not the constraint predicate. Constraints are a property of the link *declaration* that gets validated separately.

This naturally leads to two serialization methods:
- `to_filter_string()` → `"ID"` or `"ID.part"` — for filter context, backward compat
- `to_string()` (future) → `"ID[constraint]"` — full round-trip representation

### `NeedLink` replaces `split_need_id()`

The `split_need_id()` utility (called in `update_back_links`, `needuml`, `need_ref`, `need_outgoing`) parses `"NEED-1.part"` into `(id, part)` — exactly what `NeedLink.from_string()` does. Once `NeedLink` is propagated through the pipeline, `split_need_id()` can be removed.

### Equality semantics for deduplication

`add_backlink()` deduplicates via `if backlink not in self._backlinks[link_type]`, relying on `NeedLink.__eq__`. Since `NeedLink` is a frozen dataclass, equality is structural — two `NeedLink(id="X", part=None)` are equal. When constraints are added, backlink deduplication should ignore constraints (backlinks don't carry forward-link constraints), which may require custom `__eq__` or a separate dedup key.

### `from_string()` dot-splitting edge case

`NeedLink.from_string("A.B.C")` splits on the first `.` → `id="A", part="B.C"`. This matches current `split_need_id` behavior. If `id_regex` allows dots, there is inherent ambiguity — same as today, but documented here for awareness.
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.89%. Comparing base (4e10030) to head (8b7ca31).
⚠️ Report is 261 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1670      +/-   ##
==========================================
+ Coverage   86.87%   88.89%   +2.01%     
==========================================
  Files          56       71      +15     
  Lines        6532    10027    +3495     
==========================================
+ Hits         5675     8914    +3239     
- Misses        857     1113     +256     
Flag Coverage Δ
pytests 88.89% <100.00%> (+2.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@chrisjsewell chrisjsewell requested a review from ubmarco March 16, 2026 13:31
Copy link
Copy Markdown
Member

@ubmarco ubmarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chrisjsewell chrisjsewell merged commit 9748aea into master Mar 16, 2026
25 checks passed
@chrisjsewell chrisjsewell deleted the NeedLink branch March 16, 2026 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants