S2 — extraction.py: frame-native sibling readers #87

Description

@Polichinel

Epic: #85 · S2 · [UNILATERAL] · depends on #86

Background

unfao/extraction.py is the pandas→primitives seam that feeds the representation-free delivery/ invariants. Its readers (cells_of, months_of, drop_months_above, unmapped_cell_count) currently read a pandas frame via _level_or_column(df, ...) (extraction.py:33-45). To let a PredictionFrame feed the same invariants, the seam needs frame-native counterparts.

Work

  • Add frame-native sibling readers that take a PredictionFrame and return the same primitives: cells from index.unit, months from index.time, the clip decision, etc.
  • Add siblings, do not replace — per the module's own design note there is deliberately no Extractor Protocol (pandas and frames do not coexist at runtime; a polymorphic interface would be speculative).
  • file_metadata is an Appwrite-record normalizer (not pandas) — unchanged.
  • Note: unmapped_cell_count inspects metadata nullness, which a value-only PredictionFrame does not carry — its frame-native form is coupled to where the enriched metadata lives (defer that part to align with S4; this story covers the value/index readers).
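
The value/index readers listed above could look roughly like the sketch below. It is illustrative only: the `PredictionFrame` and `FrameIndex` classes are hypothetical stand-ins for the S1 frame (only `index.unit` and `index.time` come from this issue), `drop_months_above_frame` is a made-up name, and both the "sorted distinct months" return and the "keep rows with month <= max_month" clip are assumptions about what the pandas readers do.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class FrameIndex:
    """Hypothetical stand-in: parallel per-row arrays of length N."""
    time: np.ndarray  # month id of each row
    unit: np.ndarray  # cell id of each row


@dataclass(frozen=True)
class PredictionFrame:
    """Hypothetical stand-in for the S1 frame; the real class may differ."""
    values: np.ndarray  # shape (N, S)
    index: FrameIndex


def cells_of_frame(pf: PredictionFrame) -> set[int]:
    """Distinct cell ids as plain Python ints."""
    return {int(u) for u in pf.index.unit}


def months_of_frame(pf: PredictionFrame) -> np.ndarray:
    """Sorted distinct month ids as int64 (assumed to mirror months_of)."""
    return np.unique(pf.index.time).astype(np.int64)


def drop_months_above_frame(pf: PredictionFrame, max_month: int) -> PredictionFrame:
    """Keep rows whose month is <= max_month (assumed clip semantics)."""
    keep = pf.index.time <= max_month
    return PredictionFrame(
        values=pf.values[keep],
        index=FrameIndex(time=pf.index.time[keep], unit=pf.index.unit[keep]),
    )


# Usage: four rows, two samples per row (S = 2).
pf = PredictionFrame(
    values=np.zeros((4, 2), dtype=np.float32),
    index=FrameIndex(
        time=np.array([1, 1, 2, 3]),
        unit=np.array([10, 11, 10, 10]),
    ),
)
clipped = drop_months_above_frame(pf, max_month=2)
```

The readers touch only the index arrays, so they stay independent of how many samples the frame carries.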

Acceptance criteria

  • Frame-native siblings exist for the value/index readers and return the same primitive types as their pandas counterparts (set[int] for cells, int64 array for months).
  • No caller changes (pure addition); delivery/ stays pandas-free; existing pandas readers untouched.

Parity / validation

Golden equivalence test: express one dataset both as the manager's MultiIndex pandas frame and as the S1 PredictionFrame; assert cells_of(df) == cells_of_frame(pf), np.array_equal(months_of(df), months_of_frame(pf)), and matching surviving (time, unit) rows after the clip.
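
A minimal sketch of such a golden equivalence test follows. Everything in it is a stand-in so the example runs on its own: the four reader functions are simplified placeholders for the real ones, the MultiIndex level names `time` and `unit` are assumed, the frame is faked with `SimpleNamespace`, and the clip is assumed to keep rows with month <= a threshold.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


# Placeholder readers; the real ones live in the pandas seam and the frame module.
def cells_of(df):
    return {int(u) for u in df.index.get_level_values("unit")}


def months_of(df):
    return np.unique(df.index.get_level_values("time").to_numpy()).astype(np.int64)


def cells_of_frame(pf):
    return {int(u) for u in pf.index.unit}


def months_of_frame(pf):
    return np.unique(pf.index.time).astype(np.int64)


def make_pair():
    """Express one dataset as a MultiIndex pandas frame and as a frame stand-in."""
    time = np.array([1, 1, 2, 2, 3, 3], dtype=np.int64)
    unit = np.array([10, 11, 10, 11, 10, 11], dtype=np.int64)
    df = pd.DataFrame(
        {"pred": np.zeros(6, dtype=np.float32)},
        index=pd.MultiIndex.from_arrays([time, unit], names=["time", "unit"]),
    )
    pf = SimpleNamespace(
        values=np.zeros((6, 2), dtype=np.float32),  # S = 2
        index=SimpleNamespace(time=time, unit=unit),
    )
    return df, pf


def test_golden_equivalence():
    df, pf = make_pair()
    assert cells_of(df) == cells_of_frame(pf)
    assert np.array_equal(months_of(df), months_of_frame(pf))

    # Clip both representations at the same month, then compare surviving rows.
    max_month = 2
    df_kept = df[df.index.get_level_values("time") <= max_month]
    mask = pf.index.time <= max_month
    surviving_df = set(df_kept.index.tolist())
    surviving_pf = set(zip(pf.index.time[mask].tolist(), pf.index.unit[mask].tolist()))
    assert surviving_df == surviving_pf
```

Comparing the surviving rows as sets of (time, unit) tuples keeps the check independent of row order; if row order is part of the contract, compare the ordered lists instead.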

Dependencies

Depends on S1 (#86; needs PredictionFrame with S>1). Blocks S3.

Files

views_postprocessing/unfao/extraction.py, tests/test_extraction.py (+ the conformance/equivalence test).


Review amendment (expert-code-review, 2026-06-28) — file placement

Put the frame-native readers in their own module (e.g. views_postprocessing/unfao/frame_extraction.py), not appended to extraction.py.

  • Why: extraction.py's one concept is the pandas seam; mixing pandas readers and frame readers in one file makes it a two-representation "dumping ground" (violates one-concept-per-file / screaming architecture, blurs SRP — each seam should have one reason to change).
  • Bonus: when the pandas seam is eventually retired, it becomes a clean file-delete, not a surgical edit.
  • Acceptance criteria addition:
    • Frame readers live in a dedicated module; extraction.py (the pandas seam) is untouched except where genuinely shared.

✅ Approved design — Option D seam ownership (2026-06-28)

The frame-native module (its own file per the amendment above) owns the pandas→primitives sample unpacking — turning the forecast's pandas representation into the explicit (N, S) float32 array that frames.py (#86) then wraps.

  • It is told the sample layout (which columns / which sample column), via a declared parameter — it does not sniff dtypes to infer S. Fail loud if the input doesn't match the declared layout.
  • This keeps the declare-don't-guess boundary clean: the seam is the one place that knows the messy pandas layout; frames.py only ever sees declared primitives.
  • WET: support only the one real layout (defined by rusty_bucket #143); raise on anything else. Until #143 defines it, the seam provides the index/point primitives; the sample-array unpacker is added when the layout is declared.
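
To illustrate the declare-don't-guess boundary described above, here is a sketch of an unpacker that is told its layout and fails loudly on any mismatch. The layout used here (one pandas column per sample, named by the caller) is purely hypothetical and chosen for illustration; the real layout is whatever rusty_bucket #143 defines. The function name `unpack_samples` and its parameter are likewise made up.

```python
from typing import Sequence

import numpy as np
import pandas as pd


def unpack_samples(df: pd.DataFrame, sample_columns: Sequence[str]) -> np.ndarray:
    """Return an (N, S) float32 array from the declared sample columns.

    S is len(sample_columns); it is never inferred from the frame's contents.
    """
    if not sample_columns:
        raise ValueError("sample_columns must name at least one column")

    missing = [c for c in sample_columns if c not in df.columns]
    if missing:
        raise ValueError(f"declared sample columns not found: {missing}")

    # Conversion errors (e.g. non-numeric text) propagate rather than being masked.
    arr = df.loc[:, list(sample_columns)].to_numpy(dtype=np.float32)

    # Duplicate column names in df would widen the selection; reject that too.
    expected = (len(df), len(sample_columns))
    if arr.shape != expected:
        raise ValueError(f"expected shape {expected}, got {arr.shape}")
    return arr


# Usage: the caller declares which columns are samples; "other" is ignored.
df = pd.DataFrame(
    {"s0": [0.1, 0.2, 0.3], "s1": [1, 2, 3], "other": ["a", "b", "c"]}
)
samples = unpack_samples(df, ["s0", "s1"])
```

Because the column list is an argument, a frame that does not match the declaration raises immediately instead of producing an array with a silently different S.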

Labels: implementation (code implementation work) · story (a single reviewable unit of an epic)
