Problem
The FAO delivery is moving from point estimates to predictions-with-uncertainty (S samples per cell). As long as pandas sits on the data path, samples cannot propagate: a distribution becomes an object-dtype list-in-cell (the inherited `PGMDataset` representation). That is the memory and encode/decode wall captured in register C-40 (and pipeline-core's own ~18 GB OOM, their #181). views-frames stores a distribution natively as a contiguous `(N, S)` float32 `PredictionFrame`.
This epic maps every pandas site in views-postprocessing and sequences the swap to native views-frames, one site at a time, each with a parity proof, disturbing cross-repo interconnectivity as little as possible.
Key enabling fact: the samples already arrive inside the forecast `PGMDataset`'s object-dtype cells. They are collapsed to point estimates today only because `unfao/frames.py` hardcodes `S=1` and the wires carry object-dtype parquet. So vpp can make its interior sample-aware unilaterally, while the two wire boundaries stay frozen until cross-repo coordination.
Why it matters
It follows the maintainer's cross-repo strategy: push pandas to the seams of each repo, then swap one site at a time with parity.
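The contrast drawn under Problem, between an object-dtype list-in-cell column and a contiguous `(N, S)` float32 block, can be illustrated with a minimal NumPy sketch. It assumes each cell holds a Python list of S float samples; `stack_cells` is a hypothetical helper for illustration only and is not the views-frames API.

```python
import numpy as np


def stack_cells(cells):
    """Stack per-cell sample lists into one contiguous (N, S) float32 array.

    Hypothetical helper for illustration; not part of views-frames.
    """
    lengths = {len(cell) for cell in cells}
    if len(lengths) != 1:
        raise ValueError(f"expected one common sample count S, got {sorted(lengths)}")
    rows = [np.asarray(cell, dtype=np.float32) for cell in cells]
    return np.ascontiguousarray(np.stack(rows))


# Object-dtype column: N=3 cells, each holding a Python list of S=2 samples.
samples = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
column = np.empty(len(samples), dtype=object)
for i, cell in enumerate(samples):
    column[i] = cell  # element-wise assignment keeps each list as one cell

frame_values = stack_cells(column)
print(column.dtype, column.shape)              # object (3,)
print(frame_values.dtype, frame_values.shape)  # float32 (3, 2)
print(frame_values.nbytes)                     # 24 bytes = N * S * 4
```

The object column stores N pointers to separate Python lists, while the stacked array is a single buffer of N * S float32 values, which is the property the epic relies on.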
The map — three concentric rings (grounded in file:line)
Ring 0 (already pandas-free, do not touch):
- `views_postprocessing/delivery/{coverage,identity,observed_range,provenance}.py`, `unfao/gaul_schema.py`, `unfao/source_metadata.py`.
Ring 1 (vpp-owned seams, UNILATERAL):
- `unfao/extraction.py`: the pandas→primitives seam (`cells_of` / `months_of` / `drop_months_above` / `unmapped_cell_count`).
- `unfao/enrichment.py`: `pd.read_parquet(lookup)` (:48) + `.merge(...)` (:117-119), a keyed metadata-attach join.
- `unfao/frames.py`: the unused views-frames adapter; `_column_2d(...).reshape(-1,1)` hardcodes S=1 (:85).
- `scripts/build_gaul_lookup.py`: offline lookup build (off the delivery path).
Ring 2 (forced / contract boundaries, CROSS-REPO-GATED):
- `read_dataframe` → `PGMDataset` (`unfao.py:58-64`) + `_append_metadata` / `_validate` pandas ops: gated on pipeline-core DataFrame retirement (C-40; their #186/#207/#161/#159/#211).
- `pd.read_parquet(io.BytesIO(...))` (`unfao.py:125`): producer wire format.
- `df.to_parquet` (`unfao.py:298,309`) → Appwrite: gated on the views-faoapi wire contract (FAO delivery path is point/DataFrame-based; won't carry rusty_bucket's pooled draws #45); faoapi already rebuilds views-frames at ingestion.
Desired end state
A forecast flows through vpp's interior as a `PredictionFrame` carrying `(N, S)`, producing byte-identical deliveries to today on S=1 inputs (proving the interior is sample-ready) behind frozen wires. The final outbound hop that actually ships `(N, S)` to FAO (S6) and the historical inbound retirement (S7) are landed only when their cross-repo gates clear.
Scope
In: the 7 sequenced steps below (S1–S7), each independently shippable with a parity proof.
Out: Ring 0 (already pandas-free); pushing the enrichment metadata join into views-frames (S4 moves it off pandas, not into a frame — it is not frame algebra); changing the outbound format ahead of faoapi.
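The keyed metadata-attach join that S4 moves off pandas is a gather by key rather than frame algebra. A minimal NumPy sketch of such a gather follows. It assumes unique integer lookup keys; `keyed_gather`, the sample ids, and the `-1` fill value are hypothetical and are not taken from `unfao/enrichment.py`.

```python
import numpy as np


def keyed_gather(keys, lookup_keys, lookup_values, fill=-1):
    """Left-join style gather: for each key, fetch its lookup value or `fill`.

    Assumes lookup_keys are unique. Returns (values, found_mask).
    """
    keys = np.asarray(keys)
    lookup_keys = np.asarray(lookup_keys)
    lookup_values = np.asarray(lookup_values)
    if lookup_keys.size == 0:
        missing = np.full(keys.shape, fill, dtype=lookup_values.dtype)
        return missing, np.zeros(keys.shape, dtype=bool)

    # Sort the lookup once so every key can be located by binary search.
    order = np.argsort(lookup_keys, kind="stable")
    sorted_keys = lookup_keys[order]
    sorted_values = lookup_values[order]

    # Clip positions so keys beyond the largest lookup key index safely.
    pos = np.minimum(np.searchsorted(sorted_keys, keys), sorted_keys.size - 1)
    found = sorted_keys[pos] == keys
    values = np.where(found, sorted_values[pos], fill)
    return values, found


cell_ids = np.array([30, 10, 99, 10])     # hypothetical cell ids on the data path
lookup_ids = np.array([20, 10, 30])       # hypothetical lookup keys (unsorted)
lookup_codes = np.array([200, 100, 300])  # hypothetical metadata codes

codes, found = keyed_gather(cell_ids, lookup_ids, lookup_codes)
print(codes.tolist())       # [300, 100, -1, 100]
print(int((~found).sum()))  # 1 unmapped cell
```

The `found` mask keeps row order and row count identical to the input, so unmapped cells can be counted or rejected explicitly instead of being silently dropped or duplicated by a join.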
Stories (in sequence)
- S1. `frames.py`: generalize `to_prediction_frame` to S>1 `[UNILATERAL]` ← recommended first
- S2. `extraction.py`: frame-native sibling readers `[UNILATERAL]`
- S3. … `PredictionFrame`) `[UNILATERAL]`
- S4. `enrichment.py`: numpy/pyarrow keyed gather `[UNILATERAL, low priority]`
- S5. `build_gaul_lookup.py`: pyarrow-native build `[UNILATERAL, lowest priority]`
- S6. … `[CROSS-REPO-GATED: faoapi #45 + C-40]`
- S7. `read_dataframe` / `PGMDataset` retirement `[CROSS-REPO-GATED: pipeline-core C-40]` (do not move yet)
Dependency sequence
Epic acceptance criteria
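- S1–S3 merged: a forecast is represented internally as an `(N, S)` `PredictionFrame`; deliveries are byte-identical to today on S=1 inputs (parity).
- Each landed step carries a parity proof (build pandas → convert → assert arrays equal, and/or head-to-head old-vs-new) and keeps ruff + the full suite green.
- `delivery/` remains pandas-free throughout (the design-contract test stays green).
Refs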
Register C-40 (the pandas gate), #45 (the delivery-side draw carrier / faoapi wire), pipeline-core #186/#207/#161/#159/#211 (their DataFrame retirement). Parity pattern: `tests/test_views_frames_conformance.py`.
Issues
- S1 `[UNILATERAL]`
- S2 `[UNILATERAL]`
- S3 `[UNILATERAL]`
- S4 `[UNILATERAL, low pri]`
- S5 `[UNILATERAL, lowest pri]`
- S6 `[GATED: faoapi #45 + C-40]`
- S7 `[GATED: pipeline-core C-40]`
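The parity pattern named in the acceptance criteria (build the old representation, convert, assert arrays equal) might look roughly like the sketch below. Both functions are hypothetical stand-ins: `legacy_values` mimics a hardcoded `reshape(-1, 1)`, and `generalized_values` accepts scalars or per-cell sample lists. Neither is the actual `to_prediction_frame`, and the real conformance test may be structured differently.

```python
import numpy as np


def legacy_values(column):
    """Old path (assumed): one point estimate per cell, forced to S=1."""
    return np.asarray(column, dtype=np.float32).reshape(-1, 1)


def generalized_values(column):
    """New path (assumed): scalars or per-cell sample lists to (N, S)."""
    rows = [np.atleast_1d(np.asarray(cell, dtype=np.float32)) for cell in column]
    return np.stack(rows)  # raises ValueError if cells disagree on S


def assert_parity(old, new):
    """Head-to-head check: same shape, dtype, values, and raw bytes."""
    assert old.shape == new.shape
    assert old.dtype == new.dtype
    np.testing.assert_array_equal(old, new)
    assert old.tobytes() == new.tobytes()


point_column = [0.5, 1.25, 3.0]  # S=1 input: one value per cell
assert_parity(legacy_values(point_column), generalized_values(point_column))

sample_column = [[0.5, 0.7], [1.25, 1.0], [3.0, 2.5]]  # S=2 input
print(generalized_values(sample_column).shape)  # (3, 2)
```

Comparing raw bytes in addition to values is one way to approximate the byte-identical requirement at the array level before anything is serialized.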