
Feature/samples for fao #39

Merged
Polichinel merged 27 commits into development from feature/samples_for_fao on Mar 17, 2026

@Polichinel
Collaborator

No description provided.

Polichinel and others added 27 commits February 24, 2026 23:56
Companion change to views-baseline PR #3. Standardizes the config key
for the trailing training window from "months" to "window_months" and adds
the explicit "time_steps" key (forecast horizon, previously a hidden
default of 36 in views-baseline).
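A minimal sketch of the rename, assuming configs are plain dicts. `migrate_baseline_config` is a hypothetical helper, not part of views-baseline. It fills in 36 so that a config that never set a horizon keeps the old implicit behaviour.

```python
from typing import Any

# Previously hidden default horizon in views-baseline (per the commit message).
LEGACY_DEFAULT_TIME_STEPS = 36


def migrate_baseline_config(config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return a copy of `config` using the standardized keys."""
    migrated = dict(config)  # never mutate the caller's dict
    if "months" in migrated:
        # Rename the trailing training window key.
        migrated["window_months"] = migrated.pop("months")
    # Make the forecast horizon explicit instead of relying on a hidden default.
    migrated.setdefault("time_steps", LEGACY_DEFAULT_TIME_STEPS)
    return migrated
```

For example, `{"months": 12}` becomes `{"window_months": 12, "time_steps": 36}`, while a config that already sets `time_steps` keeps its own value.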

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add explicit `prediction_format: "dataframe"` to 54 models that
previously relied on the implicit default. Prerequisite for making
prediction_format a required key in pipeline-core (Phase 3A).
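Once a key is required, a model config lacking it should be caught before a run. A hypothetical completeness check: `REQUIRED_KEYS` is illustrative and lists keys this PR makes explicit, not pipeline-core's actual schema.

```python
from typing import Any

# Illustrative only: keys made explicit in this PR, not pipeline-core's real list.
REQUIRED_KEYS = ("prediction_format", "time_steps", "rolling_origin_stride")


def missing_required_keys(config: dict[str, Any]) -> list[str]:
    """Return the required keys absent from `config`, in declaration order."""
    return [key for key in REQUIRED_KEYS if key not in config]
```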

Also includes baseline config updates and calibration log refreshes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tectural risks

- Add mandatory config keys (time_steps, rolling_origin_stride) to 52+ models
- Replace exec() with importlib.util in create_catalogs.py for safer config loading
- Migrate 13 old-pattern models to ForecastingModelArgs CLI API
- Fix heat_waves/hot_stream forecasting offset bug (-2 → -1)
- Add comprehensive test suite (2029 tests):
  - Config completeness, structure, CLI pattern, partition consistency
  - Ensemble config validation and dependency checking
  - Red-team failure injection tests
- Add base_docs governance: 9 ADRs, 3 CICs, contributor protocols, standards
- Remove dead code from create_catalogs.py and purple_alien/main.py
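For reference, the standard-library pattern for loading a config file as a module instead of calling `exec()`. This is a sketch: `load_config_module` is a hypothetical name, and the loader in create_catalogs.py may differ. Note that `exec_module` still runs the file; the gain is that its code executes in its own module namespace rather than the caller's.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_config_module(path: Path, module_name: str = "model_config") -> ModuleType:
    """Import a Python config file as a standalone module (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load config from {path}")
    module = importlib.util.module_from_spec(spec)
    # Runs the file, but inside the new module's namespace, not the caller's.
    spec.loader.exec_module(module)
    return module
```

Assuming a config file defines a `get_meta_config()` function, usage would be `load_config_module(path).get_meta_config()`.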

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove files superseded by the test suite or no longer referenced:
- reports/archived/ (15 old training/sweep logs from Feb 2025)
- verify_architecture.py (one-off NBEATS debug investigation)
- compare_configs.py (hardcoded 3-model check, replaced by test suite)

Also add .ruff_cache/ to .gitignore and update ADR-001 ontology.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unused imports (sys, re, pytest, REPO_ROOT, ALL_MODEL_DIRS)
and rename ambiguous variable `l` to `line` in test_catalogs.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shell script that trains and evaluates each model on calibration and
validation partitions, logging results per model without aborting on
failure. Supports --models, --partitions, and --timeout flags.
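The "log and continue" behaviour can be sketched in Python. This is a sketch only: the actual runner is a shell script, and `run_all`, its `commands` argument, and the status strings are hypothetical.

```python
import subprocess


def run_all(commands: dict[str, list[str]], timeout: float) -> dict[str, str]:
    """Run each command, recording "ok", "failed" or "timeout" without aborting."""
    results: dict[str, str] = {}
    for name, cmd in commands.items():
        try:
            completed = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
            results[name] = "ok" if completed.returncode == 0 else "failed"
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising TimeoutExpired.
            results[name] = "timeout"
        except OSError:
            # e.g. the executable does not exist: record a failure and keep going.
            results[name] = "failed"
    return results
```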

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Models without a requirements.txt fail to install their packages when
run.sh creates their conda environment. Pins were generated from each
model's main.py imports:
- 15 models: views-r2darts2>=1.0.0,<2.0.0
- 12 models: views-baseline>=1.0.0,<2.0.0
- 6 models: views-stepshifter>=1.0.0,<2.0.0
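One way to derive these pins from imports is to walk each main.py with `ast`. This is a sketch: the commit does not say which tool generated the files, and the `views_*` import names and `requirements_from_source` are assumptions.

```python
import ast

# Import name -> pinned requirement, mirroring the list above (import names assumed).
PINS = {
    "views_r2darts2": "views-r2darts2>=1.0.0,<2.0.0",
    "views_baseline": "views-baseline>=1.0.0,<2.0.0",
    "views_stepshifter": "views-stepshifter>=1.0.0,<2.0.0",
}


def requirements_from_source(source: str) -> list[str]:
    """Return pinned requirements for the top-level views_* packages imported in `source`."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Only absolute imports name an installable top-level package.
            found.add(node.module.split(".")[0])
    return sorted(PINS[pkg] for pkg in found if pkg in PINS)
```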

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Simplify integration test runner to activate a single conda env
(default: views_pipeline) instead of trying to use per-model envs
via run.sh. Adds --env and --exclude flags. Excludes purple_alien
by default (needs views-hydranet env).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…oint_metrics

CoreConfigSniffer in views-pipeline-core v2.2.0 requires the new key
names. Updates 44 models from the old generic keys to the type-specific
format. Also removes CRPS from regression_point_metrics (not meaningful
for point estimates).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The views-r2darts2 ReproducibilityGate requires core parameters
(random_state, optimizer_cls, lr_scheduler_*, early_stopping_min_delta,
gradient_clip_val, output_chunk_length/shift) that these models were
missing. Values taken from working reference models of the same arch.

Fixed: heat_waves, good_life, elastic_heart, teenage_dirtbag,
dancing_queen, cool_cat. Also cleaned up 60 lines of commented-out
sweep results in good_life.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ReproducibilityGate checks both core and architecture-specific
parameters. After fixing core params, 4 models still had missing
arch-specific params:

- cool_cat (TiDEModel): use_reversible_instance_norm
- good_life (TransformerModel): d_model, nhead, dim_feedforward,
  activation, norm_type, use_reversible_instance_norm, detect_anomaly
- heat_waves (TFTModel): dropout, add_relative_index,
  use_static_covariates, norm_type, skip_interpolation,
  hidden_continuous_size
- elastic_heart (TSMixerModel): use_static_covariates,
  use_reversible_instance_norm
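The core-plus-architecture rule reduces to a set difference. This sketch is hypothetical: `CORE_PARAMS` and `ARCH_PARAMS` are partial lists copied from the commit messages above (the `lr_scheduler_*` family is omitted), not the gate's real definitions.

```python
from typing import Any

# Partial, illustrative lists taken from the commit messages.
CORE_PARAMS = {
    "random_state", "optimizer_cls", "early_stopping_min_delta",
    "gradient_clip_val", "output_chunk_length", "output_chunk_shift",
}
ARCH_PARAMS = {
    "TiDEModel": {"use_reversible_instance_norm"},
    "TSMixerModel": {"use_static_covariates", "use_reversible_instance_norm"},
}


def missing_params(algorithm: str, hyperparameters: dict[str, Any]) -> set[str]:
    """Return required parameters absent from a model's hyperparameters."""
    required = CORE_PARAMS | ARCH_PARAMS.get(algorithm, set())
    return required - set(hyperparameters)
```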

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix leading space in bittersweet_symphony queryset name
- Rename targets→regression_targets in all 5 ensembles (wrap strings
  in lists)
- Rename metrics→regression_point_metrics in all 5 ensembles (remove
  CRPS — not meaningful for point estimates)
- Update ensemble test to expect new key names
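A sketch of the ensemble migration as a pure function over a config dict. `migrate_ensemble_keys` is hypothetical; it illustrates the rename, the list wrapping, and the CRPS removal described above.

```python
from typing import Any


def migrate_ensemble_keys(config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return a copy of `config` using the new key names."""
    migrated = dict(config)
    if "targets" in migrated:
        targets = migrated.pop("targets")
        # Wrap a bare string target in a list.
        migrated["regression_targets"] = [targets] if isinstance(targets, str) else list(targets)
    if "metrics" in migrated:
        # Drop CRPS: it is not meaningful for point estimates.
        migrated["regression_point_metrics"] = [m for m in migrated.pop("metrics") if m != "CRPS"]
    return migrated
```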

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…atchpad

Align documentation structure with platform convention (views-r2darts2,
views-hydranet pattern):
- Move ADRs, CICs, contributor_protocols, standards from base_docs/ to docs/
- Delete docs/internal/ (leftover scratchpad from prior session)
- Delete docs/model_catalog_old_pipeline.md (superseded by README catalogs)
- Create reports/ directory for future operational outputs
- Update all internal cross-references from base_docs/ to docs/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. ReproducibilityGate param validation for all 15 darts models —
   checks 16 core params + architecture-specific params per algorithm
   (TFT: 15, Transformer: 13, TiDE: 14, TSMixer: 12, TCN: 8,
   BlockRNN: 8, NBEATS: 10). Prevents the exact bug fixed in 6 models.

2. Old-key regression tests — verify no model or ensemble uses the
   deprecated "targets" or "metrics" keys (must be "regression_targets"
   and "regression_point_metrics").

3. Ensemble-model level agreement — verify all constituent models in
   an ensemble have the same level (cm/pgm) as the ensemble itself.
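Tests 2 and 3 reduce to simple predicates over config dicts. A minimal sketch, with hypothetical function names and configs assumed to be loaded as dicts already:

```python
from typing import Any

# Deprecated key -> replacement key, as named in the commit message.
DEPRECATED_KEYS = {"targets": "regression_targets", "metrics": "regression_point_metrics"}


def deprecated_keys_used(config: dict[str, Any]) -> list[str]:
    """Return deprecated keys still present in a model or ensemble config."""
    return sorted(key for key in DEPRECATED_KEYS if key in config)


def mismatched_levels(ensemble_level: str, model_levels: dict[str, str]) -> list[str]:
    """Return constituent models whose level ("cm" or "pgm") differs from the ensemble's."""
    return sorted(name for name, level in model_levels.items() if level != ensemble_level)
```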

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hardcoded DARTS_CORE_PARAMS and ARCH_PARAMS with dynamic
import from views_r2darts2.infrastructure.reproducibility_gate. This
eliminates the DRY violation — param requirements are now sourced
from the canonical definition. Tests skip when views_r2darts2 is not
installed.
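A stdlib sketch of the optional import, with a hypothetical helper name. Inside pytest, `pytest.importorskip(GATE_MODULE)` is the idiomatic equivalent: it returns the module or skips the test. The gate module's attribute names are not shown because the commit does not list them.

```python
import importlib
from types import ModuleType
from typing import Optional

GATE_MODULE = "views_r2darts2.infrastructure.reproducibility_gate"


def load_optional(module_name: str = GATE_MODULE) -> Optional[ModuleType]:
    """Return the imported module, or None when it (or its package) is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None
```

Catching `ImportError` also hides a broken views_r2darts2 install, which would then skip silently rather than fail.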

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The common/ directory was created then reverted — it no longer exists.
Remove the Shared Infrastructure row from the ontology table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The common/partitions.py centralization was attempted then reverted,
but governance docs still referenced it. Updated all ADRs, protocols,
and checklist to reflect the actual architecture: per-model self-
contained partition files with test-enforced consistency.
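Test-enforced consistency can be sketched as comparing each model's partitions against a reference definition. The shape of each partition dict is an assumption; the per-model partition files may be structured differently.

```python
from typing import Any


def inconsistent_partitions(
    reference: dict[str, Any], partitions_by_model: dict[str, dict[str, Any]]
) -> list[str]:
    """Return the names of models whose partition definition differs from `reference`."""
    return sorted(name for name, parts in partitions_by_model.items() if parts != reference)
```

Comparing whole dicts yields one pass/fail per model; a per-key diff would give clearer failure messages.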

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete the 6 ranger baseline models (black, blue, green, pink, red,
yellow) by committing their config_deployment, config_partitions,
config_queryset, config_sweep, main.py, and run.sh files. Only
config_meta, config_hyperparameters, and requirements.txt were
previously committed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allows running integration tests on only CM or PGM models:
  bash run_integration_tests.sh --level cm
  bash run_integration_tests.sh --level pgm

Extracts level from each model's config_meta.py via Python.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add docs/run_integration_tests.md with prerequisites, all flags,
internal mechanics, log structure, exit codes, and caveats.
Add Integration Testing section to README with quick-start examples
and options table. Add doc pointer to top of run_integration_tests.sh.
Uncomment *.txt in .gitignore to stop tracking model run log artifacts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… package

Allows running integration tests for a single model library at a time
(baseline, stepshifter, r2darts2, hydranet) by matching views-<name> in
each model's requirements.txt. Combinable with --level.
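The `--library` match can be approximated with an anchored regex over requirements.txt. This Python sketch only illustrates the matching rule; the runner itself is a shell script and `uses_library` is hypothetical. The negative lookahead stops `baseline` from matching a longer package name such as a hypothetical `views-baseline-extra`.

```python
import re


def uses_library(requirements_text: str, library: str) -> bool:
    """True if some requirement line names views-<library> (e.g. "views-baseline>=1.0.0")."""
    pattern = re.compile(rf"^\s*views-{re.escape(library)}(?![\w-])", re.MULTILINE)
    return bool(pattern.search(requirements_text))
```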

Also fixes exit-code precedence bug where failures without timeouts
exited 0, and untracks 3 data log .txt files already covered by .gitignore.
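The corrected precedence, in sketch form: any failure must yield a non-zero exit code whether or not timeouts also occurred. The codes 1 and 2, and letting failures outrank timeouts, are assumptions; docs/run_integration_tests.md documents the real values.

```python
def exit_code(statuses: list[str]) -> int:
    """Map per-model statuses ("ok", "failed", "timeout") to a process exit code."""
    if "failed" in statuses:
        return 1  # assumed code: at least one model failed
    if "timeout" in statuses:
        return 2  # assumed code: no failures, but something timed out
    return 0
```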

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runs the structural test suite (~2200 tests, <20s) on every push to
main/development and on all PRs. Requires only views_pipeline_core.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Polichinel merged commit f178d07 into development on Mar 17, 2026
1 check passed
Polichinel deleted the feature/samples_for_fao branch on March 17, 2026 19:23


1 participant