Merged
Companion change to views-baseline PR #3. Standardizes the config key for trailing training window from "months" to "window_months" and adds the explicit "time_steps" key (forecast horizon, previously a hidden default of 36 in views-baseline). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add explicit `prediction_format: "dataframe"` to 54 models that previously relied on the implicit default. Prerequisite for making prediction_format a required key in pipeline-core (Phase 3A). Also includes baseline config updates and calibration log refreshes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tectural risks

- Add mandatory config keys (time_steps, rolling_origin_stride) to 52+ models
- Replace exec() with importlib.util in create_catalogs.py for safer config loading
- Migrate 13 old-pattern models to ForecastingModelArgs CLI API
- Fix heat_waves/hot_stream forecasting offset bug (-2 → -1)
- Add comprehensive test suite (2029 tests):
  - Config completeness, structure, CLI pattern, partition consistency
  - Ensemble config validation and dependency checking
  - Red-team failure injection tests
- Add base_docs governance: 9 ADRs, 3 CICs, contributor protocols, standards
- Remove dead code from create_catalogs.py and purple_alien/main.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
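As an illustration of the exec() to importlib.util change, here is a minimal sketch of loading a config file as a module. It is not the code from create_catalogs.py; the helper name `load_config_module`, the function `get_meta_config`, and the demo keys are assumptions made for the example. The file is still executed, but inside its own module namespace instead of the caller's.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_config_module(path):
    """Import a Python config file as a module without calling exec() on its text."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file inside its own module namespace
    return module


# Demo with a throwaway config file; the function name and keys are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config_meta.py"
    cfg_path.write_text(
        "def get_meta_config():\n"
        "    return {'name': 'demo_model', 'level': 'cm'}\n"
    )
    meta = load_config_module(cfg_path).get_meta_config()
```

The same pattern would also cover reading a model's level from its config_meta.py, assuming that file exposes a function returning a dict.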
Remove files superseded by the test suite or no longer referenced:

- reports/archived/ (15 old training/sweep logs from Feb 2025)
- verify_architecture.py (one-off NBEATS debug investigation)
- compare_configs.py (hardcoded 3-model check, replaced by test suite)

Also add .ruff_cache/ to .gitignore and update ADR-001 ontology.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unused imports (sys, re, pytest, REPO_ROOT, ALL_MODEL_DIRS) and rename ambiguous variable `l` to `line` in test_catalogs.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shell script that trains and evaluates each model on calibration and validation partitions, logging results per model without aborting on failure. Supports --models, --partitions, and --timeout flags. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Models without requirements.txt fail to install packages when run.sh creates their conda environment. Generated from main.py imports:

- 15 models: views-r2darts2>=1.0.0,<2.0.0
- 12 models: views-baseline>=1.0.0,<2.0.0
- 6 models: views-stepshifter>=1.0.0,<2.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Simplify integration test runner to activate a single conda env (default: views_pipeline) instead of trying to use per-model envs via run.sh. Adds --env and --exclude flags. Excludes purple_alien by default (needs views-hydranet env). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…oint_metrics

CoreConfigSniffer in views-pipeline-core v2.2.0 requires the new key names. Updates 44 models from the old generic keys to the type-specific format. Also removes CRPS from regression_point_metrics (not meaningful for point estimates).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The views-r2darts2 ReproducibilityGate requires core parameters (random_state, optimizer_cls, lr_scheduler_*, early_stopping_min_delta, gradient_clip_val, output_chunk_length/shift) that these models were missing. Values taken from working reference models of the same arch. Fixed: heat_waves, good_life, elastic_heart, teenage_dirtbag, dancing_queen, cool_cat. Also cleaned up 60 lines of commented-out sweep results in good_life. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ReproducibilityGate checks both core and architecture-specific parameters. After fixing core params, 4 models still had missing arch-specific params:

- cool_cat (TiDEModel): use_reversible_instance_norm
- good_life (TransformerModel): d_model, nhead, dim_feedforward, activation, norm_type, use_reversible_instance_norm, detect_anomaly
- heat_waves (TFTModel): dropout, add_relative_index, use_static_covariates, norm_type, skip_interpolation, hidden_continuous_size
- elastic_heart (TSMixerModel): use_static_covariates, use_reversible_instance_norm

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix leading space in bittersweet_symphony queryset name
- Rename targets→regression_targets in all 5 ensembles (wrap strings in lists)
- Rename metrics→regression_point_metrics in all 5 ensembles (remove CRPS, which is not meaningful for point estimates)
- Update ensemble test to expect new key names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
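The key renames listed above could be expressed as a small migration helper. This is a sketch only: the commit edited the config files directly, and the helper name, the target name `some_target`, and the metric name `MSE` are placeholders chosen for the example.

```python
def migrate_ensemble_keys(config):
    """Return a copy of `config` that uses the new key names."""
    new = dict(config)
    if "targets" in new:
        targets = new.pop("targets")
        # A bare string becomes a one-element list.
        new["regression_targets"] = [targets] if isinstance(targets, str) else list(targets)
    if "metrics" in new:
        metrics = new.pop("metrics")
        metrics = [metrics] if isinstance(metrics, str) else list(metrics)
        # CRPS is dropped because it is not meaningful for point estimates.
        new["regression_point_metrics"] = [m for m in metrics if m != "CRPS"]
    return new


# Placeholder config; the values are illustrative, not taken from a real ensemble.
old = {"name": "demo_ensemble", "targets": "some_target", "metrics": ["MSE", "CRPS"]}
new = migrate_ensemble_keys(old)
```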
…atchpad

Align documentation structure with platform convention (views-r2darts2, views-hydranet pattern):

- Move ADRs, CICs, contributor_protocols, standards from base_docs/ to docs/
- Delete docs/internal/ (leftover scratchpad from prior session)
- Delete docs/model_catalog_old_pipeline.md (superseded by README catalogs)
- Create reports/ directory for future operational outputs
- Update all internal cross-references from base_docs/ to docs/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. ReproducibilityGate param validation for all 15 darts models: checks 16 core params + architecture-specific params per algorithm (TFT: 15, Transformer: 13, TiDE: 14, TSMixer: 12, TCN: 8, BlockRNN: 8, NBEATS: 10). Prevents the exact bug fixed in 6 models.
2. Old-key regression tests: verify no model or ensemble uses the deprecated "targets" or "metrics" keys (must be "regression_targets" and "regression_point_metrics").
3. Ensemble-model level agreement: verify all constituent models in an ensemble have the same level (cm/pgm) as the ensemble itself.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
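The ensemble-model level agreement check (item 3) might look roughly like the following. This is a hedged sketch, not the test from the suite: the `models` key, the helper name, and the demo model names are assumptions; only the `level` values cm/pgm come from the description above.

```python
def find_level_mismatches(ensemble_meta, model_metas):
    """Return the names of constituent models whose level differs from the ensemble's."""
    expected = ensemble_meta["level"]
    return sorted(
        name
        for name in ensemble_meta["models"]
        if model_metas[name]["level"] != expected
    )


# Hypothetical metadata: one constituent model is on the wrong level.
ensemble = {"name": "demo_ensemble", "level": "cm", "models": ["model_a", "model_b"]}
models = {"model_a": {"level": "cm"}, "model_b": {"level": "pgm"}}
mismatches = find_level_mismatches(ensemble, models)
```

A test built on this would simply assert that the returned list is empty for every ensemble.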
Replace hardcoded DARTS_CORE_PARAMS and ARCH_PARAMS with dynamic import from views_r2darts2.infrastructure.reproducibility_gate. This eliminates the DRY violation — param requirements are now sourced from the canonical definition. Tests skip when views_r2darts2 is not installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
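One way to get the "skip when not installed" behavior is to probe for the package before importing the gate module. The sketch below assumes nothing about what the module exports; the helper name is hypothetical, and the demo calls use a made-up missing package and a standard-library module so the example runs without views_r2darts2.

```python
import importlib
import importlib.util


def load_gate_module(package="views_r2darts2",
                     module="views_r2darts2.infrastructure.reproducibility_gate"):
    """Return the imported module, or None when the top-level package is not installed."""
    if importlib.util.find_spec(package) is None:
        return None  # caller should skip the dependent tests
    return importlib.import_module(module)


# Stand-ins for demonstration: a package that does not exist, and a stdlib module.
missing = load_gate_module("surely_missing_pkg_xyz", "surely_missing_pkg_xyz.gate")
present = load_gate_module("json", "json.decoder")
```

In a pytest suite, `pytest.importorskip` provides the same skip-on-missing behavior in one call.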
The common/ directory was created then reverted — it no longer exists. Remove the Shared Infrastructure row from the ontology table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The common/partitions.py centralization was attempted then reverted, but governance docs still referenced it. Updated all ADRs, protocols, and checklist to reflect the actual architecture: per-model self-contained partition files with test-enforced consistency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete the 6 ranger baseline models (black, blue, green, pink, red, yellow) by committing their config_deployment, config_partitions, config_queryset, config_sweep, main.py, and run.sh files. Only config_meta, config_hyperparameters, and requirements.txt were previously committed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allows running integration tests on only CM or PGM models:

    bash run_integration_tests.sh --level cm
    bash run_integration_tests.sh --level pgm

Extracts level from each model's config_meta.py via Python.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add docs/run_integration_tests.md with prerequisites, all flags, internal mechanics, log structure, exit codes, and caveats. Add Integration Testing section to README with quick-start examples and options table. Add doc pointer to top of run_integration_tests.sh. Uncomment *.txt in .gitignore to stop tracking model run log artifacts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… package

Allows running integration tests for a single model library at a time (baseline, stepshifter, r2darts2, hydranet) by matching views-<name> in each model's requirements.txt. Combinable with --level.

Also fixes exit-code precedence bug where failures without timeouts exited 0, and untracks 3 data log .txt files already covered by .gitignore.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
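The library filter described above matches views-<name> against each model's requirements.txt. The actual script is a shell script; the Python sketch below only illustrates the matching idea, and the helper name and the choice to anchor at the start of a line are assumptions.

```python
import re


def uses_library(requirements_text, library):
    """Return True if any requirement line starts with views-<library>."""
    # The lookahead stops "views-baseline" from matching a longer package name.
    pattern = re.compile(rf"^\s*views-{re.escape(library)}(?![\w-])", re.MULTILINE)
    return bool(pattern.search(requirements_text))


# Requirement string taken from the format used in the generated requirements.txt files.
reqs = "views-r2darts2>=1.0.0,<2.0.0\n"
is_darts = uses_library(reqs, "r2darts2")
is_baseline = uses_library(reqs, "baseline")
```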
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runs the structural test suite (~2200 tests, <20s) on every push to main/development and on all PRs. Requires only views_pipeline_core. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>