
feat(backtesting): wire config fields into implementation #34

Merged: w7-mgfcode merged 2 commits into dev from fix/wire-backtest-config on Feb 1, 2026


@w7-mgfcode (Owner) commented Feb 1, 2026

Summary

  • Add _validate_config() method to enforce settings constraints at runtime:
    • Validates n_splits does not exceed BACKTEST_MAX_SPLITS (default: 20)
    • Validates gap does not exceed BACKTEST_MAX_GAP (default: 30)
    • Logs a warning if min_train_size is below BACKTEST_DEFAULT_MIN_TRAIN_SIZE (default: 30)
  • Add save_results() method using BACKTEST_RESULTS_DIR for persisting backtest results as JSON
  • Add 6 unit tests for config validation and result saving functionality
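
A minimal sketch of the three checks listed above, written as a standalone function rather than the service's private _validate_config() method. The BacktestSettings dataclass is a hypothetical stand-in for the settings object in app/core/config.py (only the defaults 20, 30 and 30 come from this PR), and raising ValueError on the hard limits is an assumption; the PR only says those limits are validated.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class BacktestSettings:
    """Hypothetical stand-in for the backtest fields in app/core/config.py."""

    backtest_max_splits: int = 20
    backtest_max_gap: int = 30
    backtest_default_min_train_size: int = 30


def validate_config(
    n_splits: int, gap: int, min_train_size: int, settings: BacktestSettings
) -> None:
    """Sketch of _validate_config(): hard limits raise, the soft minimum only warns."""
    if n_splits > settings.backtest_max_splits:
        raise ValueError(
            f"n_splits={n_splits} exceeds BACKTEST_MAX_SPLITS={settings.backtest_max_splits}"
        )
    if gap > settings.backtest_max_gap:
        raise ValueError(f"gap={gap} exceeds BACKTEST_MAX_GAP={settings.backtest_max_gap}")
    if min_train_size < settings.backtest_default_min_train_size:
        # Small training windows are allowed but flagged, as the summary describes.
        logger.warning(
            "min_train_size=%d is below BACKTEST_DEFAULT_MIN_TRAIN_SIZE=%d",
            min_train_size,
            settings.backtest_default_min_train_size,
        )
```

Treating min_train_size as a warning rather than an error lets short exploratory runs proceed while still surfacing the risk of undertrained folds.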

This wires the four previously unused backtest settings in app/core/config.py (lines 50-54) into the backtesting implementation.
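
One way save_results() could use BACKTEST_RESULTS_DIR, sketched as a standalone function. The one-file-per-run layout (`<results_dir>/<backtest_id>.json`) and the `default=str` fallback for values such as dates are assumptions, not the PR's code; backtest_id mirrors the path parameter of the GET /backtesting/results/{backtest_id} route.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def save_results(results: dict[str, Any], backtest_id: str, results_dir: str | Path) -> Path:
    """Persist one backtest run as JSON and return the file path (hypothetical layout)."""
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the results directory on first use
    path = out_dir / f"{backtest_id}.json"
    # default=str turns values json can't encode natively (e.g. datetime.date) into strings
    path.write_text(json.dumps(results, indent=2, default=str), encoding="utf-8")
    return path
```

Because backtest_id becomes part of a file path, a real service would want to restrict it (for example to a UUID) before writing.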

Test plan

  • All 21 service tests pass
  • All 101 backtesting unit tests pass
  • mypy and pyright pass with no errors
  • ruff linting passes

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Backtesting with expanding and sliding window time-series cross-validation
    • Configurable gap parameter to simulate data latency
    • Comprehensive metric suite: MAE, sMAPE, WAPE, Bias, Stability Index
    • Baseline comparisons with Naive and Seasonal Naive models
    • Data lineage recording of actuals vs. predictions per fold
    • New API endpoint: POST /backtesting/run
  • Documentation

    • Backtesting protocol specification and architecture updates
    • Testing guide with example scripts
  • Tests

    • Comprehensive unit and integration test coverage for backtesting functionality
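
The expanding/sliding distinction and the gap parameter can be illustrated with an index-based splitter. This is a hypothetical sketch, not the PR's TimeSeriesSplitter: the parameter names, placing the test windows back to back at the end of the series, and using min_train_size as the sliding window length are all assumptions.

```python
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Fold:
    """Train/test positions for one fold (hypothetical; not the PR's TimeSeriesSplit)."""

    fold: int
    train: range
    test: range


def split(
    n_samples: int,
    n_splits: int,
    test_size: int,
    gap: int = 0,
    min_train_size: int = 30,
    strategy: str = "expanding",
) -> Iterator[Fold]:
    """Yield folds whose test windows sit back to back at the end of the series."""
    if strategy not in ("expanding", "sliding"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    if n_splits < 1 or test_size < 1 or min_train_size < 1 or gap < 0:
        raise ValueError("n_splits, test_size, min_train_size must be >= 1 and gap >= 0")
    first_test_start = n_samples - n_splits * test_size
    if first_test_start - gap < min_train_size:
        raise ValueError("series too short for the requested folds")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_end = test_start - gap  # exclusive end; `gap` points are skipped to mimic latency
        # Expanding windows always start at 0; sliding windows keep a fixed length.
        train_start = 0 if strategy == "expanding" else train_end - min_train_size
        fold = Fold(k, range(train_start, train_end), range(test_start, test_start + test_size))
        # Leakage check: the last training point must precede the first test point.
        if fold.train[-1] >= fold.test[0]:
            raise RuntimeError(f"fold {k} leaks future data into training")
        yield fold
```

With 120 daily points, 3 folds of 14 days and gap=7, the first fold trains on positions 0-70 and tests on 78-91; the skipped positions 71-77 stand in for data that would not yet be available at forecast time.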


- Add _validate_config() to enforce settings constraints:
  - Validate n_splits <= BACKTEST_MAX_SPLITS
  - Validate gap <= BACKTEST_MAX_GAP
  - Warn if min_train_size < BACKTEST_DEFAULT_MIN_TRAIN_SIZE
- Add save_results() method using BACKTEST_RESULTS_DIR
- Add unit tests for config validation and result saving

Closes issue with unused config fields in app/core/config.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@sourcery-ai (bot) left a comment


Sorry @w7-mgfcode, your pull request is larger than the review limit of 150000 diff characters

@coderabbitai (bot) commented Feb 1, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.


Walkthrough

This PR introduces a complete backtesting vertical for time-series forecasting, featuring time-based cross-validation with expanding/sliding windows and gap support, a comprehensive metrics suite (MAE, sMAPE, WAPE, Bias, Stability Index), baseline model comparisons, data lineage tracking per fold, and API endpoints for backtesting orchestration and result retrieval.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Documentation & Configuration | `INITIAL-6.md`, `README.md`, `docs/ARCHITECTURE.md`, `docs/validation/pytest-standard.md`, `app/core/config.py` | Added backtesting feature documentation, testing guidelines, an architecture spec update, and 4 new configuration settings (backtest_max_splits, backtest_default_min_train_size, backtest_max_gap, backtest_results_dir). |
| Core Data Models & Schemas | `app/features/backtesting/schemas.py` | Introduced 7 immutable Pydantic models (SplitConfig, BacktestConfig with a config_hash method, SplitBoundary, FoldResult, ModelBacktestResult, BacktestRequest, BacktestResponse) for versioned, hashable backtesting configuration and results. |
| Time-Series Splitting Logic | `app/features/backtesting/splitter.py` | Implemented a TimeSeriesSplitter class supporting expanding and sliding window strategies, a gap parameter for latency simulation, boundary extraction, and leakage validation; includes a TimeSeriesSplit dataclass for fold metadata. |
| Metrics Computation | `app/features/backtesting/metrics.py` | Created MetricsCalculator with static methods for MAE, sMAPE, WAPE, Bias, and Stability Index (with edge-case handling for zeros, empty arrays, and NaN filtering), plus aggregation utilities for per-fold metrics with standard-deviation tracking. |
| Backtesting Service & Orchestration | `app/features/backtesting/service.py` | Implemented BacktestingService to orchestrate end-to-end backtesting: data loading, per-fold train/predict/evaluate, baseline comparisons, leakage checking, result aggregation, and JSON persistence; includes a SeriesData container and internal helper methods. |
| API Routes & Module Export | `app/features/backtesting/routes.py`, `app/features/backtesting/__init__.py`, `app/main.py` | Added POST /backtesting/run and GET /backtesting/results/{backtest_id} endpoints; centralized public API exports in `__init__.py` via an `__all__` list; registered the router in the main app. |
| Test Fixtures & Scaffolding | `app/features/backtesting/tests/conftest.py` | Provided 16+ fixtures for integration testing, including an async DB session, HTTP client, sample store/product, 120-day calendar and sales data, date sequences, and BacktestConfig variants for expanding, sliding, and gap scenarios. |
| Unit & Integration Tests | `app/features/backtesting/tests/test_*.py` (metrics, schemas, splitter, routes_integration, service, service_integration) | Comprehensive test coverage (2050+ lines) validating metrics calculations, schema validation, splitter behavior across strategies and gaps, API integration with a real DB, service orchestration, and end-to-end backtesting flows. |
| Example Scripts | `examples/backtest/inspect_splits.py`, `examples/backtest/metrics_demo.py`, `examples/backtest/run_backtest.py` | Three example scripts demonstrating TimeSeriesSplitter visualization, MetricsCalculator usage across scenarios, and end-to-end backtest execution via the HTTP API with result parsing and display. |
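
For reference, common definitions of four of the five metrics, with the edge cases the Metrics Computation row mentions (NaN filtering, empty input, zero denominators). These are textbook formulas chosen for illustration, not the PR's MetricsCalculator: returning NaN for empty input, the 0-200 sMAPE scale (some implementations drop the factor 2 for a 0-100 scale), and the prediction-minus-actual sign for Bias are assumptions. The Stability Index is left out because the PR does not define it.

```python
import math
from collections.abc import Sequence


def _pairs(actuals: Sequence[float], predictions: Sequence[float]) -> list[tuple[float, float]]:
    """Pair values up and drop any pair containing NaN."""
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    return [(a, p) for a, p in zip(actuals, predictions) if not (math.isnan(a) or math.isnan(p))]


def mae(actuals: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean absolute error; NaN when no valid pairs remain."""
    pairs = _pairs(actuals, predictions)
    if not pairs:
        return math.nan
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


def smape(actuals: Sequence[float], predictions: Sequence[float]) -> float:
    """Symmetric MAPE on a 0-200 scale; a pair where both values are 0 contributes 0."""
    pairs = _pairs(actuals, predictions)
    if not pairs:
        return math.nan
    total = 0.0
    for a, p in pairs:
        denom = abs(a) + abs(p)
        if denom > 0:
            total += 2 * abs(a - p) / denom
    return 100 * total / len(pairs)


def wape(actuals: Sequence[float], predictions: Sequence[float]) -> float:
    """Total absolute error over total absolute actuals (%); NaN if actuals sum to 0."""
    pairs = _pairs(actuals, predictions)
    denom = sum(abs(a) for a, _ in pairs)
    if denom == 0:
        return math.nan
    return 100 * sum(abs(a - p) for a, p in pairs) / denom


def bias(actuals: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean of prediction - actual; > 0 means over-forecasting (assumed sign convention)."""
    pairs = _pairs(actuals, predictions)
    if not pairs:
        return math.nan
    return sum(p - a for a, p in pairs) / len(pairs)
```

WAPE is the usual choice for intermittent retail demand, because per-point percentage errors blow up on zero-sales days while the aggregated denominator does not.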

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API Route
    participant BacktestingService
    participant Database
    participant TimeSeriesSplitter
    participant Model
    participant MetricsCalculator

    Client->>API Route: POST /backtesting/run (store, product, dates, config)
    API Route->>BacktestingService: run_backtest(db, config, dates)

    BacktestingService->>Database: _load_series_data(store_id, product_id, date_range)
    Database-->>BacktestingService: SeriesData (dates, values)

    BacktestingService->>TimeSeriesSplitter: split(dates, values)
    TimeSeriesSplitter-->>BacktestingService: Iterator[TimeSeriesSplit] (train/test indices per fold)

    loop For each fold
        BacktestingService->>Model: train(X_train, y_train)
        Model-->>BacktestingService: trained_model
        BacktestingService->>Model: predict(X_test)
        Model-->>BacktestingService: predictions

        BacktestingService->>MetricsCalculator: calculate_all(actuals, predictions)
        MetricsCalculator-->>BacktestingService: fold_metrics (MAE, sMAPE, WAPE, Bias)
    end

    BacktestingService->>MetricsCalculator: aggregate_fold_metrics(all_fold_metrics)
    MetricsCalculator-->>BacktestingService: aggregated_metrics, stability_indices

    BacktestingService-->>API Route: BacktestResponse (main results, baselines, comparison summary)
    API Route-->>Client: 200 OK with BacktestResponse JSON
```
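
The per-fold loop in the diagram (predict, score, then aggregate) can be sketched with a naive baseline standing in for the model. Everything here is hypothetical: folds are passed as (train, test) index ranges, a forecaster is any callable taking the training history and a horizon, and the aggregate uses the population standard deviation, which is an assumption about how spread across folds is tracked.

```python
import statistics
from collections.abc import Callable, Iterable, Sequence

# A forecaster receives the training history and a horizon and returns that many predictions.
Forecaster = Callable[[Sequence[float], int], list[float]]


def naive_forecast(history: Sequence[float], horizon: int) -> list[float]:
    """Naive baseline: repeat the last observed training value."""
    return [history[-1]] * horizon


def run_folds(
    values: Sequence[float],
    folds: Iterable[tuple[range, range]],
    forecaster: Forecaster,
) -> dict[str, object]:
    """Score each fold with MAE, then aggregate the per-fold scores."""
    fold_mae: list[float] = []
    for train_idx, test_idx in folds:
        history = [values[i] for i in train_idx]
        actuals = [values[i] for i in test_idx]
        predictions = forecaster(history, len(actuals))  # "train" and predict in one step
        errors = [abs(a - p) for a, p in zip(actuals, predictions)]
        fold_mae.append(sum(errors) / len(errors))
    return {
        "fold_mae": fold_mae,
        "mae_mean": statistics.fmean(fold_mae),
        "mae_std": statistics.pstdev(fold_mae),  # population std across folds
    }
```

Swapping naive_forecast for a seasonal-naive callable or a fitted model's predict step leaves the loop unchanged, which is what keeps baseline comparisons cheap.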

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes


Suggested reviewers

  • w7-learn

Poem

🐰 Hops through time with expanding grace,
Splits and gaps in data's embrace,
Metrics bloom like clover in spring,
Backtests verified, comparisons sing! 🌱✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main focus: wiring config fields (backtest_max_splits, backtest_max_gap, backtest_default_min_train_size, backtest_results_dir) from app/core/config.py into the BacktestingService implementation via the _validate_config() and save_results() methods. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 97.59%, which is sufficient. The required threshold is 80.00%. |



- Add SIGNED_METRICS class constant to identify signed metrics (e.g., "bias")
- Update _generate_comparison_summary to use absolute values for
  percentage improvement calculations on signed metrics
- Original signed values are preserved in main/naive/seasonal_naive keys
- Add 3 unit tests for signed metric handling:
  - test_comparison_signed_metric_uses_absolute_values
  - test_comparison_signed_metric_positive_values
  - test_comparison_signed_metric_mixed_signs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
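
A minimal sketch of the signed-metric rule in this commit: for metrics in SIGNED_METRICS the improvement percentage compares magnitudes, so a main-model bias of +2 against a naive bias of -4 counts as a 50% improvement instead of a misleading 150%. The improvement formula (baseline - main) / baseline * 100 and the vs_* key names are assumptions; only SIGNED_METRICS, the main/naive/seasonal_naive keys, and the use of absolute values come from the commit message.

```python
from typing import Optional

SIGNED_METRICS = frozenset({"bias"})  # metrics whose sign shows direction, not quality


def improvement_pct(metric: str, main: float, baseline: float) -> Optional[float]:
    """Percent by which `main` beats `baseline` for a lower-is-better metric."""
    if metric in SIGNED_METRICS:
        # For signed metrics, closeness to zero is what matters, so compare magnitudes.
        main, baseline = abs(main), abs(baseline)
    if baseline == 0:
        return None  # improvement is undefined against a perfect baseline
    return 100 * (baseline - main) / baseline


def comparison_entry(
    metric: str, main: float, naive: float, seasonal_naive: float
) -> dict[str, Optional[float]]:
    """Keep the original signed values and add (hypothetical) improvement keys."""
    return {
        "main": main,
        "naive": naive,
        "seasonal_naive": seasonal_naive,
        "vs_naive_pct": improvement_pct(metric, main, naive),
        "vs_seasonal_naive_pct": improvement_pct(metric, main, seasonal_naive),
    }
```

Unsigned metrics such as MAE pass through unchanged, so the same helper serves every row of the comparison summary.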
@w7-mgfcode w7-mgfcode changed the base branch from main to dev February 1, 2026 05:02
@w7-mgfcode w7-mgfcode merged commit daef9ce into dev Feb 1, 2026
6 of 7 checks passed
@w7-mgfcode w7-mgfcode deleted the fix/wire-backtest-config branch February 1, 2026 05:06