feat(backtesting): implement time-series backtesting module (PRP-6) by w7-mgfcode · Pull Request #32 · w7-mgfcode/ForecastLabAI

w7-mgfcode · 2026-02-01T03:54:04Z

Summary

Implement complete backtesting infrastructure for time-series model evaluation (PRP-6)
Add TimeSeriesSplitter with expanding/sliding window strategies and configurable gap parameter
Add MetricsCalculator with MAE, sMAPE (0-200), WAPE, Bias, and Stability Index
Add BacktestingService for orchestrating backtests with mandatory baseline comparisons
Add POST /backtesting/run endpoint with full response schema
Add comprehensive integration tests for routes and service layer
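The following is a minimal sketch of how a splitter can lay out folds over integer positions, to make the expanding/sliding strategies and the gap parameter concrete. It is not the TimeSeriesSplitter from this PR. The function name `split`, the `Fold` dataclass, and the parameters `n_splits`, `horizon`, and `min_train_size` are assumptions. Only the two strategy names and the idea of a gap come from the description above.

```python
from collections.abc import Iterator
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Fold:
    """Half-open index ranges: train [train_start, train_end), test [test_start, test_end)."""

    train_start: int
    train_end: int
    test_start: int
    test_end: int


def split(
    n_obs: int,
    n_splits: int,
    horizon: int,
    min_train_size: int,
    gap: int = 0,
    strategy: Literal["expanding", "sliding"] = "expanding",
) -> Iterator[Fold]:
    """Yield chronological folds whose test windows advance by `horizon` each time."""
    if gap < 0 or horizon < 1 or min_train_size < 1 or n_splits < 1:
        raise ValueError("gap must be >= 0; horizon, min_train_size, n_splits must be >= 1")
    if strategy not in ("expanding", "sliding"):
        raise ValueError(f"unknown strategy: {strategy}")
    # The last fold's test window ends at min_train_size + gap + n_splits * horizon.
    needed = min_train_size + gap + n_splits * horizon
    if needed > n_obs:
        raise ValueError(f"need {needed} observations, got {n_obs}")
    for k in range(n_splits):
        train_end = min_train_size + k * horizon
        # Expanding: train always starts at 0. Sliding: fixed window of min_train_size.
        train_start = 0 if strategy == "expanding" else train_end - min_train_size
        # The gap leaves `gap` observations unused between train and test.
        test_start = train_end + gap
        fold = Fold(train_start, train_end, test_start, test_start + horizon)
        # Defensive leakage check: every training index must precede every test index.
        if fold.train_end > fold.test_start:
            raise ValueError(f"leakage in fold {k}: {fold}")
        yield fold


if __name__ == "__main__":
    for fold in split(n_obs=11, n_splits=3, horizon=2, min_train_size=4, gap=1):
        print(fold)
    # Fold(train_start=0, train_end=4, test_start=5, test_end=7)
    # Fold(train_start=0, train_end=6, test_start=7, test_end=9)
    # Fold(train_start=0, train_end=8, test_start=9, test_end=11)
```

`split` is a generator, so invalid arguments raise `ValueError` on the first iteration, not at call time. The sliding strategy keeps every training window at `min_train_size` observations. The expanding strategy always trains from index 0.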

Changes

New Module: app/features/backtesting/

schemas.py - Pydantic schemas (SplitConfig, BacktestConfig, FoldResult, etc.)
splitter.py - TimeSeriesSplitter with leakage validation
metrics.py - MetricsCalculator with edge case handling
service.py - BacktestingService orchestrator
routes.py - FastAPI endpoint
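The metrics listed above can be sketched in plain Python as follows. These formulas are the common textbook definitions and are assumed, not taken from metrics.py:

- sMAPE uses the 0-200 scale.
- WAPE is reported as a percentage.
- Bias is the mean of forecast minus actual.

The zero-denominator handling is one possible convention for the edge cases the PR mentions. For sMAPE, a pair where actual and forecast are both 0 counts as zero error. WAPE returns NaN when every actual is 0. The Stability Index is omitted because this PR does not define it.

```python
import math
from collections.abc import Sequence


def _check(actual: Sequence[float], forecast: Sequence[float]) -> None:
    if len(actual) == 0 or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and the same length")


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(actual, forecast)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE on the 0-200 scale; a pair of two zeros counts as 0 error."""
    _check(actual, forecast)
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        total += 0.0 if denom == 0 else 2 * abs(a - f) / denom
    return 100 * total / len(actual)


def wape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted absolute percentage error: sum |a - f| / sum |a|, as a percentage."""
    _check(actual, forecast)
    denom = sum(abs(a) for a in actual)
    if denom == 0:
        return math.nan  # assumed convention: undefined when every actual is 0
    return 100 * sum(abs(a - f) for a, f in zip(actual, forecast)) / denom


def bias(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean signed error (forecast - actual); positive means over-forecasting."""
    _check(actual, forecast)
    return sum(f - a for a, f in zip(actual, forecast)) / len(actual)


if __name__ == "__main__":
    actual, forecast = [10, 20, 0], [12, 18, 0]
    print(mae(actual, forecast), smape(actual, forecast), wape(actual, forecast), bias(actual, forecast))
```

For the sample above, MAE is 4/3, sMAPE is about 9.57, WAPE is about 13.33 and Bias is 0.0. The over- and under-forecast cancel out, so Bias cannot stand in for an error magnitude.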

Tests: 95 unit tests + 16 integration tests (111 total)

test_schemas.py - Schema validation (28 tests)
test_splitter.py - Splitter behavior (22 tests)
test_metrics.py - Metrics calculation (28 tests)
test_service.py - Service unit tests (17 tests)
test_routes_integration.py - Route integration tests (8 tests)
test_service_integration.py - Service integration tests (8 tests)

Examples:

examples/backtest/run_backtest.py - API usage
examples/backtest/inspect_splits.py - Split visualization
examples/backtest/metrics_demo.py - Metrics explanation

Documentation:

Updated README.md with testing section
Updated docs/validation/pytest-standard.md with integration test patterns

Test plan

🤖 Generated with Claude Code

Add complete backtesting infrastructure for model evaluation:

- TimeSeriesSplitter with expanding/sliding window strategies and gap support
- MetricsCalculator with MAE, sMAPE, WAPE, Bias, and Stability Index
- BacktestingService for orchestrating backtests with baseline comparisons
- POST /backtesting/run endpoint with full response schema
- 95 unit tests covering schemas, splitter, metrics, and service
- Example scripts for API usage, split visualization, and metrics demo

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

sourcery-ai

Sorry @w7-mgfcode, your pull request is larger than the review limit of 150000 diff characters

coderabbitai · 2026-02-01T03:54:10Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

- README.md: Add backtesting endpoint, examples, and project structure
- ARCHITECTURE.md: Mark backtesting as implemented with full details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add 16 integration tests that run against a real PostgreSQL database:

- 8 route tests for the POST /backtesting/run endpoint
- 8 service tests for BacktestingService._load_series_data

Tests use the @pytest.mark.integration marker and require docker-compose. Test data: 120 days of sequential sales (quantity = day number, 1-120).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
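The test series described above is easy to reason about, because every expected value can be computed by hand. The following is a minimal sketch of generating it. It assumes the series starts on 2024-01-01, the start of the calendar cleanup range named in a later commit. The helper name and the row keys `date` and `quantity` are hypothetical.

```python
from datetime import date, timedelta

START = date(2024, 1, 1)  # assumed start of the test series
N_DAYS = 120


def build_sequential_sales(start: date = START, n_days: int = N_DAYS) -> list[dict[str, object]]:
    """Hypothetical fixture helper: one row per day with quantity = day number (1-based)."""
    return [{"date": start + timedelta(days=i), "quantity": i + 1} for i in range(n_days)]


if __name__ == "__main__":
    rows = build_sequential_sales()
    print(rows[0], rows[-1])
    print(sum(row["quantity"] for row in rows))  # 7260
```

2024 is a leap year, so 120 consecutive days from 2024-01-01 end exactly on 2024-04-29. The quantities sum to 120 * 121 / 2 = 7260.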

- Use savepoint-based transaction isolation instead of table drop/create
- Fix client dependency override to use an async generator
- Format example files (inspect_splits.py, metrics_demo.py)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove the complex savepoint-based isolation, which caused issues with FastAPI dependency injection. Use a simpler session pattern that matches the other working integration tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Generate unique store codes and SKUs using a UUID per test
- Use merge() for the calendar fixture to handle existing records
- Clean up test data after each test (SalesDaily, TEST-* stores/products)
- Preserve shared Calendar data between tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
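The per-test UUID naming can be sketched as below. The helper names and the exact code format (`TEST-STORE-<8 hex chars>`, `TEST-SKU-<8 hex chars>`) are hypothetical. The only detail taken from the commit is that test stores and products carry a `TEST-` prefix, so cleanup can target them.

```python
import uuid

TEST_PREFIX = "TEST-"  # cleanup targets stores/products whose code starts with this


def unique_store_code() -> str:
    """Hypothetical helper: a store code that is very unlikely to collide across tests."""
    return f"{TEST_PREFIX}STORE-{uuid.uuid4().hex[:8].upper()}"


def unique_sku() -> str:
    """Hypothetical helper: a product SKU with the same TEST- prefix."""
    return f"{TEST_PREFIX}SKU-{uuid.uuid4().hex[:8].upper()}"


# SQL LIKE pattern a cleanup query could use to match every generated code
CLEANUP_LIKE_PATTERN = f"{TEST_PREFIX}%"
```

Eight hex characters give 32 random bits, so collisions between tests are very unlikely but not impossible. Using the full `uuid4().hex` removes that risk at the cost of longer codes.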

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…te coercion

The strict=True config prevented Pydantic from automatically converting ISO date strings to date objects in JSON requests, causing 422 errors. Changed to extra="forbid", which still rejects unknown fields while allowing normal type coercion.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
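The difference between the two configurations can be reproduced with Pydantic v2 directly. In this sketch the model names and the `start_date` field are hypothetical. FastAPI validates a request body after decoding the JSON, and validating a plain dict with `model_validate` roughly mirrors that step. This is why `strict=True` turned ISO date strings into 422 errors.

```python
from datetime import date

from pydantic import BaseModel, ConfigDict, ValidationError


class StrictSplitConfig(BaseModel):
    """strict=True: no type coercion, so an ISO string is rejected for a date field."""

    model_config = ConfigDict(strict=True)
    start_date: date


class ForbidExtraSplitConfig(BaseModel):
    """extra='forbid': unknown keys are rejected, but '2024-01-01' is still coerced to a date."""

    model_config = ConfigDict(extra="forbid")
    start_date: date


if __name__ == "__main__":
    payload = {"start_date": "2024-01-01"}  # a decoded JSON body: dates arrive as strings

    try:
        StrictSplitConfig.model_validate(payload)
    except ValidationError as exc:
        print("strict rejected:", exc.errors()[0]["type"])

    print("forbid accepted:", ForbidExtraSplitConfig.model_validate(payload).start_date)

    try:
        ForbidExtraSplitConfig.model_validate({**payload, "unexpected": 1})
    except ValidationError as exc:
        print("forbid rejected extra key:", exc.errors()[0]["type"])  # extra_forbidden
```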

Delete calendar entries from 2024-01-01 to 2024-04-29 during test cleanup to prevent conflicts with other test modules that insert calendar records in the same date range.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

w7-learn and others added 2 commits February 1, 2026 03:20

docs: update INITIAL-6.md

d552240

sourcery-ai Bot reviewed Feb 1, 2026

w7-learn and others added 4 commits February 1, 2026 03:57

docs: update documentation for backtesting module (PRP-6)

f4370d1

chore: update uv.lock version to 0.1.7

019a38f

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>