feat(backtesting): implement time-series backtesting module (PRP-6) #32
Merged
Add complete backtesting infrastructure for model evaluation:

- TimeSeriesSplitter with expanding/sliding window strategies and gap support
- MetricsCalculator with MAE, sMAPE, WAPE, Bias, and Stability Index
- BacktestingService for orchestrating backtests with baseline comparisons
- POST /backtesting/run endpoint with full response schema
- 95 unit tests covering schemas, splitter, metrics, and service
- Example scripts for API usage, split visualization, and metrics demo

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
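To make the splitter and metric terminology in this commit concrete, here is a minimal sketch of expanding and sliding window folds with a gap, plus common definitions of MAE, sMAPE, WAPE, and Bias. It is not the PR's code: the names `Fold`, `time_series_folds`, and `forecast_metrics`, the parameter names, the choice to place test windows back to back at the end of the series, the 0-200 sMAPE scale, and the sign convention for Bias (predicted minus actual) are all assumptions made for illustration. The Stability Index is left out because the PR text does not define it.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass(frozen=True)
class Fold:
    train: range  # indices available for fitting
    test: range   # indices held out for evaluation


def time_series_folds(
    n_samples: int,
    n_splits: int,
    horizon: int,
    gap: int = 0,
    strategy: str = "expanding",
) -> Iterator[Fold]:
    """Yield chronological folds (hypothetical helper, not the PR's API).

    Test windows of length `horizon` sit back to back at the end of the
    series. `gap` observations are skipped between train and test.
    """
    if strategy not in ("expanding", "sliding"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    first_test_start = n_samples - n_splits * horizon
    first_train_end = first_test_start - gap
    if first_train_end < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_splits):
        test_start = first_test_start + k * horizon
        train_end = test_start - gap
        # Expanding: always start at 0. Sliding: keep the first fold's width.
        train_start = 0 if strategy == "expanding" else train_end - first_train_end
        yield Fold(range(train_start, train_end),
                   range(test_start, test_start + horizon))


def forecast_metrics(
    actual: Sequence[float], predicted: Sequence[float]
) -> dict[str, float]:
    """Compute MAE, Bias, sMAPE (0-200 scale) and WAPE (percent)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    abs_error_sum = sum(abs(e) for e in errors)
    # A pair where actual and predicted are both zero contributes zero.
    smape_terms = [
        0.0 if abs(a) + abs(p) == 0 else 2 * abs(p - a) / (abs(a) + abs(p))
        for a, p in zip(actual, predicted)
    ]
    total_actual = sum(abs(a) for a in actual)
    return {
        "mae": abs_error_sum / n,
        "bias": sum(errors) / n,
        "smape": 100 * sum(smape_terms) / n,
        # WAPE is undefined when all actuals are zero; report NaN.
        "wape": 100 * abs_error_sum / total_actual if total_actual else float("nan"),
    }
```

With 120 samples, 3 splits, a horizon of 10, and a gap of 2, the expanding strategy trains on indices 0-87, 0-97, and 0-107, while the sliding strategy trains on 0-87, 10-97, and 20-107. In both cases every training window ends at least `gap` steps before its test window, which is the property a leakage check would assert.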
Sorry @w7-mgfcode, your pull request is larger than the review limit of 150000 diff characters
- README.md: Add backtesting endpoint, examples, and project structure
- ARCHITECTURE.md: Mark backtesting as implemented with full details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add 16 integration tests that run against a real PostgreSQL database:

- 8 route tests for the POST /backtesting/run endpoint
- 8 service tests for BacktestingService._load_series_data

Tests use the @pytest.mark.integration marker and require docker-compose.
Test data: 120 days of sequential sales (quantity = day number, 1-120).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
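The appeal of "quantity = day number" test data is that expected metric values can be worked out by hand. The sketch below rebuilds such a series and checks a naive last-value baseline against it. The helper names, the row shape, and the 2024-01-01 start date are assumptions for illustration (the start date is chosen to match the calendar range mentioned in a later commit on this PR), and the naive baseline stands in for whatever baselines the service actually compares against.

```python
from datetime import date, timedelta


def sequential_sales(start: date = date(2024, 1, 1), days: int = 120) -> list[dict]:
    """Build one row per day where quantity equals the 1-based day number."""
    return [
        {"date": start + timedelta(days=i), "quantity": i + 1}
        for i in range(days)
    ]


def naive_mae(train: list[int], test: list[int]) -> float:
    """MAE of a forecast that repeats the last training value."""
    last_value = train[-1]
    return sum(abs(q - last_value) for q in test) / len(test)


rows = sequential_sales()
quantities = [row["quantity"] for row in rows]

# Train on days 1-110, evaluate on days 111-120.
baseline_mae = naive_mae(quantities[:110], quantities[110:])
```

Because the series rises by exactly one unit per day, the naive forecast is off by 1, 2, ..., 10 across a 10-day horizon, so `baseline_mae` is 5.5 regardless of where the split falls.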
- Use savepoint-based transaction isolation instead of table drop/create
- Fix client dependency override to use async generator
- Format example files (inspect_splits.py, metrics_demo.py)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
w7-learn
previously approved these changes
Feb 1, 2026
Remove complex savepoint-based isolation that caused issues with FastAPI dependency injection. Use simpler session pattern that matches other working integration tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
w7-learn
previously approved these changes
Feb 1, 2026
- Generate unique store codes and SKUs using UUID per test
- Use merge() for calendar fixture to handle existing records
- Clean up test data after each test (SalesDaily, TEST-* stores/products)
- Preserve shared Calendar data between tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
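As a rough illustration of the per-test unique identifiers described above, the sketch below derives a code from a random UUID. The helper name `unique_code`, the 8-character suffix, and the `TEST-STORE` / `TEST-SKU` prefixes are hypothetical; the commit only states that codes are UUID-based and that cleanup targets `TEST-*` records.

```python
import uuid


def unique_code(prefix: str = "TEST") -> str:
    """Return a code such as 'TEST-1A2B3C4D' that differs on every call."""
    return f"{prefix}-{uuid.uuid4().hex[:8].upper()}"


store_code = unique_code("TEST-STORE")
sku = unique_code("TEST-SKU")
```

Keeping a shared `TEST-` prefix is what lets a cleanup step find and delete only the rows a test created while leaving shared data alone.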
w7-learn
previously approved these changes
Feb 1, 2026
w7-learn
previously approved these changes
Feb 1, 2026
6a0becd to
606e772
Compare
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
w7-learn
previously approved these changes
Feb 1, 2026
…te coercion

The strict=True config prevented Pydantic from automatically converting ISO date strings to date objects in JSON requests, causing 422 errors. Changed to extra="forbid" to still reject unknown fields while allowing normal type coercion.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
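The difference between the two settings can be reproduced in isolation. The sketch below assumes Pydantic v2 (`ConfigDict`, `model_validate`) and uses hypothetical model and field names rather than the PR's actual schemas. It validates an already-parsed dict, which is presumably how the request body reaches the model inside FastAPI.

```python
from datetime import date

from pydantic import BaseModel, ConfigDict, ValidationError


class StrictRequest(BaseModel):
    """Hypothetical schema using strict mode: no type coercion."""
    model_config = ConfigDict(strict=True)
    start_date: date


class ForbidExtraRequest(BaseModel):
    """Hypothetical schema that coerces types but rejects unknown fields."""
    model_config = ConfigDict(extra="forbid")
    start_date: date


def accepts(model: type[BaseModel], data: dict) -> bool:
    """Return True if `data` validates against `model`."""
    try:
        model.model_validate(data)
    except ValidationError:
        return False
    return True


payload = {"start_date": "2024-01-01"}

strict_ok = accepts(StrictRequest, payload)                     # string is not a date instance
coerced_ok = accepts(ForbidExtraRequest, payload)               # ISO string coerced to date
extra_ok = accepts(ForbidExtraRequest, {**payload, "oops": 1})  # unknown field present
```

Under these assumptions the strict model rejects the ISO string, the `extra="forbid"` model accepts it, and the same model still rejects a payload carrying an unknown key, which matches the behavior the commit message describes.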
w7-learn
previously approved these changes
Feb 1, 2026
Delete calendar entries from 2024-01-01 to 2024-04-29 during test cleanup to prevent conflicts with other test modules that insert calendar records in the same date range.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
w7-learn
approved these changes
Feb 1, 2026
Summary
Changes
New Module: app/features/backtesting/

- schemas.py - Pydantic schemas (SplitConfig, BacktestConfig, FoldResult, etc.)
- splitter.py - TimeSeriesSplitter with leakage validation
- metrics.py - MetricsCalculator with edge case handling
- service.py - BacktestingService orchestrator
- routes.py - FastAPI endpoint

Tests: 95 unit tests + 16 integration tests (111 total)

- test_schemas.py - Schema validation (28 tests)
- test_splitter.py - Splitter behavior (22 tests)
- test_metrics.py - Metrics calculation (28 tests)
- test_service.py - Service unit tests (17 tests)
- test_routes_integration.py - Route integration tests (8 tests)
- test_service_integration.py - Service integration tests (8 tests)

Examples:

- examples/backtest/run_backtest.py - API usage
- examples/backtest/inspect_splits.py - Split visualization
- examples/backtest/metrics_demo.py - Metrics explanation

Documentation:
Test plan
🤖 Generated with Claude Code