chore: merge dev to main (Phase 1 Data Platform)#13
Conversation
* docs: add PRP-2 for data platform schema and migrations Create comprehensive Product Requirements Prompt for INITIAL-2 (Data Platform) including: - Mini-warehouse schema with dimension tables (store, product, calendar) - Fact tables (sales_daily, price_history, promotion, inventory_snapshot_daily) - SQLAlchemy 2.0 ORM model patterns with type annotations - Alembic migration guidance and constraint naming conventions - Grain protection via unique constraints for idempotent upserts - Composite indexes for time-series query optimization - Unit and integration test specifications - Example SQL queries for KPIs and exogenous feature joins Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(data-platform): implement PRP-2 schema and migrations Add mini-warehouse schema for retail demand forecasting: Dimension tables: - store: locations with code, region, city, store_type - product: catalog with sku, category, brand, base_price - calendar: time dimension with day_of_week, is_holiday Fact tables: - sales_daily: grain-protected (date, store_id, product_id) - price_history: validity windows with valid_from/valid_to - promotion: discount mechanics with pct and amount - inventory_snapshot_daily: stockout detection Key features: - SQLAlchemy 2.0 patterns (Mapped[], mapped_column()) - Grain protection via UniqueConstraint for idempotent upserts - Check constraints for data quality (positive qty, valid dates) - Composite indexes for time-range + store/product queries - Pydantic v2 schemas for API validation - 32 unit tests + 11 integration tests Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add Phase 1 (Data Platform) documentation - Update README.md with migration step and database schema section - Update ARCHITECTURE.md section 5 to reflect implemented data platform - Update PHASE-index.md with Phase 1 status and version history - Create docs/PHASE/1-DATA_PLATFORM.md with detailed phase documentation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: address code review feedback from Sourcery - Add `from __future__ import annotations` and remove quoted type annotations - Fix SQL discount calculation to distinguish NULL (no price) from 0% discount - Add positive-path test for Calendar insertion - Remove redundant single-column indexes covered by composite indexes - Add documentation comments explaining fixture duplication for pytest discovery Addresses: Comments 1-4 and overall comments from Sourcery review Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(ci): resolve lint and type check failures - Format models.py with ruff - Add type: ignore comments for __table__.constraints attribute access - Consolidate mypy overrides for test modules in pyproject.toml Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Sorry @w7-mgfcode, your pull request is larger than the review limit of 150000 diff characters
|
Caution Review failedThe pull request is closed. 📝 WalkthroughWalkthroughAdds a Phase 1 Data Platform feature: star-schema ORM models (Store, Product, Calendar, SalesDaily, PriceHistory, Promotion, InventorySnapshotDaily), Alembic migration, Pydantic schemas, tests (unit + integration), docs, and example SQL queries and fixtures. Changes
Sequence Diagram(s)sequenceDiagram
participant Dev as Developer
participant Alembic as Alembic
participant App as Application (models/schemas)
participant DB as Database (Postgres)
rect rgba(200,200,255,0.5)
Dev->>Alembic: run "alembic upgrade head"
Alembic->>DB: create tables, constraints, indexes
DB-->>Alembic: schema applied
end
rect rgba(200,255,200,0.5)
Dev->>App: start application / run tests
App->>DB: connect (use ORM models)
App->>DB: read/write using models (enforce constraints)
DB-->>App: results / IntegrityErrors for violations
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/PHASE/1-DATA_PLATFORM.md`:
- Around line 60-66: The doc list uses names like StoreResponse/ProductResponse
but the code defines StoreRead/ProductRead/etc.; update the documentation to
reference the actual schema names (StoreRead, ProductRead, CalendarRead,
SalesDailyRead, PriceHistoryRead, PromotionRead, InventorySnapshotDailyRead) so
the docs match the schemas, or alternatively add aliases in the schema module
(e.g., define StoreResponse = StoreRead, ProductResponse = ProductRead, etc.) so
both names are valid; pick one approach and apply it consistently across the
referenced entries.
🧹 Nitpick comments (12)
pyproject.toml (1)
112-120: Consider simplifying the module patterns.The pattern
"*.tests.*"is a glob that should already match"app.*.tests.*"and"app.features.*.tests.*". The more specific patterns may be redundant.However, MyPy's module matching can be nuanced, so keeping explicit patterns ensures coverage across different import paths. If intentional for safety, this is acceptable.
tests/conftest.py (1)
22-53: Consider using session/module scope or transaction-based isolation for performance.Creating and dropping all tables for each test function is expensive. For integration test suites with many tests, this can significantly slow down the test run.
Consider these alternatives:
- Use
scope="session"orscope="module"for the table creation/teardown, with per-test transaction rollback for isolation- Use savepoints for nested transaction isolation within a single database setup
♻️ Example using module scope with transaction isolation
`@pytest.fixture`(scope="module") async def db_engine(): """Create engine and tables once per module.""" settings = get_settings() engine = create_async_engine(settings.database_url, echo=False) async with engine.begin() as conn: await conn.run_sync(Base.metadata.create_all) yield engine async with engine.begin() as conn: await conn.run_sync(Base.metadata.drop_all) await engine.dispose() `@pytest.fixture` async def db_session(db_engine): """Create session with transaction rollback per test.""" async_session_maker = async_sessionmaker( db_engine, class_=AsyncSession, expire_on_commit=False, ) async with async_session_maker() as session: async with session.begin(): yield session await session.rollback()app/features/data_platform/tests/conftest.py (1)
20-51: Same performance consideration applies here.This fixture has the same create/drop-all-tables-per-test pattern. The recommendation from
tests/conftest.pyto use scoped fixtures with transaction isolation applies here as well.app/features/data_platform/models.py (2)
249-249: Consider adjustingNumeric(5, 4)precision fordiscount_pct.With
Numeric(5, 4), the maximum value is9.9999, but the check constraint limits the range to0-1. A precision ofNumeric(5, 4)reserves 4 decimal places (e.g.,0.1500for 15%), which is reasonable for precision but means 4 of the 5 digits are after the decimal.This works correctly but
Numeric(3, 2)(allowing0.00to0.99) would be more aligned with typical percentage representations, unless sub-percentage precision (e.g., 15.25% =0.1525) is a requirement.
294-296: Consider a check constraint to keepis_stockoutconsistent withon_hand_qty.
is_stockoutis a denormalized boolean that could become inconsistent withon_hand_qtyif updated independently. Consider adding a check constraint to enforce consistency:CheckConstraint( "(is_stockout = true AND on_hand_qty = 0) OR (is_stockout = false AND on_hand_qty > 0)", name="ck_inventory_stockout_consistency" )Alternatively, if
is_stockoutshould always be derived, consider making it a generated/computed column (PostgreSQL 12+ supportsGENERATED ALWAYS AS), or removing it and computing it at query time.PRPs/PRP-2-data-platform-schema.md (1)
193-196: Minor documentation inconsistency with actual implementation.The gotchas section recommends wildcard import (
from app.features.data_platform.models import *), but the actual implementation inalembic/env.pyuses the cleaner module import pattern (from app.features.data_platform import models as data_platform_models). Consider updating this documentation to reflect the preferred pattern used in the implementation.📝 Suggested documentation update
# CRITICAL: Alembic import models in env.py for autogenerate to work # In alembic/env.py, add: -from app.features.data_platform.models import * # noqa: F401, F403 +from app.features.data_platform import models as data_platform_models # noqa: F401app/features/data_platform/schemas.py (2)
127-147: Consider adding cross-field validation for date ranges.The schema accepts
valid_toandvalid_frombut doesn't validate thatvalid_to >= valid_from. While the database check constraint will catch this, adding a Pydanticmodel_validatorwould provide earlier feedback to API consumers.♻️ Optional: Add model validator
from pydantic import model_validator class PriceHistoryBase(BaseModel): """Base schema for price history data.""" product_id: int = Field(..., gt=0) store_id: int | None = Field(default=None, gt=0) price: Decimal = Field(..., ge=0) valid_from: date valid_to: date | None = None `@model_validator`(mode='after') def check_date_range(self) -> 'PriceHistoryBase': if self.valid_to is not None and self.valid_to < self.valid_from: raise ValueError('valid_to must be >= valid_from') return self
154-176: Same date range validation opportunity as PriceHistory.Consider adding a
model_validatorto ensureend_date >= start_datefor early API feedback, though the DB constraint will enforce this.docs/PHASE/1-DATA_PLATFORM.md (1)
199-242: Add language specifiers to fenced code blocks.Per static analysis hints, the validation result and directory structure code blocks should have language specifiers for proper rendering.
📝 Suggested fix
### Ruff (Linting + Formatting) -``` +```text All checks passed!MyPy (Static Type Checking)
-
+text
Success: no issues found in 25 source files### Pyright (Static Type Checking) -``` +```text 0 errors, 0 warnings, 0 informationsPytest
-
+text
58 passed (14 core + 32 unit + 11 integration + 1 fixture)## Directory Structure -``` +```text app/features/data_platform/ ├── __init__.pyexamples/schema/README.md (1)
145-158: Add language specifier for the relationships diagram.The ASCII diagram code block should have a language specifier for consistent markdown rendering.
📝 Suggested fix
-``` +```text ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ Store │ │ Product │ │ Calendar │examples/queries/kpi_sales.sql (2)
7-18: Parameterize the date range to keep this example evergreen.Hard-coding January 2024 will go stale quickly. Consider using relative dates or placeholders so the query stays useful.
♻️ Proposed tweak
-WHERE s.date BETWEEN '2024-01-01' AND '2024-01-31' +WHERE s.date BETWEEN CURRENT_DATE - INTERVAL '30 days' AND CURRENT_DATE
55-81: Consider left‑joining prior year to include new stores.The inner join drops stores without prior‑year data, which can hide growth for newly opened stores. A left join keeps those rows while still protecting against divide‑by‑zero.
♻️ Proposed tweak
-FROM current_year cy -JOIN prior_year py ON cy.store_id = py.store_id +FROM current_year cy +LEFT JOIN prior_year py ON cy.store_id = py.store_id JOIN store st ON cy.store_id = st.id- py.revenue AS prior_year_revenue, - ROUND((cy.revenue - py.revenue) / NULLIF(py.revenue, 0) * 100, 2) AS yoy_growth_pct + py.revenue AS prior_year_revenue, + ROUND((cy.revenue - COALESCE(py.revenue, 0)) + / NULLIF(COALESCE(py.revenue, 0), 0) * 100, 2) AS yoy_growth_pct
| - `StoreBase`, `StoreCreate`, `StoreResponse` | ||
| - `ProductBase`, `ProductCreate`, `ProductResponse` | ||
| - `CalendarBase`, `CalendarCreate`, `CalendarResponse` | ||
| - `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyResponse` | ||
| - `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryResponse` | ||
| - `PromotionBase`, `PromotionCreate`, `PromotionResponse` | ||
| - `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyResponse` |
There was a problem hiding this comment.
Documentation-code naming mismatch.
The documentation references StoreResponse, ProductResponse, etc., but the actual schemas in app/features/data_platform/schemas.py use StoreRead, ProductRead, etc.
📝 Suggested fix
Base and response schemas for each model:
-- `StoreBase`, `StoreCreate`, `StoreResponse`
-- `ProductBase`, `ProductCreate`, `ProductResponse`
-- `CalendarBase`, `CalendarCreate`, `CalendarResponse`
-- `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyResponse`
-- `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryResponse`
-- `PromotionBase`, `PromotionCreate`, `PromotionResponse`
-- `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyResponse`
+- `StoreBase`, `StoreCreate`, `StoreRead`
+- `ProductBase`, `ProductCreate`, `ProductRead`
+- `CalendarBase`, `CalendarCreate`, `CalendarRead`
+- `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyRead`
+- `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryRead`
+- `PromotionBase`, `PromotionCreate`, `PromotionRead`
+- `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyRead`📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - `StoreBase`, `StoreCreate`, `StoreResponse` | |
| - `ProductBase`, `ProductCreate`, `ProductResponse` | |
| - `CalendarBase`, `CalendarCreate`, `CalendarResponse` | |
| - `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyResponse` | |
| - `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryResponse` | |
| - `PromotionBase`, `PromotionCreate`, `PromotionResponse` | |
| - `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyResponse` | |
| - `StoreBase`, `StoreCreate`, `StoreRead` | |
| - `ProductBase`, `ProductCreate`, `ProductRead` | |
| - `CalendarBase`, `CalendarCreate`, `CalendarRead` | |
| - `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyRead` | |
| - `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryRead` | |
| - `PromotionBase`, `PromotionCreate`, `PromotionRead` | |
| - `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyRead` |
🤖 Prompt for AI Agents
In `@docs/PHASE/1-DATA_PLATFORM.md` around lines 60 - 66, The doc list uses names
like StoreResponse/ProductResponse but the code defines
StoreRead/ProductRead/etc.; update the documentation to reference the actual
schema names (StoreRead, ProductRead, CalendarRead, SalesDailyRead,
PriceHistoryRead, PromotionRead, InventorySnapshotDailyRead) so the docs match
the schemas, or alternatively add aliases in the schema module (e.g., define
StoreResponse = StoreRead, ProductResponse = ProductRead, etc.) so both names
are valid; pick one approach and apply it consistently across the referenced
entries.
Summary
Merges Phase 1 (Data Platform) implementation from dev to main.
Changes included:
PRs merged:
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.