Skip to content

chore: merge dev to main (Phase 1 Data Platform)#13

Merged
w7-mgfcode merged 2 commits into
mainfrom
dev
Jan 26, 2026
Merged

chore: merge dev to main (Phase 1 Data Platform)#13
w7-mgfcode merged 2 commits into
mainfrom
dev

Conversation

@w7-mgfcode
Copy link
Copy Markdown
Owner

@w7-mgfcode w7-mgfcode commented Jan 26, 2026

Summary

Merges Phase 1 (Data Platform) implementation from dev to main.

Changes included:

  • 7 SQLAlchemy 2.0 ORM models (Store, Product, Calendar, SalesDaily, PriceHistory, Promotion, InventorySnapshotDaily)
  • Alembic baseline migration
  • Pydantic v2 validation schemas
  • 59 tests (32 unit + 13 integration + 14 core)
  • Documentation updates
  • Example SQL queries

PRs merged:

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a retail forecasting data platform: 3 dimension tables and 4 fact tables, plus DB migrations and enforcement of grain/constraints.
  • Documentation

    • Updated Quick Start to run migrations; added detailed schema docs, example queries, and phase/architecture notes.
  • Tests

    • Added unit and integration tests to validate models, constraints, and migration behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

* docs: add PRP-2 for data platform schema and migrations

Create comprehensive Product Requirements Prompt for INITIAL-2 (Data Platform)
including:
- Mini-warehouse schema with dimension tables (store, product, calendar)
- Fact tables (sales_daily, price_history, promotion, inventory_snapshot_daily)
- SQLAlchemy 2.0 ORM model patterns with type annotations
- Alembic migration guidance and constraint naming conventions
- Grain protection via unique constraints for idempotent upserts
- Composite indexes for time-series query optimization
- Unit and integration test specifications
- Example SQL queries for KPIs and exogenous feature joins

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(data-platform): implement PRP-2 schema and migrations

Add mini-warehouse schema for retail demand forecasting:

Dimension tables:
- store: locations with code, region, city, store_type
- product: catalog with sku, category, brand, base_price
- calendar: time dimension with day_of_week, is_holiday

Fact tables:
- sales_daily: grain-protected (date, store_id, product_id)
- price_history: validity windows with valid_from/valid_to
- promotion: discount mechanics with pct and amount
- inventory_snapshot_daily: stockout detection

Key features:
- SQLAlchemy 2.0 patterns (Mapped[], mapped_column())
- Grain protection via UniqueConstraint for idempotent upserts
- Check constraints for data quality (positive qty, valid dates)
- Composite indexes for time-range + store/product queries
- Pydantic v2 schemas for API validation
- 32 unit tests + 11 integration tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add Phase 1 (Data Platform) documentation

- Update README.md with migration step and database schema section
- Update ARCHITECTURE.md section 5 to reflect implemented data platform
- Update PHASE-index.md with Phase 1 status and version history
- Create docs/PHASE/1-DATA_PLATFORM.md with detailed phase documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address code review feedback from Sourcery

- Add `from __future__ import annotations` and remove quoted type annotations
- Fix SQL discount calculation to distinguish NULL (no price) from 0% discount
- Add positive-path test for Calendar insertion
- Remove redundant single-column indexes covered by composite indexes
- Add documentation comments explaining fixture duplication for pytest discovery

Addresses: Comments 1-4 and overall comments from Sourcery review

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(ci): resolve lint and type check failures

- Format models.py with ruff
- Add type: ignore comments for __table__.constraints attribute access
- Consolidate mypy overrides for test modules in pyproject.toml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Gabe@w7dev <gabor@w7-7.net>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @w7-mgfcode, your pull request is larger than the review limit of 150000 diff characters

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 26, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

Adds a Phase 1 Data Platform feature: star-schema ORM models (Store, Product, Calendar, SalesDaily, PriceHistory, Promotion, InventorySnapshotDaily), Alembic migration, Pydantic schemas, tests (unit + integration), docs, and example SQL queries and fixtures.

Changes

Cohort / File(s) Summary
PRP & Planning Docs
PRPs/PRP-2-data-platform-schema.md, docs/PHASE/1-DATA_PLATFORM.md, docs/PHASE-index.md, docs/ARCHITECTURE.md
New PRP and Phase 1 documentation describing schema, goals, tasks, deliverables, and status.
ORM Models & Package
app/features/data_platform/__init__.py, app/features/data_platform/models.py
Adds SQLAlchemy 2.0 models for 3 dimensions and 4 fact tables with PKs, FKs, composite unique grain constraints, check constraints, relationships, and indexes.
Alembic Migration
alembic/env.py, alembic/versions/e1165ebcef61_create_data_platform_tables.py
Registers models for autogenerate and provides an upgrade/downgrade migration creating all tables, constraints, and indexes.
Pydantic Schemas
app/features/data_platform/schemas.py
Adds Base/Create/Read Pydantic models (ORM mapping via from_attributes) and field validations for each entity.
Tests & Fixtures
app/features/data_platform/tests/__init__.py, app/features/data_platform/tests/conftest.py, app/features/data_platform/tests/test_constraints.py, app/features/data_platform/tests/test_models.py, tests/conftest.py
New async test fixtures (db_session, sample entities) and comprehensive unit/integration tests validating constraints, relationships, and model surface. Root test conftest extended to support async DB sessions.
Examples & Schema README
examples/schema/README.md, examples/queries/exog_join.sql, examples/queries/kpi_sales.sql
Adds schema README and multiple example queries (exogenous joins, feature engineering, KPI queries).
Project Docs & Config
README.md, pyproject.toml
README updated to include migration step and schema refs; MyPy overrides merged/expanded for test modules.

Sequence Diagram(s)

sequenceDiagram
  participant Dev as Developer
  participant Alembic as Alembic
  participant App as Application (models/schemas)
  participant DB as Database (Postgres)

  rect rgba(200,200,255,0.5)
    Dev->>Alembic: run "alembic upgrade head"
    Alembic->>DB: create tables, constraints, indexes
    DB-->>Alembic: schema applied
  end

  rect rgba(200,255,200,0.5)
    Dev->>App: start application / run tests
    App->>DB: connect (use ORM models)
    App->>DB: read/write using models (enforce constraints)
    DB-->>App: results / IntegrityErrors for violations
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 Hopping through migrations bright,

Seven tables in the night,
Constraints snug, indexes tight,
Schemas, tests — a tasty bite,
Forecasts bloom beneath moonlight.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the main change: merging Phase 1 (Data Platform implementation) from dev to main, which aligns with the changeset's substantial introduction of ORM models, schemas, migrations, tests, and documentation.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/PHASE/1-DATA_PLATFORM.md`:
- Around line 60-66: The doc list uses names like StoreResponse/ProductResponse
but the code defines StoreRead/ProductRead/etc.; update the documentation to
reference the actual schema names (StoreRead, ProductRead, CalendarRead,
SalesDailyRead, PriceHistoryRead, PromotionRead, InventorySnapshotDailyRead) so
the docs match the schemas, or alternatively add aliases in the schema module
(e.g., define StoreResponse = StoreRead, ProductResponse = ProductRead, etc.) so
both names are valid; pick one approach and apply it consistently across the
referenced entries.
🧹 Nitpick comments (12)
pyproject.toml (1)

112-120: Consider simplifying the module patterns.

The pattern "*.tests.*" is a glob that should already match "app.*.tests.*" and "app.features.*.tests.*". The more specific patterns may be redundant.

However, MyPy's module matching can be nuanced, so keeping explicit patterns ensures coverage across different import paths. If intentional for safety, this is acceptable.

tests/conftest.py (1)

22-53: Consider using session/module scope or transaction-based isolation for performance.

Creating and dropping all tables for each test function is expensive. For integration test suites with many tests, this can significantly slow down the test run.

Consider these alternatives:

  1. Use scope="session" or scope="module" for the table creation/teardown, with per-test transaction rollback for isolation
  2. Use savepoints for nested transaction isolation within a single database setup
♻️ Example using module scope with transaction isolation
`@pytest.fixture`(scope="module")
async def db_engine():
    """Create engine and tables once per module."""
    settings = get_settings()
    engine = create_async_engine(settings.database_url, echo=False)
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)
    yield engine
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.drop_all)
    await engine.dispose()


`@pytest.fixture`
async def db_session(db_engine):
    """Create session with transaction rollback per test."""
    async_session_maker = async_sessionmaker(
        db_engine,
        class_=AsyncSession,
        expire_on_commit=False,
    )
    async with async_session_maker() as session:
        async with session.begin():
            yield session
            await session.rollback()
app/features/data_platform/tests/conftest.py (1)

20-51: Same performance consideration applies here.

This fixture has the same create/drop-all-tables-per-test pattern. The recommendation from tests/conftest.py to use scoped fixtures with transaction isolation applies here as well.

app/features/data_platform/models.py (2)

249-249: Consider adjusting Numeric(5, 4) precision for discount_pct.

With Numeric(5, 4), the maximum value is 9.9999, but the check constraint limits the range to 0-1. A precision of Numeric(5, 4) reserves 4 decimal places (e.g., 0.1500 for 15%), which is reasonable for precision but means 4 of the 5 digits are after the decimal.

This works correctly but Numeric(3, 2) (allowing 0.00 to 0.99) would be more aligned with typical percentage representations, unless sub-percentage precision (e.g., 15.25% = 0.1525) is a requirement.


294-296: Consider a check constraint to keep is_stockout consistent with on_hand_qty.

is_stockout is a denormalized boolean that could become inconsistent with on_hand_qty if updated independently. Consider adding a check constraint to enforce consistency:

CheckConstraint(
    "(is_stockout = true AND on_hand_qty = 0) OR (is_stockout = false AND on_hand_qty > 0)",
    name="ck_inventory_stockout_consistency"
)

Alternatively, if is_stockout should always be derived, consider making it a generated/computed column (PostgreSQL 12+ supports GENERATED ALWAYS AS), or removing it and computing it at query time.

PRPs/PRP-2-data-platform-schema.md (1)

193-196: Minor documentation inconsistency with actual implementation.

The gotchas section recommends wildcard import (from app.features.data_platform.models import *), but the actual implementation in alembic/env.py uses the cleaner module import pattern (from app.features.data_platform import models as data_platform_models). Consider updating this documentation to reflect the preferred pattern used in the implementation.

📝 Suggested documentation update
 # CRITICAL: Alembic import models in env.py for autogenerate to work
 # In alembic/env.py, add:
-from app.features.data_platform.models import *  # noqa: F401, F403
+from app.features.data_platform import models as data_platform_models  # noqa: F401
app/features/data_platform/schemas.py (2)

127-147: Consider adding cross-field validation for date ranges.

The schema accepts valid_to and valid_from but doesn't validate that valid_to >= valid_from. While the database check constraint will catch this, adding a Pydantic model_validator would provide earlier feedback to API consumers.

♻️ Optional: Add model validator
from pydantic import model_validator

class PriceHistoryBase(BaseModel):
    """Base schema for price history data."""

    product_id: int = Field(..., gt=0)
    store_id: int | None = Field(default=None, gt=0)
    price: Decimal = Field(..., ge=0)
    valid_from: date
    valid_to: date | None = None

    `@model_validator`(mode='after')
    def check_date_range(self) -> 'PriceHistoryBase':
        if self.valid_to is not None and self.valid_to < self.valid_from:
            raise ValueError('valid_to must be >= valid_from')
        return self

154-176: Same date range validation opportunity as PriceHistory.

Consider adding a model_validator to ensure end_date >= start_date for early API feedback, though the DB constraint will enforce this.

docs/PHASE/1-DATA_PLATFORM.md (1)

199-242: Add language specifiers to fenced code blocks.

Per static analysis hints, the validation result and directory structure code blocks should have language specifiers for proper rendering.

📝 Suggested fix
 ### Ruff (Linting + Formatting)
-```
+```text
 All checks passed!

MyPy (Static Type Checking)

- +text
Success: no issues found in 25 source files


### Pyright (Static Type Checking)
-```
+```text
0 errors, 0 warnings, 0 informations

Pytest

- +text
58 passed (14 core + 32 unit + 11 integration + 1 fixture)


## Directory Structure

-```
+```text
app/features/data_platform/
├── __init__.py
examples/schema/README.md (1)

145-158: Add language specifier for the relationships diagram.

The ASCII diagram code block should have a language specifier for consistent markdown rendering.

📝 Suggested fix
-```
+```text
 ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
 │    Store     │     │   Product    │     │   Calendar   │
examples/queries/kpi_sales.sql (2)

7-18: Parameterize the date range to keep this example evergreen.

Hard-coding January 2024 will go stale quickly. Consider using relative dates or placeholders so the query stays useful.

♻️ Proposed tweak
-WHERE s.date BETWEEN '2024-01-01' AND '2024-01-31'
+WHERE s.date BETWEEN CURRENT_DATE - INTERVAL '30 days' AND CURRENT_DATE

55-81: Consider left‑joining prior year to include new stores.

The inner join drops stores without prior‑year data, which can hide growth for newly opened stores. A left join keeps those rows while still protecting against divide‑by‑zero.

♻️ Proposed tweak
-FROM current_year cy
-JOIN prior_year py ON cy.store_id = py.store_id
+FROM current_year cy
+LEFT JOIN prior_year py ON cy.store_id = py.store_id
 JOIN store st ON cy.store_id = st.id
-    py.revenue AS prior_year_revenue,
-    ROUND((cy.revenue - py.revenue) / NULLIF(py.revenue, 0) * 100, 2) AS yoy_growth_pct
+    py.revenue AS prior_year_revenue,
+    ROUND((cy.revenue - COALESCE(py.revenue, 0))
+          / NULLIF(COALESCE(py.revenue, 0), 0) * 100, 2) AS yoy_growth_pct

Comment on lines +60 to +66
- `StoreBase`, `StoreCreate`, `StoreResponse`
- `ProductBase`, `ProductCreate`, `ProductResponse`
- `CalendarBase`, `CalendarCreate`, `CalendarResponse`
- `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyResponse`
- `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryResponse`
- `PromotionBase`, `PromotionCreate`, `PromotionResponse`
- `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyResponse`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Documentation-code naming mismatch.

The documentation references StoreResponse, ProductResponse, etc., but the actual schemas in app/features/data_platform/schemas.py use StoreRead, ProductRead, etc.

📝 Suggested fix
 Base and response schemas for each model:
-- `StoreBase`, `StoreCreate`, `StoreResponse`
-- `ProductBase`, `ProductCreate`, `ProductResponse`
-- `CalendarBase`, `CalendarCreate`, `CalendarResponse`
-- `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyResponse`
-- `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryResponse`
-- `PromotionBase`, `PromotionCreate`, `PromotionResponse`
-- `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyResponse`
+- `StoreBase`, `StoreCreate`, `StoreRead`
+- `ProductBase`, `ProductCreate`, `ProductRead`
+- `CalendarBase`, `CalendarCreate`, `CalendarRead`
+- `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyRead`
+- `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryRead`
+- `PromotionBase`, `PromotionCreate`, `PromotionRead`
+- `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyRead`
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- `StoreBase`, `StoreCreate`, `StoreResponse`
- `ProductBase`, `ProductCreate`, `ProductResponse`
- `CalendarBase`, `CalendarCreate`, `CalendarResponse`
- `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyResponse`
- `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryResponse`
- `PromotionBase`, `PromotionCreate`, `PromotionResponse`
- `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyResponse`
- `StoreBase`, `StoreCreate`, `StoreRead`
- `ProductBase`, `ProductCreate`, `ProductRead`
- `CalendarBase`, `CalendarCreate`, `CalendarRead`
- `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyRead`
- `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryRead`
- `PromotionBase`, `PromotionCreate`, `PromotionRead`
- `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyRead`
🤖 Prompt for AI Agents
In `@docs/PHASE/1-DATA_PLATFORM.md` around lines 60 - 66, The doc list uses names
like StoreResponse/ProductResponse but the code defines
StoreRead/ProductRead/etc.; update the documentation to reference the actual
schema names (StoreRead, ProductRead, CalendarRead, SalesDailyRead,
PriceHistoryRead, PromotionRead, InventorySnapshotDailyRead) so the docs match
the schemas, or alternatively add aliases in the schema module (e.g., define
StoreResponse = StoreRead, ProductResponse = ProductRead, etc.) so both names
are valid; pick one approach and apply it consistently across the referenced
entries.

w7-learn
w7-learn previously approved these changes Jan 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants