Skip to content

This project builds a modern data warehouse on Google BigQuery using dbt. It transforms raw data from the Wide World Importers sample dataset into a star schema, featuring a central fact table for sales orders and several conformed dimensions (such as customer, product, and date).

Notifications You must be signed in to change notification settings

tranthe170/Data-Warehouse-Project

Repository files navigation

Data Warehouse Project: Sales Orders in BigQuery with dbt

This repository demonstrates how to build a data warehouse using Google BigQuery and dbt, with a star schema design inspired by Kimball’s Dimensional Modeling. It showcases an end-to-end pipeline—from raw data ingestion to a curated dimensional model—focused on the Sales Orders domain from the Wide World Importers sample dataset.


Table of Contents


Overview

This project demonstrates core data warehousing skills and best practices:

  • Dimensional Modeling using facts and dimensions.
  • End-to-End Pipeline using dbt for data transformation.
  • Data Quality Checks leveraging dbt’s testing framework.

The final output is a star schema centered around the fact_sales_order table, with related dimension tables such as dim_customer, dim_person, dim_product, and more.


Project Highlights

  • Kimball Dimensional Modeling
    Designed conformed dimensions (e.g., Customer, Product, Person) and fact tables (Sales Orders) to support analytical queries.

  • dbt (Data Build Tool)

    • Automates SQL transformations.
    • Manages dependencies between staging, intermediate, and analytics layers via {{ ref('...') }}.
    • Simplifies documentation and testing.
  • Data Quality & Tests

    • Utilizes dbt’s built-in testing features to enforce not null, unique, and referential integrity constraints.
    • Supports custom tests for additional data quality checks.
  • Deployment on BigQuery

    • Leverages BigQuery’s scalable infrastructure for efficient data processing.
    • Integrates easily with Google Cloud Storage for raw data ingestion.

Data Model

Below is a high-level star schema diagram of the Sales Orders model:

Sales Order Model

Fact Table

  • fact_sales_order – Contains metrics and foreign keys to related dimensions (e.g., customer_key, product_key).

Dimension Tables

  • dim_customer – Contains customer attributes (e.g., name, credit limit, address).
  • dim_person – Contains details about people involved (e.g., salespersons, contact persons).
  • dim_product – Contains product attributes (e.g., brand, supplier, category).
  • dim_category, dim_supplier, dim_color, etc. – Additional dimensions for product hierarchy and reference data.
  • dim_date – Provides date-related attributes for time-based analysis.
  • dim_sales_order_line_indicator – A junk dimension used to store miscellaneous sales order line indicators.
  • dim_customer_attribute – Aggregates customer segmentation attributes from the fact sales order line data.
    • This dimension calculates key performance metrics (e.g., lifetime monetary amount, lifetime sales order quantity, last month metrics) and assigns segmentation labels (High, Medium, Low) based on percentile ranks.

How to Check This Project


Additional Info


References

About

This project builds a modern data warehouse on Google BigQuery using dbt. It transforms raw data from the Wide World Importers sample dataset into a star schema, featuring a central fact table for sales orders and several conformed dimensions (such as customer, product, and date).

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published