
# Snowflake Schema Explanation

The Snowflake Schema is an extension of the Star Schema where dimension tables are normalized. In other words, the dimension data is divided into additional tables to eliminate redundancy and improve data integrity. This results in a structure that resembles a snowflake, hence the name.

## Components of Snowflake Schema

- **Fact Table**: As in the Star Schema, this is the central table. It holds the quantitative metrics used for analysis, plus foreign keys to the dimension tables.
- **Dimension Tables**: These are more normalized than in the Star Schema. For example, a Product dimension table in a Star Schema might be split into separate Product and Category tables in the Snowflake Schema.
- **Sub-Dimension Tables**: These are produced by normalizing dimension tables further, typically to represent a hierarchy within a dimension (for example, category above product).

Normalization reduces redundancy and can improve data integrity but might result in more complex queries due to the increased number of joins.
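To make the trade-off concrete, here is a minimal Python sketch of normalizing a denormalized product dimension. The sample rows and the helper name `snowflake` are hypothetical and chosen only for illustration. After the split, each category name is stored once, and products refer to it through a surrogate `CategoryKey`.

```python
# Hypothetical denormalized product dimension rows (star-schema style):
# the category name is repeated on every product row.
product_dim = [
    {"ProductKey": 1, "ProductName": "Laptop", "CategoryName": "Electronics"},
    {"ProductKey": 2, "ProductName": "Phone", "CategoryName": "Electronics"},
    {"ProductKey": 3, "ProductName": "Desk", "CategoryName": "Furniture"},
]


def snowflake(rows):
    """Split denormalized rows into a product table and a category table."""
    category_keys = {}  # CategoryName -> surrogate CategoryKey
    products = []
    for row in rows:
        # Assign the next surrogate key the first time a category is seen.
        key = category_keys.setdefault(row["CategoryName"], len(category_keys) + 1)
        products.append(
            {
                "ProductKey": row["ProductKey"],
                "ProductName": row["ProductName"],
                "CategoryKey": key,
            }
        )
    categories = [
        {"CategoryKey": key, "CategoryName": name}
        for name, key in category_keys.items()
    ]
    return products, categories


products, categories = snowflake(product_dim)
print(categories)  # each category name appears exactly once
print(products)    # products carry only a CategoryKey
```

Getting the category name back for a product now requires a lookup through `CategoryKey`, which is the extra join that normalization introduces.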
    


## Snowflake Schema Example

Consider an extension of the sales system used in the Star Schema example. In the Snowflake Schema, the `ProductDim` table is split into two tables: `Product` and `ProductCategory`, to normalize the data structure.

- The `Product` table now contains only product-level attributes, plus a `CategoryKey` column that points to `ProductCategory`.
- `ProductCategory` table contains information about product categories.
    

In [None]:

# SQL statements for creating a snowflake schema

# Create ProductCategory Table (Normalized from ProductDim)
# Created first because Product references it
product_category_sql = '''
CREATE TABLE ProductCategory (
    CategoryKey INT PRIMARY KEY,
    CategoryName VARCHAR(50)
);
'''

# Create Product Table (Normalized from ProductDim)
# CategoryKey links each product to its row in ProductCategory
product_sql = '''
CREATE TABLE Product (
    ProductKey INT PRIMARY KEY,
    ProductID VARCHAR(50) NOT NULL,
    ProductName VARCHAR(255) NOT NULL,
    CategoryKey INT NOT NULL,
    FOREIGN KEY (CategoryKey) REFERENCES ProductCategory(CategoryKey)
);
'''

# Customer and Date dimension tables (CustomerDim, DateDim) are unchanged from the Star Schema example

# Adjusted Sales Fact Table (remains largely the same but references normalized Product table)
sales_fact_snowflake_sql = '''
CREATE TABLE SalesFact (
    SalesKey INT PRIMARY KEY,
    DateKey INT NOT NULL,
    ProductKey INT NOT NULL,
    CustomerKey INT NOT NULL,
    QuantitySold INT NOT NULL,
    TotalSaleAmount DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (DateKey) REFERENCES DateDim(DateKey),
    FOREIGN KEY (ProductKey) REFERENCES Product(ProductKey),
    FOREIGN KEY (CustomerKey) REFERENCES CustomerDim(CustomerKey)
);
'''

# Print SQL for demonstration, in dependency order (ProductCategory before Product)
print(product_category_sql)
print(product_sql)
print(sales_fact_snowflake_sql)
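
The cell above only prints the DDL. As a rough check that the statements fit together, the sketch below runs an equivalent schema against an in-memory SQLite database. Several things here are assumptions: SQLite as the engine, the minimal `DateDim` and `CustomerDim` definitions (the Star Schema versions are not shown in this section), and the explicit foreign key from `Product.CategoryKey` to `ProductCategory`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript('''
-- Assumed minimal stand-ins for the Star Schema dimension tables
CREATE TABLE DateDim (DateKey INT PRIMARY KEY, Year INT NOT NULL);
CREATE TABLE CustomerDim (CustomerKey INT PRIMARY KEY, CustomerName VARCHAR(255));

CREATE TABLE ProductCategory (
    CategoryKey INT PRIMARY KEY,
    CategoryName VARCHAR(50)
);
CREATE TABLE Product (
    ProductKey INT PRIMARY KEY,
    ProductID VARCHAR(50) NOT NULL,
    ProductName VARCHAR(255) NOT NULL,
    CategoryKey INT NOT NULL,
    FOREIGN KEY (CategoryKey) REFERENCES ProductCategory(CategoryKey)
);
CREATE TABLE SalesFact (
    SalesKey INT PRIMARY KEY,
    DateKey INT NOT NULL,
    ProductKey INT NOT NULL,
    CustomerKey INT NOT NULL,
    QuantitySold INT NOT NULL,
    TotalSaleAmount DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (DateKey) REFERENCES DateDim(DateKey),
    FOREIGN KEY (ProductKey) REFERENCES Product(ProductKey),
    FOREIGN KEY (CustomerKey) REFERENCES CustomerDim(CustomerKey)
);
''')

# List the tables that were created.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)

# A product pointing at a category that does not exist should be rejected.
try:
    conn.execute("INSERT INTO Product VALUES (1, 'P-001', 'Laptop', 99)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)
```

The rejected insert illustrates the integrity benefit of the normalized design: a product cannot refer to a category that is not present in `ProductCategory`.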
    


## Insights from Snowflake Schema

The Snowflake Schema can support the same types of analytical queries as the Star Schema, but with potentially more joins due to the normalized structure. This can be beneficial for queries that require a more detailed drill-down into dimension hierarchies.

### Example Query

The query below finds the total sales amount by product category for a given year, using the Snowflake Schema. Compared with a similar query in the Star Schema, it involves an additional join to the `ProductCategory` table, because the product category information has been normalized into its own table.
    

In [1]:

# SQL query: total sales amount by product category for a given year
category_sales_sql = '''
SELECT
    pc.CategoryName,
    SUM(sf.TotalSaleAmount) AS TotalSales
FROM
    SalesFact sf
JOIN
    DateDim dd ON sf.DateKey = dd.DateKey
JOIN
    Product p ON sf.ProductKey = p.ProductKey
JOIN
    ProductCategory pc ON p.CategoryKey = pc.CategoryKey
WHERE
    dd.Year = 2024
GROUP BY
    pc.CategoryName;
'''

# Print SQL for demonstration
print(category_sales_sql)
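
To see the query return rows, it can be run against a small in-memory SQLite database, as in the sketch below. The sample data, the trimmed-down table definitions, the `ORDER BY` clause, and the parameterized year are all assumptions added for illustration; only the join structure comes from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
-- Trimmed-down tables with only the columns the query needs (assumed)
CREATE TABLE DateDim (DateKey INT PRIMARY KEY, Year INT NOT NULL);
CREATE TABLE ProductCategory (CategoryKey INT PRIMARY KEY, CategoryName VARCHAR(50));
CREATE TABLE Product (
    ProductKey INT PRIMARY KEY,
    ProductName VARCHAR(255),
    CategoryKey INT NOT NULL
);
CREATE TABLE SalesFact (
    SalesKey INT PRIMARY KEY,
    DateKey INT,
    ProductKey INT,
    TotalSaleAmount DECIMAL(10,2)
);

-- Hypothetical sample data
INSERT INTO DateDim VALUES (20230101, 2023), (20240101, 2024);
INSERT INTO ProductCategory VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO Product VALUES (10, 'Laptop', 1), (11, 'Phone', 1), (12, 'Desk', 2);
INSERT INTO SalesFact VALUES
    (1, 20240101, 10, 1000),
    (2, 20240101, 11, 500),
    (3, 20240101, 12, 250),
    (4, 20230101, 10, 900);
''')

# Same joins as the query above; the year is passed as a parameter and
# ORDER BY is added so the row order is deterministic.
query = '''
SELECT pc.CategoryName, SUM(sf.TotalSaleAmount) AS TotalSales
FROM SalesFact sf
JOIN DateDim dd ON sf.DateKey = dd.DateKey
JOIN Product p ON sf.ProductKey = p.ProductKey
JOIN ProductCategory pc ON p.CategoryKey = pc.CategoryKey
WHERE dd.Year = ?
GROUP BY pc.CategoryName
ORDER BY pc.CategoryName;
'''
results = conn.execute(query, (2024,)).fetchall()
print(results)  # [('Electronics', 1500), ('Furniture', 250)]
```

The 2023 sale is excluded by the year filter, and the two Electronics products are rolled up into one row through the `Product` to `ProductCategory` join.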