
# Star Schema Explanation

A star schema is a dimensional data model optimized for querying and reporting, commonly used in data warehouses. It consists of one or more fact tables, each referencing any number of dimension tables. A fact table holds the measures of a business process (e.g., sales amount, quantity), while the surrounding dimension tables hold the context for those measures (e.g., date, product, customer).

## Components of Star Schema

- **Fact Table**: Central table in a star schema. It contains quantitative data for analysis.
- **Dimension Tables**: Satellite tables linked to the fact table. They contain descriptive attributes related to the facts (e.g., time, items, location).

The schema is called a "star schema" because the diagram of such a database resembles a star, with points radiating from a center. The center represents the fact table and the points represent the dimension tables.
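To make the fact/dimension split concrete, here is a minimal in-memory sketch in plain Python. The sample rows, the dictionary layout, and the `describe` helper are all hypothetical and exist only for illustration; a real star schema lives in database tables, not dictionaries.

```python
# Hypothetical sample data: each dimension maps a surrogate key to descriptive attributes.
product_dim = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Desk", "category": "Furniture"},
}
date_dim = {
    20240115: {"year": 2024, "month": 1},
    20240220: {"year": 2024, "month": 2},
}

# Fact rows hold only keys into the dimensions plus numeric measures.
sales_fact = [
    {"date_key": 20240115, "product_key": 1, "quantity": 2, "amount": 2400.00},
    {"date_key": 20240220, "product_key": 2, "quantity": 1, "amount": 350.00},
    {"date_key": 20240220, "product_key": 1, "quantity": 1, "amount": 1200.00},
]


def describe(fact_row):
    """Resolve a fact row's keys against the dimensions to add context."""
    product = product_dim[fact_row["product_key"]]
    date = date_dim[fact_row["date_key"]]
    return {
        "year": date["year"],
        "month": date["month"],
        "product": product["name"],
        "category": product["category"],
        "quantity": fact_row["quantity"],
        "amount": fact_row["amount"],
    }


for row in sales_fact:
    print(describe(row))
```

Each lookup in `describe` plays the role of one join from the center of the star out to one of its points.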
    


## Star Schema Example

Consider a sales system where the fact table records sales transactions. This fact table, `SalesFact`, might include foreign keys to the date, product, and customer dimensions, along with measures such as the quantity of items sold and the total sale amount. Dimension tables could include `DateDim` (with date, year, quarter, month, day), `ProductDim` (with product ID, name, category, price), and `CustomerDim` (with customer ID, name, contact details).
    


# SQL statements for creating a simple star schema

# Create Date Dimension Table
date_dim_sql = '''
CREATE TABLE DateDim (
    DateKey INT PRIMARY KEY,
    Date DATE NOT NULL,
    Year SMALLINT NOT NULL,
    Quarter SMALLINT NOT NULL,
    Month SMALLINT NOT NULL,
    Day SMALLINT NOT NULL,
    WeekdayName VARCHAR(9)
);
'''

# Create Product Dimension Table
product_dim_sql = '''
CREATE TABLE ProductDim (
    ProductKey INT PRIMARY KEY,
    ProductID VARCHAR(50) NOT NULL,
    ProductName VARCHAR(255) NOT NULL,
    Category VARCHAR(50),
    Price DECIMAL(10,2) NOT NULL
);
'''

# Create Customer Dimension Table
customer_dim_sql = '''
CREATE TABLE CustomerDim (
    CustomerKey INT PRIMARY KEY,
    CustomerID VARCHAR(50) NOT NULL,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Email VARCHAR(100),
    PhoneNumber VARCHAR(20),
    Address VARCHAR(255),
    City VARCHAR(50),
    State VARCHAR(50),
    ZipCode VARCHAR(10)
);
'''

# Create Sales Fact Table
sales_fact_sql = '''
CREATE TABLE SalesFact (
    SalesKey INT PRIMARY KEY,
    DateKey INT NOT NULL,
    ProductKey INT NOT NULL,
    CustomerKey INT NOT NULL,
    QuantitySold INT NOT NULL,
    TotalSaleAmount DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (DateKey) REFERENCES DateDim(DateKey),
    FOREIGN KEY (ProductKey) REFERENCES ProductDim(ProductKey),
    FOREIGN KEY (CustomerKey) REFERENCES CustomerDim(CustomerKey)
);
'''

# Print SQL for demonstration
print(date_dim_sql)
print(product_dim_sql)
print(customer_dim_sql)
print(sales_fact_sql)
    


## Insights from Star Schema

Once the star schema is set up and loaded with data, we can run analytical queries against it. For example, we might want to understand sales trends over time, the performance of different product categories, or the buying patterns of customers.

### Example Query

Here's an example SQL query that finds the total sales amount by product category for a given year:

```sql
SELECT
    pd.Category,
    SUM(sf.TotalSaleAmount) AS TotalSales
FROM
    SalesFact sf
JOIN
    DateDim dd ON sf.DateKey = dd.DateKey
JOIN
    ProductDim pd ON sf.ProductKey = pd.ProductKey
WHERE
    dd.Year = 2024
GROUP BY
    pd.Category;
```

This query joins the sales fact table with the date and product dimension tables to sum up sales amounts grouped by product category for the year 2024.
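As a rough way to try this query end to end, the sketch below runs it against an in-memory SQLite database using Python's standard `sqlite3` module. Several things here are assumptions made for illustration: the tables are trimmed to the columns the query touches, `TotalSaleAmount` is declared as `REAL` rather than `DECIMAL(10,2)`, the sample rows are made up, the year is passed as a parameter instead of a literal, and an `ORDER BY` is added so the result order is predictable.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Trimmed-down versions of the tables (assumption: only the columns the query needs).
conn.executescript("""
CREATE TABLE DateDim (DateKey INTEGER PRIMARY KEY, Year INTEGER NOT NULL);
CREATE TABLE ProductDim (ProductKey INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE SalesFact (
    SalesKey INTEGER PRIMARY KEY,
    DateKey INTEGER NOT NULL REFERENCES DateDim(DateKey),
    ProductKey INTEGER NOT NULL REFERENCES ProductDim(ProductKey),
    TotalSaleAmount REAL NOT NULL
);
""")

# Hypothetical sample rows, including one 2023 sale that the filter should exclude.
conn.executemany(
    "INSERT INTO DateDim VALUES (?, ?)",
    [(20231231, 2023), (20240115, 2024), (20240220, 2024)],
)
conn.executemany(
    "INSERT INTO ProductDim VALUES (?, ?)",
    [(1, "Electronics"), (2, "Furniture")],
)
conn.executemany(
    "INSERT INTO SalesFact VALUES (?, ?, ?, ?)",
    [
        (1, 20231231, 1, 999.0),
        (2, 20240115, 1, 2400.0),
        (3, 20240220, 2, 350.0),
        (4, 20240220, 1, 1200.0),
    ],
)

# Same joins and grouping as the query above, with the year as a bound parameter.
query = """
SELECT pd.Category, SUM(sf.TotalSaleAmount) AS TotalSales
FROM SalesFact sf
JOIN DateDim dd ON sf.DateKey = dd.DateKey
JOIN ProductDim pd ON sf.ProductKey = pd.ProductKey
WHERE dd.Year = ?
GROUP BY pd.Category
ORDER BY pd.Category;
"""
rows = conn.execute(query, (2024,)).fetchall()

for category, total_sales in rows:
    print(category, total_sales)
```

With this sample data, the 2023 sale is filtered out through the date dimension, and the remaining three fact rows are summed per category.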
    