# **Predictive Maintenance Datamart Notebook**

**📝 Dataset Overview:**  

These datasets support the analysis of machine performance and predictive maintenance by integrating sensor telemetry, error logs, maintenance actions, and failure events. The analysis can help predict failures, optimize maintenance schedules, and reduce machine downtime.

**📝 Data Source:**  

The source datasets include:

| Source Table      | Description                                                                                          |
|-------------------|------------------------------------------------------------------------------------------------------|
| `PdM_errors`      | Contains error logs with timestamps and machine IDs.                                                 |
| `PdM_failures`    | Tracks machine failures with timestamps and failure types.                                           |
| `PdM_machines`    | Details about the machines, such as ID, age, and type.                                               |
| `PdM_maint`       | Maintenance records including timestamps and activity types.                                         |
| `PdM_telemetry`   | Time-series data with sensor readings for each machine, such as voltage, rotation, pressure, and vibration. |  



The datasets come from the Microsoft Azure Predictive Maintenance dataset, which is available on [Kaggle](https://www.kaggle.com/datasets/arnabbiswas1/microsoft-azure-predictive-maintenance/data).  


## 🖥️ **Steps to Mart Creation**


### **📍1. Schema Creation**

In this project, schemas logically separate and manage data at different stages of the data processing lifecycle. Each schema serves a specific purpose, which helps preserve data integrity, performance, and maintainability.

| Schema            | Description                                                                                       |
|--------------------|---------------------------------------------------------------------------------------------------|
| `stg` (staging)  | Acts as a temporary area to store raw data before transformation.                            |
| `dim` (dimension)| Stores descriptive or reference data used for categorization and filtering.                 |
| `f` (fact)       | Centralizes quantitative data like failure counts, error counts, and average telemetry readings. |
 

SQL scripts will be used to define the schema and tables using DDL (Data Definition Language).

```
-- Creating Dimension Schema
IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'dim' ) 
BEGIN
	EXEC sp_executesql N'CREATE SCHEMA dim AUTHORIZATION dbo;'
END
;

GO
-- Creating Staging Schema
IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'stg' ) 
BEGIN
	EXEC sp_executesql N'CREATE SCHEMA stg AUTHORIZATION dbo;'
END
;

GO
-- Creating Fact Schema
IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'f' ) 
BEGIN
	EXEC sp_executesql N'CREATE SCHEMA f AUTHORIZATION dbo;'
END
;

GO

```

The datamart follows a star schema, as laid out in the following Entity Relationship Diagram:

### **📍2. Dimension Table Creation**

The following dimension tables are created:

| Dimension Table     | Description                                                                          |
|---------------------|--------------------------------------------------------------------------------------|
| `dim.Machines`      | Machine details like Machine ID, model, and age.                                    |
| `dim.Errors`        | Logs machine errors.                                                                |
| `dim.Failures`      | Logs machine failures.                                                              |
| `dim.Maintenance`   | Tracks maintenance history for machine components.                                  |
| `dim.Calendar`      | Time dimension for analysis, including date, month, and year.                       |
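As a rough illustration of the dimension tables listed above, the DDL for two of them might look like the sketch below. The column names for `dim.Machines` follow the descriptions in this notebook; the data types, the `CalendarID` key, and the `CalendarDate` column are assumptions made for the example, not the project's actual definitions.

```sql
-- Sketch only: data types and the dim.Calendar key/column names are assumptions.
IF OBJECT_ID(N'dim.Machines', N'U') IS NULL
BEGIN
    CREATE TABLE dim.Machines (
        MachineID INT         NOT NULL PRIMARY KEY,
        Model     VARCHAR(20) NOT NULL,
        Age       INT         NOT NULL
    );
END;
GO

IF OBJECT_ID(N'dim.Calendar', N'U') IS NULL
BEGIN
    CREATE TABLE dim.Calendar (
        CalendarID   INT      NOT NULL PRIMARY KEY,  -- hypothetical surrogate key
        CalendarDate DATE     NOT NULL UNIQUE,       -- one row per calendar day
        [Month]      TINYINT  NOT NULL,
        [Year]       SMALLINT NOT NULL
    );
END;
GO
```

Guarding each `CREATE TABLE` with `OBJECT_ID(...) IS NULL` mirrors the `IF NOT EXISTS` pattern used for the schemas, so the script can be re-run without errors.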




### **📍3. Fact Table Creation**

`f.Telemetry`: fact table that aggregates telemetry performance metrics over time.

Telemetry is modeled as a fact table because it supports two kinds of analysis:

- **Sensor data aggregation:** calculate averages, maximums, or trends (e.g., daily average temperature per machine).
- **Correlation with failures or errors:** link telemetry data to failures, errors, or maintenance events to identify patterns.

In both cases telemetry acts as a fact table, because it stores measurable events linked to dimensions such as machines and time.

The telemetry fact table might look like this:

| Column Name    | Data Type | Description                                  |
|----------------|-----------|----------------------------------------------|
| `TelemetryID`  | INT (PK)  | Unique identifier for the telemetry record.  |
| `fkMachineID`  | INT (FK)  | Foreign key linking to `dim.Machines`.       |
| `fkCalendarID` | INT (FK)  | Foreign key linking to `dim.Calendar`.       |
| `Temperature`  | FLOAT     | Temperature reading from sensors.            |
| `Vibration`    | FLOAT     | Vibration level reading from sensors.        |
| `Pressure`     | FLOAT     | Pressure level reading from sensors.         |
| `Timestamp`    | DATETIME  | Timestamp of the telemetry reading.          |

- **Granular data:** each row represents a telemetry event (e.g., a sensor reading at a specific time).
- **Measurable metrics:** the table contains numerical data such as temperature, pressure, or vibration.
- **Aggregation potential:** telemetry data can be aggregated to create trends or summaries.
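A minimal sketch of how the `f.Telemetry` column list described above could be declared is shown here. The `IDENTITY` key, the nullability choices, and the constraint names are assumptions for illustration; the foreign key to `dim.Calendar` is left out because the name of that table's key column is not given in this notebook.

```sql
-- Sketch only: constraint names, IDENTITY, and nullability are assumptions.
IF OBJECT_ID(N'f.Telemetry', N'U') IS NULL
BEGIN
    CREATE TABLE f.Telemetry (
        TelemetryID  INT IDENTITY(1,1) NOT NULL,
        fkMachineID  INT      NOT NULL,
        fkCalendarID INT      NOT NULL,
        Temperature  FLOAT    NULL,
        Vibration    FLOAT    NULL,
        Pressure     FLOAT    NULL,
        [Timestamp]  DATETIME NOT NULL,  -- bracketed because TIMESTAMP is also a type name
        CONSTRAINT PK_Telemetry PRIMARY KEY (TelemetryID),
        CONSTRAINT FK_Telemetry_Machines FOREIGN KEY (fkMachineID)
            REFERENCES dim.Machines (MachineID)
        -- A similar FOREIGN KEY on fkCalendarID would reference dim.Calendar.
    );
END;
GO
```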

Fact table: `f.AggregatedTelemetry`

| Column Name      | Data Type | Description                             |
|------------------|-----------|-----------------------------------------|
| `fkMachineID`    | INT (FK)  | Foreign key linking to `dim.Machines`.  |
| `fkCalendarID`   | INT (FK)  | Foreign key linking to `dim.Calendar`.  |
| `AvgTemperature` | FLOAT     | Average temperature over a period.      |
| `AvgVibration`   | FLOAT     | Average vibration level over a period.  |
| `MaxPressure`    | FLOAT     | Maximum pressure over a period.         |
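One way the aggregated fact table could be populated from the granular one is sketched below. It uses only the table and column names listed in this section and assumes that `dim.Calendar` has one row per day, so that grouping by `fkCalendarID` yields daily figures per machine.

```sql
-- Sketch only: assumes both fact tables exist with the columns listed above
-- and that each fkCalendarID identifies a single calendar day.
INSERT INTO f.AggregatedTelemetry (fkMachineID, fkCalendarID, AvgTemperature, AvgVibration, MaxPressure)
    SELECT t.fkMachineID
          ,t.fkCalendarID
          ,AVG(t.Temperature)   -- average temperature for the machine on that day
          ,AVG(t.Vibration)     -- average vibration for the machine on that day
          ,MAX(t.Pressure)      -- peak pressure for the machine on that day
    FROM f.Telemetry AS t
    GROUP BY t.fkMachineID, t.fkCalendarID
;
GO
```

Grouping on the two foreign keys keeps the aggregated table at the same machine-by-day grain as its dimension links, which is what a star schema query would join on.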


The full SQL scripts for the datamart creation can be found \[here\].

## 🖥️ **Data Loading**

A test database called Sandbox was created to run the SQL scripts and confirm that they load data into the datamart correctly.  


### **📍1. Load data into staging tables**


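As one possible approach, a source CSV file could be loaded into a staging table with `BULK INSERT`. The file path below is hypothetical, and the sketch assumes that `stg.Machines` already exists with columns matching the CSV layout and that the server is SQL Server 2017 or later (required for `FORMAT = 'CSV'`).

```sql
-- Sketch only: the file path is a placeholder and stg.Machines is assumed
-- to exist with columns in the same order as the CSV file.
BULK INSERT stg.Machines
FROM 'C:\data\PdM_machines.csv'   -- hypothetical location of the Kaggle download
WITH (
    FORMAT = 'CSV',               -- SQL Server 2017+
    FIRSTROW = 2,                 -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0a',
    TABLOCK
);
GO
```

The same statement would be repeated for each of the five source files, changing the target staging table and the file name.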


### **📍2. Loading Dimension tables**

`dim.Machines`:

```
-- Load machines from staging that are not yet present in the dimension.
-- Assumes stg.Machines exposes MachineID, Age, and Model columns.
INSERT INTO dim.Machines (MachineID, Age, Model)
    SELECT sm.MachineID
          ,sm.Age
          ,sm.Model
    FROM stg.Machines AS sm
    WHERE NOT EXISTS (SELECT 1
                      FROM dim.Machines AS dm
                      WHERE dm.MachineID = sm.MachineID)
;
GO

```
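After a load like the one above, a quick sanity check in the Sandbox database could compare row counts between staging and the dimension. This is only an illustrative check, and it assumes each machine appears once in `stg.Machines`.

```sql
-- Sketch only: a simple row-count comparison after the dimension load.
SELECT (SELECT COUNT(*) FROM stg.Machines) AS StagingRows
      ,(SELECT COUNT(*) FROM dim.Machines) AS DimensionRows
;
GO
```

If the two counts differ after a first full load, the staging data likely contains duplicate or missing machine IDs that should be investigated before loading the fact table.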

The full SQL scripts for loading the datamart can be found \[here\].

_References_

_All emojis in this notebook are sourced from [emojicopy](https://emojicopy.com/)._

_The datamart was created based on the principles of Star Schema Design as outlined by [Ralph Kimball](https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/)._