# 🧭 **1. Introduction**

---

## 👋 **1.1 About Me:**

### 🎙️ Speaker Introduction

Welcome!  


**👨‍🏫 Usama Arshad**
- Assistant Professor (Business Analytics), FAST National University, Islamabad  
- PhD in Computer Science – Blockchain & AI, Ghulam Ishaq Khan Institute  
- Research Assistant – National Yunlin University, Taiwan (AI in Healthcare)  
- President – Graduate Students Society, GIKI  
- Published Author – IEEE, Springer, Elsevier (Blockchain, AI, Cybersecurity)  
- GitHub: [github.com/usamajanjua9](https://github.com/usamajanjua9)  
- LinkedIn: [linkedin.com/in/usamajanjua9](https://linkedin.com/in/usamajanjua9)  
- Website: [usamajanjua.com](https://usamajanjua.com)

<img src="https://isb.nu.edu.pk/Images/Profile/FSM/7078.jpg" width="350" >

---






# **Module 5: AI for Predictive Maintenance & Asset Management**

---

# **1. Introduction to Predictive Maintenance**

Predictive Maintenance (PdM) is a technique where we **use data, sensors, and AI models to predict equipment failure *before* it happens**.
Instead of repairing machines only after they fail, PdM helps companies **act at the right time**: not too early, not too late.

It is widely used in industries such as **manufacturing, energy, transportation, aviation, oil & gas, and industrial automation**.

---

# **1.1 Predictive vs Preventive vs Corrective Maintenance**

To understand Predictive Maintenance clearly, we compare it with the other common maintenance strategies.

---

## **✔ Corrective Maintenance (Fix after failure)**

**Idea:** Wait until the machine breaks, then repair it.

**Example:**
A motor stops working → you call a technician → machine stays offline.

**Pros:**

* No planning needed
* Low upfront cost

**Cons:**

* Unexpected failures
* High downtime
* Loss of productivity
* Emergency repair cost is high

---

## **✔ Preventive Maintenance (Scheduled maintenance)**

**Idea:** Maintain after a fixed time, even if the machine is fine.

**Example:**
Changing car engine oil every 5,000 km, even if it is still usable.

**Pros:**

* Reduces random failures
* Simple and predictable

**Cons:**

* May replace healthy parts
* Wastes time and money
* Not suitable for complex machines

---

## **✔ Predictive Maintenance (AI + Sensors + Prediction)**

**Idea:** Use sensor data to **predict** when a failure will happen, then perform maintenance only when needed.

**Example:**
If vibration increases beyond normal range, AI predicts that the bearing will fail in 20 days → maintenance team changes it before failure.

**Pros:**

* Reduces downtime
* Saves cost
* Extends asset life
* More accurate and data-driven

**Cons:**

* Requires sensors
* Requires data and models
* Needs skilled people initially

---

### **Easy Summary Chart**

| Strategy        | When Maintenance Happens         | Example             | Cost   | Downtime |
| --------------- | -------------------------------- | ------------------- | ------ | -------- |
| Corrective      | After failure                    | Machine breaks      | High   | High     |
| Preventive      | On a fixed schedule              | Every 3 months      | Medium | Medium   |
| Predictive (AI) | Based on real machine conditions | AI predicts failure | Low    | Very Low |

---

# **1.2 Industrial Use Cases of Predictive Maintenance**

PdM is used in almost every industry where machines exist.
Some real examples:

---

## **✔ Manufacturing Machines**

* Predicting **bearing failures** in rotating machines
* Monitoring **motor vibration and temperature**
* Preventing **conveyor belt breakdowns**
* Detecting **tool wear** in CNC machines

---

## **✔ Aviation**

* Monitoring turbofan engines
* Predicting remaining useful life (RUL) of engine parts
* Detecting unusual pressure or temperature behavior

NASA's C-MAPSS turbofan engine degradation dataset is a widely used benchmark for PdM research.

---

## **✔ Energy Sector**

* Wind turbine fault detection
* Solar farm inverter failure prediction
* Detecting abnormal transformer oil temperature rise

---

## **✔ Transportation**

* Predicting brake failure in buses
* Railway axle and wheel monitoring
* Fleet health monitoring

---

## **✔ Oil & Gas**

* Pipeline leak prediction using pressure sensors
* Valve and pump anomaly detection
* Gas compressor failure forecasting

---

## **✔ Smart Buildings**

* Predicting air conditioning (HVAC) equipment failure
* Generator and UPS health monitoring

---

### **Simple Conclusion:**

If something has **sensors + data + machine parts**, Predictive Maintenance can be applied.

---

# **1.3 Asset Failure Modes**

A failure mode is **the specific way in which a machine can fail**.
Understanding this helps in selecting the right sensors and AI models.

---

## **✔ Mechanical Failures**

* Bearing wear
* Gearbox damage
* Shaft misalignment
* Belt cracks
* Loose fittings

**Sensor Type:** Vibration, speed

---

## **✔ Thermal Failures**

* Overheating
* Lubrication breakdown
* Electrical overload

**Sensor Type:** Temperature, current

---

## **✔ Electrical Failures**

* Short circuits
* Voltage imbalance
* Insulation breakdown

**Sensor Type:** Voltage, current, thermal cameras

---

## **✔ Environmental Failures**

* Moisture
* Corrosion
* Dust accumulation

**Sensor Type:** Humidity, air-quality sensors

---

## **✔ Human/Operational Failures**

* Improper handling
* Overloading
* Poor maintenance schedule

---

### **Summary**

Different failure modes → different sensors → different AI models.

---

# **1.4 Predictive Maintenance Workflow**

A simple step-by-step process showing how PdM works:

---

## **Step 1: Data Collection**

* Sensors collect vibration, temperature, pressure, etc.
* Machines send readings continuously

---

## **Step 2: Data Storage**

* Data sent to cloud or local servers
* Stored in time-series format

---

## **Step 3: Feature Engineering**

* Extract meaningful values like:

  * Mean
  * Standard deviation
  * FFT frequencies
  * Health index

---

## **Step 4: Model Training**

* Use ML models to detect patterns
* Choose model type:

  * Random Forest
  * XGBoost
  * LSTM
  * Autoencoders

---

## **Step 5: Prediction**

* Model predicts:

  * Upcoming failures
  * Remaining Useful Life (RUL)
  * Anomalies

---

## **Step 6: Maintenance Decision**

* Maintenance team receives alerts
* Replace the part before failure
* Reduce downtime

---

## **Step 7: Continuous Improvement**

* Update model with new data
* Make predictions more accurate

---

### **Very Simple Diagram (Text-Based)**

```
Sensors → Data Storage → Feature Engineering → AI/ML Model →
Prediction → Maintenance Action → Feedback → Retrain Model
```

---

# **1.5 Benefits and Challenges of Predictive Maintenance**

---

## **✔ Benefits**

* **Reduced downtime**
* **Lower maintenance cost**
* **Longer asset life**
* **Higher safety**
* **Better production efficiency**
* **Maintenance only when needed**
* **Data-driven decisions**

---

## **✔ Challenges**

* Requires initial investment
* Need skilled data engineers / AI engineers
* Sensor installation cost
* Data cleaning is difficult (missing/noisy data)
* Requires proper IT infrastructure
* Needs continuous monitoring

---

### **Final Summary**

Predictive Maintenance is one of the most practical applications of AI in industry.
It saves cost, avoids failures, improves safety, and helps companies keep machines running without disruption.

---





# **2. Types of Data in Predictive Maintenance (PdM)**

Predictive Maintenance depends completely on **data**.
Machines produce different types of data through sensors, operations, and logs.
AI models learn patterns from this data to predict failures.

The better the data, the more accurate the predictions.

We divide PdM data into **five major categories**:

1. **Sensor Data**
2. **Operational Data**
3. **Environmental Data**
4. **Failure Logs**
5. **Maintenance Records**

Each type has its own importance.

---

# **2.1 Sensor Data**

Sensor data is the **heart** of predictive maintenance.
Sensors continuously measure the health of machine components.

Sensor readings help identify:

* Unusual vibration
* Sudden temperature increases
* Pressure drop
* Electrical current spikes
* Strange noise patterns

Below are the common sensor types and what they measure.

---

## **✔ Vibration Data**

Vibration sensors (accelerometers) measure how much a machine **shakes**.

Why important:

* Almost all rotating machines (motors, bearings, compressors, turbines) show early failure signs through vibration.
* Misalignment, imbalance, looseness, and bearing wear are all detected from vibration changes.

Example data:

* Acceleration in X, Y, Z
* Frequency peaks (FFT)

Real-world use:

* Predicting bearing failure 30+ days earlier
* Detecting gearbox damage

---

## **✔ Temperature Data**

Temperature sensors measure **heat** generated during operation.

Why important:

* Overheating is an early sign of many failures. Common causes include:

  * Lack of lubrication
  * Increased friction
  * Electrical overload
  * Blocked ventilation

Example:

* Motor temperature rising from 65°C → 95°C under the same load is a strong warning sign of a developing fault.

Real-world use:

* Transformers
* Motors
* Compressors
* Engines

---

## **✔ Pressure Data**

Used mainly in:

* Hydraulic systems
* Oil & gas pipelines
* Compressors
* Pumps

Why important:

* A sudden pressure drop → leakage
* High pressure → clogging or blockage
* Pressure fluctuation → worn-out pumps or valves

Example:
A gas pipeline pressure dropping by 25% without a change in demand may indicate a leak.

---

## **✔ Acoustic (Sound) Data**

Acoustic sensors capture **sound waves** and noise patterns.

Why important:
Machines make different sounds when:

* Bearings wear
* Valves leak
* Fans hit obstacles
* Motors are overloaded

Acoustic AI + spectrograms = Detect hidden machine problems.

Used in:

* HVAC systems
* Engines
* Fans
* Industrial pumps

---

## **✔ Voltage/Current Data**

Electrical sensors measure:

* Voltage
* Current
* Power factor
* Load imbalance
* Current spikes (inrush currents)

Why important:

* Electrical anomalies often appear before a mechanical failure becomes visible. They help reveal:

  * Insulation damage
  * Motor winding short circuits
  * Overloading

Example:
A motor drawing 30% more current than normal at the same load → possible upcoming failure.

Used in:

* Industrial motors
* Electric vehicles
* Transformers
* UPS/Generators

---

# **2.2 Operational Data**

Operational data describes **how the machine is being used**.

It usually does not come from dedicated condition-monitoring sensors, but from the machine's control systems (PLC, SCADA, IoT platform).

---

## **✔ Load**

Indicates how much work the machine is doing.

Example:

* A motor running at 95% load is more likely to fail than one running at 60%.

If load stays high for long periods → stress → early failure.

---

## **✔ Speed / RPM**

Shows how fast rotating equipment is running.

Important for:

* Fans
* Turbines
* Compressors
* Engines

Example:
High RPM with high vibration = critical warning.

---

## **✔ Runtime**

Total time the machine has been running.

Used to compute:

* Wear rate
* Remaining usable time
* Maintenance scheduling

Example:
A pump running for 10,000 hours without service needs urgent inspection.

---

## **✔ Duty Cycles**

Duty cycle = percentage of the total time that a machine is ON.

Example:

* 80% duty cycle → machine is ON most of the time
* 20% duty cycle → machine rests often
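
The percentage above can be computed directly from an ON/OFF log. The following is a minimal sketch: the function name `duty_cycle` and the log format (one 0/1 reading per fixed sampling interval) are assumptions made for illustration.

```python
def duty_cycle(states):
    """Return the percentage of samples in which the machine was ON.

    `states` is a sequence of 0/1 (or False/True) readings taken at a
    fixed sampling interval (a hypothetical log format for this sketch).
    """
    if not states:
        raise ValueError("states must not be empty")
    on_samples = sum(1 for s in states if s)
    return 100.0 * on_samples / len(states)


# Ten equally spaced samples: the machine was ON for 8 of them.
log = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
print(duty_cycle(log))  # 80.0
```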

Machines with high duty cycles accumulate wear faster and therefore tend to fail sooner.

---

# **2.3 Environmental Data**

Environment strongly affects machine lifespan.

Common environmental readings:

* Temperature
* Humidity
* Dust levels
* Air quality
* Vibrations from surroundings
* Water leakage

Examples:

* High humidity causes corrosion
* Dust blocks filters and fans
* Very cold temperature slows lubrication flow

Real-world scenario:
A factory near a steel plant experiences heavy dust → faster cooling system failure.

---

# **2.4 Failure Logs**

Failure logs store **all past failures** of machines.

They include:

* Date of failure
* Type of failure
* Symptoms observed
* Sensor values before failure
* Root cause (if known)
* Replaced parts

Why important:
AI models use this for supervised learning.
They learn patterns that *lead to* failure.

Example:
If every bearing failure was preceded by vibration above 5g RMS → model learns this pattern.
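
To use failure logs for supervised learning, each sensor reading needs a label. One common approach is to mark every reading taken shortly before a logged failure as "at risk". A minimal sketch, assuming numeric timestamps in hours; the function name `label_pre_failure` and the 24-hour horizon are illustrative choices, not fixed rules:

```python
def label_pre_failure(timestamps, failure_times, horizon):
    """Label a reading 1 if a logged failure occurs within `horizon`
    time units after it, otherwise 0."""
    labels = []
    for t in timestamps:
        at_risk = any(0 <= f - t <= horizon for f in failure_times)
        labels.append(1 if at_risk else 0)
    return labels


# Hypothetical data: readings every 10 hours, one failure logged at hour 55.
timestamps = list(range(0, 100, 10))
failure_times = [55]
labels = label_pre_failure(timestamps, failure_times, horizon=24)
print(labels)  # [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
```

Only the readings at hours 40 and 50 fall inside the 24-hour window before the failure, so only they receive the label 1.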

---

# **2.5 Maintenance Records**

Maintenance records show **what maintenance actions were performed**.

They include:

* What part was replaced
* When service happened
* Work done
* Technician notes
* Cost
* Preventive vs corrective
* Tools used

Why important:

* Helps AI understand which actions solved which problems
* Helps calculate maintenance cost savings
* Helps optimize schedules

Example:
If lubrication was done every 500 hours and failures disappeared → model learns optimal intervals.

---

# **Summary: Why All These Data Types Matter**

| Data Type           | Purpose                            |
| ------------------- | ---------------------------------- |
| Sensor Data         | Detect early signs of failure      |
| Operational Data    | Understand machine stress          |
| Environmental Data  | Detect external effects            |
| Failure Logs        | Train models to recognize failures |
| Maintenance Records | Improve scheduling and reliability |

Together, they create a **complete picture** of machine health.

---






# **3. Feature Engineering in Predictive Maintenance**

Feature Engineering means **converting raw sensor data into meaningful information** that AI models can understand.

In Predictive Maintenance (PdM), sensor data comes in the form of **time-series signals** (continuous readings over time).
We extract features from these signals to:

* Detect early signs of machine failure
* Understand degradation patterns
* Predict Remaining Useful Life (RUL)
* Improve model accuracy

Think of feature engineering as converting messy signals into useful numbers.

---

# **3.1 Statistical Features**

These features capture **basic patterns** in the data.
They are calculated over a certain time window (e.g., the last 5 seconds of vibration data).

Statistical features are **simple but very powerful** in detecting faults.

---

## **✔ Mean**

The average value of the signal.

Why useful:

* High mean temperature → overheating
* Increasing vibration mean → wear increasing over time

Example:
If vibration mean increases slowly, it indicates bearing degradation.

---

## **✔ Standard Deviation (STD)**

Shows **how much the signal varies** around the mean.

Why useful:

* High STD = unstable machine behavior
* Low STD = stable operation

In vibration analysis, increasing STD means the machine is shaking irregularly.

---

## **✔ RMS (Root Mean Square)**

RMS measures the **overall magnitude (energy level)** of the signal: square every sample, take the mean, then take the square root.

Why important in PdM:

* RMS is one of the most common features for vibration data
* Higher RMS → more vibration → more wear

For rotating machines, RMS is used to detect:

* imbalance
* misalignment
* bearing wear

---

## **✔ Skewness**

Measures how **asymmetric** the signal's values are around the mean (whether the distribution leans to one side).

Why useful:

* Sudden changes in skewness indicate unusual vibration patterns.

If skewness suddenly shifts away from its usual value (in either direction), a new fault may be developing.

---

## **✔ Kurtosis**

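Measures how **spiky** the signal is, i.e. how often extreme values (sharp peaks) occur compared with a smooth signal. A normally distributed signal has a kurtosis of 3.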

Why important:

* High kurtosis → sharp spikes → early bearing faults
* Low kurtosis → smooth signal → normal operation

Kurtosis is one of the earliest indicators of bearing damage.
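
The five statistical features above can be computed over one window of readings as follows. This is a minimal sketch using only the Python standard library (in practice, libraries such as NumPy or pandas are typically used); the function name `statistical_features` and the sample windows are assumptions made for illustration. Population formulas (dividing by `n`) are used, and kurtosis is the non-excess form, so a normal distribution gives 3.

```python
import math


def statistical_features(window):
    """Compute mean, STD, RMS, skewness and kurtosis of one window."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n  # population variance
    std = math.sqrt(var)
    rms = math.sqrt(sum(x * x for x in window) / n)
    if std == 0:
        # Skewness/kurtosis are undefined for a constant signal;
        # returning 0.0 is a convention chosen for this sketch.
        skew = kurt = 0.0
    else:
        skew = sum((x - mean) ** 3 for x in window) / (n * std ** 3)
        kurt = sum((x - mean) ** 4 for x in window) / (n * std ** 4)
    return {"mean": mean, "std": std, "rms": rms,
            "skewness": skew, "kurtosis": kurt}


# Made-up windows: a smooth oscillation and a flat signal with one spike.
smooth = statistical_features([1.0, -1.0] * 5)
spiky = statistical_features([0.0] * 9 + [10.0])
print(smooth["kurtosis"], spiky["kurtosis"])
```

The window with a single sharp spike produces a much higher kurtosis than the smooth oscillation, which is the behavior that makes kurtosis useful for spotting impact-like bearing faults.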

---

# **3.2 Time-Domain Features**

Time-domain features are extracted from the **actual sensor readings over time**.

Examples:

* Peak value
* Crest factor
* Rise time
* Zero-crossing rate

Why useful:

* They directly capture machine behavior
* Useful for mechanical, electrical, and thermal faults
* Good for anomaly detection

Simple example:
If vibration peak suddenly increases, something is hitting inside the machine.
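
A minimal sketch of three of the time-domain features listed above (peak value, crest factor, zero-crossing rate), standard library only. The function name and sample windows are assumptions for illustration; rise time is left out because it depends on how the start and end of a rise are defined.

```python
import math


def time_domain_features(window):
    """Peak, crest factor and zero-crossing rate of one window."""
    if len(window) < 2:
        raise ValueError("window needs at least two samples")
    peak = max(abs(x) for x in window)
    rms = math.sqrt(sum(x * x for x in window) / len(window))
    # Crest factor = peak / RMS; large values point to impulsive spikes.
    crest = peak / rms if rms > 0 else 0.0
    # Count sign changes between consecutive samples.
    crossings = sum(
        1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0)
    )
    zcr = crossings / (len(window) - 1)
    return {"peak": peak, "crest_factor": crest, "zero_crossing_rate": zcr}


normal = time_domain_features([1.0, -1.0, 1.0, -1.0])
impact = time_domain_features([0.1, -0.1, 0.1, -0.1, 5.0, -0.1])
print(normal["crest_factor"], impact["crest_factor"])
```

The window containing one large impact has a clearly higher peak and crest factor than the regular oscillation.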

---

# **3.3 Frequency-Domain Features (FFT)**

Many machine faults show up as vibration at **specific frequencies**.
By using **Fast Fourier Transform (FFT)**, we convert the signal from time → frequency domain.

Why FFT is useful:

* Early bearing faults create vibration at specific frequencies
* Gear defects show unique frequency patterns
* Unbalance/looseness has signature peaks

Common FFT features:

* Dominant frequency
* Harmonic peaks
* Spectral energy
* Spectral entropy

Example:
A bearing creates a “fault frequency” when damaged. FFT helps detect it.
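
The "dominant frequency" feature can be illustrated with a small example. The sketch below uses a direct Discrete Fourier Transform written in plain Python, which produces the same spectrum an FFT would, only more slowly; real projects would normally call an FFT routine from a numerical library. The function name, the 200 Hz sampling rate and the synthetic signal are all assumptions for illustration.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC component.

    Uses a direct DFT for clarity; an FFT computes the same result faster.
    """
    n = len(signal)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):  # skip k = 0 (the constant/DC part)
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(signal)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


# Synthetic vibration: a strong 50 Hz component plus a weaker 20 Hz one,
# sampled at 200 Hz for one second (values invented for this demo).
sample_rate = 200
signal = [
    math.sin(2 * math.pi * 50 * i / sample_rate)
    + 0.3 * math.sin(2 * math.pi * 20 * i / sample_rate)
    for i in range(sample_rate)
]
print(dominant_frequency(signal, sample_rate))  # 50.0
```

In a real bearing analysis, the peak found this way would be compared against the fault frequencies expected for that bearing's geometry and shaft speed.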

---

# **3.4 Rolling Window Features**

Rolling windows calculate features over a **moving window** of time.

Example:
Take vibration data →
Calculate mean every 5 seconds →
Slide the window forward →
Repeat.

Helpful for:

* Tracking gradual degradation
* Smoothing noisy data
* Predicting RUL

Rolling features used:

* Rolling mean
* Rolling STD
* Rolling RMS
* Rolling FFT energy

Example:
If rolling RMS continues increasing → machine is approaching failure.
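
A minimal sketch of rolling features, standard library only (pandas offers a `rolling` method for the same purpose). The helper names and the vibration values are assumptions for illustration; the window slides forward one sample at a time.

```python
import math


def rolling_feature(signal, window, func):
    """Apply `func` to every full window of `window` consecutive samples."""
    return [
        func(signal[i - window + 1 : i + 1])
        for i in range(window - 1, len(signal))
    ]


def mean(values):
    return sum(values) / len(values)


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Made-up vibration amplitudes that grow over time.
vibration = [1, 1, 1, 1, 2, 2, 3, 3, 4, 5]
rolling_rms = rolling_feature(vibration, window=4, func=rms)
print([round(v, 2) for v in rolling_rms])
print(rolling_feature([1, 2, 3, 4], window=2, func=mean))  # [1.5, 2.5, 3.5]
```

Because the amplitudes keep growing, every new rolling RMS value is larger than the previous one, which is the kind of steady upward trend that signals degradation.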

---

# **3.5 Lagged Features**

Lag features are **previous values** included as new features.

Example:

* Vibration at time t-1
* Temperature at time t-5
* Pressure at time t-10

Why useful:

* Helps models understand trends
* Captures degradation patterns
* Makes time-series models perform better

Example:
If previous 10 readings show rising temperature → overheating is near.
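
A minimal sketch of building lagged features from a single sensor series. The function name `add_lag_features`, the column names and the temperature values are assumptions for illustration; rows without enough history are dropped.

```python
def add_lag_features(series, lags):
    """Return one row per time step with the current value and its lags.

    The first max(lags) time steps are skipped because they do not
    have enough history.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        row = {"value": series[t]}
        for lag in lags:
            row[f"lag_{lag}"] = series[t - lag]
        rows.append(row)
    return rows


temperature = [60, 61, 63, 66, 70, 75]  # made-up readings
rows = add_lag_features(temperature, lags=[1, 2])
print(rows[0])   # {'value': 63, 'lag_1': 61, 'lag_2': 60}
print(rows[-1])  # {'value': 75, 'lag_1': 70, 'lag_2': 66}
```

Each row now carries its own short history, so even a model that looks at one row at a time can see that the temperature is rising.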

---

# **3.6 Sensor Fusion Features**

Sensor fusion means **combining multiple sensors** to create new features.

Why important:

* Machines have multiple types of sensors (vibration + temperature + current)
* Combining them gives a clearer health picture

Examples (each index combines the listed signals, usually after normalizing them to a common scale):

* "Thermal Stress Index" = temperature + load
* "Mechanical Stress Index" = vibration + speed
* "Energy Consumption Index" = current + runtime

Real world:
A motor may heat only when load is high + vibration is increasing → strong failure sign.

---

# **3.7 Health Index (HI)**

Health Index is a single number that represents **overall machine health**.

Ranges:

* 1 → Healthy
* 0 → Failed

Or

* 100 → Excellent
* 0 → Broken

How it's created:

* Combine multiple features
* Normalize values
* Use ML to compute a health score

Why useful:

* Easy to visualize degradation
* Good for dashboards
* Helps maintenance teams understand machine condition quickly
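
One simple way to build a Health Index without ML is a weighted average of normalized features. The sketch below is only one possible design: the function name, the "healthy" and "failed" reference values and the weights are all hypothetical and would have to come from domain knowledge or historical data.

```python
def health_index(features, healthy, failed, weights):
    """Weighted health index between 1.0 (healthy) and 0.0 (failed)."""
    total_weight = sum(weights.values())
    score = 0.0
    for name, weight in weights.items():
        # Normalize: 0.0 at the healthy reference, 1.0 at the failed one.
        span = failed[name] - healthy[name]
        damage = (features[name] - healthy[name]) / span
        damage = min(1.0, max(0.0, damage))  # clip to [0, 1]
        score += weight * (1.0 - damage)
    return score / total_weight


# Hypothetical reference values and weights.
healthy = {"rms": 1.0, "temp": 60.0}
failed = {"rms": 5.0, "temp": 100.0}
weights = {"rms": 0.6, "temp": 0.4}

hi_now = health_index({"rms": 3.0, "temp": 80.0}, healthy, failed, weights)
print(round(hi_now, 2))  # 0.5
```

Multiplying the result by 100 gives the 0 to 100 scale mentioned above.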

---

# **3.8 Remaining Useful Life (RUL) Labels**

RUL = **how long before the machine fails**.

It is the most important label in Predictive Maintenance.

Example:

* RUL = 50 hours → still fine
* RUL = 5 hours → urgent maintenance
* RUL = 0 → failure

How RUL labels are created:

1. Find the failure point in training data
2. Count backwards from failure
3. Assign RUL values for each point

Example:

| Time | Condition   | RUL |
| ---- | ----------- | --- |
| t1   | good        | 50  |
| t2   | slight wear | 40  |
| t3   | heavy wear  | 10  |
| t4   | critical    | 1   |
| t5   | failure     | 0   |

RUL helps AI models predict:

* When to schedule maintenance
* When to replace parts
* How to avoid downtime
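
The "count backwards from failure" labelling can be sketched in a few lines. The function name and the time stamps are assumptions for illustration; the time stamps were chosen so that the output reproduces the RUL column of the table above.

```python
def rul_from_timestamps(timestamps, failure_time):
    """RUL label for each reading: time remaining until the failure.

    Readings recorded after the failure are ignored.
    """
    return [failure_time - t for t in timestamps if t <= failure_time]


# Assumed time stamps (in hours) for t1..t5; failure happens at hour 50.
timestamps = [0, 10, 40, 49, 50]
print(rul_from_timestamps(timestamps, failure_time=50))  # [50, 40, 10, 1, 0]
```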

---

# **Summary**

Feature Engineering converts raw sensor data into meaningful information.

| Feature Type             | Purpose                             |
| ------------------------ | ----------------------------------- |
| Statistical Features     | Capture basic patterns              |
| Time-Domain Features     | Detect faults directly from signals |
| Frequency Features (FFT) | Find fault frequencies              |
| Rolling Windows          | Track slow degradation              |
| Lagged Features          | Capture historical behavior         |
| Sensor Fusion            | Combine multi-sensor data           |
| Health Index             | Summarize machine condition         |
| RUL Labels               | Predict failure time                |

Together, these features help AI models understand machine behavior and make accurate predictions.

---






# **4. Models for Predictive Maintenance**

Predictive Maintenance uses AI/ML models to understand patterns in machine behavior and forecast failures.
For each model below, we explain *what it is* and *why it is used* in simple but technically accurate language.

---

# **4.1 Logistic Regression**

### **What it is:**

Logistic Regression is a **probabilistic classification model** that uses a mathematical function called the **sigmoid** to convert a weighted sum of the input features into a probability between 0 and 1.
Despite its name, it is used as a **binary classifier**, not for predicting continuous values.

It tries to find a linear decision boundary (a hyperplane) that separates two classes:

* Healthy (0)
* Faulty (1)

### **Why it works for PdM:**

* Good for simple yes/no predictions
* Works well when data is linearly separable
* Fast and interpretable
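
The prediction step of Logistic Regression is small enough to write by hand. In the sketch below, the weights and bias are hypothetical values chosen for illustration; in a real model they are learned from labelled data. The two features are assumed to be vibration RMS and temperature.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fault_probability(features, weights, bias):
    """Probability of the 'Faulty' class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical (not learned) parameters for [vibration RMS, temperature].
weights = [1.2, 0.05]
bias = -8.0

p_low = fault_probability([2.0, 70.0], weights, bias)
p_high = fault_probability([5.0, 90.0], weights, bias)
print(round(p_low, 3), round(p_high, 3))
```

A sample is classified as Faulty (1) when its probability is above a chosen threshold, commonly 0.5.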

---

# **4.2 Decision Trees**

### **What it is:**

A Decision Tree is a **flowchart-like model** that splits data based on the most important features.
At each step (node), it asks a question like:

> “Is vibration RMS > 5?”
> Yes → go to left
> No → go to right

It learns these rules automatically from data.

### **Why it works for PdM:**

* Very easy to interpret
* Captures nonlinear patterns
* Useful for root-cause explanations

---

# **4.3 Random Forest**

### **What it is:**

Random Forest is an **ensemble model** made of many Decision Trees.
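Each tree is trained on a random sample of the data (bagging) and considers random subsets of the features at each split. The trees then vote (classification) or average their outputs (regression) to produce the final result.

This reduces overfitting and usually improves accuracy compared with a single tree.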

### **Why it works for PdM:**

* Handles noisy sensor data
* Captures complex relationships
* Works well even with many features (vibration, temperature, RPM, etc.)

---

# **4.4 XGBoost**

### **What it is:**

XGBoost is a **gradient boosting algorithm** that builds trees sequentially.
Each new tree focuses on correcting errors made by previous trees.
It uses advanced techniques like:

* shrinkage (learning rate)
* regularization (to prevent overfitting)
* parallel processing

This makes it one of the strongest and most widely used models for tabular data.

### **Why it works for PdM:**

* High accuracy on tabular (feature-based) data
* Handles missing data naturally
* Excellent for RUL and fault classification

---

# **4.5 SVM (Support Vector Machine)**

### **What it is:**

SVM finds the **boundary (hyperplane)** that maximizes the margin, i.e. the distance between the boundary and the nearest points of each class (the support vectors).
It uses **kernels** to handle non-linear patterns:

* linear
* radial (RBF)
* polynomial

### **Why it works for PdM:**

* Excellent for detecting subtle differences
* Good for high-dimensional data
* Effective when dataset is small but informative

---

# **4.6 KNN (K-Nearest Neighbors)**

### **What it is:**

KNN is a **distance-based model**.
When a new data point comes in, it looks at the “K” most similar past examples and predicts based on them.

There is no explicit training phase: the model simply stores the data and does all comparisons at prediction time.

### **Why it works for PdM:**

* Works well when failure patterns repeat
* Easy to implement
* Good for early prototyping
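
A minimal KNN classifier in plain Python, as a sketch of the idea. The training samples (vibration RMS, temperature) and their labels are invented for illustration. Note that in practice the features should be scaled first, otherwise the feature with the largest numeric range dominates the distance.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest
    training samples (Euclidean distance)."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up past examples: (vibration RMS, temperature in °C).
train_X = [(1.0, 60.0), (1.2, 62.0), (0.9, 58.0),
           (4.8, 88.0), (5.1, 92.0), (5.3, 90.0)]
train_y = ["healthy", "healthy", "healthy",
           "faulty", "faulty", "faulty"]

print(knn_predict(train_X, train_y, (1.1, 61.0)))  # healthy
print(knn_predict(train_X, train_y, (5.0, 89.0)))  # faulty
```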

---

# **4.7 1D CNN (1D Convolutional Neural Network)**

### **What it is:**

A 1D CNN applies **filters (kernels)** over time-series data to automatically learn important patterns.
It captures:

* peaks
* periodic vibrations
* sudden anomalies
* frequency-like features

CNNs extract these patterns without manual feature engineering.

### **Why it works for PdM:**

* Ideal for raw vibration/pressure signals
* Learns local time patterns very well
* Great for fault-type classification

---

# **4.8 LSTM (Long Short-Term Memory Network)**

### **What it is:**

LSTM is a **recurrent neural network (RNN)** designed to remember information over long periods using special components:

* **input gate**
* **forget gate**
* **output gate**

These gates control what the network remembers and forgets.

### **Why it works for PdM:**

* Captures long-term degradation
* Excellent for continuous sensor streams
* Widely used for Remaining Useful Life (RUL) prediction

---

# **4.9 GRU (Gated Recurrent Unit)**

### **What it is:**

GRU is a simplified version of LSTM with only two gates:

* reset gate
* update gate

It retains the ability to learn long-term patterns but is computationally lighter.

### **Why it works for PdM:**

* Usually faster to train and run than LSTM
* Great for real-time or edge computing
* Often similar performance with fewer parameters

---

# **4.10 Autoencoders**

### **What it is:**

An Autoencoder is a **neural network that tries to compress input data and then reconstruct it**.
It has:

* **Encoder** → compresses data
* **Decoder** → reconstructs data

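When the network is trained only on normal data, it learns to reconstruct normal inputs well. A high reconstruction error on new data therefore means the input does not look normal → anomaly detected.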

### **Why it works for PdM:**

* Works even when failure data is rare
* Learns a compact model of normal behavior
* Good for early anomaly detection

---

# **4.11 CNN + LSTM Hybrid**

### **What it is:**

A hybrid architecture that combines:

* **CNN layers** → extract local features from raw signals
* **LSTM layers** → understand long-term temporal behavior

This model captures both **short-term patterns** and **long-term degradation**.

### **Why it works for PdM:**

* Well suited to complex industrial datasets
* Often applied to the NASA C-MAPSS engine dataset
* Can reach high accuracy in RUL prediction

---

# **Final Summary**

| Model               | Technical Description                 | Best Use Case in PdM      |
| ------------------- | ------------------------------------- | ------------------------- |
| Logistic Regression | Probabilistic linear classifier       | Healthy vs Faulty         |
| Decision Tree       | Rule-based recursive splitting        | Simple diagnostics        |
| Random Forest       | Ensemble of many trees                | Fault classification      |
| XGBoost             | Gradient-boosted trees                | High-accuracy predictions |
| SVM                 | Maximized-margin classifier           | Subtle fault separation   |
| KNN                 | Distance-based classifier             | Pattern matching          |
| 1D CNN              | Convolution filters for raw signals   | Vibration/Acoustic faults |
| LSTM                | Sequence-learning RNN                 | RUL prediction            |
| GRU                 | Light-weight LSTM                     | Real-time PdM             |
| Autoencoder         | Reconstruction-based anomaly detector | Rare failure detection    |
| CNN + LSTM          | Hybrid temporal-spatial model         | Complex RUL models        |

---





# **5. Core Predictive Maintenance Tasks**

Predictive Maintenance (PdM) is not just one task.
It involves several AI-driven tasks that work together to monitor machine health, detect problems, and forecast failures.

Each task has a unique purpose, different data needs, and specific ML models suited for it.

We explain all **six key tasks** below.

---

# **5.1 Fault Detection**

### **What it is:**

Fault detection answers the question:

> **“Is something wrong with the machine right now?”**

It is the simplest PdM task:

* No need to identify *what* the fault is
* Only detect if the current state is normal or abnormal

### **How it works technically:**

* Model learns patterns of healthy machine behavior
* Compares current sensor readings (vibration, temperature, current)
* If values deviate beyond learned boundaries → “Fault Detected”

### **Common models used:**

* Logistic Regression
* Decision Trees
* Random Forest
* SVM
* Autoencoders (for unlabeled data)

### **Example:**

If vibration RMS exceeds the normal pattern the model learned, it raises a fault alert.
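
The simplest version of this idea needs no ML library: learn the mean and spread of a healthy signal, then flag readings that fall too far outside it. This is a statistical stand-in for the models listed above, not one of them; the function names, the 3-sigma limit and the RMS values are assumptions for illustration.

```python
import statistics


def fit_healthy_baseline(healthy_values):
    """Learn the mean and standard deviation of healthy readings."""
    return statistics.mean(healthy_values), statistics.stdev(healthy_values)


def is_fault(value, baseline, n_sigma=3.0):
    """Flag a reading that deviates more than n_sigma standard
    deviations from the healthy mean."""
    mean, std = baseline
    return abs(value - mean) > n_sigma * std


# Made-up vibration RMS values recorded while the machine was healthy.
healthy_rms = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98]
baseline = fit_healthy_baseline(healthy_rms)

print(is_fault(1.05, baseline))  # False
print(is_fault(2.5, baseline))   # True
```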

---

# **5.2 Fault Classification**

### **What it is:**

Fault classification answers:

> **“What type of fault is happening?”**

Instead of just saying “fault,” the model categorizes the problem.

### **Technical idea:**

Uses supervised learning where each failure type has a label.

Classes can be:

* Imbalance
* Misalignment
* Bearing inner-race fault
* Bearing outer-race fault
* Gear tooth damage
* Electrical winding fault

### **How it works:**

* Extract features (RMS, kurtosis, FFT peaks)
* Train multi-class models
* Classify new data into the correct fault category

### **Common models used:**

* Random Forest
* XGBoost
* 1D CNN
* CNN + LSTM

### **Example:**

Vibration pattern shows a peak at a specific frequency → classified as “Bearing Outer Race Fault”.

---

# **5.3 Anomaly Detection**

### **What it is:**

Anomaly detection identifies **unusual patterns** that do NOT match normal behavior.

It answers:

> **“Is something happening that has never happened before?”**

This is useful when:

* Failure data is limited
* Machine rarely breaks
* New kinds of faults appear that were never labeled

### **Technical idea:**

Models learn only **normal** data.
If reconstruction error or deviation is high → anomaly.

### **Common models used:**

* Autoencoders
* Isolation Forest
* One-Class SVM
* LSTM Autoencoders

### **Example:**

Machine produces a new vibration pattern not seen in training → anomaly flagged → maintenance team investigates.

---

# **5.4 Remaining Useful Life (RUL) Prediction**

### **What it is:**

RUL prediction answers the most important question in PdM:

> **“How much time is left before this machine fails?”**

It is a **regression** problem: the model predicts a continuous value.

### **Technical idea:**

* Track sensor values over time
* Learn degradation trends
* Predict how many hours/cycles remain until failure

RUL labels are created by counting backward from the failure point.

### **Models used:**

* LSTM
* GRU
* CNN + LSTM
* XGBoost
* Linear Regression (baseline)

### **Why RUL matters:**

* Helps schedule maintenance
* Avoids unnecessary replacements
* Maximizes machine lifespan
* Minimizes downtime

### **Example:**

Model predicts:

* RUL = 25 hours → Prepare maintenance
* RUL = 5 hours → Urgent replacement required

---

# **5.5 Health Scoring**

### **What it is:**

Health Scoring converts multiple sensor readings into a **single number** that represents machine health.

Example scale:

* 100 → Excellent
* 70 → Minor wear
* 40 → Medium degradation
* 20 → High risk
* 0 → Failure

### **Technical idea:**

Health score is created by:

* Normalizing features (vibration, temperature, current)
* Weighting their importance
* Combining them into one index
* Sometimes using ML to compute it

### **Benefits:**

* Easy for maintenance teams
* Good for dashboards
* Helps monitor long-term trends

### **Models used:**

* Random Forest (probability-based)
* Autoencoders (reconstruction error as score)
* LSTM (trend-based scoring)

### **Example:**

A motor’s health score drops from 90 → 65 → 40 → 15 over months → clear degradation.

---

# **5.6 Degradation Modelling**

### **What it is:**

Degradation modelling tracks how machine performance declines over time.

It answers:

> **“How is the machine wearing out?”**
> **“Is it degrading slowly or rapidly?”**

### **Technical idea:**

Models analyze long-term sensor trends:

* increasing vibration
* rising temperature
* decreasing efficiency
* growing noise

Degradation models often produce a **curve** representing the machine’s decline.

### **Models used:**

* LSTM
* GRU
* Time-series models (ARIMA, Prophet)
* Polynomial regression
* CNN-LSTM (for complex patterns)

### **Why useful:**

* Helps detect early warning patterns
* Supports planning for future maintenance
* Identifies sudden vs gradual degradation
* Used heavily in turbine/engine analysis

### **Example:**

A bearing's vibration RMS increases slowly for 200 hours → then sharply rises → indicates approaching failure.
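
A very simple degradation model is a straight line fitted to the trend, extrapolated to a failure threshold. The sketch below uses ordinary least squares (a degree-1 polynomial regression) in plain Python. It is a deliberate simplification: it captures a steady rise but not the sharp final rise described above. The function names, the readings and the threshold of 5.0 are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_threshold(hours, values, threshold):
    """Extrapolate the fitted line to the threshold.

    Returns the remaining hours after the last reading, or None if the
    trend is flat or decreasing.
    """
    slope, intercept = fit_line(hours, values)
    if slope <= 0:
        return None
    crossing_time = (threshold - intercept) / slope
    return max(0.0, crossing_time - hours[-1])


# Made-up vibration RMS trend: rises by 0.5 every 50 hours.
hours = [0, 50, 100, 150, 200]
rms = [1.0, 1.5, 2.0, 2.5, 3.0]
remaining = hours_until_threshold(hours, rms, threshold=5.0)
print(round(remaining))  # 200
```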

---

# **Summary Chart**

| Task                  | Main Question             | Output            | Typical Models                        |
| --------------------- | ------------------------- | ----------------- | ------------------------------------- |
| Fault Detection       | Is there a fault?         | Normal/Faulty     | Logistic Regression, SVM, Autoencoder |
| Fault Classification  | What fault is it?         | Fault type        | Random Forest, CNN                    |
| Anomaly Detection     | Is this behavior unusual? | Anomaly/Normal    | Autoencoder, Isolation Forest         |
| RUL Prediction        | When will it fail?        | Hours/cycles left | LSTM, GRU, XGBoost                    |
| Health Scoring        | How healthy is it?        | 0–100 score       | RF, Autoencoder                       |
| Degradation Modelling | How is health changing?   | Trend curve       | LSTM, GRU                             |

---

# **6. Evaluation Metrics**

To measure how well Predictive Maintenance models work, we use evaluation metrics.
These metrics help us evaluate:

* Classification performance (fault vs no fault)
* Anomaly detection
* Regression tasks (RUL prediction)
* Overall model reliability

Each metric answers a different question about model quality.

---

# **6.1 Accuracy**

### **What it is:**

Accuracy measures **how many predictions the model got correct** out of all predictions.

### **Formula:**

$
\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}
$

### **When useful:**

* When classes are balanced (equal healthy and faulty samples)

### **Limitation:**

If 95% of machine data is “healthy,” a model could predict “healthy” always and still get 95% accuracy → misleading.

### **Example:**

Out of 100 predictions, 90 are correct → Accuracy = 90%
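
The limitation above is easy to reproduce. The sketch below uses hypothetical labels (95 healthy, 5 faulty) and a "model" that always predicts healthy; the data and the helper name `accuracy` are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical data: 95 healthy samples (0) and 5 faulty samples (1).
y_true = [0] * 95 + [1] * 5

# A "model" that always predicts healthy.
always_healthy = [0] * 100

acc = accuracy(y_true, always_healthy)
faults_found = sum(t == 1 and p == 1 for t, p in zip(y_true, always_healthy))
print(f"accuracy = {acc:.2f}, faults detected = {faults_found} of 5")
```

The accuracy looks high even though not a single fault is detected, which is why accuracy alone is not enough for PdM data.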

---

# **6.2 Precision**

### **What it is:**

Precision tells us:

> “Of all the cases where the model predicted **fault**, how many were actually faults?”

### **Formula:**

$
\text{Precision} = \frac{TP}{TP + FP}
$

Where:

* **TP** = True Positive
* **FP** = False Positive

### **Why important in PdM:**

A low precision means many **false alarms**, which wastes maintenance time.

### **Example:**

Model predicts 20 faults, but only 10 are real → Precision = 50%.

---

# **6.3 Recall (Sensitivity)**

### **What it is:**

Recall answers:

> “Out of all **actual faults**, how many did the model detect?”

### **Formula:**

$
\text{Recall} = \frac{TP}{TP + FN}
$

Where:

* **FN** = False Negative (missed faults)

### **Why important in PdM:**

Missing a fault (**false negative**) is dangerous because it can cause unplanned machine failure → costly downtime.

### **Example:**

There were 15 actual faults; model detected only 10 → Recall = 10/15 ≈ 67%.

---

# **6.4 F1-Score**

### **What it is:**

F1-score combines **Precision and Recall** into one balanced score.

### **Formula:**

$
\text{F1} = 2 \cdot \frac{(\text{Precision} \cdot \text{Recall})}{(\text{Precision} + \text{Recall})}
$

### **Why useful:**

* More informative than accuracy when classes are imbalanced
* Penalizes both false alarms and missed faults
* Used widely for fault detection/classification

### **Example:**

If Precision = 0.8 and Recall = 0.6 → F1 ≈ 0.69.
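
A minimal sketch of the three formulas from Sections 6.2 to 6.4, computed from confusion-matrix counts. The counts are hypothetical and chosen so that Precision = 0.8 and Recall = 0.6; returning 0.0 when a denominator is zero is a convention chosen here, not part of the definitions.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # convention: 0 if no positive predictions
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # convention: 0 if no actual faults
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 12 faults caught, 3 false alarms, 8 missed faults.
p, r, f1 = precision_recall_f1(tp=12, fp=3, fn=8)
print(f"precision = {p:.2f}, recall = {r:.2f}, F1 = {f1:.2f}")
```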

---

# **6.5 AUC-ROC (Area Under ROC Curve)**

### **What it is:**

AUC-ROC measures how well the model separates **healthy vs faulty** classes across different thresholds.

### **ROC Curve axes:**

* X-axis: False Positive Rate
* Y-axis: True Positive Rate

### **AUC Value:**

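* 1.0 → Perfect model
* 0.5 → Random guessing
* 0.0 → Perfectly inverted ranking (every healthy sample scores above every faulty one; flipping the outputs would give a perfect model)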

### **Why important in PdM:**

* Helps compare models
* Shows how well the model detects faults across all possible settings

### **Example:**

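AUC = 0.92 → the model ranks a randomly chosen faulty sample above a randomly chosen healthy one about 92% of the time, which indicates strong fault separation.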
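
AUC can also be read as a ranking statistic: the fraction of (faulty, healthy) pairs in which the faulty sample receives the higher score. The sketch below computes it that way on a tiny hypothetical score set; the helper name `auc_roc` and the numbers are illustrative, and for real work a library routine would normally be used instead.

```python
import numpy as np

def auc_roc(y_true, scores):
    """AUC as the share of (faulty, healthy) pairs ranked correctly; ties count as half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]   # scores of faulty samples
    neg = scores[y_true == 0]   # scores of healthy samples
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one sample from each class")
    diff = pos[:, None] - neg[None, :]          # every faulty-vs-healthy comparison
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

# Hypothetical fault scores: one healthy sample (0.80) outranks one faulty sample (0.65).
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.35, 0.40, 0.80, 0.65, 0.90]
auc = auc_roc(y_true, scores)
print(f"AUC = {auc:.3f}")
```

Seven of the eight pairs are ranked correctly here, so the AUC is 0.875.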

---

# **6.6 RMSE (Root Mean Square Error)**

### **What it is:**

RMSE measures how far predictions are from actual values in **regression tasks**, like RUL prediction.

### **Formula:**

$
\text{RMSE} = \sqrt{\frac{1}{N} \sum (y_{\text{true}} - y_{\text{pred}})^2 }
$

### **Why important:**

* Penalizes large errors heavily
* Good for RUL tasks where large mistakes cause operational risk

### **Example:**

If the model predicts RUL = 100 hours but the true value was 50, the error is 50 hours and contributes 50² = 2,500 to the squared-error sum → large RMSE penalty.

---

# **6.7 MAE (Mean Absolute Error)**

### **What it is:**

MAE measures the **average absolute difference** between predicted and actual values.

### **Formula:**

$
\text{MAE} = \frac{1}{N} \sum |y_{\text{true}} - y_{\text{pred}}|
$

### **Why useful:**

* Easy to interpret
* Weights each error in proportion to its size (no extra penalty for large errors)
* Good for general RUL prediction

### **Example:**

If errors are: 5, 10, 15 → MAE = 10.
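
To see how MAE and RMSE differ, the sketch below compares two hypothetical sets of RUL predictions with the same MAE: one with errors of 5, 10, and 15 hours, and one with a single 30-hour miss. The values and helper names are illustrative.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

y_true = np.array([100.0, 80.0, 60.0])                 # hypothetical true RUL (hours)
pred_spread = y_true - np.array([5.0, 10.0, 15.0])     # errors 5, 10, 15
pred_outlier = y_true - np.array([0.0, 0.0, 30.0])     # errors 0, 0, 30

for name, pred in [("spread", pred_spread), ("outlier", pred_outlier)]:
    print(f"{name}: MAE = {mae(y_true, pred):.2f}, RMSE = {rmse(y_true, pred):.2f}")
```

Both cases have MAE = 10, but the single large miss produces a noticeably higher RMSE, which is the behaviour described in Section 6.6.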

---

# **6.8 MAPE (Mean Absolute Percentage Error)**

### **What it is:**

MAPE measures the **percentage error** between predicted and actual values.

### **Formula:**

$
\text{MAPE} = \frac{100}{N} \sum \left| \frac{y_{\text{true}} - y_{\text{pred}}}{y_{\text{true}}} \right|
$

### **Why useful:**

* Good when RUL values vary a lot
* Helps understand error in percentage form

### **Example:**

True RUL = 100, Predicted = 90 → MAPE = 10%.
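
A short sketch of the MAPE formula follows. One practical caveat: the formula divides by the true value, so it is undefined when the true RUL is 0 (the moment of failure). Skipping those samples, as done here, is an assumption for illustration; the helper name `mape` is also illustrative.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, ignoring samples whose true value is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0                      # assumption: drop undefined terms
    if not mask.any():
        raise ValueError("MAPE is undefined when all true values are zero")
    return float(100.0 * np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])))

far_from_failure = mape([100.0], [90.0])   # 10-hour error with 100 hours left
near_failure = mape([20.0], [10.0])        # same 10-hour error with 20 hours left
print(f"{far_from_failure:.1f}% vs {near_failure:.1f}%")
```

The same 10-hour error counts as 10% far from failure and 50% close to failure, so MAPE emphasizes mistakes made near the end of life.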

---

# **6.9 RUL-Specific Score Functions**

### **What they are:**

Some industries define special metrics to evaluate Remaining Useful Life (RUL) prediction accuracy.

### **Why needed:**

Because:

* Overestimating RUL (predicting too high) is dangerous
* Underestimating RUL (predicting too low) wastes money

So the scoring function often penalizes one more than the other.

### **Examples:**

#### **1. NASA CMAPSS Scoring Function**

Widely used in turbofan RUL tasks:

* Small penalty for early replacement
* Big penalty for late replacement (predicting too high)

#### **2. Weighted Error Metrics**

$
\text{Score} = a \times \text{Late Error} + b \times \text{Early Error}
$
Where **a > b**.

#### **3. Asymmetric Loss Functions**

Custom loss functions:

* Penalize overestimation more
* Encourage safer maintenance schedules

### **Real meaning:**

RUL models must not say a machine will survive 50 more hours when it will actually fail in 5.
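
A minimal sketch of an asymmetric score in the style commonly reported for the NASA C-MAPSS / PHM08 benchmark. With d = predicted RUL − true RUL, early predictions (d < 0) are penalized with exp(−d/13) − 1 and late predictions (d ≥ 0) with exp(d/10) − 1. The constants 13 and 10 are taken from that commonly cited form and should be checked against the benchmark description before use; the function name is illustrative.

```python
import numpy as np

def cmapss_score(rul_true, rul_pred, a_early=13.0, a_late=10.0):
    """Asymmetric RUL score: late (too-high) predictions cost more than early ones."""
    d = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
    penalties = np.where(
        d < 0,
        np.exp(-d / a_early) - 1.0,   # predicted too low  -> early replacement
        np.exp(d / a_late) - 1.0,     # predicted too high -> late replacement
    )
    return float(penalties.sum())

early = cmapss_score([50.0], [40.0])   # 10 hours too low
late = cmapss_score([50.0], [60.0])    # 10 hours too high
print(f"early penalty = {early:.3f}, late penalty = {late:.3f}")
```

For the same 10-hour error, the late prediction receives the larger penalty, and the gap widens quickly as the error grows because the penalty is exponential.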

---

# **Summary Table**

| Metric    | Type           | Best For                  | Interpretation                |
| --------- | -------------- | ------------------------- | ----------------------------- |
| Accuracy  | Classification | Balanced data             | Overall correctness           |
| Precision | Classification | Avoid false alarms        | Fault prediction quality      |
| Recall    | Classification | Avoid missed faults       | Safety-critical detection     |
| F1-score  | Classification | Imbalanced datasets       | Balance of Precision + Recall |
| AUC-ROC   | Classification | Model comparison          | Fault separability            |
| RMSE      | Regression     | RUL                       | Penalizes large errors        |
| MAE       | Regression     | RUL                       | Average error                 |
| MAPE      | Regression     | RUL                       | Percentage-based error        |
| RUL Score | Regression     | Safety-critical RUL tasks | Penalizes dangerous mistakes  |

---

# **7. Asset Management Concepts**

Asset Management focuses on managing the **entire lifecycle** of machines and equipment so they stay reliable, safe, and cost-effective.

Predictive Maintenance is a part of Asset Management, but Asset Management is **broader**—it involves planning, risk assessment, health scoring, and decision-making.

We explain each concept below.

---

# **7.1 Asset Lifecycle**

### **What it is:**

The Asset Lifecycle describes **every stage** a machine goes through, from design and procurement to retirement.

### **Stages:**

1. **Design & Procurement**
   Choosing machine type, specifications, and vendor.

2. **Installation & Commissioning**
   Machine is installed, tested, and handed over for operation.

3. **Operation**
   Machine performs daily tasks; sensor data is collected.

4. **Maintenance**
   Includes corrective, preventive, and predictive actions.

5. **Degradation & Ageing**
   Machine wears out naturally due to usage and environment.

6. **Retirement/Replacement**
   Machine is removed or replaced when performance drops too low.

### **Why important:**

* Helps plan long-term costs
* Supports maintenance strategy
* Ensures optimal machine usage

### **Example:**

A motor may have a lifecycle of 10 years → maintenance decisions must extend life while reducing failure risks.

---

# **7.2 Failure Patterns**

### **What it is:**

Failure patterns describe **how and when** a machine typically fails.

Failures are not purely arbitrary; most follow one of a few common patterns.

### **Common failure patterns (from reliability engineering):**

1. **Infant Mortality**

   * Early sudden failures
   * Poor installation/manufacturing defects

2. **Random Failure**

   * No clear pattern
   * Caused by unexpected events

3. **Wear-Out Failure**

   * Slowly increasing probability of failure
   * Caused by ageing, fatigue, corrosion

### **Why important:**

Understanding failure patterns helps decide:

* Whether to use Preventive or Predictive Maintenance
* What sensors to install
* What models to train

### **Example:**

Bearings show **wear-out** failure → ideal for Predictive Maintenance.
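
One common way to describe these three patterns numerically, not covered in the text above, is the Weibull hazard rate h(t) = (β/η)·(t/η)^(β−1): a shape parameter β < 1 gives a falling failure rate (infant mortality), β = 1 a constant rate (random failure), and β > 1 a rising rate (wear-out). The sketch below is an illustration under that assumption; the shape values and time points are arbitrary.

```python
import numpy as np

def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    t = np.asarray(t, dtype=float)
    return (shape / scale) * (t / scale) ** (shape - 1.0)

t = np.array([0.5, 1.0, 2.0])                     # arbitrary time points (t > 0)
infant = weibull_hazard(t, shape=0.5)             # failure rate falls over time
random_fail = weibull_hazard(t, shape=1.0)        # failure rate stays constant
wear_out = weibull_hazard(t, shape=3.0)           # failure rate rises over time

for name, h in [("infant", infant), ("random", random_fail), ("wear-out", wear_out)]:
    print(name, np.round(h, 3))
```

A rising hazard is the case where condition monitoring pays off most, because the sensors have a developing trend to detect.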

---

# **7.3 Risk-Based Maintenance**

### **What it is:**

Risk-Based Maintenance prioritizes machines based on **risk**, not just time or schedule.

**Risk = Probability of failure × Consequence of failure**

### **Technical idea:**

High-risk machines → frequent monitoring
Low-risk machines → minimal monitoring

### **Factors in risk:**

* Machine criticality
* Cost of downtime
* Safety hazards
* Failure impact on operations
* Frequency of past failures

### **Why useful:**

Companies save money by focusing maintenance where it matters most.

### **Example:**

* Conveyor motor (low risk) → monthly check
* Turbine gearbox (high risk) → continuous monitoring with sensors + PdM
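
The risk formula can be applied directly to rank assets. The failure probabilities and consequence costs below are hypothetical numbers chosen for illustration, not measured values.

```python
# Hypothetical assets: yearly failure probability and cost of one failure.
assets = [
    {"name": "Conveyor motor", "p_fail": 0.10, "consequence": 5_000},
    {"name": "Turbine gearbox", "p_fail": 0.05, "consequence": 500_000},
    {"name": "Exhaust fan", "p_fail": 0.20, "consequence": 1_000},
]

def risk(asset):
    """Risk = probability of failure x consequence of failure."""
    return asset["p_fail"] * asset["consequence"]

# Highest risk first: these assets get the most monitoring attention.
ranked = sorted(assets, key=risk, reverse=True)
for asset in ranked:
    print(f'{asset["name"]}: risk = {risk(asset):,.0f}')
```

Note that the gearbox has the lowest failure probability in this example but still ranks first, because the consequence of failure dominates.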

---

# **7.4 Asset Health Index (AHI)**

### **What it is:**

A single score (0–100) that represents the **overall health** of an asset.

### **How it is calculated:**

* Combine sensor values
* Normalize features
* Use weighted scoring
* Sometimes AI models generate the health score

### **Example Components:**

* Vibration intensity
* Temperature rise
* Load fluctuations
* Current imbalance
* Maintenance history

### **Score Interpretation:**

* **80–100 → Healthy**
* **60–79 → Minor wear**
* **40–59 → Needs inspection**
* **20–39 → High risk**
* **0–19 → Critical (likely failure)**

### **Why AHI is important:**

* Easy to use in dashboards
* Helps track asset performance
* Useful for management decisions
* Helps predict degradation trends
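
A minimal sketch of the weighted-scoring approach described above. It assumes each component has already been normalized to a condition value between 0 (worst) and 1 (best); the weights, condition values, and function names are hypothetical, and real deployments would tune them per asset type.

```python
def asset_health_index(conditions, weights):
    """Weighted average of normalized component conditions, scaled to 0-100."""
    total_weight = sum(weights.values())
    score = sum(weights[k] * conditions[k] for k in weights) / total_weight
    return 100.0 * score

def health_band(ahi):
    """Map an AHI value to the interpretation bands listed above."""
    if ahi >= 80:
        return "Healthy"
    if ahi >= 60:
        return "Minor wear"
    if ahi >= 40:
        return "Needs inspection"
    if ahi >= 20:
        return "High risk"
    return "Critical"

# Hypothetical normalized conditions (1.0 = as new) and relative weights.
conditions = {"vibration": 0.5, "temperature": 0.7, "load": 0.9, "current": 0.8, "history": 0.6}
weights = {"vibration": 0.35, "temperature": 0.25, "load": 0.15, "current": 0.15, "history": 0.10}

ahi = asset_health_index(conditions, weights)
print(f"AHI = {ahi:.1f} -> {health_band(ahi)}")
```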

---

# **7.5 Criticality Analysis**

### **What it is:**

A method to determine **how important** each asset is to operations.

Criticality = **Importance of machine + Impact of failure**

### **Factors used:**

1. Safety impact
2. Environmental impact
3. Production loss
4. Repair cost
5. Availability of backup equipment
6. Failure frequency

### **Criticality Levels:**

* **High criticality** → requires constant monitoring + PdM
* **Medium** → scheduled + periodic monitoring
* **Low** → basic preventive maintenance

### **Why used:**

Ensures the most important assets get priority in monitoring and maintenance investment.

### **Example:**

* Turbine → High criticality
* Small exhaust fan → Low criticality
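
One simple way to turn the six factors into a level is to rate each from 1 (negligible) to 5 (severe) and sum the ratings. The sketch below does this; the ratings and the cut-offs of 22 and 14 are hypothetical choices for illustration, and the "no_backup" factor is scored higher when no backup equipment is available.

```python
FACTORS = ["safety", "environment", "production_loss",
           "repair_cost", "no_backup", "failure_frequency"]

def criticality_level(ratings, high=22, medium=14):
    """Sum the 1-5 ratings of all six factors and map the total to a level."""
    total = sum(ratings[f] for f in FACTORS)
    if total >= high:
        return total, "High"
    if total >= medium:
        return total, "Medium"
    return total, "Low"

# Hypothetical ratings for the two example assets.
turbine = {"safety": 5, "environment": 4, "production_loss": 5,
           "repair_cost": 5, "no_backup": 4, "failure_frequency": 3}
exhaust_fan = {"safety": 1, "environment": 1, "production_loss": 2,
               "repair_cost": 1, "no_backup": 1, "failure_frequency": 2}

print("Turbine:", criticality_level(turbine))
print("Exhaust fan:", criticality_level(exhaust_fan))
```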

---

# **7.6 Dashboard Indicators**

### **What it is:**

Dashboards provide a **real-time visual summary** of asset performance and health.

They help engineers and managers make quick decisions.

### **Important dashboard indicators:**

#### **1. Real-Time Sensor Values**

* Temperature
* Vibration
* Pressure
* Current/Voltage

#### **2. Trend Graphs**

* Vibration RMS over time
* Temperature vs load
* Degradation curve

#### **3. Health Score Indicators**

* Health Index radial gauge chart
* Green → Healthy
* Yellow → Warning
* Red → Critical

#### **4. Fault Alerts & Alarms**

* High vibration alert
* Overheating alarm
* Low pressure alarm

#### **5. RUL Prediction Display**

* Remaining hours
* “Next maintenance in X hours”

#### **6. Asset Performance Summary**

* Downtime
* Maintenance cost
* Fault frequency
* Energy consumption

### **Why dashboards matter:**

* Make complex data easy to understand
* Help with fast decision-making
* Provide transparency to management
* Detect problems early

---

# **Summary Table**

| Concept                | Purpose                                      |
| ---------------------- | -------------------------------------------- |
| Asset Lifecycle        | Manage machine from installation to disposal |
| Failure Patterns       | Understand how machines fail                 |
| Risk-Based Maintenance | Prioritize assets based on risk              |
| Asset Health Index     | Single score representing health             |
| Criticality Analysis   | Rank machines by importance                  |
| Dashboard Indicators   | Visualize health, RUL, alerts, and trends    |

---

# **B. Hands-On Coding Content**

# **1. Dataset Setup (with ipywidgets)**

We will work with three publicly available time-series datasets. Only the first contains real machine data; the other two are general-purpose series used here as stand-ins for sensor signals.

1️⃣ Pump Sensor Failure Dataset (Machine Temperature)

Source: Numenta Anomaly Benchmark (NAB), GitHub

https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/machine_temperature_system_failure.csv


✔ Time-series sensor data
✔ Contains cooling system failure pattern
✔ Great for anomaly detection

2️⃣ Gas Turbine Sensor-Like Series

Source: GitHub (jbrownlee/Datasets)

https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv


⚠️ This is the monthly airline passengers series, not turbine data; we repurpose it as a sensor-like signal for PdM.
✔ Sensor-like numeric time series
✔ Good for LSTM/RNN feature demonstration

3️⃣ Industrial Equipment Temperature Series

Source: GitHub (jbrownlee/Datasets)

https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv

⚠️ This is a daily minimum temperature (weather) series, not industrial equipment data; we use it as a stand-in for a temperature sensor.




---

# ✅ **Step 1 — Install Required Libraries**

In [None]:
# ============================================================
# 📦 ONLINE PREDICTIVE MAINTENANCE DATASETS (100% WORKING)
# With ipywidgets (Load + Clean + Plot)
# ============================================================

!pip install pandas numpy matplotlib ipywidgets seaborn requests

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import ipywidgets as widgets
from IPython.display import display, Markdown

# ------------------------------------------------------------
# 1️⃣ VERIFIED DATASET LINKS (ALL WORKING)
# ------------------------------------------------------------
DATASETS = {
    "Pump Sensor Failure (Machine Temp)":
        "https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/machine_temperature_system_failure.csv",

    "Gas Turbine Sensor-Like Series":
        "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv",

    "Industrial Equipment Temperature":
        "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv"
}

# ------------------------------------------------------------
# 2️⃣ Widgets
# ------------------------------------------------------------
dataset_selector = widgets.Dropdown(
    options=list(DATASETS.keys()),
    value="Pump Sensor Failure (Machine Temp)",
    description="Select Dataset:"
)

load_button = widgets.Button(description="Load Dataset", button_style="success")
output_area = widgets.Output()

display(dataset_selector, load_button, output_area)

# Missing value widget
na_strategy = widgets.Dropdown(
    options=["Drop rows", "Fill mean", "Fill median", "Forward fill", "Backward fill"],
    value="Fill mean",
    description="Missing Values:"
)
fix_button = widgets.Button(description="Apply Cleaning", button_style="warning")
display(na_strategy, fix_button)

# Sensor plot
column_selector = widgets.Dropdown(description="Select Column:")
plot_button = widgets.Button(description="Plot Column", button_style="info")
display(column_selector, plot_button)

current_df = None

# ------------------------------------------------------------
# 3️⃣ Missing value function
# ------------------------------------------------------------
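def clean_missing(df, method):
    """Return a DataFrame with missing values handled by the chosen strategy."""
    if method == "Drop rows":
        return df.dropna()
    if method == "Fill mean":
        # Only numeric columns have a mean; other columns are left unchanged
        return df.fillna(df.mean(numeric_only=True))
    if method == "Fill median":
        return df.fillna(df.median(numeric_only=True))
    if method == "Forward fill":
        # fillna(method=...) is deprecated in recent pandas; use ffill()/bfill()
        return df.ffill()
    if method == "Backward fill":
        return df.bfill()
    raise ValueError(f"Unknown missing-value strategy: {method}")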

# ------------------------------------------------------------
# 4️⃣ Load dataset callback
# ------------------------------------------------------------
def on_load_clicked(b):
    global current_df
    with output_area:
        output_area.clear_output()
        name = dataset_selector.value
        url = DATASETS[name]

        display(Markdown(f"## 📥 Loading Dataset: **{name}**"))
        display(Markdown(f"**Source URL:** {url}"))

        df = pd.read_csv(url)
        current_df = df

        display(Markdown("### ✅ Dataset Loaded"))
        display(df.head())

        # populate column selector
        column_selector.options = df.columns.tolist()

load_button.on_click(on_load_clicked)

# ------------------------------------------------------------
# 5️⃣ Apply Cleaning
# ------------------------------------------------------------
def on_clean_clicked(b):
    global current_df
    with output_area:
        output_area.clear_output()
        if current_df is None:
            display(Markdown("### ❗ Load dataset first."))
            return

        current_df = clean_missing(current_df, na_strategy.value)

        display(Markdown(f"## 🧹 Cleaning Applied — **{na_strategy.value}**"))
        display(current_df.head())

fix_button.on_click(on_clean_clicked)

# ------------------------------------------------------------
# 6️⃣ Plotting Callback
# ------------------------------------------------------------
def on_plot_clicked(b):
    with output_area:
        output_area.clear_output()
        if current_df is None:
            display(Markdown("### ❗ Load dataset first."))
            return

        col = column_selector.value
        display(Markdown(f"## 📈 Plot for Column — **{col}**"))

        plt.figure(figsize=(12, 4))
        plt.plot(current_df[col])
        plt.grid(True)
        plt.title(f"{col} Trend")
        plt.xlabel("Index")
        plt.ylabel(col)
        plt.show()

plot_button.on_click(on_plot_clicked)





In [None]:
# ============================================================
# 2️⃣ FEATURE ENGINEERING IN CODE
# Scaling • Rolling Stats • Lag Features • FFT • Windows • RUL
# ============================================================

import numpy as np
from IPython.display import display, Markdown

display(Markdown("## ⚙️ Step 2 — Feature Engineering"))

# ------------------------------------------------------------
# Widgets
# ------------------------------------------------------------
window_size_slider = widgets.IntSlider(
    value=30, min=5, max=200, step=5,
    description="Window Size:", style={'description_width': 'initial'}
)

lag_slider = widgets.IntSlider(
    value=3, min=1, max=10, step=1,
    description="Lag Count:", style={'description_width': 'initial'}
)

fft_slider = widgets.IntSlider(
    value=5, min=1, max=20, step=1,
    description="FFT Components:", style={'description_width': 'initial'}
)

generate_features_button = widgets.Button(
    description="Generate Feature Matrix",
    button_style="primary"
)

feature_output = widgets.Output()

display(window_size_slider, lag_slider, fft_slider, generate_features_button, feature_output)

# ------------------------------------------------------------
# Helper Function: Create Feature Matrix
# ------------------------------------------------------------
def create_feature_matrix(series, window_size=30, n_lags=3, n_fft=5):
    """
    Takes a numeric time-series and generates a PdM feature matrix containing:
    - Scaling
    - Rolling statistics
    - Lag features
    - FFT frequency features
    - RUL labels
    """
    values = pd.to_numeric(series, errors='coerce').dropna().values
    N = len(values)

    # Min–Max Scaling
    vmin, vmax = values.min(), values.max()
    scaled = (values - vmin) / (vmax - vmin + 1e-9)

    rows = []

    for start in range(0, N - window_size):
        end = start + window_size
        window_vals = scaled[start:end]

        row = {
            "window_start": start,
            "window_end": end - 1
        }

        # ----------------------------------------------------
        # Rolling Statistical Features
        # ----------------------------------------------------
        row["mean"] = window_vals.mean()
        row["std"] = window_vals.std()
        row["min"] = window_vals.min()
        row["max"] = window_vals.max()
        row["rms"] = np.sqrt(np.mean(window_vals ** 2))
        row["last_value"] = window_vals[-1]

        # ----------------------------------------------------
        # Lag Features
        # ----------------------------------------------------
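        for k in range(1, n_lags + 1):
            # lag_k is the value k steps before the window's last sample
            # (index end - 1 is the last sample, already stored as last_value)
            idx = end - 1 - k
            row[f"lag_{k}"] = scaled[idx] if idx >= 0 else np.nan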

        # ----------------------------------------------------
        # FFT Frequency Features
        # ----------------------------------------------------
        fft_vals = np.abs(np.fft.rfft(window_vals))
        for j in range(1, n_fft + 1):
            row[f"fft_{j}"] = fft_vals[j] if j < len(fft_vals) else np.nan

        # ----------------------------------------------------
        # Remaining Useful Life Labels
        # (synthetic: assumes the end of the series is the failure point)
        # ----------------------------------------------------
        row["RUL"] = N - end

        rows.append(row)

    return pd.DataFrame(rows)


# ------------------------------------------------------------
# CALLBACK: Generate Features
# ------------------------------------------------------------
def on_generate_features_clicked(b):
    with feature_output:
        feature_output.clear_output()

        if current_df is None:
            display(Markdown("### ❗ Load and clean a dataset first."))
            return

        col = column_selector.value
        if col is None:
            display(Markdown("### ❗ Select a column first."))
            return

        # Ensure selected column is numeric
        if not pd.api.types.is_numeric_dtype(current_df[col]):
            display(Markdown(f"### ❗ Column **{col}** is not numeric."))
            display(Markdown("Select a numeric sensor/value column."))
            return

        series = current_df[col].dropna().reset_index(drop=True)

        display(Markdown(f"## 🧮 Feature Engineering for Column: **{col}**"))
        display(Markdown(
            f"- Window Size: **{window_size_slider.value}**  \n"
            f"- Lag Count: **{lag_slider.value}**  \n"
            f"- FFT Components: **{fft_slider.value}**"
        ))

        # ------------------------------------
        # ⭐ FIX: Store feature matrix globally
        # ------------------------------------
        global features_df_global
        features_df_global = create_feature_matrix(
            series,
            window_size=window_size_slider.value,
            n_lags=lag_slider.value,
            n_fft=fft_slider.value
        )

        display(Markdown("### 📘 First 10 Rows of Feature Matrix"))
        display(features_df_global.head(10))

        display(Markdown("### 🔢 Feature Matrix Shape"))
        display(Markdown(f"- Rows: **{features_df_global.shape[0]}**"))
        display(Markdown(f"- Columns: **{features_df_global.shape[1]}**"))

generate_features_button.on_click(on_generate_features_clicked)


## ⚙️ Step 2 — Feature Engineering


In [None]:
# ============================================================
# 3️⃣ ML MODEL TRAINING — WITH LIVE TRAINING OUTPUT
# ============================================================

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import (
    mean_absolute_error, mean_squared_error, r2_score,
    accuracy_score, precision_score, recall_score, f1_score
)
import matplotlib.pyplot as plt
from IPython.display import display, Markdown
import time

# Global storage for ML training
generated_features_df = None

# ============================================================
# ONE SINGLE OUTPUT AREA FOR EVERYTHING
# ============================================================
train_output = widgets.Output()
display(Markdown("## 🧾 Step 3 — ML Model Training"))
display(train_output)

# ============================================================
# SAVE FEATURE MATRIX BUTTON
# ============================================================
save_features_button = widgets.Button(
    description="Save Feature Matrix",
    button_style="success"
)
display(save_features_button)

def on_save_features_clicked(b):
    global generated_features_df
    with train_output:
        train_output.clear_output()

        if "features_df_global" not in globals():
            display(Markdown("### ❗ Please generate features first in Step 2."))
            return

        generated_features_df = features_df_global.copy()
        display(Markdown("### ✅ Feature Matrix Saved Successfully"))
        display(generated_features_df.head())

save_features_button.on_click(on_save_features_clicked)

# ============================================================
# MODEL SELECTION + TRAIN BUTTON
# ============================================================
model_type_selector = widgets.Dropdown(
    options=["RUL Regression", "Binary Classification (High vs Low RUL)"],
    value="RUL Regression",
    description="Model Type:"
)

train_button = widgets.Button(
    description="Train Model",
    button_style="primary"
)

display(model_type_selector, train_button)

# ============================================================
# TRAINING FUNCTIONS
# ============================================================

def train_regression_model(df):
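    # Drop the label and the window index columns; the indices would leak the
    # label, since RUL = N - window_end - 1 in create_feature_matrix
    X = df.drop(columns=["RUL", "window_start", "window_end"], errors="ignore")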
    y = df["RUL"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=False
    )

    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    preds = model.predict(X_test)

    return model, preds, y_test


def train_classification_model(df):
    df = df.copy()
    threshold = df["RUL"].median()
    df["label"] = (df["RUL"] < threshold).astype(int)

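    # Also drop the window index columns, which would leak the RUL-based label
    X = df.drop(columns=["RUL", "label", "window_start", "window_end"], errors="ignore")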
    y = df["label"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=False
    )

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    preds = model.predict(X_test)

    return model, preds, y_test


# ============================================================
# TRAIN BUTTON CALLBACK (WITH LIVE STATUS MESSAGES)
# ============================================================
def on_train_clicked(b):
    with train_output:
        train_output.clear_output()

        if generated_features_df is None:
            display(Markdown("### ❗ Please click **Save Feature Matrix** first."))
            return

        df = generated_features_df.copy()
        model_type = model_type_selector.value

        # ------------------------------------------
        # LIVE MESSAGE FEEDBACK
        # ------------------------------------------
        print("⏳ Starting Training...")
        display(Markdown("### ⏳ Training started… Please wait."))

        time.sleep(0.5)
        print("📌 Splitting dataset...")
        time.sleep(0.4)

        # ----------------------------------------------------
        # RUL REGRESSION
        # ----------------------------------------------------
        if model_type == "RUL Regression":
            display(Markdown("## 🔧 Training RUL Regression Model..."))

            model, preds, y_test = train_regression_model(df)

            print("📌 Calculating metrics...")
            time.sleep(0.3)

            mae = mean_absolute_error(y_test, preds)
            rmse = np.sqrt(mean_squared_error(y_test, preds))
            r2 = r2_score(y_test, preds)

            display(Markdown("### 📊 Regression Metrics"))
            display(Markdown(f"- **MAE:** {mae:.4f}"))
            display(Markdown(f"- **RMSE:** {rmse:.4f}"))
            display(Markdown(f"- **R² Score:** {r2:.4f}"))

            print("📈 Plotting results...")
            time.sleep(0.3)

            plt.figure(figsize=(12,4))
            plt.plot(y_test.values, label="Actual RUL", linewidth=2)
            plt.plot(preds, label="Predicted RUL", linestyle="dashed")
            plt.legend()
            plt.grid(True)
            plt.title("Actual vs Predicted RUL")
            plt.show()

            display(Markdown("### ✅ Training Completed Successfully"))

        # ----------------------------------------------------
        # BINARY CLASSIFICATION
        # ----------------------------------------------------
        else:
            display(Markdown("## 🔧 Training High/Low Risk Classifier..."))

            model, preds, y_test = train_classification_model(df)

            print("📌 Evaluating predictions...")
            time.sleep(0.3)

            acc = accuracy_score(y_test, preds)
            prec = precision_score(y_test, preds)
            rec = recall_score(y_test, preds)
            f1 = f1_score(y_test, preds)

            display(Markdown("### 📊 Classification Metrics"))
            display(Markdown(f"- **Accuracy:** {acc:.4f}"))
            display(Markdown(f"- **Precision:** {prec:.4f}"))
            display(Markdown(f"- **Recall:** {rec:.4f}"))
            display(Markdown(f"- **F1 Score:** {f1:.4f}"))

            print("📈 Plotting results...")
            time.sleep(0.3)

            plt.figure(figsize=(12,4))
            plt.plot(y_test.values, label="True Class", linewidth=2)
            plt.plot(preds, label="Predicted Class", linestyle='dashed')
            plt.legend()
            plt.grid(True)
            plt.title("High/Low Risk Classification")
            plt.show()

            display(Markdown("### ✅ Training Completed Successfully"))

train_button.on_click(on_train_clicked)


## 🧾 Step 3 — ML Model Training


In [None]:
# ============================================================
# 4️⃣ DEEP LEARNING MODELS FOR PREDICTIVE MAINTENANCE
# LSTM Forecasting + Autoencoder Anomaly Detection
# ============================================================

import tensorflow as tf
from tensorflow.keras import layers, models
from IPython.display import display, Markdown

display(Markdown("## 🤖 Step 4 — Deep Learning Models"))

# ------------------------------------------------------------
# Widgets for DL configuration
# ------------------------------------------------------------
dl_model_selector = widgets.Dropdown(
    options=["LSTM Forecasting", "Autoencoder Anomaly Detection"],
    value="LSTM Forecasting",
    description="DL Model:",
    style={'description_width': 'initial'}
)

seq_len_slider = widgets.IntSlider(
    value=30, min=5, max=100, step=5,
    description="Sequence Length:",
    style={'description_width': 'initial'}
)

epochs_slider = widgets.IntSlider(
    value=5, min=1, max=30, step=1,
    description="Epochs:",
    style={'description_width': 'initial'}
)

batch_slider = widgets.IntSlider(
    value=32, min=8, max=128, step=8,
    description="Batch Size:",
    style={'description_width': 'initial'}
)

dl_train_button = widgets.Button(
    description="Train DL Model",
    button_style="primary"
)

dl_output = widgets.Output()

display(dl_model_selector, seq_len_slider, epochs_slider, batch_slider, dl_train_button, dl_output)

# ------------------------------------------------------------
# Helper: Build sequences from a 1D time series
# ------------------------------------------------------------
def build_sequences(values, seq_len=30):
    """
    Given a 1D numpy array, create overlapping sequences:
    X: sequences of length seq_len
    y: next value (for forecasting)
    """
    X, y = [], []
    for i in range(len(values) - seq_len):
        X.append(values[i:i+seq_len])
        y.append(values[i+seq_len])
    X = np.array(X)
    y = np.array(y)
    # reshape X for LSTM: (samples, timesteps, features)
    X = X.reshape((X.shape[0], X.shape[1], 1))
    return X, y

# ------------------------------------------------------------
# LSTM Model Builder
# ------------------------------------------------------------
def build_lstm_model(seq_len):
    model = models.Sequential([
        layers.Input(shape=(seq_len, 1)),
        layers.LSTM(32, return_sequences=False),
        layers.Dense(16, activation="relu"),
        layers.Dense(1)
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# ------------------------------------------------------------
# Autoencoder Model Builder (Dense on Flattened Window)
# ------------------------------------------------------------
def build_autoencoder(seq_len):
    model = models.Sequential([
        layers.Input(shape=(seq_len,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(seq_len)  # reconstruct original window
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# ------------------------------------------------------------
# CALLBACK: Train DL Model
# ------------------------------------------------------------
def on_dl_train_clicked(b):
    with dl_output:
        dl_output.clear_output()

        if current_df is None:
            display(Markdown("### ❗ Please load and clean a dataset first (Step 1)."))
            return

        col = column_selector.value
        if col is None:
            display(Markdown("### ❗ Please select a column (sensor) in Step 1."))
            return

        # Ensure numeric
        if not pd.api.types.is_numeric_dtype(current_df[col]):
            display(Markdown(f"### ❗ Column **{col}** is not numeric. Select a numeric sensor column."))
            return

        seq_len = seq_len_slider.value
        epochs = epochs_slider.value
        batch_size = batch_slider.value
        model_type = dl_model_selector.value

        # Prepare series
        series = current_df[col].dropna().reset_index(drop=True).values.astype(float)

        if len(series) <= seq_len + 5:
            display(Markdown("### ❗ Time series too short for the selected sequence length."))
            return

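        # Min–Max scaling, fitted on the first 80% of the series only so the
        # held-out tail does not leak into the scaling constants
        train_end = int(len(series) * 0.8)
        vmin, vmax = series[:train_end].min(), series[:train_end].max()
        scaled = (series - vmin) / (vmax - vmin + 1e-9)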

        display(Markdown(f"## 🤖 Training **{model_type}** on Column: **{col}**"))
        display(Markdown(
            f"- Sequence Length: **{seq_len}**  \n"
            f"- Epochs: **{epochs}**  \n"
            f"- Batch Size: **{batch_size}**"
        ))

        # ----------------------------------------------------
        # LSTM FORECASTING
        # ----------------------------------------------------
        if model_type == "LSTM Forecasting":
            print("⏳ Building sequences for forecasting...")
            X, y = build_sequences(scaled, seq_len=seq_len)

            # Train/test split
            split_idx = int(len(X) * 0.8)
            X_train, X_test = X[:split_idx], X[split_idx:]
            y_train, y_test = y[:split_idx], y[split_idx:]

            print("📌 Building LSTM model...")
            model = build_lstm_model(seq_len)

            print("🚀 Training LSTM...")
            history = model.fit(
                X_train, y_train,
                epochs=epochs,
                batch_size=batch_size,
                validation_split=0.2,
                verbose=0
            )

            print("📌 Predicting on test set...")
            y_pred = model.predict(X_test, verbose=0)

            # Rescale back to original units
            y_test_real = y_test * (vmax - vmin + 1e-9) + vmin
            y_pred_real = y_pred.flatten() * (vmax - vmin + 1e-9) + vmin

            # Metrics
            mae = mean_absolute_error(y_test_real, y_pred_real)
            rmse = np.sqrt(mean_squared_error(y_test_real, y_pred_real))

            display(Markdown("### 📊 LSTM Forecasting Metrics"))
            display(Markdown(f"- **MAE:** {mae:.4f}"))
            display(Markdown(f"- **RMSE:** {rmse:.4f}"))

            # Plot forecast vs actual
            plt.figure(figsize=(12,4))
            plt.plot(y_test_real, label="Actual", linewidth=2)
            plt.plot(y_pred_real, label="Predicted", linestyle="dashed")
            plt.legend()
            plt.grid(True)
            plt.title(f"LSTM Forecasting — {col}")
            plt.show()

            # Optional: Training loss curve
            plt.figure(figsize=(8,4))
            plt.plot(history.history["loss"], label="Training Loss")
            plt.plot(history.history.get("val_loss", []), label="Validation Loss")
            plt.legend()
            plt.title("LSTM Training Loss Curve")
            plt.grid(True)
            plt.show()

            display(Markdown("### ✅ LSTM Training Completed"))

        # ----------------------------------------------------
        # AUTOENCODER ANOMALY DETECTION
        # ----------------------------------------------------
        else:
            print("⏳ Creating windowed data for Autoencoder...")
            # Create overlapping windows as flat vectors; the +1 keeps the
            # final window, which needs no "next value" target here
            windows = np.array([
                scaled[i:i+seq_len]
                for i in range(len(scaled) - seq_len + 1)
            ])

            # Train/test split
            split_idx = int(len(windows) * 0.8)
            X_train = windows[:split_idx]
            X_test = windows[split_idx:]

            print("📌 Building Autoencoder model...")
            autoencoder = build_autoencoder(seq_len)

            print("🚀 Training Autoencoder...")
            history = autoencoder.fit(
                X_train, X_train,
                epochs=epochs,
                batch_size=batch_size,
                validation_split=0.2,
                verbose=0
            )

            print("📌 Computing reconstruction errors...")
            # Reconstruction on train + test
            train_recon = autoencoder.predict(X_train, verbose=0)
            test_recon = autoencoder.predict(X_test, verbose=0)

            train_errors = np.mean((X_train - train_recon)**2, axis=1)
            test_errors = np.mean((X_test - test_recon)**2, axis=1)

            # Threshold: mean + 2*std of training errors
            threshold = train_errors.mean() + 2 * train_errors.std()

            # Flag anomalies in test
            anomalies = test_errors > threshold
            num_anomalies = anomalies.sum()

            display(Markdown("### 📊 Autoencoder Anomaly Detection Summary"))
            display(Markdown(f"- **Threshold:** {threshold:.6f} (mean + 2·std of train error)"))
            display(Markdown(f"- **Total Test Windows:** {len(test_errors)}"))
            display(Markdown(f"- **Anomalous Windows Detected:** **{num_anomalies}**"))

            # Plot reconstruction error
            plt.figure(figsize=(12,4))
            plt.plot(test_errors, label="Reconstruction Error")
            plt.axhline(threshold, color="red", linestyle="--", label="Threshold")
            plt.title(f"Autoencoder Reconstruction Error — {col}")
            plt.xlabel("Test Window Index")
            plt.ylabel("MSE")
            plt.legend()
            plt.grid(True)
            plt.show()

            # Optional: Training loss curve
            plt.figure(figsize=(8,4))
            plt.plot(history.history["loss"], label="Training Loss")
            plt.plot(history.history.get("val_loss", []), label="Validation Loss")
            plt.legend()
            plt.title("Autoencoder Training Loss Curve")
            plt.grid(True)
            plt.show()

            display(Markdown("### ✅ Autoencoder Training Completed"))

dl_train_button.on_click(on_dl_train_clicked)
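
The `build_sequences` helper in the cell above turns a 1D series into overlapping windows and pairs each window with the value that follows it. The sketch below shows the resulting shapes on a toy series. It uses NumPy's `sliding_window_view` instead of an explicit loop, and `make_windows` is a hypothetical name introduced here for illustration, not part of the notebook.

```python
import numpy as np

def make_windows(values, seq_len):
    """Hypothetical helper: pair each seq_len-long window with its next value."""
    values = np.asarray(values, dtype=float)
    # All windows of length seq_len; drop the last one because it has
    # no "next value" to serve as a forecasting target
    windows = np.lib.stride_tricks.sliding_window_view(values, seq_len)[:-1]
    targets = values[seq_len:]
    # Add a trailing feature axis: (samples, timesteps, features)
    return windows[..., np.newaxis], targets

series = np.arange(10, dtype=float)  # toy series 0, 1, ..., 9
X, y = make_windows(series, seq_len=3)

print(X.shape, y.shape)      # (7, 3, 1) (7,)
print(X[0].ravel(), y[0])    # [0. 1. 2.] 3.0
print(X[-1].ravel(), y[-1])  # [6. 7. 8.] 9.0
```

A series of length `n` yields `n - seq_len` forecasting samples, which is why the callback rejects a series that is too short for the selected sequence length.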


## 🤖 Step 4 — Deep Learning Models
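
The autoencoder branch flags a test window as anomalous when its reconstruction error exceeds the mean plus two standard deviations of the training errors. The sketch below applies that same rule to a few hand-picked error values so the arithmetic is easy to follow. The numbers are invented for illustration and do not come from a trained model.

```python
import numpy as np

# Hypothetical per-window reconstruction errors (MSE); not real model output
train_errors = np.array([0.008, 0.010, 0.012, 0.009, 0.011, 0.010, 0.013, 0.007])
test_errors = np.array([0.009, 0.011, 0.030, 0.010, 0.045])

# Same rule as the notebook: mean + 2 * std of the training errors
threshold = train_errors.mean() + 2 * train_errors.std()

# A window is anomalous when its error is above the threshold
anomalies = test_errors > threshold

print(f"threshold = {threshold:.6f}")
print("anomalous window indices:", np.flatnonzero(anomalies).tolist())  # [2, 4]
```

The mean + 2·std rule works best when training errors are roughly symmetric. When they are heavily skewed, a high percentile of the training errors (for example the 99th, via `np.percentile`) is a common alternative.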
