# Machine Learning for Predictive Maintenance

## Introduction

### Because real-world data is not available, we generate synthetic data that simulates the operating conditions and potential failures of industrial transformers and company laptops. The transformer features are temperature, vibration, load, ambient temperature, and humidity; the laptop features are CPU usage, memory usage, disk health, battery health, and uptime.

### The main steps in this notebook include:
##### 1. **Data Augmentation:** Creating synthetic data that mimics real-world scenarios, including both normal operating conditions and potential failure conditions.
##### 2. **Feature Engineering:** Creating additional features that help improve the predictive power of the model.
##### 3. **Model Training:** Training machine learning models using the synthetic data to predict equipment failures.
##### 4. **Real-time Data Integration:** Simulating real-time data collection and feeding it into the model for continuous predictions.
##### 5. **Monitoring and Alerting:** Setting up mechanisms to monitor model predictions and generate alerts for potential failures.

## Let's get started!

## Data Augmentation
#### Each feature is sampled uniformly at random within a specific range. Failure rows are generated by pushing one feature into an abnormal range, and every row receives a failure-type label. Finally, the rows are shuffled so that their order cannot bias the model.
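The recipe described above (sample each feature within a range, override one range for failure rows, label, shuffle) can be sketched compactly. This is a minimal sketch rather than the notebook's own code: the helper name `make_rows`, the seeded `rng`, the shortened labels, and the reduced two-feature set are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible (assumption)

# Normal operating ranges for two illustrative features
NORMAL_RANGES = {"temperature": (60, 75), "vibration": (0.0, 0.08)}

def make_rows(n, label, overrides=None):
    """Sample n rows from the normal ranges, replacing any overridden range."""
    ranges = {**NORMAL_RANGES, **(overrides or {})}
    rows = {name: rng.uniform(low, high, n) for name, (low, high) in ranges.items()}
    rows["label"] = label  # scalar label is broadcast to every row
    return pd.DataFrame(rows)

data = pd.concat(
    [
        make_rows(70, "Normal Operation"),
        make_rows(15, "Cooling Failure", {"temperature": (81, 100)}),
        make_rows(15, "Mechanical Wear", {"vibration": (0.11, 0.5)}),
    ],
    ignore_index=True,
)

# Shuffle so the row order carries no information about the label
data = data.sample(frac=1, random_state=0).reset_index(drop=True)
print(data["label"].value_counts())
```

Seeding the generator makes every run produce the same rows, which the unseeded `np.random.uniform` calls in the cells below do not.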

### Transformer Data Augmentation

In [18]:
import numpy as np
import pandas as pd

# Define the number of samples
numSamples = 10000

# Generate normal operating conditions (70% of total)
normalTemperature = np.random.uniform(60, 75, int(numSamples * 0.7))
normalVibration = np.random.uniform(0, 0.08, int(numSamples * 0.7))
normalLoad = np.random.uniform(30, 65, int(numSamples * 0.7))
normalAmbientTemp = np.random.uniform(20, 35, int(numSamples * 0.7))
normalHumidity = np.random.uniform(30, 70, int(numSamples * 0.7))

# Generate abnormal conditions (30% of total, 10% per failure type)

# For cooling system failures
abnormalTemperatureCooling = np.random.uniform(81, 100, int(numSamples * 0.1))
abnormalVibrationCooling = np.random.uniform(0, 0.08, int(numSamples * 0.1))
abnormalLoadCooling = np.random.uniform(30, 65, int(numSamples * 0.1))

# For mechanical wear
abnormalTemperatureWear = np.random.uniform(60, 75, int(numSamples * 0.1))
abnormalVibrationWear = np.random.uniform(0.11, 0.5, int(numSamples * 0.1))
abnormalLoadWear = np.random.uniform(30, 65, int(numSamples * 0.1))

# For overload
abnormalTemperatureOverload = np.random.uniform(60, 75, int(numSamples * 0.1))
abnormalVibrationOverload = np.random.uniform(0, 0.08, int(numSamples * 0.1))
abnormalLoadOverload = np.random.uniform(70, 100, int(numSamples * 0.1))

temperature = np.concatenate((normalTemperature, abnormalTemperatureCooling, abnormalTemperatureWear, abnormalTemperatureOverload))
vibration = np.concatenate((normalVibration, abnormalVibrationCooling, abnormalVibrationWear, abnormalVibrationOverload))
load = np.concatenate((normalLoad, abnormalLoadCooling, abnormalLoadWear, abnormalLoadOverload))
ambientTemp = np.concatenate((normalAmbientTemp, np.random.uniform(20, 35, int(numSamples * 0.3))))
humidity = np.concatenate((normalHumidity, np.random.uniform(30, 70, int(numSamples * 0.3))))

equipmentIds = np.arange(1, len(temperature) + 1)

# Create categorical labels based on the conditions
def categorizeFailure(temp, vib, load):
    if temp > 80:
        return "Insulation Breakdown or Cooling System Failure"
    elif vib > 0.1:
        return "Mechanical Wear or Loose Components"
    elif load > 70:
        return "Overload or Imbalanced Load"
    else:
        return "Normal Operation"

# Apply the categorization logic to every sample
failureCategory = [categorizeFailure(t, v, l) for t, v, l in zip(temperature, vibration, load)]

transformerData = pd.DataFrame({
    'equipmentId': equipmentIds,
    'temperature': temperature,
    'vibration': vibration,
    'load': load,
    'ambientTemp': ambientTemp,
    'humidity': humidity,
    'failureCategory': failureCategory
})

transformerData = transformerData.sample(frac=1).reset_index(drop=True)
transformerData.to_csv('synthetic_transformer_data.csv', index=False)

print(transformerData.head())


   equipmentId  temperature  vibration       load  ambientTemp   humidity  \
0          125    74.362677   0.036248  39.330760    34.290848  60.542704   
1         4181    70.560084   0.077564  62.415138    28.350994  53.671244   
2         6575    73.010552   0.006720  61.619548    31.960994  41.641994   
3         4074    62.651261   0.036117  48.998045    30.773942  63.160376   
4         7182    82.462929   0.039446  41.636565    31.425095  41.058617   

                                  failureCategory  
0                                Normal Operation  
1                                Normal Operation  
2                                Normal Operation  
3                                Normal Operation  
4  Insulation Breakdown or Cooling System Failure  
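
The feature-engineering step listed in the introduction is not implemented in the cells of this notebook. One way it might look is sketched here, assuming two derived columns: temperature rise over ambient and temperature per unit of load. Both column names (`tempRise`, `tempPerLoad`) and the helper `add_engineered_features` are hypothetical choices, not part of the original pipeline.

```python
import pandas as pd

def add_engineered_features(df):
    """Return a copy of df with two hypothetical derived columns."""
    out = df.copy()
    # How far the transformer runs above its surroundings
    out["tempRise"] = out["temperature"] - out["ambientTemp"]
    # Temperature per unit of load; load is always positive in the synthetic data
    out["tempPerLoad"] = out["temperature"] / out["load"]
    return out

sample = pd.DataFrame({
    "temperature": [70.0, 90.0],
    "load": [50.0, 45.0],
    "ambientTemp": [30.0, 30.0],
})
engineered = add_engineered_features(sample)
print(engineered)
```

If such columns were used, they would also need to be added to the `features` list at training time and computed identically at prediction time.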


### Laptop Data Augmentation

In [22]:
import numpy as np
import pandas as pd

# Define the number of samples
numSamples = 10000

# Generate normal operating conditions (70% of total)
normalCpuUsage = np.random.uniform(10, 50, int(numSamples * 0.7))
normalMemoryUsage = np.random.uniform(30, 70, int(numSamples * 0.7))
normalDiskHealth = np.random.uniform(80, 100, int(numSamples * 0.7))
normalBatteryHealth = np.random.uniform(70, 100, int(numSamples * 0.7))
normalUptime = np.random.uniform(0, 100, int(numSamples * 0.7))

# Generate specific abnormal conditions (remaining 30%)
# CPU Overload or Thermal Throttling
abnormalCpuUsageOverload = np.random.uniform(85, 100, int(numSamples * 0.1))
abnormalMemoryUsageOverload = np.random.uniform(30, 70, int(numSamples * 0.1))
abnormalDiskHealthOverload = np.random.uniform(80, 100, int(numSamples * 0.1))
abnormalBatteryHealthOverload = np.random.uniform(70, 100, int(numSamples * 0.1))
abnormalUptimeOverload = np.random.uniform(0, 100, int(numSamples * 0.1))

# Memory Leak or Insufficient RAM
abnormalCpuUsageMemory = np.random.uniform(10, 50, int(numSamples * 0.1))
abnormalMemoryUsageMemory = np.random.uniform(85, 100, int(numSamples * 0.1))
abnormalDiskHealthMemory = np.random.uniform(80, 100, int(numSamples * 0.1))
abnormalBatteryHealthMemory = np.random.uniform(70, 100, int(numSamples * 0.1))
abnormalUptimeMemory = np.random.uniform(0, 100, int(numSamples * 0.1))

# Imminent Disk Failure or Bad Sectors
abnormalCpuUsageDisk = np.random.uniform(10, 50, int(numSamples * 0.1))
abnormalMemoryUsageDisk = np.random.uniform(30, 70, int(numSamples * 0.1))
abnormalDiskHealthDisk = np.random.uniform(0, 50, int(numSamples * 0.1))
abnormalBatteryHealthDisk = np.random.uniform(70, 100, int(numSamples * 0.1))
abnormalUptimeDisk = np.random.uniform(0, 100, int(numSamples * 0.1))

cpuUsage = np.concatenate((normalCpuUsage, abnormalCpuUsageOverload, abnormalCpuUsageMemory, abnormalCpuUsageDisk))
memoryUsage = np.concatenate((normalMemoryUsage, abnormalMemoryUsageOverload, abnormalMemoryUsageMemory, abnormalMemoryUsageDisk))
diskHealth = np.concatenate((normalDiskHealth, abnormalDiskHealthOverload, abnormalDiskHealthMemory, abnormalDiskHealthDisk))
batteryHealth = np.concatenate((normalBatteryHealth, abnormalBatteryHealthOverload, abnormalBatteryHealthMemory, abnormalBatteryHealthDisk))
uptime = np.concatenate((normalUptime, abnormalUptimeOverload, abnormalUptimeMemory, abnormalUptimeDisk))

laptopIds = np.arange(1, len(cpuUsage) + 1)

# Create categorical labels based on the conditions
def categorizeIssue(cpu, mem, disk):
    if cpu > 85:
        return "CPU Overload or Thermal Throttling"
    elif mem > 85:
        return "Memory Leak or Insufficient RAM"
    elif disk < 50:
        return "Imminent Disk Failure or Bad Sectors"
    else:
        return "Normal Operation"

# Apply the categorization logic to every sample
issueCategory = [categorizeIssue(c, m, d) for c, m, d in zip(cpuUsage, memoryUsage, diskHealth)]

laptopData = pd.DataFrame({
    'laptop_id': laptopIds,
    'cpu_usage': cpuUsage,
    'memory_usage': memoryUsage,
    'disk_health': diskHealth,
    'battery_health': batteryHealth,
    'uptime': uptime,
    'issue_category': issueCategory
})

laptopData = laptopData.sample(frac=1).reset_index(drop=True)
laptopData.to_csv('synthetic_laptop_data.csv', index=False)

print(laptopData.head())


   laptop_id  cpu_usage  memory_usage  disk_health  battery_health     uptime  \
0       6432  43.636612     69.290219    97.206920       84.789257  60.751652   
1       2352  14.476792     41.743899    93.772955       81.736711  11.826356   
2       3297  42.317531     44.259301    95.606907       90.954988  26.904917   
3       7899  99.748977     66.251311    80.787914       96.842310  61.863915   
4       5197  28.178531     42.096964    96.588996       99.261028  34.610132   

                       issue_category  
0                    Normal Operation  
1                    Normal Operation  
2                    Normal Operation  
3  CPU Overload or Thermal Throttling  
4                    Normal Operation  


## Training Models for Predictive Maintenance

### Here, models are trained on both the laptop data and the transformer data. The models are then saved (dumped) with joblib so that they can be reused later.
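The save-and-reload round trip can be checked in isolation. The following is a minimal sketch on toy data, not the notebook's models: the file name `toy_model.pkl`, the toy label rule, and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))
y = (X[:, 0] > 80).astype(int)  # toy label: first feature above a threshold

model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_model.pkl")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly
same = np.array_equal(model.predict(X), restored.predict(X))
print(same)
```

A saved model is generally best reloaded with the same scikit-learn version that produced it.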

### Training Transformer Model

In [24]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import joblib

# Load the synthetic transformer data
transformer_data = pd.read_csv('synthetic_transformer_data.csv')

# Define features and target
features = ['temperature', 'vibration', 'load', 'ambientTemp', 'humidity']
target = 'failureCategory'

X = transformer_data[features]
y = transformer_data[target]

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Random Forest model to predict categories
transformer_model = RandomForestClassifier(n_estimators=100, random_state=42)
transformer_model.fit(X_train, y_train)

# Save the model and output a classification report
joblib.dump(transformer_model, 'transformer_model.pkl')
y_pred = transformer_model.predict(X_test)
report = classification_report(y_test, y_pred)
with open('transformer_classification_report.txt', 'w') as file:
    file.write(report)
print("Transformer Model Classification Report:")
print(report)


Transformer Model Classification Report:
                                                precision    recall  f1-score   support

Insulation Breakdown or Cooling System Failure       1.00      1.00      1.00       194
           Mechanical Wear or Loose Components       1.00      1.00      1.00       207
                              Normal Operation       1.00      1.00      1.00      1398
                   Overload or Imbalanced Load       1.00      1.00      1.00       201

                                      accuracy                           1.00      2000
                                     macro avg       1.00      1.00      1.00      2000
                                  weighted avg       1.00      1.00      1.00      2000
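
The perfect scores above are expected: in the synthetic data each label is a deterministic threshold function of the features (see `categorizeFailure`), and the sampled ranges leave a gap around every threshold. A small sketch, assuming a reduced three-feature version of the data with integer-encoded labels, suggests that even a depth-3 decision tree can recover the rule. The helper `block` and the group size `n` are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 300  # samples per group (illustrative)

def block(temp, vib, load):
    """Sample n rows with each feature drawn from its (low, high) range."""
    return np.column_stack(
        [rng.uniform(*temp, n), rng.uniform(*vib, n), rng.uniform(*load, n)]
    )

X = np.vstack([
    block((60, 75), (0, 0.08), (30, 65)),     # normal operation
    block((81, 100), (0, 0.08), (30, 65)),    # cooling system failure
    block((60, 75), (0.11, 0.5), (30, 65)),   # mechanical wear
    block((60, 75), (0, 0.08), (70, 100)),    # overload
])

# Same threshold rule as categorizeFailure, encoded as integers 0..3
y = np.where(X[:, 0] > 80, 1,
             np.where(X[:, 1] > 0.1, 2,
                      np.where(X[:, 2] > 70, 3, 0)))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
accuracy = tree.score(X_test, y_test)
print(accuracy)
```

Because the classes never overlap, the report mainly confirms that the model learned the labeling rule; it says little about how the model would behave on noisy sensor data.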



### Training Laptop Model

In [25]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import joblib

# Load the synthetic laptop data
laptop_data = pd.read_csv('synthetic_laptop_data.csv')

# Define features and target
features = ['cpu_usage', 'memory_usage', 'disk_health', 'battery_health', 'uptime']
target = 'issue_category'

X = laptop_data[features]
y = laptop_data[target]

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Logistic Regression model to predict categories
laptop_model = LogisticRegression(random_state=42, max_iter=1000)
laptop_model.fit(X_train, y_train)

# Save the model and output a classification report
joblib.dump(laptop_model, 'laptop_model.pkl')
y_pred = laptop_model.predict(X_test)
report = classification_report(y_test, y_pred)
with open('laptop_classification_report.txt', 'w') as file:
    file.write(report)
print("Laptop Model Classification Report:")
print(report)

Laptop Model Classification Report:
                                      precision    recall  f1-score   support

  CPU Overload or Thermal Throttling       1.00      1.00      1.00       226
Imminent Disk Failure or Bad Sectors       1.00      1.00      1.00       201
     Memory Leak or Insufficient RAM       1.00      1.00      1.00       217
                    Normal Operation       1.00      1.00      1.00      1356

                            accuracy                           1.00      2000
                           macro avg       1.00      1.00      1.00      2000
                        weighted avg       1.00      1.00      1.00      2000



## Simulating Real-World Scenarios

### This section simulates a real-world scenario in which data is fed to a model as it arrives, and the model predicts which failure, if any, is present in each system. For each equipment type (transformers and laptops), we simulate two groups of units: one operating normally and one exhibiting failures.
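The cells below score all simulated units in one batch. To mimic readings arriving one at a time, a generator can feed single rows to the model. This is a minimal sketch under stated assumptions: the stand-in model is trained on a toy threshold rule rather than loaded from `transformer_model.pkl`, and `stream_readings` and the two-feature layout are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["temperature", "vibration"]
rng = np.random.default_rng(0)

# Stand-in model trained on a toy threshold rule (assumption, not the saved model)
train = pd.DataFrame({
    "temperature": rng.uniform(60, 100, 500),
    "vibration": rng.uniform(0, 0.5, 500),
})
labels = np.where(train["temperature"] > 80, "Cooling Failure", "Normal Operation")
model = DecisionTreeClassifier(random_state=0).fit(train[FEATURES], labels)

def stream_readings(n):
    """Yield one single-row DataFrame per simulated sensor reading."""
    for i in range(n):
        yield pd.DataFrame({
            "equipment_id": [i + 1],
            "temperature": rng.uniform(60, 100, 1),
            "vibration": rng.uniform(0, 0.5, 1),
        })

predictions = []
for reading in stream_readings(5):
    # Score each reading as soon as it arrives
    category = model.predict(reading[FEATURES])[0]
    predictions.append((int(reading["equipment_id"].iloc[0]), category))
    print(predictions[-1])
```

In a live setting the generator would be replaced by a sensor feed or message queue, with the prediction loop left unchanged.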

### Transformers

In [27]:
import joblib
import pandas as pd
import numpy as np

# Load the transformer model
transformerModel = joblib.load('transformer_model.pkl')

# Simulate 5 well-operating transformers
wellOperatingTransformers = pd.DataFrame({
    'equipment_id': np.arange(1, 6),
    'temperature': np.random.uniform(65, 75, 5),
    'vibration': np.random.uniform(0.01, 0.05, 5),
    'load': np.random.uniform(40, 60, 5),
    'ambientTemp': np.random.uniform(25, 35, 5),
    'humidity': np.random.uniform(40, 60, 5)
})

# Simulate specific failure conditions for 5 poorly-operating transformers
# Transformer 1: High temperature (cooling system failure)
transformer1 = pd.DataFrame({
    'equipment_id': [6],
    'temperature': np.random.uniform(85, 100, 1),
    'vibration': np.random.uniform(0.01, 0.05, 1),
    'load': np.random.uniform(40, 60, 1),
    'ambientTemp': np.random.uniform(25, 35, 1),
    'humidity': np.random.uniform(40, 60, 1)
})

# Transformer 2: High vibration (mechanical wear)
transformer2 = pd.DataFrame({
    'equipment_id': [7],
    'temperature': np.random.uniform(65, 75, 1),
    'vibration': np.random.uniform(0.2, 0.4, 1),
    'load': np.random.uniform(40, 60, 1),
    'ambientTemp': np.random.uniform(25, 35, 1),
    'humidity': np.random.uniform(40, 60, 1)
})

# Transformer 3: High load (overload)
transformer3 = pd.DataFrame({
    'equipment_id': [8],
    'temperature': np.random.uniform(65, 75, 1),
    'vibration': np.random.uniform(0.01, 0.05, 1),
    'load': np.random.uniform(75, 100, 1),
    'ambientTemp': np.random.uniform(25, 35, 1),
    'humidity': np.random.uniform(40, 60, 1)
})

# Transformer 4: Combination of high temperature and vibration
transformer4 = pd.DataFrame({
    'equipment_id': [9],
    'temperature': np.random.uniform(85, 100, 1),
    'vibration': np.random.uniform(0.2, 0.4, 1),
    'load': np.random.uniform(40, 60, 1),
    'ambientTemp': np.random.uniform(25, 35, 1),
    'humidity': np.random.uniform(40, 60, 1)
})

# Transformer 5: All normal conditions but slightly elevated temperature
transformer5 = pd.DataFrame({
    'equipment_id': [10],
    'temperature': np.random.uniform(75, 85, 1),
    'vibration': np.random.uniform(0.01, 0.05, 1),
    'load': np.random.uniform(40, 60, 1),
    'ambientTemp': np.random.uniform(25, 35, 1),
    'humidity': np.random.uniform(40, 60, 1)
})

simulatedTransformerData = pd.concat([wellOperatingTransformers, transformer1, transformer2, transformer3, transformer4, transformer5])

features = ['temperature', 'vibration', 'load', 'ambientTemp', 'humidity']
simulatedTransformerData['predicted_category'] = transformerModel.predict(simulatedTransformerData[features])

# Display results
print("Simulated Transformers:")
print(simulatedTransformerData[['equipment_id', 'predicted_category']])


Simulated Transformers:
   equipment_id                              predicted_category
0             1                                Normal Operation
1             2                                Normal Operation
2             3                                Normal Operation
3             4                                Normal Operation
4             5                                Normal Operation
0             6  Insulation Breakdown or Cooling System Failure
0             7             Mechanical Wear or Loose Components
0             8                     Overload or Imbalanced Load
0             9             Mechanical Wear or Loose Components
0            10  Insulation Breakdown or Cooling System Failure
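
Transformer 9 combines high temperature with high vibration. The training data contains only single-cause failures, and `categorizeFailure` would label such a row as a cooling failure because temperature is checked first, yet the model predicted mechanical wear. Inspecting `predict_proba` is one way to see how divided the model is on such inputs. The sketch below uses a toy two-feature model with shortened labels, which are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["temperature", "vibration"]
rng = np.random.default_rng(0)
n = 200

# Toy training set with single-cause failures only, as in the synthetic data
train = pd.DataFrame({
    "temperature": np.concatenate(
        [rng.uniform(60, 75, n), rng.uniform(81, 100, n), rng.uniform(60, 75, n)]
    ),
    "vibration": np.concatenate(
        [rng.uniform(0, 0.08, n), rng.uniform(0, 0.08, n), rng.uniform(0.11, 0.5, n)]
    ),
})
labels = ["Normal"] * n + ["Cooling Failure"] * n + ["Mechanical Wear"] * n
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(train[FEATURES], labels)

# A combined failure: hot and vibrating, a region absent from the training set
combined = pd.DataFrame({"temperature": [92.0], "vibration": [0.3]})
probabilities = pd.Series(
    model.predict_proba(combined[FEATURES])[0], index=model.classes_
)
print(probabilities.sort_values(ascending=False))
```

When two failure classes receive comparable probability, reporting both may be more useful than reporting only the top prediction.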


### Laptops

In [35]:
import joblib
import pandas as pd
import numpy as np

# Load the laptop model
laptopModel = joblib.load('laptop_model.pkl')

# Simulate 5 well-operating laptops
wellOperatingLaptops = pd.DataFrame({
    'laptop_id': np.arange(1, 6),
    'cpu_usage': np.random.uniform(20, 40, 5),
    'memory_usage': np.random.uniform(40, 60, 5),
    'disk_health': np.random.uniform(85, 100, 5),
    'battery_health': np.random.uniform(80, 100, 5),
    'uptime': np.random.uniform(10, 50, 5)
})

# Simulate specific failure conditions for 4 poorly-operating laptops

# Laptop 1: High CPU usage (CPU overload)
laptop1 = pd.DataFrame({
    'laptop_id': [6],
    'cpu_usage': np.random.uniform(85, 100, 1),
    'memory_usage': np.random.uniform(40, 60, 1),
    'disk_health': np.random.uniform(85, 100, 1),
    'battery_health': np.random.uniform(80, 100, 1),
    'uptime': np.random.uniform(10, 50, 1)
})

# Laptop 2: High memory usage (memory leak)
laptop2 = pd.DataFrame({
    'laptop_id': [7],
    'cpu_usage': np.random.uniform(20, 40, 1),
    'memory_usage': np.random.uniform(85, 100, 1),
    'disk_health': np.random.uniform(85, 100, 1),
    'battery_health': np.random.uniform(80, 100, 1),
    'uptime': np.random.uniform(10, 50, 1)
})

# Laptop 3: Low disk health (imminent disk failure)
laptop3 = pd.DataFrame({
    'laptop_id': [8],
    'cpu_usage': np.random.uniform(20, 40, 1),
    'memory_usage': np.random.uniform(40, 60, 1),
    'disk_health': np.random.uniform(0, 50, 1),
    'battery_health': np.random.uniform(80, 100, 1),
    'uptime': np.random.uniform(10, 50, 1)
})

# Laptop 4 (stored in laptop5, laptop_id 9): High CPU and memory usage, indicating potential overall system overload
laptop5 = pd.DataFrame({
    'laptop_id': [9],
    'cpu_usage': np.random.uniform(85, 100, 1),
    'memory_usage': np.random.uniform(85, 100, 1),
    'disk_health': np.random.uniform(85, 100, 1),
    'battery_health': np.random.uniform(80, 100, 1),
    'uptime': np.random.uniform(10, 50, 1)
})

# Combine all simulated laptops
simulatedLaptopData = pd.concat([wellOperatingLaptops, laptop1, laptop2, laptop3, laptop5])

# Predict the issue category
features = ['cpu_usage', 'memory_usage', 'disk_health', 'battery_health', 'uptime']
simulatedLaptopData['predicted_issue_category'] = laptopModel.predict(simulatedLaptopData[features])

# Display results
print("Simulated Laptops:")
print(simulatedLaptopData[['laptop_id', 'predicted_issue_category']])


Simulated Laptops:
   laptop_id              predicted_issue_category
0          1                      Normal Operation
1          2                      Normal Operation
2          3                      Normal Operation
3          4                      Normal Operation
4          5                      Normal Operation
0          6    CPU Overload or Thermal Throttling
0          7       Memory Leak or Insufficient RAM
0          8  Imminent Disk Failure or Bad Sectors
0          9    CPU Overload or Thermal Throttling
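
The monitoring-and-alerting step listed in the introduction can build directly on prediction tables like the ones above. This is a minimal sketch, assuming that any category other than "Normal Operation" should raise an alert; the helper `build_alerts` and the message format are hypothetical.

```python
import pandas as pd

NORMAL_LABEL = "Normal Operation"

def build_alerts(predictions, id_column, category_column):
    """Return one alert message per row whose predicted category is not normal."""
    flagged = predictions[predictions[category_column] != NORMAL_LABEL]
    return [
        f"ALERT: unit {row[id_column]} -> {row[category_column]}"
        for _, row in flagged.iterrows()
    ]

predicted = pd.DataFrame({
    "laptop_id": [1, 6, 8],
    "predicted_issue_category": [
        "Normal Operation",
        "CPU Overload or Thermal Throttling",
        "Imminent Disk Failure or Bad Sectors",
    ],
})

alerts = build_alerts(predicted, "laptop_id", "predicted_issue_category")
for message in alerts:
    print(message)
```

The same function works for the transformer results by passing `'equipment_id'` and `'predicted_category'` as the column names.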
