# Machine Learning for Predictive Maintenance

## Introduction

### Because real-world data is not available, we generate synthetic data to simulate the operating conditions and potential failures of industrial transformers and company laptops. The transformer data includes temperature, vibration, load, ambient temperature, and humidity; the laptop data includes CPU usage, memory usage, disk health, battery health, and uptime.

### The main steps in this notebook include:
##### 1. **Data Augmentation:** Creating synthetic data that mimics real-world scenarios, including both normal operating conditions and potential failure conditions.
##### 2. **Feature Engineering:** Creating additional features that help improve the predictive power of the model.
##### 3. **Model Training:** Training machine learning models using the synthetic data to predict equipment failures.
##### 4. **Real-time Data Integration:** Simulating real-time data collection and feeding it into the model for continuous predictions.
##### 5. **Monitoring and Alerting:** Setting up mechanisms to monitor model predictions and generate alerts for potential failures.

## Let's get started!

## Data Augmentation
#### Synthetic data is generated at random within specific ranges for each feature. Failure rows, drawn from abnormal ranges, are appended to the normal rows, and each row receives a failure-type label. Finally, the data is shuffled so that row order cannot bias the model.
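
As a minimal sketch of the recipe above (not the notebook's actual cells), the following draws a handful of normal and abnormal temperature readings, labels them, and shuffles the result. The seeded `np.random.default_rng` generator and the names `rng`, `n_normal`, `n_abnormal`, and `demo` are assumptions made for illustration; seeding simply makes the run repeatable.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator so the sketch is repeatable.
rng = np.random.default_rng(seed=0)

n_normal, n_abnormal = 8, 2  # hypothetical sizes, kept tiny for readability

# Draw each group from its own range, then stack them.
temperature = np.concatenate((
    rng.uniform(60, 80, n_normal),     # normal operating range
    rng.uniform(80, 100, n_abnormal),  # failure range
))

demo = pd.DataFrame({
    "equipment_id": np.arange(1, n_normal + n_abnormal + 1),  # one ID per row
    "temperature": temperature,
})
demo["failure"] = demo["temperature"] > 80  # simple label from the threshold

# Shuffle so the failure rows are not clustered at the end of the table.
demo = demo.sample(frac=1, random_state=0).reset_index(drop=True)
print(demo)
```

Note that the ID column is sized to the combined row count, since the normal and abnormal rows end up in the same table.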

### Transformer Data Augmentation

In [None]:
import numpy as np
import pandas as pd

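# Define the number of normal samples; abnormal rows add another 10%
numSamples = 10000
numAbnormal = int(numSamples * 0.1)

# Generate one equipment ID per row (normal + abnormal), so that every
# DataFrame column below has the same length
equipment_ids = np.arange(1, numSamples + numAbnormal + 1)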

# Generate normal operating conditions
normalTemperature = np.random.uniform(60, 80, numSamples)
normalVibration = np.random.uniform(0, 0.1, numSamples)
normalLoad = np.random.uniform(30, 70, numSamples)
normalAmbient_temp = np.random.uniform(20, 40, numSamples)
normalHumidity = np.random.uniform(30, 70, numSamples)

# Generate abnormal conditions (failures)
abnormalTemperature = np.random.uniform(80, 100, int(numSamples * 0.1))
abnormalVibration = np.random.uniform(0.1, 0.5, int(numSamples * 0.1))
abnormalLoad = np.random.uniform(70, 100, int(numSamples * 0.1))
abnormalAmbientTemp = np.random.uniform(40, 60, int(numSamples * 0.1))
abnormalHumidity = np.random.uniform(70, 100, int(numSamples * 0.1))

temperature = np.concatenate((normalTemperature, abnormalTemperature))
vibration = np.concatenate((normalVibration, abnormalVibration))
load = np.concatenate((normalLoad, abnormalLoad))
ambientTemp = np.concatenate((normalAmbient_temp, abnormalAmbientTemp))
humidity = np.concatenate((normalHumidity, abnormalHumidity))

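# Create categorical labels based on the conditions.
# Branches are checked in order, so a row that crosses several thresholds
# receives only the first matching label.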
def categorizeFailure(temp, vib, load):
    if temp > 80:
        return "Insulation Breakdown or Cooling System Failure"
    elif vib > 0.1:
        return "Mechanical Wear or Loose Components"
    elif load > 70:
        return "Overload or Imbalanced Load"
    else:
        return "Normal Operation"

failureCategory = [categorizeFailure(t, v, l) for t, v, l in zip(temperature, vibration, load)]

transformer_data = pd.DataFrame({
    'equipment_id': equipment_ids,
    'temperature': temperature,
    'vibration': vibration,
    'load': load,
    'ambient_temp': ambientTemp,
    'humidity': humidity,
    'failure_category': failureCategory
})

# Shuffle data
transformer_data = transformer_data.sample(frac=1).reset_index(drop=True)

# Save data
transformer_data.to_csv('synthetic_transformer_data.csv', index=False)

print(transformer_data.head())
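
A quick way to sanity-check the generated labels is to count them. The sketch below rebuilds a small version of the transformer data with the same ranges and thresholds and tallies the categories; the names `rng`, `n_normal`, `n_abnormal`, and `label` are illustrative assumptions. Because every abnormal row is drawn with a temperature of at least 80 and the temperature branch is tested first, the abnormal rows are expected to fall under the first label, so the vibration and load labels should not appear with these ranges.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # assumption: seeded for repeatability
n_normal, n_abnormal = 1000, 100     # hypothetical, smaller than the notebook

# Same ranges as the cell above: normal rows first, abnormal rows appended.
temperature = np.concatenate((rng.uniform(60, 80, n_normal),
                              rng.uniform(80, 100, n_abnormal)))
vibration = np.concatenate((rng.uniform(0, 0.1, n_normal),
                            rng.uniform(0.1, 0.5, n_abnormal)))
load = np.concatenate((rng.uniform(30, 70, n_normal),
                       rng.uniform(70, 100, n_abnormal)))

def label(temp, vib, ld):
    # Same thresholds and branch order as categorizeFailure.
    if temp > 80:
        return "Insulation Breakdown or Cooling System Failure"
    elif vib > 0.1:
        return "Mechanical Wear or Loose Components"
    elif ld > 70:
        return "Overload or Imbalanced Load"
    return "Normal Operation"

labels = pd.Series([label(t, v, ld) for t, v, ld in zip(temperature, vibration, load)])

# Count how many rows landed in each category.
counts = labels.value_counts()
print(counts)
```

If all four categories are wanted in the training data, one option would be to draw each failure mode from its own abnormal range independently, rather than making every abnormal row abnormal in every feature.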



### Laptop Data Augmentation

In [None]:
import numpy as np
import pandas as pd

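# Define the number of normal samples; abnormal rows add another 10%
numSamples = 10000
numAbnormal = int(numSamples * 0.1)

# Generate one laptop ID per row (normal + abnormal), so that every
# DataFrame column below has the same length
laptop_ids = np.arange(1, numSamples + numAbnormal + 1)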

# Generate normal operating conditions
normalCpuUsage = np.random.uniform(10, 50, numSamples)
normalMemoryUsage = np.random.uniform(30, 70, numSamples)
normalDiskHealth = np.random.uniform(80, 100, numSamples)
normalBatteryHealth = np.random.uniform(70, 100, numSamples)
normalUptime = np.random.uniform(0, 100, numSamples)

# Generate abnormal conditions (crashes)
abnormalCpuUsage = np.random.uniform(70, 100, int(numSamples * 0.1))
abnormalMemoryUsage = np.random.uniform(75, 100, int(numSamples * 0.1))
abnormalDiskHealth = np.random.uniform(0, 50, int(numSamples * 0.1))
abnormalBatteryHealth = np.random.uniform(0, 50, int(numSamples * 0.1))
abnormalUptime = np.random.uniform(150, 200, int(numSamples * 0.1))

cpuUsage = np.concatenate((normalCpuUsage, abnormalCpuUsage))
memoryUsage = np.concatenate((normalMemoryUsage, abnormalMemoryUsage))
diskHealth = np.concatenate((normalDiskHealth, abnormalDiskHealth))
batteryHealth = np.concatenate((normalBatteryHealth, abnormalBatteryHealth))
uptime = np.concatenate((normalUptime, abnormalUptime))

# Create categorical labels based on the conditions
def categorize_issue(cpu, mem, disk):
    if cpu > 85:
        return "CPU Overload or Thermal Throttling"
    elif cpu > 70:
        return "CPU Failure or Cooling System Degradation"
    elif mem > 85:
        return "Memory Leak or Insufficient RAM"
    elif disk < 50:
        return "Imminent Disk Failure or Bad Sectors"
    elif disk < 70:
        return "Disk Capacity Exhaustion or Fragmentation"
    else:
        return "Normal Operation"

issue_category = [categorize_issue(c, m, d) for c, m, d in zip(cpuUsage, memoryUsage, diskHealth)]

laptop_data = pd.DataFrame({
    'laptop_id': laptop_ids,
    'cpu_usage': cpuUsage,
    'memory_usage': memoryUsage,
    'disk_health': diskHealth,
    'battery_health': batteryHealth,
    'uptime': uptime,
    'issue_category': issue_category
})

# Shuffle data
laptop_data = laptop_data.sample(frac=1).reset_index(drop=True)

laptop_data.to_csv('synthetic_laptop_data.csv', index=False)

print(laptop_data.head())
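
The row-by-row list comprehension above works, but the same rules can also be applied to whole arrays at once. The following is a sketch of that alternative using `np.select`, which picks the first matching condition for each row and so mirrors the `if`/`elif` order of `categorize_issue`. The sample readings and the helper name `categorize_issues_vectorized` are hypothetical.

```python
import numpy as np

def categorize_issues_vectorized(cpu, mem, disk):
    """Label every row at once; the first true condition wins, like if/elif."""
    cpu, mem, disk = np.asarray(cpu), np.asarray(mem), np.asarray(disk)
    conditions = [cpu > 85, cpu > 70, mem > 85, disk < 50, disk < 70]
    choices = [
        "CPU Overload or Thermal Throttling",
        "CPU Failure or Cooling System Degradation",
        "Memory Leak or Insufficient RAM",
        "Imminent Disk Failure or Bad Sectors",
        "Disk Capacity Exhaustion or Fragmentation",
    ]
    # Rows that match no condition fall back to the default label.
    return np.select(conditions, choices, default="Normal Operation")

# Hypothetical readings, one per branch of the rule set.
cpu = [90, 75, 20, 20, 20, 20]
mem = [50, 50, 95, 50, 50, 50]
disk = [90, 90, 90, 40, 60, 90]

labels = categorize_issues_vectorized(cpu, mem, disk)
print(labels)
```

For a table of this size either approach is fast enough; the vectorized form mainly keeps the thresholds and labels side by side in two lists.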


## Training Models for Predictive Maintenance

### Here, models are trained on both the laptop data and the transformer data. The models are then saved (dumped) to disk so they can be used later on.

### Training Transformer Model

In [None]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import joblib

# Load synthetic transformer data
transformer_data = pd.read_csv('synthetic_transformer_data.csv')

# Define features and target; the label column written by the
# data-generation step is 'failure_category'
features = ['temperature', 'vibration', 'load', 'ambient_temp', 'humidity']
target = 'failure_category'

X = transformer_data[features]
y = transformer_data[target]

# Split data 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model using Random Forest
transformer_model = RandomForestClassifier(n_estimators=100, random_state=42)
transformer_model.fit(X_train, y_train)

# Save the model and output a classification report
joblib.dump(transformer_model, 'transformer_model.pkl')
y_pred = transformer_model.predict(X_test)
report = classification_report(y_test, y_pred)
with open('transformer_classification_report.txt', 'w') as file:
    file.write(report)
print("Transformer Model Classification Report:")
print(report)
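
To illustrate how a saved model might be used later, the sketch below trains a small stand-in classifier, saves it with `joblib.dump`, reloads it with `joblib.load`, and scores one new reading. The toy training rows, the temporary directory, and the `new_reading` values are assumptions for illustration only; in the notebook the saved file is `transformer_model.pkl` in the working directory.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

features = ['temperature', 'vibration', 'load', 'ambient_temp', 'humidity']

# Hypothetical toy rows standing in for the synthetic transformer data.
train = pd.DataFrame({
    'temperature': [65, 70, 75, 90, 95, 85],
    'vibration': [0.02, 0.05, 0.08, 0.30, 0.40, 0.20],
    'load': [40, 50, 60, 80, 90, 75],
    'ambient_temp': [25, 30, 35, 45, 50, 55],
    'humidity': [40, 50, 60, 80, 90, 75],
    'failure_category': ['Normal Operation'] * 3
    + ['Insulation Breakdown or Cooling System Failure'] * 3,
})

model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(train[features], train['failure_category'])

# Save the fitted model, then load it back as a later session would.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'transformer_model.pkl')
    joblib.dump(model, model_path)
    loaded_model = joblib.load(model_path)

# Score a single new reading; the column names must match the training features.
new_reading = pd.DataFrame([[92, 0.35, 85, 48, 88]], columns=features)
prediction = loaded_model.predict(new_reading)[0]
print(prediction)
```

Passing the reading as a DataFrame with the training column names keeps the feature order explicit, which matters once readings arrive from a separate data source.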


### Training Laptop Model

In [None]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import joblib

# Load synthetic laptop data
laptop_data = pd.read_csv('synthetic_laptop_data.csv')

# Define features and target; the label column written by the
# data-generation step is 'issue_category'
features = ['cpu_usage', 'memory_usage', 'disk_health', 'battery_health', 'uptime']
target = 'issue_category'

X = laptop_data[features]
y = laptop_data[target]

# Split data 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train using Logistic Regression model
laptop_model = LogisticRegression(random_state=42, max_iter=1000)
laptop_model.fit(X_train, y_train)

# Save the model and output a classification report
joblib.dump(laptop_model, 'laptop_model.pkl')
y_pred = laptop_model.predict(X_test)
report = classification_report(y_test, y_pred)
with open('laptop_classification_report.txt', 'w') as file:
    file.write(report)
print("Laptop Model Classification Report:")
print(report)
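
Logistic regression is sensitive to feature scale, and the laptop features span different ranges (for example, uptime reaches 200 while the usage and health values stay within 0 to 100). One common option, shown here as a sketch rather than as part of the original notebook, is to standardize the features inside a scikit-learn pipeline before fitting. The toy two-feature data, the simplified binary labels, and the names `rng`, `X`, `y`, and `scaled_model` are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=0)  # assumption: seeded for repeatability

# Hypothetical toy data with two columns: cpu_usage and uptime.
normal = np.column_stack((rng.uniform(10, 50, 200), rng.uniform(0, 100, 200)))
abnormal = np.column_stack((rng.uniform(70, 100, 50), rng.uniform(150, 200, 50)))
X = np.vstack((normal, abnormal))

# Simplified binary labels for illustration (the notebook uses issue categories).
y = np.array(["Normal Operation"] * 200 + ["Issue"] * 50)

# The scaler is fitted together with the classifier, so the same scaling
# is applied automatically whenever the pipeline predicts.
scaled_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=42),
)
scaled_model.fit(X, y)

accuracy = scaled_model.score(X, y)
print(f"Training accuracy: {accuracy:.3f}")
```

Because the pipeline is a single estimator, it can be saved with `joblib.dump` in the same way as the bare model, and the scaling step travels with it.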