# Project Name: Predictive-Maintenance-Model

## Understand the Problem and Plan

### Objective

#### Predict the time remaining before failure, recorded as MTTF (mean time to failure), for industrial equipment using the provided sensor and operational data.

### Approach

#### 1. Analyze data for insights.
#### 2. Preprocess for quality.
#### 3. Build and validate a machine learning model.
#### 4. Optimize and deploy for real-world use.

## Load and Explore the Data

In [132]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [133]:
# Load the dataset
data = pd.read_excel('PM DATA Training.xlsx')

# Explore the first few rows
print(data.head())


   UID      ProductType  Humidity  Temperature  Age  Quantity  MTTF 
0    1         Extruder      5.88        66.17   13     39764     69
1    2  Pressure Cutter     42.76        40.29    4     45181    532
2    3         Extruder     76.62        52.08    4     70397     93
3    4             Pump     45.91        90.26   14     49470    183
4    5    Gauge Machine     78.87        58.56   12     45145    447


In [134]:
data.describe(include="all").T

              count  unique            top    freq       mean           std      min      25%      50%      75%      max
UID          5000.0     NaN            NaN     NaN     2500.5   1443.520003      1.0  1250.75   2500.5  3750.25   5000.0
ProductType  5000.0     5.0  Gauge Machine  1029.0        NaN           NaN      NaN      NaN      NaN      NaN      NaN
Humidity     5000.0     NaN            NaN     NaN  52.344846     27.617841      5.0  28.3725    52.31    76.58    99.98
Temperature  5000.0     NaN            NaN     NaN   64.53768     17.523957    35.01  49.5375   64.395    79.81     95.0
Age          5000.0     NaN            NaN     NaN     8.9946      4.595911      1.0      5.0      9.0     13.0     17.0
Quantity     5000.0     NaN            NaN     NaN  51222.923  16526.348628  23007.0  36737.0  51061.0  65559.5  79995.0
MTTF         5000.0     NaN            NaN     NaN     316.26     155.19153     50.0    180.0    315.0    453.0    585.0


In [135]:
# Check for missing values and data types
print(data.info())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   UID          5000 non-null   int64  
 1   ProductType  5000 non-null   object 
 2   Humidity     5000 non-null   float64
 3   Temperature  5000 non-null   float64
 4   Age          5000 non-null   int64  
 5   Quantity     5000 non-null   int64  
 6   MTTF         5000 non-null   int64  
dtypes: float64(2), int64(4), object(1)
memory usage: 273.6+ KB
None


In [136]:
print(data.isnull().sum())
print(data.shape)

UID            0
ProductType    0
Humidity       0
Temperature    0
Age            0
Quantity       0
MTTF           0
dtype: int64
(5000, 7)
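
Before modelling, it can help to check whether the features carry any linear signal about the target. The sketch below is illustrative only: it uses a small synthetic DataFrame named `demo` (made-up values, not the training file) with similar column names, and a hypothetical target that depends on `Age`. On the real data, the same `corr` call could be applied to the numeric columns of `data`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (values are made up for illustration).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Humidity": rng.uniform(5, 100, n),
    "Temperature": rng.uniform(35, 95, n),
    "Age": rng.integers(1, 18, n),
    "Quantity": rng.integers(23000, 80000, n),
})
# Hypothetical target: decreases with Age, plus Gaussian noise.
demo["MTTF"] = 600 - 25 * demo["Age"] + rng.normal(0, 20, n)

# Pearson correlation of every feature with the target column.
corr_with_target = demo.corr()["MTTF"].drop("MTTF")
print(corr_with_target.sort_values())
```

Features whose correlation with the target is close to zero are unlikely to help a linear model, although non-linear relationships would not show up in this check.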


## Data Cleaning

#### Handle Missing Values: Replace or drop missing data.
#### Check for Duplicates: Remove duplicates if found.

In [137]:
# Drop duplicate rows
data = data.drop_duplicates()

# Handle missing values (example: fill with mean for numerical data)
data['Humidity'] = data['Humidity'].fillna(data['Humidity'].mean())
data['Temperature'] = data['Temperature'].fillna(data['Temperature'].mean())

# Verify no missing values remain
print(data.isnull().sum())
print(data.shape)


UID            0
ProductType    0
Humidity       0
Temperature    0
Age            0
Quantity       0
MTTF           0
dtype: int64
(5000, 7)


## Feature Engineering

#### 1. Categorical Encoding: Convert ProductType into numerical values.
#### 2. Feature Scaling: Scale the numeric features Humidity, Temperature, and Age.
#### 3. Feature Selection: Select only relevant columns (UID and Quantity are left out).

In [138]:
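# Define input (X) and target (y)
# Note: UID and Quantity are not used as features.
X = data[['ProductType', 'Humidity', 'Temperature', 'Age']]
# The target column name in the spreadsheet carries a trailing space ('MTTF ').
y = data['MTTF ']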

In [139]:
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer

# Build a preprocessor:
# - One-hot encode the categorical 'ProductType' column
# - Standardize the numeric columns (zero mean, unit variance)
preprocessor = ColumnTransformer(transformers=[
    ('cat', OneHotEncoder(), ['ProductType']),  # categorical column
    ('num', StandardScaler(), ['Humidity', 'Temperature', 'Age'])  # numeric columns
])

# Fit the preprocessor on X and replace X with the transformed feature matrix
X = preprocessor.fit_transform(X)




## Split Data for Training & Testing

In [140]:
from sklearn.model_selection import train_test_split

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
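
One caveat with fitting the preprocessor on all of `X` before splitting is that the scaler also sees statistics from the test rows. A common alternative is to split first and wrap the preprocessing and the model in a scikit-learn `Pipeline`, so that `fit` only touches training rows. The following is a minimal sketch under stated assumptions: `demo`, `target`, `pre`, and `pipe` are hypothetical names, the data is synthetic, and `handle_unknown="ignore"` is an added choice that is not in the original preprocessor.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data with the same feature names (values are made up).
rng = np.random.default_rng(42)
n = 300
demo = pd.DataFrame({
    "ProductType": rng.choice(["Extruder", "Pump", "Gauge Machine"], n),
    "Humidity": rng.uniform(5, 100, n),
    "Temperature": rng.uniform(35, 95, n),
    "Age": rng.integers(1, 18, n),
})
target = 600 - 25 * demo["Age"] + rng.normal(0, 20, n)  # hypothetical target

# Split the raw rows first, so the test set stays unseen during fitting.
X_tr, X_te, y_tr, y_te = train_test_split(
    demo, target, test_size=0.2, random_state=42
)

pre = ColumnTransformer(transformers=[
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["ProductType"]),
    ("num", StandardScaler(), ["Humidity", "Temperature", "Age"]),
])

# Preprocessing is fitted inside the pipeline, on training rows only.
pipe = Pipeline(steps=[("preprocess", pre), ("model", LinearRegression())])
pipe.fit(X_tr, y_tr)

test_r2 = pipe.score(X_te, y_te)
print("Test R2:", test_r2)
```

A side benefit of this design is that a single fitted `pipe` object can be saved and later applied directly to raw records.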



## Train Multiple Models

### Linear Regression:

In [141]:
from sklearn.linear_model import LinearRegression

# Initialize and train a Linear Regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# R2 on the training data
print(linear_model.score(X_train, y_train))

# Predict on test data
y_pred_lr = linear_model.predict(X_test)


0.0022974171138973043


### Decision Tree Regressor:

In [142]:
from sklearn.tree import DecisionTreeRegressor

# Initialize and train a Decision Tree Regressor model
dt_model = DecisionTreeRegressor()
dt_model.fit(X_train, y_train)

# R2 on the training data, printed as a percentage
print(dt_model.score(X_train, y_train)*100, "%")

# Predict on test data
y_pred_dtr = dt_model.predict(X_test)


100.0 %
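
A training score of 100% mostly indicates that an unconstrained tree has memorized the training rows, not that it will generalize. Cross-validation gives a less optimistic estimate. The sketch below illustrates the effect on synthetic data in which the target is drawn independently of the features (`X_demo` and `y_demo` are hypothetical, not the project data).

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: the target is independent of the features by construction.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = rng.uniform(50, 585, size=300)

# An unconstrained tree fits the training rows perfectly.
tree = DecisionTreeRegressor(random_state=0)
tree.fit(X_demo, y_demo)
train_r2 = tree.score(X_demo, y_demo)

# 5-fold cross-validated R2: each fold is scored on rows the tree did not see.
cv_r2 = cross_val_score(tree, X_demo, y_demo, cv=5, scoring="r2")

print("Train R2:", train_r2)
print("Mean CV R2:", cv_r2.mean())
```

Limiting tree growth (for example via `max_depth` or `min_samples_leaf`) is one usual way to narrow the gap between the two scores, provided the features carry signal in the first place.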


### Random Forest Regressor:

In [143]:
from sklearn.ensemble import RandomForestRegressor

# Initialize and train a Random Forest Regressor model
rf_model = RandomForestRegressor()
rf_model.fit(X_train, y_train)

# R2 on the training data, printed as a percentage
print(rf_model.score(X_train, y_train)*100, "%")

# Predict on test data
y_pred_rfr = rf_model.predict(X_test)


84.51445254261064 %


### Support Vector Regressor (SVR):

In [144]:
from sklearn.svm import SVR

# Initialize and train a Support Vector Regressor model
svr_model = SVR()
svr_model.fit(X_train, y_train)

# R2 on the training data, printed as a percentage
print(svr_model.score(X_train, y_train)*100, "%")

# Predict on test data
y_pred_svr = svr_model.predict(X_test)


0.5836855437930666 %


## Evaluate the Models

In [145]:


# Evaluate performance on the held-out test set
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import numpy as np

# Test-set predictions of each model, keyed by display name
predictions = {
    "Linear Regression": y_pred_lr,
    "Decision Tree Regressor": y_pred_dtr,
    "Random Forest Regressor": y_pred_rfr,
    "Support Vector Regressor (SVR)": y_pred_svr,
}

for name, y_pred in predictions.items():
    print(name)
    print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
    print("MAE:", mean_absolute_error(y_test, y_pred))
    print("R2 Score:", r2_score(y_test, y_pred))
    print("\n")




Linear Regression
RMSE: 156.78295971256722
MAE: 136.21529220392097
R2 Score: -0.005381182199798884


Decision Tree Regressor
RMSE: 222.3860000089934
MAE: 184.023
R2 Score: -1.0227766030581145


Random Forest Regressor
RMSE: 164.27008733241726
MAE: 140.53838000000002
R2 Score: -0.10369737821131308


Support Vector Regressor (SVR)
RMSE: 156.96919494780988
MAE: 136.46049089876982
R2 Score: -0.007771092514652134
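
A negative R2 means the model's squared error on the test set is larger than that of a constant prediction equal to the test-set mean. In the results above, the Linear Regression and SVR RMSE values (about 157) are close to the standard deviation of MTTF reported by `describe` (about 155), which is roughly what always predicting the mean would achieve. One way to make this comparison explicit is scikit-learn's `DummyRegressor`. The sketch below uses synthetic data (`X_demo`, `y_demo` are hypothetical) and is not a re-run of the project results.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: target independent of the features (values are made up).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(500, 3))
y_demo = rng.uniform(50, 585, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Baseline that ignores the features and always predicts the training mean.
baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
y_base = baseline.predict(X_te)

baseline_rmse = np.sqrt(mean_squared_error(y_te, y_base))
baseline_r2 = r2_score(y_te, y_base)

print("Baseline RMSE:", baseline_rmse)
print("Baseline R2:", baseline_r2)
```

A model is only worth deploying if it beats this baseline on held-out data by a meaningful margin.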




## Save the Model

In [146]:
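# Select the model to save based on test-set metrics, not training scores.
# The decision tree's 100% training score reflects overfitting: it has the highest test RMSE.
# Linear Regression has the lowest test RMSE, although every model has a negative test R2.
# The model is already fitted, so no second call to fit() is needed.
best_model = linear_model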

In [147]:
import joblib

# Save the trained model and the fitted preprocessor
joblib.dump(best_model, 'predictive_maintenance_model.pkl')
joblib.dump(preprocessor, 'preprocessor.pkl')



['preprocessor.pkl']
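
To use the saved artifacts, new records have to pass through the same fitted preprocessor before they reach the model. The sketch below shows that round trip in a self-contained way: it trains a small stand-in model on synthetic data, saves both objects to a temporary directory with `joblib`, reloads them, and predicts for one record. The helper `predict_mttf`, the `FEATURES` list, and the synthetic training frame are assumptions made for illustration, and `handle_unknown="ignore"` is an added choice not present in the original preprocessor.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder, StandardScaler

FEATURES = ["ProductType", "Humidity", "Temperature", "Age"]


def predict_mttf(model, preprocessor, records):
    """Hypothetical helper: transform raw records, then predict MTTF."""
    frame = pd.DataFrame(records)[FEATURES]
    return model.predict(preprocessor.transform(frame))


# Synthetic training data (values are made up for illustration).
rng = np.random.default_rng(0)
n = 100
train = pd.DataFrame({
    "ProductType": rng.choice(["Extruder", "Pump", "Gauge Machine"], n),
    "Humidity": rng.uniform(5, 100, n),
    "Temperature": rng.uniform(35, 95, n),
    "Age": rng.integers(1, 18, n),
})
y_demo = 600 - 25 * train["Age"] + rng.normal(0, 20, n)

pre = ColumnTransformer(transformers=[
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["ProductType"]),
    ("num", StandardScaler(), ["Humidity", "Temperature", "Age"]),
])
model = LinearRegression().fit(pre.fit_transform(train), y_demo)

# Save and reload both artifacts, mirroring the file names used above.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "predictive_maintenance_model.pkl")
    pre_path = os.path.join(tmp, "preprocessor.pkl")
    joblib.dump(model, model_path)
    joblib.dump(pre, pre_path)
    loaded_model = joblib.load(model_path)
    loaded_pre = joblib.load(pre_path)

new_records = [
    {"ProductType": "Pump", "Humidity": 45.91, "Temperature": 90.26, "Age": 14}
]
pred = predict_mttf(loaded_model, loaded_pre, new_records)
print(pred)
```

Note that the preprocessor is only ever called with `transform` at prediction time; calling `fit_transform` on new records would silently change the scaling.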