# Classification

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Load the training data from the CSV file `./data/PM_train.csv`

In [None]:
df = pd.read_csv('./data/PM_train.csv')

In [None]:
df.info()

## Feature engineering

Based on the input data description from a previous section, an intuitive predictive maintenance question is: "Given the operation and failure history of these aircraft engines, can we predict when an in-service engine will fail?"

We reformulate this question as: how many more cycles will an in-service engine last before it fails? This quantity is the remaining useful life (RUL).

Create a new column holding the maximum cycle count observed for each engine

In [None]:
df['RUL'] = df.groupby(['engine_id'])['cycle'].transform('max')
df.head()

Subtract the current cycle from that maximum to get the remaining useful life for each row

In [None]:
df['RUL'] = df.groupby(['engine_id'])['cycle'].transform('max') - df['cycle']
df.head()
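To make the RUL calculation concrete, here is a minimal sketch on a made-up toy table (the values are illustrative and not taken from `PM_train.csv`). It assumes the last observed cycle of an engine is the cycle at which it fails.

```python
import pandas as pd

# Toy data: engine 1 runs for 3 cycles, engine 2 for 2 cycles (made-up values).
toy = pd.DataFrame({
    'engine_id': [1, 1, 1, 2, 2],
    'cycle':     [1, 2, 3, 1, 2],
})

# Last observed cycle per engine, broadcast back to every row of that engine.
max_cycle = toy.groupby('engine_id')['cycle'].transform('max')

# Remaining useful life: cycles left until the last observed cycle.
toy['RUL'] = max_cycle - toy['cycle']
print(toy)
```

Each engine counts down to `RUL == 0` on its final row.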

Create target variables for classification:
1. Binary classification: will this engine fail within w1 (e.g. 30) cycles?
2. Multi-class classification: will this engine fail within the window [1, w0] (e.g. 1 to 15 cycles), within the window [w0+1, w1] (e.g. 16 to 30 cycles), or not within w1 cycles at all?

In [None]:
df['label1'] = 1*(df['RUL'] <= 30)
df['label2'] = 1*(df['RUL'] <= 30) + 1*(df['RUL'] <= 15)
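The label arithmetic can be checked on a handful of hand-picked RUL values. This is a small sketch; the names `w0`, `w1` and `rul` are introduced here for illustration only. Summing the two boolean masks gives 0 for "no failure within w1 cycles", 1 for "failure within w0+1 to w1 cycles" and 2 for "failure within w0 cycles".

```python
import pandas as pd

w0, w1 = 15, 30  # window sizes used in the text
rul = pd.Series([100, 31, 30, 16, 15, 0], name='RUL')  # hand-picked boundary values

# Binary target: 1 if the engine fails within w1 cycles.
label1 = (rul <= w1).astype(int)

# Multi-class target: adding the second mask bumps the closest window up to class 2.
label2 = (rul <= w1).astype(int) + (rul <= w0).astype(int)

print(pd.DataFrame({'RUL': rul, 'label1': label1, 'label2': label2}))
```

Note that a row with `RUL == 0` also satisfies `RUL <= 15`, so the final cycle of each engine ends up in class 2.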

In [None]:
df

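Generate a sample feature: a rolling mean over `s2` with a window of 5 cycles, computed per engine so that the window never spans two engines

In [None]:
df['a2'] = df.groupby('engine_id')['s2'].rolling(5, min_periods=1).mean().reset_index(level=0, drop=True)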
df.head()

Build this rolling mean feature, as well as a rolling standard deviation feature, for all 21 sensors

In [None]:
for i in range(1, 22):
    # Rolling window of 5 cycles within each engine; dropping the group level restores the original row index
    rolled = df.groupby('engine_id')[f's{i}'].rolling(5, min_periods=1)
    df[f'a{i}'] = rolled.mean().reset_index(level=0, drop=True)
    df[f'sd{i}'] = rolled.std().reset_index(level=0, drop=True)
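The following sketch shows, on a made-up two-engine table, how a grouped rolling window behaves at engine boundaries. The window size of 2 and the sensor values are assumptions chosen to keep the output easy to read.

```python
import pandas as pd

# Toy data: two engines with very different sensor levels (made-up values).
toy = pd.DataFrame({
    'engine_id': [1, 1, 1, 2, 2, 2],
    's2':        [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Rolling window of 2 rows, restarted for every engine.
rolled = toy.groupby('engine_id')['s2'].rolling(2, min_periods=1)

# The result is indexed by (engine_id, original index); drop the group level
# so the values align with the rows of the original frame.
toy['a2'] = rolled.mean().reset_index(level=0, drop=True)
toy['sd2'] = rolled.std().reset_index(level=0, drop=True)
print(toy)
```

The first row of engine 2 has `a2 == 10.0`, so no value from engine 1 leaks into its window. The standard deviation of a single observation is undefined, which is why the first row of each engine has `NaN` in `sd2`.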

In [None]:
df.shape

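Clean missing data: the rolling standard deviation is undefined (`NaN`) for the first cycle of each engine, so drop those rows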

In [None]:
df.dropna(inplace=True)

In [None]:
df_copy = df.copy(deep=True)

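Normalize all feature values to the interval [0, 1]

In [None]:
from sklearn.preprocessing import MinMaxScaler

# Scale only the feature columns; the identifier and target columns keep their original values
feature_cols = df.columns.difference(['engine_id', 'RUL', 'label1', 'label2'])

scaler = MinMaxScaler()
df[feature_cols] = scaler.fit_transform(df[feature_cols])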

In [None]:
df.head()

In [None]:
df.loc[:, ['engine_id', 'RUL', 'label1', 'label2']] = df_copy.loc[:, ['engine_id', 'RUL', 'label1', 'label2']]
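As a quick check of what the scaler does, the sketch below compares `MinMaxScaler` with the formula `(x - min) / (max - min)` applied column by column. The small array is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two made-up feature columns with different ranges.
X = np.array([[10.0, 1.0],
              [20.0, 3.0],
              [30.0, 5.0]])

# Scaling with scikit-learn.
X_scaled = MinMaxScaler().fit_transform(X)

# The same transformation written out by hand, column by column.
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(X_scaled)
```

Both columns end up spanning 0 to 1 regardless of their original range.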

Separate the DataFrame into one containing all features and another containing the target variable

In [None]:
df_X = df.drop(['engine_id', 'RUL', 'label1', 'label2'], axis=1)
df_X.info()

### Binary Classification

In [None]:
df_y = df['label1']
df_y

#### Separate train and test data

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df_X, df_y, test_size=0.2)
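A random row-wise split places cycles of the same engine in both the training and the test set. As an alternative that is not used in this notebook, the sketch below uses scikit-learn's `GroupShuffleSplit` to keep every engine entirely on one side of the split. The toy arrays and the `groups` variable are assumptions for illustration; with the real data, the groups would be the `engine_id` column.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy setup: 5 hypothetical engines with 4 rows each.
groups = np.repeat([1, 2, 3, 4, 5], 4)
X = np.arange(len(groups)).reshape(-1, 1)
y = np.tile([0, 0, 1, 1], 5)

# One split that holds out roughly 20% of the engines (not 20% of the rows).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

train_engines = set(groups[train_idx])
test_engines = set(groups[test_idx])
print('train engines:', sorted(train_engines))
print('test engines: ', sorted(test_engines))
```

Scores from a grouped split are usually lower than those from a row-wise split, but they are closer to how the model would behave on an engine it has never seen.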

#### Build a Decision Tree Classifier

**Step 1.** Import the model you want to use

**Step 2.** Make an instance of the Model and define parameters (optional)

**Step 3.** Train the model on the data, storing the information learned from the data.

**Step 4.** Predict labels for new data (here, the held-out test rows)

In [None]:
from sklearn.tree import DecisionTreeClassifier

# 'sqrt' is what the removed 'auto' option meant for classification trees
model = DecisionTreeClassifier(max_depth=3, max_features='sqrt')

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

model.score(X_test, y_test)

Calculate F1-Score

In [None]:
from sklearn.metrics import f1_score

f1_score(y_test, y_pred) 
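The F1-score is the harmonic mean of precision and recall for the positive class. The sketch below recomputes it by hand on a small set of made-up labels (`y_true` and `y_hat` are illustrative names, not variables from this notebook).

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_hat  = [1, 1, 0, 1, 0, 0, 0, 0]

precision = precision_score(y_true, y_hat)  # TP / (TP + FP) = 2 / 3
recall = recall_score(y_true, y_hat)        # TP / (TP + FN) = 2 / 3

# Harmonic mean of precision and recall.
f1_manual = 2 * precision * recall / (precision + recall)
f1 = f1_score(y_true, y_hat)

print(precision, recall, f1)
```

Because failures within 30 cycles are much rarer than healthy cycles, the F1-score is typically more informative here than plain accuracy.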

Calculate Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix

confusion_matrix(y_test, y_pred)
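In scikit-learn's confusion matrix, rows are the true classes and columns are the predicted classes. A small sketch with made-up labels shows how to read the four cells in the binary case.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_hat  = [1, 1, 0, 1, 0, 0, 0, 0]

cm = confusion_matrix(y_true, y_hat)

# For binary labels 0/1 the flattened order is TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print(cm)
print('TN:', tn, 'FP:', fp, 'FN:', fn, 'TP:', tp)
```

For predictive maintenance, the false negatives (`fn`) are the missed failures, which are usually the most costly errors.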

In [None]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print('Accuracy: %s' % model.score(X_test, y_test))
print('F1-Score: %s' % f1_score(y_test, y_pred))

In [None]:
from sklearn.svm import SVC

model = SVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print('Accuracy: %s' % model.score(X_test, y_test))
print('F1-Score: %s' % f1_score(y_test, y_pred))

### Multi-class Classification

In [None]:
df_y = df['label2']
df_y

Separate train and test data

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df_X, df_y, test_size=0.2)

In [None]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print('Accuracy: %s' % model.score(X_test, y_test))
print('F1-Score: %s' % f1_score(y_test, y_pred, average='macro'))
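With more than two classes, `average='macro'` computes one F1-score per class and takes their unweighted mean, so small classes count as much as large ones. The sketch below verifies this on made-up three-class labels.

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up three-class labels (0, 1, 2).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_hat  = [0, 0, 0, 1, 1, 1, 2, 0]

# One F1-score per class, in label order.
per_class = f1_score(y_true, y_hat, average=None)

# Macro average: unweighted mean of the per-class scores.
macro = f1_score(y_true, y_hat, average='macro')

print(per_class)
print(macro)
```

If the large "no failure" class should dominate the score instead, `average='weighted'` weights each class by its number of true samples.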

In [None]:
confusion_matrix(y_test, y_pred)