# Introduction to "real" data: aircraft engine

**Predictive maintenance** encompasses a variety of topics, including but not limited to: *failure prediction, failure diagnosis (root cause analysis), failure detection, failure type classification, and recommendation of mitigation or maintenance actions after failure*.

This predictive maintenance template focuses on techniques for predicting when an in-service machine will fail, so that maintenance can be planned in advance. The data used in the template is comparable to real industry data: it consists mainly of sensor readings and operational data such as settings, flight cycles, and engine identifiers.

This template uses the example of **simulated aircraft engine run-to-failure events** to demonstrate the predictive maintenance modeling process. The implicit assumption behind modeling the data this way is that the asset of interest has a **progressing degradation pattern** that is reflected in its sensor measurements. By examining the asset's sensor values over time, a machine learning algorithm can learn how the sensor values, and the changes in those values, relate to historical failures, and then use that relationship to predict future failures.
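To make "changes in sensor values" concrete, here is a minimal sketch that computes the cycle-to-cycle change of one sensor separately for each engine. The small table and the column name `s2_diff` are made up for illustration; only `engine_id`, `cycle`, and `s2` follow the column names used in this notebook.

```python
import pandas as pd

# Made-up miniature data set: two engines, one sensor column.
demo = pd.DataFrame({
    "engine_id": [1, 1, 1, 2, 2],
    "cycle":     [1, 2, 3, 1, 2],
    "s2":        [641.8, 642.1, 642.6, 641.9, 642.4],
})

# Difference to the previous cycle, computed within each engine so that
# the first cycle of an engine is never compared with another engine's data.
demo["s2_diff"] = demo.groupby("engine_id")["s2"].diff()
print(demo)
```

The first cycle of each engine has no predecessor, so its difference is `NaN`.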

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# Use a large default figure size so the stacked subplots stay readable
plt.rcParams['figure.figsize'] = (30, 20)

Import data from the CSV file `./data/PM_train.csv`

In [None]:
df = pd.read_csv('./data/PM_train.csv')

Inspect data

In [None]:
df.info()

Import from a TSV (or other whitespace-delimited) file

In [None]:
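# A raw-string regex separator matches any run of whitespace (tabs or spaces)
df = pd.read_csv('./data/PM_train.txt', sep=r'\s+')
# Equivalent on older pandas; delim_whitespace is deprecated since pandas 2.2:
# df = pd.read_csv('./data/PM_train.txt', delim_whitespace=True)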

Import file with header lines
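This step has no code cell of its own. As a minimal sketch, assuming the file starts with a fixed number of free-text lines before the column names, the `skiprows` parameter of `pd.read_csv` can skip them. The file contents and the name `df_demo` below are made up for illustration and do not touch `df`.

```python
import io
import pandas as pd

# Made-up file contents: two free-text lines, then column names, then data.
raw = io.StringIO(
    "exported by some tool\n"
    "units: cycles\n"
    "engine_id,cycle,s1\n"
    "1,1,518.67\n"
    "1,2,518.67\n"
)

# skiprows=2 discards the first two lines, so the third line becomes the header.
df_demo = pd.read_csv(raw, skiprows=2)
print(df_demo.columns.tolist())
```

For a real file, the path would replace the `io.StringIO` object, and the number of lines to skip depends on the file.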

Display first lines of DataFrame

In [None]:
df.head()

Get basic information on column `engine_id`

In [None]:
df[['engine_id']].describe()

Plot some columns as line plots

In [None]:
df[['engine_id', 'cycle', 's1', 's2', 's3', 's4']].plot(subplots=True)

The data consists of multiple multivariate time series with `cycle` as the time unit and 21 sensor readings per cycle. Each time series can be assumed to come from a different engine of the same type. Each engine starts with a different degree of initial wear and manufacturing variation, which is unknown to the user.
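One way to check this structure is to count the engines and the number of recorded cycles per engine. The sketch below uses a made-up miniature table with the same `engine_id` and `cycle` columns; the same two calls should work on `df`, assuming it has those columns.

```python
import pandas as pd

# Made-up miniature of the training data: two engines, one sensor column.
demo = pd.DataFrame({
    "engine_id": [1, 1, 1, 2, 2],
    "cycle":     [1, 2, 3, 1, 2],
    "s1":        [518.6, 518.7, 518.9, 518.6, 519.0],
})

# Number of distinct time series (one per engine).
n_engines = demo["engine_id"].nunique()

# Length of each time series, in cycles.
cycles_per_engine = demo.groupby("engine_id")["cycle"].count()
print(n_engines)
print(cycles_per_engine)
```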

In this simulated data, each engine is assumed to be operating normally at the start of its time series. At some point during the operating cycles it starts to degrade, and the degradation grows in magnitude. When a predefined threshold is reached, the engine is considered unsafe for further operation. In other words, the last cycle in each time series can be treated as the failure point of the corresponding engine. In the sample training data, for example, the engine with id=1 fails at cycle 192, and the engine with id=2 fails at cycle 287.
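Since the last cycle of each series is treated as the failure point, the failure cycle per engine is the maximum of `cycle` within each `engine_id`. The sketch below shows this on a made-up miniature table; the column name `cycles_to_failure` is a hypothetical choice for illustration, not part of the data set.

```python
import pandas as pd

# Made-up miniature data: engine 1 runs for 3 cycles, engine 2 for 2 cycles.
demo = pd.DataFrame({
    "engine_id": [1, 1, 1, 2, 2],
    "cycle":     [1, 2, 3, 1, 2],
})

# Last recorded cycle per engine, interpreted as the failure point.
failure_cycle = demo.groupby("engine_id")["cycle"].max()

# Cycles remaining until failure for every row (hypothetical column name).
last_cycle = demo.groupby("engine_id")["cycle"].transform("max")
demo["cycles_to_failure"] = last_cycle - demo["cycle"]
print(failure_cycle)
print(demo)
```

Applied to `df`, `failure_cycle.loc[1]` and `failure_cycle.loc[2]` can be compared against the two failure cycles quoted above.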

In [None]:
df.loc[(df['engine_id'] == 20), ['engine_id', 'cycle', 's2']].plot(subplots=True)