# Machine Fault Classification with MAFAULDA

This notebook explores data from [MAFAULDA](https://www02.smt.ufrj.br/~offshore/mfs/page_01.html), a multivariate time-series dataset acquired by sensors on a Machinery Fault Simulator (MFS) Alignment-Balance-Vibration (ABVT). The data from [Kaggle](https://www.kaggle.com/datasets/uysalserkan/fault-induction-motor-dataset/data) include two simulated states: normal operation and imbalance fault.

The objective is to build a time-series classification model that distinguishes between normal and imbalance motor conditions.

#### Data are stored in CSV files and saved in a label-based folder structure

In [1]:
!tree ../data/. -L 3 -I "*.csv"

[01;34m../data/.[0m
├── imbalance
│   └── imbalance
│       ├── 10g
│       ├── 15g
│       ├── 20g
│       ├── 25g
│       ├── 30g
│       ├── 35g
│       └── 6g
└── normal
    └── normal

12 directories, 0 files


#### From the folder structure we can extract the labels

In [2]:
import os

labels = []
for dirname, _, filenames in os.walk("../data"):
    for filename in filenames:
        file_addr = os.path.join(dirname, filename)
        class_name = "-".join(file_addr.split('.csv')[0].split("/")[-3:-1])
        labels.append(class_name)

print(set(labels))

{'imbalance-30g', 'imbalance-15g', 'imbalance-25g', 'imbalance-20g', 'imbalance-35g', 'imbalance-6g', 'normal-normal', 'imbalance-10g'}
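Since the objective is a two-class problem, the folder-derived labels printed above can be collapsed into a binary target. The following is only a minimal sketch: the helper name `to_binary_target` and the 0/1 encoding are assumptions made for illustration, not part of the original notebook.

```python
def to_binary_target(label: str) -> int:
    """Map a folder-derived label such as 'imbalance-20g' to 1 (fault) or 0 (normal).

    Hypothetical helper: the 0/1 encoding is an assumption made for this sketch.
    """
    # The first token of the label is the top-level condition folder.
    condition = label.split("-")[0]
    if condition == "normal":
        return 0
    if condition == "imbalance":
        return 1
    raise ValueError(f"Unexpected label: {label!r}")


example_labels = ["normal-normal", "imbalance-6g", "imbalance-35g"]
targets = {label: to_binary_target(label) for label in example_labels}
print(targets)  # {'normal-normal': 0, 'imbalance-6g': 1, 'imbalance-35g': 1}
```

Raising on unknown labels, rather than defaulting to a class, makes a stray folder show up early instead of silently polluting one class.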


## Data Frame

According to the dataset website:
- There are 8 features: 'tachometer', 'underhang_axial', 'underhang_radiale', 'underhang_tangential', 'overhang_axial', 'overhang_radiale', 'overhang_tangential', 'microphone'.
- Each sequence was recorded at a 50 kHz sampling rate for 5 s, totaling 250,000 samples.
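The figures in the list above imply a fixed shape for every file, which can serve as a quick sanity check once a file is loaded. This is a rough sketch: the constant names and the helper `has_expected_shape` are hypothetical and introduced only for illustration.

```python
# Values quoted from the dataset description above.
SAMPLING_RATE_HZ = 50_000
DURATION_S = 5
N_FEATURES = 8

# 50,000 samples per second over 5 seconds.
expected_rows = SAMPLING_RATE_HZ * DURATION_S
print(expected_rows)  # 250000


def has_expected_shape(shape, rows=expected_rows, cols=N_FEATURES):
    """Return True if a (rows, cols) shape matches one full recording."""
    return tuple(shape) == (rows, cols)


print(has_expected_shape((250_000, 8)))  # True
```

With a loaded dataframe `df`, the check would be `has_expected_shape(df.shape)`.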

Let's load and concatenate all dataframes that belong to the normal-operation label.

In [7]:
import os
import pandas as pd
from random import sample

col_names = [
    'tachometer', 'underhang_axial', 'underhang_radiale', 'underhang_tangential',
    'overhang_axial', 'overhang_radiale', 'overhang_tangential', 'microphone'
]

normal_dfs = []
for dirname, _, filenames in os.walk("../data"):
    for filename in filenames:
        file_addr = os.path.join(dirname, filename)
        if file_addr.endswith('.csv'):
            # Infer labels from folder structure
            label = "-".join(file_addr.split('.csv')[0].split("/")[-3:-1])

            if "normal" in label:
                print(file_addr)
                # The files have no header row, so column names are passed explicitly
                df = pd.read_csv(file_addr, names=col_names)
                normal_dfs.append(df)

print([f"{df.shape}" for df in normal_dfs])

../data/normal/normal/12.288.csv
../data/normal/normal/13.1072.csv
../data/normal/normal/14.336.csv
../data/normal/normal/15.1552.csv
../data/normal/normal/16.1792.csv
../data/normal/normal/17.2032.csv
../data/normal/normal/18.432.csv
../data/normal/normal/19.6608.csv
../data/normal/normal/20.2752.csv
../data/normal/normal/21.7088.csv
../data/normal/normal/22.3232.csv
../data/normal/normal/23.552.csv
../data/normal/normal/24.576.csv
../data/normal/normal/25.6.csv
../data/normal/normal/26.624.csv
../data/normal/normal/27.4432.csv
../data/normal/normal/28.8768.csv
../data/normal/normal/29.4912.csv
../data/normal/normal/30.72.csv
../data/normal/normal/31.744.csv
../data/normal/normal/32.9728.csv
../data/normal/normal/33.5872.csv
../data/normal/normal/34.2016.csv
../data/normal/normal/35.4304.csv
../data/normal/normal/36.4544.csv
../data/normal/normal/37.6832.csv
../data/normal/normal/38.2976.csv
../data/normal/normal/39.3216.csv
../data/normal/normal/40.3456.csv
../data/normal/normal/41.7

In [8]:
normal_df = pd.concat(normal_dfs, ignore_index=True)

normal_df.describe()

| | tachometer | underhang_axial | underhang_radiale | underhang_tangential | overhang_axial | overhang_radiale | overhang_tangential | microphone |
|---|---|---|---|---|---|---|---|---|
| count | 12250000.0 | 12250000.0 | 12250000.0 | 12250000.0 | 12250000.0 | 12250000.0 | 12250000.0 | 12250000.0 |
| mean | 0.001024785 | 0.0295372 | 0.0007297756 | 0.000831716 | 0.009245404 | 0.005104041 | 0.03855342 | 0.01462299 |
| std | 1.597937 | 1.238304 | 0.3365933 | 0.04673923 | 0.1393458 | 0.0359269 | 0.6781154 | 0.1227685 |
| min | -1.1591 | -4.5898 | -2.0037 | -0.21865 | -0.61157 | -0.14595 | -2.0376 | -0.26864 |
| 25% | -0.60527 | -0.96484 | -0.2714825 | -0.031772 | -0.08811175 | -0.019042 | -0.442525 | -0.088734 |
| 50% | -0.550755 | 0.167885 | 0.032534 | 0.0033034 | 0.023947 | 0.0065705 | 0.0507045 | -0.0045009 |
| 75% | -0.4748875 | 1.177825 | 0.31216 | 0.034506 | 0.11653 | 0.031143 | 0.4879 | 0.10113 |
| max | 5.093 | 2.0097 | 0.67768 | 0.22705 | 0.34316 | 0.10466 | 2.3503 | 0.60896 |


In [4]:
from ydata_profiling import ProfileReport

profile = ProfileReport(normal_df, title="Normal Data")
profile.to_notebook_iframe()


In [5]:
profile.to_file("normal_data_report.html")


In [9]:
len(normal_dfs)

49
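The file count ties back to the `count` row reported by `describe()`: 49 recordings of 250,000 samples each should give 12,250,000 rows after concatenation. A small check, using only the numbers shown in this notebook (the variable names are illustrative):

```python
n_files = 49                     # len(normal_dfs), as printed above
rows_per_file = 250_000          # 50 kHz sampling rate over 5 s
reported_row_count = 12_250_000  # 'count' row from normal_df.describe()

total_rows = n_files * rows_per_file
print(total_rows == reported_row_count)  # True
```

A mismatch here would suggest a truncated file or a file read with an unexpected header row.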