# Multiple Classifier Approach

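This Jupyter notebook walks through a complete example of the Multiple Classifier Approach (MCA) on the CMAPSS FD001 dataset: data preparation, hyperparameter optimization, and evaluation.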

## Imports and setup:

The following code cell handles all imports and the initial setup. The seed of the NumPy random number generator is fixed to make the results reproducible, and the Ray Tune framework is initialized.

In [1]:
from ml4pdm.evaluation.metrics import loss_asymmetric, score_performance, loss_false_positive_rate, loss_false_negative_rate
from ml4pdm.evaluation import Evaluator
from ml4pdm.parsing import DatasetParser
from ml4pdm.prediction import MultipleClassifierApproach
from ml4pdm.transformation import AttributeFilter, SklearnWrapper
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
import numpy as np
import ray
from ray import tune
from ray.tune.suggest.bohb import TuneBOHB

np.random.seed(2)
ray.init(include_dashboard=False)

{'node_ip_address': '192.168.0.18',
 'raylet_ip_address': '192.168.0.18',
 'redis_address': '192.168.0.18:6379',
 'object_store_address': 'tcp://127.0.0.1:51249',
 'raylet_socket_name': 'tcp://127.0.0.1:56580',
 'webui_url': None,
 'session_dir': 'C:\\Users\\Paul\\AppData\\Local\\Temp\\ray\\session_2021-09-20_12-04-10_575119_7520',
 'metrics_export_port': 61573,
 'node_id': 'e28552be6a2cc938aa48d694eebb7f132f5757b0b717da4f61c60c36'}
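
As a small illustration of why the seed is fixed, the sketch below re-seeds NumPy's global generator and draws the same numbers twice. The variable names `first` and `second` are only illustrative and are not part of the notebook.

```python
import numpy as np

# Seed the global NumPy generator, as the setup cell does.
np.random.seed(2)
first = np.random.rand(3)

# Re-seeding with the same value resets the generator state,
# so the next draw repeats the previous one exactly.
np.random.seed(2)
second = np.random.rand(3)

print(np.array_equal(first, second))
```

Note that this only covers code that draws from NumPy's global generator. Other components keep their own random state, which is presumably why the search algorithm in the hyperparameter optimization cell receives a separate `seed` argument.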

## Prepare dataset:

The datasets are based on CMAPSS FD001. The train and test datasets are prepared by removing the features that never change as well as the settings features. Min-max scaling is then applied per feature; the scaler is fitted on the training data and reused unchanged for the test data.

In [2]:
# Load the CMAPSS train and test datasets.
train_dataset, test_dataset = DatasetParser.get_cmapss_data(test=True)

# Drop the settings features and the features that never change.
removed_features = [1, 2, 3, 4, 8, 13, 19, 22]
train_dataset = AttributeFilter.remove_features(train_dataset, removed_features)
test_dataset = AttributeFilter.remove_features(test_dataset, removed_features)

# Fit the min-max scaler on the training data only, then reuse it for the test data.
scaling = SklearnWrapper(MinMaxScaler(), SklearnWrapper.extract_timeseries_concatenated, SklearnWrapper.rebuild_timeseries_concatenated)
train_dataset = scaling.fit_transform(train_dataset)
test_dataset = scaling.transform(test_dataset)
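
The two preparation steps can be sketched on a plain NumPy array. This is a minimal illustration and not the ml4pdm implementation: the helper names `drop_constant_columns`, `min_max_fit` and `min_max_transform` are hypothetical, and the toy arrays stand in for the concatenated time series.

```python
import numpy as np

def drop_constant_columns(X):
    """Return X without columns whose value never changes, plus the keep mask."""
    keep = X.max(axis=0) != X.min(axis=0)
    return X[:, keep], keep

def min_max_fit(X):
    """Compute per-feature minimum and maximum (the 'fit' step)."""
    return X.min(axis=0), X.max(axis=0)

def min_max_transform(X, lo, hi):
    """Scale each feature with previously fitted statistics."""
    return (X - lo) / (hi - lo)

# Toy data: the middle column is constant and carries no information.
train = np.array([[1.0, 5.0, 10.0],
                  [2.0, 5.0, 30.0],
                  [3.0, 5.0, 20.0]])
test = np.array([[2.0, 5.0, 40.0]])

train_f, keep = drop_constant_columns(train)
test_f = test[:, keep]                 # apply the same column selection to the test set

lo, hi = min_max_fit(train_f)          # statistics come from the training set only
train_s = min_max_transform(train_f, lo, hi)
test_s = min_max_transform(test_f, lo, hi)

print(train_s)
print(test_s)
```

Because the statistics are taken from the training set, scaled test values can fall outside the range [0, 1], as the last feature of the toy test row does.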

## Hyperparameter optimization:

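The following code cell performs hyperparameter optimization using the ray-tune framework with the BOHB search algorithm. The SVC parameters "C" and "degree" are tuned for a polynomial kernel by minimizing the asymmetric loss on the test dataset; the MSE is reported alongside it. The search is limited to a time budget of two hours.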

In [None]:
def pipeline_training(config, data=None):
    # One trial: train an MCA with the sampled SVC parameters and score it.
    mca = MultipleClassifierApproach(range(3, 151, 3), SVC, **config)
    mca.fit(data["train_dataset"])
    preds = mca.predict(data["test_dataset"])
    target = data["test_dataset"].target
    # "loss" is the optimization target; MSE is reported for reference only.
    tune.report(loss=loss_asymmetric(target, preds), mse=mean_squared_error(target, preds))


data_dict = {
    "train_dataset": train_dataset,
    "test_dataset": test_dataset,
}

algo = TuneBOHB(seed=2, max_concurrent=6)

analysis = tune.run(
    tune.with_parameters(pipeline_training, data=data_dict),
    search_alg=algo,
    metric="loss",
    mode="min",
    num_samples=-1,
    time_budget_s=int(2*60*60),
    resources_per_trial={"cpu": 1},
    config={
        "C": tune.uniform(0.01, 10),
        "kernel": "poly",
        "degree": tune.choice(range(2,7,1)),
    }
)

best_config = analysis.get_best_config(metric="loss", mode="min")
print("Best config: ", best_config)
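
To give an intuition for the quantity being minimized, here is a minimal sketch of an asymmetric loss. It assumes that `loss_asymmetric` follows the scoring function commonly used with CMAPSS (from the PHM08 challenge), which penalizes late predictions more heavily than early ones; the actual ml4pdm implementation may differ. The function name `asymmetric_score` and the parameters `a_early` and `a_late` are hypothetical.

```python
import math

def asymmetric_score(y_true, y_pred, a_early=13.0, a_late=10.0):
    """Sum of exponential penalties; assumed PHM08-style constants 13 and 10."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        d = p - t                      # d < 0: predicted failure too early
        if d < 0:
            total += math.exp(-d / a_early) - 1.0
        else:                          # d >= 0: predicted failure too late
            total += math.exp(d / a_late) - 1.0
    return total

# The same absolute error of 10 cycles, once early and once late.
early = asymmetric_score([100], [90])
late = asymmetric_score([100], [110])
print(early, late)
```

Under this assumption, overestimating the remaining useful life costs more than underestimating it by the same amount, which reflects that a late warning is usually the more expensive mistake in maintenance.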


## Evaluate Approach:

The best configuration obtained from the hyperparameter optimization above is evaluated with several metrics, which allows it to be compared with other approaches.

In [3]:
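# Best configuration from the search, hard-coded so this cell runs without repeating it.
# "degree" is not set here, so SVC uses its default value of 3.
best_config = {'C': 6.407663184896999, 'kernel': 'poly'}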

train_dataset, test_dataset = DatasetParser.get_cmapss_data(test = True)

mca = make_pipeline(SklearnWrapper(MinMaxScaler(), SklearnWrapper.extract_timeseries_concatenated, SklearnWrapper.rebuild_timeseries_concatenated), 
                    MultipleClassifierApproach(range(15,151,15), SVC, **best_config))

evaluator = Evaluator(None, [mca], None, [loss_asymmetric, mean_squared_error, score_performance, mean_absolute_error,
                                          mean_absolute_percentage_error, loss_false_positive_rate, loss_false_negative_rate])

results = evaluator.evaluate_train_test_split(train_dataset, test_dataset)[0]
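# Indices 2, 4, 5 and 6 are score_performance, MAPE, FPR and FNR; scale them by 100 to report percentages.
for i in [2, 4, 5, 6]:
    results[i] *= 100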
print("S:\t{:.2f}\nMSE:\t{:.2f}\nA(%):\t{:.2f}\nMAE:\t{:.2f}\nMAPE:\t{:.2f}\nFPR(%):\t{:.2f}\nFNR(%):\t{:.2f}".format(*results))

S:	1527.73
MSE:	510.64
A(%):	53.00
MAE:	16.72
MAPE:	28.52
FPR(%):	25.00
FNR(%):	22.00
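
For reference, the three scikit-learn error metrics used above can be reproduced by hand. The sketch below uses made-up toy values, not results from the notebook, and shows why MAPE is multiplied by 100 before printing: scikit-learn returns it as a fraction rather than a percentage.

```python
import numpy as np

# Toy targets and predictions (illustrative values only).
y_true = np.array([100.0, 50.0, 20.0])
y_pred = np.array([90.0, 55.0, 30.0])

err = y_pred - y_true

mae = np.mean(np.abs(err))                      # mean absolute error
mse = np.mean(err ** 2)                         # mean squared error
mape = np.mean(np.abs(err) / np.abs(y_true))    # fraction, not percent

print("MAE:  {:.2f}".format(mae))
print("MSE:  {:.2f}".format(mse))
print("MAPE: {:.2f} %".format(mape * 100))
```

MAPE weights errors relative to the true value, so the error of 10 on the smallest target dominates it, whereas MAE and MSE treat that error the same as the error of 10 on the largest target.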
