# Multivariate Classification


##### Reference: https://sktime-backup.readthedocs.io/en/v0.13.3/examples/02_classification.html


### Import Dependencies

In [29]:
# install sktime (notebook shell command; only needed once per environment)
!pip install sktime

# general
import numpy as np
import pandas as pd

# for data pre-processing
from sklearn.model_selection import train_test_split

# for model evaluation
from sklearn.metrics import multilabel_confusion_matrix, accuracy_score, confusion_matrix, classification_report



### Load datasets

Edit the paths below to point to the data on your machine.

In [None]:
# load the data (edit these paths for your machine)
X = np.load("../data/X-data.npy")
y = np.load("../data/y-data.npy")
# y is stored as one-hot vectors; sktime expects 1-D class labels, so convert to class indices
y = np.argmax(y, axis=1)

# sign names, assumed to be in the same order as the one-hot columns of y; edit for your subset of the data
actions = np.array(['alligator', 'radio', 'moon', 'sleep', 'grandpa', 'tiger', 'pencil', 'sleepy', 'grandma', 'chocolate'])
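
A minimal sketch of the label conversion above, plus the array layout sktime expects. sktime's NumPy panel format is 3-D with shape `(n_instances, n_channels, n_timepoints)`. The shape of `X-data.npy` is not stated here; the sketch assumes it is stored as `(n_samples, n_frames, n_features)` (the sizes and data below are made up), in which case the last two axes must be swapped. It also shows, in plain NumPy, what `ColumnConcatenator` does to such an array: it joins all channels end to end into one univariate series.

```python
import numpy as np

# toy one-hot labels for 4 samples and 3 classes (made-up data)
y_onehot = np.array([[0, 1, 0],
                     [1, 0, 0],
                     [0, 0, 1],
                     [0, 1, 0]])
actions = np.array(['alligator', 'radio', 'moon'])

# one-hot -> integer class indices, as in the cell above
y = np.argmax(y_onehot, axis=1)
# map indices back to sign names (assumes actions follows the column order)
names = actions[y]

# hypothetical X stored as (n_samples, n_frames, n_features)
n_samples, n_frames, n_features = 4, 30, 5
X = np.random.default_rng(0).normal(size=(n_samples, n_frames, n_features))

# sktime's numpy3D layout is (n_instances, n_channels, n_timepoints)
X_sktime = np.transpose(X, (0, 2, 1))

# plain-NumPy equivalent of ColumnConcatenator: channel 0's series,
# then channel 1's series, and so on, in a single channel
X_concat = X_sktime.reshape(n_samples, 1, n_features * n_frames)
```

If your `X` is laid out as `(samples, frames, features)`, applying the same `np.transpose` before `train_test_split` gives sktime the intended channels and time axis.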

### Splitting Train and Test Data

In [31]:
# hold out 25% of the samples for testing (no random_state, so the split changes between runs)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

### First Model: DrCIF Algorithm
- DrCIF stands for "Diverse Representation Canonical Interval Forest". It is an interval-based ensemble classifier that extends the Canonical Interval Forest (CIF). Each tree is trained on features extracted from randomly selected intervals of three representations of the series: the original series, its first-order differences, and its periodogram. The features are simple summary statistics (mean, median, standard deviation, slope, interquartile range, minimum and maximum) together with a subset of the catch22 features. The `DrCIF` class in the `sktime.classification.interval_based` module implements this algorithm. In the code below, `n_estimators=10`, so the ensemble contains 10 trees.
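
The three DrCIF representations and the per-interval statistics can be sketched in NumPy for a single univariate series. This is a simplified illustration, not sktime's implementation: it draws one random interval per representation, computes only the seven summary statistics, and omits the catch22 features and the tree training. The helpers `representations` and `interval_features` are hypothetical names.

```python
import numpy as np

def representations(x):
    """Return the three series DrCIF draws intervals from."""
    diffs = np.diff(x)                          # first-order differences
    periodogram = np.abs(np.fft.rfft(x)) ** 2   # power at each frequency
    return {"base": x, "diff": diffs, "periodogram": periodogram}

def interval_features(series, start, end):
    """Seven summary statistics of series[start:end]."""
    seg = series[start:end]
    t = np.arange(len(seg))
    slope = np.polyfit(t, seg, 1)[0]            # least-squares slope
    q75, q25 = np.percentile(seg, [75, 25])
    return np.array([seg.mean(), np.median(seg), seg.std(), slope,
                     q75 - q25, seg.min(), seg.max()])

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.1 * rng.normal(size=64)

features = []
for name, series in representations(x).items():
    # one random interval of length >= 3 per representation
    start = rng.integers(0, len(series) - 3)
    end = rng.integers(start + 3, len(series) + 1)
    features.append(interval_features(series, start, end))

# one feature row (3 representations x 7 statistics) for a single tree
feature_vector = np.concatenate(features)
```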

### Performing time series classification using the DrCIF algorithm

### Note: fitting DrCIF took more than 1 hour.

In [16]:
from sktime.classification.interval_based import DrCIF
from sktime.transformations.panel.compose import ColumnConcatenator

# concatenate all channels into one univariate series, then fit a 10-tree DrCIF
clf = ColumnConcatenator() * DrCIF(n_estimators=10)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)  # accuracy on the test set

0.46987951807228917

### Second model (baseline): TimeSeriesForestClassifier
The TimeSeriesForestClassifier (TSF), also in the `sktime.classification.interval_based` module, is a simpler interval-based forest. For each tree it selects random intervals of the series, computes the mean, standard deviation, and slope of each interval, and trains a decision tree on those features; the trees' predictions are then combined. Unlike DrCIF, it uses only these three statistics on the original series, with no first-order differences, periodogram, or catch22 features. Here `n_estimators=5`, so the forest contains 5 trees.
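
The TSF feature extraction for one tree can be sketched in NumPy. This is a simplified illustration, not sktime's code: it draws about √m intervals for a series of length m with a minimum interval length of 3, and leaves out the decision-tree step. `tsf_features` is a hypothetical helper and the data is made up.

```python
import numpy as np

def tsf_features(X, intervals):
    """Mean, std and slope of each interval, for every series in X.

    X has shape (n_instances, series_length); intervals is a list of
    (start, end) pairs. Returns shape (n_instances, 3 * len(intervals)).
    """
    cols = []
    for start, end in intervals:
        seg = X[:, start:end]
        t = np.arange(end - start)
        # least-squares slope of each row against time
        slope = np.polyfit(t, seg.T, 1)[0]
        cols += [seg.mean(axis=1), seg.std(axis=1), slope]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 50))          # 8 made-up univariate series
m = X.shape[1]
min_interval = 3
n_intervals = int(np.sqrt(m))         # about sqrt(m) intervals per tree

intervals = []
for _ in range(n_intervals):
    start = rng.integers(0, m - min_interval + 1)
    end = rng.integers(start + min_interval, m + 1)
    intervals.append((start, end))

features = tsf_features(X, intervals)  # one tree would be trained on these
```

Each tree gets its own random intervals, which is what makes the trees in the forest differ from one another.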

In [38]:
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.transformations.panel.compose import ColumnConcatenator

# concatenate all channels into one univariate series, then fit a 5-tree TSF
clf = ColumnConcatenator() * TimeSeriesForestClassifier(n_estimators=5)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)  # accuracy on the test set



0.3614457831325301

### Note: running RandomizedSearchCV for the TimeSeriesForestClassifier took about 3 hours.

In [82]:
from sklearn.model_selection import RandomizedSearchCV
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.transformations.panel.compose import ColumnConcatenator

# Define the parameter grid for the time series forest classifier.
# NOTE: the "tsf__" prefix refers to a pipeline step named "tsf"; the valid
# keys for the pipeline below can be listed with tsf_clf.get_params().keys()
param_grid = {
    'tsf__n_estimators': [10, 20, 30],
    'tsf__max_depth': [None, 10, 20],
    'tsf__min_interval': [1, 3, 5]
}

# Define the pipeline with ColumnConcatenator and TimeSeriesForestClassifier
tsf_clf = ColumnConcatenator() * TimeSeriesForestClassifier()

# Sample 5 of the 27 grid combinations and score each with 5-fold CV (25 fits)
clf = RandomizedSearchCV(
    tsf_clf,
    param_distributions=param_grid,
    n_iter=5,
    cv=5,
    # n_jobs=-1,  # uncomment to run the fits in parallel
    random_state=42,
    verbose=True,
)

search = clf.fit(X_train, y_train)

print(search.best_params_)

# Evaluate the refitted best model on the held-out test set
# best_clf = search.best_estimator_
# accuracy = best_clf.score(X_test, y_test)
# print("Accuracy of the best model on the test set: {:.2f}%".format(accuracy * 100))


Fitting 5 folds for each of 5 candidates, totalling 25 fits




{'tsf__n_estimators': 10, 'tsf__min_interval': 1, 'tsf__max_depth': None}


The hyperparameters found are:

- tsf__n_estimators: the number of trees in the forest. The search selected 10 trees.
- tsf__min_interval: the minimum length of the random intervals from which features are extracted. The search selected 1.
- tsf__max_depth: the maximum depth of each tree. The search selected None, which means nodes are expanded until the leaves are pure (or too small to split further).
- These values are the best of only 5 sampled candidates out of the 27 combinations in the grid, so they are not necessarily the best in the grid. To use them, take `search.best_estimator_` (which `RandomizedSearchCV` refits on the whole training set by default) and evaluate it on the test set, as in the commented-out lines of the cell above.
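
The search scored 5 of the 3 × 3 × 3 = 27 combinations in `param_grid`, each with 5-fold cross-validation, which gives the 25 fits in the log. When every value in `param_distributions` is a list, scikit-learn samples candidates from the full grid without replacement. The sketch below mimics that sampling with the standard library only; it does not reproduce scikit-learn's exact draws for `random_state=42`.

```python
import itertools
import random

param_grid = {
    'tsf__n_estimators': [10, 20, 30],
    'tsf__max_depth': [None, 10, 20],
    'tsf__min_interval': [1, 3, 5],
}
n_iter, cv = 5, 5

# enumerate every combination of the grid
keys = list(param_grid)
grid = [dict(zip(keys, values)) for values in itertools.product(*param_grid.values())]

# draw n_iter distinct candidates (sampling without replacement)
candidates = random.Random(42).sample(grid, n_iter)

total_fits = n_iter * cv
for params in candidates:
    print(params)
print(f"{len(grid)} combinations, {n_iter} sampled, {total_fits} fits")
```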

### MUSE classifier (Multivariate Unsupervised Symbols and dErivatives)

- MUSE is a dictionary-based, bag-of-words classifier and the multivariate extension of WEASEL. It slides windows of several lengths over each channel (and, optionally, over its first-order differences), converts each window into a discrete word with the Symbolic Fourier Approximation (SFA), and counts the words into a histogram for each series. Uninformative words are removed with a chi-squared test, and a logistic regression model is trained on the remaining word counts. At prediction time the same transform is applied to each test series and the trained logistic regression assigns the class. In the code below, `window_inc=4` sets the step between the window lengths that are tried, and `use_first_order_differences=False` disables the derivative channels.
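
The bag-of-words step for one channel can be sketched in NumPy. This is a simplified stand-in, not MUSE's implementation: it uses a single window length, quantises the first Fourier coefficients with fixed equal-width bins instead of SFA's learned breakpoints, and skips the chi-squared selection and the logistic regression. `sfa_like_words` is a hypothetical helper and the series is made up.

```python
from collections import Counter

import numpy as np

def sfa_like_words(x, window, n_coefs=2, alphabet="abcd", lo=-2.0, hi=2.0):
    """Turn each sliding window of x into a short word.

    For every window: z-normalise, take the first n_coefs DFT coefficients
    (skipping the mean term), keep their real and imaginary parts, and map
    each value to a letter with equal-width bins on [lo, hi]; values outside
    that range fall into the first or last bin.
    """
    edges = np.linspace(lo, hi, len(alphabet) + 1)[1:-1]  # inner bin edges
    words = []
    for start in range(len(x) - window + 1):
        w = x[start:start + window]
        w = (w - w.mean()) / (w.std() + 1e-8)
        coefs = np.fft.rfft(w)[1:n_coefs + 1] / np.sqrt(window)
        values = np.ravel(np.column_stack([coefs.real, coefs.imag]))
        letters = np.digitize(values, edges)  # bin index 0..len(alphabet)-1
        words.append("".join(alphabet[i] for i in letters))
    return words

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 6 * np.pi, 60)) + 0.1 * rng.normal(size=60)

words = sfa_like_words(x, window=12)
bag = Counter(words)  # word histogram = bag-of-words features for x
```

In MUSE, histograms like `bag` from every channel and window length are combined into one sparse feature vector per series before feature selection.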

In [39]:
from sktime.classification.dictionary_based import MUSE
from sktime.transformations.panel.compose import ColumnConcatenator

# concatenate channels, then fit MUSE (window lengths step by 4, no derivative channels)
clf = ColumnConcatenator() * MUSE(window_inc=4, use_first_order_differences=False)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)  # accuracy on the test set



0.5301204819277109

### PCATransformer + TimeSeriesForestClassifier pipeline
- The PCATransformer performs principal component analysis (PCA) on the input series, projecting each series onto a small number of principal components that capture most of the variance in the training data. The transformed data is then used to train a classifier, here a TimeSeriesForestClassifier with 5 trees. Because ColumnConcatenator runs first, PCA is applied to the concatenated univariate series.
Reducing dimensionality this way can shorten training time and may improve accuracy by discarding low-variance noise, but it is not guaranteed to help, because class-relevant information can also lie in low-variance directions.
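
PCA itself can be sketched in NumPy through the singular value decomposition: centre the training matrix (one flattened series per row), keep the top `k` right singular vectors, and project onto them. This illustrates the technique under the assumption that each series is treated as one feature vector; it is not sktime's `PCATransformer` code, and `fit_pca` and `transform_pca` are hypothetical helpers.

```python
import numpy as np

def fit_pca(X_train, k):
    """Return the training mean, the top-k principal axes (rows) and
    the fraction of variance each of those axes explains."""
    mean = X_train.mean(axis=0)
    # rows of Vt are principal axes, ordered by decreasing singular value
    _, s, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean, Vt[:k], explained[:k]

def transform_pca(X, mean, components):
    """Project series onto the principal axes: (n, length) -> (n, k)."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
# 20 made-up series of length 40, driven mostly by two hidden patterns
t = np.linspace(0, 1, 40)
weights = rng.normal(size=(20, 2))
patterns = np.vstack([np.sin(2 * np.pi * t), t])
X_train = weights @ patterns + 0.01 * rng.normal(size=(20, 40))

mean, components, explained = fit_pca(X_train, k=2)
Z = transform_pca(X_train, mean, components)  # 2 numbers per series
```

Test series must be projected with the mean and components learned on the training set, which is what a fitted transformer inside a pipeline does.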

In [36]:
from sktime.classification.compose import ClassifierPipeline
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.transformations.panel.compose import ColumnConcatenator
from sktime.transformations.panel.pca import PCATransformer

# concatenate channels, reduce dimensionality with PCA, then fit a 5-tree TSF
pipeline = ColumnConcatenator() * ClassifierPipeline(
    classifier=TimeSeriesForestClassifier(n_estimators=5),
    transformers=[PCATransformer()],
)
pipeline.fit(X_train, y_train)



ClassifierPipeline(classifier=TimeSeriesForestClassifier(n_estimators=5),
                   transformers=[ColumnConcatenator(), PCATransformer()])