# Feature extraction with tsfresh transformer

In this tutorial, we show how to use sktime with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series. The extracted features form an ordinary tabular data set, so they can be passed to any scikit-learn estimator.

## Preliminaries
tsfresh must be installed. If you haven't installed it already, uncomment and run the cell below:

In [1]:
# !pip install --upgrade tsfresh
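
If you are unsure whether tsfresh is already available, a quick check such as the following can help before running the install cell. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of sktime or tsfresh.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


if not is_installed("tsfresh"):
    print("tsfresh is not installed; run: pip install --upgrade tsfresh")
```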

In [2]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sktime.datasets import load_basic_motions
from sktime.datasets import load_arrow_head
from sktime.transformers.series_as_features.summarize import \
    TSFreshFeatureExtractor

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/master/examples/02_classification_univariate.ipynb).

In [3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)
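
The shapes printed above follow from the default split proportions. The sketch below reproduces the arithmetic, assuming scikit-learn's default of holding out 25% of the samples and rounding the test set size up; the helper `default_split_sizes` is hypothetical and only mirrors that assumed behaviour.

```python
import math


def default_split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test size, rounding n_test up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# 158 training series + 53 test series = 211 series in total
print(default_split_sizes(211))
```

With 211 series this gives 158 training and 53 test instances, which matches the printed shapes.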


In [4]:
X_train.head()

| | dim_0 |
|---|---|
| 46 | 0 -1.8552 1 -1.8490 2 -1.8452 3 ... |
| 48 | 0 -1.9434 1 -1.8579 2 -1.8503 3 ... |
| 38 | 0 -2.1322 1 -2.1192 2 -2.0902 3 ... |
| 102 | 0 -1.9872 1 -1.9700 2 -1.9509 3 ... |
| 5 | 0 -1.9828 1 -1.9789 2 -1.9373 3 ... |


In [5]:
# multiclass classification task (three classes)
np.unique(y_train)

array(['0', '1', '2'], dtype=object)

## Using tsfresh to extract features

In [6]:
# use tsfresh's "efficient" feature set and suppress tsfresh warnings
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  warn("Found non-unique index, replaced with unique index.")


Feature Extraction: 100%|██████████| 5/5 [00:26<00:00,  5.37s/it]

| | dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_0__fourier_entropy__bins_2 | dim_0__fourier_entropy__bins_3 | dim_0__fourier_entropy__bins_5 | dim_0__fourier_entropy__bins_10 | dim_0__fourier_entropy__bins_100 | dim_0__permutation_entropy__dimension_3__tau_1 | dim_0__permutation_entropy__dimension_4__tau_1 | dim_0__permutation_entropy__dimension_5__tau_1 | dim_0__permutation_entropy__dimension_6__tau_1 | dim_0__permutation_entropy__dimension_7__tau_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.000277 | 250.000879 | 0.353207 | 0.004549 | -0.000143 | -0.095346 | ... | 0.08151 | 0.08151 | 0.127671 | 0.138673 | 1.155607 | 1.512407 | 2.34281 | 3.057773 | 3.636715 | 4.04832 |
| 1 | 0.0 | 0.0 | 0.0 | 1.0 | -1.4e-05 | 250.000034 | 0.327059 | 0.001584 | -4.7e-05 | 0.21678 | ... | 0.08151 | 0.08151 | 0.127671 | 0.127671 | 1.091466 | 1.455328 | 2.282876 | 2.972878 | 3.543566 | 4.003471 |
| 2 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000175 | 249.999868 | 0.367173 | 0.005356 | 5.4e-05 | -0.083815 | ... | 0.08151 | 0.092513 | 0.138673 | 0.184769 | 1.173798 | 1.469656 | 2.234479 | 2.851525 | 3.339198 | 3.74121 |
| 3 | 0.0 | 1.0 | 0.0 | 1.0 | 0.000147 | 249.999068 | 0.331236 | 0.006163 | -0.000114 | 0.15526 | ... | 0.046288 | 0.092513 | 0.092513 | 0.204643 | 1.120764 | 1.489912 | 2.307367 | 2.997843 | 3.514915 | 3.89956 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | 1.4e-05 | 249.999772 | 0.332914 | 0.007391 | -0.000246 | 0.22478 | ... | 0.08151 | 0.092513 | 0.138673 | 0.219798 | 1.244003 | 1.493691 | 2.278361 | 3.026199 | 3.610979 | 4.104844 |
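
To give a feel for what some of these columns contain, the sketch below computes a few of the simpler features (`sum_values`, `abs_energy`, `mean_abs_change`, `mean_change`, `median`, `has_duplicate`) for a single toy series. It is written from the feature definitions as we understand them, using only the standard library, and is not tsfresh's own implementation. The toy series is made up for illustration.

```python
import statistics


def sum_values(x):
    """Sum of all values in the series."""
    return sum(x)


def abs_energy(x):
    """Sum of squared values."""
    return sum(v * v for v in x)


def mean_abs_change(x):
    """Mean absolute difference between consecutive values."""
    return sum(abs(b - a) for a, b in zip(x, x[1:])) / (len(x) - 1)


def mean_change(x):
    """Mean difference between consecutive values (telescopes to last - first)."""
    return (x[-1] - x[0]) / (len(x) - 1)


def has_duplicate(x):
    """True if any value occurs more than once."""
    return len(set(x)) != len(x)


series = [1.0, 3.0, 2.0, 2.0, 5.0]  # toy data, not from the arrow head data set
features = {
    "sum_values": sum_values(series),
    "abs_energy": abs_energy(series),
    "mean_abs_change": mean_abs_change(series),
    "mean_change": mean_change(series),
    "median": statistics.median(series),
    "has_duplicate": has_duplicate(series),
}
print(features)
```

Each time series in `X_train` is reduced to one row of such scalar values, which is what makes the result usable by standard tabular estimators.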


## Using tsfresh with sktime

In [7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier()
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

  warn("Found non-unique index, replaced with unique index.")


Feature Extraction: 100%|██████████| 5/5 [00:24<00:00,  4.82s/it]

  warn("Found non-unique index, replaced with unique index.")

Feature Extraction: 100%|██████████| 5/5 [00:08<00:00,  1.62s/it]

0.9056603773584906
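
The score returned by the pipeline is the classification accuracy, that is, the fraction of test series predicted correctly. As a small worked example, the reported value can be converted back into counts, given the 53 test series shown earlier:

```python
reported_accuracy = 0.9056603773584906  # value printed above
n_test = 53  # number of test series in the split above

# accuracy = n_correct / n_test, so recover the count by rounding
n_correct = round(reported_accuracy * n_test)
n_wrong = n_test - n_correct
print(n_correct, n_wrong)
```

Because `train_test_split` is called without a fixed `random_state`, the split and therefore the exact score will vary between runs.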

## Multivariate time series classification data

In [8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


In [9]:
# multivariate input data
X_train.head()

| | dim_0 | dim_1 | dim_2 | dim_3 | dim_4 | dim_5 |
|---|---|---|---|---|---|---|
| 0 | 0 0.079106 1 0.079106 2 -0.903497 3... | 0 0.394032 1 0.394032 2 -3.666397 3... | 0 0.551444 1 0.551444 2 -0.282844 3... | 0 0.351565 1 0.351565 2 -0.095881 3... | 0 0.023970 1 0.023970 2 -0.319605 3... | 0 0.633883 1 0.633883 2 0.972131 3... |
| 30 | 0 -0.771623 1 -0.771623 2 -2.32382... | 0 0.372042 1 0.372042 2 -0.29603... | 0 -0.145753 1 -0.145753 2 1.71501... | 0 -0.031960 1 -0.031960 2 0.383526 3... | 0 0.167792 1 0.167792 2 0.229050 3... | 0 -0.362219 1 -0.362219 2 -0.23970... |
| 39 | 0 0.901645 1 0.901645 2 -0.05469... | 0 2.581916 1 2.581916 2 -0.01142... | 0 -0.353783 1 -0.353783 2 -0.009521 3... | 0 -0.455437 1 -0.455437 2 -0.250357 3... | 0 0.106535 1 0.106535 2 -0.069248 3... | 0 0.245030 1 0.245030 2 0.005327 3... |
| 20 | 0 -0.071819 1 -0.071819 2 -0.360728 3... | 0 0.354963 1 0.354963 2 -2.704719 3... | 0 0.275074 1 0.275074 2 0.892838 3... | 0 -1.033389 1 -1.033389 2 0.066584 3... | 0 0.743081 1 0.743081 2 -0.271664 3... | 0 -0.825646 1 -0.825646 2 0.122515 3... |
| 29 | 0 0.118553 1 0.118553 2 -0.545332 3... | 0 0.419456 1 0.419456 2 0.371223 3... | 0 -0.283447 1 -0.283447 2 0.707172 3... | 0 0.135832 1 0.135832 2 0.159802 3... | 0 -0.079901 1 -0.079901 2 -0.090555 3... | 0 0.050604 1 0.050604 2 0.474080 3... |


In [10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  warn("Found non-unique index, replaced with unique index.")


Feature Extraction: 100%|██████████| 5/5 [00:41<00:00,  8.23s/it]

| | dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_5__fourier_entropy__bins_2 | dim_5__fourier_entropy__bins_3 | dim_5__fourier_entropy__bins_5 | dim_5__fourier_entropy__bins_10 | dim_5__fourier_entropy__bins_100 | dim_5__permutation_entropy__dimension_3__tau_1 | dim_5__permutation_entropy__dimension_4__tau_1 | dim_5__permutation_entropy__dimension_5__tau_1 | dim_5__permutation_entropy__dimension_6__tau_1 | dim_5__permutation_entropy__dimension_7__tau_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 1.0 | -8.618429 | 10.629914 | 0.229193 | -0.002871 | -6.1e-05 | -0.164268 | ... | 0.320753 | 0.647776 | 1.124025 | 1.459587 | 3.130035 | 1.599592 | 2.614086 | 3.516918 | 4.094068 | 4.316511 |
| 1 | 1.0 | 0.0 | 0.0 | 1.0 | 680.848161 | 12647.878199 | 5.481374 | 0.08044 | -0.052293 | 3.983688 | ... | 0.223718 | 0.437095 | 0.80654 | 1.424715 | 3.226796 | 1.65941 | 2.827616 | 3.720341 | 4.294787 | 4.499051 |
| 2 | 1.0 | 0.0 | 1.0 | 1.0 | 442.099383 | 6541.21025 | 4.043676 | -0.030065 | -0.01418 | 1.558775 | ... | 0.399949 | 0.677092 | 1.178445 | 1.754999 | 3.330219 | 1.743262 | 2.966554 | 3.953301 | 4.373258 | 4.484304 |
| 3 | 1.0 | 0.0 | 0.0 | 1.0 | 113.745549 | 356.056167 | 1.297926 | 0.004898 | -0.002951 | 1.043021 | ... | 0.096509 | 0.096509 | 0.26116 | 0.288342 | 1.515164 | 1.54343 | 2.424844 | 3.185694 | 3.766752 | 4.154904 |
| 4 | 1.0 | 0.0 | 0.0 | 1.0 | 75.777011 | 232.319298 | 1.388404 | 0.013368 | 0.0 | 0.648085 | ... | 0.096509 | 0.192626 | 0.192626 | 0.288342 | 1.156045 | 1.592283 | 2.550087 | 3.310667 | 3.823423 | 4.16513 |
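
For multivariate input, each column name encodes the source dimension, the feature, and any feature parameters, separated by double underscores (for example `dim_5__fourier_entropy__bins_100`). The sketch below splits such names apart, which can be handy for grouping features by dimension. The naming pattern is inferred from the columns shown above, and the helper `parse_feature_name` is hypothetical. Parameters whose values themselves contain underscores may not be split correctly by this simple approach.

```python
from collections import Counter


def parse_feature_name(column):
    """Split 'dim__feature__param_value...' into (dim, feature, params)."""
    dim, feature, *raw_params = column.split("__")
    params = {}
    for raw in raw_params:
        # the part after the last underscore is taken to be the value
        key, value = raw.rsplit("_", 1)
        params[key] = value
    return dim, feature, params


columns = [
    "dim_0__sum_values",
    "dim_0__abs_energy",
    "dim_5__fourier_entropy__bins_100",
    "dim_5__permutation_entropy__dimension_7__tau_1",
]
parsed = [parse_feature_name(c) for c in columns]

# count how many of the listed features come from each dimension
features_per_dim = Counter(dim for dim, _, _ in parsed)
print(parsed[3])
print(features_per_dim)
```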
