# Unsupervised Timeseries Representations
## Goal
Explore how well the representations learned by the algorithm described in
[Unsupervised Scalable Representation Learning for Multivariate Time Series](https://arxiv.org/abs/1901.10738) transfer across datasets.

* Compare the transferability of representations learned by TimeNet with those learned by the paper's convolutional encoder.
* Measure how much information is lost as the output representation length decreases, and compare against principal components.
* Introduce a measure and a standard testing procedure for transferability.
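
For the principal-components comparison, one possible baseline is the share of variance retained by the first `k` components. The sketch below is only one reading of that goal: it assumes "information" is measured as explained variance, and the function names `explained_variance_ratio` and `retained_variance` are illustrative. The encoder side of the comparison (retraining with a smaller output length) is not shown.

```python
import numpy as np


def explained_variance_ratio(features):
    """Fraction of total variance captured by each principal component.

    `features` is a 2-D array of shape (n_samples, n_features).
    """
    centered = features - features.mean(axis=0)
    # Squared singular values of the centered data are proportional to
    # the variance along each principal axis.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


def retained_variance(features, k):
    """Cumulative variance kept when only the first k components are used."""
    return float(np.cumsum(explained_variance_ratio(features))[k - 1])
```

Plotting `retained_variance(features, k)` against `k` gives a curve that an encoder trained at the same output lengths could be compared with.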

## Method
1. Divide the UCR timeseries archive into two subsets:
    * one for learning representations
    * one for SVM classification
2. Train the encoder network(s) on the representation subset
3. Generate representations for the classification subset
4. Train and test an SVM on the classification subset's representations
    * Test on the dataset the representations were learned from
    * Test on a different dataset
5. Compare with TimeNet, time permitting

Dataset curation is the focus of this research. We will test how the transferability of the
learned representations depends on two independent variables:
* dataset size
* dataset diversity
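
The two variables above could be controlled with a helper like the one below. It is a sketch under assumptions the notes do not state: "diversity" is taken to mean the number of distinct source datasets pooled together, "size" the total number of series, and `build_training_subset` is a hypothetical name.

```python
import numpy as np


def build_training_subset(datasets, n_datasets, n_series, seed=0):
    """Pool `n_series` series, spread as evenly as possible over `n_datasets` sources.

    `datasets` maps a dataset name to a list of series (lengths may differ).
    Returns a list of (source_name, series) pairs.
    """
    rng = np.random.default_rng(seed)
    chosen = rng.choice(sorted(datasets), size=n_datasets, replace=False)
    names = [str(name) for name in chosen]

    # Split the requested size evenly, giving the remainder to the first sources.
    per_source = np.full(n_datasets, n_series // n_datasets)
    per_source[: n_series % n_datasets] += 1

    subset = []
    for name, count in zip(names, per_source):
        series = datasets[name]
        if count > len(series):
            raise ValueError(f"{name} has only {len(series)} series, need {count}")
        picks = rng.choice(len(series), size=count, replace=False)
        subset.extend((name, series[i]) for i in picks)
    return subset
```

Holding `n_series` fixed while sweeping `n_datasets` (and the reverse) separates the effect of diversity from the effect of size.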

Open question: how should the triplet loss be implemented, and how should triplets be found?
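
One possible answer, assuming the scheme follows the linked paper: the reference is a random subseries of one series, the positive is a subseries contained in the reference, and each negative is a random subseries of another randomly chosen series. The NumPy sketch below samples such triplets from univariate series and evaluates the logistic triplet loss on embeddings that are already computed. The names `sample_triplet` and `triplet_loss` are illustrative and do not come from the authors' repository.

```python
import numpy as np


def random_subseries(series, length, rng):
    """Return a random contiguous window of `length` steps from a 1-D series."""
    start = rng.integers(0, len(series) - length + 1)
    return series[start:start + length]


def sample_triplet(dataset, n_negatives, rng):
    """Draw (reference, positive, negatives) from a list of 1-D series."""
    anchor = dataset[rng.integers(len(dataset))]
    pos_len = rng.integers(1, len(anchor) + 1)
    ref_len = rng.integers(pos_len, len(anchor) + 1)
    reference = random_subseries(anchor, ref_len, rng)
    positive = random_subseries(reference, pos_len, rng)  # lies inside the reference
    negatives = []
    for _ in range(n_negatives):
        other = dataset[rng.integers(len(dataset))]  # may be the anchor series again
        neg_len = rng.integers(1, len(other) + 1)
        negatives.append(random_subseries(other, neg_len, rng))
    return reference, positive, negatives


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -np.logaddexp(0.0, -x)


def triplet_loss(z_ref, z_pos, z_negs):
    """Logistic triplet loss on embeddings; `z_negs` has shape (K, dim)."""
    positive_term = -log_sigmoid(z_ref @ z_pos)
    negative_term = -np.sum(log_sigmoid(-(z_negs @ z_ref)))
    return positive_term + negative_term
```

In training, each sampled window would be passed through the encoder to obtain `z_ref`, `z_pos`, and `z_negs`, and the loss would be computed in the training framework so that gradients can flow.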

### Background
Info to include:
* Timeseries intro
* UCR dataset
* Concept and benefits of representations
* Encoder architecture
* Transferability
* What the algorithm's authors did to test transferability, and how this work extends it

In [4]:
# Imports
import pandas as pd
import numpy as np
import matrixprofile as mp
import matplotlib.pyplot as plt
import seaborn as sns
import keras
from keras import layers, models
from keras.utils import Sequence
import keras.backend as K
import tensorflow as tf
import tensorflow_addons as tfa
import uea_ucr_datasets as archive
import sktime
from functools import partial

import sys
sys.path.insert(1, '../UnsupervisedScalableRepresentationLearningTimeSeries/')
from scikit_wrappers import CausalCNNEncoderClassifier

In [5]:
# Load data
catalogue = archive.list_datasets()

d = archive.Dataset(catalogue[0])
# Each item is a (series, label) pair; keep only the series.
X = np.array([x[0] for x in d])


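def standardize(X):
    """Z-normalize X with its global mean and standard deviation, ignoring NaNs."""
    mean = np.nanmean(X)
    std = np.nanstd(X)

    # Vectorized: returns a new standardized array and leaves X unmodified.
    return (X - mean) / std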

In [None]:
# CNN Encoder
X = standardize(X)

cnn_encoder = CausalCNNEncoderClassifier(nb_random_samples=5,
                                         depth=10,
                                         channels=40,
                                         out_channels=320,
                                         kernel_size=3,
                                         cuda=True)

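# Unsupervised training of the encoder only. Assumption based on the authors'
# scikit_wrappers: `fit` also expects labels `y`, while `fit_encoder` does not.
cnn_encoder.fit_encoder(X, verbose=True)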

In [None]:
# TimeNet


In [None]:
# Compare

### MatrixProfile on PTSD fMRI data

In [None]:
df_train = pd.read_excel('data/train_data_age_matched_split.xlsx')
df_test = pd.read_excel('data/test_data_age_matched_split.xlsx')

In [None]:
# Row 0 holds the labels; the remaining rows hold the time points, one column per sample.
X_train = df_train.loc[1:, ~df_train.columns.isin(['SpotID', 'Paths'])].to_numpy().T
y_train = df_train.loc[0, ~df_train.columns.isin(['SpotID', 'Paths'])].to_numpy()

X_test = df_test.loc[1:, ~df_test.columns.isin(['SpotID', 'Paths'])].to_numpy().T
y_test = df_test.loc[0, ~df_test.columns.isin(['SpotID', 'Paths'])].to_numpy()

In [None]:
plt.figure(figsize=(20, 5))
profile, figures = mp.analyze(X_train[-1])