**Before You Start**: Make sure that you ran `0_setup_workshop.ipynb`!

# ECG Arrhythmia detection workshop

This project aims to demonstrate the anomaly detection module implemented in Darts.
### Dataset
The MIT-BIH Supraventricular Arrhythmia Database (SVDB) contains 78 half-hour, two-channel ECG recordings obtained from 47 subjects between 1975 and 1979.

### Task
Develop an anomaly detection model to identify arrhythmias in the ECG signal.

## Task #1
### Load data of a patient into a darts timeseries object

In [None]:
patient_number = "842"

In [None]:
import os
from darts import TimeSeries

# Load data into darts TimeSeries object
fpath = os.path.join("data", "anomaly_detection", "multivariate", "SVDB", f"{patient_number}.test.csv")
timeseries = TimeSeries.from_csv(fpath, time_col='timestamp')
ts_ecg = timeseries[['ECG1','ECG2']]
ts_anomaly = timeseries['is_anomaly']

### Visualize signal and anomaly

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(10, 5))
# TODO visualize the ECG1 signal in ax, label it 'ECG1' and use linewidth 0.5

# TODO visualize the anomaly labels in the same axis, shifted by -2; use label='is_anomaly', color='r' and linewidth 0.5


## Task #2
### Identify a region of ~15000 datapoints that contains anomalies (training on the full recording takes a while) and visualize it

In [None]:
# Identify a subset for demonstration
start, end = 15000, 30000
# Create subset time series ecg and anomaly object
ts_ecg_subset= # TO FILL
ts_anomaly_subset = # TO FILL

In [None]:
# Visualize the subset
fig,ax = plt.subplots(figsize=(10, 5))
ts_ecg_subset['ECG1'].plot(label='ECG1', lw=1.)
((ts_anomaly_subset/2)-1.5).plot(label='is_anomaly', color='r')

## Task #3
### Create training and test sets (e.g., 10k/5k, 12k/2k datapoints)

In [None]:
# Create train and test dataset for demonstration
train_end, test_end = # TO FILL 
ts_ecg_train = # TO FILL
ts_ecg_test =  # TO FILL
ts_anomaly_test = # TO FILL

In [None]:
# Visualize the train / test set as well as the test set anomalies
fig,ax = plt.subplots(figsize=(10, 5))
ts_ecg_train['ECG1'].plot(label='train', lw=1.)
ts_ecg_test['ECG1'].plot(label='test', lw=1.)
((ts_anomaly_test/2)-1.5).plot(label='is_anomaly', color='r')

## Task #4
### Assess data properties such as periodicity and identify the most common period.
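`plot_acf` draws the autocorrelation function (ACF). A strongly periodic signal such as an ECG shows a high ACF peak at the lag of one heartbeat, which makes that lag a natural value to highlight with `m` and to use as `period`. As a rough sketch of how that lag can be read off numerically, the plain-NumPy example below computes a sample ACF on a synthetic spike train. The helpers `acf` and `dominant_period` and the synthetic data are hypothetical illustrations, not part of Darts.

```python
import numpy as np


def acf(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Sample autocorrelation for lags 0..max_lag, normalized by the lag-0 term."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])


def dominant_period(x: np.ndarray, max_lag: int, min_lag: int = 2) -> int:
    """Lag >= min_lag with the highest autocorrelation, used as a period estimate."""
    r = acf(x, max_lag)
    return int(min_lag + np.argmax(r[min_lag:]))


# Hypothetical "heartbeat": a narrow Gaussian spike every 72 samples.
t = np.arange(3000)
true_period = 72
signal = np.exp(-((t % true_period) - 10) ** 2 / 4.0)

period = dominant_period(signal, max_lag=200)
print(period)  # 72
```

Lags 0 and 1 are skipped because every signal is trivially well correlated with itself at very small lags; the first large peak after that is the period of interest.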

In [None]:
from darts.utils.statistics import plot_acf

"""
Visualize the signal autocorrelation to identify the most common period
# TODO - use the plot_acf() method https://unit8co.github.io/darts/generated_api/darts.utils.statistics.html
# TODO - highlight the most prominent periodicity
Hints:
- ts (TimeSeries) – The TimeSeries whose ACF should be plotted (univariate, e.g. a single ECG channel).
- m (Optional[int]) – Optionally, a time lag to highlight on the plot.
- max_lag (int) – The maximal lag order to consider.
"""
plot_acf() # TO FILL

In [None]:
# Identified most common period
period = # TO FILL

## Task #5
### Develop an anomaly detection model step by step (see figure below) by:
1. Creating a forecasting model based on the training ECG time series
2. Creating historical forecasts for the test ECG time series
3. Computing anomaly scores with 2 different scorers based on the forecasted and actual ECG signals

<img src="images/ad_inside_anomaly_model.png" alt="Image" width="60%" height="60%">

Links:
- Forecasting models: https://unit8co.github.io/darts/generated_api/darts.models.forecasting.html
- Scorers: https://unit8co.github.io/darts/generated_api/darts.ad.scorers.html?highlight=scorer#
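Before using the Darts classes, the three steps can be sketched in plain NumPy. This is a hypothetical stand-in, not the Darts implementation: an ordinary-least-squares autoregression with an intercept plays the role of `LinearRegressionModel` with a fixed number of lags, rolling one-step-ahead predictions without refitting play the role of `historical_forecasts` (comparable to calling it with `retrain=False`), and the absolute residual plays the role of a pointwise L1 score. The noisy sine wave with an injected bump is synthetic data, not ECG.

```python
import numpy as np


def lagged_matrix(x: np.ndarray, lags: int):
    """Rows hold x[i : i + lags] plus an intercept column; targets are x[i + lags]."""
    X = np.lib.stride_tricks.sliding_window_view(x[:-1], lags)
    X = np.column_stack([X, np.ones(len(X))])
    y = x[lags:]
    return X, y


def fit_linear_ar(x: np.ndarray, lags: int) -> np.ndarray:
    """Step 1: least-squares autoregression fitted on the training series."""
    X, y = lagged_matrix(x, lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def one_step_forecasts(x: np.ndarray, coef: np.ndarray, lags: int):
    """Step 2: rolling one-step-ahead forecasts; each one uses only true past values."""
    X, y = lagged_matrix(x, lags)
    return X @ coef, y


# Hypothetical data: a noisy sine wave with a bump injected into the test part.
rng = np.random.default_rng(0)
t = np.arange(3000)
series = np.sin(2 * np.pi * t / 72) + 0.05 * rng.standard_normal(len(t))
series[2500:2510] += 1.5

train, test = series[:2000], series[2000:]
lags = 10
coef = fit_linear_ar(train, lags)
pred, actual = one_step_forecasts(test, coef, lags)

# Step 3 input: residuals (y_true - y_pred); their absolute value is a pointwise L1 score.
residuals = actual - pred
scores = np.abs(residuals)
```

The first `lags` test points have no forecast, so the score series is `lags` steps shorter than the test series; `scores[i]` refers to `test[i + lags]`.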

#### Create a Forecasting model

In [None]:
from darts.models import LinearRegressionModel

# Instantiate a forecasting model, e.g. a LinearRegressionModel with a defined number of lags
forecasting_model = # TO FILL (https://unit8co.github.io/darts/generated_api/darts.models.forecasting.html)

# Train the forecasting model on the training dataset // forecasting_model.fit(...)
# TODO 

In [None]:
# Visualization of predicted and actual signal
fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(10, 10))

# Create historical predictions // forecasting_model.historical_forecasts(...)
ts_ecg_test_predicted = # TO FILL 
# Calculate residuals of these predictions = (y_true - y_pred) // forecasting_model.residuals(...)
ts_ecg_residuals = # TO FILL

ts_ecg_test_predicted['ECG1'].plot(ax=ax1, label='ECG1_predicted')
ts_ecg_test['ECG1'].plot(ax=ax1, color='r', label='ECG1_test')
ts_ecg_residuals['ECG1'].plot(ax=ax2, color='b', label='Difference')

#### Use a NormScorer for scoring
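`NormScorer` turns the actual and predicted series into one score per time step by taking a norm of their difference. A minimal NumPy sketch of that computation follows, assuming arrays shaped `(time, components)`. The function name `norm_score` is hypothetical; its `ord` and `component_wise` arguments only mirror the intent of the Darts parameters with the same names.

```python
import numpy as np


def norm_score(actual, pred, ord=1, component_wise=False):
    """Pointwise norm of (actual - pred); inputs have shape (time, components)."""
    diff = np.asarray(actual, dtype=float) - np.asarray(pred, dtype=float)
    if component_wise:
        return np.abs(diff)  # one score per component and time step
    return np.linalg.norm(diff, ord=ord, axis=1)  # one score per time step


actual = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
pred = np.array([[0.5, 2.5], [0.5, -1.0], [1.0, 1.0]])
print(norm_score(actual, pred))  # L1 over both ECG-like channels
```

With `ord=1` the score at each time step is the sum of the absolute errors of all channels, so a large error in either channel raises the score.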

In [None]:
from darts.ad.scorers import NormScorer
# Instantiate a NormScorer https://unit8co.github.io/darts/generated_api/darts.ad.scorers.norm_scorer.html
scorer = # TO FILL

In [None]:
# Calculate anomaly scores as the pointwise L1 norm of the difference between actual and predicted series, then visualize them
scores =  # TO FILL
scores.plot(label='Anomaly Score')

In [None]:
# Evaluate the calculated anomaly score using utility methods in darts
from darts.ad.utils import eval_metric_from_scores
eval_metric_from_scores(
    pred_scores= # TO FILL, 
    anomalies= # TO FILL, 
    window= # TO FILL, 
    metric= # TO FILL (e.g. 'AUC_ROC')
)
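`metric='AUC_ROC'` measures how well the scores rank anomalous time steps above normal ones: it equals the probability that a randomly chosen anomalous point receives a higher score than a randomly chosen normal point, with ties counted as one half. The sketch below computes that quantity directly from pairwise comparisons (quadratic in memory, which is acceptable at workshop scale). It assumes one score per labelled time step, which is why a pointwise scorer such as `NormScorer` pairs naturally with `window=1`. The function name `auc_roc` is hypothetical.

```python
import numpy as np


def auc_roc(labels, scores) -> float:
    """Area under the ROC curve via pairwise comparisons (ties count as 1/2)."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one anomalous and one normal time step")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of anomalous and normal points.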

#### Use the fittable KMeansScorer for scoring
The NormScorer computes, point by point, the norm of the difference between the predicted and actual time series. Since predicting the peaks of the ECG signal is challenging for the model, the biggest differences between actual and predicted values are mostly found at the peak locations, so high scores tend to appear at every peak rather than only where the rhythm is abnormal.

To overcome this limitation of the point-wise comparison, we use the windowing approach built into the KMeansScorer. The KMeansScorer is trainable: on an anomaly-free dataset, it slides a window of width `w` over the ECG signal, turns each window into a vector, and fits `k` centroids to these vectors. A new window of the same width `w` is scored by its distance to the closest centroid. Since the centroids were learned from anomaly-free windows, windows containing anomalies lie farther away, even from their closest centroid.

As the figure below illustrates, the KMeansScorer can be applied directly to the time series itself. Here, however, we use the previously developed forecasting model to create historical forecasts for the training dataset and train the KMeansScorer on the absolute difference between the actual and forecasted training series.
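The fit/score cycle described above can be sketched in NumPy as follows. This is a hypothetical simplification, not the Darts `KMeansScorer`: a minimal Lloyd's-algorithm k-means with evenly spaced initial centroids stands in for the library's clustering, the class and helper names are invented for illustration, and the two-channel sine wave with an injected offset is synthetic. It shows the two ingredients of the approach: windows of width `w` flattened into vectors, and a score equal to the distance to the nearest of `k` centroids.

```python
import numpy as np


def sliding_windows(x: np.ndarray, w: int) -> np.ndarray:
    """(time, components) -> (time - w + 1, w * components): one flattened vector per window."""
    win = np.lib.stride_tricks.sliding_window_view(x, w, axis=0)  # (T - w + 1, C, w)
    return win.reshape(win.shape[0], -1)


def kmeans(X: np.ndarray, k: int, n_iter: int = 50) -> np.ndarray:
    """Plain Lloyd's algorithm with evenly spaced initial centroids."""
    centroids = X[np.linspace(0, len(X) - 1, k).astype(int)].astype(float)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


class WindowKMeansScorer:
    """Hypothetical sketch: fit k centroids on normal windows, score by nearest-centroid distance."""

    def __init__(self, window: int, k: int):
        self.window, self.k = window, k

    def fit(self, x: np.ndarray) -> "WindowKMeansScorer":
        self.centroids_ = kmeans(sliding_windows(x, self.window), self.k)
        return self

    def score(self, x: np.ndarray) -> np.ndarray:
        W = sliding_windows(x, self.window)
        dist = np.linalg.norm(W[:, None, :] - self.centroids_[None, :, :], axis=2)
        return dist.min(axis=1)  # window i covers x[i : i + window]


def two_channel_sine(t: np.ndarray, period: int = 40) -> np.ndarray:
    return np.column_stack([np.sin(2 * np.pi * t / period), np.cos(2 * np.pi * t / period)])


train = two_channel_sine(np.arange(400))  # anomaly-free training data
test = two_channel_sine(np.arange(400, 600))
test[100:120] += 2.0  # hypothetical anomaly: a 20-step offset on both channels

scorer = WindowKMeansScorer(window=20, k=8).fit(train)
scores = scorer.score(test)  # 200 - 20 + 1 = 181 window scores
```

Flattening all channels of a window into a single vector corresponds to scoring the components jointly rather than one by one, i.e. the behaviour the `component_wise=False` option refers to.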

#### Training & Scoring
<img src="images/kmeansscorer.png" alt="Image" width="70%" height="70%">

<img src="images/ad_windowing.png" alt="Image" width="70%" height="70%">

In [None]:
from darts.ad.scorers import KMeansScorer
"""
Instantiate a KMeansScorer // https://unit8co.github.io/darts/generated_api/darts.ad.scorers.kmeans_scorer.html
Parameters: 
    - window (int) – Size of the window used to create the subsequences of the series.
    - k (int) – The number of clusters to form as well as the number of centroids to generate by the KMeans model.
    - component_wise (bool) – Boolean value indicating if the score needs to be computed for each component independently (True) 
            or by concatenating the component in the considered window to compute one score (False). Default: False
"""
scorer = # TO FILL

In [None]:
"""
Create historical forecasts on the training dataset so that the scorer can be trained
Hint:
    Use the trained forecasting_model
"""
ts_ecg_train_predicted = # TO FILL

In [None]:
"""
Fit the scorer on the actual and predicted training series
Hint:
    Use the method fit_from_prediction(actual_series, pred_series)
"""
# TODO

#### Remark

The function `diff_fn`, passed as a parameter to the scorer, transforms `pred_series` and `actual_series` into a single series. `diff_fn` can be any of the Darts ["per time step" metrics](https://unit8co.github.io/darts/generated_api/darts.metrics.html). By default, it computes the absolute difference (`darts.metrics.ae`). If `pred_series` and `actual_series` are lists of series, `diff_fn` is applied to each pair of corresponding elements.
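The behaviour described above (one series out of two, applied pairwise over lists) can be sketched in a few lines of NumPy. The names `abs_diff` and `apply_diff_fn` are hypothetical; in Darts the default is `darts.metrics.ae`, and the lambda at the end merely shows that another per-time-step function, here a squared error, can be swapped in.

```python
import numpy as np


def abs_diff(actual: np.ndarray, pred: np.ndarray) -> np.ndarray:
    """Per-time-step absolute difference, the role played by the default diff_fn."""
    return np.abs(np.asarray(actual, dtype=float) - np.asarray(pred, dtype=float))


def apply_diff_fn(actual, pred, diff_fn=abs_diff):
    """Apply diff_fn to one (actual, pred) pair, or pairwise to two lists of series."""
    if isinstance(actual, list):
        if not isinstance(pred, list) or len(actual) != len(pred):
            raise ValueError("actual and pred must be lists of the same length")
        return [diff_fn(a, p) for a, p in zip(actual, pred)]
    return diff_fn(actual, pred)


single = apply_diff_fn(np.array([1.0, 2.0, 3.0]), np.array([1.5, 2.0, 1.0]))

pairs = apply_diff_fn([np.array([0.0, 1.0]), np.array([2.0])],
                      [np.array([1.0, 1.0]), np.array([0.0])])

squared = apply_diff_fn(np.array([1.0, 2.0]), np.array([0.0, 0.0]),
                        diff_fn=lambda a, p: (a - p) ** 2)
```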


In [None]:
# Use the trained scorer on the forecasted and actual test dataset
scores = # TO FILL 
# Visualize anomaly score
scores.plot(label='Anomaly Score')

In [None]:
# Calculate the performance of the scorer using built-in utility methods in darts
eval_metric_from_scores(
    pred_scores= # TO FILL,
    anomalies= # TO FILL,
    window= # TO FILL ,
    metric= # TO FILL (e.g. 'AUC_ROC')
)

## Task #6
### Develop the anomaly detection model with the dedicated Darts `ForecastingAnomalyModel` API
This exercise illustrates the power of the Darts anomaly detection module: all of the previous steps are hidden under the hood of a single anomaly model with a compact API.

We'll use the already trained forecasting model, but you can also pass an untrained model and call `ForecastingAnomalyModel.fit()` with `allow_model_training=True`.
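Conceptually, such a model chains the forecasting step and the scoring steps from Task #5. The sketch below is a hypothetical toy in plain NumPy, not the Darts class: a persistence forecaster replaces the trained regression model, the two scorers are simplified stand-ins (one non-fittable, one fittable), and only the method names `fit`/`score` and the `return_model_prediction` flag echo the Darts API. Scores come back as a list in the same order as the scorers.

```python
import numpy as np


def persistence_forecast(x: np.ndarray) -> np.ndarray:
    """Naive forecaster: predict each value as the previous observation."""
    pred = np.empty_like(x)
    pred[0] = x[0]
    pred[1:] = x[:-1]
    return pred


class AbsErrorScorer:
    """Non-fittable scorer: pointwise absolute error."""

    def fit(self, actual, pred):
        return self

    def score(self, actual, pred):
        return np.abs(actual - pred)


class StandardizedErrorScorer:
    """Fittable scorer: absolute error standardized with training statistics."""

    def fit(self, actual, pred):
        err = np.abs(actual - pred)
        self.mean_, self.std_ = err.mean(), err.std() + 1e-12
        return self

    def score(self, actual, pred):
        return (np.abs(actual - pred) - self.mean_) / self.std_


class ToyForecastingAnomalyModel:
    """Hypothetical wrapper: one forecaster plus several scorers behind fit()/score()."""

    def __init__(self, forecaster, scorers):
        self.forecaster = forecaster
        self.scorers = list(scorers)

    def fit(self, series):
        pred = self.forecaster(series)
        for scorer in self.scorers:
            scorer.fit(series, pred)
        return self

    def score(self, series, return_model_prediction=False):
        pred = self.forecaster(series)
        scores = [scorer.score(series, pred) for scorer in self.scorers]
        return (scores, pred) if return_model_prediction else scores


t = np.arange(700)
series = np.sin(2 * np.pi * t / 50)
train, test = series[:500], series[500:].copy()
test[120] += 3.0  # hypothetical anomaly

model = ToyForecastingAnomalyModel(persistence_forecast,
                                   [AbsErrorScorer(), StandardizedErrorScorer()])
model.fit(train)
anomaly_scores, predictions = model.score(test, return_model_prediction=True)
```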

In [None]:
from darts.ad.anomaly_model.forecasting_am import ForecastingAnomalyModel
from darts.ad.scorers import NormScorer, KMeansScorer

# Instantiate the anomaly model with: one forecasting model, and one or more scorers (and corresponding parameters)
anomaly_model = ForecastingAnomalyModel(
    model= # TO FILL,
    scorer=[
         # TO FILL,
         # TO FILL
    ],
)

# Fit anomaly model
anomaly_model.fit(# TO FILL)

#### Create anomaly scores and predictions in one step


In [None]:
"""
Score with the anomaly model (forecasting + scoring) // https://unit8co.github.io/darts/generated_api/darts.ad.anomaly_model.forecasting_am.html
Hint:
    score(series, return_model_prediction=True)
"""
anomaly_scores, predictions = # TO FILL

### Visualize the forecasted signal as well as the anomalies

In [None]:
# Forecasting
ts_ecg_test['ECG1'].plot(label='test', lw=1.)
predictions['ECG1'].plot(label='prediction', lw=1.)

In [None]:
# Anomalies
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(10, 5))
anomaly_scores[0].plot(ax=ax1, label="NormScorer") # index follows the order of the scorers passed to the model
anomaly_scores[1].plot(ax=ax2, label="KMeansScorer") # index follows the order of the scorers passed to the model

### Leverage the built-in Darts visualization tool to evaluate and show anomalies

In [None]:
# Visualize and evaluate detection of anomalies
anomaly_model.show_anomalies(
    series= # TO FILL,
    anomalies= # TO FILL,
    metric= # TO FILL (e.g. "AUC_ROC"),
)

## Task #7
### Use a detector to binarize the anomaly scores

Link:
- https://unit8co.github.io/darts/generated_api/darts.ad.detectors.html?highlight=detector
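A detector converts continuous scores into binary labels. The hypothetical NumPy sketch below contrasts the two strategies used in this task: a quantile-based detector learns its threshold from the score distribution it is fitted on, while a threshold detector uses a fixed, user-given bound. Only an upper bound is modelled here, and treating a score exactly equal to the threshold as normal is an assumption of this sketch. The class names are invented, so check the Darts detector documentation linked above for the actual parameters.

```python
import numpy as np


class ToyQuantileDetector:
    """Fittable: the threshold is a high quantile of the scores seen during fit."""

    def __init__(self, high_quantile: float = 0.99):
        self.high_quantile = high_quantile

    def fit(self, scores) -> "ToyQuantileDetector":
        self.threshold_ = float(np.quantile(scores, self.high_quantile))
        return self

    def detect(self, scores) -> np.ndarray:
        return (np.asarray(scores) > self.threshold_).astype(int)

    def fit_detect(self, scores) -> np.ndarray:
        return self.fit(scores).detect(scores)


class ToyThresholdDetector:
    """Not fittable: flags scores above a fixed threshold."""

    def __init__(self, high_threshold: float):
        self.high_threshold = high_threshold

    def detect(self, scores) -> np.ndarray:
        return (np.asarray(scores) > self.high_threshold).astype(int)


scores = np.arange(100) / 100  # hypothetical scores 0.00 ... 0.99
quantile_labels = ToyQuantileDetector(high_quantile=0.95).fit_detect(scores)
threshold_labels = ToyThresholdDetector(high_threshold=0.5).detect(scores)
print(quantile_labels.sum(), threshold_labels.sum())  # 5 49
```

The quantile variant adapts to the scale of each scorer, which helps when comparing NormScorer and KMeansScorer outputs that live on very different ranges; the fixed threshold has to be chosen per scorer.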

In [None]:
from darts.ad.detectors import QuantileDetector, ThresholdDetector

# Instantiate a QuantileDetector
detector =  # TO FILL 

fig, ax = plt.subplots()
(detector.fit_detect(anomaly_scores[0])-0).plot(lw=1., label='NormScorer - detected_anomaly')
# TODO also visualize the detected KMeansScorer anomalies as well as the actual anomalies

ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')

In [None]:
# Instantiate a ThresholdDetector
detector =  # TO FILL

fig, ax = plt.subplots()
(detector.detect(anomaly_scores[0])-0).plot(lw=1, label='NormScorer - detected_anomaly')
# TODO visualize the detected KMeansScorer anomalies as well as the actual anomalies

ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')