# Anomaly Detection - project

This project demonstrates the anomaly detection module implemented in Darts.

### Dataset
The MIT-BIH Supraventricular Arrhythmia Database (SVDB) contains 78 half-hour, two-channel ECG recordings obtained from 47 subjects between 1975 and 1979.

### Task
Develop an anomaly detection model to identify arrhythmia in the ECG signal.

## Downloading ECG dataset

In [None]:
# Download data from the data source to a local folder
import io
import os
import zipfile
import requests

# URL of the zip file
zip_url = "https://my.hidrive.com/api/sharelink/download?id=lmCmAjUP"

# Folder path to save the downloaded zip file
folder_path = "dataset"

# Create the folder if it doesn't exist
os.makedirs(folder_path, exist_ok=True)

# Send a GET request to download the zip file
response = requests.get(zip_url)

# Check if the request was successful
if response.status_code == 200:
    # File path to save the downloaded zip file
    file_path = os.path.join(folder_path, "svdb.zip")

    # Save the zip file to the local drive
    with open(file_path, "wb") as file:
        file.write(response.content)
    print("Zip file downloaded successfully.")

    # Extract the zip file
    with zipfile.ZipFile(file_path, 'r') as zip_ref:
        zip_ref.extractall(folder_path)
    print("Zip file extracted successfully.")
else:
    print("Failed to download the zip file.")

## Install required libraries for anomaly detection

In [None]:
# Install the library darts
!git clone https://github.com/unit8co/darts.git
!pip install darts/. -q

## Load ECG data for a patient of interest

In [None]:
patient_number = '842'

In [None]:
from darts import TimeSeries

# Load data into darts TimeSeries object
timeseries = TimeSeries.from_csv(f"./dataset/multivariate/SVDB/{patient_number}.test.csv", time_col='timestamp')
ts_ecg = timeseries[['ECG1','ECG2']]
ts_anomaly = timeseries['is_anomaly']

### Visualize signal and anomaly

In [None]:
import matplotlib.pyplot as plt

fig,ax = plt.subplots(1,1, figsize=(15,5))
ts_ecg['ECG1'].plot(ax=ax,label='ECG1')
(ts_anomaly-2).plot(ax=ax,label='is_anomaly',color='r')


### Task #1
- Identify a region of 15,000 data points that contains anomalies (training on more data takes a while) and visualize it.

In [None]:
# Identify a subset for demonstration
shift = 15000
start = 0
end = 15000
ts_ecg_subset= ts_ecg[start+shift:end+shift]
ts_anomaly_subset = ts_anomaly[start+shift:end+shift]

In [None]:
# Visualize the subset
fig,ax = plt.subplots(1,1, figsize=(15,5))
ts_ecg_subset['ECG1'].plot(ax=ax,label='ECG1')
((ts_anomaly_subset/2)-1.5).plot(ax=ax,label='is_anomaly',color='r')
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper center', borderaxespad=0)

### Task #2
- Split the subset into a training set and a test set (e.g., 10k/5k, 12k/2k data points).

In [None]:
# Create train and test dataset for demonstration
split = 10000
ts_ecg_train = ts_ecg_subset[:split]
ts_ecg_test = ts_ecg_subset[split:15000]
ts_anomaly_test = ts_anomaly_subset[split:15000]

In [None]:
fig,ax = plt.subplots(1,1, figsize=(15,5))
ts_ecg_train['ECG1'].plot(ax=ax,label='train')
ts_ecg_test['ECG1'].plot(ax=ax,color='b', label='test')
((ts_anomaly_test/2)-1.5).plot(ax=ax,label='is_anomaly',color='r')
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper center', borderaxespad=0)

### Task #3
- Assess data properties such as periodicity and identify the most common period.

In [None]:
from darts.utils.statistics import plot_acf

# Visualise the signal's autocorrelation to identify the most common period
plot_acf(ts=ts_ecg_subset['ECG1'], max_lag=220)
plt.show()

In [None]:
# Identified most common period
period = 90
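
The period above is read off the autocorrelation plot by eye. As a rough cross-check, the same idea can be sketched in plain Python: compute the sample autocorrelation and take the lag of its highest local maximum. The signal below is a synthetic pulse train (an assumption, not the SVDB data), and `autocorrelation` and `dominant_period` are illustrative helper names, not Darts functions.

```python
import math

def autocorrelation(values, max_lag):
    """Sample autocorrelation of a 1-D sequence for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]

def dominant_period(values, max_lag):
    """Lag (> 0) of the highest local maximum of the autocorrelation, or None."""
    acf = autocorrelation(values, max_lag)
    peaks = [
        lag for lag in range(1, max_lag)
        if acf[lag - 1] < acf[lag] >= acf[lag + 1]
    ]
    return max(peaks, key=lambda lag: acf[lag]) if peaks else None

# Synthetic stand-in for one ECG channel: a narrow pulse repeating every 90 samples
signal = [math.exp(-((t % 90) - 45) ** 2 / 20.0) for t in range(1800)]
estimated_period = dominant_period(signal, max_lag=220)
print(estimated_period)
```

On real ECG data the heart rate drifts, so the peaks are broader and the estimate is best confirmed against the plot.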

### Task #4
Develop an anomaly detection model by choosing a Darts forecasting model and one or more scorers.

Links:
- Forecasting models: https://unit8co.github.io/darts/generated_api/darts.models.forecasting.html
- Scorers: https://unit8co.github.io/darts/generated_api/darts.ad.scorers.html?highlight=scorer#

In [None]:
from darts.ad.anomaly_model.forecasting_am import ForecastingAnomalyModel
from darts.ad.scorers import KMeansScorer, NormScorer
from darts.models import LinearRegressionModel

# Instantiate a forecasting model, e.g. a linear regression model with a defined number of lags
forecasting_model = LinearRegressionModel(lags=period)

# Instantiate the anomaly model with: one forecasting model, and one or more scorers (and corresponding parameters)
anomaly_model = ForecastingAnomalyModel(
    model=forecasting_model,
    scorer=[
         NormScorer(ord=1),
         # KMeansScorer(k=50, window=2*period, component_wise=False)
    ],
)

# Fit anomaly model
START = 2 * period
anomaly_model.fit(ts_ecg_train, start=START, allow_model_training=True)
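
To make the role of the scorer concrete: with `ord=1`, a norm-based scorer reduces the forecast residual at each time step to a single number by summing the absolute errors over the channels. The sketch below illustrates that computation in plain Python on made-up values; `l1_norm_scores` is an illustrative helper, not part of Darts.

```python
def l1_norm_scores(actual, predicted):
    """Per-time-step L1 norm of the residual across all components."""
    return [
        sum(abs(a - p) for a, p in zip(actual_row, predicted_row))
        for actual_row, predicted_row in zip(actual, predicted)
    ]

# Hypothetical two-channel values (ECG1, ECG2) at four time steps
actual = [[0.10, 0.20], [0.15, 0.25], [1.50, 1.20], [0.12, 0.22]]
predicted = [[0.10, 0.20], [0.10, 0.20], [0.20, 0.30], [0.12, 0.22]]

scores = l1_norm_scores(actual, predicted)
# The third time step, where the forecast misses on both channels, gets the highest score
print(scores)
```

A time step that the forecasting model predicts poorly on both channels therefore stands out as a spike in the anomaly score.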

### Task #5
Assess the performance of the trained forecasting model on the test set.

In [None]:
from darts.metrics import mae, rmse
# Score with the anomaly model (forecasting + scoring)
anomaly_scores, model_forecasting = anomaly_model.score(
    ts_ecg_test, start=START, return_model_prediction=True
)

# Compute the MAE and RMSE on the test set (actual series first, then the prediction)
print(f"On testing set -> MAE: {mae(ts_ecg_test, model_forecasting)}, RMSE: {rmse(ts_ecg_test, model_forecasting)}")
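
For reference, the two metrics differ in how they weight large errors. The following plain-Python sketch, on made-up numbers, shows that a single large miss raises the RMSE more than the MAE; the helper names are illustrative and independent of `darts.metrics`.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Hypothetical values: three exact forecasts and one miss of 4 units
actual = [0.0, 1.0, 2.0, 3.0]
predicted = [0.0, 1.0, 2.0, 7.0]

mae_value = mean_absolute_error(actual, predicted)       # 4 / 4 = 1.0
rmse_value = root_mean_squared_error(actual, predicted)  # sqrt(16 / 4) = 2.0
print(mae_value, rmse_value)
```

In an anomaly detection setting, a gap between RMSE and MAE on the test set can hint that a few time steps (possibly the anomalies) dominate the error.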

### Task #6
Visualize the forecasted signal as well as the anomalies

In [None]:
# Forecasting
fig, ax = plt.subplots(1,1)
window = 10000
ts_ecg_test['ECG1'][2*period:2*period+window].plot(ax=ax,color='k',linewidth=1,label='test')
model_forecasting['ECG1'][:window].shift(-10).plot(ax=ax,color='b',linewidth=1,label='prediction')

In [None]:
# Anomalies
anomaly_scores[0].plot() # the index selects the scorer

### Task #7
Use the built-in Darts visualization to evaluate and show anomalies.

In [None]:
# Visualize and evaluate the detection of anomalies
anomaly_model.show_anomalies(
    series=ts_ecg_test,
    actual_anomalies=ts_anomaly_test,
    start=START,
    metric="AUC_ROC",
)
#plt.show()
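
The `AUC_ROC` metric used above can be read as the probability that a randomly chosen anomalous time step receives a higher score than a randomly chosen normal one. A minimal plain-Python sketch of that definition, on made-up labels and scores, is shown below; `auc_roc` is an illustrative helper and not the implementation Darts uses.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (anomalous, normal) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return correct / (len(positives) * len(negatives))

# Hypothetical ground truth (1 = anomaly) and anomaly scores
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

auc = auc_roc(labels, scores)  # 8 of the 9 pairs are ranked correctly
print(auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of anomalous and normal time steps.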

### Task #8
Assess one or multiple Detectors:

Link:
- https://unit8co.github.io/darts/generated_api/darts.ad.detectors.html?highlight=detector

In [None]:
from darts.ad.detectors import QuantileDetector

# Instantiate a detector
detector = QuantileDetector(low_quantile=0, high_quantile=0.70)

fig, ax = plt.subplots(1,1)
(detector.fit_detect(anomaly_scores[0])-0).plot(ax=ax, color='b',linewidth=1,label='NormScorer - detected_anomaly')
# The next line assumes a second scorer (the KMeansScorer commented out in Task #4) was enabled
(detector.fit_detect(anomaly_scores[1])-2).plot(ax=ax, color='k',linewidth=1,label='KMeansScorer - detected_anomaly')
(ts_anomaly_test-4).plot(ax=ax, color='r',linewidth=1,label='is_anomaly')
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0)
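
Conceptually, a quantile-based detector turns continuous anomaly scores into binary flags by thresholding at a quantile of the scores it was fitted on. The plain-Python sketch below illustrates this for an upper quantile of 0.70 on made-up scores; the helper names and the strict `>` comparison are assumptions for illustration, not a description of the Darts internals.

```python
def quantile_threshold(scores, q):
    """Quantile of the scores using linear interpolation between order statistics."""
    ordered = sorted(scores)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

def detect_above_quantile(scores, high_quantile):
    """Flag (1) every score strictly above the fitted quantile threshold."""
    threshold = quantile_threshold(scores, high_quantile)
    return [1 if s > threshold else 0 for s in scores]

# Hypothetical anomaly scores for ten time steps
scores = [0.2, 0.1, 0.3, 2.5, 0.2, 0.4, 3.0, 0.1, 0.3, 0.2]
flags = detect_above_quantile(scores, high_quantile=0.70)
print(flags)
```

Note that a fixed quantile of 0.70 flags roughly the top 30% of the scores whether or not that many time steps are truly anomalous, so the quantile should be chosen with the expected anomaly rate in mind.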

### Task #9
Develop an anomaly detection model using only a scorer:

```
scorer = KMeansScorer(k=k, window=2*period, component_wise=False)
scorer.fit(<series>)
scorer.show_anomalies(<series, [actual_anomalies, metric]>)
scorer.eval_accuracy(<actual_anomalies, series, [metric]>)
```

In [None]:
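from darts.ad.scorers import KMeansScorer

# Grid search over the number of clusters k and the candidate period (the window is 2 * period)
scorers = {}
for k in [70, 80, 90]:
    scorers[k] = {}
    for candidate_period in [140, 150, 160]:
        print(f"{k}-{candidate_period}")
        scorer = KMeansScorer(k=k, window=2 * candidate_period, component_wise=False)
        scorer.fit(ts_ecg_train)
        # AUC-ROC of the scorer's anomaly scores against the labelled anomalies
        scorers[k][candidate_period] = scorer.eval_accuracy(
            actual_anomalies=ts_anomaly_test,
            series=ts_ecg_test,
            metric="AUC_ROC",
        )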
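
Once the nested `scorers` dictionary is filled (number of clusters first, then period), the best combination can be picked with a small helper. This is a sketch that assumes each entry is a single float; `best_configuration` is an illustrative name and the numbers are placeholders, not results from this notebook.

```python
def best_configuration(results):
    """Return (k, period, score) with the highest score in a {k: {period: score}} dict."""
    return max(
        (
            (k, period, score)
            for k, by_period in results.items()
            for period, score in by_period.items()
        ),
        key=lambda item: item[2],
    )

# Placeholder scores for illustration only
example_results = {
    70: {140: 0.61, 150: 0.64},
    80: {140: 0.66, 150: 0.63},
}

best_k, best_period, best_score = best_configuration(example_results)
print(best_k, best_period, best_score)
```

Because the test set is used for this selection, the chosen score is an optimistic estimate; a separate validation split would give a fairer comparison.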


In [None]:
scorer.show_anomalies(
  actual_anomalies=ts_anomaly_test,
  series=ts_ecg_test,
  metric="AUC_ROC",
)