# Demonstration of Deep Outlier Detection Models
1. [Introduction](#1-introduction)
2. [Demonstration on Classical Dataset](#2-demonstration-on-classical-dataset)
    1. [Load Data](#21-load-data)
    2. [Model Setting](#22-model-setting)
    3. [Performance Comparison](#23-performance-comparison)
3. [Reference](#reference)

## 1. Introduction

This demonstration shows the performance of deep outlier detection models on several classical benchmark datasets. The models covered in this demonstration include:

1. **Deep SVDD**: Deep One-Class Classification. (ICML'18)
2. **REPEN**: Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection. (KDD'18)
3. **RDP**: Unsupervised Representation Learning by Predicting Random Distances. (IJCAI'20)
4. **RCA**: A Deep Collaborative Autoencoder Approach for Anomaly Detection. (IJCAI'21)
5. **GOAD**: Classification-Based Anomaly Detection for General Data. (ICLR'20)
6. **NeuTraL**: Neural Transformation Learning for Deep Anomaly Detection Beyond Images. (ICML'21)
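As one concrete example of how these detectors turn a learned representation into an anomaly score, Deep SVDD trains a network $\phi(\cdot;\mathcal{W})$ to map normal data close to a fixed center $\mathbf{c}$. As we understand the paper's one-class objective, it is roughly the following, where $\lambda$ weights a weight-decay term over the $L$ layers and $\mathbf{c}$ is fixed before training (in the paper, the mean of the network's initial outputs); DeepOD's implementation may differ in such details:

$$
\min_{\mathcal{W}}\ \frac{1}{n}\sum_{i=1}^{n}\bigl\|\phi(\mathbf{x}_i;\mathcal{W})-\mathbf{c}\bigr\|^2+\frac{\lambda}{2}\sum_{\ell=1}^{L}\bigl\|\mathbf{W}^{\ell}\bigr\|_F^2,
\qquad
s(\mathbf{x})=\bigl\|\phi(\mathbf{x};\mathcal{W}^*)-\mathbf{c}\bigr\|^2
$$

Points mapped far from $\mathbf{c}$ by the trained network $\mathcal{W}^*$ receive a high score $s(\mathbf{x})$, which is the "higher means more anomalous" convention the ROC-AUC evaluation below relies on.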


```python
import numpy as np
from numpy import percentile
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.font_manager
from tqdm.notebook import tqdm
import warnings
warnings.filterwarnings("ignore")
from sklearn.datasets import make_moons, make_blobs
```

## 2. Demonstration on Classical Dataset

```python
from pyod.utils.utility import standardizer
from pyod.utils.utility import precision_n_scores
from sklearn.metrics import roc_auc_score
from scipy.io import loadmat
from time import time
import os
```

## 2.1. Load Data
All of the following datasets were downloaded from Outlier Detection DataSets (ODDS) at http://odds.cs.stonybrook.edu/#table1, and the evaluation code expects the `.mat` files in a local `datasets/` directory.
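Each ODDS `.mat` file is read with `scipy.io.loadmat`, and the evaluation loop takes the features from the `X` key and the 0/1 outlier labels from the `y` key. The helper below is a hypothetical convenience wrapper (`load_odds_mat` is not part of PyOD or DeepOD) that performs the same loading and computes the summary columns shown in the results table. It is demonstrated on a tiny synthetic file with made-up values, not on a real ODDS dataset.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def load_odds_mat(path):
    """Hypothetical helper: load an ODDS-style .mat file and summarize it.

    Assumes the layout used by the evaluation loop: features under 'X' and
    0/1 outlier labels (1 = outlier) under 'y'.
    """
    data = loadmat(path)
    X = np.asarray(data['X'], dtype=float)
    y = np.asarray(data['y']).ravel().astype(int)  # 'y' is stored as an (n, 1) column
    summary = {
        'n_samples': X.shape[0],
        'n_dims': X.shape[1],
        # percentage of samples labelled as outliers, rounded like the results table
        'outlier_perc': round(100.0 * np.count_nonzero(y) / len(y), 4),
    }
    return X, y, summary


# usage on a tiny synthetic file (made-up data, not an ODDS dataset)
toy_path = os.path.join(tempfile.mkdtemp(), 'toy.mat')
savemat(toy_path, {'X': np.random.rand(10, 3), 'y': np.array([[0]] * 8 + [[1]] * 2)})
X_toy, y_toy, toy_summary = load_odds_mat(toy_path)
print(toy_summary)  # {'n_samples': 10, 'n_dims': 3, 'outlier_perc': 20.0}
```

Returning the arrays together with the summary keeps the bookkeeping for the `# Samples`, `# Dimensions` and `Outlier Perc` columns in one place.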

```python
mat_file_list = ['arrhythmia.mat',
                 'cardio.mat',
                 'ionosphere.mat',
                 'letter.mat',
                 'lympho.mat',
                 'mnist.mat',
                 'musk.mat',
                 'optdigits.mat',
                 'pendigits.mat',
                 'pima.mat',
                 'satellite.mat',
                 'satimage-2.mat',
                 'shuttle.mat',
                 'vertebral.mat',
                 'vowels.mat',
                 'wbc.mat']
```

## 2.2. Model Setting

```python
from deepod.models.dsvdd import DeepSVDD
from deepod.models.rdp import RDP
from deepod.models.repen import REPEN
from deepod.models.rca import RCA
from deepod.models.goad import GOAD
from deepod.models.dif import DeepIsolationForest
from deepod.models.neutral import NeuTraL
from deepod.models.icl import ICL
```

```python
# every model keeps its default hyperparameters; only the verbosity is lowered
classifiers = {
    'DeepSVDD': DeepSVDD(verbose=0),
    'RDP': RDP(verbose=0),
    'REPEN': REPEN(verbose=0),
    'RCA': RCA(verbose=0),
    'GOAD': GOAD(verbose=0),
    'Neutral': NeuTraL(verbose=0),
    # 'ICL': ICL(verbose=0),
    # 'DIF': DeepIsolationForest(),
}

# map each model name to its column position in the results
classifiers_indices = {name: i for i, name in enumerate(classifiers)}
```

## 2.3. Performance Comparison

```python
# column layout of the results table
df_columns = ['Data', '# Samples', '# Dimensions', 'Outlier Perc'] + list(classifiers_indices.keys())
n_classifiers = len(classifiers)
rows = []  # one row of ROC-AUC scores per dataset

for mat_file in tqdm(mat_file_list):
    print("\n... Processing", mat_file, '...')

    # each ODDS .mat file stores the features in 'X' and the 0/1 labels in 'y'
    data = loadmat(os.path.join('datasets', mat_file))
    X = data['X']
    y = data['y'].ravel()
    outliers_fraction = np.count_nonzero(y) / len(y)
    outliers_percentage = round(outliers_fraction * 100, ndigits=4)

    # construct containers for saving results
    roc_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]
    roc_mat = np.zeros(n_classifiers)

    # standardize every feature to zero mean and unit variance
    X_norm = standardizer(X)

    for clf_name, clf in classifiers.items():
        # unsupervised setting: train and score on the same unlabeled data
        clf.fit(X_norm)
        test_scores = clf.decision_function(X_norm)  # higher score = more anomalous

        roc = round(roc_auc_score(y, test_scores), ndigits=4)
        # prn = round(precision_n_scores(y, test_scores), ndigits=4)

        roc_mat[classifiers_indices[clf_name]] = roc

    rows.append(roc_list + list(roc_mat))

# build the table once instead of concatenating onto an empty DataFrame
roc_df = pd.DataFrame(rows, columns=df_columns)
```

```
... Processing arrhythmia.mat ...
... Processing glass.mat ...
... Processing ionosphere.mat ...
... Processing letter.mat ...
... Processing lympho.mat ...
... Processing mnist.mat ...
... Processing musk.mat ...
... Processing optdigits.mat ...
... Processing pendigits.mat ...
... Processing pima.mat ...
... Processing satellite.mat ...
... Processing satimage-2.mat ...
... Processing shuttle.mat ...
```


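The evaluation loop leaves PyOD's `precision_n_scores` commented out. A minimal sketch of the precision@n idea, assuming the common convention that n defaults to the number of true outliers, is shown below; `precision_at_n` is a hypothetical name, and PyOD's own function may break ties at the cut-off differently.

```python
import numpy as np


def precision_at_n(y_true, scores, n=None):
    """Sketch of precision@n: the share of true outliers among the n top-scored points.

    By default n equals the number of true outliers. Ties at the cut-off are broken
    by sort order here, which may differ from PyOD's precision_n_scores.
    """
    y_true = np.asarray(y_true).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    if n is None:
        n = int(np.count_nonzero(y_true))
    if n <= 0:
        raise ValueError("n must be positive (are there any outliers in y_true?)")
    top_n = np.argsort(-scores, kind='stable')[:n]  # indices of the n largest scores
    return float(np.mean(y_true[top_n] == 1))


# toy example with made-up scores: 2 outliers, so n = 2 by default
y = np.array([0, 0, 1, 0, 1, 0])
scores = np.array([0.1, 0.9, 0.8, 0.2, 0.7, 0.3])
print(precision_at_n(y, scores))       # top-2 = indices 1, 2 -> 0.5
print(precision_at_n(y, scores, n=3))  # top-3 = indices 1, 2, 4 -> 0.666...
```

Unlike ROC-AUC, precision@n only looks at the top of the ranking, which is closer to how an analyst would inspect the highest-scored points first.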
```python
roc_df
```

| Data | # Samples | # Dimensions | Outlier Perc (%) | DeepSVDD | RDP | REPEN | RCA | GOAD | ICL | Neutral |
|---|---|---|---|---|---|---|---|---|---|---|
| arrhythmia | 452 | 274 | 14.6018 | 0.655 | 0.7538 | 0.7167 | 0.7666 | 0.3461 | 0.3899 | 0.6183 |
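Every model column in the results table is a ROC-AUC computed with scikit-learn's `roc_auc_score`. ROC-AUC equals the probability that a randomly chosen outlier receives a higher anomaly score than a randomly chosen inlier, with ties counted as one half. The pairwise sketch below (`auc_by_pairs` is a hypothetical name, and its quadratic memory use makes it an illustration, not how scikit-learn computes the metric) checks that reading on made-up scores.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def auc_by_pairs(y_true, scores):
    """ROC-AUC as the probability that a random outlier (label 1) outscores a
    random inlier (label 0); tied pairs count as 1/2."""
    y_true = np.asarray(y_true).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # compare every (outlier, inlier) pair via broadcasting: O(n_pos * n_neg) memory
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


y = np.array([0, 0, 1, 0, 1, 0])
scores = np.array([0.1, 0.9, 0.8, 0.2, 0.7, 0.3])
print(auc_by_pairs(y, scores), roc_auc_score(y, scores))  # 0.75 0.75
# negating the scores reverses the ranking, so the AUC becomes 1 - 0.75
print(auc_by_pairs(y, -scores))  # 0.25
```

Under this reading, 0.5 corresponds to a random ranking, so the GOAD (0.3461) and ICL (0.3899) entries for arrhythmia mean that, on that run, a randomly chosen outlier was scored below a randomly chosen inlier more often than not.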


## Reference

1. Ting et al. [**Isolation Distributional Kernel: A New Tool for Point & Group Anomaly Detection**](https://ieeexplore.ieee.org/abstract/document/9573389). *IEEE Transactions on Knowledge and Data Engineering*, 2021.
2. Bandaragoda et al. **Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles**. *Computational Intelligence*, 2018.
3. Han et al. [**ADBench: Anomaly Detection Benchmark**](https://proceedings.neurips.cc/paper_files/paper/2022/file/cf93972b116ca5268827d575f2cc226b-Paper-Datasets_and_Benchmarks.pdf). *Advances in Neural Information Processing Systems*, 2022.
4. [**DeepOD** (github.com/xuhongzuo/DeepOD)](https://github.com/xuhongzuo/DeepOD/tree/main)