Anomaly Detection Algorithms

Hotelling's T-squared statistic

Hotelling's T-squared statistic is one of the most popular statistical process control techniques. It is based on the Mahalanobis distance: it measures the distance between a new observation vector and the mean vector of previously observed normal data, scaled by the covariance of that data.
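
As a rough illustration of the distance described above, here is a minimal NumPy sketch. The function names, the synthetic data, and the empirical 99% control limit are assumptions made for this example and are not taken from the linked notebook.

```python
import numpy as np

def fit_hotelling(X_normal):
    """Estimate the mean vector and inverse covariance from normal data."""
    mean = X_normal.mean(axis=0)
    cov = np.cov(X_normal, rowvar=False)
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates a near-singular covariance
    return mean, cov_inv

def t2_statistic(X, mean, cov_inv):
    """Squared Mahalanobis distance of each row of X from the normal mean."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))            # synthetic "normal" operation
mean, cov_inv = fit_hotelling(X_train)

# Assumption: an empirical 99% quantile stands in for a theoretical control limit.
threshold = np.quantile(t2_statistic(X_train, mean, cov_inv), 0.99)

X_new = np.array([[0.1, -0.2, 0.3],            # close to the normal mean
                  [6.0, 6.0, 6.0]])            # far from it
scores = t2_statistic(X_new, mean, cov_inv)
is_anomaly = scores > threshold
```

In practice the control limit is often derived from an F or chi-squared distribution; the empirical quantile is used here only to keep the sketch free of extra dependencies.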

[notebook] [paper]

Hotelling's T-squared statistic + Q statistic (SPE index) based on PCA

The combined index is based on PCA. Hotelling’s T-squared statistic measures variation within the principal component subspace, while the Q statistic (SPE index) measures the squared norm of the sample vector's projection onto the residual subspace. To avoid monitoring two separate indicators (Hotelling's T-squared and Q statistics), we combine them with a logical OR: a sample is flagged if either statistic exceeds its limit.
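
The two statistics and their OR combination could be sketched as follows. This is an illustrative NumPy version: the helper names, the synthetic data, and the empirical 99% limits are assumptions for the example, not the notebook's implementation.

```python
import numpy as np

def fit_pca_monitor(X_normal, n_components):
    """Return the mean, loading matrix P and retained eigenvalues of normal data."""
    mean = X_normal.mean(axis=0)
    cov = np.cov(X_normal - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return mean, eigvecs[:, :n_components], eigvals[:n_components]

def t2_and_q(X, mean, P, lam):
    Xc = X - mean
    scores = Xc @ P                             # coordinates in the principal subspace
    t2 = np.sum(scores**2 / lam, axis=1)        # variation inside the subspace
    residual = Xc - scores @ P.T                # part left in the residual subspace
    q = np.sum(residual**2, axis=1)             # squared prediction error (SPE)
    return t2, q

rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))
X_train = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(1000, 5))

mean, P, lam = fit_pca_monitor(X_train, n_components=2)
t2_train, q_train = t2_and_q(X_train, mean, P, lam)
t2_limit = np.quantile(t2_train, 0.99)          # assumption: empirical limits
q_limit = np.quantile(q_train, 0.99)

# A unit direction orthogonal to the principal subspace.
r = rng.normal(size=5)
v = r - P @ (P.T @ r)
v /= np.linalg.norm(v)

X_new = np.vstack([
    mean,                                       # nominal point
    mean + 10 * np.sqrt(lam[0]) * P[:, 0],      # extreme, but inside the subspace
    mean + 3.0 * v,                             # leaves the subspace
])
t2_new, q_new = t2_and_q(X_new, mean, P, lam)
alarm = (t2_new > t2_limit) | (q_new > q_limit)  # combined index: logical OR
```

The second test point trips only T-squared and the third trips only Q, which is why neither statistic alone is enough in this sketch.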

[notebook] [paper]

Isolation Forest

Isolation Forest (iForest) builds an ensemble of isolation trees (iTrees) for a given data set. Anomalies are the instances with short average path lengths across the iTrees.
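
To make the path-length idea concrete, here is a deliberately simplified NumPy sketch of an isolation forest. It is not the reference implementation or the notebook's code; the dictionary-based tree, the function names, and the default parameters are assumptions for illustration.

```python
import numpy as np

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup (normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_itree(X, rng, depth, max_depth):
    n = len(X)
    if depth >= max_depth or n <= 1:
        return {"size": n}                       # external node
    feature = rng.integers(X.shape[1])           # random split feature
    lo, hi = X[:, feature].min(), X[:, feature].max()
    if lo == hi:
        return {"size": n}
    split = rng.uniform(lo, hi)                  # random split value
    mask = X[:, feature] < split
    return {"feature": feature, "split": split,
            "left": build_itree(X[mask], rng, depth + 1, max_depth),
            "right": build_itree(X[~mask], rng, depth + 1, max_depth)}

def path_length(x, node, depth=0):
    if "size" in node:
        return depth + c_factor(node["size"])    # adjust for the unbuilt subtree
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)

def fit_iforest(X, n_trees=100, sample_size=256, seed=0):
    rng = np.random.default_rng(seed)
    sample_size = min(sample_size, len(X))
    max_depth = int(np.ceil(np.log2(sample_size)))
    trees = []
    for _ in range(n_trees):
        idx = rng.choice(len(X), size=sample_size, replace=False)
        trees.append(build_itree(X[idx], rng, 0, max_depth))
    return trees, sample_size

def anomaly_score(X, trees, sample_size):
    avg = np.array([np.mean([path_length(x, t) for t in trees]) for x in X])
    return 2.0 ** (-avg / c_factor(sample_size))  # shorter paths give higher scores

X_train = np.random.default_rng(1).normal(size=(500, 2))
trees, psi = fit_iforest(X_train)
scores = anomaly_score(np.array([[0.0, 0.0], [8.0, 8.0]]), trees, psi)
```

Here `scores[1]` (the distant point) should come out higher than `scores[0]`. For real use, a maintained implementation such as scikit-learn's `IsolationForest` is the more sensible choice.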

[notebook] [paper]

LSTM-based NN (LSTM)

LSTM-based neural network for anomaly detection using reconstruction error as an anomaly score.

[notebook] [paper]

Feed-Forward Autoencoder

Feed-forward neural network with autoencoder architecture for anomaly detection using reconstruction error as an anomaly score.
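
The reconstruction-error idea can be illustrated with a very small autoencoder written directly in NumPy. This is a toy sketch under stated assumptions: the single tanh bottleneck layer, the learning rate, the synthetic data, and the 99% quantile threshold are all illustrative choices, not the architecture used in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" data: 5 correlated features driven by 2 latent factors.
latent = rng.normal(size=(400, 2))
X_raw = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(400, 5))
mu, sigma = X_raw.mean(axis=0), X_raw.std(axis=0)
X_train = (X_raw - mu) / sigma                   # standardise features

n_in, n_hidden = 5, 2                            # assumption: 2-unit bottleneck
W1 = 0.1 * rng.normal(size=(n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.normal(size=(n_hidden, n_in)); b2 = np.zeros(n_in)

def forward(X):
    H = np.tanh(X @ W1 + b1)                     # encoder
    return H, H @ W2 + b2                        # decoder (linear output)

lr = 0.05
for _ in range(3000):                            # plain full-batch gradient descent on MSE
    H, X_hat = forward(X_train)
    grad_out = 2.0 * (X_hat - X_train) / X_train.size
    grad_pre = (grad_out @ W2.T) * (1.0 - H**2)  # backprop through tanh
    W2 -= lr * (H.T @ grad_out); b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * (X_train.T @ grad_pre); b1 -= lr * grad_pre.sum(axis=0)

def reconstruction_error(X):
    _, X_hat = forward(X)
    return np.mean((X - X_hat) ** 2, axis=1)     # anomaly score per sample

threshold = np.quantile(reconstruction_error(X_train), 0.99)

X_new_raw = np.vstack([X_raw[0], mu + 20.0 * sigma])   # a seen point and an extreme one
scores = reconstruction_error((X_new_raw - mu) / sigma)
is_anomaly = scores > threshold
```

The network only learns to reproduce patterns it saw during training, so the extreme point is reconstructed poorly and its score exceeds the threshold.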

[notebook] [paper]

Convolutional Autoencoder (Conv-AE)

A convolutional autoencoder that detects anomalies in time series data, using reconstruction error as the anomaly score.
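
Models of this kind usually consume fixed-length windows rather than the raw series. The sketch below shows one way to cut windows and score them. Note that `reconstruct` is a hypothetical stand-in (it returns the average training window) used in place of a trained convolutional autoencoder, and the window length and threshold are assumptions for the example.

```python
import numpy as np

def sliding_windows(series, window, step=1):
    """(n_timesteps, n_features) -> (n_windows, window, n_features)."""
    starts = range(0, len(series) - window + 1, step)
    return np.stack([series[s:s + window] for s in starts])

def window_scores(windows, reconstructed):
    """Mean absolute reconstruction error per window."""
    return np.mean(np.abs(windows - reconstructed), axis=(1, 2))

rng = np.random.default_rng(0)
t = np.arange(400)
train = np.sin(2 * np.pi * t / 20)[:, None] + 0.05 * rng.normal(size=(400, 1))
test = np.sin(2 * np.pi * t / 20)[:, None] + 0.05 * rng.normal(size=(400, 1))
test[205, 0] += 5.0                              # injected spike

window = 20
train_w = sliding_windows(train, window, step=window)
test_w = sliding_windows(test, window, step=window)

# Hypothetical stand-in for a trained model: always predict the mean training window.
template = train_w.mean(axis=0)
def reconstruct(windows):
    return np.broadcast_to(template, windows.shape)

train_scores = window_scores(train_w, reconstruct(train_w))
threshold = np.quantile(train_scores, 0.99)      # assumption: empirical threshold

test_scores = window_scores(test_w, reconstruct(test_w))
flagged = test_scores > threshold
worst_window = int(np.argmax(test_scores))       # window containing index 205
```

With non-overlapping windows of length 20, the spike at index 205 falls in window 10, which receives the largest score.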

[notebook] [paper]

LSTM Autoencoder (LSTM-AE)

If your inputs are sequences, rather than vectors or 2D images, you may want the encoder and decoder to be a type of model that can capture temporal structure, such as an LSTM. To build an LSTM-based autoencoder, first use an LSTM encoder to turn the input sequences into a single vector that contains information about the entire sequence, then repeat this vector n times (where n is the number of timesteps in the output sequence), and run an LSTM decoder to turn this constant sequence into the target sequence.

A sequence-to-sequence (LSTM-based) autoencoder that detects anomalies in time series data, using reconstruction error as the anomaly score.

[notebook] [paper1] [paper2]

LSTM Variational Autoencoder (LSTM-VAE)

An LSTM variational autoencoder that detects anomalies in time series data, using reconstruction error as the anomaly score.

[notebook] [paper] [code]

Variational Autoencoder (VAE)

A variational autoencoder (VAE) that detects anomalies in time series data, using reconstruction error as the anomaly score. A VAE is an autoencoder that learns a latent variable model for its input data: instead of letting the neural network learn an arbitrary function, you learn the parameters of a probability distribution that models your data. Sampling points from this distribution generates new input data samples, which is why a VAE is a "generative model".
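
The "parameters of a probability distribution" part can be made concrete with two small helpers. This sketch assumes the common formulation in which the encoder outputs the mean and log-variance of a diagonal Gaussian; the function names and the example values are illustrative and not taken from the notebook.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Reparameterised sample z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), per sample."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)

# Pretend encoder outputs for two inputs (assumed values, 2-dimensional latent space).
mu = np.array([[0.0, 0.0],
               [1.0, -1.0]])
log_var = np.zeros((2, 2))                       # unit variance in every dimension

z = sample_latent(mu, log_var, rng)              # what the decoder would receive
kl = kl_to_standard_normal(mu, log_var)          # regulariser added to the reconstruction loss
```

The first row matches the standard normal prior exactly, so its KL term is zero; the second row is shifted by one unit in each dimension, which costs 0.5 per dimension.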

[notebook] [paper1] [paper2] [code]

MSCRED

MSCRED (Multi-Scale Convolutional Recurrent Encoder-Decoder) first constructs multi-scale (resolution) signature matrices to characterize multiple levels of system status across different time steps. In particular, the different levels of system status are used to indicate the severity of different abnormal incidents. Subsequently, given the signature matrices, a convolutional encoder is employed to encode the inter-sensor (time series) correlation patterns, and an attention-based Convolutional Long Short-Term Memory (ConvLSTM) network is developed to capture the temporal patterns. Finally, from the feature maps that encode the inter-sensor correlations and temporal information, a convolutional decoder reconstructs the signature matrices, and the residual signature matrices are then used to detect and diagnose anomalies. The intuition is that MSCRED may not reconstruct the signature matrices well if it has never observed similar system statuses before.
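
The first step, building the multi-scale signature matrices, could look roughly like this. It is a sketch under assumptions: each matrix holds pairwise inner products between sensors over the last `w` time steps, rescaled by `w`, and the window lengths `(10, 30, 60)` are illustrative defaults. The encoder, ConvLSTM, and decoder stages are not shown.

```python
import numpy as np

def signature_matrix(X, t, w):
    """Pairwise sensor inner products over the w steps ending at time t.

    X has shape (n_timesteps, n_sensors); the result is (n_sensors, n_sensors).
    """
    segment = X[t - w + 1 : t + 1]
    return segment.T @ segment / w               # rescale by the window length

def multi_scale_signatures(X, t, windows=(10, 30, 60)):
    """Stack one signature matrix per window length (one channel per scale)."""
    return np.stack([signature_matrix(X, t, w) for w in windows])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                    # synthetic readings from 4 sensors

S = multi_scale_signatures(X, t=199)             # shape (3, 4, 4)

# Given a reconstruction S_hat from a trained model, the residual S - S_hat
# would be the quantity inspected for anomalies.
```

Each matrix is symmetric, and its diagonal holds the mean squared reading of one sensor over the window, so changes in either correlation structure or signal energy show up in the residual.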

[notebook] [paper]

MSET

MSET (multivariate state estimation technique) is a non-parametric statistical modeling method that calculates estimated values as a weighted average of historical data. In terms of procedure, MSET is similar to some nonparametric regression methods, such as auto-associative kernel regression.
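
The weighted-average idea can be sketched with auto-associative kernel regression (AAKR), the related method named above. Note that this is AAKR and not MSET itself, which uses its own similarity operator; the Gaussian kernel, the bandwidth, and the toy memory matrix are assumptions for this example.

```python
import numpy as np

def aakr_estimate(memory, x, bandwidth=0.3):
    """Estimate the 'expected normal' state for x as a kernel-weighted
    average of historical normal states stored in memory."""
    d2 = np.sum((memory - x) ** 2, axis=1)
    # Subtracting the minimum distance avoids underflow and cancels on normalisation.
    weights = np.exp(-(d2 - d2.min()) / (2.0 * bandwidth**2))
    return weights @ memory / weights.sum()

# Historical normal states: the second sensor always reads twice the first.
x1 = np.linspace(0.0, 2.0, 50)
memory = np.column_stack([x1, 2.0 * x1])

x_normal = np.array([1.0, 2.0])                  # respects the learned relationship
x_faulty = np.array([1.0, 5.0])                  # second sensor drifted

residual_normal = np.linalg.norm(x_normal - aakr_estimate(memory, x_normal))
residual_faulty = np.linalg.norm(x_faulty - aakr_estimate(memory, x_faulty))
```

Because every estimate is a blend of remembered normal states, an observation that breaks the learned relationship leaves a large residual, and that residual serves as the anomaly score.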

[notebook] [paper]