
# Anomaly Detection Using Gaussian Distribution

## Jupyter Demos

▶️ Demo | Anomaly Detection - find anomalies in server operational parameters like `latency` and `threshold`

## Gaussian (Normal) Distribution

The normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate.

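Let's say:

$$x \in \mathbb{R}$$

If $x$ is normally distributed, this is written as follows:

$$x \sim \mathcal{N}(\mu, \sigma^2)$$

- $\mu$ - mean value,

- $\sigma^2$ - variance,

- "~" means that "x is distributed as ..."

Then the Gaussian distribution (the probability density of $x$ under a distribution with a given mean and variance) is given by:

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$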
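As an illustration of the Gaussian density, here is a minimal sketch in plain Python. The function name `gaussian_pdf` and its argument names are assumptions made for this example and are not taken from the repository code.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Density of a normal distribution with mean mu and variance sigma2 at x."""
    # Normalizing constant: 1 / sqrt(2 * pi * sigma^2)
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    # Exponent: -(x - mu)^2 / (2 * sigma^2)
    return norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))


# The density is highest at the mean and falls off symmetrically around it.
at_mean = gaussian_pdf(0.0, 0.0, 1.0)
far_from_mean = gaussian_pdf(3.0, 0.0, 1.0)
```

Note that the returned value is a density, not a probability, so it can exceed 1 when the variance is small.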

## Estimating Parameters for a Gaussian

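We may use the following formulas to estimate the Gaussian parameters (mean and variance) for the $i$-th feature:

$$\mu_i = \frac{1}{m} \sum_{j=1}^{m} x_i^{(j)}$$

$$\sigma_i^2 = \frac{1}{m} \sum_{j=1}^{m} \left(x_i^{(j)} - \mu_i\right)^2$$

- $m$ - number of training examples.

- $n$ - number of features.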
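The per-feature estimates can be sketched as follows, assuming the training set is a list of `m` rows with `n` numeric features each. The name `estimate_gaussian` and the sample data are hypothetical. The variance is divided by `m` (not `m - 1`), which is the usual maximum-likelihood form.

```python
def estimate_gaussian(X):
    """Return (mu, sigma2): per-feature means and variances of the rows in X."""
    m = len(X)      # number of training examples
    n = len(X[0])   # number of features
    mu = [sum(row[i] for row in X) / m for i in range(n)]
    sigma2 = [sum((row[i] - mu[i]) ** 2 for row in X) / m for i in range(n)]
    return mu, sigma2


# Hypothetical training set with m = 3 examples and n = 2 features.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
mu, sigma2 = estimate_gaussian(X)
```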

## Density Estimation

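So we have a training set:

$$\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$$

We assume that each feature of the training set is normally distributed:

$$x_1 \sim \mathcal{N}(\mu_1, \sigma_1^2), \quad x_2 \sim \mathcal{N}(\mu_2, \sigma_2^2), \quad \dots, \quad x_n \sim \mathcal{N}(\mu_n, \sigma_n^2)$$

Then, treating the features as independent, the density of an example is the product of the per-feature densities:

$$p(x) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2)$$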
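The density of a whole example can be sketched as a product of one-dimensional Gaussian densities, one per feature. The helper names `gaussian_pdf` and `density` are assumptions for this example, and the product form relies on treating the features as independent.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Density of a normal distribution with mean mu and variance sigma2 at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def density(x, mu, sigma2):
    """Product of per-feature Gaussian densities for one example x."""
    p = 1.0
    for x_i, mu_i, sigma2_i in zip(x, mu, sigma2):
        p *= gaussian_pdf(x_i, mu_i, sigma2_i)
    return p


# Two features, each with zero mean and unit variance (hypothetical values).
p_center = density([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
p_far = density([3.0, 3.0], [0.0, 0.0], [1.0, 1.0])
```

For many features the product can underflow to zero. Summing log-densities instead is a common workaround, though it is not part of the formulas in this README.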

## Anomaly Detection Algorithm

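1. Choose features $x_i$ that might be indicative of anomalous examples ($\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$).
2. Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$ using formulas:

$$\mu_i = \frac{1}{m} \sum_{j=1}^{m} x_i^{(j)}$$

$$\sigma_i^2 = \frac{1}{m} \sum_{j=1}^{m} \left(x_i^{(j)} - \mu_i\right)^2$$

3. Given new example $x$, compute $p(x)$:

$$p(x) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right)$$

Anomaly if $p(x) < \varepsilon$

- $\varepsilon$ - probability threshold.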
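The three steps can be put together in a short sketch. All names here (`fit_gaussian`, `density`, `is_anomaly`), the sample readings, and the threshold value `1e-3` are hypothetical choices for illustration. In practice the threshold would be tuned on labeled validation data. The sketch also assumes every feature has non-zero variance.

```python
import math


def fit_gaussian(X):
    """Step 2: estimate per-feature mean and variance from the training rows."""
    m, n = len(X), len(X[0])
    mu = [sum(row[i] for row in X) / m for i in range(n)]
    sigma2 = [sum((row[i] - mu[i]) ** 2 for row in X) / m for i in range(n)]
    return mu, sigma2


def density(x, mu, sigma2):
    """Step 3: p(x) as a product of per-feature Gaussian densities."""
    p = 1.0
    for x_i, mu_i, s2_i in zip(x, mu, sigma2):
        p *= math.exp(-((x_i - mu_i) ** 2) / (2.0 * s2_i)) / math.sqrt(2.0 * math.pi * s2_i)
    return p


def is_anomaly(x, mu, sigma2, epsilon):
    """Flag x as anomalous when its density falls below the threshold epsilon."""
    return density(x, mu, sigma2) < epsilon


# Step 1: two chosen features per example (made-up readings).
X_train = [[10.0, 5.0], [11.0, 5.5], [9.0, 4.5], [10.5, 5.2], [9.5, 4.8]]
mu, sigma2 = fit_gaussian(X_train)

epsilon = 1e-3  # assumed threshold, not a recommended value
typical = is_anomaly([10.2, 5.1], mu, sigma2, epsilon)   # close to the training data
unusual = is_anomaly([20.0, 9.0], mu, sigma2, epsilon)   # far from the training data
```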

## Algorithm Evaluation

The algorithm may be evaluated using F1 score.

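The F1 score is the harmonic mean of precision and recall. It reaches its best value at 1 (perfect precision and recall) and its worst at 0.

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

$$\text{precision} = \frac{tp}{tp + fp}, \quad \text{recall} = \frac{tp}{tp + fn}$$

Where:

tp - number of true positives.

fp - number of false positives.

fn - number of false negatives.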
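A minimal sketch of the F1 computation from these counts, assuming labels are encoded as `1` for anomaly and `0` for normal. The function name `f1_score` and the sample labels are hypothetical. Returning `0.0` when there are no true positives is a convention chosen here to avoid division by zero.

```python
def f1_score(y_true, y_pred):
    """F1 score for binary labels where 1 marks an anomaly and 0 a normal example."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # Precision or recall is zero or undefined; treat the score as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]
score = f1_score(y_true, y_pred)
```

One way to use this is to compute the score for several candidate thresholds on a labeled validation set and keep the threshold with the highest F1.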