Anomaly Detection Using Gaussian Distribution
Gaussian (Normal) Distribution
The normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate.
If x is normally distributed with mean μ and variance σ², we write:

$$x \sim \mathcal{N}(\mu, \sigma^2)$$
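The Gaussian probability density function gives the relative likelihood of a value x under a distribution with mean μ and variance σ². It is a density, not a probability, since x is continuous:

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$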
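As a quick check of the single-variable density formula, here is a minimal Python sketch. The function name `gaussian_pdf` is hypothetical, and it takes the variance `sigma2` (not the standard deviation) to match the notation p(x; μ, σ²).

```python
import math

def gaussian_pdf(x: float, mu: float, sigma2: float) -> float:
    """Density of N(mu, sigma2) at x, where sigma2 is the variance."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma2)  # equals 1 / (sqrt(2*pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))

# At the mean of a standard normal the density is 1/sqrt(2*pi).
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))  # 0.3989
```

The density peaks at x = μ and falls off symmetrically on both sides, which is why values far from the mean receive small p(x).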
Estimating Parameters for a Gaussian
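We may use the following formulas to estimate the Gaussian parameters (mean and variance) for the i-th feature, where m is the number of training examples and $x_i^{(j)}$ is the value of feature i in the j-th example:

$$\mu_i = \frac{1}{m} \sum_{j=1}^{m} x_i^{(j)}$$

$$\sigma_i^2 = \frac{1}{m} \sum_{j=1}^{m} \left(x_i^{(j)} - \mu_i\right)^2$$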
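The two estimates can be computed for all features at once with NumPy. This is a minimal sketch assuming the training data is an (m, n) array with one example per row; `estimate_gaussian` is a hypothetical name.

```python
import numpy as np

def estimate_gaussian(X):
    """Return per-feature (mu, sigma2) for an (m, n) matrix X, one row per example."""
    m = X.shape[0]
    mu = X.sum(axis=0) / m                    # mu_i = (1/m) * sum_j x_i^(j)
    sigma2 = ((X - mu) ** 2).sum(axis=0) / m  # 1/m normalization, same as np.var(X, axis=0)
    return mu, sigma2

X = np.array([[1.0, 10.0],
              [2.0, 12.0],
              [3.0, 14.0]])
mu, sigma2 = estimate_gaussian(X)
print(mu, sigma2)
```

For this toy matrix, mu is [2, 12] and sigma2 is [2/3, 8/3]. The formulas divide by m rather than m − 1, which is what NumPy's `np.var` does with its default `ddof=0`.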
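So we have a training set of m examples, each with n features:

$$\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}, \quad x \in \mathbb{R}^n$$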
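We assume that each feature of the training set is normally distributed:

$$x_1 \sim \mathcal{N}(\mu_1, \sigma_1^2), \quad x_2 \sim \mathcal{N}(\mu_2, \sigma_2^2), \quad \dots, \quad x_n \sim \mathcal{N}(\mu_n, \sigma_n^2)$$

If we further treat the features as independent, the density of an example x is the product of the per-feature densities:

$$p(x) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2)$$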
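Under the independence assumption, p(x) is just the product of one-dimensional Gaussian densities. A minimal NumPy sketch follows; `density` is a hypothetical name, and it accepts either a single example of shape (n,) or a batch of shape (k, n).

```python
import numpy as np

def density(x, mu, sigma2):
    """p(x) as the product of per-feature Gaussian densities (features assumed independent).

    x may be a single example of shape (n,) or a batch of shape (k, n).
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    per_feature = np.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
    return np.prod(per_feature, axis=-1)  # multiply across features, one value per example

# Two standard-normal features evaluated at their means: (1/sqrt(2*pi))**2 = 1/(2*pi).
print(density([0.0, 0.0], [0.0, 0.0], [1.0, 1.0]))
```

With many features the product of small numbers can underflow to zero; summing log-densities instead is a common workaround, though it is not part of the formula above.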
Anomaly Detection Algorithm
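- Fit the parameters μ_i and σ_i² for every feature on the training set, using the formulas above.
- Given a new example x, compute p(x):

$$p(x) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right)$$

- Flag x as an anomaly if p(x) < ε, where ε is a chosen probability threshold.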
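The steps above can be sketched end to end in NumPy. This is an illustrative sketch, not a definitive implementation: the helper names `fit`, `density`, and `detect` are hypothetical, the training data is synthetic, and the threshold `epsilon=1e-4` is an arbitrary example value rather than a recommended setting.

```python
import numpy as np

def fit(X_train):
    """Estimate per-feature mean and variance (1/m normalization, as in the formulas)."""
    return X_train.mean(axis=0), X_train.var(axis=0)

def density(X, mu, sigma2):
    """Product of per-feature Gaussian densities for each row of X."""
    per_feature = np.exp(-((X - mu) ** 2) / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
    return np.prod(per_feature, axis=-1)

def detect(X_new, mu, sigma2, epsilon):
    """Return a boolean mask: True where p(x) < epsilon, i.e. the example is flagged."""
    return density(X_new, mu, sigma2) < epsilon

# Synthetic "normal" training data: two features centred at 5 and 50.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 50.0], scale=[1.0, 5.0], size=(500, 2))
mu, sigma2 = fit(X_train)

X_new = np.array([[5.0, 50.0],    # close to the training mean
                  [12.0, 90.0]])  # far from it in both features
print(detect(X_new, mu, sigma2, epsilon=1e-4))
```

Only the second row, which lies about seven standard deviations out in each feature, should be flagged.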
The algorithm may be evaluated on a labeled validation set using the F1 score.
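The F1 score is the harmonic mean of precision and recall. It reaches its best value at 1 (perfect precision and recall) and its worst at 0:

$$F_1 = 2 \cdot \frac{\text{prec} \cdot \text{rec}}{\text{prec} + \text{rec}}, \qquad \text{prec} = \frac{tp}{tp + fp}, \qquad \text{rec} = \frac{tp}{tp + fn}$$

where: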
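- tp: number of true positives (anomalies correctly flagged).
- fp: number of false positives (normal examples wrongly flagged as anomalies).
- fn: number of false negatives (anomalies that were missed).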
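The F1 computation can be written directly from tp, fp, and fn. The sketch below also shows one common way to pick ε, which is assumed here rather than taken from the text: scan candidate thresholds between the smallest and largest p(x) on a labeled validation set and keep the one with the highest F1. The names `f1_score` and `select_threshold` and the `steps=1000` grid size are hypothetical.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 for boolean labels where True marks an anomaly (the positive class)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_pred & y_true)    # flagged and actually anomalous
    fp = np.sum(y_pred & ~y_true)   # flagged but actually normal
    fn = np.sum(~y_pred & y_true)   # anomalous but not flagged
    if tp == 0:
        return 0.0  # precision and recall are 0 (or undefined); treat F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))

def select_threshold(p_val, y_val, steps=1000):
    """Scan candidate epsilons and keep the one with the highest F1 on labeled data."""
    p_val = np.asarray(p_val, dtype=float)
    best_eps, best_f1 = 0.0, -1.0
    for eps in np.linspace(p_val.min(), p_val.max(), steps):
        f1 = f1_score(y_val, p_val < eps)  # flag examples with p(x) < eps
        if f1 > best_f1:
            best_eps, best_f1 = float(eps), f1
    return best_eps, best_f1

# tp = 1, fp = 1, fn = 1, so precision = recall = 0.5.
print(f1_score([True, False, True, False], [True, True, False, False]))  # 0.5
```

F1 is preferred over plain accuracy here because anomalies are usually rare: a detector that never flags anything can score high accuracy while having zero recall.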