Tukey's Outlier Filter
To detect outliers within a distribution of data points, a method based on [1] is used. Subtracting the first quartile Q1 from the third quartile Q3 yields a robust statistic called the interquartile range (IQR). A value y of this distribution is considered an outlier if either y > Q3 + k*IQR or y < Q1 - k*IQR holds. The free parameter k is typically set to 1.5, but can be adjusted to tune the sensitivity of the detection: a larger k flags fewer values, a smaller k flags more.
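The rule above can be tried out on a small sample. The following is a minimal sketch, not the filter's actual implementation: the function name `tukey_outliers` is hypothetical, and the quartiles are estimated with Python's `statistics.quantiles` using the "inclusive" method, which is only one of several common quartile conventions.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    # The "inclusive" method is an assumption; other conventions give
    # slightly different quartiles on small samples.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [y for y in values if y < lower or y > upper]


sample = [10, 12, 11, 13, 12, 11, 50]
outliers = tukey_outliers(sample)
print(outliers)
```

For this sample the quartiles are Q1 = 11 and Q3 = 12.5, so IQR = 1.5 and the accepted range with k = 1.5 is [8.75, 14.75]; only the value 50 falls outside it.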
The 25th and 75th percentiles (the first and third quartiles) of the error distribution must be available in the metric store. The flag is then set according to the following rule:
Formula: iqr = quantile_75 - quantile_25
input < quantile_25 - (iqr_scaling * iqr) => flag = -1
input > quantile_75 + (iqr_scaling * iqr) => flag = 1
where iqr_scaling is a freely selectable parameter (default: iqr_scaling = 1.5) that defines the sensitivity of the detection.
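As an illustration, the flag rule can be written as a small function. This is a sketch under stated assumptions rather than the filter's real code: the name `tukey_flag` is hypothetical, and returning 0 for values inside the accepted range is an assumption, since the formula only specifies the -1 and 1 cases.

```python
def tukey_flag(value, quantile_25, quantile_75, iqr_scaling=1.5):
    """Return -1 for a low outlier, 1 for a high outlier, else 0."""
    iqr = quantile_75 - quantile_25
    if value < quantile_25 - (iqr_scaling * iqr):
        return -1
    if value > quantile_75 + (iqr_scaling * iqr):
        return 1
    # Assumption: values inside the accepted range are flagged 0.
    return 0


# With quantile_25 = 11 and quantile_75 = 12.5 the accepted range
# is [8.75, 14.75] for the default iqr_scaling of 1.5.
flags = [tukey_flag(v, 11, 12.5) for v in (5, 12, 50)]
print(flags)
```

Both comparisons are strict, so a value lying exactly on a boundary is not flagged as an outlier.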
{
  "TukeysFilter": {
    "scheduler_options": {},
    "worker_options": {
      "service1:eu:cpu": {
        "quantile_25": "service.service1.eu.quantile_25*",
        "quantile_75": "service.service1.eu.quantile_75*",
        "iqr_scaling": 1.5,
        "metrics": "host*region:eu*cpu*",
        "default": 0
      }
    }
  }
}
- quantile_25: Expression for the Quantile_25 metrics
- quantile_75: Expression for the Quantile_75 metrics
- metrics: Expression for the metrics (e.g. instance metrics) to compare with Quantile_25/Quantile_75
- iqr_scaling: Sensitivity factor for scaling the interquartile range
- default: Default value used for missing datapoints
Typical use: comparing instance-level metrics within a service group to identify, for example, memory leaks or deviations from the "norm".
[1] John Tukey, Exploratory Data Analysis, Addison-Wesley, 1977, pp. 43-44.