Tukey's Outlier Filter

Algorithm

To detect outliers within a distribution of data points, a method based on [1] is used. Subtracting the first quartile Q1 from the third quartile Q3 yields a robust statistic called the interquartile range (IQR = Q3 - Q1). A value y of the distribution is considered an outlier if either y > Q3 + k*IQR or y < Q1 - k*IQR holds. The parameter k is typically set to 1.5 but can be chosen freely to adjust the sensitivity of the detection: larger values of k flag fewer points.
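The rule above can be sketched in a few lines of Python. This is only an illustration, not the filter's actual implementation: the function name `tukey_outliers` and the sample values are made up, and the quartile estimation method (`method="inclusive"` in the standard library's `statistics.quantiles`) is an assumption, since different estimators give slightly different fences.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values that fall outside Tukey's fences.

    Hypothetical helper for illustration; the quartile estimator
    ("inclusive") is an assumption, not something the filter prescribes.
    """
    # With n=4, quantiles() returns the three quartile cut points Q1, Q2, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [y for y in values if y < lower_fence or y > upper_fence]


# Made-up sample: seven similar values and one obvious deviation.
sample = [10, 12, 11, 13, 12, 11, 10, 95]
print(tukey_outliers(sample))
```

Raising `k` widens both fences, so fewer values are reported.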

Evaluation

The 25th and 75th percentiles (quantile_25 and quantile_75) of the error distribution must be available in the metric store. The flag is set according to the following formula:

iqr = quantile_75 - quantile_25

input < quantile_25 - (iqr_scaling * iqr) => flag = -1
input > quantile_75 + (iqr_scaling * iqr) => flag = 1

where iqr_scaling is a freely selectable parameter (default: iqr_scaling = 1.5) that defines the sensitivity of the detection.
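As a minimal sketch, the flag rule could be written as the following Python function. The name `tukey_flag` is hypothetical, and returning 0 for values inside the fences is an assumption: the formula above only specifies the -1 and 1 cases.

```python
def tukey_flag(value, quantile_25, quantile_75, iqr_scaling=1.5):
    """Return -1 or 1 for values below or above Tukey's fences.

    Hypothetical helper for illustration only. The return value 0 for
    in-range inputs is an assumption, not taken from the formula.
    """
    iqr = quantile_75 - quantile_25
    if value < quantile_25 - (iqr_scaling * iqr):
        return -1
    if value > quantile_75 + (iqr_scaling * iqr):
        return 1
    return 0  # assumption: no flag when the value lies within the fences


# With quantile_25 = 20 and quantile_75 = 30 the fences are 5 and 45.
print(tukey_flag(3, 20, 30))
print(tukey_flag(25, 20, 30))
print(tukey_flag(50, 20, 30))
```

Note that the comparisons are strict, so a value lying exactly on a fence is not flagged.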

Configuration

{  
    "TukeysFilter": {
        "scheduler_options": {},
        "worker_options": {
            "service1:eu:cpu": {
                "quantile_25": "service.service1.eu.quantile_25*",
                "quantile_75": "service.service1.eu.quantile_75*",
                "iqr_scaling": 1.5,
                "metrics": "host*region:eu*cpu*",
                "default": 0
            }
        }
    }
}

quantile_25: expression selecting the 25th-percentile (Q1) metrics
quantile_75: expression selecting the 75th-percentile (Q3) metrics
metrics: expression selecting the metrics (e.g. instance metrics) to compare against quantile_25/quantile_75
iqr_scaling: sensitivity factor used to scale the interquartile range
default: value used for missing datapoints
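To illustrate how such a configuration might be consumed, the sketch below parses the JSON with Python's standard `json` module and checks each entry under `worker_options`. It is not the project's loader: the function `load_worker_options` is hypothetical, and treating `quantile_25`, `quantile_75` and `metrics` as required keys is an assumption. Only the fallback `iqr_scaling = 1.5` comes from the description above.

```python
import json

# Assumption: these keys are mandatory; the filter may be more lenient.
REQUIRED_KEYS = {"quantile_25", "quantile_75", "metrics"}

RAW_CONFIG = """
{
    "TukeysFilter": {
        "scheduler_options": {},
        "worker_options": {
            "service1:eu:cpu": {
                "quantile_25": "service.service1.eu.quantile_25*",
                "quantile_75": "service.service1.eu.quantile_75*",
                "iqr_scaling": 1.5,
                "metrics": "host*region:eu*cpu*",
                "default": 0
            }
        }
    }
}
"""


def load_worker_options(raw):
    """Parse the config and return the validated worker options (hypothetical)."""
    config = json.loads(raw)
    options = config["TukeysFilter"]["worker_options"]
    resolved = {}
    for name, entry in options.items():
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"{name}: missing keys {sorted(missing)}")
        # Fall back to the documented default sensitivity of 1.5.
        resolved[name] = {**entry, "iqr_scaling": entry.get("iqr_scaling", 1.5)}
    return resolved


for name, entry in load_worker_options(RAW_CONFIG).items():
    print(name, entry["metrics"], entry["iqr_scaling"])
```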

Use Cases

Comparing instance-level metrics within a service group to identify, for example, memory leaks or other deviations from the "norm".

References

[1] John Tukey, Exploratory Data Analysis, Addison-Wesley, 1977, pp. 43-44.
