This repository has been archived by the owner on Jan 7, 2022. It is now read-only.

Tukey's Outlier Filter

Puneeth Nanjundaswamy edited this page Aug 26, 2015 · 11 revisions

Algorithm

To detect outliers within a distribution of data points, a method based on [1] is used. Subtracting the first quartile Q1 from the third quartile Q3 yields a robust statistic called the interquartile range (IQR = Q3 - Q1). A value y of this distribution is considered an outlier if either y > Q3 + k*IQR or y < Q1 - k*IQR. The free parameter k is typically set to 1.5, but can be chosen freely to adjust the sensitivity of the detection: a larger k flags fewer values.
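The rule above can be sketched in a few lines of Python. This is only an illustration, not the filter's actual implementation: the function name tukey_outliers is hypothetical, and the quartiles are estimated with the standard library's "inclusive" interpolation, which is an assumption (other quantile estimators give slightly different fences).

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences (hypothetical helper)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    # The "inclusive" interpolation method is an assumption of this sketch.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [y for y in values if y < lower_fence or y > upper_fence]


samples = [10, 12, 11, 13, 12, 11, 50]
print(tukey_outliers(samples))  # [50]
```

For this sample, Q1 = 11 and Q3 = 12.5, so IQR = 1.5 and the fences lie at 8.75 and 14.75; only the value 50 falls outside them.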

Evaluation

The 25th and 75th percentiles of the error distribution must be available in the metric store. The flag is then set according to the following formula, with iqr = quantile_75 - quantile_25:

  • input < quantile_25 - (iqr_scaling * iqr) => flag = -1
  • input > quantile_75 + (iqr_scaling * iqr) => flag = 1

where iqr_scaling is a freely selectable parameter (default: iqr_scaling = 1.5) that defines the sensitivity of the detection.
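The flag rule can be written as a small function. The sketch below is hedged: tukey_flag is a hypothetical name, and returning 0 for values inside the fences is an assumption, since the rule above only defines the flags -1 and 1.

```python
def tukey_flag(value, quantile_25, quantile_75, iqr_scaling=1.5):
    """Return -1 below the lower fence, 1 above the upper fence."""
    iqr = quantile_75 - quantile_25
    if value < quantile_25 - iqr_scaling * iqr:
        return -1
    if value > quantile_75 + iqr_scaling * iqr:
        return 1
    # Assumption: 0 marks a value inside the fences; the page does not
    # state which flag is used in that case.
    return 0


# With quantile_25 = 10 and quantile_75 = 20 the fences are -5 and 35.
print(tukey_flag(36, 10, 20))  # 1
print(tukey_flag(-6, 10, 20))  # -1
```

Both comparisons are strict, so a value that sits exactly on a fence is not flagged.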

Configuration

{  
    "TukeysFilter": {
        "scheduler_options": {},
        "worker_options": {
            "service1:eu:cpu": {
                "quantile_25": "service.service1.eu.quantile_25*",
                "quantile_75": "service.service1.eu.quantile_75*",
                "iqr_scaling": 1.5,
                "metrics": "host*region:eu*cpu*",
                "default": 0
            }
        }
    }
}
  • quantile_25: expression for the Quantile_25 metrics
  • quantile_75: expression for the Quantile_75 metrics
  • metrics: expression for the metrics (e.g. instance metrics) to compare with Quantile_25/Quantile_75
  • iqr_scaling: sensitivity factor for scaling the interquartile range
  • default: default value used for missing datapoints

Use Cases

Comparison of instance-level metrics within a service group to identify, for example, memory leaks or other deviations from the "norm".
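As a rough illustration of this use case, the sketch below flags instances whose memory usage deviates from the rest of their service group. Everything here is hypothetical: the function name flag_instances, the host names, and the numbers are made up, and the quartiles are computed directly from the group's values with the standard library rather than read from a metric store.

```python
import statistics


def flag_instances(metric_by_instance, iqr_scaling=1.5):
    """Map each deviating instance to -1 (too low) or 1 (too high)."""
    values = list(metric_by_instance.values())
    # Assumption: quartiles come from the group itself, estimated with
    # the "inclusive" interpolation method.
    q25, _median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    flags = {}
    for name, value in metric_by_instance.items():
        if value < q25 - iqr_scaling * iqr:
            flags[name] = -1
        elif value > q75 + iqr_scaling * iqr:
            flags[name] = 1
    return flags


# Made-up memory usage in MB for five hosts of one service group.
memory_mb = {"host1": 510, "host2": 495, "host3": 505, "host4": 500, "host5": 900}
print(flag_instances(memory_mb))  # {'host5': 1}
```

Here the group quartiles are 500 and 510, so the fences lie at 485 and 525, and only host5 is flagged as a possible memory leak.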

References

[1] John Tukey, Exploratory Data Analysis, Addison-Wesley, 1977, pp. 43-44.
