Algorithm: ChangeFinder

The detector currently uses ChangeFinder, a well-known anomaly detection framework. Our implementation of the algorithm is based on aihara/changefinder.

Our implementation only supports 1D inputs for now.
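Below is a minimal pure-Python sketch of the two-stage ChangeFinder scheme for 1D inputs. In the first stage, an SDAR (sequentially discounting AR) model scores each point and the scores are smoothed. In the second stage, another SDAR model scores the smoothed series and that result is smoothed again. The parameter mapping is an assumption and is not confirmed by this document: r is the SDAR discounting rate, k is the AR order, and T1/T2 are the two moving-average window sizes. The class and function names are illustrative only and are not the detector's actual API.

```python
import math
from collections import deque


def levinson_durbin(c, k):
    """Solve the order-k Yule-Walker equations for autocovariances c[0..k].

    Returns AR coefficients a[0..k-1] with x_t - mu ~ sum(a[i] * (x_{t-1-i} - mu)).
    """
    a, err = [], c[0]
    for m in range(1, k + 1):
        if err <= 1e-12:  # degenerate (e.g. constant) input: pad with zeros
            return a + [0.0] * (k - len(a))
        kappa = (c[m] - sum(a[j] * c[m - 1 - j] for j in range(m - 1))) / err
        a = [a[j] - kappa * a[m - 2 - j] for j in range(m - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
    return a


class SDAR:
    """Sequentially discounting AR(k) model for 1D data (discount rate r)."""

    def __init__(self, r, k):
        self.r, self.k = r, k
        self.mu = 0.0
        self.c = [0.0] * (k + 1)  # discounted autocovariances C_0..C_k
        self.sigma2 = 1.0         # discounted residual variance

    def update(self, x, history):
        """Learn from x given the previous k values (oldest first); return -log p(x)."""
        r = self.r
        self.mu = (1 - r) * self.mu + r * x
        recent = list(history)[::-1]  # recent[i] = x_{t-1-i}
        lagged = [x] + recent         # lagged[j] = x_{t-j}
        for j in range(self.k + 1):
            self.c[j] = (1 - r) * self.c[j] + r * (x - self.mu) * (lagged[j] - self.mu)
        coeffs = levinson_durbin(self.c, self.k)
        xhat = self.mu + sum(w * (v - self.mu) for w, v in zip(coeffs, recent))
        self.sigma2 = max((1 - r) * self.sigma2 + r * (x - xhat) ** 2, 1e-12)
        # Negative log-likelihood of x under N(xhat, sigma2).
        return 0.5 * math.log(2 * math.pi * self.sigma2) + (x - xhat) ** 2 / (2 * self.sigma2)


class ChangeFinder:
    """Two-stage ChangeFinder: SDAR -> smooth (T1) -> SDAR -> smooth (T2)."""

    def __init__(self, r=0.02, k=7, t1=10, t2=5):
        self.k = k
        self.first, self.second = SDAR(r, k), SDAR(r, k)
        self.xs = deque(maxlen=k)   # last k raw values
        self.s1 = deque(maxlen=t1)  # first-stage outlier scores
        self.ys = deque(maxlen=k)   # last k smoothed first-stage scores
        self.s2 = deque(maxlen=t2)  # second-stage scores

    def update(self, x):
        """Feed one value; return the change-point score (0.0 while warming up)."""
        if len(self.xs) == self.k:
            self.s1.append(self.first.update(x, self.xs))
        self.xs.append(x)
        if len(self.s1) < self.s1.maxlen:
            return 0.0
        y = sum(self.s1) / len(self.s1)  # moving average over the last T1 scores
        if len(self.ys) == self.k:
            self.s2.append(self.second.update(y, self.ys))
        self.ys.append(y)
        if len(self.s2) < self.s2.maxlen:
            return 0.0
        return sum(self.s2) / len(self.s2)  # moving average over the last T2 scores
```

The defaults mirror the example config values (r = 0.02, k = 7, T1 = 10, T2 = 5). To use it, call `update` once per incoming data point. A level shift in the input should produce a burst of high scores a few steps later.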

Model Selection

The hyperparameters of the ChangeFinder algorithm are specified in the config file config/datadog.ini. There are currently four hyperparameters: r, k, T1 and T2. You can use cli/model_selection.py to find the optimal k. Finding the best value of k is called model selection.

For instance, suppose we have the following config file:

$ cat config/datadog.ini
[general]
pidfile_path: /var/run/changefinder.pid
interval: 30

[datadog.cpu]
query: system.load.norm.5{chef_environment:production,chef_role:worker6-staticip} by {host}
r: 0.02
k: 7
T1: 10
T2: 5

[datadog.queue]
query: avg:queue.system.running{*}
r: 0.02
k: 7
T1: 10
T2: 5
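This document does not show the detector's own config loader. The sketch below is a hypothetical `load_detectors` helper built on Python's standard `configparser`. It assumes that each `[datadog.*]` section defines one query together with its hyperparameters and that `[general]` holds daemon settings only. Interpolation is disabled so that query strings are taken literally, and `optionxform = str` keeps the `T1`/`T2` keys in the case they are written in.

```python
import configparser

# Copy of the example config/datadog.ini above.
CONFIG = """\
[general]
pidfile_path: /var/run/changefinder.pid
interval: 30

[datadog.cpu]
query: system.load.norm.5{chef_environment:production,chef_role:worker6-staticip} by {host}
r: 0.02
k: 7
T1: 10
T2: 5

[datadog.queue]
query: avg:queue.system.running{*}
r: 0.02
k: 7
T1: 10
T2: 5
"""


def load_detectors(text):
    """Map each [datadog.*] section to its query and ChangeFinder hyperparameters."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep "T1"/"T2" as written (default lowercases keys)
    parser.read_string(text)
    detectors = {}
    for name in parser.sections():
        if not name.startswith("datadog."):
            continue  # e.g. [general] holds daemon settings, not a detector
        sec = parser[name]
        detectors[name] = {
            "query": sec["query"],
            "r": sec.getfloat("r"),
            "k": sec.getint("k"),
            "T1": sec.getint("T1"),
            "T2": sec.getint("T2"),
        }
    return detectors
```

To read the real file instead, pass the output of `open("config/datadog.ini").read()` to `load_detectors`. Because `configparser` splits each line at its first `:` or `=`, the colons inside the Datadog queries stay in the values.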

cli/model_selection.py finds the optimal k for each query by replaying a specific date-time range:

$ python cli/model_selection.py --start='2016-08-10 11:45' --end='2016-08-10 12:00' --timezone='UTC'
[datadog.queue] avg:queue.system.running{*}
	k = 9 (AIC = -3418.877639)
[datadog.cpu] system.load.norm.5{chef_environment:production,chef_role:worker6-staticip} by {host}
	k = 7 (AIC = -185683.392533)

Options are:

  • --max_k: Maximum candidate value of k.
  • --start: Start of the date-time range to replay.
  • --end: End of the date-time range to replay.
  • --timezone: Timezone in which the --start and --end values are interpreted.
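The script's actual argument handling is not shown here. The following hypothetical `argparse` sketch shows one way to declare these options and how `--timezone` could turn the `--start`/`--end` strings into POSIX timestamps. Both defaults are assumptions: `max_k = 50` is inferred from the [1, 50] search range in the result explanation, and `UTC` is simply a convenient choice. The helper name `to_epoch` is illustrative.

```python
import argparse
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_epoch(value, tz_name):
    """Interpret 'YYYY-MM-DD HH:MM' in tz_name and return POSIX seconds."""
    # Use the built-in UTC object so "UTC" works even without a tz database.
    tz = timezone.utc if tz_name.upper() == "UTC" else ZoneInfo(tz_name)
    naive = datetime.strptime(value, "%Y-%m-%d %H:%M")
    return int(naive.replace(tzinfo=tz).timestamp())


def parse_args(argv=None):
    p = argparse.ArgumentParser(description="Replay a time range and choose k by AIC.")
    p.add_argument("--max_k", type=int, default=50)  # assumed default
    p.add_argument("--start", required=True)
    p.add_argument("--end", required=True)
    p.add_argument("--timezone", default="UTC")      # assumed default
    return p.parse_args(argv)
```

With the example invocation, `--start='2016-08-10 11:45' --end='2016-08-10 12:00' --timezone='UTC'` becomes a 900-second replay window.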

The result means that:

  • According to a replay of the data points from 2016-08-10 11:45 UTC to 2016-08-10 12:00 UTC,
    • for the config [datadog.queue], k = 9 is optimal in [1, 50]
    • for the config [datadog.cpu], k = 7 is optimal in [1, 50]

AIC (Akaike information criterion) is the criterion used to decide which k is best. A smaller AIC indicates a better k.
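This document does not say exactly how cli/model_selection.py computes AIC, so the numbers below will not match its output. As an illustration, the sketch uses the textbook Gaussian AR form AIC(k) = n ln σ̂²_k + 2k (up to an additive constant). Here σ̂²_k is the one-step prediction error variance of an AR(k) model fitted to n points, obtained for every k in one Levinson-Durbin pass over the sample autocovariances. The function names are hypothetical.

```python
import math


def autocovariances(xs, max_k):
    """Biased sample autocovariances c[0..max_k] of xs."""
    n = len(xs)
    mean = sum(xs) / n
    d = [x - mean for x in xs]
    return [sum(d[t] * d[t - j] for t in range(j, n)) / n for j in range(max_k + 1)]


def select_order(xs, max_k):
    """Return (best_k, {k: AIC}) for AR(k), k = 1..max_k, via Levinson-Durbin."""
    n = len(xs)
    c = autocovariances(xs, max_k)
    if c[0] <= 0:
        raise ValueError("constant series: AIC is undefined")
    a, err, aic = [], c[0], {}
    for m in range(1, max_k + 1):
        kappa = (c[m] - sum(a[j] * c[m - 1 - j] for j in range(m - 1))) / err
        a = [a[j] - kappa * a[m - 2 - j] for j in range(m - 1)] + [kappa]
        err *= 1.0 - kappa * kappa       # prediction error variance of AR(m)
        aic[m] = n * math.log(err) + 2 * m  # Gaussian AR AIC, constant dropped
    best_k = min(aic, key=aic.get)          # smaller AIC is better
    return best_k, aic
```

The `2k` penalty is what stops the search from always choosing `max_k`. Each extra AR coefficient has to lower `n ln σ̂²` by more than 2 before it is worth keeping.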

References