Anomaly Detection (anomaly model, scorer, detector, aggregator) #1256

Merged on Dec 22, 2022 (151 commits).

Commits
85c7a7d
AD first Ver
Oct 4, 2022
5ac2fd5
AD first Version
Oct 4, 2022
5743d58
added ForecastingAnomalyModel/FilteringAnomalyModel, and scorers: Kme…
Oct 5, 2022
2450e9a
implemented GaussianMixtureScorer and allow multiple scorer inputs
Oct 7, 2022
900b95f
Added comments and possibility to input a list of scorers in AnomalyM…
Oct 10, 2022
8408c8f
Clean whitespace
Oct 10, 2022
e9880e1
Clean whitespace2
Oct 10, 2022
6b65a0e
Clean whitespace2
Oct 10, 2022
626da3d
Clean whitespace with VScode
Oct 10, 2022
887f933
Merge branch 'master' into feat/anomaly_detection_API
julien12234 Oct 14, 2022
babe8c7
Changed diff() position and added characteristic_length parameters
Oct 14, 2022
cd48097
renamed submodule
hrzn Oct 15, 2022
7d0b369
small changes
hrzn Oct 15, 2022
56eaf9d
small improvements
hrzn Oct 16, 2022
1c0d4f4
small changes
hrzn Oct 16, 2022
72799b0
Accepts all types UTS, MTS, list(UTS or MTS)
Oct 28, 2022
7f20166
move _diff() in child, so that scorers have all the same signature
Oct 28, 2022
7a37038
replaced L1, L2, and Abs_diff with Norm
Oct 31, 2022
e6a72da
add component_wise to WassersteinScorer
Oct 31, 2022
b117bd7
add component_wise to Kmeans
Oct 31, 2022
c63cef8
add component_wise to LOF
Oct 31, 2022
247001c
add component_wise to GaussianMixture
Oct 31, 2022
9782272
Accept num_samples for probabilistic models forecasting
Nov 3, 2022
729a1d9
Minor changes
Nov 4, 2022
d8fb10f
add comments, add likelihood
Nov 9, 2022
8661d6d
add laplace, + window parameter + parameter alllow_retrain
Nov 10, 2022
3b83b11
add cauchy and gamma likelihood
Nov 11, 2022
71f12dc
add utils.py, detectors, aggregators
julien12234 Nov 14, 2022
71e3a6e
removed show function for now
julien12234 Nov 14, 2022
e17a046
add show_anomalies() and show_anomalies_from_scores()
julien12234 Nov 16, 2022
c0bd73f
small changes
julien12234 Nov 16, 2022
a34c479
Merge branch 'master' of github.com:unit8co/darts into feat/anomaly_d…
hrzn Nov 20, 2022
c89c1c2
Merge branch 'master' into feat/anomaly_detection_API
hrzn Nov 21, 2022
fc29b78
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Nov 21, 2022
ca551e3
Some docstring improvements to AnomalyModels
hrzn Nov 21, 2022
1f8c52a
corrected Kmeans, LFO and Gaussian Scorer + added input from PR
julien12234 Nov 22, 2022
fa0618e
test commit
julien12234 Nov 22, 2022
003febd
negative LFO and gaussian
julien12234 Nov 23, 2022
45757b4
Merge branch 'master' into feat/anomaly_detection_API
hrzn Nov 25, 2022
b63e65c
Merge branch 'master' into feat/anomaly_detection_API
julien12234 Nov 28, 2022
7683bae
pre pull
julien12234 Nov 28, 2022
323fbc1
from prediciton structure
julien12234 Nov 28, 2022
0765d49
improved show_anomalies, changed structure _from_prediction
julien12234 Nov 30, 2022
8ffd0c1
small mistake in eval_accuracy in utils.py
julien12234 Nov 30, 2022
c8bea7f
return type of eval_acc
julien12234 Dec 1, 2022
17f5d68
changed way eval_acc returns in anomaly_model
julien12234 Dec 1, 2022
fa46383
added test for agg, dect, and scorers. upgrade agg trainable
julien12234 Dec 2, 2022
92013d1
added parameter return_UTS, and added test for scorers and anomaly_model
julien12234 Dec 3, 2022
124a221
small mistake in anomaly_model
julien12234 Dec 3, 2022
441bf24
New structure in files
julien12234 Dec 6, 2022
c3a56f5
Added warnings
julien12234 Dec 7, 2022
9987ad6
small change in wasserstein
julien12234 Dec 8, 2022
0a8b3f7
Merge branch 'master' into feat/anomaly_detection_API
hrzn Dec 9, 2022
d294740
filtering_am and forecasting_am
julien12234 Dec 9, 2022
c3efb69
Small improvements
hrzn Dec 9, 2022
13a365d
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 9, 2022
b5ad2fa
Fix test names
hrzn Dec 9, 2022
7f1582f
add pyod to requirements
hrzn Dec 9, 2022
6d771b2
rename scorers
hrzn Dec 9, 2022
7c02b1d
scorers imports
hrzn Dec 9, 2022
5c67ce2
Changed handling of kwargs in AD models
hrzn Dec 9, 2022
6948b61
update tests
hrzn Dec 9, 2022
3f1b21e
return single TimeSeries from score() in some cases
hrzn Dec 10, 2022
ec6baf4
small naming improvements
hrzn Dec 10, 2022
5e6f65e
Some improvements to anomaly models
hrzn Dec 10, 2022
68d388d
Small improvements to scorers
hrzn Dec 11, 2022
11ee748
Some small improvements
hrzn Dec 11, 2022
0d3464b
Fix tests
hrzn Dec 11, 2022
40aa67f
Merge branch 'master' into feat/anomaly_detection_API
hrzn Dec 12, 2022
e7640d7
Norm scorer docstring
hrzn Dec 12, 2022
d6e79af
test toy example agg and detectors
julien12234 Dec 12, 2022
7f5c30b
small docstring improvements
hrzn Dec 12, 2022
759ec9a
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 12, 2022
b9e50fb
Add vectorization todos
hrzn Dec 12, 2022
0695719
test toy example scorers
julien12234 Dec 12, 2022
a76759a
test toy example scorers
julien12234 Dec 12, 2022
b0047c5
test toy example PyOD
julien12234 Dec 13, 2022
0b09b8f
test toy example NLL scorers
julien12234 Dec 13, 2022
3f1ddf6
test toy example poisson nll scorer
julien12234 Dec 13, 2022
fcd623e
test toy example univariate anomaly_models
julien12234 Dec 13, 2022
b0d405a
test toy example univariate covariates forecasting_anomaly_models
julien12234 Dec 13, 2022
af4a489
update threshold detector docstring
hrzn Dec 13, 2022
34ca833
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 13, 2022
6df9ed6
change way to output string messages
julien12234 Dec 13, 2022
e3f486c
first implementation of julien H's PR review
julien12234 Dec 13, 2022
c160234
first implementation of julien H's PR review 2
julien12234 Dec 13, 2022
f745a45
anomaly_model forecasting multivariate test
julien12234 Dec 14, 2022
457ad5b
anomaly_model multivariate, w=1,2, len()=2 test for NLL scorers
julien12234 Dec 14, 2022
b40c7c7
changed NLL scorers: call scipy.stats function
julien12234 Dec 14, 2022
6d9279f
changed in anomaly_models (inner to outer for series and scorers)
julien12234 Dec 14, 2022
7b80625
Small changes to PyOD detector
hrzn Dec 15, 2022
4164c57
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 15, 2022
d5cc56a
improvements to wasserstein scorer docstring
hrzn Dec 15, 2022
10c8d6b
change in eval acc
julien12234 Dec 15, 2022
5cdfc6d
change in eval acc, new function _eval_accuracy_from_scores
julien12234 Dec 15, 2022
cb59cd4
Small improvements to aggregators
hrzn Dec 15, 2022
10a79ab
Small docstrings improvements
hrzn Dec 15, 2022
a435f6a
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 15, 2022
7a0f5e7
Utils docstring
hrzn Dec 15, 2022
33aeea4
change in detectors (vectorization and accepts list of param if multi…
julien12234 Dec 15, 2022
afd7c61
remove exp in PyODScorer... and updated test
julien12234 Dec 15, 2022
080e0f9
new test with np.testing
julien12234 Dec 15, 2022
883d587
agg accept only MTS or sequence of MTS
julien12234 Dec 16, 2022
409f215
removed old detectors
julien12234 Dec 16, 2022
005003a
new multivariate test for filtering anomaly model
julien12234 Dec 16, 2022
fa7f271
small changes to utils docstrings
hrzn Dec 16, 2022
aef8127
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 16, 2022
a4b8e53
test assert_array_almost_equal decimal 2
julien12234 Dec 16, 2022
81554b0
test assert_array_almost_equal decimal 1
julien12234 Dec 16, 2022
0e70677
test assert_array_almost_equal decimal 1
julien12234 Dec 16, 2022
7a7b553
second implementation of julien H's PR review
julien12234 Dec 16, 2022
1f1a9b1
vectorization of NLL scorers
julien12234 Dec 16, 2022
09a8ac6
problem with test_univariate_FilteringAnomalyModel
julien12234 Dec 16, 2022
107ebaf
replace abs by __abs__ in test_univariate_covariate_ForecastingAnomal…
julien12234 Dec 16, 2022
6a5bed4
Merge branch 'master' into feat/anomaly_detection_API
hrzn Dec 16, 2022
cb2a127
replace abs by __abs__ in ALL test_univariate_covariate_ForecastingAn…
julien12234 Dec 17, 2022
4a9619b
Increase coverage of scorers tests
hrzn Dec 20, 2022
816377b
Imports in submodules
hrzn Dec 20, 2022
323bda8
Some improvements to utils
hrzn Dec 20, 2022
edad060
Some improvements
hrzn Dec 20, 2022
8ea8ea7
significant rework of quantile detector
hrzn Dec 21, 2022
f4ef944
Rework threshold detector
hrzn Dec 21, 2022
ff62c3a
Rework NLL scorers
hrzn Dec 22, 2022
7e59cea
Rename NLL scorers files
hrzn Dec 22, 2022
ca9efc3
vectorize windowing in k-means
hrzn Dec 22, 2022
b1a73d9
vectorization of windowing in PyOD and Wasserstein
hrzn Dec 22, 2022
6041a67
Docstring improvements
hrzn Dec 22, 2022
b545d18
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
96206fa
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
5899096
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
3213467
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
18f9f49
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
2b8c9a9
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
eb714ef
Update darts/ad/anomaly_model/__init__.py
hrzn Dec 22, 2022
8e0a488
Update darts/ad/anomaly_model/__init__.py
hrzn Dec 22, 2022
22d9474
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
46b7603
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
18613c1
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
eb00de2
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
cbcbe1b
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
5a45fe6
Update darts/ad/scorers/__init__.py
hrzn Dec 22, 2022
b5ffd08
Update darts/ad/scorers/scorers.py
hrzn Dec 22, 2022
f597d21
Update darts/ad/scorers/scorers.py
hrzn Dec 22, 2022
b366367
Update darts/ad/scorers/scorers.py
hrzn Dec 22, 2022
ffbd101
Update darts/ad/scorers/kmeans_scorer.py
hrzn Dec 22, 2022
986aa2a
Merge branch 'master' into feat/anomaly_detection_API
hrzn Dec 22, 2022
f362b5a
PR comments
hrzn Dec 22, 2022
05f4378
Formatting
hrzn Dec 22, 2022
d5195ab
Update darts/ad/scorers/pyod_scorer.py
hrzn Dec 22, 2022
a13edd2
Small docstring improvement
hrzn Dec 22, 2022
82b65cb
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 22, 2022

7 changes: 7 additions & 0 deletions darts/ad/__init__.py
@@ -0,0 +1,7 @@
"""
Anomaly Detection
-----------------
"""

from .anomaly_model.filtering_am import FilteringAnomalyModel
from .anomaly_model.forecasting_am import ForecastingAnomalyModel
4 changes: 4 additions & 0 deletions darts/ad/aggregators/__init__.py
@@ -0,0 +1,4 @@
"""
Anomaly Aggregators
-------------------
"""
301 changes: 301 additions & 0 deletions darts/ad/aggregators/aggregators.py
@@ -0,0 +1,301 @@
"""
Aggregators
-----------
Module for aggregators. An aggregator combines multiple series of detected anomalies into one.
"""

# TODO:
# - add customizable aggregators
# - add trainable aggregators:
#     - logistic regression
#     - decision tree
# - show all combined (info about correlation, and from which path the
#   anomaly alarm comes)

from abc import ABC, abstractmethod
from typing import Any, Sequence, Union

import numpy as np

from darts import TimeSeries
from darts.ad.utils import _intersect, eval_accuracy_from_binary_prediction
from darts.logging import raise_if, raise_if_not


class Aggregator(ABC):
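    def __init__(self, *args: Any, **kwargs: Any) -> None:
        pass

    @abstractmethod
    def __str__(self):
        """Returns the name of the aggregator."""
        pass

    @abstractmethod
    def _predict_core(self, np_series: np.ndarray, width: int) -> np.ndarray:
        """Returns the aggregated results for the component at index `width`, given a
        (n_timesteps, n_series) array of binary predictions."""
        pass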

    def _check_input(
        self, list_series: Sequence[TimeSeries]
    ) -> Sequence[TimeSeries]:
        """
        Checks that the input:
            - is a Sequence
            - contains at least two elements
            - contains only deterministic, binary (values 0 or 1) TimeSeries
            - contains series that all have the same width (number of components)

        Returns a new list in which every series is sliced to the time intersection
        of all the series.
        """

        raise_if_not(
            isinstance(list_series, Sequence),
            f"Input needs to be a Sequence, found type {type(list_series)}.",
        )

        raise_if(
            len(list_series) <= 1,
            f"Input list needs to contain at least two time series, found {len(list_series)}.",
        )

        # work on a copy so that the caller's sequence is not mutated
        # (this also makes tuples valid inputs)
        list_series = list(list_series)

        for idx, series in enumerate(list_series):

            raise_if_not(
                isinstance(series, TimeSeries),
                f"Elements of the list must be TimeSeries, found type {type(series)} for element at index {idx}.",
            )

            raise_if_not(
                series.is_deterministic,
                "Series in the list must be deterministic (one sample per timestamp per dimension),"
                + f" found {series.n_samples} samples for the series at index {idx}.",
            )

            raise_if_not(
                np.array_equal(
                    series.values(copy=False),
                    series.values(copy=False).astype(bool),
                ),
                f"Series in the list must be binary, the series at index {idx} is not.",
            )

            if idx == 0:
                series_width = series.width
                series_0 = series

            raise_if_not(
                series.width == series_width,
                "All elements of the list must have the same width (number of components),"
                + f" found width {series.width} and {series_width}.",
            )

            series_0, list_series[idx] = _intersect(series_0, series)

        for idx, series in enumerate(list_series):
            if idx > 0:
                list_series[idx] = series.slice_intersect(series_0)

            raise_if(
                len(list_series[idx]) == 0,
                f"Element {idx} of `list_series` must have a non-empty intersection"
                + " with the other series of the sequence.",
            )

        list_series[0] = series_0

        return list_series
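The binary check in `_check_input()` compares the values with their boolean cast. A minimal NumPy sketch of the same idea, using a hypothetical `is_binary` helper that is not part of darts: casting to `bool` maps every non-zero value (including NaN) to `True`, which compares equal to `1`, so the two arrays match only when the input contains nothing but 0s and 1s.

```python
import numpy as np


def is_binary(values: np.ndarray) -> bool:
    """Hypothetical helper mirroring the check in `_check_input()`."""
    # any value other than 0 or 1 changes when cast to bool (True == 1)
    return bool(np.array_equal(values, values.astype(bool)))


print(is_binary(np.array([[0.0], [1.0], [1.0]])))  # True
print(is_binary(np.array([[0.0], [2.0]])))  # False: 2.0 is cast to True (== 1)
print(is_binary(np.array([[0.5], [1.0]])))  # False: 0.5 is cast to True (== 1)
print(is_binary(np.array([np.nan])))  # False: NaN is cast to True, but NaN != 1
```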

    def _predict(self, list_series: Sequence[TimeSeries]) -> TimeSeries:
        # stack the series along the sample axis: shape (n_timesteps, width, n_series)
        np_series = np.concatenate(
            [s.all_values(copy=False) for s in list_series], axis=2
        )

        # aggregate each component separately; `_predict_core()` receives a
        # (n_timesteps, n_series) array and returns one value per timestep
        list_pred = []
        for width in range(list_series[0].width):
            list_pred.append(self._predict_core(np_series[:, width, :], width))

        # transpose from (width, n_timesteps) to (n_timesteps, width)
        return TimeSeries.from_times_and_values(
            list_series[0].time_index, list(zip(*list_pred))
        )
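To illustrate the array layout used by `_predict()`, here is a NumPy sketch on made-up data. It assumes, as in darts, that `all_values()` returns arrays of shape `(n_timesteps, width, n_samples)` with `n_samples == 1` for deterministic series; `and_core` is a hypothetical stand-in for a subclass's `_predict_core()`.

```python
import numpy as np


def and_core(block: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for `_predict_core()`: 1 only where every series reports 1."""
    return np.all(block == 1, axis=1).astype(int)


# Three hypothetical binary series, 4 timesteps x 2 components, each shaped
# (n_timesteps, width, n_samples=1) like `TimeSeries.all_values()`
s1 = np.array([[1, 0], [1, 1], [0, 0], [1, 1]]).reshape(4, 2, 1)
s2 = np.array([[1, 0], [0, 1], [0, 1], [1, 1]]).reshape(4, 2, 1)
s3 = np.array([[1, 1], [1, 1], [0, 0], [1, 0]]).reshape(4, 2, 1)

# concatenate along the sample axis -> (n_timesteps, width, n_series)
np_series = np.concatenate([s1, s2, s3], axis=2)
print(np_series.shape)  # (4, 2, 3)

# aggregate each component separately, then transpose to (n_timesteps, width)
list_pred = [and_core(np_series[:, w, :]) for w in range(np_series.shape[1])]
aggregated = np.array(list(zip(*list_pred)))
print(aggregated.tolist())  # [[1, 0], [0, 1], [0, 0], [1, 0]]
```

Concatenating along axis 2 is what lets each aggregator see one row per timestep and one column per input series.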

    def eval_accuracy(
        self,
        actual_anomalies: TimeSeries,
        list_series: Sequence[TimeSeries],
        window: int = 1,
        metric: str = "recall",
    ) -> Union[float, Sequence[float]]:
        """Aggregates the list of series given as input into one series and evaluates
        the results against the true anomalies.

        Parameters
        ----------
        actual_anomalies
            The ground truth of the anomalies (1 if it is an anomaly and 0 if not).
        list_series
            The list of binary series to aggregate.
        window
            Integer value indicating the number of past samples each point represents
            in `list_series`. The parameter is used by the function
            ``_window_adjustment_anomalies()`` in darts.ad.utils to transform
            `actual_anomalies`.
        metric
            Metric function to use. Must be one of "recall", "precision",
            "f1", and "accuracy".
            Default: "recall"

        Returns
        -------
        Union[float, Sequence[float]]
            Score(s) for the aggregated series.
        """

        raise_if_not(
            isinstance(actual_anomalies, TimeSeries),
            f"`actual_anomalies` must be of type TimeSeries, found type {type(actual_anomalies)}.",
        )

        series = self.predict(list_series)

        raise_if_not(
            actual_anomalies.width == series.width,
            "`actual_anomalies` must have the same width as the series in the sequence "
            + f"`list_series`, found width {actual_anomalies.width} and expected {series.width}.",
        )

        return eval_accuracy_from_binary_prediction(
            actual_anomalies, series, window, metric
        )
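`eval_accuracy()` delegates the scoring to `eval_accuracy_from_binary_prediction()` in darts.ad.utils, which is not part of this excerpt. Assuming it uses the standard confusion-matrix definitions and `window=1` (no window adjustment), the four supported metrics can be sketched as follows; `binary_metrics` is a hypothetical helper, and returning 0.0 on a zero denominator is an assumed convention.

```python
import numpy as np


def binary_metrics(actual: np.ndarray, predicted: np.ndarray) -> dict:
    """Hypothetical helper: standard metrics for 0/1 arrays of equal shape."""
    actual = actual.astype(bool)
    predicted = predicted.astype(bool)
    tp = np.sum(actual & predicted)  # anomalies correctly flagged
    fp = np.sum(~actual & predicted)  # normal points wrongly flagged
    fn = np.sum(actual & ~predicted)  # anomalies missed
    tn = np.sum(~actual & ~predicted)  # normal points correctly ignored
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / actual.size
    return {"recall": recall, "precision": precision, "f1": f1, "accuracy": accuracy}


actual = np.array([0, 1, 1, 0, 1])
predicted = np.array([0, 1, 0, 1, 1])
print(binary_metrics(actual, predicted))
```

Here two of the three true anomalies are flagged and one of the three flags is a false alarm, so recall, precision and F1 are all 2/3 and accuracy is 3/5.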


class NonFittableAggregator(Aggregator):
    """Base class of Aggregators that do not need training."""

    def __init__(self) -> None:
        super().__init__()

        # indicates if the Aggregator is trainable or not
        self.trainable = False

    def predict(self, list_series: Sequence[TimeSeries]) -> TimeSeries:
        """Aggregates the list of series given as input into one series.

        Parameters
        ----------
        list_series
            The list of binary series to aggregate.

        Returns
        -------
        TimeSeries
            Aggregated results
        """
        list_series = self._check_input(list_series)
        return self._predict(list_series)


class FittableAggregator(Aggregator):
    """Base class of Aggregators that need training."""

    def __init__(self) -> None:
        super().__init__()

        # indicates if the Aggregator is trainable or not
        self.trainable = True

        # indicates if the Aggregator has been trained yet
        self._fit_called = False

    @abstractmethod
    def _fit_core(self, np_series: np.ndarray, np_actual_anomalies: np.ndarray) -> None:
        """Fits `self.model` on one component, given a (n_timesteps, n_series) array of
        binary predictions and the corresponding ground-truth anomalies."""
        pass

    def check_if_fit_called(self):
        """Checks if the Aggregator has been fitted before calling its `predict()` function."""

        raise_if_not(
            self._fit_called,
            f"The Aggregator {self.__str__()} has not been fitted yet. Call `fit()` first.",
        )

    def fit(
        self, actual_anomalies: TimeSeries, list_series: Sequence[TimeSeries]
    ) -> None:
        """Fits the aggregator on the given list of series. One model is fitted for
        each component (width) of the series.

        Parameters
        ----------
        actual_anomalies
            The ground truth of the anomalies (1 if it is an anomaly and 0 if not).
        list_series
            The list of binary series to aggregate.
        """
        from copy import deepcopy

        raise_if_not(
            isinstance(actual_anomalies, TimeSeries),
            f"`actual_anomalies` must be of type TimeSeries, found type {type(actual_anomalies)}.",
        )

        list_series = self._check_input(list_series)
        self.len_training_set = len(list_series)
        self.width_trained_on = list_series[0].width
        actual_anomalies = actual_anomalies.slice_intersect(list_series[0])

Review comment (Contributor): We could maybe assert instead that actual_anomalies.has_same_time_as(series) (for our single multivariate series).

        raise_if_not(
            actual_anomalies.width == self.width_trained_on,
            "`actual_anomalies` must have the same width as the series in the sequence `list_series`,"
            + f" found width {actual_anomalies.width} and expected {self.width_trained_on}.",
        )

        for idx, s in enumerate(list_series):
            list_series[idx] = s.slice_intersect(actual_anomalies)

        raise_if(
            len(list_series[0]) == 0,
            "`actual_anomalies` must have a non-empty time intersection with the series in the"
            + " sequence `list_series`.",
        )

        np_training_data = np.concatenate(
            [s.all_values(copy=False) for s in list_series], axis=2
        )
        np_actual_anomalies = actual_anomalies.all_values(copy=False)

        models = []
        for width in range(self.width_trained_on):
            self._fit_core(
                np_training_data[:, width, :], np_actual_anomalies[:, width].flatten()
            )
            # `_fit_core()` refits the same `self.model` object for every component:
            # store a copy, otherwise all entries would reference the last fitted model
            models.append(deepcopy(self.model))

        self.models = models
        self._fit_called = True
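`fit()` trains one model per component while `_fit_core()` refits the single `self.model` object each time. The plain-Python sketch below uses a hypothetical `ThresholdModel` (not a darts or scikit-learn class) whose `fit()` overwrites its state, as scikit-learn estimators do, to show why appending the same object for every component leaves all entries equal to the last fit, whereas a `copy.deepcopy` snapshot keeps one fitted model per component.

```python
from copy import deepcopy


class ThresholdModel:
    """Hypothetical stand-in for an estimator: `fit()` overwrites its state."""

    def __init__(self):
        self.threshold = None

    def fit(self, values):
        self.threshold = sum(values) / len(values)
        return self


components = [[1, 2, 3], [10, 20, 30]]  # made-up training data for two components
model = ThresholdModel()

shared, copied = [], []
for values in components:
    model.fit(values)
    shared.append(model)  # the same object is appended for every component
    copied.append(deepcopy(model))  # snapshot of the model fitted on this component

print([m.threshold for m in shared])  # [20.0, 20.0]
print([m.threshold for m in copied])  # [2.0, 20.0]
```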

    def predict(self, list_series: Sequence[TimeSeries]) -> TimeSeries:
        """Aggregates the list of series given as input into one series.

        Parameters
        ----------
        list_series
            The list of binary series to aggregate.

        Returns
        -------
        TimeSeries
            Aggregated results
        """
        self.check_if_fit_called()

        list_series = self._check_input(list_series)

        raise_if_not(
            len(list_series) == self.len_training_set,
            f"The model was trained on a list of length {self.len_training_set}, but found a list"
            + f" of length {len(list_series)} for prediction.",
        )

        raise_if_not(
            all([s.width == self.width_trained_on for s in list_series]),
            "All series in `list_series` must have the same width as the data used for training the"
            + f" aggregator model, training width {self.width_trained_on}.",
        )

        return self._predict(list_series)
23 changes: 23 additions & 0 deletions darts/ad/aggregators/and_aggregator.py
@@ -0,0 +1,23 @@
"""
And Aggregator
--------------

Aggregator that identifies a time point as anomalous only if it is
flagged as anomalous in all the input series.
"""

import numpy as np

from darts.ad.aggregators.aggregators import NonFittableAggregator


class AndAggregator(NonFittableAggregator):
    def __init__(self) -> None:
        super().__init__()

    def __str__(self):
        return "AndAggregator"

    def _predict_core(self, np_series: np.ndarray, width: int) -> np.ndarray:
        # np_series has shape (n_timesteps, n_series): a timestep is anomalous (1)
        # only if every input series flags it, i.e. its row contains no 0
        return np.all(np_series == 1, axis=1).astype(int)
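The AND rule can be written either as a per-timestep loop or as a single NumPy reduction over the series axis. The sketch below, on made-up binary data, checks that both forms agree.

```python
import numpy as np

rng = np.random.default_rng(0)
# made-up binary predictions for one component: 50 timesteps x 3 input series
np_series = rng.integers(0, 2, size=(50, 3)).astype(float)

# loop form: 0 as soon as any series reports 0 at that timestep
loop_result = [0 if 0 in timestamp else 1 for timestamp in np_series]

# vectorized form: 1 only where every series reports 1
vectorized_result = np.all(np_series == 1, axis=1).astype(int)

print(np.array_equal(loop_result, vectorized_result))  # True
```

The two forms are only interchangeable because `_check_input()` has already rejected non-binary values.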
39 changes: 39 additions & 0 deletions darts/ad/aggregators/ensemble_sklearn_aggregator.py
@@ -0,0 +1,39 @@
"""
Ensemble scikit-learn aggregator
--------------------------------

Aggregator wrapped around an ensemble model from scikit-learn, see
`sklearn <https://scikit-learn.org/stable/modules/ensemble.html>`_.
"""

import numpy as np
from sklearn.ensemble import BaseEnsemble

from darts.ad.aggregators.aggregators import FittableAggregator
from darts.logging import raise_if_not


class EnsembleSklearnAggregator(FittableAggregator):
    def __init__(self, model) -> None:

        raise_if_not(
            isinstance(model, BaseEnsemble),
            "Aggregator is expecting a model of type BaseEnsemble (from sklearn ensemble),"
            + f" found type {type(model)}.",
        )

        self.model = model
        super().__init__()

    def __str__(self):
        return "EnsembleSklearnAggregator: {}".format(
            self.model.__str__().split("(")[0]
        )

    def _fit_core(
        self, np_series: np.ndarray, np_actual_anomalies: np.ndarray
    ) -> None:
        self.model.fit(np_series, np_actual_anomalies)

    def _predict_core(self, np_series: np.ndarray, width: int) -> np.ndarray:
        # use the model fitted on this component during `fit()`
        return self.models[width].predict(np_series)
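A minimal sketch of what `EnsembleSklearnAggregator` does for a single component, written directly against scikit-learn rather than through darts: the features are the `(n_timesteps, n_series)` matrix of binary detector outputs and the target is the ground-truth anomaly vector. The choice of `RandomForestClassifier` and the made-up data are assumptions; any `BaseEnsemble` subclass passes the type check in `__init__`.

```python
import numpy as np
from sklearn.ensemble import BaseEnsemble, RandomForestClassifier

rng = np.random.default_rng(42)

# made-up data for one component: outputs of 3 binary detectors over 200 timesteps
X = rng.integers(0, 2, size=(200, 3))
# made-up ground truth: anomalous when at least two detectors agree
y = (X.sum(axis=1) >= 2).astype(int)

model = RandomForestClassifier(n_estimators=20, random_state=0)
assert isinstance(model, BaseEnsemble)  # same check as in `__init__`

model.fit(X, y)  # corresponds to `_fit_core()` for this component
pred = model.predict(X)  # corresponds to `_predict_core()` with the stored model
print(pred.shape)  # (200,)
print((pred == y).mean())
```

Because the model learns from labelled anomalies, it can recover combination rules (here a 2-out-of-3 vote) that a fixed rule such as AND cannot express.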