General Usage

To apply the transition-based outlier detection algorithm DOOTS to your clustered data, first initialize a detector object:

import pandas
from ots_eval.outlier_detection.doots import DOOTS

# `data` is your clustering result: one row per object and timestamp
data = pandas.DataFrame(data, columns=['object_id', 'time', 'cluster_id'])
detector = DOOTS(data, weighting=False, jaccard=False)

Explanation of the parameters:

| Parameter | Required | Default | Datatype | Description |
| --- | --- | --- | --- | --- |
| `data` | required | - | pandas.DataFrame | with first column being the objectID, second being the timestamp, third being the clusterID |
| `jaccard` | optional | False | boolean | indicating if the Jaccard index should be used |
| `weighting` | optional | False | boolean | indicating if the more distant past should be weighted lower than the nearer past |

The column names of the DataFrame are not relevant, but the column order is. The DataFrame may contain further columns, but only the first three are considered.
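Because only the column order matters, a clustering result whose columns are named or ordered differently has to be rearranged before it is passed to DOOTS. The following is a minimal sketch using plain pandas; the column names `obj`, `t`, `cluster` and `feature` and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical clustering output: column names and order differ from
# the (object ID, timestamp, cluster ID) layout that is expected.
raw = pd.DataFrame({
    "cluster": [0, 0, 1, 0, 1, 1],
    "feature": [0.1, 0.2, 0.9, 0.15, 0.8, 0.85],
    "t": [1, 1, 1, 2, 2, 2],
    "obj": ["a", "b", "c", "a", "b", "c"],
})

# Move object ID, timestamp and cluster ID to the front, in that order;
# all remaining columns are appended behind them.
leading = ["obj", "t", "cluster"]
trailing = [c for c in raw.columns if c not in leading]
data = raw[leading + trailing]

print(list(data.columns))
```

The resulting `data` can then be handed to the detector as shown above; renaming the columns is not necessary.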

The outliers can then be calculated by calling

outlier_result = detector.calc_outlier_degree()
clusters, outlier_result = detector.mark_outliers(tau=0.5)

The function calc_outlier_degree computes an outlier degree for every subsequence. The function mark_outliers then marks all outliers according to the threshold parameter tau. It returns two DataFrames: the data DataFrame with an additional column 'outlier' indicating whether a data point is an outlier and, if so, which type of outlier it is; and the outlier result, a pandas.DataFrame with the columns 'object_id', 'start_time', 'end_time', 'cluster_end_time', 'rating', 'distance' and 'outlier'. The outlier types are:

-1: transition-based outlier
-2: intuitive outlier
-3: transition-based as well as intuitive outlier
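To work with these codes, the 'outlier' column can be filtered and translated into readable labels. The sketch below uses a hand-built DataFrame as a stand-in for the first return value of mark_outliers; the sample rows, the `outlier_type` column and the value 0 for non-outliers are assumptions made for illustration and are not taken from the library.

```python
import pandas as pd

# The three outlier codes listed above.
OUTLIER_TYPES = {
    -1: "transition-based",
    -2: "intuitive",
    -3: "transition-based and intuitive",
}

# Hypothetical stand-in for the `clusters` DataFrame returned by
# mark_outliers. The value 0 for non-outliers is a placeholder chosen
# for this sketch, not a documented code.
clusters = pd.DataFrame({
    "object_id": ["a", "b", "c", "d"],
    "time": [1, 1, 1, 1],
    "cluster_id": [0, 0, 1, 1],
    "outlier": [0, -1, -2, -3],
})

# Keep only rows whose code is one of the three listed outlier types
# and attach a readable label.
outliers = clusters[clusters["outlier"].isin(list(OUTLIER_TYPES))].copy()
outliers["outlier_type"] = outliers["outlier"].map(OUTLIER_TYPES)

print(outliers)
```

Filtering on the listed codes rather than on "anything that is not a normal point" avoids depending on how non-outliers are encoded.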

With

clusters, outlier_result = detector.get_outliers(tau=0.5)

the outliers are calculated in a single step, without calling calc_outlier_degree and mark_outliers separately.
