General Usage

To use the transition-based outlier detection algorithm DOOTS on your clustered data, first initialize the detector object:

import pandas
from ots_eval.outlier_detection.doots import DOOTS

data = pandas.DataFrame(data, columns=['object_id', 'time', 'cluster_id'])
detector = DOOTS(data, weighting=False, jaccard=False)

Explanation of the parameters:

| Parameter |  | Default | Datatype | Description |
| --- | --- | --- | --- | --- |
| data | - | - | pandas.DataFrame | with first column being the objectID, second being the timestamp, third being the clusterID |
| jaccard | optional | False | boolean | indicating if jaccard index should be used |
| weighting | optional | False | boolean | indicating if more distant past should be weighted lower than nearer past |

The column names of the DataFrame are not relevant; only the column order matters. The DataFrame may contain further columns, but only the first three are considered.
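Because only the column order counts, data with a different layout has to be reordered before it is passed to the detector. The following is a minimal sketch using plain pandas; the column names `sensor`, `t`, `label` and `value` and the sample values are hypothetical and only serve as illustration.

```python
import pandas as pd

# Hypothetical raw data: the columns are in a different order and carry
# names other than 'object_id', 'time' and 'cluster_id'.
raw = pd.DataFrame({
    "label": [0, 0, 1, 1],
    "sensor": ["a", "b", "a", "b"],
    "t": [1, 1, 2, 2],
    "value": [0.3, 0.4, 0.9, 0.5],
})

# Put object ID, timestamp and cluster ID first, in that order.
# Any remaining columns follow and are not among the first three.
leading = ["sensor", "t", "label"]
trailing = [c for c in raw.columns if c not in leading]
ordered = raw[leading + trailing]

print(list(ordered.columns))
```

The reordered frame `ordered` could then be handed to the detector in place of `data`; renaming the columns should not be necessary.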

The outliers can then be calculated by calling:

outlier_result = detector.calc_outlier_degree()
clusters, outlier_result = detector.mark_outliers(tau=0.5)

The function calc_outlier_degree computes the degree of being an outlier for every subsequence. The function mark_outliers then marks all outliers according to the threshold parameter tau. It returns two objects: the data DataFrame with an additional column 'outlier' indicating whether a data point is an outlier and, if so, which type of outlier it is; and the outlier result, a pandas.DataFrame with the columns 'object_id', 'start_time', 'end_time', 'cluster_end_time', 'rating', 'distance' and 'outlier'. The outlier types are:

  • -1: transition-based outlier
  • -2: intuitive outlier
  • -3: transition-based as well as intuitive outlier
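For reporting, the numeric codes listed above can be translated into readable labels. The sketch below is not part of ots_eval; `OUTLIER_TYPES` and `describe_outlier` are hypothetical helper names. How non-outliers are encoded in the 'outlier' column is not stated here, so any other value simply yields `None`.

```python
# Outlier codes and labels, copied from the list of outlier types.
OUTLIER_TYPES = {
    -1: "transition-based outlier",
    -2: "intuitive outlier",
    -3: "transition-based as well as intuitive outlier",
}


def describe_outlier(code):
    """Return a readable label for an outlier code, or None for any other value."""
    return OUTLIER_TYPES.get(code)


print(describe_outlier(-2))
```

Assuming the 'outlier' column stores these codes as integers, `clusters['outlier'].map(describe_outlier)` would attach the labels to the marked data.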

Alternatively, both steps can be combined. The call

clusters, outlier_result = detector.get_outliers(tau=0.5)

calculates the outlier degrees and marks the outliers in one step.