General Usage

To use the clustering algorithm C(OTS)^2 on your data, first initialize a COTS object:

import pandas
from ots_eval.clustering.cots import COTS

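# `data` on the right-hand side is your raw input; the column order must be object ID, timestamp, then features
data = pandas.DataFrame(data, columns=['object_id', 'time', 'feature1', 'feature2'])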
cots = COTS(data, min_cf=0.2, sw=3)

Explanation of the parameters:

| Parameter |          | Default | Datatype | Description |
| --------- | -------- | ------- | -------- | ----------- |
| data |          | -- | pandas.DataFrame | with first column being the objectID, second being the timestamp and following columns being the features |
| min_cf | optional | 0.015 | float | threshold for the minimum connection factor for inserting edges to the graph |
| sw | optional | 3 | int | width of sliding window |

The column names of the DataFrame are not relevant, but their order is: the first column must hold the object ID and the second the timestamp. All columns after the second one are interpreted as features.
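Because only the column order matters, it can help to arrange the input explicitly before building the DataFrame. The following is a minimal sketch using only the standard library; the helper `to_ordered_rows` and the sample records are hypothetical and not part of ots_eval.

```python
def to_ordered_rows(records, id_key, time_key):
    """Return (columns, rows) with the object ID first, the timestamp
    second, and every remaining key treated as a feature."""
    # Feature order follows the key order of the first record.
    feature_keys = [k for k in records[0] if k not in (id_key, time_key)]
    columns = [id_key, time_key, *feature_keys]
    rows = [[rec[k] for k in columns] for rec in records]
    return columns, rows


# Hypothetical records whose keys are not yet in the expected order.
records = [
    {"x": 0.1, "t": 1, "name": "a", "y": 0.5},
    {"x": 0.2, "t": 2, "name": "a", "y": 0.4},
]

columns, rows = to_ordered_rows(records, id_key="name", time_key="t")
# The result can then be passed to pandas.DataFrame(rows, columns=columns).
```

The column names ("name", "t", ...) are kept as they are, since only their position is relevant.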

The clusters can then be calculated by calling

clusters = cots.create_clusters()

create_clusters returns a pandas.DataFrame with the columns 'ObjectID', 'Time', the feature columns, and 'cluster'. It contains the input data together with the cluster membership (or noise label) of every object at every timestamp.
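A result of this shape can be inspected with ordinary pandas operations. The sketch below uses a hand-made stand-in DataFrame with the documented column layout; the values and cluster labels are invented for illustration, and no assumption is made about how noise is encoded.

```python
import pandas

# Hand-made stand-in for the DataFrame returned by create_clusters();
# the values and cluster labels are invented for illustration only.
clusters = pandas.DataFrame(
    [
        ["a", 1, 0.10, 0.50, 0],
        ["b", 1, 0.12, 0.48, 0],
        ["c", 1, 0.90, 0.10, 1],
        ["a", 2, 0.11, 0.52, 0],
        ["b", 2, 0.85, 0.15, 1],
        ["c", 2, 0.88, 0.12, 1],
    ],
    columns=["ObjectID", "Time", "feature1", "feature2", "cluster"],
)

# Number of objects per cluster at each timestamp.
sizes = clusters.groupby(["Time", "cluster"]).size()

# Cluster label of a single object over time, indexed by timestamp.
path_b = clusters[clusters["ObjectID"] == "b"].set_index("Time")["cluster"]
```

Grouping by 'Time' and 'cluster' shows how cluster sizes change between timestamps, while filtering on 'ObjectID' traces one object's cluster label over time.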

To obtain the clusters without noise, use

clusters = cots.get_clusters_df(min_cf=0.1, sw=5)

Here, min_cf and sw can be adapted if necessary.

When working with arrays instead of DataFrames, use the following methods:

clusters = cots.get_clusters(min_cf=0.1, sw=5)
noisy_clusters = cots.get_clusters_with_noise(min_cf=0.1, sw=5)

The methods get_factors and get_factors_df return the calculated factors as a 2-dimensional array or a pandas.DataFrame, respectively. Possible factor names are 'similarity', 'adaptability', 'connection', 'temporal_connection' and 'temporal_connection_sw'; 'temporal_connection_sw' additionally requires the size of the sliding window.

factors = cots.get_factors(factor_type='similarity')
factors_df = cots.get_factors_df(factor_type='temporal_connection_sw', sw=3)