To use the clustering algorithm C(OTS)^2 on your data, first initialize a COTS object:
import pandas
from ots_eval.clustering.cots import COTS
data = pandas.DataFrame(data, columns=['object_id', 'time', 'feature1', 'feature2'])
cots = COTS(data, min_cf=0.2, sw=3)
Explanation of the parameters:
Parameter | Optional | Default | Datatype | Description |
---|---|---|---|---|
data | - | - | pandas.DataFrame | with first column being the objectID, second being the timestamp and following columns being the features |
min_cf | optional | 0.015 | float | threshold for the minimum connection factor for inserting edges to the graph |
sw | optional | 3 | int | width of sliding window |
The column names of the DataFrame are not relevant, but their order is: the first column is read as the object ID, the second as the timestamp, and all columns after the second one are interpreted as features.
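Because only the column order matters, it can help to reorder an existing DataFrame before passing it to COTS. The following is a minimal sketch using plain pandas; the helper `to_cots_layout` and the sample values are hypothetical and not part of ots_eval.

```python
import pandas as pd

# Hypothetical raw data whose columns are not in the expected order.
raw = pd.DataFrame({
    'feature1': [0.1, 0.2, 0.9, 0.8],
    'time': [1, 2, 1, 2],
    'feature2': [1.0, 1.1, 3.0, 3.2],
    'object_id': ['a', 'a', 'b', 'b'],
})


def to_cots_layout(df, id_col, time_col):
    """Return a copy with the ID column first, the time column second,
    and all remaining columns (the features) after them.

    Illustrative helper only; it is not provided by ots_eval.
    """
    feature_cols = [c for c in df.columns if c not in (id_col, time_col)]
    return df[[id_col, time_col] + feature_cols].copy()


data = to_cots_layout(raw, id_col='object_id', time_col='time')
print(list(data.columns))  # ['object_id', 'time', 'feature1', 'feature2']
```

The resulting `data` frame has the layout described in the table above and could then be passed to the constructor.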
The clusters can then be calculated by calling
clusters = cots.create_clusters()
create_clusters returns a pandas.DataFrame with the columns 'ObjectID', 'Time', the feature columns and 'cluster'. It contains the data together with the cluster assignment (or noise) of every object at every timestamp.
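To illustrate how such a result might be inspected, the sketch below builds a small mock DataFrame in the shape described above and summarizes it with plain pandas. The values and cluster labels are invented for illustration; the actual labels returned by ots_eval may look different.

```python
import pandas as pd

# Mock result in the documented shape: 'ObjectID', 'Time', features, 'cluster'.
# All values are invented; they are not real output of COTS.
clusters = pd.DataFrame({
    'ObjectID': ['a', 'b', 'c', 'a', 'b', 'c'],
    'Time': [1, 1, 1, 2, 2, 2],
    'feature1': [0.1, 0.2, 0.9, 0.1, 0.8, 0.9],
    'cluster': [0, 0, 1, 0, 1, 1],
})

# One row per object, one column per timestamp: shows how each object
# moves between clusters over time.
membership = clusters.pivot(index='ObjectID', columns='Time', values='cluster')
print(membership)

# Number of objects in each cluster at each timestamp.
sizes = clusters.groupby(['Time', 'cluster']).size()
print(sizes)
```

In this mock data, object 'b' switches from cluster 0 at time 1 to cluster 1 at time 2, which is visible as a changed value in its row of `membership`.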
If clusters without noise are desired, use get_clusters_df instead:
clusters = cots.get_clusters_df(min_cf=0.1, sw=5)
Here, min_cf and sw can be adapted if necessary.
When working with arrays, use the following methods instead:
clusters = cots.get_clusters(min_cf=0.1, sw=5)
noisy_clusters = cots.get_clusters_with_noise(min_cf=0.1, sw=5)
The methods get_factors and get_factors_df return the calculated factors as a 2-dimensional array or as a pandas.DataFrame, respectively, so that they can be inspected. Possible factor names are 'similarity', 'adaptability', 'connection', 'temporal_connection' and 'temporal_connection_sw'. Of these, 'temporal_connection_sw' additionally requires the size of the sliding window (sw).
factors = cots.get_factors(factor_type='similarity')
factors_df = cots.get_factors_df(factor_type='temporal_connection_sw', sw=3)