General Usage

When using CLOSE as a stability measure for over-time clusterings, many settings are possible. First, CLOSE has to be initialized (`test_data` is a two-dimensional array containing the data):

```python
import pandas
from ots_eval.stability_evaluation.close import CLOSE

data = pandas.DataFrame(test_data, columns=['object_id', 'time', 'cluster_id'])
rater = CLOSE(data, measure='mae', minPts=2, output=True, jaccard=True, weighting=True, exploitation_term=True)
```

Explanation of the parameters:

| Parameter | | Default | Datatype | Description |
|---|---|---|---|---|
| `data` | - | - | pandas.DataFrame | with the first column being the objectID, the second the timestamp, and the third the clusterID |
| `measure` | optional | 'mae' | string or callable | name of the quality measure that should be used, or a cluster measuring function |
| `minPts` | optional | 2 | int | used for density-based quality measures only |
| `output` | optional | False | boolean | indicates whether intermediate results should be printed |
| `jaccard` | optional | False | boolean | indicates whether the Jaccard index should be used in CLOSE |
| `weighting` | optional | False | boolean | indicates whether the more distant past should be weighted lower than the nearer past |
| `exploitation_term` | optional | False | boolean | indicates whether the exploitation term for penalization of outliers should be used |

The column names in the DataFrame are not relevant, but their order is. The DataFrame may contain further columns, but only the first three are considered.
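Because only the column order matters, a table whose columns arrive in a different order can be rearranged before it is passed to CLOSE. The following is a minimal sketch of that step; the column names `obj`, `t`, `cluster` and `feature` and the sample values are made up for illustration and are not part of the library.

```python
import pandas as pd

# Hypothetical raw table: columns are in the wrong order and there is
# one extra feature column that CLOSE would ignore.
raw = pd.DataFrame({
    "cluster": [0, 0, 1, 1],
    "t": [1, 1, 2, 2],
    "obj": ["a", "b", "a", "b"],
    "feature": [0.1, 0.2, 0.3, 0.4],
})

# Put object ID, timestamp and cluster ID first (in that order).
# The names themselves can stay as they are.
data = raw[["obj", "t", "cluster", "feature"]]

print(list(data.columns))
```

The resulting `data` frame has the expected layout in its first three columns, so it could be handed to `CLOSE(data)` as described above.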

Now, the clustered data set can be evaluated with CLOSE. There are two variants of quality measures:

- quality measures for clusters
- quality measures for clusterings

With the first type of quality measure, the original CLOSE formula is applied by calling the function

```python
clustering_score = rater.rate_clustering(start_time=None, end_time=None, return_measures=False)
```

where `start_time` and `end_time` indicate the time interval to be considered. If `start_time` and `end_time` are `None`, the first and last timestamps are used as the respective boundaries. `return_measures` indicates whether the individual components of the CLOSE formula should be returned.

The second type of quality measure is used with the modified CLOSE formula by calling

```python
clustering_score = rater.rate_time_clustering(start_time=None, end_time=None)
```

where `start_time` and `end_time` indicate the time interval to be considered. If they are `None`, the first and last timestamps are used as the respective boundaries.
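The defaulting of `start_time` and `end_time` described above can be illustrated with a small stand-alone sketch. The helper `resolve_interval` is hypothetical and not part of ots_eval, and treating both boundaries as inclusive is an assumption made here for illustration only.

```python
def resolve_interval(timestamps, start_time=None, end_time=None):
    """Hypothetical helper: fall back to the first/last timestamp for None."""
    start = min(timestamps) if start_time is None else start_time
    end = max(timestamps) if end_time is None else end_time
    return start, end


timestamps = [3, 1, 2, 5, 4]

# Both boundaries omitted: the whole time range is used.
full_range = resolve_interval(timestamps)

# Only the start is given: the end falls back to the last timestamp.
start, end = resolve_interval(timestamps, start_time=2)

# Assumption: boundaries are inclusive.
selected = [t for t in sorted(timestamps) if start <= t <= end]

print(full_range, (start, end), selected)
```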

Exploitation Term

The exploitation term was originally introduced to penalize outliers in CLOSE. It appends N_co / N_o to the CLOSE formula, where N_co is the number of clustered objects and N_o is the number of all objects. When used as a penalization term in CLOSE, it is calculated globally for the whole over-time clustering. It can also be used as a quality measure, for example when the clusters are calculated by DBSCAN. In that case, it is computed per timestamp in order to evaluate the individual time clusterings.
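The difference between the global and the per-timestamp variant of N_co / N_o can be made concrete with a few lines of plain Python. This is only a sketch under two assumptions that are not confirmed by the text above: outliers carry a negative cluster ID (the DBSCAN noise convention), and the counts are taken over (object, timestamp) rows. The function `exploitation` is a hypothetical name, not a library call.

```python
from collections import defaultdict

# Rows of (object_id, time, cluster_id).
# Assumption: a negative cluster_id marks an outlier (DBSCAN noise label).
rows = [
    ("a", 1, 0), ("b", 1, 0), ("c", 1, -1),
    ("a", 2, 1), ("b", 2, 1), ("c", 2, 1),
]


def exploitation(rows):
    """N_co / N_o: share of rows that are assigned to a cluster."""
    if not rows:
        return 0.0
    clustered = sum(1 for _, _, cluster_id in rows if cluster_id >= 0)
    return clustered / len(rows)


# Global variant: one value for the whole over-time clustering.
global_term = exploitation(rows)

# Per-timestamp variant: one value for each time clustering.
rows_by_time = defaultdict(list)
for row in rows:
    rows_by_time[row[1]].append(row)
per_timestamp = {t: exploitation(r) for t, r in rows_by_time.items()}

print(global_term, per_timestamp)
```

Here five of six rows are clustered overall, while timestamp 1 on its own has only two of three objects clustered, which is the kind of per-timestamp detail the quality-measure variant exposes.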

How to use it?

You can use the exploitation term as a penalization term by setting `exploitation_term=True` when creating the CLOSE object:

```python
CLOSE(data, exploitation_term=True)
```

It is also possible to use the exploitation term as a quality measure. To do so, create the CLOSE object as follows:

```python
CLOSE(data, measure="exploit")
```

Since the exploitation term then has to be calculated per timestamp, the modified CLOSE formula for quality measures on time clusterings has to be used. Therefore, instead of the usual function `rate_clustering()`, you have to call

```python
rate_time_clustering()
```

Examples

CLOSE with DBSCAN, using the exploitation term as quality measure:

```python
rater = CLOSE(data, 'exploit')
clustering_score = rater.rate_time_clustering()
```

CLOSE with DBSCAN, using the mean absolute error as quality measure and the global exploitation term for outlier penalization:

```python
rater = CLOSE(data, 'mae', exploitation_term=True)
clustering_score = rater.rate_clustering()
```
