### [Clustering of time series data](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115.6594&rep=rep1&type=pdf)

http://www.cs.columbia.edu/~gravano/Papers/2017/tods17.pdf

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3984869/

https://www.datanovia.com/en/blog/types-of-clustering-methods-overview-and-quick-start-r-code/#partitioning-clustering

https://datawookie.netlify.com/blog/2017/04/clustering-time-series-data/

### [Clustering of Time Series Subsequences is Meaningless](http://www.cs.ucr.edu/~eamonn/meaningless.pdf)

- Whole Clustering: The notion of clustering here is similar to that of conventional clustering of discrete objects. Given a set of individual time series data, the objective is to group similar time series into the same cluster.
- Subsequence Clustering: Given a single time series, sometimes in the form of streaming time series, individual time series (subsequences) are extracted with a sliding window. Clustering is then performed on the extracted time series subsequences.
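
The sliding-window extraction used in subsequence clustering can be sketched as follows. This is a minimal illustration; the function name `sliding_windows` and the window-length parameter `w` are chosen here for the example and do not come from the paper.

```python
def sliding_windows(series, w):
    """Return every length-w subsequence of series, one per start offset."""
    if w <= 0 or w > len(series):
        raise ValueError("window length must be between 1 and len(series)")
    return [series[i:i + w] for i in range(len(series) - w + 1)]


series = [1, 2, 3, 4, 5]
subsequences = sliding_windows(series, 3)
print(subsequences)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

A series of length m yields m - w + 1 subsequences, and neighbouring subsequences overlap in w - 1 points.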

In this work we make a surprising claim. Clustering of time series subsequences is meaningless! In particular,
clusters extracted from these time series are forced to obey certain constraints that are pathologically unlikely to be
satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially
random.

Since we use the word “meaningless” many times in this paper, we will take the time to define this term. All
useful algorithms (with the sole exception of random number generators) produce output that depends on the input.
For example, a decision tree learner will yield very different outputs on, say, a credit worthiness domain, a drug
classification domain, and a music domain. We call an algorithm “meaningless” if the output is independent of the
input. As we prove in this paper, the output of STS clustering does not depend on input, and is therefore
meaningless.

#### Background on Clustering

Algorithm Hierarchical Clustering
1. Calculate the distance between all objects. Store the results in a distance matrix.
2. Search through the distance matrix and find the two most similar clusters/objects.
3. Join the two clusters/objects to produce a cluster that now has at least 2 objects.
4. Update the matrix by calculating the distances between this new cluster and all other clusters.
5. Repeat step 2 until all cases are in one cluster.
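
The steps above can be sketched in plain Python. The excerpt does not say how the distance between two clusters is measured, so this sketch assumes single linkage (the distance between the closest pair of members) and Euclidean distance; the function name `agglomerative` is illustrative.

```python
import math


def agglomerative(points):
    """Single-linkage agglomerative clustering; returns the merge history."""
    n = len(points)
    # Step 1: distance between all objects, stored in a matrix.
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    merges = []
    while len(clusters) > 1:
        # Step 2: find the two most similar clusters.
        # Step 4 is implicit here: cluster-to-cluster distances are
        # recomputed from the point matrix (single linkage, an assumption).
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Step 3: join the two clusters into one.
        clusters[a] = clusters[a] + clusters.pop(b)
        merges.append((sorted(clusters[a]), d))
    # Step 5: the loop ends when everything is in one cluster.
    return merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
for members, d in agglomerative(points):
    print(members, round(d, 3))
```

The merge history is the information a dendrogram draws: which groups were joined, and at what distance.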

Algorithm k-means
1. Decide on a value for k.
2. Initialize the k cluster centers (randomly, if necessary).
3. Decide the class memberships of the N objects by assigning them to the nearest cluster center.
4. Re-estimate the k cluster centers, by assuming the memberships found above are correct.
5. If none of the N objects changed membership in the last iteration, exit. Otherwise, go to step 3.
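
A minimal sketch of these five steps, using only the standard library. The random initialization from the data points, the `seed` and `max_iter` parameters, and the handling of empty clusters are assumptions made for this example.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 2: initialize the k centers by sampling k distinct data points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign every object to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Step 5: stop when no object changed membership.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: re-estimate each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
print(sorted(centers), labels)
```

Different seeds can give different centers on harder data, which is exactly the variability that the meaningfulness measure later in these notes relies on.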

We can use these two numbers to create a fraction:

$$ \text{clustering meaningfulness}(X, Y) = \frac{\text{within set } X \text{ distance}}{\text{between set } X \text{ and } Y \text{ distance}} $$

We can justify calling this number “clustering meaningfulness” since it clearly measures just that. If, for any dataset,
the clustering algorithm finds similar clusters each time regardless of the different initial seeds, the numerator should
be close to zero. In contrast, there is no reason why the clusters from two completely different, unrelated datasets
should be similar. Therefore, we should expect the denominator to be relatively large. So overall, we should expect
the value of clustering meaningfulness(X, Y) to be close to zero when X and Y are sets of cluster centers derived
from different datasets.
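
The ratio can be sketched in code. These notes do not reproduce how the two distances are computed, so the definitions below are assumptions: the distance between two sets of cluster centers is taken as the sum, over each center in the first set, of its distance to the nearest center in the second set, and the within-set and between-set distances are averages of that quantity over all pairs of clustering runs. The toy centers are invented for illustration.

```python
import math


def cluster_distance(A, B):
    """Sum over centers in A of the distance to the nearest center in B (assumed definition)."""
    return sum(min(math.dist(a, b) for b in B) for a in A)


def mean_set_distance(X, Y):
    """Average cluster_distance over every pair of runs, one from X and one from Y."""
    return sum(cluster_distance(a, b) for a in X for b in Y) / (len(X) * len(Y))


def clustering_meaningfulness(X, Y):
    """within-set-X distance divided by between-set-X-and-Y distance."""
    return mean_set_distance(X, X) / mean_set_distance(X, Y)


# Each inner list holds the cluster centers found by one run of the algorithm.
X = [[(0, 0), (10, 10)], [(0, 1), (10, 9)]]      # two runs on dataset 1
Y = [[(50, 50), (60, 60)], [(50, 51), (60, 59)]]  # two runs on dataset 2
print(clustering_meaningfulness(X, Y))
```

With these toy centers the runs on one dataset agree closely and differ greatly from the other dataset, so the ratio is small. A ratio near 1 would mean that centers from different datasets are as alike as centers from repeated runs on the same dataset.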

The implications of Theorem 1 become clearer when we consider the following well-documented fact. For any
dataset, the weighted (by cluster membership) average of the k cluster centers must equal the global mean. The implication
for STS clustering is profound. Since the global mean for STS clustering is a straight line, the weighted average
of the k cluster centers must in turn be a straight line. However, there is no reason why we should expect this to be true of
any dataset, much less every dataset. This hidden constraint limits the utility of STS clustering to a vanishingly small
subset of the space of all datasets. The out-of-phase sine waves that we obtained as cluster centers in the last section
conform to this theorem, since their weighted average, as expected, sums to a straight line.
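
Both facts can be checked numerically. The sketch below uses a hypothetical test signal (two superimposed sine waves) and, instead of running a clustering algorithm, splits the subsequences into k arbitrary groups, since the weighted-average identity holds for any partition. All names are illustrative.

```python
import math


def sliding_windows(series, w):
    return [series[i:i + w] for i in range(len(series) - w + 1)]


def mean_vector(rows):
    """Point-wise mean of a list of equal-length sequences."""
    return [sum(col) / len(rows) for col in zip(*rows)]


# Hypothetical test signal; its values span roughly -1.5 to 1.5.
series = [math.sin(0.3 * i) + 0.5 * math.sin(1.1 * i) for i in range(2000)]
w = 16
windows = sliding_windows(series, w)

# Arbitrary partition into k groups (a stand-in for any clustering result).
k = 3
groups = [[win for i, win in enumerate(windows) if i % k == c] for c in range(k)]
centers = [mean_vector(g) for g in groups]
weights = [len(g) / len(windows) for g in groups]

# Membership-weighted average of the cluster centers.
weighted_average = [
    sum(wt * center[d] for wt, center in zip(weights, centers)) for d in range(w)
]
global_mean = mean_vector(windows)

# Gap between the two should be floating-point noise only.
max_gap = max(abs(a - b) for a, b in zip(weighted_average, global_mean))
# Spread of the global mean across the window: close to zero means "straight line".
flatness = max(global_mean) - min(global_mean)
print(max_gap, flatness)
```

The global mean is nearly flat because each position in the window averages almost the same set of points from the series; only a few points at either end differ.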

An even more tantalizing piece of evidence exists. In the 1920s “data miners” were excited to find that by
preprocessing their data with repeated smoothing, they could discover trading cycles. Their joy was shattered by a
theorem by Evgeny Slutsky (1880-1948), who demonstrated that any noisy time series will converge to a sine wave
after repeated applications of moving window smoothing (Kendall, 1976). While STS clustering is not exactly the
same as repeated moving window smoothing, it is clearly highly related. 
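
The effect of repeated smoothing is easy to observe on pure noise. The sketch below does not prove or reproduce Slutsky's theorem; it only shows, under assumed settings (Gaussian noise, a 3-point moving average, 20 passes, a fixed seed), that smoothing alone induces strong serial correlation and therefore smooth, wave-like structure that was not in the input.

```python
import random


def moving_average(x, width=3):
    """One pass of moving-window smoothing; the output is width - 1 points shorter."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]

smoothed = noise
for _ in range(20):
    smoothed = moving_average(smoothed)

before = lag1_autocorr(noise)     # near 0 for white noise
after = lag1_autocorr(smoothed)   # near 1 after repeated smoothing
print(before, after)
```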

Algorithm motif-based-clustering
1. Decide on a value for k.
2. Discover the K-motifs in the data, for K = k × c (c is some constant, in the region of about 2 to 30).
3. Run k-means, or k partitional hierarchical clustering, or any other clustering algorithm on the subsequences covered by the K-motifs.