
Is it possible to limit the range of similarity computation? (Dynamic Time Warping) [K-Shape] #370

Open
StefanR44 opened this issue Nov 25, 2021 · 7 comments

@StefanR44

StefanR44 commented Nov 25, 2021

I would like to have DTW only move within a certain range, so that a point will not be matched to another point that is more than x steps away.

I hope this visualization helps:
https://i.imgur.com/hlw9iOF.png

Thanks in advance

Picture edited from Paparrizos and Gravano, "k-Shape: Efficient and Accurate Clustering of Time Series" (2016)

[Image: DTW range limit]

@StefanR44 StefanR44 changed the title Is it possible to limit the range of similarity computation? (Dynamic Time Warping) Is it possible to limit the range of similarity computation? (Dynamic Time Warping) [K-Shape] Nov 25, 2021
@GillesVandewiele
Contributor

That is most definitely possible. Please check our user guide: https://tslearn.readthedocs.io/en/stable/user_guide/dtw.html

What you are looking for is called a global constraint. We support two types: itakura and sakoe_chiba
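For illustration, a Sakoe-Chiba band restricts DTW to matches with |i - j| <= radius, which is exactly the "no more than x steps away" behavior asked for above. Here is a minimal pure-Python sketch of banded DTW (the function name `dtw_sakoe_chiba` is made up for this example; with tslearn itself you would pass `global_constraint="sakoe_chiba"` and `sakoe_chiba_radius` to `tslearn.metrics.dtw`):

```python
import math

def dtw_sakoe_chiba(x, y, radius):
    """DTW between 1-D series x and y, restricted to the band |i - j| <= radius."""
    n, m = len(x), len(y)
    INF = math.inf
    # D[i][j] = cost of the best warping path aligning x[:i] with y[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit cells inside the Sakoe-Chiba band around the diagonal.
        for j in range(max(1, i - radius), min(m, i + radius) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return math.sqrt(D[n][m])
```

Tightening the band can only remove candidate paths, so the constrained distance is always greater than or equal to the unconstrained one.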

@StefanR44
Author

> That is most definitely possible. Please check our user guide: https://tslearn.readthedocs.io/en/stable/user_guide/dtw.html
>
> The things you are looking for are called constraints. We support two types: itakura and sakoe_chiba

Ah, thanks a bunch @GillesVandewiele! I did not realize this was still k-shapes related. I will read into it!

@GillesVandewiele
Contributor

Most welcome. I am not sure about how to use it for K-shape myself at the moment. It might require some patching. In case you manage to make it work, please comment here on how you did it!

@StefanR44
Author

Will do! I will do some fiddling with K-Shape in a few ways. The biggest issue with K-Shape for me is that it requires a dataset of equal-length time series. For my use case I am clustering time series of energy consumption, and sometimes I do not have them for the whole year but only for a few months.
Do you know, by chance, if there is a way to do the clustering based on actual dates or months (e.g. summer series only matched with other summer series)?

@GillesVandewiele
Contributor

Not immediately off the top of my head... You could cut the longer time series into smaller windows as a preprocessing step, or perhaps pad the shorter ones with noise? These are rather rudimentary and probably not ideal solutions, though.
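The windowing workaround mentioned above can be sketched in a few lines: cut every series into fixed-length (possibly overlapping) windows so that K-Shape only ever sees equal-length inputs. The function name `sliding_windows` is illustrative, not part of tslearn:

```python
def sliding_windows(series, window, step):
    """Cut a 1-D series into equal-length, possibly overlapping windows.

    Series shorter than `window` yield no windows, so ragged datasets
    reduce to a flat list of equal-length segments.
    """
    return [series[i:i + window]
            for i in range(0, len(series) - window + 1, step)]
```

For the date-based grouping question, one option along the same lines is to window by calendar month and cluster each group of same-month windows separately.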

@StefanR44
Author

Yes, those are possible workarounds if everything else fails, I am actually on it right now 👍

@StefanR44
Author

StefanR44 commented Nov 28, 2021

Did not really succeed with that, as I realized that K-Shape does not use DTW but SBD (shape-based distance). Now the question is: how can that be adjusted? I was checking the functions for SBD usage; I guess this is where it is used/created. I tried to backtrack y_shifted_sbd_vec. It is a function from tslearn: from tslearn.metrics import cdist_normalized_cc, y_shifted_sbd_vec

Is there a way to find out more about it? When I wanted to go to the definition, I just got this:
from .cycc import cdist_normalized_cc, y_shifted_sbd_vec

What exactly does it do?


```python
def _shape_extraction(self, X, k):
    sz = X.shape[1]
    # Align every member of cluster k to the current centroid:
    # y_shifted_sbd_vec shifts each series to the lag that maximizes its
    # normalized cross-correlation with the reference (the SBD alignment).
    Xp = y_shifted_sbd_vec(self.cluster_centers_[k], X[self.labels_ == k],
                           norm_ref=-1,
                           norms_dataset=self.norms_[self.labels_ == k])
    S = numpy.dot(Xp[:, :, 0].T, Xp[:, :, 0])
    # Q is a centering matrix; the leading eigenvector of the centered
    # scatter matrix M is the new shape centroid.
    Q = numpy.eye(sz) - numpy.ones((sz, sz)) / sz
    M = numpy.dot(Q.T, numpy.dot(S, Q))
    _, vec = numpy.linalg.eigh(M)
    mu_k = vec[:, -1].reshape((sz, 1))

    # The way the optimization problem is (ill-)formulated, both mu_k and
    # -mu_k are candidates for barycenters.
    # In the following, we check which one is the better candidate.
    dist_plus_mu = numpy.sum(numpy.linalg.norm(Xp - mu_k, axis=(1, 2)))
    dist_minus_mu = numpy.sum(numpy.linalg.norm(Xp + mu_k, axis=(1, 2)))
    if dist_minus_mu < dist_plus_mu:
        mu_k *= -1

    return mu_k
```
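On the question of what y_shifted_sbd_vec does: SBD is based on normalized cross-correlation (NCC), and the helper shifts each series in a dataset to the lag that best correlates with a reference series. A rough pure-Python sketch of that per-series alignment follows; `ncc_shift` is a made-up name for illustration, while the real implementation lives in tslearn's compiled cycc module and operates on whole (multivariate) datasets at once:

```python
import math

def ncc_shift(x, y):
    """Find the shift of y that maximizes normalized cross-correlation with x.

    Returns (best_shift, aligned_y), where aligned_y is y delayed by
    best_shift samples and zero-padded back to len(x). This mirrors what
    y_shifted_sbd_vec does for each series before shape extraction.
    """
    n = len(x)
    norm = (math.sqrt(sum(v * v for v in x))
            * math.sqrt(sum(v * v for v in y)))
    best_s, best_cc = 0, -math.inf
    for s in range(-(n - 1), n):
        # cc(s) = sum_i x[i] * y[i - s], over the indices where both exist
        cc = sum(x[i] * y[i - s]
                 for i in range(max(0, s), min(n, n + s))) / norm
        if cc > best_cc:
            best_cc, best_s = cc, s
    # Shift y by best_s, padding the vacated positions with zeros.
    if best_s >= 0:
        aligned = [0.0] * best_s + list(y[:n - best_s])
    else:
        aligned = list(y[-best_s:]) + [0.0] * (-best_s)
    return best_s, aligned
```

SBD itself is then 1 - best_cc, so constraining SBD "like a Sakoe-Chiba band" would amount to restricting the range of shifts s searched in the loop above.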
