Resource: https://github.com/alan-turing-institute/sktime/blob/f2e096f3f009d7d2f811c3fe9550b4812984b77c/examples/segmentation_with_clasp.ipynb

Prerequisites:

In [None]:
import pandas as pd

from sktime.annotation.clasp import ClaSPSegmentation, find_dominant_window_sizes

from sktime.annotation.plotting.utils import (
    plot_time_series_with_change_points,
    plot_time_series_with_profiles,
)
from sktime.datasets import load_electric_devices_segmentation

## Segmentation:

Time series segmentation aims to find regions of a time series that differ from each other. The transition point from one region to the next is called a change point. Change point detection is the task of finding these points, and it is what this notebook introduces.
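To make the relation between change points and regions concrete, here is a minimal sketch in plain Python. The helper `segments_from_change_points` and the example values are hypothetical and are not part of sktime; the sketch assumes each change point marks the first index of a new region.

```python
def segments_from_change_points(n_samples, change_points):
    """Return (start, end) index pairs, end exclusive, one per region."""
    boundaries = [0] + sorted(change_points) + [n_samples]
    return list(zip(boundaries[:-1], boundaries[1:]))


# Made-up series of length 100 with change points at indices 30 and 70.
segments = segments_from_change_points(100, [30, 70])
print(segments)  # [(0, 30), (30, 70), (70, 100)]
```

With k change points the series is cut into k + 1 regions.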

In [None]:
# Load a built-in dataset:
y, period, change_points = load_electric_devices_segmentation()

print("Period length:", period)
print("Change points:", change_points)

In [None]:
plot_time_series_with_change_points("Electric Devices", y, change_points)

## Segmentation with ClaSP

Classification Score Profile (ClaSP) splits the time series into two sub-sections at each candidate point and trains a binary classifier to tell the two sides apart. The split point that separates the two parts best is chosen as the change point.
This algorithm needs at least two parameters:
* period length of the time series
* the number of change points
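The split-and-classify idea can be illustrated with a toy sketch in plain Python. This is not sktime's implementation: it assumes a 1-nearest-neighbour classifier on sliding windows and plain accuracy as the score, and the function names and the synthetic series are made up for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_score_profile(series, window):
    """Score every candidate split by how well 1-NN separates left from right."""
    windows = [series[i:i + window] for i in range(len(series) - window + 1)]
    n = len(windows)

    # Nearest neighbour of each window, skipping overlapping (trivial) matches.
    nearest = []
    for i in range(n):
        candidates = [j for j in range(n) if abs(i - j) >= window]
        nearest.append(
            min(candidates, key=lambda j: sq_dist(windows[i], windows[j]))
        )

    # A window is classified correctly if it lies on the same side of the
    # split as its nearest neighbour.
    profile = {}
    for split in range(window, n - window):
        correct = sum((i < split) == (nearest[i] < split) for i in range(n))
        profile[split] = correct / n
    return profile


# Synthetic series whose behaviour changes at index 60.
series = [0.0, 1.0] * 30 + [5.0, 6.0] * 30
profile = toy_score_profile(series, window=4)
best_split = max(profile, key=profile.get)
print("Best split:", best_split, "score:", profile[best_split])
```

The nearest neighbours are computed once and reused for every split, so only the labels change from one candidate to the next. The best split should land within one window length of the true change.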

In [None]:
clasp = ClaSPSegmentation(period_length=period, n_cps=5, fmt="sparse")
# Fit to the time series and predict the change points in one step:
found_cps = clasp.fit_predict(y)
profiles = clasp.profiles
scores = clasp.scores
print("The found change points are:", found_cps.to_numpy())

From each of these profiles, the position with the highest score is chosen as a change point.
For 5 regions, there are 4 profiles and 4 change points.

In [None]:
profiles

In [None]:
# Highest scores in every profile:
scores

The positions of these scores will be the change points.
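As a small illustration of reading a change point off a profile, the sketch below takes the position of the maximum in each profile. The profile values are made up, and the indices are relative to each profile's own start; whether an offset has to be added depends on how the profiles are stored, which is not assumed here.

```python
# Made-up score profiles, one list per profile.
toy_profiles = [
    [0.1, 0.4, 0.9, 0.3],
    [0.2, 0.8, 0.5],
]

# Position and value of the highest score in every profile.
peaks = [max(range(len(p)), key=p.__getitem__) for p in toy_profiles]
peak_scores = [p[i] for p, i in zip(toy_profiles, peaks)]

print(peaks)        # [2, 1]
print(peak_scores)  # [0.9, 0.8]
```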

### Comparison of found points and true points:

In [None]:
plot_time_series_with_profiles(
    "Electric Devices",
    y,
    profiles,
    change_points,
    found_cps,
)
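Besides the visual comparison, the found and true change points can be compared numerically. The sketch below is one simple option, assuming the distance to the nearest found point is a reasonable error measure; the helper name and the numbers are hypothetical and are not results from the dataset above.

```python
def nearest_errors(true_cps, found_cps):
    """For every true change point, distance to the closest found one."""
    return [min(abs(t - f) for f in found_cps) for t in true_cps]


# Made-up example values, not output of ClaSP.
toy_true_cps = [100, 250, 400]
toy_found_cps = [96, 262, 401]

errors = nearest_errors(toy_true_cps, toy_found_cps)
print("Absolute errors:", errors)  # [4, 12, 1]
print("Mean absolute error:", sum(errors) / len(errors))
```

In the notebook, the same helper could be called with `change_points` and `found_cps.to_numpy()`.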

To choose a suitable window (period) length for this algorithm, sktime provides a built-in function that estimates it from the time series.

In [None]:
dominant_period_size = find_dominant_window_sizes(y)
print("Dominant Period:", dominant_period_size)
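For intuition about what a dominant period is, the sketch below estimates it from the autocovariance of a synthetic sine wave. This is a different, simpler technique chosen for illustration and should not be read as how `find_dominant_window_sizes` works internally; the function name `dominant_period` and the `min_lag` parameter are assumptions.

```python
import math


def dominant_period(series, min_lag=2):
    """Return the lag (>= min_lag) with the largest autocovariance."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]

    def autocov(lag):
        # Unnormalised sum, so long lags are naturally down-weighted.
        return sum(centered[t] * centered[t + lag] for t in range(n - lag))

    return max(range(min_lag, n // 2), key=autocov)


# Synthetic sine wave that repeats every 20 samples.
toy_series = [math.sin(2 * math.pi * t / 20) for t in range(200)]
print("Estimated period:", dominant_period(toy_series))  # 20
```

Because the sum is not normalised by the number of terms, multiples of the true period (40, 60, ...) score lower than the period itself.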