# Part 3. Time series classification - exercise

> Try your best on one of the UCR datasets!

Today you'll apply the knowledge acquired in part 3 to classify one of the datasets
from the popular UCR archive for time series classification (TSC). You don't have
to build a TSC algorithm from scratch; you can instead rely on high-level tools
such as:
- [aeon](https://github.com/aeon-toolkit/aeon): Implements every kind of TSC algorithm 
covered in this course (distance-based, dictionary-based, deep-learning-based, 
ROCKET-based...). It is compatible with scikit-learn.
- [tsai](https://github.com/timeseriesAI/tsai): Implements many deep neural architectures
for TSC beyond InceptionTime (e.g. Transformers for time series). It also has a
fast implementation of ROCKET-based algorithms. It is built on top of fastai (which
in turn is built on top of PyTorch).
- [tslearn](https://github.com/tslearn-team/tslearn#available-features): Compatible
with scikit-learn.
- [sktime](https://github.com/sktime/sktime): Implements multiple TSC algorithms,
including all of the ones seen in this course.

The data we will use in this session comes from the City of Melbourne, Australia, 
which has developed an automated pedestrian counting system to better understand 
pedestrian activity within the municipality, such as how people use different city 
locations at different times of the day. The objective is to analyse this data in 
order to facilitate decision making and urban planning for the future. To see the 
pedestrian counting system in action, check out the interactive web app 
[here](https://www.pedestrian.melbourne.vic.gov.au/#date=11-06-2018&time=4). 

To create a TSC dataset out of this data, time series researchers have 
extracted the pedestrian counts of 10 locations for the whole of 2017 (all 12 
months). The classes correspond to the location of the sensor: 
- Class 1: Bourke Street Mall (North) 
- Class 2: Southern Cross Station 
- Class 3: New Quay 
- Class 4: Flinders St Station Underpass 
- Class 5: QV Market-Elizabeth (West) 
- Class 6: Convention/Exhibition Centre 
- Class 7: Chinatown-Swanston St (North) 
- Class 8: Webb Bridge 
- Class 9: Tin Alley-Swanston St (West) 
- Class 10: Southbank 

There is nothing to infer from the order of examples in the train and test set.

## Download the data

```python
from fastai.data.external import untar_data

path = untar_data('https://timeseriesclassification.com/aeon-toolkit/MelbournePedestrian.zip')
print(path)
```
```
/home/victor/.fastai/data/MelbournePedestrian
```


```python
path.ls()
```
```
(#13) [Path('/home/victor/.fastai/data/MelbournePedestrian/README.md'),Path('/home/victor/.fastai/data/MelbournePedestrian/MelbournePedestrian_TEST.txt'),Path('/home/victor/.fastai/data/MelbournePedestrian/MelbournePedestrian.txt'),Path('/home/victor/.fastai/data/MelbournePedestrian/MelbournePedestrian_TEST.arff'),Path('/home/victor/.fastai/data/MelbournePedestrian/MelbournePedestrian_TRAIN.ts'),Path('/home/victor/.fastai/data/MelbournePedestrian/MelbournePedestrian_nmv_TEST.arff'),Path('/home/victor/.fastai/data/MelbournePedestrian/MelbournePedestrian_TRAIN.txt'),Path('/home/victor/.fastai/data/MelbournePedestrian/MelbournePedestrian_nmv_TEST.ts'),Path('/home/victor/.fastai/data/MelbournePedestrian/MelbournePedestrian_TRAIN.arff'),Path('/home/victor/.fastai/data/MelbournePedestrian/MelbournePedestrian_TEST.csv')...]
```

The files `MelbournePedestrian_TRAIN.txt` and `MelbournePedestrian_TEST.txt` contain
the hourly pedestrian counts for each of the locations, with one row per day (24 hourly values). 
```
1.0000000e+00   9.7000000e+01   4.2000000e+01   2.0000000e+01   1.0000000e+01   1.4000000e+01   3.3000000e+01   1.1300000e+02   4.2200000e+02   8.7500000e+02   1.0030000e+03   1.3510000e+03   1.6130000e+03   2.9370000e+03   2.9540000e+03   2.1670000e+03   2.3300000e+03   2.1910000e+03   2.6210000e+03   2.4000000e+03   1.8920000e+03   1.2530000e+03   8.4400000e+02   4.3800000e+02   2.0400000e+02
```
In the example above, the first number is the class label (1 in this case), and
the remaining 24 numbers are the pedestrian counts for each hour of the day.
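
As a starting point, the row layout described above can be parsed with NumPy. This is only a minimal sketch: `load_ucr_txt` is a hypothetical helper name, and the demo reads the example row from an in-memory string instead of the real file. The `_nmv_` files in the listing suggest that the default files may contain missing values, so it is worth counting NaNs after loading.

```python
import io
import numpy as np

def load_ucr_txt(source):
    """Hypothetical helper: first column = class label, remaining columns = series."""
    data = np.loadtxt(source, ndmin=2)   # whitespace-delimited floats, always 2-D
    y = data[:, 0].astype(int)           # labels are stored as floats (1.0000000e+00)
    X = data[:, 1:]                      # one row per day, one column per hour
    return X, y

# Demo on the example row shown above. For the real data you would pass
# something like path / 'MelbournePedestrian_TRAIN.txt' (assumed usage).
sample = io.StringIO(
    "1 97 42 20 10 14 33 113 422 875 1003 1351 1613 "
    "2937 2954 2167 2330 2191 2621 2400 1892 1253 844 438 204"
)
X_demo, y_demo = load_ucr_txt(sample)
print(X_demo.shape, y_demo)
print("missing values:", int(np.isnan(X_demo).sum()))
```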

From now on, you are in charge of loading the data, preprocessing and visualising it, 
splitting the training set into train/valid sets if needed, and of course running and 
comparing whichever TSC algorithms you choose. 
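
One possible way to get started with the split is sketched below. It assumes scikit-learn is installed and uses a synthetic stand-in for the data (10 classes, 24 values per series) so that it runs on its own. The 1-nearest-neighbour classifier with Euclidean distance is just a simple distance-based baseline to compare against, not a recommended final model.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the training set (assumption: replace with the real X, y)
rng = np.random.default_rng(0)
n_classes, n_per_class, n_hours = 10, 30, 24
y = np.repeat(np.arange(1, n_classes + 1), n_per_class)
X = rng.poisson(lam=50 * y[:, None], size=(y.size, n_hours)).astype(float)

# Stratified split: every class keeps the same proportion in train and valid
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Baseline: 1-NN with Euclidean distance on the raw series
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train, y_train)
valid_acc = clf.score(X_valid, y_valid)
print(f"validation accuracy: {valid_acc:.3f}")
```

Keeping the validation set separate from the official test set lets you compare algorithms and tune hyperparameters without touching the test data.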

You can also add covariates to the dataset, i.e., auxiliary variables that make the
data multivariate and can help the model better understand the patterns 
(e.g., hour of the day, rolling mean, ...).
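
A minimal sketch of how such covariates could be stacked as extra channels is shown below. `add_covariates` is a hypothetical helper, and the `(n_samples, n_channels, n_timepoints)` layout is an assumption about what your chosen library expects, so check its documentation. If the series contain missing values, the rolling mean will propagate them.

```python
import numpy as np

def add_covariates(X, window=3):
    """Hypothetical helper: (n_samples, n_timepoints) -> (n_samples, 3, n_timepoints)."""
    n_samples, n_timepoints = X.shape
    # Channel 1: hour of the day scaled to [0, 1), identical for every sample
    hour = np.tile(np.arange(n_timepoints) / n_timepoints, (n_samples, 1))
    # Channel 2: trailing rolling mean; the first window-1 steps use fewer values
    roll = np.stack(
        [X[:, max(0, t - window + 1): t + 1].mean(axis=1) for t in range(n_timepoints)],
        axis=1,
    )
    # Channel 0 is the original series
    return np.stack([X, hour, roll], axis=1)

X_uni = np.arange(48, dtype=float).reshape(2, 24)   # toy univariate data
X_mv = add_covariates(X_uni, window=3)
print(X_mv.shape)
```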

To evaluate the results on the test set, you can use any classic classification 
metric, such as accuracy, (multiclass) F1, precision, or recall. Also 
have a look at the confusion matrix, and plot the misclassified samples with the 
highest error (e.g., the highest loss) to see if you can find patterns in the way the model fails.
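
The evaluation steps above could be sketched as follows, assuming scikit-learn is available and that your model outputs class probabilities. The labels and probabilities here are made up for illustration; "error" is taken to be the per-sample cross-entropy loss, which is one reasonable choice among several.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Toy stand-ins (assumption: replace with your test labels and model probabilities)
classes = np.array([1, 2, 3])
y_true = np.array([1, 1, 2, 2, 3, 3])
proba = np.array([
    [0.80, 0.10, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
    [0.10, 0.20, 0.70],
    [0.40, 0.30, 0.30],
])
y_pred = classes[proba.argmax(axis=1)]

acc = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")
cm = confusion_matrix(y_true, y_pred, labels=classes)  # rows: true, columns: predicted

# Per-sample cross-entropy: low probability on the true class means high loss
true_idx = np.searchsorted(classes, y_true)            # requires sorted `classes`
loss = -np.log(proba[np.arange(len(y_true)), true_idx])

# Misclassified samples, ordered from highest to lowest loss
wrong = np.flatnonzero(y_pred != y_true)
worst = wrong[np.argsort(loss[wrong])[::-1]]
print(acc, macro_f1)
print(cm)
print(worst)
```

The indices in `worst` can then be used to select and plot the corresponding series, for example next to typical samples of the true and the predicted class.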