# Evaluating a time-series model

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tommoral/24-sacl-ai-4-sciences/blob/main/session-3/03-evaluating-ts-models.ipynb)

```
Authors: Thomas Moreau
         Mathurin Massias
         Alexandre Gramfort
```


The purpose of this notebook is to illustrate the pitfalls of evaluating time-series models.
Let's take some financial quotes as an example.

In [None]:
from pathlib import Path
import pandas as pd

DATA_FILE = "data/quotes.parquet"
URL_REPO = "https://github.com/tomMoral/24-sacl-ai-4-sciences/raw/main/session-3/"

# Load the local copy if it exists, otherwise read the file from the repository.
if Path(DATA_FILE).exists():
    quotes = pd.read_parquet(DATA_FILE)
else:
    quotes = pd.read_parquet(f"{URL_REPO}{DATA_FILE}")

In [None]:
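# Cache the data locally so that later runs do not need to download it again.
Path(DATA_FILE).parent.mkdir(parents=True, exist_ok=True)
quotes.to_parquet(DATA_FILE)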

In [None]:
import matplotlib.pyplot as plt

# Plot all quotes on a single set of axes.
_, ax = plt.subplots(figsize=(10, 6))
_ = quotes.plot(ax=ax)

Let's assume we want to predict the stock price of "Chevron" from the stock price of the other companies at each time point.

In [None]:
quotes

In [None]:
from sklearn.ensemble import GradientBoostingRegressor

X, y = quotes.drop(columns=["Chevron"]), quotes["Chevron"]
regressor = GradientBoostingRegressor()

In [None]:
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import TimeSeriesSplit
from sklearn.model_selection import cross_val_score

# Shuffled splits ignore the time ordering of the samples.
cv = ShuffleSplit(random_state=0)
scores = cross_val_score(regressor, X, y, cv=cv)
print(f'Mean R2: {scores.mean():.2f}')

<div class="alert alert-success">

**QUESTION:**

- It seems that we have the perfect regressor. Is this normal? Use the function `statsmodels.tsa.stattools.acf` to look at the correlation between successive samples.
- Did we break any assumption of the cross-validation procedure?

</div>

Solution is in: `solutions/00b-time_series_split.py`
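
To get a feel for what "correlation between successive samples" means, here is a minimal sketch on synthetic data, not on the quotes dataset. It assumes a random walk is a reasonable stand-in for a price series and compares it with white noise. The helper `lag_autocorr` is a hypothetical name introduced for this illustration; it computes the usual sample autocorrelation, which is the kind of quantity `acf` estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a price series: a random walk (cumulative sum of noise).
random_walk = np.cumsum(rng.normal(size=2000))
# White noise has independent samples and serves as a point of comparison.
white_noise = rng.normal(size=2000)


def lag_autocorr(x, lag):
    """Sample autocorrelation of `x` at a positive lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


lags = (1, 5, 20)
walk_acf = [lag_autocorr(random_walk, k) for k in lags]
noise_acf = [lag_autocorr(white_noise, k) for k in lags]

print("random walk:", np.round(walk_acf, 2))
print("white noise:", np.round(noise_acf, 2))
```

When the autocorrelation stays close to 1 over many lags, neighbouring samples carry almost the same information, so a test sample that sits between two training samples is easy to predict.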

Let's check the different types of cross-validation strategies that are available in scikit-learn:

https://scikit-learn.org/stable/auto_examples/model_selection/plot_cv_indices.html#sphx-glr-auto-examples-model-selection-plot-cv-indices-py
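
As a small complement to the gallery example linked above, the sketch below prints the train and test indices produced by `ShuffleSplit` and `TimeSeriesSplit` on a toy array of 12 time-ordered samples. The array `X_toy` and the split parameters are assumptions chosen only to keep the output short.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, TimeSeriesSplit

# 12 time-ordered samples (toy data, index = time step).
X_toy = np.arange(12).reshape(-1, 1)

splitters = {
    "ShuffleSplit": ShuffleSplit(n_splits=3, test_size=0.25, random_state=0),
    "TimeSeriesSplit": TimeSeriesSplit(n_splits=3),
}

for name, splitter in splitters.items():
    print(name)
    for train_idx, test_idx in splitter.split(X_toy):
        print("  train:", train_idx, "test:", test_idx)

# With TimeSeriesSplit, every training index precedes every test index.
ts_respects_order = all(
    train_idx.max() < test_idx.min()
    for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X_toy)
)
print("TimeSeriesSplit trains only on the past:", ts_respects_order)
```

`ShuffleSplit` draws test samples at random positions, so the model may be trained on samples that come after the ones it is tested on, whereas `TimeSeriesSplit` always tests on samples that come after the training window.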