# Lab Time Series (Students version)
 

We can use the following libraries:

```python
import random  # standard random generation tools library
import math  # standard math tools library
import matplotlib.pyplot as plt  # plotting library
import numpy as np  # numerical computation library
import pandas as pd  # data analysis library

from statsmodels.graphics.tsaplots import plot_acf  # ACF plotting function
from scipy.optimize import curve_fit  # curve fitting function
```

```python
import sys  # info about the Python version used
print(sys.version)
```

```
3.8.10 (default, Nov 26 2021, 20:14:08) 
[GCC 9.3.0]
```


First, download the file `fictional_population.csv`.

It contains a time series $x(t)$ giving the population of a fictional city every year from 1921 to 2021. For simplicity, the column "date" holds the number of years since the beginning of the data acquisition (so 1 is 1921, 2 is 1922, etc.).

## Exercise 1: visualizing and isolating the trend

### Question 1

- After importing the data (a pandas DataFrame is recommended), plot the time series.
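As a starting point, here is a minimal loading sketch. It reads a tiny in-memory CSV rather than the real file, and the column name `population` and the three values are assumptions made for illustration; only the `date` column is named in this lab.

```python
import io
import pandas as pd

# In-memory stand-in for fictional_population.csv. The "population" column
# name and the values are ASSUMPTIONS; check the header of the real file.
csv_text = "date,population\n1,1020\n2,1043\n3,1069\n"

# With the real file this would be: pd.read_csv("fictional_population.csv")
df = pd.read_csv(io.StringIO(csv_text))

t = df["date"].to_numpy()        # years since the start of acquisition
x = df["population"].to_numpy()  # time series x(t)
print(df.head())
```

Once the data is in a DataFrame, `df.plot(x="date", y="population")` is one way to draw the series, again assuming that column name.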

### Question 2

We choose to model the trend of the first part of the time series with a degree 2 polynomial:

$$ \hat{m}(t) = a + b.t + c.t^2 $$

- do a least-squares regression to find the values of the coefficients $a$, $b$ and $c$

- plot the data and the model on the same figure
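The fit can be sketched as follows on synthetic data (not the real population file; the coefficients 1000, 20 and 0.5 and the noise level are made up). It uses `np.polyfit`, which is one possible least-squares tool; `curve_fit` from the imports above would work as well.

```python
import numpy as np

# Synthetic stand-in for the lab data (NOT the real fictional_population.csv)
rng = np.random.default_rng(0)
t = np.arange(1, 102)  # 1..101, like the "date" column
x = 1000.0 + 20.0 * t + 0.5 * t**2 + rng.normal(0.0, 5.0, size=t.size)

# np.polyfit returns coefficients from highest degree to lowest: [c, b, a]
c, b, a = np.polyfit(t, x, deg=2)

# Fitted trend m_hat(t) = a + b*t + c*t^2
m_hat = a + b * t + c * t**2

print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.4f}")
```

Note the coefficient order: `np.polyfit` lists the highest power first, which is the reverse of the order in the formula for $\hat{m}(t)$.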

## Exercise 2: model of the residuals

### Question 3

Here the residual time series is defined as $r(t)$ such that

$$ x(t) = \hat{m}(t) + r(t) $$

- plot the residual time series $r(t)$

- plot its ACF (function plot_acf can be used)

- from a visual examination of the residuals and the ACF, do you think the process is IID? (justify in one sentence)


### Question 4

We propose to model the residuals with an AR(1) process, denoted $ \hat{r}(t) $, that is to say

$$   \hat{r}(t)  =  \phi \hat{r}(t-1) + w(t) $$

where $w(t)$ is white noise.

To find the value of the parameter $ \phi $, do as follows: 

- plot $\hat{r}(t)$ as a function of $\hat{r}(t-1)$ and check visually that a linear model is reasonable

- do a least-squares regression to estimate the value of $\phi$
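A minimal sketch of the regression step, on a simulated AR(1) series standing in for the residuals (`phi_true` and the series length are made-up values). Because the AR(1) model has no intercept, the least-squares slope of a line through the origin has a closed form.

```python
import numpy as np

# Simulated AR(1) series standing in for the residuals (phi_true is made up)
rng = np.random.default_rng(1)
phi_true, n = 0.7, 2000
r = np.zeros(n)
for k in range(1, n):
    r[k] = phi_true * r[k - 1] + rng.normal()

# Pairs (r(t-1), r(t)) for t = 2..n
r_prev, r_curr = r[:-1], r[1:]

# Least squares for a line through the origin:
# phi_hat = sum(r_prev * r_curr) / sum(r_prev^2)
phi_hat = np.dot(r_prev, r_curr) / np.dot(r_prev, r_prev)
print(f"phi_hat = {phi_hat:.3f}")
```

A scatter plot of `r_curr` against `r_prev` with the line of slope `phi_hat` overlaid is a simple way to do the visual check.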

### Question 5



Now the complete model of the time series is

$$ y(t) = \hat{m}(t) + \hat{r}(t) = \hat{m}(t) + \phi.\hat{r}(t-1) + w(t) $$

Note also that

$$ w(t) = \hat{r}(t) - \phi.\hat{r}(t-1) $$

- plot the new residual time series $w(t)$ (for $t>1$)

- plot its ACF (function plot_acf can be used)

- from a visual examination of the residuals and the ACF, do you think the process is IID? (justify in one sentence)
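The relation $w(t) = \hat{r}(t) - \phi.\hat{r}(t-1)$ translates into a one-line array operation. The helper name `ar1_innovations` and the toy numbers below are hypothetical, chosen only so the result can be checked by hand.

```python
import numpy as np

def ar1_innovations(r, phi):
    """Return w(t) = r(t) - phi * r(t-1) for t = 2..n (one value fewer than r)."""
    r = np.asarray(r, dtype=float)
    return r[1:] - phi * r[:-1]

# Toy residuals, small enough to verify by hand
r = np.array([1.0, 0.5, 0.75, 0.125])
w = ar1_innovations(r, phi=0.5)
print(w)  # values: 0.0, 0.5, -0.25
```

The output is one element shorter than the input because $w(1)$ would need $\hat{r}(0)$, which is not observed.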

## Exercise 3: IID tests on residual time series

In this section, we test quantitatively if the residual time series can be considered as IID or not. For this purpose, we use two tests: the difference-sign test and the sample ACF test.

### Question 6: difference-sign test

The idea of this test is the following: if a time series $x(t)$ is IID, then it is possible to predict the number of times when $x(t+1) > x(t)$ and see whether it is consistent with the observations.

#### Principle

For an IID time series of length $n$, the probability that $x(t+1) > x(t)$ is 1/2. Since there are $n-1$ consecutive pairs, the expected value $ \mu _S $ of the number of such observations over the whole time series is

$$ \mu _S = \frac{n-1}{2} $$

By an analogous reasoning, the variance of the number of times $x(t+1) > x(t)$ over the time series is

$$ \sigma _S ^2 = \frac{n+1}{12}$$

#### Test

Thus, we can check whether the measured number of times $x(t+1) > x(t)$, denoted $n_S$, is consistent with the IID hypothesis using a standard two-sided test on a normal distribution. Using $ \alpha = 0.05 $, we reject the hypothesis that the time series is IID if

$$ \frac{|n_S - \mu _S|}{\sigma _S }  > \Phi _{1- \alpha/2} = 1.96 $$

where the denominator is the standard deviation $ \sigma _S = \sqrt{(n+1)/12} $, not the variance (cf. course on hypothesis testing).

- Test whether the residual time series $ \hat{r}(t) $ and $ \hat{w}(t) $ can be considered IID according to this test.
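The test can be sketched as a small function. The name `difference_sign_test` and the two example series are hypothetical; the statistic divides $|n_S - \mu_S|$ by the standard deviation $\sigma_S$, the square root of the variance $(n+1)/12$.

```python
import math
import numpy as np

def difference_sign_test(x, z_crit=1.96):
    """Return (n_s, z, reject) for the difference-sign test at the given threshold."""
    x = np.asarray(x, dtype=float)
    n = x.size
    n_s = int(np.sum(np.diff(x) > 0))  # number of times x(t+1) > x(t)
    mu_s = (n - 1) / 2                 # expected count under IID
    sigma_s = math.sqrt((n + 1) / 12)  # standard deviation under IID
    z = abs(n_s - mu_s) / sigma_s
    return n_s, z, z > z_crit

rng = np.random.default_rng(2)
iid = rng.normal(size=500)   # white noise: n_s should be close to (n-1)/2
trend = np.arange(500.0)     # strictly increasing: every difference is positive

print(difference_sign_test(iid))
print(difference_sign_test(trend))
```

The increasing series is rejected by a wide margin, since all 499 differences are positive. A purely periodic series, by contrast, can have about as many increases as decreases and pass this test while being far from IID.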

### Question 7: sample autocorrelation function test

For this test, we accept without proof that the sample autocorrelation function of an IID time series of size $n$ is also approximately IID and follows a normal distribution with mean 0 and variance $1/n$.

#### Test

Consequently, about 95% of the sample ACF values at nonzero lags should fall in the interval $ \left[ \frac{-1.96}{\sqrt{n}}  ;  \frac{+1.96}{\sqrt{n}} \right] $.

- Test graphically whether the residual time series $ \hat{r}(t) $ and $ \hat{w}(t) $ can be considered IID according to this test. You can, for instance, add the corresponding boundaries to the ACF plot.
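The bounds can also be checked numerically. The sketch below computes the sample ACF by hand with NumPy instead of `plot_acf`; the helper name `sample_acf`, the white-noise series and the choice of 20 lags are assumptions for illustration.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag (lag 0 is always 1)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array(
        [np.dot(xc[: x.size - h], xc[h:]) / denom for h in range(max_lag + 1)]
    )

rng = np.random.default_rng(3)
x = rng.normal(size=400)  # white noise standing in for a residual series
max_lag = 20

acf = sample_acf(x, max_lag)
bound = 1.96 / np.sqrt(x.size)       # half-width of the 95% interval

outside = np.abs(acf[1:]) > bound    # lag 0 is excluded: it equals 1 by definition
frac_inside = 1.0 - outside.mean()
print(f"bound = {bound:.3f}, fraction inside = {frac_inside:.2f}")
```

On a matplotlib plot, the two boundaries can be drawn with `plt.axhline(bound)` and `plt.axhline(-bound)`.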

#### Comment

- What do you think about the sensitivity of these two tests?