Stationarity: A time series is stationary when its statistical properties do not change over time: the mean and the variance are constant, and the covariance between two observations depends only on the lag between them, not on when they occur. In particular, a stationary time series has no trend. You can test for non-stationarity with the augmented Dickey-Fuller (ADF) test, and you can often remove non-stationarity with differencing.

For a time series to be stationary, it must exhibit three properties over time:
1. Constant Mean:  
A stationary time series has a constant mean throughout the entire series.
For example, if we drew a horizontal line at the mean of the series, that line would describe the average level equally well at every point in time.
The mean is not constant when the series has a trend: with an upward or downward trend, the mean at the end of the series is noticeably higher or lower than the mean at the beginning.
![image.png](attachment:image.png)

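2. Constant Variance:  
A stationary time series has a constant variance: the spread of the values around the mean does not widen or narrow over time.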
![image.png](attachment:image-2.png)

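3. Constant Autocorrelation Structure:  
The correlation between two observations depends only on the lag between them, not on where in the series they occur.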
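
A rough way to see the first two properties in numbers is to compare the mean and variance of the first half of a series with those of the second half. The sketch below does this on three synthetic series; the series, the helper name `half_stats`, and the half-split itself are illustrative assumptions, not part of the CO2 analysis, and the check says nothing about the autocorrelation structure.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(400)

# Three hypothetical series of the same length.
stationary = rng.normal(0.0, 1.0, size=t.size)                   # constant mean and variance
trending = 0.05 * t + rng.normal(0.0, 1.0, size=t.size)          # mean grows over time
widening = (1.0 + t / 50.0) * rng.normal(0.0, 1.0, size=t.size)  # variance grows over time


def half_stats(series):
    """Return (mean, variance) of the first half and of the second half."""
    first, second = np.array_split(series, 2)
    return (first.mean(), first.var()), (second.mean(), second.var())


for name, series in [("stationary", stationary),
                     ("trending", trending),
                     ("widening", widening)]:
    (m1, v1), (m2, v2) = half_stats(series)
    print(f"{name:>10}: mean {m1:6.2f} -> {m2:6.2f}, variance {v1:6.2f} -> {v2:6.2f}")
```

For the trending series the mean shifts clearly between the two halves, and for the widening series the variance does. This is only an informal check; a hypothesis test is the more reliable tool.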

## Dickey-Fuller test

The Dickey-Fuller test is a statistical hypothesis test for detecting non-stationarity. The `adfuller` function from statsmodels implements the augmented version of the test (ADF).

In [None]:
from statsmodels.tsa.stattools import adfuller
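# adfuller returns the test statistic, the p-value, the number of lags used,
# the number of observations used, the critical values, and the best information criterion.
adf, pval, usedlag, nobs, crit_vals, icbest = adfuller(co2_data.co2.values)
print('ADF test statistic:', adf)
print('ADF p-value:', pval)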
print('ADF number of lags used:', usedlag)
print('ADF number of observations:', nobs)
print('ADF critical values:', crit_vals)
print('ADF best information criterion:', icbest)

![image.png](attachment:image.png)
The null hypothesis of the ADF test is that a unit root is present in the time series. The alternative hypothesis is that the data is stationary.

The second returned value is the p-value. If it is smaller than 0.05, you can reject the null hypothesis of a unit root and conclude that the data is stationary. In this case, we cannot reject the null hypothesis, so we treat the data as non-stationary. This agrees with the plot of the data, which shows a clear trend.
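
The decision rule described above can be written as a small helper. This is a minimal sketch: the function name `interpret_adf` and its output keys are hypothetical, and the numbers passed in are illustrative inputs, not results from the CO2 data. It assumes `crit_vals` is the dictionary returned by `adfuller`, keyed by significance level such as `'5%'`.

```python
def interpret_adf(stat, pval, crit_vals, alpha=0.05):
    """Summarise an ADF result.

    stat, pval and crit_vals are the first, second and fifth values
    returned by adfuller; crit_vals maps levels such as '5%' to
    critical values of the test statistic.
    """
    reject_null = bool(pval < alpha)
    # The statistic must be more negative than a critical value
    # to reject the unit-root null at that level.
    levels_passed = [level for level, value in crit_vals.items() if stat < value]
    levels_passed.sort(key=lambda level: float(level.rstrip("%")))
    return {
        "reject_unit_root": reject_null,
        "levels_passed": levels_passed,
        "conclusion": "stationary" if reject_null
                      else "non-stationary (cannot reject unit root)",
    }


# Illustrative inputs only, not results from the CO2 data.
example_crit = {"1%": -3.43, "5%": -2.86, "10%": -2.57}
example = interpret_adf(stat=-1.2, pval=0.67, crit_vals=example_crit)
print(example["conclusion"])
```

With the variables from the cell above, the call would be `interpret_adf(adf, pval, crit_vals)`. Comparing the statistic against the critical values and comparing the p-value against `alpha` are two views of the same test and normally agree.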

## Differencing

Differencing replaces each value with its change from the previous value, which removes the trend from your time series. The goal is to keep only the seasonal variation.
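
To see why differencing removes a trend, consider a straight line: the change between consecutive points is always the slope, so the differenced series is constant. The sketch below shows this with NumPy on a made-up linear trend (the intercept 2.0, slope 0.5, and noise level are arbitrary choices for illustration).

```python
import numpy as np

t = np.arange(10)
trend = 2.0 + 0.5 * t  # hypothetical linear trend with slope 0.5

# First difference: y[t] - y[t-1]. One observation is lost at the start.
diffed = np.diff(trend)
print(diffed)

# With noise added, the differenced values fluctuate around the slope
# instead of drifting upward.
rng = np.random.default_rng(0)
noisy = trend + rng.normal(0.0, 0.1, size=t.size)
noisy_diff_mean = np.diff(noisy).mean()
print(noisy_diff_mean)
```

On a pandas Series, subtracting `shift()` from the original values gives the same result as `Series.diff()`, with a `NaN` in the first position where no previous value exists.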

In [None]:
prev_co2_value = co2_data.co2.shift()
differenced_co2 = co2_data.co2 - prev_co2_value
differenced_co2.plot()

# Redo the ADF test on the differenced data to confirm that it is now stationary.
# dropna() removes the NaN that shift() leaves in the first position.
adf, pval, usedlag, nobs, crit_vals, icbest = adfuller(differenced_co2.dropna())
print('ADF test statistic:', adf)
print('ADF p-value:', pval)
print('ADF number of lags used:', usedlag)
print('ADF number of observations:', nobs)
print('ADF critical values:', crit_vals)
print('ADF best information criterion:', icbest)

![image.png](attachment:image.png)