#### Seasonal

A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. Seasonality is always of a fixed and known frequency.

#### White noise 

Time series that show no autocorrelation are called white noise.
![image.png](attachment:image.png)

##### Statistical stationarity
- A stationary time series is one whose statistical properties such as mean, variance, autocorrelation, etc. are all constant over time. 

##### Autocorrelation
- Just as correlation measures the extent of a linear relationship between two variables, autocorrelation measures the linear relationship between lagged values of a time series.
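
As a rough illustration of the two definitions above, the sketch below computes the sample autocorrelation at a given lag and compares a simulated white noise series with a trending series. The helper name `autocorrelation` and the simulated data are assumptions made for this example only.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total squared deviation from the mean.
    denom = sum((y - mean) ** 2 for y in series)
    # Numerator: co-movement between y_t and y_{t-lag}.
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(500)]  # simulated, no structure
trend = [float(t) for t in range(500)]                      # strongly autocorrelated

print(round(autocorrelation(white_noise, 1), 3))  # close to zero
print(round(autocorrelation(trend, 1), 3))        # close to one
```

For white noise the value at every non-zero lag should stay near zero, while a trending series keeps a lag-1 autocorrelation near one.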

##### Lag
- The lag operator (also known as backshift operator) is a function that shifts (offsets) a time series such that the “lagged” values are aligned with the actual time series.

![image.png](attachment:image.png)
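
A minimal sketch of lagging, assuming the series is a plain Python list. The helper `lag` is hypothetical; it pads the start with `None` so that each lagged value lines up with the observation it precedes.

```python
def lag(series, k=1):
    """Shift `series` by k steps; the first k positions have no lagged value."""
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(series))
    return [None] * k + list(series[: len(series) - k])


y = [10, 12, 15, 11, 14]
for current, previous in zip(y, lag(y, 1)):
    # Each row pairs y_t with y_{t-1}.
    print(current, previous)
```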


# ARIMA 


- An autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model.

- Both of these models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). 

- ARIMA models are applied in some cases where the data show evidence of non-stationarity. An initial differencing step (corresponding to the "integrated" part of the model) can be applied one or more times to eliminate the non-stationarity.

- ARIMA captures standard temporal structures (patterned organizations of time) in the input dataset.

1. AR 

    - A pure autoregressive (AR only) model is one where $Y_t$ depends only on its own lags. That is, $Y_t$ is a function of the lags of $Y_t$.
    ![image.png](attachment:image.png)
    
    where $Y_{t-1}$ is the lag 1 of the series, $\beta_1$ is the coefficient of lag 1 that the model estimates, and $\alpha$ is the intercept term, also estimated by the model.

2. MA
    - A pure moving average (MA only) model is one where $Y_t$ depends only on the lagged forecast errors.
    ![image.png](attachment:image.png)
    where the error terms are the errors of the autoregressive models of the respective lags. The errors $\epsilon_t$ and $\epsilon_{t-1}$ are the errors from the following equations:

![image.png](attachment:image.png)

An ARIMA model is one where the time series has been differenced at least once to make it stationary, and the AR and MA terms are combined. So the equation becomes:

![image.png](attachment:image.png)
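
The following is a minimal sketch of how the AR and MA terms combine for a one-step forecast of a series that has been differenced once. All coefficients and past errors are made-up illustrative values (a real model estimates them during fitting), and the names `ar_term`, `ma_term`, `arima_one_step`, and `phis` are assumptions of this example.

```python
def ar_term(lags, betas):
    """Sum of beta_i * Y_{t-i}; lags[-1] is the most recent value."""
    return sum(b * lags[-i] for i, b in enumerate(betas, start=1))


def ma_term(errors, phis):
    """Sum of phi_j * e_{t-j}; errors[-1] is the most recent forecast error."""
    return sum(p * errors[-j] for j, p in enumerate(phis, start=1))


def arima_one_step(series, errors, alpha, betas, phis):
    """One-step forecast with one round of differencing (d = 1)."""
    # Difference once so the AR and MA terms work on a stationary series.
    diffs = [b - a for a, b in zip(series, series[1:])]
    next_diff = alpha + ar_term(diffs, betas) + ma_term(errors, phis)
    # Undo the differencing by adding the predicted change to the last level.
    return series[-1] + next_diff


series = [100.0, 102.0, 105.0, 104.0]
past_errors = [0.5, -0.2, 0.1]  # hypothetical errors on the differenced series
forecast = arima_one_step(series, past_errors, alpha=0.1, betas=[0.5], phis=[0.3])
print(forecast)
```

Here the predicted change is the intercept plus one AR contribution and one MA contribution, which is then added back to the last observed level.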

###### Note:
- Differencing in statistics is a transformation applied to time-series data in order to make it stationary. A stationary time series' properties do not depend on the time at which the series is observed.

- In order to difference the data, the difference between consecutive observations is computed.

![image.png](attachment:image.png)


- Differencing removes the changes in the level of a time series, eliminating (or reducing) trend and seasonality and consequently stabilizing the mean of the time series.

- Sometimes it may be necessary to difference the data a second time to obtain a stationary time series, which is referred to as second order differencing:
![image.png](attachment:image.png)

- Another method of differencing data is seasonal differencing, which involves computing the difference between an observation and the corresponding observation in the previous season (for example, the same month of the previous year). This is shown as:
![image.png](attachment:image.png)
- The differenced data is then used for the estimation of an ARMA model.
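
The three kinds of differencing described in this note can be sketched with one small helper. The function name `difference` and the toy series are assumptions for illustration; the `lag` argument selects ordinary (1) or seasonal (season length) differencing.

```python
def difference(series, lag=1):
    """Return y_t - y_{t-lag}; the result is `lag` elements shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


y = [3, 5, 8, 12, 17, 23]
first = difference(y)       # first-order differencing
second = difference(first)  # second-order: difference the differences
print(first)   # [2, 3, 4, 5, 6]
print(second)  # [1, 1, 1, 1]

# Seasonal differencing on quarterly data: compare with the same quarter last year.
quarterly = [10, 20, 30, 40, 12, 23, 31, 44]
print(difference(quarterly, lag=4))  # [2, 3, 1, 4]
```

In this toy series the first difference still trends upward, and only the second difference is constant, which is the situation where second-order differencing is needed.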

### How ARIMA Works
The ARIMA algorithm is especially useful for datasets that can be mapped to "stationary" time series.

The statistical properties of stationary time series, such as autocorrelations, are independent of time.

Datasets with stationary time series usually contain a combination of signal and noise. The signal may exhibit a pattern of sinusoidal oscillation or have a seasonal component. 

ARIMA acts like a filter to separate the signal from the noise, and then extrapolates the signal in the future to make predictions.

## Exponential Smoothing (ETS) Algorithm

Exponential Smoothing (ETS) is a commonly used local statistical algorithm for time-series forecasting.

Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older.

### 1. Simple exponential smoothing
This method is suitable for forecasting data with no clear trend or seasonal pattern.
![image.png](attachment:image.png)

Using the naïve method, all forecasts for the future are equal to the last observed value of the series,
![image.png](attachment:image.png)
for $h = 1, 2, \dots$. Hence, the naïve method assumes that the most recent observation is the only important one, and all previous observations provide no information for the future. This can be thought of as a weighted average where all of the weight is given to the last observation.

Using the average method, all future forecasts are equal to a simple average of the observed data,
![image.png](attachment:image.png)
for $h = 1, 2, \dots$. Hence, the average method assumes that all observations are of equal importance, and gives them equal weights when generating forecasts.
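
The two extremes described above can be sketched in a few lines. The function names and the sample history are assumptions for this example.

```python
def naive_forecast(history, horizon):
    """Every future value equals the last observed value."""
    return [history[-1]] * horizon


def average_forecast(history, horizon):
    """Every future value equals the mean of all observed values."""
    mean = sum(history) / len(history)
    return [mean] * horizon


history = [4.0, 6.0, 5.0, 9.0]
print(naive_forecast(history, 3))    # [9.0, 9.0, 9.0]
print(average_forecast(history, 3))  # [6.0, 6.0, 6.0]
```

Simple exponential smoothing sits between these two: it weights recent observations more heavily than older ones without discarding the older ones entirely.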

### 2. Weighted average form
The forecast at time $T+1$ is equal to a weighted average between the most recent observation $y_T$ and the previous forecast:
![image.png](attachment:image.png)

where $0 \le \alpha \le 1$ is the smoothing parameter. Similarly, we can write the fitted values as:
![image.png](attachment:image.png)

for $t = 1, \dots, T$. (Recall that fitted values are simply one-step forecasts of the training data.)
The process has to start somewhere, so we let the first fitted value at time 1 be denoted by $\ell_0$ (which we will have to estimate). Then:
![image.png](attachment:image.png)

![image.png](attachment:image.png)
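
The weighted average form translates directly into a short loop. This is a sketch under the assumption that $\alpha$ and $\ell_0$ are supplied by the caller; in practice both are estimated from the data. The function name and sample values are illustrative only.

```python
def simple_exponential_smoothing(series, alpha, level0):
    """Return (fitted values, one-step forecast for time T+1)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    fitted = []
    forecast = level0  # the first fitted value is l_0
    for y in series:
        fitted.append(forecast)
        # New forecast = weighted average of the observation and the old forecast.
        forecast = alpha * y + (1 - alpha) * forecast
    return fitted, forecast


fitted, next_forecast = simple_exponential_smoothing([10.0, 12.0, 11.0], alpha=0.5, level0=10.0)
print(fitted)         # [10.0, 10.0, 11.0]
print(next_forecast)  # 11.0
```

Setting `alpha=1.0` reproduces the naïve method, because each forecast then equals the most recent observation.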

### How ETS Works
- The ETS algorithm is especially useful for datasets with seasonality and other prior assumptions about the data.

- ETS computes a weighted average over all observations in the input time series dataset as its prediction. 

- The weights are exponentially decreasing over time, rather than the constant weights in simple moving average methods. The weights are dependent on a constant parameter, which is known as the smoothing parameter.
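
To make the contrast with a simple moving average concrete, the sketch below lists the weight that simple exponential smoothing places on each of the most recent observations, assuming the standard form $\alpha(1-\alpha)^j$ for the observation $j$ steps back. The helper name is hypothetical.

```python
def smoothing_weights(alpha, n):
    """Weight on the observation j steps back, for j = 0 .. n-1."""
    return [alpha * (1 - alpha) ** j for j in range(n)]


ets_weights = smoothing_weights(alpha=0.5, n=4)
moving_average_weights = [1 / 4] * 4  # constant weights, for comparison

print(ets_weights)             # [0.5, 0.25, 0.125, 0.0625]
print(moving_average_weights)  # [0.25, 0.25, 0.25, 0.25]
```

A larger smoothing parameter concentrates more weight on the most recent observations, while a smaller one spreads the weight further into the past.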

## Non-Parametric Time Series (NPTS) Algorithm
The Amazon Forecast Non-Parametric Time Series (NPTS) algorithm is a scalable, probabilistic baseline forecaster.

It predicts the future value distribution of a given time series by sampling from past observations. 

The predictions are bounded by the observed values. NPTS is especially useful when the time series is intermittent (or sparse, containing many 0s) and bursty.

1. NPTS

    - In this variant, predictions are generated by sampling from all observations in the training range of the time series. 

    - However, instead of uniformly sampling from all of the observations, this variant assigns weight to each of the past observations according to how far it is from the current time step where the prediction is needed.

    - In particular, it uses weights that decay exponentially according to the distance of the past observations.

    - In this way, the observations from the recent past are sampled with much higher probability than the observations from the distant past. This assumes that the near past is more indicative for the future than the distant past.

2. Seasonal NPTS

    - The seasonal NPTS variant is similar to NPTS except that instead of sampling from all of the observations, it uses only the observations from the past seasons.

    - By default, the season is determined by the granularity of the time series.

    - For example, for an hourly time series, to predict for hour t, this variant samples from the observations corresponding to hour t on the previous days. Similar to NPTS, the observation at hour t on the previous day is given more weight than the observations at hour t on earlier days.

3. Climatological Forecaster
    - The climatological forecaster variant samples all of the past observations with uniform probability.

4. Seasonal Climatological Forecaster
    
    - Similar to seasonal NPTS, the seasonal climatological forecaster samples the observations from past seasons, but samples them with uniform probability.

5. Seasonal Features

    - To determine what corresponds to a season for the seasonal NPTS and seasonal climatological forecaster, use the features listed in the following table. The table lists the derived features for the supported basic time frequencies, based on granularity. Amazon Forecast includes these feature time series, so you don't have to provide them.
    ![image.png](attachment:image.png)
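
The four sampling variants above differ only in which past observations are eligible and how they are weighted. The sketch below is an illustration of that idea, not the Amazon Forecast implementation: the function `npts_sample`, its `decay` parameter, and the integer `season` length are all assumptions made for this example.

```python
import math
import random


def npts_sample(history, num_samples, decay=0.1, season=None, uniform=False, rng=None):
    """Draw forecast samples for the next time step from past observations."""
    rng = rng or random.Random()
    n = len(history)  # index of the time step being predicted
    # Seasonal variants keep only observations a whole number of seasons back.
    candidates = [i for i in range(n) if season is None or (n - i) % season == 0]
    if not candidates:
        raise ValueError("no past observations fall in the requested season")
    if uniform:
        # Climatological variants: every eligible observation is equally likely.
        weights = [1.0] * len(candidates)
    else:
        # NPTS variants: weights decay exponentially with distance into the past.
        weights = [math.exp(-decay * (n - i)) for i in candidates]
    picks = rng.choices(candidates, weights=weights, k=num_samples)
    return [history[i] for i in picks]


history = [5, 0, 0, 7, 0, 0, 9, 0, 0, 4, 0, 0]  # toy series with season length 3
rng = random.Random(0)
print(npts_sample(history, 5, rng=rng))                          # NPTS
print(npts_sample(history, 5, season=3, rng=rng))                # seasonal NPTS
print(npts_sample(history, 5, uniform=True, rng=rng))            # climatological
print(npts_sample(history, 5, season=3, uniform=True, rng=rng))  # seasonal climatological
```

Because every sample is an actual past observation, the predictions can never fall outside the range of observed values.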

### How NPTS Works
- Similar to classical forecasting methods, such as exponential smoothing (ETS) and autoregressive integrated moving average (ARIMA), NPTS generates predictions for each time series individually.

- The time series in the dataset can have different lengths. The time points where the observations are available are called the training range and the time points where the prediction is desired are called the prediction range.

## DeepAR+ Algorithm

Amazon Forecast DeepAR+ is a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNNs).

Classical forecasting methods, such as autoregressive integrated moving average (ARIMA) or exponential smoothing (ETS), fit a single model to each individual time series, and then use that model to extrapolate the time series into the future.

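In many applications, however, there are many similar time series, and it can be beneficial to train a single model jointly over all of them. DeepAR+ takes this approach. When your dataset contains hundreds of feature time series, the DeepAR+ algorithm outperforms the standard ARIMA and ETS methods.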

### How DeepAR+ Works

- During training, DeepAR+ uses a training dataset and an optional testing dataset. It uses the testing dataset to evaluate the trained model. 
- In general, the training and testing datasets don't have to contain the same set of time series. You can use a model trained on a given training set to generate forecasts for the future of the time series in the training set, and for other time series. 
- Both the training and the testing datasets consist of (preferably more than one) target time series. 

The training dataset consists of a target time series, $z_{i,t}$, and two associated feature time series, $x_{i,1,t}$ and $x_{i,2,t}$.
![image.png](attachment:image.png)

DeepAR+ supports only feature time series that are known in the future. This allows you to run counterfactual "what-if" scenarios. For example, "What happens if I change the price of a product in some way?"

- Each target time series can also be associated with a number of categorical features. You can use these to encode that a time series belongs to certain groupings. 
- Using categorical features allows the model to learn typical behavior for those groupings, which can increase accuracy. 
- A model implements this by learning an embedding vector for each group that captures the common properties of all time series in the group.

### Exclusive Features of Amazon Forecast DeepAR+ over Amazon SageMaker DeepAR
The Amazon Forecast DeepAR+ algorithm improves upon the Amazon SageMaker DeepAR algorithm with the following new features:

- Learning rate scheduling

    - During a single training run, DeepAR+ can reduce its learning rate. This often reduces loss and forecasting error.

- Model averaging

    - When you use multiple models for training with the DeepAR+ algorithm, Amazon Forecast averages the training runs. This can reduce forecasting error and dramatically increase model stability. Your DeepAR+ model is more likely to provide robust results every time you train it.

- Weighted sampling

    - When you use a very large training dataset, DeepAR+ applies streaming sampling to ensure convergence despite the size of the training dataset. A DeepAR+ model can be trained with millions of time series in a matter of hours.

## Prophet Algorithm
Prophet is a popular local Bayesian structural time series model. The Amazon Forecast Prophet algorithm uses the Prophet class of the Python implementation of Prophet.

### How Prophet Works

Prophet is especially useful for datasets that:

- Contain an extended time period (months or years) of detailed historical observations (hourly, daily, or weekly)

- Have multiple strong seasonalities

- Include previously known important, but irregular, events

- Have missing data points or large outliers

- Have non-linear growth trends that are approaching a limit

Prophet is an additive regression model with a piecewise linear or logistic growth curve trend. It includes a yearly seasonal component modeled using Fourier series and a weekly seasonal component modeled using dummy variables.
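
The additive structure described above can be sketched without the Prophet library. The code below is an illustration only, with hypothetical helper names and caller-supplied coefficients; it builds Fourier terms for the yearly component and one-hot dummy variables for the weekly component, then adds them to a given trend value.

```python
import math


def fourier_terms(t, period, order):
    """Return [sin, cos] pairs for harmonics 1..order at time t."""
    terms = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


def weekly_dummies(day_of_week):
    """One-hot encoding of the day of week (0 = first day, 6 = last day)."""
    return [1.0 if day_of_week == d else 0.0 for d in range(7)]


def additive_prediction(trend, yearly_coefs, weekly_coefs, t_days, day_of_week):
    """Trend + yearly seasonality + weekly seasonality; coefficients are supplied by the caller."""
    order = len(yearly_coefs) // 2
    yearly = sum(c * f for c, f in zip(yearly_coefs, fourier_terms(t_days, 365.25, order)))
    weekly = sum(c * d for c, d in zip(weekly_coefs, weekly_dummies(day_of_week)))
    return trend + yearly + weekly


# Made-up coefficients, purely for illustration.
yearly_coefs = [2.0, 1.0, 0.5, 0.0]
weekly_coefs = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]
print(additive_prediction(100.0, yearly_coefs, weekly_coefs, t_days=0, day_of_week=2))
```

A higher Fourier order lets the yearly component follow sharper seasonal changes, at the cost of more coefficients to estimate.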