# [Machine Learning Strategies for Time Series Forecasting](https://www.researchgate.net/publication/236941795_Machine_Learning_Strategies_for_Time_Series_Forecasting)



## Random Forest

#### Tutorials:

- [Ensemble learning for time series forecasting in R](https://petolau.github.io/Ensemble-of-trees-for-forecasting-time-series/)

- [Using regression trees for forecasting double-seasonal time series with trend in R](https://petolau.github.io/Regression-trees-for-forecasting-time-series-in-R/)

- [Forecasting with Random Forests](https://pythondata.com/forecasting-with-random-forests/) 

- [Code Example](https://www.ifweassume.com/2014/09/random-forest-for-time-series.html), source code [here](https://github.com/jradavenport/random-forest-timeseries)

- [Why Random Forests can’t predict trends and how to overcome this problem?](https://medium.com/datadriveninvestor/why-wont-time-series-data-and-random-forests-work-very-well-together-3c9f7b271631)



#### Shortreads:

- [Random forests model for one day ahead load forecasting](https://ieeexplore.ieee.org/document/7110975)

- [Short-Term Load Forecasting Using Random Forests](https://www.researchgate.net/publication/278656354_Short-Term_Load_Forecasting_Using_Random_Forests)

- [Comparison of ARIMA and Random Forest time series models](https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/1471-2105-15-276)

- [Variable Selection in Time Series Forecasting Using Random Forests](https://www.mdpi.com/1999-4893/10/4/114)

A random forest accepts a feature vector x = (x1, ..., xk) for each observation and learns to predict the output y. To use it for time series, convert your training data to this format, with lagged values of the series as the features:

```python
import pandas as pd

def table2lags(table, max_lag, min_lag=0, separator='_'):
    """ Given a dataframe, return a dataframe with different lags of all its columns """
    values = []
    for i in range(min_lag, max_lag + 1):
        # shift(i) moves every column down i rows, so row t holds the value from t - i
        values.append(table.shift(i).copy())
        # rename the columns with the lag as a suffix, e.g. 'x' -> 'x_2'
        values[-1].columns = [c + separator + str(i) for c in table.columns]
    return pd.concat(values, axis=1)
```

Once you have the matrix of x values, you can feed it to, for example, a scikit-learn regressor:



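```python
from sklearn.ensemble import RandomForestRegressor
# Assumes x holds lagged features from table2lags (max_lag=2) and df['z'] is the target.
# The first 2 rows are dropped because their lagged values are NaN.
rf = RandomForestRegressor().fit(x.iloc[2:], df['z'].iloc[2:])
```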
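To put the two snippets together, here is a minimal end-to-end sketch on a synthetic weekly-seasonal series. The data, `max_lag=7`, the 80/20 split and `n_estimators=200` are illustrative assumptions, not recommendations. It repeats `table2lags` so that it runs on its own, uses `min_lag=1` so that only past values of the target become features, and splits the data chronologically, since shuffling would let the model peek at the future.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def table2lags(table, max_lag, min_lag=0, separator='_'):
    """Return a DataFrame with lags min_lag..max_lag of every column of `table`."""
    values = []
    for i in range(min_lag, max_lag + 1):
        values.append(table.shift(i).copy())
        values[-1].columns = [c + separator + str(i) for c in table.columns]
    return pd.concat(values, axis=1)


# Hypothetical synthetic series: a period-7 pattern plus noise.
rng = np.random.default_rng(0)
t = np.arange(300)
df = pd.DataFrame({'z': 10 + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.5, t.size)})

max_lag = 7
# min_lag=1: only past values of z are features, so the target z_t never leaks in.
x = table2lags(df[['z']], max_lag=max_lag, min_lag=1)
y = df['z']

# Drop the first max_lag rows, whose lagged features are NaN.
x, y = x.iloc[max_lag:], y.iloc[max_lag:]

# Chronological split: the test period comes strictly after the training period.
split = int(len(x) * 0.8)
x_train, x_test = x.iloc[:split], x.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(x_train, y_train)
pred = rf.predict(x_test)  # one-step-ahead forecasts for the test period
mae = np.mean(np.abs(pred - y_test.to_numpy()))
print(f"test MAE: {mae:.3f}")
```

These are one-step-ahead forecasts, because every test row uses observed lags. Forecasting several steps ahead needs either a recursive strategy (feeding predictions back in as lags) or a direct strategy (one model per horizon); the paper linked at the top of this page compares such multi-step strategies.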

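Your model could improve considerably if you used not only raw lagged values as features but also aggregations of them: means, other linear combinations (e.g. exponentially weighted means, `ewm`), quantiles, etc. Adding linear combinations of existing features to a linear model is useless, since the model can already form any such combination itself. Tree-based models, however, split on one feature at a time, so giving them an aggregate directly can help a lot.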
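A sketch of such aggregated features with pandas, assuming a hypothetical series with period 24: a rolling mean, an exponentially weighted mean (`ewm`) and rolling quantiles, all computed from values up to t-1 by shifting first so that no feature sees the value it is used to predict. The window length, `alpha=0.3` and the quantile levels are arbitrary illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical series with period 24; replace with your own data.
rng = np.random.default_rng(1)
t = np.arange(400)
z = pd.Series(np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.3, t.size), name='z')

# Shift by 1 first so every feature at time t uses only values up to t-1.
past = z.shift(1)
features = pd.DataFrame({
    'lag_1': past,
    'lag_24': z.shift(24),
    'roll_mean_24': past.rolling(24).mean(),
    'ewm_mean': past.ewm(alpha=0.3).mean(),
    'roll_q10_24': past.rolling(24).quantile(0.1),
    'roll_q90_24': past.rolling(24).quantile(0.9),
})

# Keep only rows where every feature is defined.
data = pd.concat([features, z], axis=1).dropna()
X, y = data.drop(columns='z'), data['z']

split = int(len(X) * 0.8)  # chronological split
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X.iloc[:split], y.iloc[:split])
pred = rf.predict(X.iloc[split:])
```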

### [Deep Learning for Time Series Forecasting](https://machinelearningmastery.com/how-to-get-started-with-deep-learning-for-time-series-forecasting-7-day-mini-course/)

- MLP for Time Series Forecasting (a multilayer perceptron for univariate time series)
- CNN for Time Series Forecasting
- LSTM for Time Series Forecasting
- CNN-LSTM for Time Series Forecasting
- Encoder-Decoder LSTM Multi-step Forecasting
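All of these models are trained on a series that has first been split into input windows and targets. The sketch below uses a hypothetical helper `make_windows` (not from any library) and shows the input shapes: an MLP takes each window as a flat vector, while Keras-style `Conv1D` and `LSTM` layers expect a 3-D array of shape (samples, timesteps, features).

```python
import numpy as np


def make_windows(series, n_steps_in, n_steps_out=1):
    """Hypothetical helper: split a 1-D series into (input window, next values) pairs.

    Returns X of shape (samples, n_steps_in) and y of shape (samples, n_steps_out).
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_steps_in - n_steps_out + 1
    X = np.stack([series[i:i + n_steps_in] for i in range(n_samples)])
    y = np.stack([series[i + n_steps_in:i + n_steps_in + n_steps_out] for i in range(n_samples)])
    return X, y


series = np.arange(10, 100, 10)  # [10, 20, ..., 90]
X, y = make_windows(series, n_steps_in=3, n_steps_out=2)

# An MLP uses X as is: one flat feature vector per window, shape (samples, n_steps_in).
# CNNs and LSTMs expect a 3-D tensor: (samples, timesteps, features).
X_seq = X.reshape((X.shape[0], X.shape[1], 1))

print(X[0], y[0])   # [10. 20. 30.] [40. 50.]
print(X_seq.shape)  # (5, 3, 1)
```

With `n_steps_out > 1` the same windows serve the multi-step models in the list; an encoder-decoder LSTM would typically also reshape y to (samples, n_steps_out, 1).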



### [LSTM](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

- [Forecasting Short Time Series with LSTM Neural Networks](https://gallery.azure.ai/Tutorial/Forecasting-Short-Time-Series-with-LSTM-Neural-Networks-2)

- [FORECASTING ECONOMIC AND FINANCIAL TIME SERIES: ARIMA VS. LSTM](https://arxiv.org/pdf/1803.06386.pdf)

#### Tutorials:

- [Using LSTMs to forecast time-series](https://towardsdatascience.com/using-lstms-to-forecast-time-series-4ab688386b1f)

- [Kaggle Sales Forecast LSTM](https://www.kaggle.com/carmnejsu/sales-forecast-lstm-67-beginner-friendly)

- [Kaggle Example](https://www.kaggle.com/niyamatalmass/machine-learning-for-time-series-analysis)

- [Datacamp Stock Market Predictions with LSTM in Python](https://www.datacamp.com/community/tutorials/lstm-python-stock-market)

- [TIME SERIES PREDICTION USING LSTM DEEP NEURAL NETWORKS](https://www.altumintelligence.com/articles/a/Time-Series-Prediction-Using-LSTM-Deep-Neural-Networks): [git](https://github.com/jaungiers/LSTM-Neural-Network-for-Time-Series-Prediction)

From [machinelearningmastery.com](https://machinelearningmastery.com):

- [Time Series Forecasting with the Long Short-Term Memory Network in Python](https://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/)
- [Vanilla LSTM, Stacked LSTM, Bidirectional LSTM, CNN LSTM, ConvLSTM](https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/)
- [LSTM Models for Multi-Step Time Series Forecasting](https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/)




An early attempt to give neural networks context about earlier inputs in a sequence was a simple feedback approach, in which a neuron's output was fed back into its input to provide context on the last seen inputs. Networks built this way are called Recurrent Neural Networks (RNNs). Whilst RNNs worked to an extent, they had a rather large downfall: training them on long sequences leads to the vanishing gradient problem. In short, the gradients propagated back through many time steps shrink towards zero, so a plain RNN struggles to learn long-range dependencies, which makes it poorly suited to many real-world problems. Another way to handle context memory was needed.

This is where the Long Short-Term Memory (LSTM) neural network came to the rescue. Like RNN neurons, LSTM neurons keep a memory of context within their pipeline, which lets them tackle sequential and temporal problems. Unlike plain RNNs, they store that memory in a gated cell state that is updated additively, which greatly reduces the effect of the vanishing gradient on their performance.
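To make the gating concrete, here is a minimal NumPy sketch of a single LSTM time step with random, untrained weights. The ordering of the stacked weight blocks (forget, input, candidate, output) is an assumption of this sketch; libraries arrange them differently. The line `c_new = f * c + i * g` is the additive cell-state update that lets information, and gradients, pass through many time steps.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def lstm_step(x, h, c, W, b):
    """One LSTM time step.

    x: input (n_in,), h: hidden state (n_hid,), c: cell state (n_hid,).
    W: (4 * n_hid, n_in + n_hid) stacked weights for the forget, input,
       candidate and output blocks (assumed order); b: (4 * n_hid,) biases.
    """
    n_hid = h.shape[0]
    a = W @ np.concatenate([x, h]) + b
    f = sigmoid(a[:n_hid])               # forget gate: how much old memory to keep
    i = sigmoid(a[n_hid:2 * n_hid])      # input gate: how much new content to write
    g = np.tanh(a[2 * n_hid:3 * n_hid])  # candidate content
    o = sigmoid(a[3 * n_hid:])           # output gate: how much memory to expose
    c_new = f * c + i * g                # additive update of the cell state
    h_new = o * np.tanh(c_new)
    return h_new, c_new


# Run a random (untrained) cell over a short sequence, carrying h and c forward.
rng = np.random.default_rng(0)
n_in, n_hid = 1, 4
W = rng.normal(0, 0.5, (4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for value in [0.1, 0.5, -0.3, 0.8]:
    h, c = lstm_step(np.array([value]), h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

When the forget gate is close to 1 and the input gate close to 0, `c` passes through a step almost unchanged, which is how the cell can hold context over long gaps.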

### GRU

- [RNN, LSTM and GRU tutorial](https://jhui.github.io/2017/03/15/RNN-LSTM-GRU/): [kaggle](https://www.kaggle.com/charel/learn-by-example-rnn-lstm-gru-time-series)

- [Kaggle: Intro to Recurrent Neural Networks LSTM, GRU](https://www.kaggle.com/thebrownviking20/intro-to-recurrent-neural-networks-lstm-gru/notebook)

A gated recurrent unit (GRU) is a type of recurrent neural network unit that uses gates to control how information flows through a sequence of nodes, which suits machine learning tasks associated with memory and clustering, for instance speech recognition.

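GRU is often preferred over LSTM because it is simpler and easier to modify: it has two gates (update and reset) instead of three and no separate memory cell, so it has fewer parameters, is usually faster to train, and often performs comparably. Neither is better on every problem, though. The key difference lies in the architecture rather than in training: both are trained with the same gradient-based optimizers, such as gradient descent with momentum.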
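For comparison, here is a NumPy sketch of one GRU time step in one common formulation; Keras and PyTorch differ in details such as where the reset gate is applied and which side of the interpolation the update gate weights. It carries only a hidden state `h`, with no separate memory cell, and needs three weight blocks instead of the LSTM's four. The hypothetical helper `n_params` counts weights and biases for this basic formulation.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x, h, W, b):
    """One GRU time step (one common formulation; libraries differ in details).

    x: input (n_in,), h: hidden state (n_hid,).
    W: (3 * n_hid, n_in + n_hid) stacked weights for the update, reset and
       candidate blocks; b: (3 * n_hid,) biases.
    """
    n_hid = h.shape[0]
    Wz, Wr, Wh = W[:n_hid], W[n_hid:2 * n_hid], W[2 * n_hid:]
    bz, br, bh = b[:n_hid], b[n_hid:2 * n_hid], b[2 * n_hid:]
    xh = np.concatenate([x, h])
    z = sigmoid(Wz @ xh + bz)                                # update gate
    r = sigmoid(Wr @ xh + br)                                # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([x, r * h]) + bh)   # candidate state
    return (1 - z) * h + z * h_cand  # no separate cell state: h is the memory


def n_params(n_in, n_hid, n_blocks):
    """Weights plus biases for n_blocks gate/candidate blocks."""
    return n_blocks * (n_hid * (n_in + n_hid) + n_hid)


# Run a random (untrained) GRU over a short sequence.
rng = np.random.default_rng(0)
n_in, n_hid = 1, 4
W = rng.normal(0, 0.5, (3 * n_hid, n_in + n_hid))
b = np.zeros(3 * n_hid)
h = np.zeros(n_hid)
for value in [0.1, 0.5, -0.3, 0.8]:
    h = gru_step(np.array([value]), h, W, b)

print(n_params(1, 32, 4))  # LSTM: 4352
print(n_params(1, 32, 3))  # GRU:  3264
```

For the same hidden size the GRU has a quarter fewer parameters in this formulation, which is where its training-speed advantage comes from.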

### [NNAR](https://otexts.com/fpp2/nnetar.html)

- Lagged values of the time series can be used as inputs to a neural network.
- NNAR(p, k): p lagged inputs and k nodes in the single hidden layer.
- An NNAR(p, 0) model is equivalent to an ARIMA(p, 0, 0) model but without stationarity restrictions.
- Seasonal $\mathrm{NNAR}(p, P, k)_m$: inputs $(y_{t-1}, y_{t-2}, \dots, y_{t-p}, y_{t-m}, y_{t-2m}, \dots, y_{t-Pm})$ and k neurons in the hidden layer.
- An $\mathrm{NNAR}(p, P, 0)_m$ model is equivalent to an $\mathrm{ARIMA}(p, 0, 0)(P, 0, 0)_m$ model but without stationarity restrictions.
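A minimal sketch of a seasonal NNAR fit in Python, using scikit-learn's `MLPRegressor` with a single hidden layer of k logistic units as a stand-in for the R `nnet` network behind `nnetar`. The series, p = 2, P = 1, m = 12 and the solver settings are assumptions, and `nnar_design` is a hypothetical helper. Unlike `nnetar`, which by default averages several networks fitted from random starting weights (its `repeats` argument), this fits a single network.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def nnar_design(y, p, P, m):
    """Hypothetical helper: lags 1..p plus seasonal lags m, 2m, ..., Pm as inputs."""
    lags = sorted(set(range(1, p + 1)) | {j * m for j in range(1, P + 1)})
    start = max(lags)
    # Column for lag L holds y[t - L] for every target time t = start, ..., len(y) - 1.
    X = np.column_stack([y[start - lag:len(y) - lag] for lag in lags])
    return X, y[start:], lags


# Hypothetical monthly-style series with seasonal period m = 12.
rng = np.random.default_rng(0)
t = np.arange(240)
y = 5 + np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.2, t.size)

p, P, m = 2, 1, 12
k = round((p + P + 1) / 2)  # the nnetar default for k; here k = 2
X, target, lags = nnar_design(y, p, P, m)
print(lags)  # [1, 2, 12]

# Standardise with the series' mean and std; small networks usually train better on scaled data.
mu, sd = y.mean(), y.std()
model = MLPRegressor(hidden_layer_sizes=(k,), activation='logistic',
                     solver='lbfgs', max_iter=2000, random_state=0)
model.fit((X - mu) / sd, (target - mu) / sd)

# One-step-ahead forecast of y at time len(y), from the most recent lagged values.
x_next = np.array([y[len(y) - lag] for lag in lags])
forecast = model.predict(((x_next - mu) / sd).reshape(1, -1))[0] * sd + mu
print(f"{forecast:.3f}")
```

With `hidden_layer_sizes=()` the network has no hidden layer and reduces to a linear regression on the same lags, which mirrors the equivalence between NNAR with k = 0 and a seasonal AR model noted above.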

[Forecasting time series with neural networks in R](http://kourentzes.com/forecasting/2017/02/10/forecasting-time-series-with-neural-networks-in-r/)


#### [nnetar](https://www.rdocumentation.org/packages/forecast/versions/8.4/topics/nnetar): [repo](https://github.com/robjhyndman/forecast/blob/master/R/nnetar.R)

- The nnetar() function fits an $\mathrm{NNAR}(p, P, k)_m$ model.
- If p and P are not specified, they are automatically selected.
- For non-seasonal time series, default p = optimal number of lags (according to the AIC) for a linear AR(p) model.
- For seasonal time series, defaults are P = 1 and p is chosen from the optimal linear model fitted to the seasonally adjusted data.
- Default k = (p + P + 1)/2 (rounded to the nearest integer).
- [Prediction intervals for NNETAR models](https://robjhyndman.com/hyndsight/nnetar-prediction-intervals/)
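The default-order rules above can be sketched as follows. This is an approximation under stated assumptions, not `nnetar`'s code: the hypothetical helper `default_nnar_orders` picks p by comparing the AIC of linear AR(p) models fitted by ordinary least squares on a common sample, and it skips the seasonal adjustment step used for seasonal series (P is passed in, 0 by default).

```python
import numpy as np


def ar_aic(y, p, start):
    """AIC (up to a constant) of an OLS-fitted AR(p) with intercept, on targets y[start:]."""
    n = len(y) - start
    X = np.column_stack([np.ones(n)] + [y[start - lag:len(y) - lag] for lag in range(1, p + 1)])
    target = y[start:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = np.sum((target - X @ coef) ** 2)
    return n * np.log(rss / n) + 2 * (p + 1)


def default_nnar_orders(y, max_p=10, P=0):
    """Hypothetical helper mimicking the nnetar defaults listed above."""
    y = np.asarray(y, dtype=float)
    # Same estimation sample for every candidate p, so the AIC values are comparable.
    aics = {p: ar_aic(y, p, start=max_p) for p in range(1, max_p + 1)}
    p = min(aics, key=aics.get)
    k = round((p + P + 1) / 2)  # Python's round() sends exact halves to the even integer
    return p, k


# Simulated AR(2) process, so the AIC should usually pick p = 2 or close to it.
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, len(y)):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()
print(default_nnar_orders(y))
```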




#### Longreads

- [Comparative Study of Wavelet-SARIMA and Wavelet-NNAR Models for Groundwater Level in Rajshahi District](http://www.iosrjournals.org/iosr-jestft/papers/vol10-issue7/Version-1/A1007010115.pdf)
- [Comparison of ARIMA and NNAR Models for Forecasting Water Treatment Plant's Influent Characteristics](https://www.researchgate.net/publication/324525859_Comparison_of_ARIMA_and_NNAR_Models_for_Forecasting_Water_Treatment_Plant's_Influent_Characteristics)
- [A NEURAL NETWORK AUTOREGRESSION MODEL TO FORECAST PER CAPITA DISPOSABLE INCOME](https://pdfs.semanticscholar.org/93b7/9de5d49e26e933b5a731318390fe907d4957.pdf)

### [XGBoost](https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/)

- [forecastxgb (R package)](https://github.com/ellisp/forecastxgb-r-package/)

- [Timeseries forecasting using extreme gradient boosting](http://freerangestats.info/blog/2016/11/06/forecastxgb)

- [Time-series Prediction using XGBoost](http://www.georgeburry.com/time-series-xgb/) 

- [Forecasting Markets using eXtreme Gradient Boosting (XGBoost)](https://www.r-bloggers.com/forecasting-markets-using-extreme-gradient-boosting-xgboost/) 

- [Slides: XGBOOST AS A TIME-SERIES FORECASTING TOOL](http://maddatascientist.eu/wp-content/uploads/2018/06/xgboost_forecasting_eng.pdf)