# TITLE

This notebook was prepared by: 

Sahil Nisar (sn3028@nyu.edu)

Suniya Raza (sr5748@nyu.edu)

Vinicius Moreira (vgm236@nyu.edu)

Graduate School of Arts and Science (GSAS) at New York University (NYU)

2022

PLACEHOLDER FOR SUMMARY

## 1. Introduction 

## 2. Literature Review

## 3. Forecasting methods

#### 3.1 Univariate Models

**Trailing 3-period average**: 

This simple estimator serves as a naïve benchmark. The forecast for the next period is the average of the three most recent observations, which are months or quarters depending on the frequency of the variable used in our analysis.

\begin{align*}
  \hat{X}_{t+1} &= \frac{1}{3}\sum_{i=0}^{2} X_{t-i} = \frac{X_t + X_{t-1} + X_{t-2}}{3}\\
  &\text{where } X_{t} \text{ is the observed value of the variable being forecast in period } t
\end{align*}
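The formula above can be sketched in NumPy as follows. This is a minimal illustration, not the notebook's own implementation; the function name `trailing_average_forecasts` and the toy series are hypothetical.

```python
import numpy as np

def trailing_average_forecasts(x, window=3):
    """Entry t forecasts x[t] from the `window` observations before it.

    The extra last entry is the forecast for the next, not-yet-observed period.
    """
    x = np.asarray(x, dtype=float)
    forecasts = np.full(len(x) + 1, np.nan)    # NaN until a full window of history exists
    for t in range(window, len(x) + 1):
        forecasts[t] = x[t - window:t].mean()  # mean of the previous `window` observations
    return forecasts

quarterly_growth = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
f = trailing_average_forecasts(quarterly_growth)
# f[3:] holds [2.0, 3.0, 4.0]; f[-1] = 4.0 is the forecast for the sixth quarter
```

With pandas, `series.rolling(3).mean().shift(1)` produces the same in-sample forecasts; the explicit loop is used here only to make the indexing visible.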

**Exponential Smoothing**: 

A weighted average of lagged values, with weights that decay exponentially as the lag grows. Exponential smoothing therefore uses all past data, whereas a moving average uses only the $k$ most recent data points.


$$ \hat{X}_{t+1} = \alpha X_{t} + \alpha (1- \alpha) X_{t-1} + \alpha (1- \alpha)^2 X_{t-2} + \cdots $$


where $0 \le \alpha \le 1$ is the smoothing parameter; larger values put more weight on recent observations. In practice the infinite sum is truncated after a chosen number of lags; we use four lags here.
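A minimal sketch of the truncated four-lag forecast, assuming NumPy; the function name and the optional `normalize` flag are hypothetical. Because the truncated weights sum to $1-(1-\alpha)^4$ rather than one, the flag shows one possible way to rescale them; by default the code follows the formula exactly.

```python
import numpy as np

def exp_smoothing_forecast(x, alpha, n_lags=4, normalize=False):
    """One-step forecast alpha*X_t + alpha(1-alpha)X_{t-1} + ..., truncated after n_lags terms."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    x = np.asarray(x, dtype=float)
    recent = x[::-1][:n_lags]                  # X_t, X_{t-1}, ... (most recent first)
    weights = alpha * (1.0 - alpha) ** np.arange(len(recent))
    if normalize and weights.sum() > 0:
        # truncated weights sum to 1 - (1 - alpha)**len(recent) < 1; rescale them to sum to one
        weights = weights / weights.sum()
    return float(weights @ recent)

history = [1.0, 2.0, 3.0, 4.0, 5.0]            # hypothetical data
forecast = exp_smoothing_forecast(history, alpha=0.5)   # → 4.0
forecast_rescaled = exp_smoothing_forecast(history, alpha=0.5, normalize=True)
```

Without truncation, the same weights can be generated by the recursion $\hat{X}_{t+1} = \alpha X_t + (1-\alpha)\hat{X}_t$, which is what library routines such as pandas `Series.ewm(alpha=..., adjust=False)` compute.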

**ARIMA**: 

A stochastic process $ \{X_t\} $ is called an *autoregressive moving
average process*, or ARMA($ p,q $), if it can be written as


<a id='equation-arma'></a>
$$
X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} +
    \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}
$$

where $ \{ \epsilon_t \} $ is white noise.

In what follows we **always assume** that the roots of the autoregressive polynomial $ \phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p $ lie outside the unit circle in the complex plane.

This condition is sufficient to guarantee that the ARMA($ p,q $) process is covariance stationary.

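In fact, it implies that the process falls within the class of general linear processes: $ X_t $ can be written as an infinite moving average $ X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j} $ with square-summable coefficients $ \psi_j $.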

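We define an ARIMA($ p,d,q $) model as an ARMA($ p,q $) model applied to the series after it has been differenced $ d $ times, where $ \Delta X_t = X_t - X_{t-1} $. Differencing removes stochastic trends (unit roots), which helps make the modeled series stationary.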
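To make the stationarity condition and the role of differencing concrete, here is a small NumPy sketch. It checks whether the roots of $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$ lie outside the unit circle, simulates an ARMA(1, 1) process with Gaussian white noise, integrates it once to obtain an ARIMA(1, 1, 1) series, and differences it back. The function names, parameter values and burn-in length are illustrative assumptions.

```python
import numpy as np

def ar_is_stationary(phi):
    """True if all roots of phi(z) = 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    phi = np.asarray(phi, dtype=float)
    # np.roots wants coefficients from the highest power down: [-phi_p, ..., -phi_1, 1]
    coeffs = np.concatenate([-phi[::-1], [1.0]])
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))

def simulate_arma(phi, theta, n, burn=200, seed=0):
    """Simulate X_t = sum_i phi_i X_{t-i} + eps_t + sum_j theta_j eps_{t-j}, eps ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(n + burn):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        x[t] = ar + eps[t] + ma
    return x[burn:]  # drop the burn-in so the start-up values x = 0 stop mattering

# An ARIMA(1, 1, 1) series is the cumulative sum of a stationary ARMA(1, 1) series ...
arma = simulate_arma([0.5], [0.3], n=300)
arima_level = np.cumsum(arma)
# ... so differencing once (d = 1) recovers the stationary ARMA part
recovered = np.diff(arima_level)
```

A random walk ($\phi_1 = 1$) fails the root check, which is exactly the case differencing is meant to handle. For estimation rather than simulation, `statsmodels.tsa.arima.model.ARIMA(y, order=(p, d, q))` fits these models by maximum likelihood.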

#### 3.2 Linear Regression and Machine Learning Models

**Simple Linear Regression (OLS):**

The most common technique to estimate a linear relationship between variables is Ordinary Least Squares (OLS). The OLS estimator chooses the parameters that minimize the sum of squared residuals.

In matrix form, the model is written as:

$$
y = X\beta + u
$$

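where $ y $ is the $ n \times 1 $ vector of observations on the dependent variable, $ X $ is the $ n \times k $ matrix of regressors, $ \beta $ is the $ k \times 1 $ vector of parameters and $ u $ is the vector of errors. To solve for the unknown parameter $ \beta $, we want to minimize
the sum of squared residuals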

$$
\underset{\hat{\beta}}{\min} \hat{u}'\hat{u}
$$

Rearranging the first equation and substituting into the second
equation, we can write

$$
\underset{\hat{\beta}}{\min} \ (y - X\hat{\beta})' (y - X\hat{\beta})
$$

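Setting the derivative with respect to $ \hat{\beta} $ to zero gives the normal equations $ X'X\hat{\beta} = X'y $. Provided $ X'X $ is invertible, the solution for the
$ \hat{\beta} $ coefficients is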

$$
\hat{\beta} = (X'X)^{-1}X'y
$$
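A minimal NumPy sketch of the closed-form estimator on simulated data. The true coefficients and noise level are made up for illustration. The code solves the normal equations $X'X\hat{\beta} = X'y$ directly instead of forming $(X'X)^{-1}$, which gives the same answer with better numerical behavior.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients beta_hat = (X'X)^{-1} X'y, computed by solving the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])  # intercept + 2 regressors
beta_true = np.array([1.0, 2.0, -0.5])                          # hypothetical parameters
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = ols(X, y)
residuals = y - X @ beta_hat   # orthogonal to every column of X at the optimum
```

The orthogonality of the residuals to the regressors ($X'\hat{u} = 0$) is just the normal equations rearranged, and is a quick sanity check on any OLS fit.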

**Ridge / Lasso / Elastic Net:**

These models keep the linear specification of traditional OLS but add a penalty on the size of the coefficients (regularization) to avoid overfitting.

The Lasso model generates predictions with the same linear model as OLS, but optimizes over a slightly different loss function:

$$
\underset{\hat{\beta}}{\min} \ (y - X\hat{\beta})' (y - X\hat{\beta}) + \alpha \sum_{j=1}^{k} |\hat{\beta}_j|
$$

where $ \alpha \ge 0 $ is the regularization parameter. The additional term penalizes large coefficients and, because the absolute-value penalty has a kink at zero, it can set the coefficients of features that are not informative about the target exactly to zero.
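The Lasso has no closed-form solution, so one common way to compute it is cyclic coordinate descent with soft-thresholding, sketched below in NumPy. The sketch assumes the objective $(y - X\beta)'(y - X\beta) + \alpha\sum_j|\beta_j|$, no intercept (center the data first) and a fixed number of sweeps instead of a convergence test; the helper names are hypothetical.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=500):
    """Minimize (y - Xb)'(y - Xb) + alpha * sum_j |b_j| by cyclic coordinate descent.

    Assumes no intercept (center y and X beforehand) and no all-zero columns in X.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # x_j'x_j for every column j
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: y minus the fit of every feature except feature j
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            # exact minimizer over beta_j with the other coefficients held fixed
            beta[j] = soft_threshold(rho, alpha / 2.0) / col_sq[j]
    return beta

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.standard_normal(200)  # hypothetical data

beta_unpenalized = lasso_coordinate_descent(X, y, alpha=0.0)  # reduces to OLS
beta_lasso = lasso_coordinate_descent(X, y, alpha=50.0)       # uninformative feature -> 0
```

Note that scikit-learn's `Lasso` scales the squared-error term by $1/(2n)$, so its `alpha` is not numerically comparable to the $\alpha$ in the formula above.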


Ridge regression places a different constraint on the parameters: $ \hat{\beta} $ is chosen to minimize the penalized sum of squares

$$
\underset{\hat{\beta}}{\min} \ (y - X\hat{\beta})' (y - X\hat{\beta}) + \lambda\hat{\beta}'\hat{\beta}
$$

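where $ \lambda \ge 0 $ is the regularization parameter. Large values of $ \hat{\beta} $ are penalized, which shrinks the coefficients toward zero. Unlike the Lasso, however, Ridge does not set coefficients exactly to zero, so it only reduces the impact of "irrelevant" features of the model.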
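Unlike the Lasso, Ridge has a closed form: setting the gradient $-2X'(y - X\hat{\beta}) + 2\lambda\hat{\beta}$ to zero gives $\hat{\beta} = (X'X + \lambda I)^{-1}X'y$. The NumPy sketch below computes it along a small grid of $\lambda$ values on simulated data (assuming no intercept, so the data should be centered first) and shows the coefficient vector shrinking as $\lambda$ grows.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form Ridge estimate (X'X + lam * I)^{-1} X'y (no intercept; center data first)."""
    k = X.shape[1]
    # first-order condition: (X'X + lam*I) b = X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 4))
y = X @ np.array([1.5, -2.0, 0.0, 0.5]) + 0.2 * rng.standard_normal(100)  # hypothetical data

path = {lam: ridge(X, y, lam) for lam in [0.0, 10.0, 100.0, 1000.0]}
norms = [np.linalg.norm(b) for b in path.values()]  # shrinks as lam grows, but never hits 0
```

With $\lambda = 0$ the estimate coincides with OLS; adding $\lambda I$ also keeps the system solvable when $X'X$ is singular or ill-conditioned.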

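The elastic net uses a weighted combination of the Lasso and Ridge forms of regularization:

$$
\underset{\hat{\beta}}{\min} \ (y - X\hat{\beta})' (y - X\hat{\beta}) + \lambda_1 \sum_{j=1}^{k} |\hat{\beta}_j| + \lambda_2 \hat{\beta}'\hat{\beta}
$$

Setting $ \lambda_2 = 0 $ gives the Lasso, and setting $ \lambda_1 = 0 $ gives Ridge.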

#### 3.3 More Complex Econometric Methods

**Dynamic Factor Model:**

In a dynamic factor model, we model a potentially large number of macroeconomic series as being driven by a much smaller number of latent factors, which we estimate through principal component analysis (PCA).

PCA is an unsupervised dimensionality-reduction method based on the correlation (covariance) between features. The premise is to map data from a high-dimensional space to a lower-dimensional one while retaining as much of the variation in the data as possible.

With high-dimensional data, it is often impossible to visualize the relationships between variables directly. Applying PCA reduces the number of dimensions so that these relationships can be displayed. PCA is also used for noise filtering, among other applications.

PCA is appropriate when three conditions apply:

1. We want to reduce the number of variables.
2. We want the new variables to be uncorrelated with one another.
3. We accept that the new variables are harder to interpret than the original ones.

How does PCA work?

a. Calculate a matrix that summarizes how the variables are related to one another (the covariance matrix).

b. Decompose this matrix into directions (eigenvectors) and magnitudes (eigenvalues), and sort the directions by their eigenvalues.

c. Project the data onto the directions with the largest eigenvalues. This reduces the dimension, while each principal component remains a linear combination of all the original variables.

Mathematically, the first principal component is the direction in space along which projections have the largest variance. The second principal component is the direction which maximizes variance among all directions orthogonal to the first. The $k$-th component is the variance-maximizing direction orthogonal to the previous $k-1$ components.
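Steps a to c translate almost line by line into NumPy, as in the sketch below. The data are simulated from a hypothetical mixing matrix so that the variables are correlated; the function name `pca` is ours, not a library call.

```python
import numpy as np

def pca(X, k):
    """Return the top-k principal component scores, loadings (eigenvectors) and all eigenvalues."""
    Xc = X - X.mean(axis=0)                 # center every variable
    cov = np.cov(Xc, rowvar=False)          # step a: covariance matrix (variables in columns)
    eigvals, eigvecs = np.linalg.eigh(cov)  # step b: eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort so the largest variance comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :k]
    scores = Xc @ loadings                  # step c: project onto the top-k directions
    return scores, loadings, eigvals

rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.5, 0.0],
                   [0.0, 0.0, 0.0, 0.1]])
X = rng.standard_normal((500, 4)) @ mixing  # four correlated variables (hypothetical)
scores, loadings, eigvals = pca(X, k=2)
explained = eigvals / eigvals.sum()         # share of total variance per component
```

The variance of the $j$-th score equals the $j$-th eigenvalue and the scores are mutually uncorrelated, which is the variance-maximizing, orthogonal-direction property described above.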

We then use these principal components, the estimated factors, as explanatory variables in an OLS regression.
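The two stages (PCA, then OLS) can be combined into a one-step-ahead factor forecast. The sketch below makes several assumptions that go beyond the text: the panel is standardized before PCA, the factors are obtained from an SVD of the standardized panel (equivalent to the eigen-decomposition of its covariance matrix), and the target is regressed on factors lagged by `h` periods so that the latest factors can be used to forecast. The function name and the noise-free toy data are hypothetical.

```python
import numpy as np

def factor_forecast(panel, y, k, h=1):
    """Estimate k factors from a (T, N) panel by PCA, regress y_{t+h} on F_t by OLS, forecast y_{T+h}."""
    if h < 1:
        raise ValueError("h must be at least 1")
    # Assumption: standardize each series so that no single series dominates the components
    Z = (panel - panel.mean(axis=0)) / panel.std(axis=0)
    # SVD of the standardized panel: the rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    F = Z @ Vt[:k].T                                         # (T, k) estimated factors
    design = np.column_stack([np.ones(len(y) - h), F[:-h]])  # intercept + F_t, t = 1..T-h
    coef, *_ = np.linalg.lstsq(design, y[h:], rcond=None)    # OLS of y_{t+h} on F_t
    forecast = coef[0] + F[-1] @ coef[1:]                    # plug in the latest factors F_T
    return forecast, coef, F

rng = np.random.default_rng(3)
f = rng.standard_normal(120)                        # one hypothetical latent factor
panel = np.outer(f, [1.0, -0.5, 2.0, 0.3])          # four series driven by that factor
y = np.r_[0.0, 2.0 * f[:-1]]                        # target: y_{t+1} = 2 f_t
forecast, coef, F = factor_forecast(panel, y, k=1)  # recovers 2 * f[-1] in this noise-free case
```

Real macroeconomic panels are noisy and driven by several factors, so `k` would normally be chosen by looking at the explained variance or by out-of-sample forecast accuracy.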

**Vector autoregressions (VARs):**

Description

#### 3.4 Nonlinear Algorithms


**Random Forest:**

Description

**Gradient Boosted Decision Trees:**

Description

**K-Nearest Neighbors:**

Description

**Support Vector Regression:**

Description

#### 3.5 Neural Networks

**Dense:**

Description

**LSTM:**

Description

## 4. Applications to US GDP growth

## 5. Applications to Brazil industrial production growth

## 6. Conclusion

## 7. References


"Exponential Smoothing for Time Series Forecasting in Python." Machine Learning Mastery. https://machinelearningmastery.com/exponential-smoothing-for-time-series-forecasting-in-python/

"Covariance Stationary Processes." QuantEcon, Advanced Quantitative Economics with Python. https://python-advanced.quantecon.org/arma.html

"Forecasting Time Series Model Using Python: Part Two." Bounteous, September 15, 2020. https://www.bounteous.com/insights/2020/09/15/forecasting-time-series-model-using-python-part-two/

"A One-Stop Shop for Principal Component Analysis." Towards Data Science. https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c

"In Depth: Principal Component Analysis." Python Data Science Handbook. https://jakevdp.github.io/PythonDataScienceHandbook/05.09-principal-component-analysis.html

"Advanced Data Analysis from an Elementary Point of View" (lecture 17). Carnegie Mellon University. https://www.stat.cmu.edu/~cshalizi/uADA/15/lectures/17.pdf

"Applications of Principal Component Analysis (PCA)." OpenGenus IQ. https://iq.opengenus.org/applications-of-pca/