# L12c: Recurrent Neural Networks

___

In this lecture, we continue our discussion of artificial neural networks by introducing recurrent neural networks (RNNs), a type of neural network that is particularly well suited to processing sequences of data, such as time series or natural language. We will cover the following topics:

* __What are RNNs?__: Recurrent Neural Networks (RNNs) are a type of artificial neural network designed to process sequential data by retaining information about previous inputs through their _internal memory_. This makes them particularly effective for tasks such as language modeling, time-series prediction, and speech recognition, where context and dependencies between data points are crucial.
* __How do RNNs work?__: RNNs maintain a hidden state that is updated at each time step based on the current input and the previous hidden state. This allows them to capture temporal dependencies in the data. The basic building block of an RNN is a recurrent layer, which processes the input sequence one element at a time while updating its hidden state.
* __Training RNNs__: RNNs are trained using backpropagation through time (BPTT), which _unrolls the network_ across sequential steps to compute gradients and update shared weights. However, this process is prone to the __vanishing gradients problem__, where gradients shrink exponentially during backpropagation, hindering learning of long-term dependencies, and the __exploding gradients problem__, where unchecked gradient growth destabilizes training. These challenges led to advanced architectures such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), which use gating mechanisms to regulate information flow and mitigate gradient issues.

Sources for this lecture include:
* [Goodfellow et al., Deep Learning Book, 2016 MIT Press](http://www.deeplearningbook.org/)

To get a general overview of RNNs, let's check out the following [video from the IBM technology channel](https://www.yout-ube.com/watch?v=Gafjk7_w1i8) on YouTube. It provides a good introduction to the topic and covers some of the key concepts we will discuss in this lecture.
___

## Setup, Data, and Prerequisites
Let's set up the computational environment, i.e., import the necessary libraries (and codes), by including the `Include.jl` file.

In [1]:
include("Include.jl");

### Data
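The data file `Temp-ITH-YTD-NOAA-2025.csv` is loaded into a `DataFrame`, from which we keep the `:TMIN` and `:TMAX` columns. Judging by the file name and column names, these appear to be year-to-date 2025 minimum and maximum temperature records from NOAA for the ITH station; the units and sampling interval are not stated here.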

In [2]:
X, rawdata = let

    # raw data -
    rawdata = CSV.read(joinpath(_PATH_TO_DATA, "Temp-ITH-YTD-NOAA-2025.csv"), DataFrame); # load the data from a CSV file into a DataFrame
    X = @select rawdata :TMIN :TMAX; # Wow! Grab the Tmax and Tmin using the @select macro from the DataFramesMeta.jl package.

    # return -
    X,rawdata;
end;

## General problem: Modeling a Sequence
Suppose we have a _sequence of data_ $(x_1, x_2, \ldots, x_T)$ where $T$ is the length of the sequence, and $x_i$ is the $i$-th element (token) of the sequence.
* _Example sequences_: in natural language processing, $x_{i}$ could be a word in a sentence or a character in a word. In time series analysis, $x_i$ could be a measurement, e.g., temperature, pressure, or price, at time $i$.
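
To make the notation concrete, here is a minimal sketch of how a sequence might be stored and sliced in Julia. The helper `make_windows` is hypothetical (it is not part of the lecture codebase), and the numbers are made-up, temperature-like values. It pairs each window of `w` consecutive elements with the element that follows it, one common way to pose one-step-ahead prediction.

```julia
# Hypothetical helper (an assumption, not from Include.jl): split a sequence
# into (window, next element) pairs for one-step-ahead prediction.
function make_windows(x::AbstractVector, w::Int)
    T = length(x)
    @assert 1 <= w < T "window length must satisfy 1 <= w < T"
    inputs = [x[i:(i + w - 1)] for i in 1:(T - w)]   # T - w windows of length w
    targets = [x[i + w] for i in 1:(T - w)]          # element following each window
    return inputs, targets
end

# a character sequence: each token x_i is a Char
tokens = collect("rnn")

# a made-up numeric sequence (illustrative values only)
x = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
inputs, targets = make_windows(x, 3)
```

With `T = 6` and `w = 3`, this produces three windows; the first is `x[1:3]` with target `x[4]`.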

## What are RNNs?
Recurrent Neural Networks (RNNs) are a type of artificial neural network designed to process sequential data by retaining information about previous inputs through their _internal memory_. 

* _Do feedforward neural networks have memory?_ No, feedforward neural networks do not retain information about previous inputs. Once training is over, the parameters (weights and bias values) no longer change, so the network is done learning. When we feed in values, a feedforward neural network (FNN) simply applies the operations that make up the network, using the parameter values it has learned; each input is processed independently of the inputs that came before it.
* _How are RNNs different from feedforward neural networks?_ RNNs have connections that loop back on themselves, allowing them to maintain a _hidden state_ that captures information about previous inputs. This makes RNNs particularly effective for tasks such as language modeling, time-series prediction, and speech recognition, where context and dependencies between data points are crucial. 

### Elman Network: Mathematical Formulation
The Elman network is a simple type of RNN that consists of an input layer, a hidden layer, and an output layer. The hidden layer has recurrent connections that allow it to maintain a hidden state over time:

* [Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179-211.](https://onlinelibrary.wiley.com/doi/10.1207/s15516709cog1402_1)

__At each time step__: an Elman RNN takes an _input_ and the previous hidden state (memory) and computes the output entry at time $t$. 

Let the input vector at time $t$ be denoted as $\mathbf{x}_t\in\mathbb{R}^{d_{in}}$, the hidden state at time $t$ as $\mathbf{h}_t\in\mathbb{R}^{h}$, and the output at time $t$ as $\mathbf{y}_t\in\mathbb{R}^{d_{out}}$. The RNN can be described by the following equations:
$$
\begin{align*}
\mathbf{h}_t &= \sigma_{h}(\mathbf{U}_h \mathbf{h}_{t-1} + \mathbf{W}_x \mathbf{x}_t + \mathbf{b}_h) \\
\mathbf{y}_t &= \sigma_{y}(\mathbf{W}_y \mathbf{h}_t + \mathbf{b}_y)
\end{align*}
$$
where the parameters are:
* _Network weights_: the term $\mathbf{U}_h\in\mathbb{R}^{h\times{h}}$ is the weight matrix for the hidden state, $\mathbf{W}_x\in\mathbb{R}^{h\times{d_{in}}}$ is the weight matrix for the input, and $\mathbf{W}_y\in\mathbb{R}^{d_{out}\times{h}}$ is the weight matrix for the output.
* _Network bias_: the $\mathbf{b}_h\in\mathbb{R}^{h}$ term denotes the bias vector for the hidden state, and $\mathbf{b}_y\in\mathbb{R}^{d_{out}}$ is the bias vector for the output.
* _Activation function_: the $\sigma_{h}$ function is a _hidden layer activation function_, such as the sigmoid or hyperbolic tangent (tanh) function, which introduces non-linearity into the RNN. The activation function $\sigma_{y}$ is an _output activation function_ that can be a softmax function for classification tasks or a linear function for regression tasks.
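
The two update equations can be sketched directly in plain Julia. This is a minimal illustration under stated assumptions, not the implementation used later in the lecture: the `ElmanSketch` type and the functions `elman_step` and `elman_forward` are hypothetical names, $\sigma_h$ is taken to be `tanh`, $\sigma_y$ is taken to be the identity (a regression-style output), and the initial hidden state is set to zero.

```julia
# Hypothetical container for the Elman parameters (field names mirror the equations)
struct ElmanSketch
    Uh::Matrix{Float64}  # h × h, hidden-to-hidden weights
    Wx::Matrix{Float64}  # h × d_in, input-to-hidden weights
    bh::Vector{Float64}  # h, hidden bias
    Wy::Matrix{Float64}  # d_out × h, hidden-to-output weights
    by::Vector{Float64}  # d_out, output bias
end

# One time step: returns (y_t, h_t). Assumes σ_h = tanh and σ_y = identity.
function elman_step(m::ElmanSketch, x::Vector{Float64}, hprev::Vector{Float64})
    h = tanh.(m.Uh * hprev .+ m.Wx * x .+ m.bh)
    y = m.Wy * h .+ m.by
    return y, h
end

# Run over a whole sequence, carrying the hidden state forward (assumes h_0 = 0)
function elman_forward(m::ElmanSketch, xs::Vector{Vector{Float64}})
    h = zeros(length(m.bh))
    ys = Vector{Vector{Float64}}()
    for x in xs
        y, h = elman_step(m, x, h)
        push!(ys, y)
    end
    return ys, h
end

# hand-picked weights (illustrative only): d_in = 1, h = 2, d_out = 1
m = ElmanSketch([0.5 0.0; 0.0 0.5], reshape([1.0, -1.0], 2, 1), zeros(2), [1.0 0.0], [0.0])
xs = [[1.0], [1.0], [1.0]]   # the same input at every time step
ys, hT = elman_forward(m, xs)
```

Although the input is identical at every step, the outputs in `ys` differ from step to step because each hidden state depends on the previous one. A feedforward layer would return the same output each time.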

Let's build a simple Elman RNN to better understand how it works.

In [3]:
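d_in, dh, len, batch_size = 4, 6, 1, 1; # input dimension, hidden dimension, sequence length, batch size
x = rand(Float32, (d_in, len)); # random input sequence, one column per time step
h = zeros(Float32, dh); # initial hidden state
rnn = RNN(d_in => dh, tanh_fast) # recurrent layer with a tanh activation
y = rnn(x, h);   # hidden states for every time step; size(y) = (dh, len) for this 2-D input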

In [4]:
x

4×1 Matrix{Float32}:
 0.08799863
 0.5071629
 0.6970396
 0.12102342

In [5]:
rnn.cell.Wh

6×6 Matrix{Float32}:
 -0.640331   -0.577036    0.422144   0.302013   0.311283   -0.608568
  0.0706657   0.271955   -0.572958  -0.322408  -0.460827    0.0084206
 -0.251138    0.187096    0.54768   -0.64055    0.0545763  -0.678802
 -0.646051   -0.545371    0.54139   -0.43489    0.212093    0.364059
  0.569772   -0.431087   -0.668269   0.251932  -0.0897167   0.442536
  0.0908543  -0.0588846  -0.684212  -0.474873  -0.582005   -0.284068

In [6]:
elmanmodel = let

    # initialize -
    number_of_inputs = 1; # dimension of the input
    number_of_outputs = 10; # dimension of the output
    number_of_hidden_units = 2; # number of hidden neurons
    σ₁ = NNlib.tanh_fast; # activation function
    σ₂ = NNlib.tanh_fast; # activation function

    # build the model (with random parameters)
    model = build(MyElamanRecurrentLayerModel, (
        number_of_inputs = number_of_inputs,
        number_of_outputs = number_of_outputs,
        number_of_hidden_units = number_of_hidden_units,
        batchsize = 1,
        σ₁ = σ₁,
        σ₂ = σ₂,
    ));

    # return the model
    model
end;

Fill me in here.

### Jordan Network: Mathematical Formulation
The Jordan network is another type of RNN that is similar to the Elman network but has a different architecture. In a Jordan network, the output layer is connected back to the hidden layer, allowing the network to maintain a hidden state based on the output at the previous time step.
* [Jordan, Michael I. (1997-01-01). "Serial Order: A Parallel Distributed Processing Approach". Neural-Network Models of Cognition — Biobehavioral Foundations. Advances in Psychology. Vol. 121. pp. 471–495. doi:10.1016/s0166-4115(97)80111-2. ISBN 978-0-444-81931-4. S2CID 15375627.](https://www.sciencedirect.com/science/article/pii/S0166411597801112?via%3Dihub)

__At each time step__: a Jordan RNN takes the current _input_, the previous state (memory), and the previous output, and computes the output entry at time $t$. Thus, the Jordan network has a similar structure to the Elman network, but its memory is fed by the output layer rather than by the hidden layer.

Let the input vector at time $t$ be denoted as $\mathbf{x}_t\in\mathbb{R}^{d_{in}}$, the hidden state at time $t$ as $\mathbf{h}_t\in\mathbb{R}^{h}$, 
the state vector at time $t$ as $\mathbf{s}_t\in\mathbb{R}^{h}$, and the output at time $t$ as $\mathbf{y}_t\in\mathbb{R}^{d_{out}}$. Then, the Jordan RNN can be described by the following equations:
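$$
\begin{align*}
\mathbf{h}_t &= \sigma_{h}(\mathbf{U}_h \mathbf{s}_{t} + \mathbf{W}_h \mathbf{x}_t + \mathbf{b}_h) \\
\mathbf{y}_t &= \sigma_{y}(\mathbf{W}_y \mathbf{h}_t + \mathbf{b}_y) \\
\mathbf{s}_t &= \sigma_{s}(\mathbf{W}_{ss} \mathbf{s}_{t-1} + \mathbf{W}_{sy} \mathbf{y}_{t-1} + \mathbf{b}_s)
\end{align*}
$$
where the parameters are:
* _Network weights_: the term $\mathbf{U}_h\in\mathbb{R}^{h\times{h}}$ is the weight matrix for the state vector, $\mathbf{W}_h\in\mathbb{R}^{h\times{d_{in}}}$ is the weight matrix for the input, and $\mathbf{W}_y\in\mathbb{R}^{d_{out}\times{h}}$ is the weight matrix for the output. In the state update, $\mathbf{W}_{ss}\in\mathbb{R}^{h\times{h}}$ weights the previous state and $\mathbf{W}_{sy}\in\mathbb{R}^{h\times{d_{out}}}$ weights the previous output.
* _Network bias_: the $\mathbf{b}_h\in\mathbb{R}^{h}$ term denotes the bias vector for the hidden layer, $\mathbf{b}_y\in\mathbb{R}^{d_{out}}$ is the bias vector for the output, and $\mathbf{b}_s\in\mathbb{R}^{h}$ is the bias vector for the state.
* _Activation functions_: $\sigma_{h}$, $\sigma_{y}$, and $\sigma_{s}$ are the activation functions of the hidden layer, the output layer, and the state update, respectively.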
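
A minimal sketch of one Jordan update in plain Julia may help here. It assumes that the hidden layer sees the current input $\mathbf{x}_t$ together with the state $\mathbf{s}_t$, that the state is built from the previous state and the previous output, that $\sigma_h = \sigma_s = \tanh$, and that $\sigma_y$ is the identity. The function names and the named-tuple parameter layout are hypothetical, and the weights are hand-picked for illustration.

```julia
# One Jordan step (hypothetical sketch): returns (y_t, s_t)
function jordan_step(p, x, sprev, yprev)
    s = tanh.(p.Wss * sprev .+ p.Wsy * yprev .+ p.bs)  # state from previous state and previous output
    h = tanh.(p.Uh * s .+ p.Wh * x .+ p.bh)            # hidden layer sees the state and the input
    y = p.Wy * h .+ p.by                               # identity output activation (assumption)
    return y, s
end

# Run over a sequence, feeding each output back into the next state (assumes s_0 = 0, y_0 = 0)
function jordan_forward(p, xs)
    s = zeros(length(p.bs))
    y = zeros(length(p.by))
    ys = Vector{Vector{Float64}}()
    for x in xs
        y, s = jordan_step(p, x, s, y)
        push!(ys, y)
    end
    return ys
end

# illustrative parameters: d_in = 1, h = 2, d_out = 1
p = (Wss = zeros(2, 2), Wsy = reshape([1.0, 0.0], 2, 1), bs = zeros(2),
     Uh = [1.0 0.0; 0.0 1.0], Wh = reshape([1.0, 1.0], 2, 1), bh = zeros(2),
     Wy = [1.0 0.0], by = [0.0])
ys = jordan_forward(p, [[0.5], [0.5]])
```

The second output differs from the first even though the input is unchanged, because the first output has been fed back through the state.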

## Training RNNs
The training process for RNNs is similar to that of feedforward neural networks, but with a few key differences. The main difference is that RNNs are trained using _backpropagation through time_ (BPTT), which unrolls the network across sequential steps to compute gradients and update shared weights. 
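
To see what unrolling implies for the gradients, consider the sensitivity of the final hidden state to the initial one, $\partial\mathbf{h}_T/\partial\mathbf{h}_0$, which is a product of $T$ per-step Jacobians. The sketch below accumulates that product for the simplified recurrence $\mathbf{h}_t = \sigma(\mathbf{U}\mathbf{h}_{t-1})$ (no input and no bias, which is an assumption made to keep the example small). The function name `jacobian_product_norm` and its `linear` switch are hypothetical; the linear case is included only because a saturating `tanh` would hide the growth.

```julia
using LinearAlgebra

# Spectral norm of ∂h_T/∂h_0 for h_t = σ(U h_{t-1}), built as a product of per-step Jacobians.
# `linear = true` replaces tanh with the identity (illustrative assumption).
function jacobian_product_norm(U::Matrix{Float64}, h0::Vector{Float64}, T::Int; linear::Bool = false)
    h = copy(h0)
    J = Matrix{Float64}(I, length(h0), length(h0))
    for _ in 1:T
        a = U * h
        h = linear ? a : tanh.(a)
        d = linear ? ones(length(a)) : 1 .- h .^ 2   # σ'(a) for each hidden unit
        J = Diagonal(d) * U * J                      # chain rule: J_t = diag(σ'(a_t)) U J_{t-1}
    end
    return opnorm(J)
end

h0 = fill(0.1, 3)
Id = Matrix{Float64}(I, 3, 3)
vanishing = jacobian_product_norm(0.5 .* Id, h0, 50)                 # every factor has norm ≤ 0.5
exploding = jacobian_product_norm(1.5 .* Id, h0, 50; linear = true)  # every factor has norm 1.5
```

After 50 steps the first norm is bounded by $0.5^{50}$, while the second equals $1.5^{50}$. This is the exponential shrinkage and growth behind the vanishing and exploding gradients problems.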

## Lab
In Lab `L12d`, we will implement (and train) one of the modifications of RNNs, the Long Short-Term Memory (LSTM) network. 
* _What is this_? LSTMs are a type of RNN that is designed to better capture long-term dependencies in sequential data. They achieve this by using a gating mechanism that regulates the flow of information into and out of the hidden state. This allows LSTMs to learn to remember or forget information as needed, making them more effective for tasks such as language modeling and time-series prediction.
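
As a preview, the gating idea can be sketched with the commonly used LSTM update: forget, input, and output gates plus a candidate cell update. This is a hedged illustration, not the lab's implementation. The names `logistic`, `lstm_step`, and `gate_demo_params` and the named-tuple parameter layout are hypothetical, and all weights are set to zero so that the biases alone control the gates.

```julia
logistic(z) = 1 / (1 + exp(-z))   # gate activation, maps to (0, 1)

# One LSTM step (standard formulation): returns the new hidden state h and cell state c
function lstm_step(p, x, hprev, cprev)
    f = logistic.(p.Wf * x .+ p.Uf * hprev .+ p.bf)   # forget gate: how much of c_{t-1} to keep
    i = logistic.(p.Wi * x .+ p.Ui * hprev .+ p.bi)   # input gate: how much new content to write
    o = logistic.(p.Wo * x .+ p.Uo * hprev .+ p.bo)   # output gate: how much of the cell to expose
    g = tanh.(p.Wc * x .+ p.Uc * hprev .+ p.bc)       # candidate cell update
    c = f .* cprev .+ i .* g
    h = o .* tanh.(c)
    return h, c
end

# Zero weights, so only the biases steer the gates (illustrative setup; d_in = 1, h = 2)
function gate_demo_params(forget_bias, input_bias; d_in = 1, dh = 2)
    W() = zeros(dh, d_in)
    U() = zeros(dh, dh)
    return (Wf = W(), Uf = U(), bf = fill(forget_bias, dh),
            Wi = W(), Ui = U(), bi = fill(input_bias, dh),
            Wo = W(), Uo = U(), bo = zeros(dh),
            Wc = W(), Uc = U(), bc = zeros(dh))
end

x, h0, c0 = [1.0], zeros(2), [2.0, -1.0]
_, c_keep = lstm_step(gate_demo_params(10.0, -10.0), x, h0, c0)     # forget gate ≈ 1: remember
_, c_forget = lstm_step(gate_demo_params(-10.0, -10.0), x, h0, c0)  # forget gate ≈ 0: erase
```

With the forget gate near one, the cell state passes through almost unchanged; with it near zero, the stored value is wiped.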

## Today?
That's a wrap! What are some of the interesting things we discussed today?