<h1><center>Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communications</center></h1>
<br>
<br>
<p>Andrew Van</p>

## The Motivation - Modeling Nonlinear systems

- Linear systems satisfy two properties:
    - Additivity (superposition): $f(x+y) = f(x) + f(y)$
    - Homogeneity: $f(\alpha x) = \alpha f(x)$
- Nonlinear systems are those that violate one or both of these properties.
    - Because of this, it is generally difficult to study nonlinear systems and obtain analytical models of them
    - However, we can still study and use nonlinear systems as black boxes (given some input to the box, what is the output?)
<div style="width:fit-content;display:block;margin-right:auto;margin-left:auto;">
<img src="images/blackbox.png">
</div>
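
As a quick numerical illustration of the two properties, the sketch below tests them on random inputs. The helper `is_linear` and the two test functions are illustrative assumptions, not part of the original material.

```python
import numpy as np

def is_linear(f, trials=100, seed=0):
    """Numerically test additivity and homogeneity on random scalar inputs."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y, a = rng.normal(size=3)
        additive = np.isclose(f(x + y), f(x) + f(y))
        homogeneous = np.isclose(f(a * x), a * f(x))
        if not (additive and homogeneous):
            return False
    return True

print(is_linear(lambda x: 3.0 * x))  # a pure gain passes both checks
print(is_linear(np.tanh))            # a saturating function fails them
```

A random test like this can only refute linearity, never prove it, which is one reason black-box modeling is attractive for systems without an analytical description.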

## Nonlinear systems (continued)

- Most technical systems become nonlinear at higher operating points (closer to saturation).
    - Operating there is more energy efficient, but the nonlinearities make the system unpredictable
    - Because of this, we settle for less efficient operation in the linear regime
    - Biomechanical systems use their full dynamic range (up to saturation) and are very efficient, and also thoroughly nonlinear.
        - Inspirations from biology?

## Echo State Networks (ESNs)

- An approach to learning black-box models of nonlinear systems
- A type of artificial recurrent neural network
    - Feedback loops in their synaptic connections
    - Can maintain activation in absence of input
    - Exhibit dynamic memory
    - Can, in principle, learn to mimic a target system with arbitrary accuracy
<div style="width:fit-content;display:block;margin-right:auto;margin-left:auto;">
<img src="images/recurrent.svg">
</div>

## What (was) novel about ESNs?

- At the time of publication (2004), recurrent neural networks were very difficult to train
    - Vanishing Gradient Problem
    - Suboptimal Solutions due to constrained model complexity
- ESNs eschew backpropagation
    - Create a large "reservoir" of neurons (50 - 1000) with random connections and train only the connections from the reservoir to the output neurons
    - There are no cyclic dependencies between the trained readout connections, so training an ESN becomes a simple linear regression task
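
A minimal sketch of the reservoir idea: the recurrent weights are drawn once at random and then left alone, and only a readout vector is learned. The function name `make_reservoir`, the sizes, and the rescaling to a spectral radius of 0.8 (a common heuristic in the ESN literature) are assumptions of this sketch, not values from the slides.

```python
import numpy as np

def make_reservoir(n_neurons=100, density=0.05, spectral_radius=0.8, seed=0):
    """Return a fixed, sparse random recurrent weight matrix.

    Rescaling to a spectral radius below 1 is a common heuristic;
    the value 0.8 is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(n_neurons, n_neurons))
    W *= rng.random((n_neurons, n_neurons)) < density  # keep roughly `density` of the links
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (spectral_radius / radius)

W = make_reservoir()            # never modified after creation
w_out = np.zeros(W.shape[0])    # the only weights that training would change
```

With 100 neurons this leaves 10,000 recurrent weights untouched and only 100 readout weights to fit, which is why the fit can be done in closed form.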

## Demonstrating the utility of ESNs

- Demonstrated ESN approach for a Mackey-Glass system (MGS)
    - Standard benchmark system for time series prediction studies
    - Generates an irregular time series
- Two steps for using ESN
    - Train with signal generated from original MGS as teacher signal
    - Use it to predict original signal some steps ahead
<div style="width:fit-content;display:block;margin-right:auto;margin-left:auto;">
<img src="images/mackeyglass.png" width="33%">
</div>
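
For readers who want to generate such a series, here is a rough sketch of the Mackey-Glass delay differential equation, $\dot{x}(t) = \beta x(t-\tau)/(1 + x(t-\tau)^{10}) - \gamma x(t)$. The parameter values are the ones commonly used for this benchmark ($\beta = 0.2$, $\gamma = 0.1$, $\tau = 17$); the slides do not list them, and the coarse Euler step and constant initial history are simplifying assumptions.

```python
import numpy as np

def mackey_glass(n_steps, tau=17, beta=0.2, gamma=0.1, power=10, dt=1.0, x0=1.2):
    """Euler-integrate dx/dt = beta*x(t-tau)/(1 + x(t-tau)**power) - gamma*x(t)."""
    delay = int(round(tau / dt))
    x = np.full(n_steps + delay, x0)          # constant history before t = 0
    for t in range(delay, n_steps + delay - 1):
        x_tau = x[t - delay]                  # delayed value x(t - tau)
        x[t + 1] = x[t] + dt * (beta * x_tau / (1.0 + x_tau ** power) - gamma * x[t])
    return x[delay:]

series = mackey_glass(3000)   # same length as the teacher sequence in the slides
```

A finer integration step would track the continuous system more faithfully; the step of 1 is used here only to keep the example short.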

## Training

- Create 1000 neurons ("reservoir") with sparse interconnections (1%) and one output neuron with random connections back into reservoir
- A 3000-step teacher sequence was generated from the original MGS and fed into the output neuron
    - Excites the internal neurons through the output feedback connections
    - After an initial transient period, the internal neurons exhibit systematic individual variations of the teacher sequence
        - Act as "echo functions" for the driving signal
        - Sparsity of interconnections lets reservoir decompose into many loosely coupled subsystems
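
The driving step can be sketched as follows. The sine-wave teacher, the reduced sizes, the 5% connectivity, the spectral radius of 0.8, and the names `W`, `w_fb`, and `states` are all assumptions for illustration; the setup described above uses 1000 neurons with 1% connectivity driven by the MGS signal.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 500                                   # small sizes for illustration

# Fixed sparse reservoir, rescaled to spectral radius 0.8 (an assumed value).
W = rng.uniform(-1, 1, (N, N)) * (rng.random((N, N)) < 0.05)
W *= 0.8 / np.max(np.abs(np.linalg.eigvals(W)))
w_fb = rng.uniform(-1, 1, N)                      # fixed output-to-reservoir feedback

d = 0.5 * np.sin(0.2 * np.arange(T))              # stand-in teacher signal
x = np.zeros(N)
states = np.zeros((T, N))
for n in range(1, T):
    # Teacher forcing: the desired output d(n-1) is written into the output
    # neuron and reaches the reservoir through the feedback weights.
    x = np.tanh(W @ x + w_fb * d[n - 1])
    states[n] = x
```

Each column of `states` is one neuron's trace; after the transient, the columns are differently shaped transformations of the same driving signal.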

## Another Example (Signal Generator)

<div style="width:fit-content;display:block;margin-right:auto;margin-left:auto;">
<img src="images/FreqGenSchema.png">
</div>

## Training (continued)

<div style="width:fit-content;display:block;margin-right:auto;margin-left:auto;">
<img src="images/F1.large.jpg" width="50%">
</div>

## Training (continued)

- After time $n$ = 3000, the output connection weights $w_{i}$ were computed from the last 2000 steps ($n$ = 1001,...,3000) of the training run such that the MSE was minimized (the first 1000 steps are discarded as the initial transient)

$$ MSE_{train} = \frac{1}{2000} \sum_{n=1001}^{3000} \Big(d(n) - \sum_{i=1}^{1000} w_{i}x_{i}(n)\Big)^{2} $$

where $x_{i}(n)$ is the activation of the $i$th internal neuron at time $n$.

- This is simple linear regression!
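
A minimal sketch of that regression step with NumPy. The matrix `X` and the signal `d` are synthetic stand-ins for the recorded activations $x_{i}(n)$ and the teacher $d(n)$, and the reservoir size is reduced from 1000 to 50 to keep the example fast; these are assumptions, not the original data.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, washout = 3000, 50, 1000               # N reduced from 1000 for speed

X = np.tanh(rng.normal(size=(T, N)))         # stand-in for activations x_i(n)
w_true = rng.normal(size=N)
d = X @ w_true + 0.01 * rng.normal(size=T)   # stand-in teacher signal d(n)

# Discard the first `washout` steps, then solve min_w ||X w - d||^2.
w, *_ = np.linalg.lstsq(X[washout:], d[washout:], rcond=None)
mse_train = np.mean((d[washout:] - X[washout:] @ w) ** 2)
```

Because the states are fixed once recorded, the minimizer is found in a single least-squares solve, with no iterative gradient descent through the recurrent connections.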

## Validation

- The teacher signal was disconnected after 3000 steps and the network was left running freely
- Output created through: $y(n) = \sum_{i=1}^{1000} w_{i}x_{i}(n)$
- Compared the generated signal with the original signal 84 steps after the end of training.
    - Averaged results over 100 independent trials, and calculated the normalized root mean square error:

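$$ NRMSE = \Big(\frac{\sum_{j=1}^{100}\big(d_{j}(3084) - y_{j}(3084)\big)^{2}}{100\,\sigma^{2}}\Big)^{1/2} \approx 10^{-4.2} $$

where $d_{j}$ and $y_{j}$ are the original and generated signals in trial $j$, and $\sigma^{2}$ is the variance of the original signal.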

- Improvement in performance by a factor of **700** when compared to previous techniques!
- Deviations noted after about 1300 steps
    - Refinements to model showed improvement factors up to **2400**
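
The error measure itself is short to compute. In this sketch the function name `nrmse` and the synthetic trial data are assumptions; the real evaluation uses the original and generated MGS values at step 3084 from each of the 100 trials.

```python
import numpy as np

def nrmse(d, y, signal_variance):
    """Square root of the mean squared error over trials, divided by the signal variance."""
    d, y = np.asarray(d, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.mean((d - y) ** 2) / signal_variance)

# Hypothetical data: 100 trial targets and predictions that differ by small noise.
rng = np.random.default_rng(2)
d = rng.normal(size=100)
y = d + 1e-3 * rng.normal(size=100)
print(nrmse(d, y, signal_variance=1.0))   # on the order of 1e-3 for this noise level
```

Dividing by the signal variance makes the number scale-free: a value near 1 means the prediction is no better than guessing the mean, and a value near 0 means a close match.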

## MGS Results
<br>
<div style="width:fit-content;display:block;margin-right:auto;margin-left:auto;">
<img src="images/F2.large.jpg" width="50%">
</div>

## Why this jump in accuracy when compared to previous methods?

- ESNs capitalize on a massive short-term memory.
- The authors showed analytically that, under certain conditions, an ESN of size $N$ can "remember" a number of previous inputs that is of the same order of magnitude as $N$
    - Further reading [here](https://opus.jacobs-university.de/frontdoor/index/index/docId/638)
    - Significantly more memory than any other technique up to that point
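
One way to make "remembering" precise is a memory capacity measure in the spirit of the linked report. The notation here is an assumption of this sketch: $u(n)$ is a random input signal and $y_{k}(n)$ is a readout trained to reproduce the input delayed by $k$ steps.

$$ MC = \sum_{k=1}^{\infty} MC_{k}, \qquad MC_{k} = \frac{\operatorname{cov}^{2}\big(u(n-k),\, y_{k}(n)\big)}{\sigma^{2}\big(u(n)\big)\,\sigma^{2}\big(y_{k}(n)\big)} $$

Each $MC_{k}$ lies between 0 and 1 (it is a squared correlation), so $MC$ roughly counts how many past inputs the readouts can recover. A capacity of the same order as $N$ is what the claim above refers to.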

## A practical application of ESNs

- Wireless communication systems run at lower power levels, since nonlinear distortion arises in the high-gain power regions.
    - Imposed inefficiency due to nonlinear distortions!
    - Nonlinear modeling can help run wireless communications more efficiently! (Cellphones/Satellites use less power)
- Used message sequences of 5000 symbols for training of ESN and compared against conventional methods of 

## Results for Nonlinear Channel Equalization
<br>
<div style="width:fit-content;display:block;margin-right:auto;margin-left:auto;">
<img src="images/F3.large.jpg" width="50%">
</div>