In [None]:
%load_ext autoreload
%autoreload 2

# Setup

In [None]:
%matplotlib inline

# Imports.
import matplotlib.pyplot as plt
import numpy as np
import networkx as nx
import pandas as pd
import pickle
from mpl_toolkits.mplot3d import Axes3D

# Important for multiprocessing.
import torch
torch.set_num_threads(1)

# General plotting things.
from plot import get_3d_subplot_axs
from plot import get_figsize, set_figsize
from plot import plot_df_trisurf

default_w, default_h = get_figsize()

# Experiment imports.
from collections import OrderedDict
from gridsearch import experiment, load_experiment

# Dataset.
import dataset as ds

u_train, y_train = ds.NARMA(sample_len = 2000)
u_test, y_test = ds.NARMA(sample_len = 3000)
dataset = [u_train, y_train, u_test, y_test]
ds.dataset = dataset

# Distance functions.
from matrix import euclidean

euc = euclidean
def inv(x, y): return 1/euclidean(x, y)
def inv_squared(x, y): return 1/euclidean(x, y)**2
def inv_cubed(x, y): return 1/euclidean(x, y)**3

# Often useful for debugging.
from ESN import ESN, Distribution
from metric import evaluate_esn

# Experiments: Random Geometric Graphs

## Echo State Networks with nodes in metric space

Undirected geometric graphs whose nodes are sampled uniformly at random from  
the underlying space [0, 1)^d.  
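
A minimal sketch of how such a reservoir matrix could be built, assuming the weight between two nodes is an inverse power of their Euclidean distance and that the matrix is afterwards rescaled to a target spectral radius. The names `random_geometric_weights` and `scale_spectral_radius` are hypothetical and are not taken from the `matrix` or `ESN` modules.

```python
import numpy as np

def random_geometric_weights(n_nodes, dim=2, power=2, seed=0):
    """Sample nodes uniformly in [0, 1)^dim and weight each pair by 1/d^power."""
    rng = np.random.default_rng(seed)
    positions = rng.random((n_nodes, dim))            # uniform in [0, 1)
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)              # pairwise Euclidean distances
    weights = np.zeros_like(dist)
    off_diag = ~np.eye(n_nodes, dtype=bool)           # no self-connections
    weights[off_diag] = 1.0 / dist[off_diag] ** power
    return positions, weights

def scale_spectral_radius(weights, target=0.9):
    """Rescale the matrix so that its largest absolute eigenvalue equals target."""
    radius = np.max(np.abs(np.linalg.eigvals(weights)))
    return weights * (target / radius)

# Assumed example values: 50 nodes in 3 dimensions with the inv^2 distance function.
positions, w_res = random_geometric_weights(n_nodes=50, dim=3, power=2)
w_res = scale_spectral_radius(w_res, target=0.9)
```

Because nearby nodes get very large weights under an inverse power law, the unscaled spectral radius can be large, which is why the rescaling step is included here.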

In [None]:
%%script false --no-raise-error

from notebook import rgg_dist_performance
rgg_dist_performance()

In [None]:
from notebook import plot_rgg_dist_performance
plot_rgg_dist_performance(agg='mean')
plot_rgg_dist_performance(agg='min')

(TODO): add the default echo state network.

The inv^2 distance function seems to work best. Going from a squared to a cubed  
distance function yields diminishing returns.  

(TODO): describe the distribution of weights, and how the original spectral  
radius of the matrix requires very large rescaling.  

Perhaps also memory capacity and/or QGU.  

# Experiments: Regular Tilings

## Performance of standard regular tilings

Regular tilings/Bravais lattices. The lattices used most are the square,  
hexagonal and triangular lattices/tilings.  

In [None]:
from notebook import plot_regular_tilings
plot_regular_tilings()

Default performance of these lattices, with input weights drawn uniformly from  
the interval [-0.5, 0.5], i.e. mostly the same setup as for echo state networks.  

Note that these networks are all *undirected*, and have *no negative weights*.  

In [None]:
%%script false --no-raise-error

from notebook import regular_tilings_performance
regular_tilings_performance()

In [None]:
from notebook import plot_regular_tilings_performance
plot_regular_tilings_performance()

The performances shown are surprisingly good, if we consider the cutoff for  
being "unable to predict the time series" at an NRMSE of 1.0.  
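
To make the 1.0 cutoff concrete, here is a small sketch assuming the common definition of NRMSE, where the mean squared error is normalized by the variance of the target. The `metric` module may normalize differently, so the function below is an illustration only.

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Root of the mean squared error divided by the variance of the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    return np.sqrt(mse / np.var(y_true))

# A predictor that always outputs the target mean scores exactly 1.0.
rng = np.random.default_rng(0)
y = rng.random(1000)
baseline = np.full_like(y, y.mean())
baseline_error = nrmse(y, baseline)
```

Under this definition, anything below 1.0 does better than always predicting the mean of the series.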

An example of the predicted output of a square lattice vs. the expected output  
is shown below for the NARMA 10 dataset.  

In [None]:
esn = ESN(hidden_nodes=81, w_res_type='tetragonal')
evaluate_esn(ds.dataset, esn, plot=True, plot_range=[0, 100])

## Regular tilings with inhibitory connections

Hardly interesting.  

## Regular tilings with directed connections

What happens if we make a fraction of the edges of the lattice directed?  
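
A rough sketch of what "directing a fraction of the edges" could mean, assuming each selected edge keeps exactly one randomly chosen direction while the rest stay bidirectional. The helpers `square_lattice_edges` and `direct_fraction` are hypothetical; how `dir_frac` is handled inside `ESN` may differ.

```python
import random

def square_lattice_edges(side):
    """Undirected nearest-neighbour edges of a side x side square lattice."""
    edges = []
    for r in range(side):
        for c in range(side):
            if c + 1 < side:
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < side:
                edges.append(((r, c), (r + 1, c)))
    return edges

def direct_fraction(edges, dir_frac=0.5, seed=0):
    """Return a set of arcs where a fraction dir_frac of the edges is one-way."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_directed = round(dir_frac * len(shuffled))
    arcs = set()
    for i, (u, v) in enumerate(shuffled):
        if i < n_directed:
            # Keep a single, randomly chosen direction.
            arcs.add((u, v) if rng.random() < 0.5 else (v, u))
        else:
            # Keep the edge bidirectional.
            arcs.update({(u, v), (v, u)})
    return arcs

edges = square_lattice_edges(5)
arcs = direct_fraction(edges, dir_frac=0.5)
one_way = [(u, v) for (u, v) in arcs if (v, u) not in arcs]
```

The resulting set of arcs can be loaded into a directed graph for plotting or turned into a weight matrix.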

In [None]:
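# Assumption: plot_lattice lives in the local plot module, like the other plotting helpers.
from plot import plot_lattice

esn = ESN(hidden_nodes=25, w_res_type='tetragonal', dir_frac=0.5)
plot_lattice(esn.G.reverse(), color_directed=True)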

Performance-wise, we should, according to previous work, expect at least some  
improvement.  

In [None]:
from notebook import plot_directed_regular_tilings_performance
plot_directed_regular_tilings_performance()

It would seem that the performance of the lattices matches that of the standard  
echo state network for the NARMA 10 task. What about memory capacity and QGU?  

## Physical perspective: global input scheme

The input scheme of the previous experiment was still achieved by scaling the  
input of each hidden node with a value in the interval [-0.5, 0.5], as in all  
previous experiments. Is there some simpler scheme that works?  

In [None]:
from notebook import global_input_scheme_performance
global_input_scheme_performance()

Every input weight of every node is the same, but has been scaled so as to fit  
the memory requirements of the task better.  

Additionally, every reservoir weight (i.e. the spacing between nodes in the  
lattice) is uniquely determined. The spacing has been scaled to a reservoir  
weight that fits the task, but since the spacing is identical for all edges,  
this amounts to multiplication by a single scalar.  
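
A minimal sketch of the resulting state update, assuming a tanh reservoir of the form x(t+1) = tanh(w_res * A x(t) + w_in * u(t)), where `w_in` and `w_res` are single scalars and `A` is a 0/1 adjacency matrix. The function `run_reservoir`, the scalar values and the tiny directed path used as a stand-in for a lattice are all illustrative assumptions, not the `ESN` class itself.

```python
import numpy as np

def run_reservoir(adjacency, inputs, w_in=0.1, w_res=0.5):
    """Drive a reservoir where every node receives the same scaled input."""
    n_nodes = adjacency.shape[0]
    x = np.zeros(n_nodes)
    states = np.empty((len(inputs), n_nodes))
    for t, u in enumerate(inputs):
        # adjacency[i, j] = 1 means node j feeds node i.
        x = np.tanh(w_res * adjacency @ x + w_in * u)
        states[t] = x
    return states

# Stand-in topology: a directed path 0 -> 1 -> 2 -> 3.
adjacency = np.eye(4, k=-1)
inputs = np.random.default_rng(0).uniform(0.0, 0.5, size=20)
states = run_reservoir(adjacency, inputs)
```

With this parameterization the only thing that differs between two reservoirs of the same lattice type and size is the pattern of ones in `adjacency`.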

Thus, what remains in terms of parameterization for the reservoir? One single  
choice: which direction each edge should point. With a completely random  
scheme, the mean reservoir performs as well as the echo state network.  

Note that this approach is quite similar to that of the minimal complexity  
echo state network, i.e. cycle reservoirs with regular jumps (CRJs). In CRJs,  
all reservoir weights are fixed to the same, predetermined value. However, the  
CRJ input scheme fixes the magnitude of all input weights to 1 but assigns  
their signs at random, and thus does not employ a single global input.  
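
For comparison, a simplified sketch of a CRJ-style reservoir as I understand the idea: a unidirectional cycle with one shared weight, plus bidirectional jumps of a fixed length with a second shared weight. The function name, the weight values and the exact placement of the jumps are assumptions for illustration, and the random input signs follow the description above rather than any reference implementation.

```python
import numpy as np

def crj_reservoir(n_nodes, cycle_weight=0.7, jump_weight=0.4, jump_size=3):
    """Build a cycle with regular bidirectional jumps (assumes jump_size > 1)."""
    W = np.zeros((n_nodes, n_nodes))
    # Unidirectional cycle: node i feeds node i + 1 (wrapping around).
    for i in range(n_nodes):
        W[(i + 1) % n_nodes, i] = cycle_weight
    # Bidirectional jumps between nodes that are jump_size apart.
    last_start = n_nodes - n_nodes % jump_size
    for i in range(0, last_start, jump_size):
        j = (i + jump_size) % n_nodes
        W[i, j] = jump_weight
        W[j, i] = jump_weight
    return W

W = crj_reservoir(n_nodes=12)

# Input weights with a fixed magnitude of 1 and random signs.
rng = np.random.default_rng(0)
w_in = rng.choice([-1.0, 1.0], size=12)
```

The lattice with a global input differs from this in that `w_in` would be one scalar shared by all nodes, with no sign pattern at all.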

What about the activations of a lattice compared to that of standard echo state  
networks?  

In [None]:
from notebook import plot_global_input_activations
plot_global_input_activations()

We can clearly see the input sequence in the activations of the nodes of the  
lattice network, as all nodes see exactly the same input.  

## Reservoir robustness: removing nodes gradually

Usually one thinks of robustness in terms of two different scenarios: dead nodes,  
and noisy nodes. Here we look at the impact of dead nodes, i.e. what happens  
when single nodes disappear completely from the network.  

First, we remove nodes so that the network keeps performing well specifically  
on the NARMA 10 benchmark. The chosen nodes may (and should) differ if we  
optimize for e.g. memory capacity or kernel quality.  

So, which nodes cause the biggest performance hit when removed?  

In [None]:
from notebook import node_removal_impact
node_removal_impact()

We see that the majority of the nodes cause a very small difference in  
NRMSE. **I also suspect that the node with a big value is due to a bug I had  
with the Torch implementation of SVD (pinverse), but I have yet to re-run the  
experiment with the sklearn implementation of the same algorithm, which causes  
no such instabilities.**  

Since the network proves quite resilient to removal of nodes, let's remove them  
all. One by one, and greedily.  
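
The greedy procedure can be sketched as follows, assuming a `score` function that returns the error (e.g. NRMSE) of the network restricted to a given set of nodes. The function `greedy_node_removal` and the toy score are hypothetical stand-ins; in the real experiment each score would require retraining and evaluating the readout.

```python
def greedy_node_removal(nodes, score, n_remove):
    """Repeatedly drop the node whose removal yields the lowest score."""
    remaining = list(nodes)
    history = [(None, score(remaining))]      # (removed node, score afterwards)
    for _ in range(n_remove):
        candidates = [
            (score([m for m in remaining if m != n]), n) for n in remaining
        ]
        best_score, best_node = min(candidates)
        remaining.remove(best_node)
        history.append((best_node, best_score))
    return remaining, history

# Toy score: each node contributes a fixed amount to lowering the error.
importance = {"a": 0.30, "b": 0.01, "c": 0.10, "d": 0.02}

def toy_score(kept):
    return 1.0 - sum(importance[n] for n in kept)

remaining, history = greedy_node_removal(importance, toy_score, n_remove=2)
removed_order = [node for node, _ in history[1:]]
```

Each step evaluates every remaining node once, so removing all N nodes costs on the order of N^2 evaluations.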

In [None]:
from notebook import remove_nodes_performance
remove_nodes_performance()

This is, of course, an unfair comparison, as we are looking at a specific  
instance of a lattice from which we incrementally remove single nodes so as to  
cause as little damage as possible.  

However, we still see that the lattice is robust: one may remove quite a lot of  
nodes before the performance collapses completely. Additionally, removing the  
worst nodes initially will actually improve the network for the specific task.  

To make a fairer comparison, I also did the same with the standard echo state  
network, to determine whether the method is only applicable to the lattice  
(it is not; it turns out to be a reasonable approach for the ESN as well).  

In [None]:
from notebook import remove_esn_nodes_performance
remove_esn_nodes_performance()

Now, how do these lattices look?  

In [None]:
from notebook import plot_node_removal
plot_node_removal()

How does the ESN look?  

In [None]:
from notebook import plot_esn_node_removal
plot_esn_node_removal()

Key takeaways:

- It is quite difficult to see what has happened with the ESN, even if the  
  density of the reservoir matrix is originally set to only 10%.  
- OTOH, it seems to be quite possible to look at the progress of node removals  
  in the lattice, and be able to get a clearer picture of what is happening.  
- A "main" pathway seems to develop in the reservoir, likely for memory  
  purposes, that is "augmented" by outside connections.  
- Quite interestingly, when the NRMSE drops to 0.4, which is the theoretical  
  value achieved by a shift register, the network has seemingly turned into  
  just that.  

This also works similarly with periodic lattices, but they looked to be  
*slightly* better, perhaps simply because there are more randomly directed  
edges that may turn out valuable.  

## Growing reservoirs: adding nodes incrementally

Interestingly, there are only a handful of ways to add new nodes to the  
network: only along the frontier of the lattice, and for each such node, there  
is only a fixed number of ways to direct its edges.  
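
To illustrate how small the search space is, here is a sketch for a square lattice, assuming nodes sit on integer grid sites and every new edge is strictly one-way. `lattice_frontier` and `count_orientations` are hypothetical helpers written for this note.

```python
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def lattice_frontier(occupied):
    """Empty grid sites that are adjacent to at least one occupied site."""
    occupied = set(occupied)
    frontier = set()
    for (r, c) in occupied:
        for dr, dc in STEPS:
            site = (r + dr, c + dc)
            if site not in occupied:
                frontier.add(site)
    return frontier

def count_orientations(site, occupied):
    """Number of ways to direct the edges of a new node at the given site."""
    r, c = site
    n_neighbours = sum((r + dr, c + dc) in occupied for dr, dc in STEPS)
    return 2 ** n_neighbours    # each edge points either in or out

occupied = {(0, 0), (0, 1), (1, 0)}
frontier = lattice_frontier(occupied)
options_at_corner = count_orientations((1, 1), occupied)
```

A growth step then only has to evaluate each frontier site with each of its few edge orientations.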

We first create the default 144-node lattice, remove nodes until it reaches its  
minimal value, and then grow it to a set size.  

In [None]:
from notebook import plot_growth
plot_growth()

Increasing the size eventually seems to run into diminishing returns; judging  
from other experiments, the improvement gradually stops at around 250 nodes.  

It could be interesting to hunt for heuristics here: *could we add nodes in a  
less greedy manner, yet deterministically by some heuristic?* I have not looked  
too far into this, as I have not found it easy to look at the graph growth and  
see what, concretely, is happening.  

A viable approach could be to start with a shift register and/or appropriate  
loops, and from there grow a suitable lattice.  

## Gradually making edges undirected

In [None]:
from notebook import plot_making_edges_undirected_performance
plot_making_edges_undirected_performance()

We see something similar to the previous experiments. Some edges actually  
improve performance when made bidirectional. Generally, though, it seems that  
the edges made bidirectional first are those that perhaps "did not matter much"  
originally, while the more important directed edges remain intact. Once these  
important edges must go, so does the performance.  

In [None]:
from notebook import plot_making_edges_undirected
plot_making_edges_undirected()

## Example of great performance

An example of the performance achieved with the best network obtained by  
growing until some set number of nodes.  

In [None]:
from notebook import plot_good_performance
plot_good_performance()

## Short-term memory, kernel quality, Mackey-Glass and other metrics

Perhaps less relevant. A bunch of general metrics and other tests, mostly  
contained within the NARMA 10 benchmark.  