In [None]:
%load_ext autoreload
%autoreload 2

# Setup

In [None]:
%matplotlib inline

# Imports.
import matplotlib.pyplot as plt
import numpy as np
import networkx as nx
import pandas as pd
import pickle
from mpl_toolkits.mplot3d import Axes3D

# Limit PyTorch to a single thread per process; important when the
# experiments are run with multiprocessing.
import torch
torch.set_num_threads(1)

# General plotting things.
from plot import get_3d_subplot_axs
from plot import get_figsize, set_figsize
from plot import plot_df_trisurf

default_w, default_h = get_figsize()

# Experiment imports.
from collections import OrderedDict
from gridsearch import experiment, load_experiment

# Dataset.
import dataset as ds

u_train, y_train = ds.NARMA(sample_len=2000)
u_test, y_test = ds.NARMA(sample_len=3000)
dataset = [u_train, y_train, u_test, y_test]
ds.dataset = dataset

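# Distance functions: map the Euclidean distance d between two node
# positions to a connection weight.
from matrix import euclidean

euc = euclidean                                        # d
def inv(x, y): return 1/euclidean(x, y)                # 1/d
def inv_squared(x, y): return 1/euclidean(x, y)**2     # 1/d^2
def inv_cubed(x, y): return 1/euclidean(x, y)**3       # 1/d^3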

# Frequently used for debugging.
from ESN import ESN, Distribution
from metric import evaluate_esn

# Experiments: Random Geometric Graphs

## Echo State Networks with nodes in metric space

Undirected random geometric graphs, with nodes sampled uniformly at random from
the underlying space [0, 1)^d.
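The reservoir construction described above can be sketched in NumPy as follows. `rgg_reservoir` is a hypothetical helper, not the notebook's `ESN` class: it assumes that the weight between two nodes is an inverse power of their Euclidean distance (mirroring `inv`, `inv_squared` and `inv_cubed`), and the optional `radius` cutoff is an assumption, since this section does not say whether edges are thresholded by distance.

```python
import numpy as np


def rgg_reservoir(n_nodes, dim=2, power=2, radius=None, seed=None):
    """Hypothetical sketch of an undirected random geometric graph reservoir.

    Nodes are placed uniformly at random in [0, 1)^dim. Two distinct nodes at
    Euclidean distance d are connected with weight 1 / d**power; if `radius`
    is given, only pairs with d < radius are connected.
    """
    rng = np.random.default_rng(seed)
    positions = rng.random((n_nodes, dim))  # uniform samples in [0, 1)^dim
    # Pairwise distances via broadcasting, shape (n_nodes, n_nodes).
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    mask = ~np.eye(n_nodes, dtype=bool)  # no self-loops
    if radius is not None:
        mask &= dist < radius
    W = np.zeros((n_nodes, n_nodes))
    W[mask] = 1.0 / dist[mask] ** power
    return positions, W


positions, W = rgg_reservoir(50, dim=2, power=2, seed=0)
```

Because the distance matrix is symmetric, the resulting weight matrix is symmetric as well, matching the undirected graphs described above.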

In [None]:
%%script false --no-raise-error

from notebook import rgg_dist_performance
rgg_dist_performance()

In [None]:
from notebook import plot_rgg_dist_performance
plot_rgg_dist_performance(agg='mean')
plot_rgg_dist_performance(agg='min')

(TODO): add the default echo state network.

The inv^2 distance function seems to work best. There are diminishing returns
when going from a squared to a cubed distance function.

(TODO): add a discussion of the weight distribution, and of how the original
spectral radius of the matrix requires huge scalings.
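As a starting point for that discussion, here is a minimal sketch of how the spectral radius of an inverse-squared-distance matrix can be measured and rescaled. The helper names and the target radius of 0.9 are assumptions for illustration. Because every pair of points in [0, 1)^2 is at most sqrt(2) apart, each off-diagonal weight is at least 1/2, so the smallest row sum (a lower bound on the spectral radius of this non-negative matrix) grows at least linearly with the number of nodes, and the resulting scaling factor is far below 1.

```python
import numpy as np


def spectral_radius(W):
    """Largest absolute eigenvalue of a square matrix."""
    return np.max(np.abs(np.linalg.eigvals(W)))


def scale_to_spectral_radius(W, target=0.9):
    """Return W rescaled to spectral radius `target`, and the factor used."""
    factor = target / spectral_radius(W)
    return W * factor, factor


# Inverse-squared-distance weights between 100 random points in [0, 1)^2.
rng = np.random.default_rng(0)
pts = rng.random((100, 2))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
np.fill_diagonal(dist, np.inf)  # 1 / inf**2 == 0, so there are no self-loops
W = 1.0 / dist**2

W_scaled, factor = scale_to_spectral_radius(W, target=0.9)
print(f"original spectral radius: {spectral_radius(W):.1f}")
print(f"scaling factor: {factor:.2e}")
```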

Perhaps also memory capacity and/or QGU.  


# Experiments: Regular Tilings

## Performance of standard regular tilings

Regular tilings, or Bravais lattices. The lattices used most are the square,
hexagonal and triangular lattices/tilings.
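The three tilings can be generated with the lattice generators that ship with networkx. This is a sketch for illustration, not necessarily how the notebook's `ESN` class builds its `w_res_type` lattices; the lattice sizes are arbitrary choices.

```python
import networkx as nx
import numpy as np

tilings = {
    "square": nx.grid_2d_graph(9, 9),                 # 81 nodes, degree 4 in the bulk
    "hexagonal": nx.hexagonal_lattice_graph(4, 4),    # honeycomb, degree 3 in the bulk
    "triangular": nx.triangular_lattice_graph(4, 8),  # degree 6 in the bulk
}

for name, G in tilings.items():
    A = nx.to_numpy_array(G)  # symmetric 0/1 adjacency matrix (undirected graph)
    print(f"{name:>10}: {G.number_of_nodes():3d} nodes, "
          f"max degree {int(A.sum(axis=1).max())}")
```

The coordination number (4, 3 or 6 neighbours) is the main structural difference between the three tilings.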

In [None]:
from notebook import plot_regular_tilings
plot_regular_tilings()

Default performance of these lattices with a standard uniform input weight
distribution on the interval [-0.5, 0.5], i.e. largely the same setup as for
standard echo state networks.

Note that these networks are all *undirected*, and have *no negative weights*.  
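A minimal sketch of such a lattice reservoir, assuming the standard (non-leaky) echo state network update x(t) = tanh(W x(t-1) + w_in u(t)) with no bias. The spectral radius of 0.9 and the helper `run_reservoir` are assumptions; the notebook's `ESN` class may differ in scaling, leaking or bias terms. The symmetric, non-negative `W` reflects that the lattice is undirected and has no inhibitory weights.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(42)

# Undirected 9 x 9 square lattice: symmetric adjacency, no negative weights.
A = nx.to_numpy_array(nx.grid_2d_graph(9, 9))
W = 0.9 * A / np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius 0.9 (assumed)
w_in = rng.uniform(-0.5, 0.5, size=A.shape[0])  # one input weight per node


def run_reservoir(u, W, w_in):
    """Collect states x(t) = tanh(W x(t-1) + w_in u(t)) for a scalar input u."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(u), W.shape[0]))
    for t, u_t in enumerate(u):
        x = np.tanh(W @ x + w_in * u_t)
        states[t] = x
    return states


X = run_reservoir(rng.uniform(0, 0.5, size=200), W, w_in)
```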

In [None]:
%%script false --no-raise-error

from notebook import regular_tilings_performance
regular_tilings_performance()

In [None]:
from notebook import plot_regular_tilings_performance
plot_regular_tilings_performance()

The performance shown is surprisingly good, considering that an NRMSE of 1.0 is
the cutoff for being "unable to predict the time series".
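For reference, one common definition of the NRMSE divides the root-mean-square error by the standard deviation of the target. Under this definition, always predicting the mean of the target gives an NRMSE of exactly 1, which is why 1.0 is a natural cutoff. Whether `evaluate_esn` uses this normalisation is an assumption; some definitions normalise by the range of the target instead.

```python
import numpy as np


def nrmse(y_true, y_pred):
    """Root-mean-square error divided by the standard deviation of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2) / np.var(y_true))


y = np.sin(np.linspace(0, 10, 500))
print(nrmse(y, y))                          # perfect prediction
print(nrmse(y, np.full_like(y, y.mean())))  # always predicting the mean
```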

An example of the predicted output of a square lattice versus the expected
output on the NARMA 10 dataset is shown below.

In [None]:
# 81 = 9 x 9 nodes on a square (tetragonal) lattice.
esn = ESN(hidden_nodes=81, w_res_type='tetragonal')
evaluate_esn(ds.dataset, esn, plot=True, plot_range=[0, 100])
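`ds.NARMA` is not shown in this notebook. For readers unfamiliar with the task, the following is a sketch of the standard NARMA 10 system from the literature; the function name `narma10`, the zero initialisation and the absence of a warm-up period are assumptions, and `ds.NARMA` may differ in these details. NARMA series can occasionally diverge for unlucky input sequences, so implementations often check for this and regenerate.

```python
import numpy as np


def narma10(sample_len, seed=None):
    """Generate a NARMA 10 input/target pair (hypothetical helper).

    y(t+1) = 0.3 y(t) + 0.05 y(t) * sum_{i=0}^{9} y(t-i) + 1.5 u(t-9) u(t) + 0.1,
    with inputs u(t) drawn uniformly from [0, 0.5).
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(0, 0.5, size=sample_len)
    y = np.zeros(sample_len)
    for t in range(9, sample_len - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * np.sum(y[t - 9:t + 1])  # y(t-9) .. y(t)
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return u, y


u, y = narma10(2000, seed=0)
```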

## Regular tilings with inhibitory connections

Hardly interesting.

## Regular tilings with directed connections

What happens if we make a fraction of the edges of the lattice directed?

In [None]:
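from plot import plot_lattice  # assumed to live in plot.py with the other plotting helpers

# Make half of the lattice edges directed.
esn = ESN(hidden_nodes=25, w_res_type='tetragonal', dir_frac=0.5)
plot_lattice(esn.G.reverse(), color_directed=True)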
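One plausible interpretation of `dir_frac` is sketched below: a random fraction of the undirected edges is made one-way, with the surviving direction picked by a coin flip. `make_directed` is hypothetical; how the `ESN` class actually implements `dir_frac` is not shown here.

```python
import networkx as nx
import numpy as np


def make_directed(G, dir_frac, seed=None):
    """Hypothetical: make a random fraction `dir_frac` of G's edges one-way."""
    rng = np.random.default_rng(seed)
    D = G.to_directed()  # every undirected edge becomes two opposite arcs
    edges = list(G.edges())
    n_dir = int(round(dir_frac * len(edges)))
    for idx in rng.choice(len(edges), size=n_dir, replace=False):
        a, b = edges[idx]
        if rng.random() < 0.5:  # pick which direction survives
            a, b = b, a
        D.remove_edge(a, b)  # only the arc b -> a remains
    return D


G = nx.grid_2d_graph(5, 5)  # 25-node square lattice with 40 edges
D = make_directed(G, dir_frac=0.5, seed=1)
```

With `dir_frac=0.5`, 20 of the 40 edges keep only one direction, leaving 60 arcs in total.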

Based on previous work, we should expect at least some improvement in
performance.

In [None]:
from notebook import plot_directed_regular_tilings_performance
plot_directed_regular_tilings_performance()

It would seem that the performance of the directed lattices matches that of the
standard echo state network on the NARMA 10 task. What about memory capacity
and QGU?
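The memory capacity could be estimated along the lines of the standard short-term memory measure: train one linear readout per delay k to reconstruct u(t-k), and sum the squared correlations between prediction and target over all delays. The sketch below is self-contained and does not use the notebook's `ESN` class; the input distribution U[-1, 1], the ridge parameter, the number of delays and the random reservoir in the example are all assumptions.

```python
import numpy as np


def memory_capacity(W, w_in, max_delay=20, n_washout=100, n_train=1000,
                    n_test=1000, ridge=1e-6, seed=None):
    """Estimate MC = sum over k of r^2(u(t-k), y_k(t)) for delays k = 1..max_delay."""
    if n_washout < max_delay:
        raise ValueError("n_washout must be at least max_delay")
    rng = np.random.default_rng(seed)
    n_total = n_washout + n_train + n_test
    u = rng.uniform(-1, 1, size=n_total)

    # Drive the reservoir with x(t) = tanh(W x(t-1) + w_in u(t)).
    x = np.zeros(W.shape[0])
    X = np.empty((n_total, W.shape[0]))
    for t in range(n_total):
        x = np.tanh(W @ x + w_in * u[t])
        X[t] = x

    train = slice(n_washout, n_washout + n_train)
    test = slice(n_washout + n_train, n_total)
    X_tr = X[train]
    gram = X_tr.T @ X_tr + ridge * np.eye(W.shape[0])
    mc = 0.0
    for k in range(1, max_delay + 1):
        # target[t] = u[t - k]; the wrap-around of np.roll only touches the washout.
        target = np.roll(u, k)
        w_out = np.linalg.solve(gram, X_tr.T @ target[train])  # ridge readout
        r = np.corrcoef(X[test] @ w_out, target[test])[0, 1]
        mc += r ** 2
    return mc


# Example with a random 50-node reservoir scaled to spectral radius 0.9.
rng = np.random.default_rng(0)
W = rng.uniform(-0.5, 0.5, size=(50, 50))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
w_in = rng.uniform(-0.5, 0.5, size=50)
mc = memory_capacity(W, w_in, max_delay=20, seed=1)
print(f"MC = {mc:.2f}")
```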

## Physical perspective: global input scheme

The input scheme of the previous experiment still scaled the input to each
hidden node by a value in the interval [-0.5, 0.5], as in all earlier
experiments. Is there a simpler scheme that works?

In [None]:
from notebook import global_input_scheme_performance
global_input_scheme_performance()

Every node receives the same input weight, scaled so as to better fit the memory
requirements of the task.

Additionally, every reservoir weight is uniquely determined by the spacing
between nodes in the lattice. This weight has been scaled to a value that fits
the task, but since the spacing is the same for every edge, this amounts to a
single scalar scaling.

What, then, remains to parameterize in the reservoir? A single choice: the
direction in which each edge points. With completely random edge directions,
the mean performance of the reservoir is on par with that of the echo state
network.
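The global scheme can be sketched as follows: a single input weight shared by all nodes and a single weight on every lattice edge, so that apart from the lattice itself (and the orientation of its edges) the reservoir is fixed by two scalars. `global_scheme_reservoir`, `run_reservoir` and the chosen weights are hypothetical. The check at the end illustrates one consequence on an undirected lattice: nodes related by a symmetry of the lattice, here the four corners, receive identical drive and end up in identical states.

```python
import networkx as nx
import numpy as np


def global_scheme_reservoir(G, input_weight, reservoir_weight):
    """Hypothetical: one shared weight for every edge and every input."""
    A = nx.to_numpy_array(G)  # A[i, j] = 1 for an edge i -> j
    W = reservoir_weight * A.T  # (W @ x)[i] sums over the in-neighbours of node i
    w_in = np.full(A.shape[0], input_weight)  # global input weight
    return W, w_in


def run_reservoir(u, W, w_in):
    """Collect states x(t) = tanh(W x(t-1) + w_in u(t)) for a scalar input u."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(u), W.shape[0]))
    for t, u_t in enumerate(u):
        x = np.tanh(W @ x + w_in * u_t)
        states[t] = x
    return states


G = nx.grid_2d_graph(5, 5)  # undirected square lattice
W, w_in = global_scheme_reservoir(G, input_weight=0.3, reservoir_weight=0.2)
X = run_reservoir(np.random.default_rng(0).uniform(0, 0.5, size=100), W, w_in)

# With a global input, the four symmetric corner nodes have identical states.
nodes = list(G.nodes())
corners = [nodes.index(c) for c in [(0, 0), (0, 4), (4, 0), (4, 4)]]
print(np.allclose(X[:, corners], X[:, corners[:1]]))
```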

Note that this approach is quite similar to that of the minimum complexity echo
state network, in particular cycle reservoirs with regular jumps (CRJs). In
CRJs, the reservoir weights are also fixed to predetermined values: one shared
value for all cycle edges and another for all jump edges. However, CRJs fix all
input weights to the same absolute value but distribute the signs of these
inputs according to a random scheme, and thus do not employ a single global
input.
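For comparison, a CRJ reservoir can be sketched as below: a unidirectional cycle with one shared weight `r_c`, bidirectional jumps with another shared weight `r_j`, and input weights of a common magnitude `v` with random signs. This is a simplified, hypothetical construction; in particular, the placement of the last jump (with or without wrap-around) may differ from the original definition.

```python
import numpy as np


def crj_reservoir(n, r_c, r_j, jump, v, seed=None):
    """Simplified, hypothetical cycle reservoir with regular jumps (CRJ).

    W[target, source] holds the weight of the edge source -> target.
    """
    if not 1 < jump < n - 1:
        raise ValueError("jump must satisfy 1 < jump < n - 1")
    rng = np.random.default_rng(seed)
    W = np.zeros((n, n))
    for i in range(n):
        W[(i + 1) % n, i] = r_c  # unidirectional cycle i -> i + 1
    for i in range(0, n - jump, jump):
        W[i + jump, i] = W[i, i + jump] = r_j  # bidirectional jump
    w_in = v * rng.choice([-1.0, 1.0], size=n)  # fixed magnitude, random signs
    return W, w_in


W, w_in = crj_reservoir(20, r_c=0.5, r_j=0.3, jump=4, v=0.1, seed=0)
```

Unlike the global input scheme, the input weights here differ from node to node in sign, even though their magnitude is shared.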

How do the activations of a lattice compare to those of a standard echo state
network?

In [None]:
from notebook import plot_global_input_activations
plot_global_input_activations()

## Reservoir robustness: removing nodes gradually

Usually one thinks of robustness in terms of two different scenarios: dead nodes  
and noisy nodes. Here we look at the impact of dead nodes, i.e. what happens  
when single nodes disappear completely from the network.  
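A sketch of gradual node removal, assuming that a dead node neither receives nor sends any signal, so its row and column in `W` and its entry in `w_in` are simply deleted. `removal_sequence` is hypothetical and removes nodes in one fixed random order, so each reduced reservoir is a subset of the previous one.

```python
import networkx as nx
import numpy as np


def removal_sequence(W, w_in, step, seed=None):
    """Yield (n_dead, W_reduced, w_in_reduced) while nodes die `step` at a time."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    order = rng.permutation(n)  # fixed random order in which nodes die
    for n_dead in range(0, n, step):
        keep = np.sort(order[n_dead:])  # surviving nodes, in original order
        # A dead node has no incoming or outgoing edges and no input weight.
        yield n_dead, W[np.ix_(keep, keep)], w_in[keep]


# Example: an undirected 5 x 5 square lattice with uniform input weights.
rng = np.random.default_rng(0)
W = 0.2 * nx.to_numpy_array(nx.grid_2d_graph(5, 5))
w_in = rng.uniform(-0.5, 0.5, size=W.shape[0])

steps = list(removal_sequence(W, w_in, step=5, seed=1))
for n_dead, W_k, w_in_k in steps:
    print(n_dead, W_k.shape)
```

Zeroing the corresponding rows and columns instead would keep the matrix shapes fixed, which can be convenient if the readout is not retrained after each removal.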