In [1]:
import torch
import numpy as np
from common.ffn.ffn_relu import ParametricReLUNet

***
##### Theoretical values
***
Consider an FFN with input dimension ${n_0}$ in which all hidden layers and the output layer have width ${n}$; the width of layer $l$ is denoted by ${n^l}$. The activation function applied to the layer-$l$ preactivation $z^{(l)}$ is $σ(z^{(l)})$, abbreviated $σ^{(l)}$. Weights are initialized from a centered normal distribution with variance $1/n$, except in the first layer, where the variance is $1/{n_0}$; biases are fixed at zero. Denote by $z^{(l)}$ the preactivations at layer $l$ for the training set with points $α_1,\dots,α_N∈D$. Consider the distribution:

$$P(z^{(l)})_{g,v}=\frac{1}{Z_{g,v}}\exp\left(-\frac{1}{2}\sum_{k=1}^{n^l}\sum_{α_1,α_2∈D}g^{α_1,α_2}_{(l)}z^{(l)}_{k,α_1}z^{(l)}_{k,α_2}\right)\left(1+\frac{1}{8}\sum_{k_1,k_2=1}^{n^l}\sum_{α_1,α_2,α_3,α_4∈D}v^{(α_1,α_2)(α_3,α_4)}_{(l)}z^{(l)}_{k_1,α_1}z^{(l)}_{k_1,α_2}z^{(l)}_{k_2,α_3}z^{(l)}_{k_2,α_4}\right) \qquad (1)$$

Here, $g^{α_1,α_2}_{(l)}$ and $v^{(α_1,α_2)(α_3,α_4)}_{(l)}$ are calculated via $G^{(l)}_{α_1,α_2}$ and $v^{(l)}_{(α_1,α_2)(α_3,α_4)}$:

$$G^{(l+1)}_{α_1,α_2}=\langle σ^{(l)}_{α_1}σ^{(l)}_{α_2}\rangle_{g^{(l)}}+\frac{1}{8}\sum_{β_1,β_2,β_3,β_4∈D}v^{(β_1,β_2)(β_3,β_4)}_{(l)}\left(\langle σ^{(l)}_{α_1}σ^{(l)}_{α_2}z^{(l)}_{β_1,β_2}\left(z^{(l)}_{β_3,β_4}+ng^{(l)}_{β_3,β_4}\right)\rangle_{g^{(l)}}+2\langle σ^{(l)}_{α_1}σ^{(l)}_{α_2}\rangle_{g^{(l)}}g^{(l)}_{β_1,β_3}g^{(l)}_{β_2,β_4}\right) \qquad (2)$$

In this formula, $σ^{(l)}=σ(z^{(l)})$; the angle brackets with subscript $g^{(l)}$ denote a Gaussian expectation with covariance matrix $g^{(l)}$ over all variables with Greek-letter indices that appear inside the brackets; and $z^{(l)}_{β_1,β_2}=z^{(l)}_{β_1}z^{(l)}_{β_2}-g^{(l)}_{β_1,β_2}$.
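The Gaussian expectation in the leading term of (2) can be checked numerically. The block below is a minimal Monte Carlo sketch, not part of the notebook's library: the helper names `prelu` and `gaussian_expectation`, the sample count, and the example covariance matrix are all assumptions made for illustration. For a piecewise-linear activation with slopes $a_+$ and $a_-$, the diagonal entries should approach $(a_+^2+a_-^2)\,g_{αα}/2$, which is the reciprocal of the factor `cw` used in the code cell of this notebook, multiplied by $g_{αα}$.

```python
import numpy as np

def prelu(z, slope_plus=1.0, slope_minus=0.5):
    """Piecewise-linear activation: slope_plus * z for z >= 0, slope_minus * z otherwise."""
    return np.where(z >= 0, slope_plus * z, slope_minus * z)

def gaussian_expectation(g, slope_plus=1.0, slope_minus=0.5, samples=200_000, seed=0):
    """Monte Carlo estimate of the expectation of sigma(z_a1) * sigma(z_a2)
    for z drawn from a centered Gaussian with covariance matrix g."""
    rng = np.random.default_rng(seed)
    g = np.asarray(g, dtype=np.float64)
    # Each row of z is one joint sample of the preactivations at all points.
    z = rng.multivariate_normal(np.zeros(g.shape[0]), g, size=samples)
    s = prelu(z, slope_plus, slope_minus)
    return s.T @ s / samples

# Hypothetical 2x2 covariance matrix for two input points.
g = np.array([[1.0, 0.3],
              [0.3, 2.0]])
estimate = gaussian_expectation(g)
# Closed-form diagonal for comparison: (a_+^2 + a_-^2) / 2 * g_aa.
expected_diag = (1.0**2 + 0.5**2) / 2.0 * np.diag(g)
print(estimate)
print(expected_diag)
```

The off-diagonal entry has no equally simple closed form here, so the sketch only compares the diagonal.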

$$v^{(l)}_{(α_1,α_2)(α_3,α_4)}=...$$
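As a concrete reading of the setup described at the start of this section, the following sketch samples weights with variance $1/n_0$ in the first layer and $1/n$ in the remaining layers, keeps the biases at zero, and returns the preactivations of every layer. It is an illustration only and is not the implementation of `ParametricReLUNet`, which is not shown here; the function names `sample_weights` and `forward_preactivations` and the optional variance multiplier `cw` are assumptions.

```python
import numpy as np

def sample_weights(n0, n, num_layers, cw=1.0, seed=0):
    """Draw one weight matrix per layer from a centered normal distribution.

    The variance is cw / fan_in, i.e. cw / n0 for the first layer and cw / n
    for the others. cw = 1.0 matches the text; other values are an assumption.
    """
    rng = np.random.default_rng(seed)
    fan_ins = [n0] + [n] * (num_layers - 1)
    return [rng.normal(0.0, np.sqrt(cw / fan_in), size=(n, fan_in))
            for fan_in in fan_ins]

def forward_preactivations(weights, x, slope_plus=1.0, slope_minus=0.5):
    """Return the list of preactivations z^(l) for inputs x of shape (n0, N).

    Biases are zero, so each layer is a plain matrix product followed by a
    piecewise-linear activation.
    """
    preactivations = []
    h = np.asarray(x, dtype=np.float64)
    for w in weights:
        z = w @ h
        preactivations.append(z)
        h = np.where(z >= 0, slope_plus * z, slope_minus * z)
    return preactivations

# Example with hypothetical sizes: 3 inputs, width 100, 5 layers, 2 points.
weights = sample_weights(n0=3, n=100, num_layers=5)
x = np.random.default_rng(1).normal(size=(3, 2))
zs = forward_preactivations(weights, x)
print([z.shape for z in zs])
```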

In [2]:
# n0: dimension of x
# nk: number of hidden nodes per layer
# nl: dimension of y
# l:  number of layers
n0, nk, nl, l = 3, 100, 100, 5
# nd: number of points in the training set
nd = 2
# slope_plus, slope_minus: slopes of the parametric ReLU
slope_plus, slope_minus = 1.0, 0.5
# experiments_number: number of experiments (independent re-initializations)
experiments_number = 200

testNet = ParametricReLUNet(n0=n0,nk=nk,nl=nl,l=l)
testNet.set_log_level("info")
testNet.set_slopes(slope_plus, slope_minus)
testNet.set_gmetric_recording_indices([(1,1),(1,2),(2,2)])

xx = np.random.normal(size=(n0, nd)).astype(np.float32)
yy = np.zeros((experiments_number, nl, nd))
# bias and weight variances (cb, cw) are set as in (5.67)
cb, cw = 0, 2.0/(slope_plus**2.0 + slope_minus**2.0)

G01_records = []
G00_records = []
G11_records = []

# for each experiment, re-initialize the weights and recompute the forward pass
for experiment_number in range(experiments_number):
    testNet.init_weights(cb, cw)
    res = testNet.forward(xx)
    yy[experiment_number] = res
    G00_records.append(testNet.get_gmetric(1,1).copy())
    G11_records.append(testNet.get_gmetric(2,2).copy())
    G01_records.append(testNet.get_gmetric(1,2).copy())
    
    print('-', end='')


FeedForwardNet created with n0=3, nk=100, nl=100, l=5, bias_on=False
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
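Once the loop has finished, the recorded outputs can be turned into an empirical estimate of the metric. The sketch below assumes the estimator $\frac{1}{n}\sum_k z_{k,α_1}z_{k,α_2}$ applied to an array shaped like `yy`, that is (experiments, width, points); whether this matches what `get_gmetric` returns is not confirmed by the notebook, and the helper names `empirical_metric` and `summarize` are hypothetical. Synthetic data stands in for `yy` so the block runs on its own.

```python
import numpy as np

def empirical_metric(outputs):
    """Width-averaged products of outputs for every pair of points.

    outputs: array of shape (experiments, width, points).
    Returns an array of shape (experiments, points, points).
    """
    outputs = np.asarray(outputs, dtype=np.float64)
    width = outputs.shape[1]
    return np.einsum('eka,ekb->eab', outputs, outputs) / width

def summarize(outputs):
    """Mean and standard deviation of the empirical metric across experiments."""
    per_run = empirical_metric(outputs)
    return per_run.mean(axis=0), per_run.std(axis=0)

# Synthetic stand-in for yy: 200 experiments, width 100, 2 points.
demo = np.random.default_rng(0).normal(size=(200, 100, 2))
mean_G, std_G = summarize(demo)
print(mean_G)
print(std_G)
```

With the notebook's own data, the call would be `summarize(yy)`; the spread `std_G` gives a rough sense of the finite-width fluctuations across re-initializations.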