# Tests of GD Convergence Depth

This notebook tests the hidden-layer depth threshold for different monomial activation functions. Specifically, SGL and Manelli found that
$$ k \geq 2d $$
is sufficient to ensure that GD converges to zero loss and that the loss has no spurious local minima (here $k$ is the hidden-layer size and $d$ is the dimension of the data). We seek the corresponding relationship between $k$ and $d$ for other monomials.

Here we use a neural network with a single hidden layer and a monomial activation, with all output weights fixed to 1.
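
As a rough illustration of the model described above, the network output can be written as $f(x) = \sum_{j=1}^{k} (w_j^\top x)^p$, where $p$ is the monomial degree. The sketch below is a minimal pure-Python version of that forward pass and of a mean-squared-error loss. It is not the code in `monomial_neural_network`; the names `monomial_forward`, `mse_loss`, and the `degree` parameter (with its default of 2) are assumptions made for this example only.

```python
import random


def monomial_forward(weights, x, degree=2):
    """Return sum_j (w_j . x) ** degree, i.e. all output weights equal to 1.

    `weights` is a list of k hidden-unit weight vectors, each of length d.
    `degree` is a hypothetical parameter naming the monomial power.
    """
    total = 0.0
    for w in weights:
        pre_activation = sum(w_i * x_i for w_i, x_i in zip(w, x))
        total += pre_activation ** degree
    return total


def mse_loss(weights, inputs, targets, degree=2):
    """Mean squared error of the network over a small dataset."""
    errors = [
        (monomial_forward(weights, x, degree) - y) ** 2
        for x, y in zip(inputs, targets)
    ]
    return sum(errors) / len(errors)


# Teacher with k hidden units in d dimensions; it labels n random inputs.
rng = random.Random(0)
d, k, n = 2, 2, 5
teacher = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]
inputs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
targets = [monomial_forward(teacher, x) for x in inputs]

# The teacher reproduces its own labels, so its loss is zero.
print(mse_loss(teacher, inputs, targets))
```

In the teacher-student setup used below, the student starts from different weights and is trained to drive this loss toward zero on data labelled by the teacher.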

In [1]:
# import nn files
from experiment import *
from monomial_neural_network import *

In [None]:
## Create a function that will make data and train a neural network using a given number of data points and epochs
def test_training(n, k, M):
    # n is the number of data points
    # k is the hidden layer size (number of hidden neurons), called "depth" above
    # M is the number of training epochs

    d = 3 # just fix the dimension of the data for now
    teacher_k = [k] # single layer
    teacher_model = generate_teacher_model_noOutWeight(d, teacher_k) # use unit weights for these calculations
    print(teacher_model)

    # generate data
    data = generate_data(n, d, teacher_model)

    # create student
    student_k = [k]  # student hidden layer sizes: a single hidden layer of size k, matching the teacher
    student_model = generate_student_model_noOutWeight(d, student_k)

    # train the student
    student_model, losses = train(
        model=student_model,
        x_train=data[0],
        y_train=data[1],
        num_epochs=M,
        lr=1e-3,
    )
    
    print(student_model.layers[0].weight)
    print(teacher_model.layers[0].weight)
    # return the final loss
    return losses[-1]

In [None]:
# test out the function
final_loss = test_training(n=10, k=4, M=10000)
print(final_loss)

MonomialNeuralNetwork_noOutputWeight(
  (layers): Sequential(
    (0): Linear(in_features=2, out_features=2, bias=False)
    (1): Monomial()
    (2): Linear(in_features=2, out_features=1, bias=False)
  )
)
starting training
Epoch [0/10000], Loss: 17.46714
Epoch [100/10000], Loss: 16.61201
Epoch [200/10000], Loss: 15.54452
Epoch [300/10000], Loss: 14.28187
Epoch [400/10000], Loss: 12.85036
Epoch [500/10000], Loss: 11.28392
Epoch [600/10000], Loss: 9.62347
Epoch [700/10000], Loss: 7.91661
Epoch [800/10000], Loss: 6.21747
Epoch [900/10000], Loss: 4.58663
Epoch [1000/10000], Loss: 3.09107
Epoch [1100/10000], Loss: 1.80410
Epoch [1200/10000], Loss: 0.80525
Epoch [1300/10000], Loss: 0.17995
Epoch [1400/10000], Loss: 0.00503
Epoch [1500/10000], Loss: 0.00160
Epoch [1600/10000], Loss: 0.00078
Epoch [1700/10000], Loss: 0.00047
Epoch [1800/10000], Loss: 0.00031
Epoch [1900/10000], Loss: 0.00021
Epoch [2000/10000], Loss: 0.00014
Epoch [2100/10000], Loss: 0.00010
Epoch [2200/10000], Loss: 0.00006
