# Tests of the Hidden-Layer Width Needed for GD Convergence

This document tests the hidden-layer size threshold (the number of hidden neurons $k$, i.e. the width) needed for gradient descent (GD) to converge, for different monomial activation functions. Specifically, SGL and Manelli found that
$$ k \geq 2d $$
where $d$ is the input dimension, is sufficient to ensure that GD drives the training loss to 0 and that the loss landscape has no spurious local minima. We seek the relationship between $k$ and $d$ for other monomial activations.
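One way to locate the threshold empirically, sketched under our own assumptions: for a fixed $d$, sweep $k$ upward, report the smallest width at which every random seed drives the final training loss below a tolerance, and compare that width with $2d$. The helper `smallest_converging_width`, its `train_fn(k, seed)` interface, and the default tolerance are hypothetical and not part of the notebook's modules.

```python
from typing import Callable, Iterable, Optional


def smallest_converging_width(
    train_fn: Callable[[int, int], float],
    widths: Iterable[int],
    seeds: Iterable[int] = (0, 1, 2),
    tol: float = 1e-3,
) -> Optional[int]:
    """Return the smallest width k whose final loss is below tol for every seed.

    Hypothetical helper: train_fn(k, seed) must train a width-k network and
    return its final training loss as a float. Returns None if no width in
    `widths` converges for all seeds.
    """
    seed_list = list(seeds)
    for k in sorted(widths):
        # all() short-circuits, so remaining seeds are skipped once one run fails
        if all(train_fn(k, seed) < tol for seed in seed_list):
            return k
    return None


if __name__ == "__main__":
    d = 3
    # Stand-in for a real training run: pretend that widths k >= 2d converge.
    fake_train = lambda k, seed: 0.0 if k >= 2 * d else 1.0
    k_star = smallest_converging_width(fake_train, range(1, 10))
    print(k_star, 2 * d)  # prints: 6 6
```

With the notebook's own routine, `train_fn` could wrap `test_training`, e.g. `lambda k, seed: float(test_training(n=10, k=k, M=10000))`, although `test_training` as written ignores the seed and fixes $d = 3$. The sketch stops at the first width that works, so it assumes convergence is monotone in $k$; checking a few widths above the returned value would test that assumption.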

Here we use a neural network with a single hidden layer and a monomial activation. In some tests all output weights are set to 1 to better match Manelli; in others they are not.
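As a reading aid, here is a minimal NumPy sketch of the model family, assuming the network computes $f(x) = \sum_{j=1}^{k} a_j (w_j^\top x)^p$ with no biases (consistent with the `bias=False` layers in the model printout) and an integer degree $p$. The function name `monomial_forward` and the parameter `p` are hypothetical; the notebook's `Monomial()` layer does not show its degree.

```python
import numpy as np


def monomial_forward(X, W, a=None, p=2):
    """Evaluate f(x) = sum_j a_j * (w_j . x) ** p for every row x of X.

    X: (n, d) inputs, W: (k, d) hidden weights, a: (k,) output weights.
    Passing a=None fixes every output weight to 1 (the unit-weight variant).
    """
    pre = X @ W.T                 # (n, k) pre-activations w_j . x
    hidden = pre ** p             # monomial activation, applied elementwise
    if a is None:
        a = np.ones(W.shape[0])   # unit output weights
    return hidden @ a             # (n,) network outputs


X = np.array([[1.0, 2.0, 0.0]])
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
print(monomial_forward(X, W))                                # 1**2 + 2**2 -> [5.]
print(monomial_forward(X, W, a=np.array([2.0, -1.0]), p=3))  # 2*1 - 8 -> [-6.]
```

Setting `a=None` corresponds to the unit-output-weight tests; passing explicit `a` corresponds to the runs where output weights are free.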

```python
# import the neural-network helper modules
from experiment import *
from monomial_neural_network import *
```

```python
def test_training(n, k, M):
    """Generate teacher data, train a student network, and return the final loss.

    n: number of data points
    k: hidden-layer width (number of hidden neurons)
    M: number of training epochs
    """
    d = 3  # fix the input dimension for now
    teacher_k = [k]  # single hidden layer of width k
    # teacher_model = generate_teacher_model_noOutWeight(d, teacher_k)  # unit output weights
    teacher_model = generate_teacher_model(d, teacher_k)
    print(teacher_model)

    # generate training data labelled by the teacher
    data = generate_data(n, d, teacher_model)

    # create the student: same single hidden layer of width k
    student_k = [k]
    # student_model = generate_student_model_noOutWeight(d, student_k)  # unit output weights
    student_model = generate_student_model(d, k=student_k)

    # train the student
    student_model, losses = train(
        model=student_model,
        x_train=data[0],
        y_train=data[1],
        num_epochs=M,
        lr=1e-3,
    )

    # compare the learned and teacher hidden-layer weights
    print(student_model.layers[0].weight)
    print(teacher_model.layers[0].weight)
    # return the final loss
    return losses[-1]
```
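For readers without the `experiment` and `monomial_neural_network` modules, the teacher-student procedure in `test_training` can be approximated by the self-contained NumPy sketch below. It is not the notebook's `train` routine: the Gaussian data and initialisation, the mean-squared-error loss, full-batch gradient descent with hand-derived gradients, the degree `p`, and the name `teacher_student_gd` are all our assumptions. Teacher and student share the same width $k$, as in `test_training`, and `unit_output_weights=True` mimics the `_noOutWeight` variants by fixing every output weight to 1.

```python
import numpy as np


def teacher_student_gd(n, k, d=3, p=2, num_epochs=2000, lr=1e-3,
                       unit_output_weights=False, seed=0):
    """Fit a width-k student to data labelled by a width-k teacher.

    Model: f(x) = sum_j a_j * (w_j . x) ** p. Returns the per-epoch
    mean-squared-error losses (each measured before that epoch's update).
    """
    rng = np.random.default_rng(seed)

    def init_output():
        # Unit output weights mimic the *_noOutWeight variants.
        return np.ones(k) if unit_output_weights else rng.standard_normal(k)

    # Teacher network and its training data.
    W_t, a_t = rng.standard_normal((k, d)), init_output()
    X = rng.standard_normal((n, d))
    y = ((X @ W_t.T) ** p) @ a_t

    # Student network with an independent random initialisation.
    W, a = rng.standard_normal((k, d)), init_output()

    losses = []
    for _ in range(num_epochs):
        pre = X @ W.T                       # (n, k) pre-activations
        hidden = pre ** p                   # (n, k) monomial activations
        residual = hidden @ a - y           # (n,) prediction errors
        losses.append(float(np.mean(residual ** 2)))

        g = 2.0 * residual / n              # dLoss/df_i for the mean-squared error
        # dLoss/dW_j = sum_i g_i * a_j * p * pre_ij**(p-1) * x_i
        grad_W = ((g[:, None] * a[None, :]) * p * pre ** (p - 1)).T @ X
        grad_a = hidden.T @ g               # dLoss/da_j = sum_i g_i * pre_ij**p
        W = W - lr * grad_W
        if not unit_output_weights:
            a = a - lr * grad_a
    return losses


losses = teacher_student_gd(n=10, k=4, num_epochs=2000)
print(losses[0], losses[-1])
```

Re-running it over several `seed` values and widths `k` (with `p` set to the degree under study) gives a dependency-free way to probe the $k$ versus $d$ relationship; whether it reproduces the notebook's losses depends on details of `generate_data` and `train` that are not shown here.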

```python
# test out the function (d = 3, k = 4)
final_loss = test_training(n=10, k=4, M=10000)
print(final_loss)
```

```
MonomialNeuralNetwork(
  (layers): Sequential(
    (0): Linear(in_features=3, out_features=4, bias=False)
    (1): Monomial()
    (2): Linear(in_features=4, out_features=1, bias=False)
  )
)
starting training
Epoch [0/10000], Loss: 160.25925
Epoch [100/10000], Loss: 139.55611
Epoch [200/10000], Loss: 122.23717
Epoch [300/10000], Loss: 107.76143
Epoch [400/10000], Loss: 95.65111
Epoch [500/10000], Loss: 85.47652
Epoch [600/10000], Loss: 76.83749
Epoch [700/10000], Loss: 69.34380
Epoch [800/10000], Loss: 62.60438
Epoch [900/10000], Loss: 56.24401
Epoch [1000/10000], Loss: 49.95488
Epoch [1100/10000], Loss: 43.55092
Epoch [1200/10000], Loss: 36.99002
Epoch [1300/10000], Loss: 30.36780
Epoch [1400/10000], Loss: 23.90591
Epoch [1500/10000], Loss: 17.94465
Epoch [1600/10000], Loss: 12.93723
Epoch [1700/10000], Loss: 9.41938
Epoch [1800/10000], Loss: 7.69846
Epoch [1900/10000], Loss: 6.76842
Epoch [2000/10000], Loss: 5.98887
Epoch [2100/10000], Loss: 5.31767
Epoch [2200/10000], Loss: 4.72771
```
