# Constructive Neural Network Tutorial 1
A constructive algorithm can create new nodes and connections in artificial neural networks (ANNs) during training or operation.
Constructive algorithms gained some attention shortly after neural networks with hidden layers were found to increase learning capabilities.
The number of neurons in the hidden layers of a network is an important factor in ANN performance and is usually determined by a trial-and-error cycle of manually defining the network size, then training and evaluating it.

Popular neural network libraries such as TensorFlow and PyTorch do not natively provide methods for changing the size of the data structures that store connection weights or other network parameters.
This tutorial will start by introducing the fundamental features of a constructive algorithm neural network, then discuss some approaches to increasing the size of a neural network in PyTorch during training.

## Fundamental algorithm components
Constructive algorithms are intended to improve the performance of an artificial neural network by adding neurons or connections.
Designing a constructive algorithm requires a number of important decisions to be made:
 * When to add neurons (or connections)
 * Where to add the neurons (input and output connections)
 * What parameter values to assign new network components
 
The implementation of a constructive algorithm is another important design consideration, but algorithms may be defined theoretically before practical implementation.

Examples of predefined constructive algorithm processes include:
 * Adding neurons at a fixed rate during training
 * Selecting a specific layer to add all neurons
 * Assigning parameters (e.g., weights) with predefined or random values

Constructive algorithms may also make these decisions during operation, typically by considering the ANN performance, for example:
 * Neurons may be constructed if the decrease in training error has slowed or stopped
 * Performance of individual neurons or layers may indicate a location to perform construction
 * Parameters for new neurons may be calculated from network inputs

Dynamic Node Creation (DNC) is an early constructive algorithm developed by Timur Ash in 1989. 
DNC calculates a rate of training error descent and adds a neuron to the hidden layer if the decrease in error slows.
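
A slowing error descent can be detected in several ways. The sketch below is one minimal possibility, not Ash's exact criterion: it compares the latest error with the error a fixed number of epochs earlier and signals construction when the relative drop is small. The function name `should_add_neuron` and the parameters `window` and `min_relative_drop` are hypothetical and chosen only for illustration.

```python
def should_add_neuron(errors, window=5, min_relative_drop=0.01):
    """Return True when the error fell by less than `min_relative_drop`
    (as a fraction of the earlier error) over the last `window` epochs.
    All names and default values here are assumptions of this sketch."""
    if len(errors) <= window:
        return False  # not enough history to measure a trend
    earlier = errors[-window - 1]
    latest = errors[-1]
    if earlier <= 0:
        return False  # nothing left to improve on
    relative_drop = (earlier - latest) / earlier
    return relative_drop < min_relative_drop


# error still falling quickly: no construction
falling = [1.0, 0.8, 0.6, 0.45, 0.35, 0.28]
# error has flattened out: construction is triggered
plateau = [1.0, 0.5, 0.301, 0.3005, 0.3003, 0.3002, 0.3001, 0.3]

print(should_add_neuron(falling))   # False
print(should_add_neuron(plateau))   # True
```

The same check can be applied to training or validation error; only the list passed in changes.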

This tutorial will use the rate of validation error descent to trigger construction, create new neurons in a preselected hidden layer of a feedforward network, and provide randomly initialised weights for new neurons.
This tutorial will discuss and demonstrate methods for adding neurons to a network in PyTorch.
The generality of these methods should allow implementations in TensorFlow and other ANN libraries.

Later tutorials will investigate the application of constructive algorithms to deep networks, visual object recognition, transfer learning, and retraining without forgetting.

## Adding neurons in PyTorch
Changing the number of weights in an ANN is complicated because the weights are stored in Tensor objects that do not include a convenient method for changing their size.
Two options that exist for adding new neurons and connection weights are:
1. Creating a new larger Tensor, copying the previous parameters and adding the new ones.
2. Starting with a Tensor larger than the number of neurons and adding neurons to the unused space.
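
The second option can be sketched in NumPy before moving to PyTorch. The class below is a hypothetical helper, not part of any library: it allocates a weight matrix with spare rows, tracks how many rows are active, and fills the next unused rows when neurons are added. The names `PreallocatedLayer`, `capacity` and `n_active` are assumptions of this sketch.

```python
import numpy as np


class PreallocatedLayer:
    """Hypothetical helper: a weight matrix with spare rows for future neurons."""

    def __init__(self, n_inputs, n_active, capacity, rng):
        if n_active > capacity:
            raise ValueError("n_active cannot exceed capacity")
        self.rng = rng
        self.n_active = n_active
        # allocate every row up front; unused rows stay at zero
        self.weights = np.zeros((capacity, n_inputs))
        self.weights[:n_active] = rng.standard_normal((n_active, n_inputs))

    def add_neurons(self, n_new):
        end = self.n_active + n_new
        if end > self.weights.shape[0]:
            raise ValueError("no unused rows left; a larger array is needed")
        # fill the next unused rows in place, without reallocating
        self.weights[self.n_active:end] = self.rng.standard_normal(
            (n_new, self.weights.shape[1]))
        self.n_active = end

    def forward(self, a_in):
        # only the active rows take part in the computation
        return self.weights[:self.n_active] @ a_in


layer = PreallocatedLayer(n_inputs=4, n_active=2, capacity=8,
                          rng=np.random.default_rng(0))
x = np.array([5.1, 3.5, 1.4, 0.2])
before = layer.forward(x)
layer.add_neurons(1)
after = layer.forward(x)
print(before.shape, after.shape)  # (2,) (3,)
```

The trade-off is that the maximum layer size has to be chosen in advance, whereas the first option pays for a copy every time the network grows.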

In [1]:
import csv

# class labels used as network targets; unknown species map to -1
iris_labels = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}

iris_data = list()  # measurements for each sample
iris_type = list()  # integer class label for each sample
with open('iris.data') as f:
    iris_reader = csv.reader(f)
    for csv_row in iris_reader:
        if len(csv_row) > 0:  # skip blank lines
            # every column except the last is a measurement; the last is the species name
            iris_data.append([float(n) for n in csv_row[:-1]])
            iris_type.append(iris_labels.get(csv_row[-1], -1))

#print(iris_data)
#print(iris_type)

In [2]:
import numpy as np

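def add_np_randn_rows(new, arr):
    # append `new` rows of random values to a 2-D array
    return np.append(arr, np.random.randn(new, arr.shape[1]), 0)

def add_np_randn_cols(new, arr):
    # append `new` columns of random values to a 2-D array
    return np.append(arr, np.random.randn(arr.shape[0], new), 1)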

#w_ji = np.random.randn(3,3)
#print(w_ji)

#w_ji = add_np_randn_cols(1,w_ji)
#print(w_ji)

In [3]:
def nn_sigmoid(w_in, bias, a_in):
    a_out = 1 / (1 + np.exp(- (w_in @ a_in + bias)))
    return a_out
    
def nn_softmax(w_in, a_in):
    z = w_in @ a_in
    a_out = np.exp(z - np.amax(z))  # subtract the largest net input to reduce likelihood of NaN values
    a_out = a_out / np.sum(a_out)
    return a_out
    
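def nn_add_neurons(new, j, w_ji, bias_j, w_kj):
    # grow hidden layer j by `new` neurons with random incoming weights, biases and outgoing weights
    j += new
    w_ji = add_np_randn_rows(new,w_ji)
    bias_j = np.append(bias_j, np.random.randn(new))  # bias is 1-D, so the 2-D row helper cannot be used
    w_kj = add_np_randn_cols(new,w_kj)
    return j, w_ji, bias_j, w_kj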
    
# create an artificial neural network
input_neurons = 4
hidden_neurons = 2
output_neurons = 3

w_hi = np.random.randn(hidden_neurons, input_neurons)
bias_h = np.random.randn(hidden_neurons)

w_oh = np.random.randn(output_neurons, hidden_neurons)

activations_i = iris_data[0][:]
print(activations_i)

activations_h = nn_sigmoid(w_hi, bias_h, activations_i)
print(activations_h)

activations_o = nn_softmax(w_oh, activations_h)
print(activations_o)


[5.1, 3.5, 1.4, 0.2]
[0.03048774 0.01032446]
[0.32434655 0.33447449 0.34117897]
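
As a quick check of the first option (copying into larger arrays), the sketch below grows a hidden layer from two to three neurons and reruns the forward pass. It is self-contained, so it restates compact versions of the layer functions; the fixed seed, the 1-D bias handling and the names `sigmoid_layer`, `softmax_layer` and `grow_hidden` are assumptions of this sketch rather than part of the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for repeatability


def sigmoid_layer(w_in, bias, a_in):
    return 1 / (1 + np.exp(-(w_in @ a_in + bias)))


def softmax_layer(w_in, a_in):
    z = w_in @ a_in
    e = np.exp(z - np.max(z))
    return e / np.sum(e)


def grow_hidden(new, w_ji, bias_j, w_kj):
    # build larger arrays that hold the old values plus new random ones
    w_ji = np.append(w_ji, rng.standard_normal((new, w_ji.shape[1])), axis=0)
    bias_j = np.append(bias_j, rng.standard_normal(new))
    w_kj = np.append(w_kj, rng.standard_normal((w_kj.shape[0], new)), axis=1)
    return w_ji, bias_j, w_kj


w_hi = rng.standard_normal((2, 4))
bias_h = rng.standard_normal(2)
w_oh = rng.standard_normal((3, 2))
x = np.array([5.1, 3.5, 1.4, 0.2])

h_before = sigmoid_layer(w_hi, bias_h, x)
w_hi, bias_h, w_oh = grow_hidden(1, w_hi, bias_h, w_oh)
h_after = sigmoid_layer(w_hi, bias_h, x)
out_after = softmax_layer(w_oh, h_after)

print(w_hi.shape, bias_h.shape, w_oh.shape)   # (3, 4) (3,) (3, 3)
print(np.allclose(h_after[:2], h_before))     # True: existing neurons are unchanged
```

The existing hidden activations are preserved, but the network output generally changes because the new neuron has random outgoing weights.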


In [32]:
def error_sq(label, nn_out):
    # squared error against a one-hot target for the given class label
    t = np.zeros(nn_out.shape)
    t[label] = 1.0
    error = 0.5 * np.sum(np.square(t - nn_out))
    return error

error = error_sq(iris_type[0], activations_o)
print(error)

# Backpropagation
# one-hot target for the first sample
target = np.zeros(activations_o.shape)
target[iris_type[0]] = 1.0

# gradient of the squared error through the softmax output layer
d_error = activations_o - target
d_output = activations_o * (d_error - np.sum(d_error * activations_o))
# gradient through the sigmoid hidden layer
d_hidden = (w_oh.transpose() @ d_output) * activations_h * (1 - activations_h)

learning_rate = 0.1

# each weight change is the outer product of a layer's delta and the activations feeding it
delta_w_hi = - learning_rate * d_hidden[np.newaxis].transpose() @ np.asarray(activations_i)[np.newaxis]
delta_b_h = - learning_rate * d_hidden
delta_w_oh = - learning_rate * d_output[np.newaxis].transpose() @ activations_h[np.newaxis]

# the updates must match the shapes of w_hi, bias_h and w_oh
print(delta_w_hi.shape, delta_b_h.shape, delta_w_oh.shape)


0.3423919304971338
(2, 4) (2,) (3, 2)


In [1]:
# modified from code posted: 
# https://discuss.pytorch.org/t/possible-to-add-initialize-new-nodes-to-hidden-layer-partway-through-training/3809/4  

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicNodeCreation(nn.Module):
    
    def __init__(self, input_size, hidden_size, output_size):
        super(DynamicNodeCreation, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
          
        # initialize weights
        self.fcs = nn.ModuleList([nn.Linear(self.input_size, self.hidden_size)])
        self.fcs.append(nn.Linear(self.hidden_size, self.output_size))
        
    def forward(self, x):
        # hidden sigmoid layer
        x = torch.sigmoid(self.fcs[0](x))
        # output layer: log-probabilities over the classes
        x = F.log_softmax(self.fcs[1](x), dim=-1)
        return x

        
    def add_units(self, n_new):
        # take a copy of the current weights and biases stored in self.fcs
        current_w = [layer.weight.detach().clone() for layer in self.fcs]
        current_b = [layer.bias.detach().clone() for layer in self.fcs]

        # make the new weights in and out of hidden layer you are adding neurons to
        hl_input = torch.zeros([n_new, current_w[0].shape[1]])
        nn.init.xavier_uniform_(hl_input, gain=nn.init.calculate_gain('sigmoid'))
        hl_output = torch.zeros([current_w[1].shape[0], n_new])
        nn.init.xavier_uniform_(hl_output)

        # concatenate the old values with the new ones; biases of new neurons start at zero
        new_wi = torch.cat([current_w[0], hl_input], dim=0)
        new_bi = torch.cat([current_b[0], torch.zeros(n_new)], dim=0)
        new_wo = torch.cat([current_w[1], hl_output], dim=1)

        # reset weight and grad variables to new size
        self.hidden_size += n_new
        self.fcs[0] = nn.Linear(self.input_size, self.hidden_size)
        self.fcs[1] = nn.Linear(self.hidden_size, self.output_size)

        # set the parameter data to the new values
        with torch.no_grad():
            self.fcs[0].weight.copy_(new_wi)
            self.fcs[0].bias.copy_(new_bi)
            self.fcs[1].weight.copy_(new_wo)
            self.fcs[1].bias.copy_(current_b[1])

net = DynamicNodeCreation(4,1,3)
print(net)

DynamicNodeCreation(
  (fcs): ModuleList(
    (0): Linear(in_features=4, out_features=1, bias=True)
    (1): Linear(in_features=1, out_features=3, bias=True)
  )
)


In [2]:
params = list(net.parameters())
print(params[0].size())

net.add_units(1)
params = list(net.parameters())
print(params[0].size())

torch.Size([1, 4])
torch.Size([2, 4])
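
One practical consequence of replacing the `nn.Linear` layers is that an optimizer created before construction still refers to the discarded parameter tensors, so it needs to be rebuilt after neurons are added. The sketch below illustrates this with a small stand-in network so that it is self-contained; the class `TinyNet`, the choice of `torch.optim.SGD` and the learning rate are assumptions of this sketch, not part of the cells above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed, an assumption for repeatability


class TinyNet(nn.Module):
    """Hypothetical stand-in network with one growable hidden layer."""

    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        self.fc1 = nn.Linear(n_in, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        return self.fc2(torch.sigmoid(self.fc1(x)))

    def add_units(self, n_new):
        old1, old2 = self.fc1, self.fc2
        n_old = old1.out_features
        self.fc1 = nn.Linear(old1.in_features, n_old + n_new)
        self.fc2 = nn.Linear(n_old + n_new, old2.out_features)
        with torch.no_grad():
            # keep the trained values; new rows and columns keep their fresh initialisation
            self.fc1.weight[:n_old] = old1.weight
            self.fc1.bias[:n_old] = old1.bias
            self.fc2.weight[:, :n_old] = old2.weight
            self.fc2.bias.copy_(old2.bias)


net = TinyNet(4, 1, 3)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

net.add_units(1)
# the old optimizer still points at the discarded parameters, so build a new one
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

# one training step on a single sample to confirm the grown network trains
x = torch.tensor([[5.1, 3.5, 1.4, 0.2]])
y = torch.tensor([0])
loss = nn.functional.cross_entropy(net(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(net.fc1.weight.shape, net.fc2.weight.shape)
```

Rebuilding the optimizer discards any per-parameter state such as momentum, which may or may not matter depending on the optimizer in use.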
