(1)Different early stopping criteria
(2)Different cost functions: a. Quadratic; b. Cross-entropy; c. Log-likelihood;
(3)SGD or Momentum
(4)No regularization or L2 regularization
(5)Different transfer (activation) functions: a. tanh; b. softmax; c. ReLU;
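The selectable activation and cost functions above can be sketched as follows. This is a minimal illustration in NumPy (the original implementation language is not shown here); the function names are assumptions, and inputs follow the column-per-example layout described below.

```python
import numpy as np

# Transfer (activation) functions; z holds one column per example.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=0, keepdims=True)       # each column sums to 1

def relu(z):
    return np.maximum(0.0, z)

# Cost functions; a is the network output, y the target, columns are examples.
def quadratic_cost(a, y):
    return 0.5 * np.sum((a - y) ** 2) / a.shape[1]

def cross_entropy_cost(a, y):
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps)) / a.shape[1]

def log_likelihood_cost(a, y):
    eps = 1e-12
    return -np.sum(y * np.log(a + eps)) / a.shape[1]
```

Cross-entropy is typically paired with sigmoid outputs and log-likelihood with softmax outputs, since those pairings give simple error terms during backpropagation.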
2. The model uses better initial weights and mini-batch shuffling, and it returns the learned network along with accuracy/cost values for the training set, validation set, and test set.
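The "better initial weights" and mini-batch shuffling can be sketched as below. The exact initialization scheme is an assumption here; scaling by 1/sqrt(fan-in) is a common choice that keeps early activations from saturating.

```python
import numpy as np

rng = np.random.default_rng(0)

# Example layer sizes (input, hidden, output); the real sizes come from nodeLayers.
node_layers = [4, 8, 3]

# Initial weights scaled by 1/sqrt(fan-in); biases from a standard normal.
weights = [rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
           for n_in, n_out in zip(node_layers[:-1], node_layers[1:])]
biases = [rng.standard_normal((n_out, 1)) for n_out in node_layers[1:]]

# Mini-batch shuffling: permute the example columns each epoch, then slice.
def minibatches(inputs, targets, batch_size):
    perm = rng.permutation(inputs.shape[1])
    for start in range(0, inputs.shape[1], batch_size):
        idx = perm[start:start + batch_size]
        yield inputs[:, idx], targets[:, idx]
```

Shuffling before slicing ensures each epoch sees the examples in a different order, which helps stochastic gradient descent escape poor local patterns.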
The parameters of the function are:
(1) inputs: a matrix with a column for each example, and a row for each input feature.
(2) targets: a matrix with a column for each example, and a row for each output feature.
(3) split: how the data are divided into training, validation, and test sets.
(4) nodeLayers: a vector with the number of nodes in each layer (including the input and output layers). Important: Your code should not assume that there are just three layers of nodes. It should work with a network of any size.
(5) numEpochs: (scalar) desired number of epochs to run.
(6) batchSize: (scalar) number of instances in a mini-batch.
(7) eta: (scalar) learning rate.
(8) costFunOption: 0 stands for quadratic, 1 stands for cross-entropy, 2 stands for log-likelihood.
(9) actFunOption: 0 stands for sigmoid, 1 stands for tanh, 2 stands for softmax, 3 stands for ReLU.
(10) momentum: the value of momentum.
(11) lambda: the value of lambda.
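The eta, momentum, and lambda parameters combine in each weight update. A minimal sketch of one SGD-with-momentum step including L2 regularization (the update rule here is the standard textbook form, assumed rather than taken from the code; n is the training-set size):

```python
import numpy as np

eta, mu, lam, n = 0.1, 0.9, 5.0, 1000  # example values for eta, momentum, lambda, n

w = np.ones((3, 3))                 # current weight matrix
velocity = np.zeros_like(w)         # momentum accumulator
grad_w = np.full_like(w, 0.01)      # stand-in for the backprop gradient

# L2 shrinks each weight by the factor (1 - eta*lambda/n);
# momentum blends the previous velocity with the new gradient step.
velocity = mu * velocity - eta * grad_w
w = (1 - eta * lam / n) * w + velocity
```

Setting momentum to 0 recovers plain SGD, and setting lambda to 0 disables regularization, matching options (3) and (4) above.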
(Early Stopping: In my code, training terminates early, before all epochs have run, when the difference between the training-set accuracy and the validation-set accuracy is greater than or equal to 0.15. A gap this large indicates that the model is overfitting and will not generalize well.)
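The early-stopping check described above amounts to a one-line comparison, sketched here (the function name is hypothetical):

```python
def should_stop_early(train_acc, val_acc, gap=0.15):
    """Stop when the train/validation accuracy gap signals overfitting."""
    return train_acc - val_acc >= gap
```

Called once per epoch, this halts training as soon as training accuracy pulls at least 0.15 ahead of validation accuracy.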