Performance comparison of various weight and bias initializers on the MNIST dataset
- Hidden layers: two fully-connected layers (256 nodes each) with ReLU
- Output layer: fully-connected layer (10 nodes = # of classes in MNIST)
- Batch normalization is applied to the hidden layers
- Weight initializers: Normal, Truncated normal, Xavier, He
- Bias initializers: Constant (zero), Normal (see the sketch after this list)
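
The sketch below shows one way to build the network and swap initializers as described above; it is an illustration, not the repo's actual code. The `MLP` class, `apply_init` helper, and the `std=0.01` values are assumptions, while the initializer calls themselves (`normal_`, `trunc_normal_`, `xavier_uniform_`, `kaiming_uniform_`, `zeros_`) are the standard `torch.nn.init` functions for the four weight schemes and two bias schemes compared here.

```python
import torch
import torch.nn as nn

# Two 256-node fully-connected hidden layers with BatchNorm + ReLU,
# followed by a 10-node output layer (one node per MNIST class).
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(28 * 28, 256),
            nn.BatchNorm1d(256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.BatchNorm1d(256),
            nn.ReLU(),
            nn.Linear(256, 10),
        )

    def forward(self, x):
        # Flatten 28x28 MNIST images into 784-dim vectors.
        return self.net(x.view(x.size(0), -1))

# The four weight initializers and two bias initializers being compared.
# The std=0.01 values are illustrative guesses, not the repo's settings.
WEIGHT_INITS = {
    "normal": lambda w: nn.init.normal_(w, mean=0.0, std=0.01),
    "truncated_normal": lambda w: nn.init.trunc_normal_(w, mean=0.0, std=0.01),
    "xavier": nn.init.xavier_uniform_,
    "he": lambda w: nn.init.kaiming_uniform_(w, nonlinearity="relu"),
}
BIAS_INITS = {
    "zero": nn.init.zeros_,
    "normal": lambda b: nn.init.normal_(b, mean=0.0, std=0.01),
}

def apply_init(model, weight_init, bias_init):
    # Re-initialize every fully-connected layer with the chosen pair.
    for m in model.modules():
        if isinstance(m, nn.Linear):
            WEIGHT_INITS[weight_init](m.weight)
            BIAS_INITS[bias_init](m.bias)
```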
Training accuracies on the MNIST classification task are compared.
- Loss values are plotted with TensorBoard in PyTorch (see the training-loop sketch below).
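
A minimal training-loop sketch follows, assuming the `MLP` class and `apply_init` helper from the snippet above; the hyperparameters (learning rate, epochs, batch size) and the `runs/he_zero` log directory are assumptions for illustration. Losses are written through `torch.utils.tensorboard.SummaryWriter`, PyTorch's standard TensorBoard interface.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from torchvision import datasets, transforms

# Standard MNIST training set; batch size 128 is an illustrative choice.
train_loader = DataLoader(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=128, shuffle=True,
)

model = MLP()
apply_init(model, weight_init="he", bias_init="zero")

# One log directory per initializer pair, so TensorBoard can
# overlay the loss curves of all runs for comparison.
writer = SummaryWriter(log_dir="runs/he_zero")
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(10):
    for step, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        # Log the per-batch training loss against a global step counter.
        writer.add_scalar("loss/train", loss.item(),
                          epoch * len(train_loader) + step)
writer.close()
```

Repeating this loop once per weight/bias initializer pair, each with its own `log_dir`, produces the overlaid loss plots that the comparison relies on.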