# Model selection

### CIFAR10 experiments
* In [the vanilla FL paper](https://arxiv.org/abs/1602.05629), the authors report that the CNN used for the CIFAR10 experiments was taken from the TensorFlow tutorial.
* The model in the [current TensorFlow tutorial](https://www.tensorflow.org/tutorials/images/cnn) (referred to as the *TF model*) differs from the one described in the vanilla paper: the *TF model* has only $\approx 1.2\times10^5$ parameters, about one-tenth of the $\approx 10^6$ parameters reported in the paper.
* However, the *TF model* is still about twice as large as the one used in the [PyTorch tutorial](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) (referred to as the *Torch model*), which has $\approx 0.6\times 10^5$ parameters.
* The implementation in [AshwinRJ's repository](https://github.com/AshwinRJ/Federated-Learning-PyTorch) uses the *Torch model* for its CIFAR10 experiments.

### MNIST experiments
* In [the vanilla FL paper](https://arxiv.org/abs/1602.05629), the authors describe the MLP (2NN) and CNN used for the MNIST experiments as follows:

> A simple multilayer-perceptron with 2-hidden layers with 200 units each using ReLu activations (199,210 total parameters), which we refer to as the MNIST 2NN.

> A CNN with two 5x5 convolution layers (the first with 32 channels, the second with 64, each followed with 2x2 max pooling), a fully connected layer with 512 units and ReLu activation, and a final softmax output layer (1,663,370 total parameters).

* The corresponding MLP and CNN implementations in AshwinRJ's repository differ from the descriptions above:
    1. AshwinRJ's MLP has only one hidden layer, whereas the FL paper uses two.
    2. AshwinRJ's CNN has far fewer channels in its convolutional layers and far fewer units in its fully connected layer.


* WY tried to follow the architecture described in the vanilla paper for the MNIST MLP and CNN, but the resulting models still have fewer parameters than the paper reports.
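
For the CNN, one possible source of the gap is the convolution padding, which the paper's description does not state. The sketch below counts parameters by hand under two assumptions: a single-channel 28x28 input, and either no padding or a padding of 2 that preserves the spatial size. The helper names are hypothetical and not part of the repository.

```python
def conv2d_params(in_ch, out_ch, kernel):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return in_ch * out_ch * kernel * kernel + out_ch


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv_out(size, kernel, padding):
    """Spatial size after a stride-1 convolution."""
    return size + 2 * padding - kernel + 1


def mnist_cnn_params(padding, in_ch=1, image=28):
    """Return (flattened features, total parameters) for the CNN in the paper.

    Assumes two 5x5 convolutions (32 and 64 channels), each followed by
    2x2 max pooling, then a 512-unit layer and a 10-way output layer.
    """
    size = image
    total = 0
    for c_in, c_out in [(in_ch, 32), (32, 64)]:
        total += conv2d_params(c_in, c_out, 5)
        size = conv_out(size, 5, padding) // 2  # 2x2 max pooling halves the size
    flat = 64 * size * size
    total += linear_params(flat, 512) + linear_params(512, 10)
    return flat, total


print(mnist_cnn_params(padding=0))  # (1024, 582026)
print(mnist_cnn_params(padding=2))  # (3136, 1663370)
```

Under these assumptions, only the padded variant reproduces the 1,663,370 parameters quoted above, which suggests (but does not prove) that the paper's CNN uses size-preserving padding.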


### Rationale
* Our goal is to compare FL against the baseline method, not to achieve the best possible accuracy on the CIFAR10 task. The smaller Torch model is therefore sufficient for our needs, especially since training the TF model may take much longer.

### Usable models for CIFAR10

In [1]:
# change working directory for the convenience of this notebook
import os
os.chdir('C:/Users/wangyuan/myfl-1/Federated-Learning-PyTorch/src')

In [2]:
import torch
import models
import utils

In [3]:
# create a model instance for Torch model (copied from the original repository)
model_torch_0 = models.CNNCifar()

# create a model instance for Torch model (created by WY)
model_torch_1 = models.CNNCifarTorch()

# create a model instance for TF model (created by WY)
model_tf = models.CNNCifarTf()

print(f'Torch model (original) \thas: {utils.get_count_params(model_torch_0)}\tparameters.')
print(f'Torch model (WY\'s) \thas: {utils.get_count_params(model_torch_1)}\tparameters.')
print(f'TF model (WY\'s) \thas: {utils.get_count_params(model_tf)}\tparameters.')

Torch model (original) 	has: 62006	parameters.
Torch model (WY's) 	has: 62006	parameters.
TF model (WY's) 	has: 122570	parameters.
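
The counts printed above can be cross-checked by hand. The sketch below assumes the layer sizes of the two tutorials: two 5x5 convolutions and three fully connected layers for the PyTorch tutorial network, and three 3x3 convolutions and two dense layers for the TensorFlow tutorial network, both on 32x32 RGB inputs. The helper names are hypothetical and not part of the repository.

```python
def conv2d_params(in_ch, out_ch, kernel):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return in_ch * out_ch * kernel * kernel + out_ch


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# Assumed PyTorch tutorial layout: 32 -> 28 -> 14 -> 10 -> 5 spatially.
torch_layers = [
    conv2d_params(3, 6, 5),
    conv2d_params(6, 16, 5),
    linear_params(16 * 5 * 5, 120),
    linear_params(120, 84),
    linear_params(84, 10),
]

# Assumed TensorFlow tutorial layout: 32 -> 30 -> 15 -> 13 -> 6 -> 4 spatially.
tf_layers = [
    conv2d_params(3, 32, 3),
    conv2d_params(32, 64, 3),
    conv2d_params(64, 64, 3),
    linear_params(64 * 4 * 4, 64),
    linear_params(64, 10),
]

torch_total = sum(torch_layers)
tf_total = sum(tf_layers)
print(torch_total)  # 62006
print(tf_total)     # 122570
```

In both networks the first fully connected layer accounts for the largest share of the parameters (48,120 and 65,600 respectively).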


#### Equivalence of Torch models
Note that the number of trainable parameters is identical for the original Torch model and the one created by WY. Although their class definitions differ slightly, the resulting models should be identical.

In [4]:
print('This is the Torch model in the original repository\n', model_torch_0)
print('\nThis is the Torch model created by WY\n', model_torch_1)

This is the Torch model in the original repository
 CNNCifar(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

This is the Torch model created by WY
 CNNCifarTorch(
  (conv_layer): Sequential(
    (0): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc_layer): Sequential(
    (0): Linear(in_features=400, out_features=120, bias=True)
    (1): ReLU()
    (2): Linear(in_features=120, out_featur

### Usable models for MNIST
#### 2NN comparison

In [10]:
# show the model architecture
print('This is the 2NN model in the original repository:\n', models.MLP(28*28,64,10))
print('\nThis is the 2NN model created by WY:\n', models.TwoNN())

# show the number of trainable parameters
print(f'2NN model for MNIST (original) \thas: {utils.get_count_params(models.MLP(28*28,64,10))}\tparameters.')
print(f'2NN model for MNIST (WY\'s) \thas: {utils.get_count_params(models.TwoNN())}\tparameters.')
print('2NN model for MNIST (FL paper) \thas: 199210\tparameters.')

This is the 2NN model in the original repository:
 MLP(
  (layer_input): Linear(in_features=784, out_features=64, bias=True)
  (relu): ReLU()
  (dropout): Dropout(p=0.5, inplace=False)
  (layer_hidden): Linear(in_features=64, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

This is the 2NN model created by WY:
 TwoNN(
  (nn_layer): Sequential(
    (0): Linear(in_features=784, out_features=100, bias=True)
    (1): ReLU()
    (2): Linear(in_features=100, out_features=100, bias=True)
    (3): ReLU()
    (4): Linear(in_features=100, out_features=10, bias=True)
  )
)
2NN model for MNIST (original) 	has: 50890	parameters.
2NN model for MNIST (WY's) 	has: 89610	parameters.
2NN model for MNIST (FL paper) 	has: 199210	parameters.
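
The three totals above are consistent with a difference in hidden-layer width alone: the printed `TwoNN` uses 100 units per hidden layer, while the paper specifies 200. A minimal sketch of the arithmetic, using a hypothetical helper `mlp_params` that is not part of the repository:

```python
def mlp_params(sizes):
    """Total weights and biases of a fully connected network.

    `sizes` lists the layer widths from input to output.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


print(mlp_params([784, 64, 10]))        # 50890, one hidden layer of 64 units
print(mlp_params([784, 100, 100, 10]))  # 89610, two hidden layers of 100 units
print(mlp_params([784, 200, 200, 10]))  # 199210, two hidden layers of 200 units
```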


#### CNN comparison

In [11]:
# show the model architecture
print('This is the CNN model in the original repository:\n', models.CNNMnist())
print('\nThis is the CNN model created by WY:\n', models.CNNMnistWy())

# show the number of trainable parameters
print(f'CNN model for MNIST (original) \thas: {utils.get_count_params(models.CNNMnist())}\tparameters.')
print(f'CNN model for MNIST (WY\'s) \thas: {utils.get_count_params(models.CNNMnistWy())}\tparameters.')
print('CNN model for MNIST (FL paper) \thas: 1663370\tparameters.')

This is the CNN model in the original repository:
 CNNMnist(
  (conv1): Conv2d(3, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2_drop): Dropout2d(p=0.5, inplace=False)
  (fc1): Linear(in_features=320, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=10, bias=True)
)

This is the CNN model created by WY:
 CNNMnistWy(
  (conv_layer): Sequential(
    (0): Conv2d(3, 32, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc_layer): Sequential(
    (0): Linear(in_features=1024, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=10, bias=True)
  )
)
CNN model for MNIST (original) 	has: 22340	parameters.
CNN model for MNIST 