# Tutorial: Training a sign-activated neural network with Stochastic Coordinate Descent (SCD)

In the `main` directory, there are two scripts for **binary classification with sigmoid activation** and **multi-class classification with softmax**.

# Binary classification
## SCD for MLP01

    python train_cnn01_01.py --nrows 0.75 --localit 1 --updated_fc_features 128 --updated_fc_nodes 1 --width 100 --normalize 0 --fail_count 1 --loss 01loss --act sign --init normal --no_bias 0 --scale 1 --w-inc2 0.17 --version mlp01scale --seed 0 --iters 1000 --dataset cifar10 --n_classes 2 --cnn 0 --divmean 0 --target cifar10_binary_mlp01scale_sign_i1_bce_b200_lrc0.05_lrf0.17_nb2_nw0_dm0_upc1_upf1_ucf32_normal_0 --updated_fc_ratio 5 --verbose_iter 1
    
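Explanation of the flags:
- `--nrows 0.75`: each iteration uses a batch of 7500 out of 10000 rows (batch size).
- `--updated_fc_features 128`: the feature pool size is 128.
- `--updated_fc_nodes 1`: randomly updates 1 node in each iteration.
- `--normalize 0`: does not normalize the weights.
- `--loss 01loss`: uses the 01 loss.
- `--init normal`: weights are initialized from a normal distribution.
- `--no_bias 0`: all layers have a bias.
- `--w-inc2 0.17`: the step size is 0.17.
- `--version mlp01scale`: the architecture version is mlp01scale.
- `--seed 0`: the random seed is 0.
- `--n_classes 2`: 2 classes.
- `--cnn 0`: flattens the input into a vector.
- `--divmean 0`: does not normalize the data.
- `--target`: name used for model checkpoints and logs.
- `--verbose_iter 1`: prints accuracy and loss every iteration.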
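
To make the roles of the batch fraction, feature pool, and step size more concrete, here is a minimal NumPy sketch of one coordinate-descent iteration on a single sign-activated node with 01 loss. It is an illustration under assumptions, not the logic of `train_cnn01_01.py`: the function names `zero_one_loss` and `scd_step` are hypothetical, and the accept rule (keep the single-coordinate change that most lowers the batch loss) is assumed.

```python
import numpy as np


def zero_one_loss(w, b, X, y):
    """Fraction of rows where sign(Xw + b) disagrees with labels in {-1, +1}."""
    pred = np.where(X @ w + b >= 0, 1, -1)
    return float(np.mean(pred != y))


def scd_step(w, b, X, y, rng, nrows=0.75, pool_size=128, step=0.17):
    """One illustrative SCD iteration (hypothetical helper, not the repo's code).

    nrows     -- fraction of rows sampled as the batch (cf. --nrows)
    pool_size -- number of candidate coordinates tried (cf. --updated_fc_features)
    step      -- size of the weight change tried per coordinate (cf. --w-inc2)
    """
    n, d = X.shape
    # Sample the batch without replacement, e.g. 0.75 * 10000 = 7500 rows.
    batch = rng.choice(n, size=max(1, int(nrows * n)), replace=False)
    Xb, yb = X[batch], y[batch]

    best_loss, best_w = zero_one_loss(w, b, Xb, yb), w
    # Draw a random pool of coordinates and try +step / -step on each one.
    pool = rng.choice(d, size=min(pool_size, d), replace=False)
    for j in pool:
        for delta in (step, -step):
            cand = w.copy()
            cand[j] += delta
            loss = zero_one_loss(cand, b, Xb, yb)
            if loss < best_loss:  # accept only strict improvements on the batch
                best_loss, best_w = loss, cand
    return best_w


# Usage on synthetic data (shapes and values are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.where(X[:, 0] > 0, 1, -1)
w = rng.normal(size=10)
for _ in range(50):
    w = scd_step(w, 0.0, X, y, rng, nrows=0.75, pool_size=10, step=0.17)
print("training 01 loss:", zero_one_loss(w, 0.0, X, y))
```

Because a change is accepted only when it lowers the loss on the sampled batch, the loss on the full data can still fluctuate between iterations when `nrows` is below 1.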




## ABP for a CNN model (binary classification)

For ABP training, we have to define a model variation for each phase: srrr -> ssrr -> sssr -> ssss. Each phase is warm-started from the previous phase.

    python train_cnn01_01.py --nrows 0.02 --localit 1 --updated_conv_features 32 --updated_fc_features 128 --updated_fc_nodes 1 --updated_conv_nodes 1 --width 100 --normalize 0 --percentile 1 --fail_count 1 --loss bce --act sign --fc_diversity 1 --init normal --no_bias 2 --scale 1 --w-inc1 0.025 --w-inc2 0.1 --version toy3srr100scale --seed 0 --iters 15000 --dataset cifar10 --n_classes 2 --cnn 1 --divmean 0 --target cifar10_binary_toy3srr100scale_abp_sign_i1_bce_b200_lrc0.025_lrf0.1_nb2_nw0_dm0_upc1_upf1_ucf32_normal_0 --updated_fc_ratio 10 --updated_conv_ratio 20 --verbose_iter 500 --freeze_layer 0 --lr 0.001 --bp_layer 4 --aug 1
    
    python train_cnn01_01.py --nrows 0.02 --localit 1 --updated_conv_features 32 --updated_fc_features 128 --updated_fc_nodes 1 --updated_conv_nodes 1 --width 100 --normalize 0 --percentile 1 --fail_count 1 --loss bce --act sign --fc_diversity 1 --init normal --no_bias 2 --scale 1 --w-inc1 0.025 --w-inc2 0.05 --version toy3ssr100scale --seed 0 --iters 15000 --dataset cifar10 --n_classes 2 --cnn 1 --divmean 0 --target cifar10_binary_toy3ssr100scale_abp_sign_i1_bce_b200_lrc0.025_lrf0.05_nb2_nw0_dm0_upc1_upf1_ucf32_normal_0 --updated_fc_ratio 10 --updated_conv_ratio 20 --verbose_iter 500 --freeze_layer 1 --resume --source cifar10_binary_toy3srr100scale_abp_sign_i1_bce_b200_lrc0.025_lrf0.1_nb2_nw0_dm0_upc1_upf1_ucf32_normal_0 --lr 0.001 --bp_layer 3  --aug 1  --reinit 3
    
    python train_cnn01_01.py --nrows 0.02 --localit 1 --updated_conv_features 32 --updated_fc_features 128 --updated_fc_nodes 1 --updated_conv_nodes 1 --width 100 --normalize 0 --percentile 1 --fail_count 1 --loss bce --act sign --fc_diversity 1 --init normal --no_bias 2 --scale 1 --w-inc1 0.05 --w-inc2 0.05 --version toy3sss100scale --seed 0 --iters 15000 --dataset cifar10 --n_classes 2 --cnn 1 --divmean 0 --target cifar10_binary_toy3sss100scale_abp_sign_i1_bce_b200_lrc0.05_lrf0.05_nb2_nw0_dm0_upc1_upf1_ucf32_normal_0 --updated_fc_ratio 10 --updated_conv_ratio 20 --verbose_iter 500 --freeze_layer 2 --resume --source cifar10_binary_toy3ssr100scale_abp_sign_i1_bce_b200_lrc0.025_lrf0.05_nb2_nw0_dm0_upc1_upf1_ucf32_normal_0  --lr 0.001 --bp_layer 2  --aug 1  --reinit 2
    
    python train_cnn01_01.py --nrows 0.02 --localit 1 --updated_conv_features 32 --updated_fc_features 128 --updated_fc_nodes 1 --updated_conv_nodes 1 --width 100 --normalize 0 --percentile 1 --fail_count 1 --loss bce --act sign --fc_diversity 1 --init normal --no_bias 2 --scale 1 --w-inc1 0.05 --w-inc2 0.7 --version toy3ssss100scale --seed 0 --iters 15000 --dataset cifar10 --n_classes 2 --cnn 1 --divmean 0 --target cifar10_binary_toy3ssss100scale_abp_sign_i1_bce_b200_lrc0.05_lrf0.7_nb2_nw0_dm0_upc1_upf1_ucf32_normal_0 --updated_fc_ratio 10 --updated_conv_ratio 20 --verbose_iter 500 --freeze_layer 3 --resume --source cifar10_binary_toy3sss100scale_abp_sign_i1_bce_b200_lrc0.05_lrf0.05_nb2_nw0_dm0_upc1_upf1_ucf32_normal_0 --lr 0.001 --bp_layer 1  --aug 1  --reinit 1
    
#### Notes
- `--nrows 0.02` gives a batch of 200 out of 10000 rows (batch size). Currently, 200 is the optimal batch size for ABP.
- `--aug 1` enables data augmentation; this is necessary.
- For a CNN model, set `--cnn 1`.
- Each phase freezes the layers that SCD trained in earlier phases. In the first phase, no layer is frozen and SCD trains the first layer. In the second phase, the first layer is frozen and is not trained any more, and SCD works on the second layer. The weights of the 3 layers from the third layer onward (the third, fourth, and fifth) are re-initialized, so set `--reinit 3`. These layers are then trained by BP, so set `--bp_layer 3`.
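
The per-phase flag pattern in the four commands above can be summarized with a short Python sketch. The helpers `abp_phase_flags` and `to_cli` are hypothetical and not part of the repository, and the target names in the usage lines are placeholders. The sketch only reproduces the schedule visible in the commands: `--freeze_layer` counts up from 0, `--bp_layer` and `--reinit` count down, and each later phase resumes from the previous phase's `--target`.

```python
def abp_phase_flags(targets):
    """Build the phase-dependent flags for each ABP phase (illustrative only).

    targets -- one checkpoint name per phase, in training order.
    """
    n_phases = len(targets)
    phases = []
    for k, target in enumerate(targets):
        # Number of trailing layers trained by BP in this phase: 4, 3, 2, 1.
        remaining = n_phases - k
        flags = {"target": target, "freeze_layer": k, "bp_layer": remaining}
        if k > 0:
            # Later phases warm-start from the previous phase's checkpoint and
            # re-initialize the layers that BP will train again.
            flags.update(resume=True, source=targets[k - 1], reinit=remaining)
        phases.append(flags)
    return phases


def to_cli(flags):
    """Render a flag dict as a command-line fragment; True means a bare switch."""
    parts = []
    for name, value in flags.items():
        parts.append(f"--{name}" if value is True else f"--{name} {value}")
    return " ".join(parts)


# Placeholder target names; the real ones are the long names in the commands.
for flags in abp_phase_flags(["phase1", "phase2", "phase3", "phase4"]):
    print(to_cli(flags))
```

The remaining flags (`--version`, `--w-inc1`, `--w-inc2`, and so on) also change between phases in the commands above and would still have to be set by hand.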