
Batchnormtrack Flag setting for cifar10 #10

Closed
jizongFox opened this issue May 8, 2019 · 11 comments



jizongFox commented May 8, 2019

Hi @xu-ji,
Thanks for this wonderful work. I am re-running your code and noticed that in commands.txt the CIFAR10 setting omits --batchnorm_track, while most of the other commands include it. I can understand freezing BN in a fine-tuning setting, but that is apparently not the case here. Could you tell me why BN has been frozen for this particular setting, training CIFAR10 from scratch?
Thanks in advance for your help.

xu-ji commented May 9, 2019

Setting --batchnorm_track means each batchnorm module in the network has track_running_stats set to True. So after calling net.eval(), inference on the evaluation data uses the batchnorm statistics accumulated during training. If it is False, the batchnorm statistics are computed from the evaluation batches on the fly (docs). There's no particular reason to prefer one over the other. It makes a negligible difference, e.g. 0.7% for CIFAR10, because the training and test sets are the same for most of the datasets in the fully unsupervised setting.
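The two modes can be sketched with a minimal pure-Python 1-D batch norm (an illustration of the concept, not the PyTorch implementation; the class name and momentum value are made up here):

```python
import statistics

class MiniBatchNorm:
    """Minimal 1-D batch norm sketch illustrating track_running_stats."""

    def __init__(self, track_running_stats=True, momentum=0.1, eps=1e-5):
        self.track_running_stats = track_running_stats
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0
        self.training = True

    def __call__(self, batch):
        if self.training or not self.track_running_stats:
            # Use statistics of the current batch ("on the fly").
            mean = statistics.fmean(batch)
            var = statistics.pvariance(batch)
        else:
            # eval() with tracking: reuse statistics accumulated in training.
            mean, var = self.running_mean, self.running_var
        if self.training and self.track_running_stats:
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        return [(x - mean) / (var + self.eps) ** 0.5 for x in batch]

# With tracking, eval-time normalisation uses training-time statistics;
# without it, each eval batch is normalised by its own statistics.
bn = MiniBatchNorm(track_running_stats=True)
bn([0.0, 2.0, 4.0])          # one "training" step updates running stats
bn.training = False           # net.eval()
out_tracked = bn([10.0, 12.0, 14.0])

bn_free = MiniBatchNorm(track_running_stats=False)
bn_free.training = False
out_batch = bn_free([10.0, 12.0, 14.0])
```

When the eval data is drawn from the same distribution as the training data, the two normalisations converge, which is why the flag matters so little in the fully unsupervised setting.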

The exception is STL, where the full training set contains many distractor classes in the unlabelled images that are not present in the test set. This is why, when setting --batchnorm_track for STL, it makes sense to set --double_eval too: it makes an additional pass through the test data (= training data for the main output head) without any training on the IIC loss. This updates the batchnorm statistics to the main output head's (= test) data before they are used in evaluation.
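The effect of that extra pass can be sketched as follows (a stdlib-only toy with illustrative numbers and momentum, not the actual IIC code): batchnorm stays in training mode so the running statistics are updated, but no loss is computed and no weights change.

```python
momentum = 0.1
running_mean, running_var = 0.0, 1.0   # left over from training on distractor-heavy data

# "Test" data with a shifted mean, seen in forward passes only (no backprop).
test_batches = [[9.0, 10.0, 11.0]] * 50

for batch in test_batches:
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    # Standard exponential-moving-average update of the running statistics.
    running_mean = (1 - momentum) * running_mean + momentum * mean
    running_var = (1 - momentum) * running_var + momentum * var

# running_mean has drifted to ~10.0, matching the evaluation data,
# before net.eval() freezes it for the actual evaluation pass.
```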

@xu-ji xu-ji closed this as completed May 13, 2019
@jizongFox

Hi,
Thanks for the reply.
I would like to ask you about another detail.

assert (all_imgs.requires_grad and all_imgs_tf.requires_grad)

In this line you require the input images to be differentiable, while in cluster_greyscale.py you don't. Could you explain why the gradient is required here?


xu-ji commented Jun 2, 2019

Hi, that's redundant, thanks for pointing it out. I've removed it.
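(Presumably the assert is redundant because training the weights only uses the input's value, never gradients with respect to the input itself. A toy sketch with y = w·x and loss L = y², computing the chain rule by hand:)

```python
def weight_grad(w, x):
    """Gradient of L = (w * x)**2 with respect to w, by the chain rule."""
    y = w * x            # forward pass
    return 2 * y * x     # dL/dw = dL/dy * dy/dw; x appears only as a value

# The weight gradient is well defined with x treated as a constant,
# i.e. the input never needs requires_grad for the parameters to train.
g = weight_grad(w=3.0, x=2.0)  # 2 * (3.0 * 2.0) * 2.0 = 24.0
```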


jizongFox commented Jun 14, 2019

Hi @xu-ji
Attached is a training summary for MNIST using your provided code. Is this normal?
[plots attached]


xu-ji commented Jun 14, 2019

Yes, it is.

@jizongFox

Is it normal to have an average accuracy of 84%?


xu-ji commented Jun 18, 2019

Sorry, I skimmed over that second graph. It’s ok but not as good as my reported model. My trained model:

[plots attached]

(By the way if you download the models you can see the plots and records.)

As you can see in that graph, the average is 98.4. Other MNIST models I've trained averaged 96.6, 92.0, 92.5 and 95.9.


jizongFox commented Jun 27, 2019

If I understand correctly, you ran several experiments and chose to show the best one.


xu-ji commented Jun 27, 2019

Yes, I ran a few experiments and show the best model.

@primecai

Can we say that as long as the distributions of the train and test sets stay the same, setting this flag or not should not make much difference? For example, if we split CIFAR10 into a 7:3 train-test partition, train on the train set, and test on the unseen test set with a batch size of 660, this flag should not affect performance much?


xu-ji commented Apr 13, 2020

If the test batches' statistics are representative of the training batches' statistics (same class distribution, same input distribution, same size) and training is given enough time for batchnorm stats to reflect the latest features, then yes theoretically there should not be a material difference between taking the test time batchnorm statistics from training batches or test batches. In practice, there may be a small difference.
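The "representative statistics" condition can be illustrated numerically. A small stdlib-only sketch (hypothetical distribution and momentum) shows that with 660-sample batches drawn from one distribution, a fresh test batch's mean lands very close to the running mean accumulated over training batches:

```python
import random

random.seed(0)

def draw_batch():
    # Both "train" and "test" batches come from the same distribution.
    return [random.gauss(5.0, 2.0) for _ in range(660)]

# Running mean accumulated over "training" batches (illustrative momentum).
momentum, running_mean = 0.1, 0.0
for _ in range(200):
    batch = draw_batch()
    running_mean = (1 - momentum) * running_mean + momentum * (sum(batch) / 660)

# A test batch from the same distribution has nearly the same mean, so
# normalising with either estimate gives almost identical outputs.
test_batch = draw_batch()
test_mean = sum(test_batch) / 660
gap = abs(running_mean - test_mean)
```

The gap shrinks with batch size (the batch mean's spread scales as sigma/sqrt(n)), which is why large batches like 660 make the flag even less consequential; distribution shift or tiny batches would widen it.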
