
Strange reproduced results of Swin transformer #6

Closed
xiangyu8 opened this issue Jan 20, 2022 · 6 comments

@xiangyu8
Hi authors,
I have reproduced all results based on your code. Most of them are consistent with the reported results, except for the Swin Transformer. Below are some results (reported results in brackets):
Trained with 8 GPUs (A100):
CIFAR10: 75.00 (59.47), CIFAR100: 52.26 (53.28), SVHN: 38.10 (71.60)
Trained with 4 GPUs:
CIFAR10: 81.91 (59.47), CIFAR100: 62.30 (53.28), SVHN: 91.29 (71.60)
From the results above, it seems that the batch size affects Swin a lot. All reproduced results are comparable with ViT (e.g., ViT on CIFAR10 with 8 GPUs: 77.00 (71.70)). Do you have any idea what the reason might be?
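For context, the gap fits the fact that under distributed data-parallel training the effective batch size grows with the number of processes. A minimal sketch, assuming a hypothetical per-GPU batch size of 128 (the actual value comes from the training config):

```python
import torch.distributed as dist

# Assumed per-GPU batch size, for illustration only; the real value is set in the training config.
PER_GPU_BATCH_SIZE = 128

def effective_batch_size() -> int:
    """Total batch size per optimizer step under DistributedDataParallel:
    each of the world_size processes contributes one per-GPU mini-batch."""
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    return PER_GPU_BATCH_SIZE * world_size

# With 4 GPUs the effective batch size is 4 * 128 = 512; with 8 GPUs it is 8 * 128 = 1024,
# so changing the GPU count also changes the optimization dynamics.
```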

yhlleo (Owner) commented Jan 20, 2022

@xiangyu8, that's interesting. I'm sorry that I do not have A100 GPUs to reproduce the experiments. Did you try with V100 GPUs? As you found, the batch size (and potentially the PyTorch version and the GPU device) can play a large role in the final results.

@xiangyu8 (Author)

@yhlleo I also tried V100 GPUs and got similar results (the results below are based on 4 GPUs). For CIFAR100, I got 61.91 on V100 (62.30 on A100). For CIFAR10, the experiment was interrupted at epoch 92, but it reached 81.40 on V100, compared with 81.65 on A100 at the same epoch.
Another observation is that the number of GPUs (i.e., the final batch size) matters a lot only for Swin: from 75.00 (8 GPUs) to 81.91 (4 GPUs). For other backbones, including ResNet-50, CvT, T2T, and ViT, the gap is less than 2 percentage points.
My environment on V100 is PyTorch 1.7.1, torchvision 0.8.2, and timm 0.4.12. I know the environment might affect the results, but not by 15.53 points.
Would it be convenient to share the log file for Swin on CIFAR10? I would like to compare it with mine to see if I can find the reason. I would appreciate any further thoughts on this.
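In case it helps the comparison, a small sketch for logging the environment next to each run (standard PyTorch/timm calls, nothing specific to this repo):

```python
import torch
import torchvision
import timm

# Record library versions and the GPU model so runs on different machines can be compared.
print("torch      :", torch.__version__)
print("torchvision:", torchvision.__version__)
print("timm       :", timm.__version__)
if torch.cuda.is_available():
    print("GPU        :", torch.cuda.get_device_name(0))
    print("#GPUs      :", torch.cuda.device_count())
```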

yhlleo (Owner) commented Jan 20, 2022

@xiangyu8, I will run Swin on CIFAR10 again and share the log file with you later. If we get similar results, it would indicate that Swin is sensitive to both the number of GPUs and the type of GPU device.

yhlleo (Owner) commented Jan 21, 2022

@xiangyu8, I uploaded the log files for 4 and 8 GPUs here. The results are similar to yours:

  • 8 V100 GPUs: 56.86
  • 4 V100 GPUs: 81.37

In the official code, some hyper-parameters are tied to the number of GPUs, so their effective values differ between the two settings:

            4 GPUs    8 GPUs
BASE_LR     0.0005    0.001
WARMUP_LR   5.0e-07   1.0e-05
MIN_LR      5.0e-06   1.0e-06

Obviously, the configuration with 4 GPUs is better.
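For reference, the official Swin training code derives such values by scaling the configured learning rates linearly with the total batch size. A rough sketch of that rule, assuming the official defaults (BASE_LR=5e-4, WARMUP_LR=5e-7, MIN_LR=5e-6) and a hypothetical 128 images per GPU; the exact config in this repo may differ:

```python
# Linear LR scaling as used in the official Swin Transformer training code:
# each learning rate is multiplied by total_batch_size / 512.
def scale_lrs(base_lr, warmup_lr, min_lr, batch_per_gpu, num_gpus, ref_batch=512):
    scale = batch_per_gpu * num_gpus / ref_batch
    return base_lr * scale, warmup_lr * scale, min_lr * scale

# With 128 images per GPU (assumed): 4 GPUs give scale 1.0, 8 GPUs give scale 2.0,
# which reproduces the BASE_LR row of the table above (0.0005 vs 0.001).
print(scale_lrs(5e-4, 5e-7, 5e-6, batch_per_gpu=128, num_gpus=4))
print(scale_lrs(5e-4, 5e-7, 5e-6, batch_per_gpu=128, num_gpus=8))
```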

@xiangyu8 (Author)

Thank you so much. The results for 4 GPUs are consistent now. The only remaining problem is that the results are not stable for 8 GPUs. Maybe you are right that Swin is sensitive to devices.

yhlleo (Owner) commented Jan 21, 2022

Yes, thanks for your interesting observation. I will close this issue for now. If you find any new insights, please don't hesitate to share them with me.

yhlleo closed this as completed Jan 21, 2022