Strange reproduced results of Swin transformer #6

Hi authors,
I have reproduced all the results based on your code. Most of them are consistent with the reported numbers, except for the Swin transformer. Below are some of my results (reported results in brackets):
Trained with 8 GPUs (A100):
CIFAR10: 75.00 (59.47), CIFAR100: 52.26 (53.28), SVHN: 38.10 (71.60)
Trained with 4 GPUs:
CIFAR10: 81.91 (59.47), CIFAR100: 62.30 (53.28), SVHN: 91.29 (71.60)
From the results above, it seems that the batch size affects Swin a lot. All the reproduced results are comparable with ViT (e.g., ViT on CIFAR10 with 8 GPUs: 77.00 (71.70)). Do you have any idea what the reason might be?

Comments
@xiangyu8, it's interesting. I'm sorry that I do not have A100 GPUs to reproduce the experiments. Did you try with V100 GPUs? As you found, the batch size (and potentially the PyTorch version and the GPU device) can play a great role in the final results.
@yhlleo I also tried V100 GPUs and got similar results (the results below are based on 4 GPUs). For CIFAR100, I got 61.91 on V100 (62.3 on A100). For CIFAR10, the experiment was interrupted at epoch 92, but it reached 81.4 on V100, compared with 81.65 on A100 at epoch 92.
@xiangyu8, I will run Swin again on CIFAR10 and share the log file with you later. If we get similar results, it would indicate that Swin is sensitive to both the number of GPUs and the type of GPU device.
@xiangyu8, I uploaded the log files for 4 and 8 GPUs here. The results are similar to yours:
In the official code, some hyper-parameters are tied to the number of GPUs, so they differ between the two settings. Obviously, the configuration with 4 GPUs is better.
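For context, here is a minimal sketch of how such GPU-count-dependent hyper-parameters can arise. It assumes this codebase inherits the linear learning-rate scaling rule used in the official Swin-Transformer repository (lr scaled by total batch size / 512); the base LR and per-GPU batch size below are illustrative values, not taken from this thread:

```python
# Sketch: why the same config can behave differently with 4 vs. 8 GPUs.
# Assumes a fixed per-GPU batch size and the linear LR scaling rule from
# the official Swin-Transformer repo; the constants are illustrative.

BASE_LR = 5e-4          # illustrative base learning rate
PER_GPU_BATCH = 128     # illustrative per-GPU batch size

for num_gpus in (4, 8):
    total_batch = PER_GPU_BATCH * num_gpus      # effective batch size grows with GPU count
    scaled_lr = BASE_LR * total_batch / 512.0   # linear scaling rule
    print(f"{num_gpus} GPUs -> total batch {total_batch}, scaled LR {scaled_lr:.2e}")
```

Under these assumptions, moving from 4 to 8 GPUs doubles both the effective batch size and the scaled learning rate, which would be consistent with the gap observed above.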
Thank you so much. The results for 4 GPUs are consistent now. The only remaining problem is that the results are not stable with 8 GPUs. Maybe you are right that Swin is sensitive to devices.
Yes, thanks for your interesting observation. I will close this issue for now. If you find any new insights, please don't hesitate to share them with me.
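For anyone who wants to probe the device sensitivity discussed above, here is a minimal PyTorch sketch (independent of this repository's code) that pins down the usual sources of nondeterminism before comparing runs across GPU types. Note in particular that Ampere GPUs such as the A100 use TF32 matmuls by default in some PyTorch versions, which is one known source of numeric differences versus the V100:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Fix the common RNG sources so cross-device runs differ only in hardware."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for reproducibility in cuDNN kernel selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # On Ampere GPUs (e.g., A100), TF32 matmuls are a further source of
    # numeric drift vs. V100; disabling them makes runs more comparable.
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False


seed_everything(42)
```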