
Strange reproduced results of Swin transformer #6

Closed
xiangyu8 opened this issue Jan 20, 2022 · 6 comments

@xiangyu8
Hi authors,
I have reproduced all results based on your code. Most of them are consistent with the reported results, except for the Swin Transformer. Below are some results (reported results in brackets):
Trained with 8 GPUs (A100):
CIFAR10: 75.00 (59.47), CIFAR100: 52.26 (53.28), SVHN: 38.10 (71.60)
Trained with 4 GPUs:
CIFAR10: 81.91 (59.47), CIFAR100: 62.30 (53.28), SVHN: 91.29 (71.60)
From the results above, it seems that the batch size affects Swin a lot. All reproduced results are comparable with ViT (e.g., ViT on CIFAR10 with 8 GPUs: 77.00 (71.70)). Do you have any idea what the reason might be?
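For context, the gap fits the fact that under distributed data-parallel training the effective batch size grows with the number of processes. A minimal sketch, assuming a hypothetical per-GPU batch size of 128 (the actual value comes from the training config):

```python
import torch.distributed as dist

# Assumed per-GPU batch size, for illustration only; the real value is set in the training config.
PER_GPU_BATCH_SIZE = 128

def effective_batch_size() -> int:
    """Total batch size per optimizer step under DistributedDataParallel:
    each of the world_size processes contributes one per-GPU mini-batch."""
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    return PER_GPU_BATCH_SIZE * world_size

# With 4 GPUs the effective batch size is 4 * 128 = 512; with 8 GPUs it is 8 * 128 = 1024,
# so changing the GPU count also changes the optimization dynamics.
```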

yhlleo (Owner) commented Jan 20, 2022

@xiangyu8, that's interesting. I'm sorry that I do not have A100 GPUs to reproduce the experiments. Did you try with V100 GPUs? As you found, the batch size (and potentially the PyTorch version and the GPU device) can play a large role in the final results.

@xiangyu8 (Author)

@yhlleo I also tried V100 GPUs and got similar results (the results below are based on 4 GPUs). For CIFAR100, I got 61.91 on V100 (62.30 on A100). For CIFAR10, the experiment was interrupted at epoch 92, but it reached 81.40 on V100, compared with 81.65 on A100 at the same epoch.
Another observation is that the number of GPUs (i.e., the final batch size) matters a lot only for Swin: from 75.00 (8 GPUs) to 81.91 (4 GPUs). For other backbones, including ResNet-50, CvT, T2T, and ViT, the gap is less than 2 percentage points.
My environment on V100 is PyTorch 1.7.1, torchvision 0.8.2, and timm 0.4.12. I know the environment might affect the results, but not by 15.53 points.
Would it be convenient to share the log file for Swin on CIFAR10? I would like to compare it with mine to see if I can find the reason. I would appreciate any further thoughts on this.
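In case it helps the comparison, a small sketch for logging the environment next to each run (standard PyTorch/timm calls, nothing specific to this repo):

```python
import torch
import torchvision
import timm

# Record library versions and the GPU model so runs on different machines can be compared.
print("torch      :", torch.__version__)
print("torchvision:", torchvision.__version__)
print("timm       :", timm.__version__)
if torch.cuda.is_available():
    print("GPU        :", torch.cuda.get_device_name(0))
    print("#GPUs      :", torch.cuda.device_count())
```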

yhlleo (Owner) commented Jan 20, 2022

@xiangyu8, I will run Swin on CIFAR10 again and share the log file with you later. If we get similar results, it would indicate that Swin is sensitive to both the number of GPUs and the type of GPU device.

yhlleo (Owner) commented Jan 21, 2022

@xiangyu8, I uploaded the log files for 4 and 8 GPUs here. The results are similar to yours:

  • 8 V100 GPUs: 56.86
  • 4 V100 GPUs: 81.37

In the official code, some hyper-parameters are tied to the number of GPUs, so their effective values differ between the two settings:

            4 GPUs    8 GPUs
BASE_LR     0.0005    0.001
WARMUP_LR   5.0e-07   1.0e-05
MIN_LR      5.0e-06   1.0e-06

Obviously, the configuration with 4 GPUs is better.
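For reference, the official Swin training code derives such values by scaling the configured learning rates linearly with the total batch size. A rough sketch of that rule, assuming the official defaults (BASE_LR=5e-4, WARMUP_LR=5e-7, MIN_LR=5e-6) and a hypothetical 128 images per GPU; the exact config in this repo may differ:

```python
# Linear LR scaling as used in the official Swin Transformer training code:
# each learning rate is multiplied by total_batch_size / 512.
def scale_lrs(base_lr, warmup_lr, min_lr, batch_per_gpu, num_gpus, ref_batch=512):
    scale = batch_per_gpu * num_gpus / ref_batch
    return base_lr * scale, warmup_lr * scale, min_lr * scale

# With 128 images per GPU (assumed): 4 GPUs give scale 1.0, 8 GPUs give scale 2.0,
# which reproduces the BASE_LR row of the table above (0.0005 vs 0.001).
print(scale_lrs(5e-4, 5e-7, 5e-6, batch_per_gpu=128, num_gpus=4))
print(scale_lrs(5e-4, 5e-7, 5e-6, batch_per_gpu=128, num_gpus=8))
```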

@xiangyu8 (Author)

Thank you so much. The results for 4 GPUs are consistent now. The only remaining problem is that the results are not stable for 8 GPUs. Maybe you are right that Swin is sensitive to devices.

yhlleo (Owner) commented Jan 21, 2022

Yes, thanks for your interesting observation. I will close this issue for now. If you find any new insights, please don't hesitate to share them with me.

yhlleo closed this as completed Jan 21, 2022