
Different behaviours on VisDA-2017 using different pretrained models from timm #5

Open
swift-n-brutal opened this issue Sep 15, 2022 · 5 comments


@swift-n-brutal

Thanks for the great work. I ran into two problems when running the ViT experiment on VisDA-2017.

  1. It seems that the ViT backbone does not match the bottleneck when --no-pool is set. The output of the ViT backbone is a sequence of tokens instead of a single class token, so the BatchNorm1d layer complains about the dimension.
  2. I fixed the first problem by adding a pool layer that extracts the class token (a standalone sketch of the mismatch and this fix is at the end of this comment):
    pool_layer = (lambda _x: _x[:, 0]) if args.no_pool else None
    Then I used the exact command from examples/run_visda.sh to run CDAN_MCC_SDAT:
    python cdan_mcc_sdat.py data/visda-2017 -d VisDA2017 -s Synthetic -t Real -a vit_base_patch16_224 --epochs 15 --seed 0 --lr 0.002 --per-class-eval --train-resizing cen.crop --log logs/cdan_mcc_sdat_vit/VisDA2017 --log_name visda_cdan_mcc_sdat_vit --gpu 0 --no-pool --rho 0.02 --log_results
    Finally I got a slightly lower accuracy than reported:
    global correct: 86.0
    mean correct: 88.3
    mean IoU: 78.5
    +------------+-------------------+--------------------+
    |   class    |        acc        |        iou         |
    +------------+-------------------+--------------------+
    | aeroplane  | 97.83323669433594 | 96.3012924194336   |
    | bicycle    | 88.43165588378906 | 81.25331115722656  |
    | bus        | 81.79104614257812 | 72.69281768798828  |
    | car        | 78.06941986083984 | 67.53160095214844  |
    | horse      | 97.31400299072266 | 92.78455352783203  |
    | knife      | 96.91566467285156 | 82.31681823730469  |
    | motorcycle | 94.9102783203125  | 83.37374877929688  |
    | person     | 81.3499984741211  | 58.12790298461914  |
    | plant      | 94.04264831542969 | 89.68553161621094  |
    | skateboard | 95.87899780273438 | 81.48286437988281  |
    | train      | 94.05099487304688 | 87.69535064697266  |
    | truck      | 59.04830551147461 | 48.311458587646484 |
    +------------+-------------------+--------------------+
    test_acc1 = 86.0
    I notice that the number of epochs is set to 15 in the script. Is this experiment setting correct, and how can I reproduce the reported accuracy? Many thanks.
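To make point 1 concrete, here is a minimal, self-contained sketch (my own reproduction, not code from this repo) of the shape mismatch and of the class-token fix. It assumes the timm 0.6.x behaviour and uses an illustrative 256-d BatchNorm bottleneck; the actual bottleneck in cdan_mcc_sdat.py may differ.

```python
# Minimal sketch of the mismatch and fix (assumes timm 0.6.x; the 256-d bottleneck
# below is only illustrative, not the exact one used in cdan_mcc_sdat.py).
import timm
import torch
import torch.nn as nn

backbone = timm.create_model('vit_base_patch16_224', pretrained=False,
                             num_classes=0, global_pool='')
images = torch.randn(2, 3, 224, 224)
features = backbone(images)
print(features.shape)  # timm 0.6.x: (2, 197, 768) -- a token sequence, not a (2, 768) vector

bottleneck = nn.Sequential(nn.Linear(768, 256), nn.BatchNorm1d(256), nn.ReLU())
# bottleneck(features) fails here: after the Linear the tensor is (2, 197, 256), and
# BatchNorm1d treats dim 1 (here 197) as the channel dim, so it complains about the dimension.

pool_layer = lambda x: x[:, 0]          # extract the class token -> (2, 768)
out = bottleneck(pool_layer(features))  # (2, 256)
print(out.shape)
```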
@swift-n-brutal
Author

After some research, the first problem has been resolved. The default behaviour of ViT.forward() changed between timm versions: when global_pool='', the backbone returns the class token x[:, 0] in timm v0.5.x, while it returns the full token sequence x in timm v0.6.7.
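For what it's worth, a pool layer along the following lines (my own hedged sketch, not something from the repo) tolerates both behaviours, so the same script works under either timm version:

```python
# Hedged sketch: a class-token pool that covers both timm behaviours noted above.
# timm v0.5.x with global_pool='': backbone already returns the class token, shape (B, C).
# timm v0.6.7 with global_pool='': backbone returns the full token sequence, shape (B, N, C).
def class_token_pool(features):
    # Pass 2-D features through unchanged; otherwise take the class token at index 0.
    return features if features.ndim == 2 else features[:, 0]

# Usage in place of the lambda above (the --no-pool branch of the run script):
# pool_layer = class_token_pool if args.no_pool else None
```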

@rangwani-harsh
Contributor

Hi @swift-n-brutal, are you now able to get the correct accuracy on the VisDA dataset?

@swift-n-brutal
Author

I can get a result close to the reported one (89.8%) for CDAN+MCC+SDAT on VisDA, but I observed a strange behaviour. As shown in the image below, the validation accuracy (not the mAP) keeps going down as training proceeds, and the best result (mAP 89.9%) is achieved only at the first epoch. I then wondered whether the pretrained model was the problem. I tested two models: vit_g ('https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_224.npz') from timm=0.5.x and vit_jx ('https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_p16_224-80ecf9dd.pth') from timm=0.4.9. For vit_g the accuracy goes down, while for vit_jx the accuracy increases, but its final mAP (88.6%) is much lower than the former.

[image: val top1 acc curves]
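One way to rule out the timm version as a factor is to pin the pretrained checkpoint explicitly rather than relying on whatever pretrained=True resolves to in the installed timm version. Below is a rough sketch of my own (untested against this repo) for the jx .pth weights; the augreg .npz checkpoint would instead need timm's own npz loading path.

```python
# Rough sketch: load a pinned ImageNet checkpoint into the backbone instead of
# relying on the default weights of the installed timm version. Shown for the
# jx .pth weights; the augreg .npz file needs timm's npz loader instead.
import timm
import torch

JX_URL = ('https://github.com/rwightman/pytorch-image-models/releases/download/'
          'v0.1-vitjx/jx_vit_base_p16_224-80ecf9dd.pth')

model = timm.create_model('vit_base_patch16_224', pretrained=False,
                          num_classes=0, global_pool='')
state_dict = torch.hub.load_state_dict_from_url(JX_URL, map_location='cpu')
# strict=False because num_classes=0 drops the classifier head present in the checkpoint
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(unexpected)  # expected to list only the head parameters
```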

@swift-n-brutal swift-n-brutal changed the title ViT on VisDA-2017 Different behaviours on VisDA-2017 using different pretrained models from timm Oct 9, 2022
@Wangzs0228


Have you found the reason? Is it a flaw in the model, or a problem with how we run the experiment?

@swift-n-brutal
Author

@Wangzs0228 It is almost certain that smoothness regularization is beneficial to transferability, robustness, generalization, etc., but for a specific task the results may vary. I have not been working on this task recently; you can run the experiments yourself to see whether the results match your expectations.
