
[scheduler] #43

Open · 2 tasks
spurs30 opened this issue Mar 12, 2024 · 3 comments

Comments

@spurs30

spurs30 commented Mar 12, 2024

Model/Dataset/Scheduler description

Your paper is very well written, and I appreciate you open-sourcing the code. Your experiments used 8 GPUs, with norm_cfg set to SyncBN during training. In my setup I don't have that many GPUs: with two GPUs and the learning rate scaled down accordingly (to 1/4 of the original), the mAP of the model trained from the backbone you provided is roughly 1% lower than the value reported in the paper. Is the number of GPUs the main reason for this gap? Increasing the learning rate improved the mAP somewhat, but it still didn't reach the reported level. I also noticed in your log files that the training set is labeled trainval_2 rather than the usual trainval. Did you apply any additional operations during data processing?
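For context on the first question, the relevant overrides in my 2-GPU run look roughly like this (the base config name and the concrete numbers are placeholders for my local files, not your released config):

```python
# my_2gpu_override.py -- sketch of my local override config; the _base_ file name
# and the numbers below are placeholders, not the repo's released settings.
_base_ = ['./released_dota_config.py']  # hypothetical name for the released config

# Released schedule: 8 GPUs x 2 imgs/GPU = total batch 16.
# My run: 2 GPUs x 2 imgs/GPU = total batch 4, so I scaled the lr to 1/4.
data = dict(samples_per_gpu=2, workers_per_gpu=2)

# mmcv-style config inheritance merges dicts, so only lr is overridden here.
optimizer = dict(lr=0.0002 / 4)  # 0.0002 stands in for the released base lr

# norm_cfg left as SyncBN, matching the released config
model = dict(
    backbone=dict(norm_cfg=dict(type='SyncBN', requires_grad=True)),
)
```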

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

No response

@zcablii
Owner

zcablii commented Mar 12, 2024

Thank you for your kind words and for engaging with our work!

Regarding your query, the "trainval_2" label refers to a standard training-validation split, with no additional or undisclosed operations applied beyond what was described in the paper.

You've raised an important point about how hard it can be to reproduce results across different hardware setups, which is a common issue in practice. Changing the number of GPUs directly changes the total batch size, and we believe this is the main reason for the gap you're seeing. Adjusting the learning rate is the standard way to compensate for a smaller batch size, but finding a setting that exactly reproduces the originally reported numbers can be tricky, and unfortunately such adjustments may not fully offset the drawbacks of training with a smaller batch.
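For reference, the adjustment you made corresponds to the usual linear scaling rule; here is a small sketch with illustrative numbers (placeholders, not the exact values from our configs):

```python
# Linear scaling rule: scale the learning rate proportionally to the total batch size.
# The concrete numbers below are illustrative placeholders, not our exact settings.
paper_batch = 8 * 2   # 8 GPUs x 2 images per GPU in the released schedule
paper_lr = 2e-4       # placeholder for the base learning rate

your_batch = 2 * 2    # 2 GPUs x 2 images per GPU
your_lr = paper_lr * your_batch / paper_batch
print(your_lr)        # 5e-05, i.e. 1/4 of the base lr -- the scaling you applied
```

Recent mmdetection versions also expose an `auto_scale_lr` option (with a `base_batch_size` field) that applies this rule automatically; whether it is available depends on the mmdetection/mmrotate version you are running.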

@spurs30
Author

spurs30 commented Mar 12, 2024


You make a valid point; simply adjusting the learning rate may not completely solve the issues caused by a small batch size. I therefore switched from SyncBN to GroupNorm (GN) during training. However, since the ImageNet pretraining was done with BatchNorm at a batch size of 128, the pretrained BN statistics and weights cannot be reused by GN layers, so training on the DOTA dataset with GN is akin to starting from scratch, and the final performance is even worse than with SyncBN.
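Concretely, the change I tried looks roughly like the following (the model fields are placeholders following the usual mmdetection `norm_cfg` convention, not your exact config):

```python
# Replace SyncBN with GroupNorm in norm_cfg -- illustrative only; the surrounding
# model fields are placeholders, not this repo's actual config.

# before: norm_cfg = dict(type='SyncBN', requires_grad=True)
norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)  # independent of per-GPU batch size

model = dict(
    backbone=dict(norm_cfg=norm_cfg),  # BN stats/weights from ImageNet pretraining no longer load here
    neck=dict(norm_cfg=norm_cfg),
)
```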

Moreover, even using GroupNorm during pretraining on ImageNet would lower the accuracy of the pretrained model. I haven't found a good solution to mitigate the problems caused by the small batch size. Could you provide some suggestions?

@zcablii
Owner

zcablii commented Mar 13, 2024

I'm currently not sure how to resolve it, but I'll leave this issue open in hopes that someone with the right expertise might offer a potential solution. If anyone knows how to address this, your suggestion would be greatly appreciated!
