
[Docs] Some questions about reproducing the results of LSKNet_T on the DOTA-1.0 dataset #51

Closed
gbdjxgp opened this issue Mar 28, 2024 · 8 comments

gbdjxgp commented Mar 28, 2024

Branch

master branch https://mmrotate.readthedocs.io/en/latest/

📚 The doc issue

Hello, I am reproducing your results on a single GPU. Following the instructions in your documentation, I use multi-scale training and load only the pre-trained backbone. The config file is LSKNet_T; I changed SyncBN to BN and scaled the learning rate from 0.0002 down to 0.0002/8. The training log is attached below. The resulting accuracy differs considerably from yours and does not reach the 0.852 in your log. Is there a problem with my hyperparameter settings? Looking forward to your reply.
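For concreteness, the changes described above would look roughly like the following MMRotate-style config override. This is a sketch assuming the base file `lsk_t_fpn_1x_dota_le90.py` and field names typical of MMRotate 0.x configs, not the repo's exact code:

```python
# A minimal sketch of the single-GPU overrides described above, assuming an
# MMRotate 0.x-style base config (exact file/field names should be checked
# against the LSKNet repository).
_base_ = ['./lsk_t_fpn_1x_dota_le90.py']

# SyncBN requires distributed training; fall back to plain BN on one GPU.
model = dict(backbone=dict(norm_cfg=dict(type='BN', requires_grad=True)))

# Linear scaling: the published schedule uses 8 GPUs, so divide lr by 8.
optimizer = dict(lr=0.0002 / 8)  # 0.000025
```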

Suggest a potential alternative/fix

No response

gbdjxgp (Author) commented Mar 28, 2024

Here is the log file:
20240327_220346.log

zcablii (Owner) commented Mar 28, 2024

You may face the same situation as #43. You can try to adjust the learning rate more or less to get better performance, but this may not compensate for the side effects of a small batch size.
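For readers following along: the lr/8 division discussed here follows the linear scaling rule (learning rate proportional to total batch size). A minimal sketch of the arithmetic, assuming the 2 images per GPU typical of MMRotate DOTA configs (the function and batch numbers are my own illustration, not code from the repo):

```python
# Illustration of the linear scaling rule: learning rate scales with the
# total batch size. Batch numbers assume 2 images/GPU, as in typical
# MMRotate DOTA configs; this is not code from the LSKNet repository.
def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate in proportion to the total batch size."""
    return base_lr * new_batch / base_batch

# Published setup: 8 GPUs x 2 images = 16 images/step at lr = 0.0002.
# A single GPU with 2 images/step then gives:
print(scaled_lr(0.0002, base_batch=16, new_batch=2))  # 2.5e-05
```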

gbdjxgp (Author) commented Mar 29, 2024

> You may face the same situation as #43. You can try to adjust the learning rate more or less to get better performance, but this may not compensate for the side effects of a small batch size.

Thank you very much for your reply. Following your suggestion, I increased the learning rate slightly. Since our experimental hardware is limited, we trained only the first three epochs on the multi-scale dataset and compared validation accuracy at the third epoch:

1. [Public training log](https://download.openmmlab.com/mmrotate/v1.0/lsknet/lsk_t_fpn_1x_dota_le90/lsk_t_fpn_1x_dota_le90_20230206.log), 8 GPUs: mAP = 0.766
2. 1 GPU, base lr/8 (0.000025): mAP = 0.731, [20240327_220346.log](https://github.com/zcablii/LSKNet/files/14785012/20240327_220346.log)
3. 1 GPU, base lr/4 (0.00005): mAP = 0.754, [20240328_182705.log](https://github.com/zcablii/LSKNet/files/14800878/20240328_182705.log)
4. 1 GPU, base lr/2 (0.0001): mAP = NaN, [20240328_223842.log](https://github.com/zcablii/LSKNet/files/14800896/20240328_223842.log)

Since increasing the learning rate gave some improvement at the third epoch, is raising the learning rate the right direction? And to reproduce the accuracy reported in the original paper as closely as possible, what other adjustments do I need to make? I also see in your log file that the warmup is 500 iterations. Does that mean I need a 4000-iteration warmup when using 1 GPU?
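For reference, the warmup being asked about lives in the MMCV-style lr_config of MMRotate 0.x. A hedged sketch of that block and of the hypothetical x8 rescaling raised in the question (field values other than warmup_iters=500 are typical defaults, not confirmed from the repo):

```python
# Hedged sketch of an MMCV/MMRotate 0.x lr_config; the 500-iteration warmup
# matches the published 8-GPU log, while 4000 is the naive x8 rescaling
# raised in the question above, not a verified recommendation.
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,      # value in the published 8-GPU schedule
    # warmup_iters=4000,   # hypothetical single-GPU rescaling (untested)
    warmup_ratio=1.0 / 3,
    step=[8, 11])          # typical 1x (12-epoch) step schedule
```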

zcablii (Owner) commented Mar 29, 2024

Adjusting the learning rate is an effective approach, and you can also increase the number of training epochs appropriately. I have never changed the warmup, so I am not sure whether it has a significant impact on performance.
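In config terms, "increase the number of training epochs" would look roughly like the following hedged sketch; the epoch counts are illustrative (a 1x schedule stretched to 24 epochs), not values from the LSKNet repo:

```python
# Hedged sketch of lengthening the schedule in an MMRotate 0.x config, e.g.
# stretching the 1x (12-epoch) run to 24 epochs; the rescaled decay steps
# are the usual proportional choice, not repo-verified values.
runner = dict(type='EpochBasedRunner', max_epochs=24)
lr_config = dict(step=[16, 22])  # lr decay epochs moved out proportionally
```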

zcablii closed this as completed Apr 2, 2024
yyq0828 commented Apr 9, 2024

@gbdjxgp Hi, for splitting DOTA-v1.0, did you use the code provided by the author? Could you share your split log? I would like to use it as a reference. Thank you. Here is my log:
20240409_232907.log

gbdjxgp (Author) commented Apr 14, 2024

> @gbdjxgp Hi, for splitting DOTA-v1.0, did you use the code provided by the author? Could you share your split log? […]

Just follow the documentation to do the splitting; I could not find my split file.
Split code
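For anyone else looking for the splitting step: to my knowledge, MMRotate ships an image-splitting tool for DOTA. A sketch of invoking it (the paths and preset names should be verified against your installed version, and the JSON's directory fields must be edited to local paths first):

```python
# Hedged pointer: MMRotate provides a DOTA splitting tool at
# tools/data/dota/split/img_split.py with JSON presets (e.g. ss_trainval.json
# for single-scale, ms_trainval.json for multi-scale); edit img_dirs,
# ann_dirs, and save_dir in the JSON before running.
import subprocess

subprocess.run(
    ['python', 'tools/data/dota/split/img_split.py',
     '--base-json', 'tools/data/dota/split/split_configs/ss_trainval.json'],
    check=True)
```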

CrazyBrick commented

> Thank you very much for your reply. Following your suggestion, I increased the learning rate slightly. […]

@gbdjxgp Hi, have you reproduced this work successfully? If so, what adjustments did you make?

gbdjxgp (Author) commented May 25, 2024


> @gbdjxgp Hi, have you reproduced this work successfully? If so, what adjustments did you make?

Hello, I did not run further experiments on the multi-scale dataset; I only ran experiments on the single-scale dataset. I would recommend setting the single-GPU learning rate to 0.00005; the result on the single-scale dataset should be around 0.755.
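Put as a command, the recommendation above would look roughly like this hedged sketch; the config path and the `--cfg-options` override key are assumed from MMRotate 0.x conventions and should be checked against the actual repo:

```python
# Hedged sketch of the suggested single-GPU, single-scale run; tools/train.py
# and --cfg-options follow MMRotate 0.x conventions, and the override key
# (optimizer.lr) should be verified against the actual config file.
import subprocess

subprocess.run(
    ['python', 'tools/train.py',
     'configs/lsknet/lsk_t_fpn_1x_dota_le90.py',
     '--cfg-options', 'optimizer.lr=0.00005'],
    check=True)
```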
