
[Docs] Some questions about reproducing the results of LSKNet_T on the DOTA-1.0 dataset #51

Closed
gbdjxgp opened this issue Mar 28, 2024 · 8 comments

gbdjxgp commented Mar 28, 2024

Branch

master branch https://mmrotate.readthedocs.io/en/latest/

📚 The doc issue

Hello, I am reproducing your results on a single GPU. Following the instructions in your documentation, I use multi-scale training and load only the pre-trained backbone. The config file is LSKNet_T; I changed SyncBN to BN and scaled the learning rate from 0.0002 down to 0.0002/8. The training log is attached below. The resulting accuracy differs considerably from yours and does not reach the 0.852 in your log. Is there a problem with my hyperparameter settings? Looking forward to your reply.
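For concreteness, the changes described above would look roughly like the following MMRotate-style config override. This is a sketch assuming the base file `lsk_t_fpn_1x_dota_le90.py` and field names typical of MMRotate 0.x configs, not the repo's exact code:

```python
# A minimal sketch of the single-GPU overrides described above, assuming an
# MMRotate 0.x-style base config (exact file/field names should be checked
# against the LSKNet repository).
_base_ = ['./lsk_t_fpn_1x_dota_le90.py']

# SyncBN requires distributed training; fall back to plain BN on one GPU.
model = dict(backbone=dict(norm_cfg=dict(type='BN', requires_grad=True)))

# Linear scaling: the published schedule uses 8 GPUs, so divide lr by 8.
optimizer = dict(lr=0.0002 / 8)  # 0.000025
```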

Suggest a potential alternative/fix

No response

gbdjxgp (Author) commented Mar 28, 2024

Here is the log file:
20240327_220346.log

zcablii (Owner) commented Mar 28, 2024

You may face the same situation as #43. You can try to adjust the learning rate more or less to get better performance, but this may not compensate for the side effects of a small batch size.
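For readers following along: the lr/8 division discussed here follows the linear scaling rule (learning rate proportional to total batch size). A minimal sketch of the arithmetic, assuming the 2 images per GPU typical of MMRotate DOTA configs (the function and batch numbers are my own illustration, not code from the repo):

```python
# Illustration of the linear scaling rule: learning rate scales with the
# total batch size. Batch numbers assume 2 images/GPU, as in typical
# MMRotate DOTA configs; this is not code from the LSKNet repository.
def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate in proportion to the total batch size."""
    return base_lr * new_batch / base_batch

# Published setup: 8 GPUs x 2 images = 16 images/step at lr = 0.0002.
# A single GPU with 2 images/step then gives:
print(scaled_lr(0.0002, base_batch=16, new_batch=2))  # 2.5e-05
```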

gbdjxgp (Author) commented Mar 29, 2024

> You may face the same situation as #43. You can try to adjust the learning rate more or less to get better performance, but this may not compensate for the side effects of a small batch size.

Thank you very much for your reply. Following your suggestion, I increased the learning rate slightly. Since our experimental hardware is limited, we trained only the first three epochs on the multi-scale dataset and compared validation accuracy at the third epoch:

1. [Public training log](https://download.openmmlab.com/mmrotate/v1.0/lsknet/lsk_t_fpn_1x_dota_le90/lsk_t_fpn_1x_dota_le90_20230206.log), 8 GPUs: mAP = 0.766
2. 1 GPU, base lr/8 (0.000025): mAP = 0.731, [20240327_220346.log](https://github.com/zcablii/LSKNet/files/14785012/20240327_220346.log)
3. 1 GPU, base lr/4 (0.00005): mAP = 0.754, [20240328_182705.log](https://github.com/zcablii/LSKNet/files/14800878/20240328_182705.log)
4. 1 GPU, base lr/2 (0.0001): mAP = NaN, [20240328_223842.log](https://github.com/zcablii/LSKNet/files/14800896/20240328_223842.log)

Since increasing the learning rate gave some improvement at the third epoch, is raising the learning rate the right direction? And to reproduce the accuracy reported in the original paper as closely as possible, what other adjustments do I need to make? I also see in your log file that the warmup is 500 iterations. Does that mean I need a 4000-iteration warmup when using 1 GPU?
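For reference, the warmup being asked about lives in the MMCV-style lr_config of MMRotate 0.x. A hedged sketch of that block and of the hypothetical x8 rescaling raised in the question (field values other than warmup_iters=500 are typical defaults, not confirmed from the repo):

```python
# Hedged sketch of an MMCV/MMRotate 0.x lr_config; the 500-iteration warmup
# matches the published 8-GPU log, while 4000 is the naive x8 rescaling
# raised in the question above, not a verified recommendation.
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,      # value in the published 8-GPU schedule
    # warmup_iters=4000,   # hypothetical single-GPU rescaling (untested)
    warmup_ratio=1.0 / 3,
    step=[8, 11])          # typical 1x (12-epoch) step schedule
```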

zcablii (Owner) commented Mar 29, 2024

Adjusting the learning rate is an effective approach, and you can also increase the number of training epochs appropriately. I have never changed the warmup, so I am not sure whether it has a significant impact on performance.
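In config terms, "increase the number of training epochs" would look roughly like the following hedged sketch; the epoch counts are illustrative (a 1x schedule stretched to 24 epochs), not values from the LSKNet repo:

```python
# Hedged sketch of lengthening the schedule in an MMRotate 0.x config, e.g.
# stretching the 1x (12-epoch) run to 24 epochs; the rescaled decay steps
# are the usual proportional choice, not repo-verified values.
runner = dict(type='EpochBasedRunner', max_epochs=24)
lr_config = dict(step=[16, 22])  # lr decay epochs moved out proportionally
```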

zcablii closed this as completed Apr 2, 2024
yyq0828 commented Apr 9, 2024

@gbdjxgp Hi, for splitting DOTA-v1.0, did you use the code provided by the author? Could you share your split log? I would like to use it as a reference. Thank you. Here is my log:
20240409_232907.log

gbdjxgp (Author) commented Apr 14, 2024

> @gbdjxgp Hi, for splitting DOTA-v1.0, did you use the code provided by the author? Could you share your split log? […]

Just follow the documentation to do the splitting; I could not find my split file.
Split code
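For anyone else looking for the splitting step: to my knowledge, MMRotate ships an image-splitting tool for DOTA. A sketch of invoking it (the paths and preset names should be verified against your installed version, and the JSON's directory fields must be edited to local paths first):

```python
# Hedged pointer: MMRotate provides a DOTA splitting tool at
# tools/data/dota/split/img_split.py with JSON presets (e.g. ss_trainval.json
# for single-scale, ms_trainval.json for multi-scale); edit img_dirs,
# ann_dirs, and save_dir in the JSON before running.
import subprocess

subprocess.run(
    ['python', 'tools/data/dota/split/img_split.py',
     '--base-json', 'tools/data/dota/split/split_configs/ss_trainval.json'],
    check=True)
```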

CrazyBrick commented

> Thank you very much for your reply. Following your suggestion, I increased the learning rate slightly. […]

@gbdjxgp Hi, have you reproduced this work successfully? If so, what adjustments did you make?

gbdjxgp (Author) commented May 25, 2024


> @gbdjxgp Hi, have you reproduced this work successfully? If so, what adjustments did you make?

Hello, I did not run further experiments on the multi-scale dataset; I only ran experiments on the single-scale dataset. I would recommend setting the single-GPU learning rate to 0.00005; the result on the single-scale dataset should be around 0.755.
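Put as a command, the recommendation above would look roughly like this hedged sketch; the config path and the `--cfg-options` override key are assumed from MMRotate 0.x conventions and should be checked against the actual repo:

```python
# Hedged sketch of the suggested single-GPU, single-scale run; tools/train.py
# and --cfg-options follow MMRotate 0.x conventions, and the override key
# (optimizer.lr) should be verified against the actual config file.
import subprocess

subprocess.run(
    ['python', 'tools/train.py',
     'configs/lsknet/lsk_t_fpn_1x_dota_le90.py',
     '--cfg-options', 'optimizer.lr=0.00005'],
    check=True)
```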
