Can not reproduce the effect #53

liuheng0111 · 2021-06-23T08:54:00Z

I trained the CenterNet2_R50_1x model on 8 V100 GPUs, but the best AP is 40.26, lower than the reported 42.9. Can you give me some suggestions?
I used the following config:

```yaml
DATASETS:
  TRAIN: ("coco_2017_train",)
  TEST: ("coco_2017_val",)
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.02
  STEPS: (60000, 80000)
  MAX_ITER: 90000
  CHECKPOINT_PERIOD: 1000000000
  WARMUP_ITERS: 4000
  WARMUP_FACTOR: 0.00025
  CLIP_GRADIENTS:
    ENABLED: True
INPUT:
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
```
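For reference, here is a minimal sketch of how the SOLVER settings above translate into a per-iteration learning rate. It assumes detectron2's default linear warmup followed by multi-step decay with `SOLVER.GAMMA = 0.1` (the detectron2 default; GAMMA is not set in this config). The helper `lr_at` is hypothetical, not a detectron2 API.

```python
from bisect import bisect_right


def lr_at(it, base_lr=0.02, steps=(60000, 80000), gamma=0.1,
          warmup_iters=4000, warmup_factor=0.00025):
    """Learning rate at iteration `it` (hypothetical helper).

    Assumes a linear warmup from base_lr * warmup_factor up to base_lr
    over `warmup_iters`, then a multiplication by `gamma` at every
    milestone in `steps` (modelled on detectron2's default schedule).
    """
    if it < warmup_iters:
        alpha = it / warmup_iters
        factor = warmup_factor * (1 - alpha) + alpha
    else:
        factor = 1.0
    # bisect_right counts how many decay milestones have been reached.
    return base_lr * factor * gamma ** bisect_right(steps, it)


for it in (0, 2000, 4000, 60000, 80000, 89999):
    print(it, f"{lr_at(it):.6f}")
```

Under these assumptions the rate is 0.02 from iteration 4000 until 60000, 0.002 until 80000, and 0.0002 afterwards, which matches the `lr: 0.000200` shown in the reference log line quoted in this thread.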

xingyizhou · 2021-06-24T21:40:03Z

Hi,
Thank you for your interest. Can you show the full config, including the _BASE_? If you are using the exact CenterNet2_R50_1x config, you don't need to copy these settings, as they are already contained in the base file. If your _BASE_ is already Base-CenterNet2.yaml, can you also provide your training command?

Best,
Xingyi
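To illustrate why the copied settings are redundant: a detectron2-style config is loaded by first reading the file named in `_BASE_` and then overriding it, key by key, with the child file. Below is a minimal stdlib sketch of that override rule on already-parsed dicts; `merge_with_base` and the example values are hypothetical and are not the actual contents of Base-CenterNet2.yaml.

```python
import copy


def merge_with_base(base, child):
    """Recursively overlay `child` on `base`; child values win (hypothetical helper)."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if key == "_BASE_":
            continue  # the _BASE_ pointer itself is not a config value
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_with_base(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative values only.
base = {"SOLVER": {"IMS_PER_BATCH": 16, "BASE_LR": 0.02, "MAX_ITER": 90000}}
child = {"_BASE_": "Base-CenterNet2.yaml", "SOLVER": {"IMS_PER_BATCH": 16}}
print(merge_with_base(base, child))
```

Repeating a key with the same value as the base file leaves the merged config unchanged; only keys whose values differ from the base actually change the training setup.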

liuheng0111 · 2021-06-25T02:43:36Z

Yes, my _BASE_ is already Base-CenterNet2.yaml. The training command is:
python train_net.py --num-gpus 8 --config-file configs/CenterNet2_R50_1x.yaml

xingyizhou · 2021-06-25T20:50:28Z

Hi,
I have shared my training and evaluation log for the R50-1x model here. Please check whether anything is clearly mismatched. If that doesn't help, I am happy to look at your training log if you can upload it.

liuheng0111 · 2021-06-27T07:43:04Z

My training log looks very different from yours.
My losses look like: total_loss: 1.169 loss_box_reg_stage0: 0.1548 loss_box_reg_stage1: 0.2466 loss_box_reg_stage2: 0.2614 loss_centernet_agn_neg: 0.02222 loss_centernet_agn_pos: 0.05661 loss_centernet_loc: 0.1677 loss_cls_stage0: 0.09623 loss_cls_stage1: 0.08068
Yours are: stage0/fast_rcnn/cls_accuracy: 0.944 stage0/fast_rcnn/fg_cls_accuracy: 0.755 stage1/fast_rcnn/cls_accuracy: 0.949 stage1/fast_rcnn/fg_cls_accuracy: 0.780 stage2/fast_rcnn/cls_accuracy: 0.955 stage2/fast_rcnn/fg_cls_accuracy: 0.773 total_loss: 1.401 loss_box_reg_stage0: 0.177 loss_box_reg_stage1: 0.260 loss_box_reg_stage2: 0.262 loss_centernet_agn_neg: 0.033 loss_centernet_agn_pos: 0.087 loss_centernet_loc: 0.198 loss_cls_stage0: 0.144 loss_cls_stage1: 0.126 loss_cls_stage2: 0.115 time: 0.5513 data_time: 0.0178 lr: 0.000200 max_mem: 5000M.
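Because detectron2 prints its metrics as whitespace-separated `name: value` pairs, one way to compare two such lines is to parse them into dicts and look only at the keys both contain (the `cls_accuracy` statistics appear in only one of the two logs, so they have no counterpart). A minimal sketch; `parse_metrics` and `compare` are hypothetical helpers, and the sample strings are shortened copies of the two lines above. The comparison is only meaningful for lines logged at roughly the same iteration, and a large ratio on one term is a hint about where the setups diverge, not a diagnosis.

```python
import re

# Matches "name: number" pairs such as "loss_cls_stage0: 0.09623"
# or "stage0/fast_rcnn/cls_accuracy: 0.944".
_METRIC = re.compile(r"([\w/]+):\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")


def parse_metrics(line):
    """Return {metric_name: float} for every 'name: value' pair in a log line."""
    return {name: float(value) for name, value in _METRIC.findall(line)}


def compare(mine, ref):
    """Print metrics present in both lines side by side with the ratio mine/ref."""
    a, b = parse_metrics(mine), parse_metrics(ref)
    for name in sorted(a.keys() & b.keys()):
        ratio = a[name] / b[name] if b[name] else float("nan")
        print(f"{name:28s} {a[name]:8.4f} {b[name]:8.4f} {ratio:6.2f}")


mine = ("total_loss: 1.169 loss_box_reg_stage0: 0.1548 "
        "loss_centernet_loc: 0.1677 loss_cls_stage0: 0.09623")
ref = ("stage0/fast_rcnn/cls_accuracy: 0.944 total_loss: 1.401 "
       "loss_box_reg_stage0: 0.177 loss_centernet_loc: 0.198 "
       "loss_cls_stage0: 0.144 lr: 0.000200")
compare(mine, ref)
```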

Is this because you used a different version of detectron2?

Your model's output config has:
```yaml
EFFICIENTNET:
  BASE_NAME: efficientnet_b3
  NORM: FrozenBN
  OUT_LEVELS: (3, 4, 5)
```
Why is EFFICIENTNET in the config? Is your Base-CenterNet2.yaml this one?

xingyizhou · 2021-06-29T18:13:13Z

Hi,
Sorry for the delayed response. It seems I don't have access to your log; can you share it? Regarding the difference in log format: yes, I used my own version of detectron2, in which I print more statistics during training (simply by modifying this line). EfficientNet is part of my own version and is not used in this project. Neither of these should affect the functionality of the codebase. The Base-CenterNet2.yaml you referred to is correct. I'll have a better idea once I see your log.

bywang2018 · 2021-06-30T04:38:59Z

Hello, where can I download this dataset (coco_un_yolov4_55_0.5)?
Thanks! @xingyizhou

liuheng0111 · 2021-06-30T06:32:36Z

My training log.

Another question: I want to train a detector on my own dataset, in which some boxes have a category and some do not. I tried two approaches:
1: add an extra category for the boxes without one;
2: do not add an extra category; instead, set the category of boxes without one to -999, so that those boxes are excluded from the classification loss.
To test this, I trained on COCO with the categories of half of the boxes masked out (all boxes in the validation set keep their categories). Both approaches reach 30 mAP, but the loss keeps growing as training goes on and AP improves:
total_loss: 10.1 loss_box_reg_stage0: 2.176 loss_box_reg_stage1: 2.417 loss_box_reg_stage2: 4.492 loss_centernet_agn_neg: 0.03299 loss_centernet_agn_pos: 0.1173 loss_centernet_loc: 0.2441 loss_cls_stage0: 0.1453 loss_cls_stage1: 0.1776 loss_cls_stage2: 0.2449
The box regression loss is very large, yet the mAP on the validation set keeps improving. Do you have any suggestions?
The training log for approach 2 on COCO:
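Approach 2 amounts to a classification loss that skips every box whose label is the sentinel -999. Below is a minimal pure-Python sketch of that masking, averaging softmax cross-entropy over labelled boxes only; `IGNORE_LABEL` and `masked_cross_entropy` are hypothetical names. In PyTorch the same effect is usually obtained with the `ignore_index` argument of `torch.nn.functional.cross_entropy`; how CenterNet2's ROI heads actually treat such labels is not covered by this sketch.

```python
import math

IGNORE_LABEL = -999  # sentinel for boxes without a category (hypothetical name)


def masked_cross_entropy(logits, labels, ignore_label=IGNORE_LABEL):
    """Mean softmax cross-entropy over boxes whose label != ignore_label.

    logits: list of per-box score lists; labels: list of class indices.
    Returns 0.0 when every box is ignored, so the term simply vanishes.
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == ignore_label:
            continue  # no category: contributes nothing to the cls loss
        m = max(scores)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[label]
        count += 1
    return total / count if count else 0.0


logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.0, 1.0, 1.0]]
labels = [0, IGNORE_LABEL, 2]
print(masked_cross_entropy(logits, labels))
```

Dividing by the number of labelled boxes keeps the loss scale independent of how many boxes are masked; dividing by all boxes instead would shrink the classification term as more labels are removed. Which normalization the real training code uses would need to be checked.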

xingyizhou · 2021-06-30T15:46:54Z

The log shows you are using a batch size of 96. Can you use the original batch size (16), or adjust the total iterations and learning rate accordingly? Please specify any changes you made to the code when reporting reproducibility issues.
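"Adjusting accordingly" usually means the linear scaling rule: multiply the learning rate by the batch-size ratio and divide every iteration count by it, so the model sees the same number of epochs. A minimal sketch using the values from the config at the top of this issue as defaults; `scale_schedule` is a hypothetical helper. Recent detectron2 versions offer a similar automatic rescaling through `SOLVER.REFERENCE_WORLD_SIZE` and `DefaultTrainer.auto_scale_workers`; check that your version has them. Linear scaling is a heuristic and very large batches may still need a longer warmup.

```python
def scale_schedule(ims_per_batch, ref_batch=16, base_lr=0.02,
                   max_iter=90000, steps=(60000, 80000), warmup_iters=4000):
    """Rescale a solver schedule for a new total batch size (linear scaling rule).

    The learning rate grows in proportion to the batch size, while the
    iteration counts shrink by the same factor so the number of epochs
    stays constant. Hypothetical helper, not part of detectron2.
    """
    scale = ims_per_batch / ref_batch
    return {
        "IMS_PER_BATCH": ims_per_batch,
        "BASE_LR": base_lr * scale,
        "MAX_ITER": round(max_iter / scale),
        "STEPS": tuple(round(s / scale) for s in steps),
        "WARMUP_ITERS": round(warmup_iters / scale),
    }


print(scale_schedule(96))
```

For a batch of 96 this gives a learning rate of about 0.12, 15000 iterations, decay steps at 10000 and 13333, and 667 warmup iterations; 15000 × 96 and 90000 × 16 are both 1.44M images.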

xingyizhou · 2021-06-30T15:48:09Z

@WangBoying you can download it here from the model zoo.

bywang2018 · 2021-07-01T00:39:24Z

This is very important to me! Thank you very much!

@xingyizhou

liuheng0111 · 2021-07-01T07:53:00Z

@xingyizhou Here is the training log with the original code and config.

LeonLab · 2021-10-18T09:11:08Z

@liuheng0111 Did you find a solution? I have a similar problem.

liuheng0111 added the documentation label (Improvements or additions to documentation) on Jun 23, 2021
