Can not reproduce the effect #53

Open
liuheng0111 opened this issue Jun 23, 2021 · 12 comments
Labels
documentation Improvements or additions to documentation

Comments

@liuheng0111

I trained CenterNet2_R50_1x on 8 V100 GPUs, but the best AP I get is 40.26, lower than the reported 42.9. Can you give me some suggestions?
I use the following config:

DATASETS:
  TRAIN: ("coco_2017_train",)
  TEST: ("coco_2017_val",)
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.02
  STEPS: (60000, 80000)
  MAX_ITER: 90000
  CHECKPOINT_PERIOD: 1000000000
  WARMUP_ITERS: 4000
  WARMUP_FACTOR: 0.00025
  CLIP_GRADIENTS:
    ENABLED: True
INPUT:
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)

@liuheng0111 liuheng0111 added the documentation Improvements or additions to documentation label Jun 23, 2021
@xingyizhou
Owner

Hi,
Thank you for your interest. Can you show the full config including the _BASE_? If you are using the exact CenterNet2_R50_1x, you don't need to copy these configs as they are contained in the base file. If your _BASE_ is already Base-CenterNet2.yaml, can you also provide the command of your training?

Best,
Xingyi

@liuheng0111
Author

Yes, my _BASE_ is already Base-CenterNet2.yaml. The training command is:
python train_net.py --num-gpus 8 --config-file configs/CenterNet2_R50_1x.yaml

@xingyizhou
Owner

Hi,
I have shared my training and evaluation log for the R50-1x model here. Please check whether anything is clearly mismatched. If that doesn't help, I am happy to check your training log if you can upload it.

@liuheng0111
Author

liuheng0111 commented Jun 27, 2021

My training log is very different from yours.
My losses look like this: total_loss: 1.169 loss_box_reg_stage0: 0.1548 loss_box_reg_stage1: 0.2466 loss_box_reg_stage2: 0.2614 loss_centernet_agn_neg: 0.02222 loss_centernet_agn_pos: 0.05661 loss_centernet_loc: 0.1677 loss_cls_stage0: 0.09623 loss_cls_stage1: 0.08068
But yours look like this: stage0/fast_rcnn/cls_accuracy: 0.944 stage0/fast_rcnn/fg_cls_accuracy: 0.755 stage1/fast_rcnn/cls_accuracy: 0.949 stage1/fast_rcnn/fg_cls_accuracy: 0.780 stage2/fast_rcnn/cls_accuracy: 0.955 stage2/fast_rcnn/fg_cls_accuracy: 0.773 total_loss: 1.401 loss_box_reg_stage0: 0.177 loss_box_reg_stage1: 0.260 loss_box_reg_stage2: 0.262 loss_centernet_agn_neg: 0.033 loss_centernet_agn_pos: 0.087 loss_centernet_loc: 0.198 loss_cls_stage0: 0.144 loss_cls_stage1: 0.126 loss_cls_stage2: 0.115 time: 0.5513 data_time: 0.0178 lr: 0.000200 max_mem: 5000M.

Is it because a different version of detectron2 is used?

Your model's output config has:
EFFICIENTNET:
  BASE_NAME: efficientnet_b3
  NORM: FrozenBN
  OUT_LEVELS: (3, 4, 5)
Why is EFFICIENTNET in the config? Is your Base-CenterNet2.yaml this one?
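
For anyone comparing two logs with different formats like the ones quoted above, a small parser can help line up the metrics that both logs share. This is only a sketch: the function names `parse_metrics` and `diff_metrics` are made up for illustration, the regular expression assumes metrics are printed as `name: number` pairs separated by spaces, and a comparison is only meaningful if both lines come from the same training iteration.

```python
import re

# Matches "name: number" pairs; the lookahead drops values such as "5000M"
# that are not plain numbers.
METRIC = re.compile(r"([\w/]+):\s+([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)(?=\s|$)")


def parse_metrics(line):
    """Return a dict of every numeric 'name: value' pair found in a log line."""
    return {name: float(value) for name, value in METRIC.findall(line)}


def diff_metrics(mine, theirs):
    """Return theirs - mine for every metric present in both dicts."""
    shared = sorted(mine.keys() & theirs.keys())
    return {name: theirs[name] - mine[name] for name in shared}


# Shortened versions of the two log lines quoted in this thread.
mine = parse_metrics(
    "total_loss: 1.169 loss_centernet_loc: 0.1677 loss_cls_stage0: 0.09623"
)
theirs = parse_metrics(
    "stage0/fast_rcnn/cls_accuracy: 0.944 total_loss: 1.401 "
    "loss_centernet_loc: 0.198 loss_cls_stage0: 0.144 lr: 0.000200 max_mem: 5000M"
)

for name, delta in diff_metrics(mine, theirs).items():
    print(f"{name}: {delta:+.4f}")
```

Metrics that appear in only one log (here the `cls_accuracy` statistics and `lr`) are simply left out of the comparison, which matches the point that the extra statistics are a logging difference rather than a functional one.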

@xingyizhou
Owner

Hi,
Sorry for the delayed response. It seems I do not have access to your log; can you share it? As for the difference in the log format: yes, I used my own detectron2 version, in which I print more statistics during training (simply modifying this line). EfficientNet is in my own version and is not used in this project. These should not affect the functionality of the codebase. The referred Base-CenterNet2.yaml is correct. I'll have a better idea when I see your log.

@bywang2018

Hello, where can I download this dataset (coco_un_yolov4_55_0.5)?
Thanks! @xingyizhou

@liuheng0111
Author

My training log.

Another question: I want to train a detector on my own dataset, in which some boxes have a category and some do not. I tried two approaches:
1: add an extra category for the boxes without a category;
2: do not add a category; instead set the category of those boxes to -999, so that boxes without a category are excluded from the classification loss.
I trained on COCO with the category masked out for half of the boxes; all boxes in the validation set keep their category. Both approaches reach 30 mAP, but the loss grows larger while the AP keeps improving:
total_loss: 10.1 loss_box_reg_stage0: 2.176 loss_box_reg_stage1: 2.417 loss_box_reg_stage2: 4.492 loss_centernet_agn_neg: 0.03299 loss_centernet_agn_pos: 0.1173 loss_centernet_loc: 0.2441 loss_cls_stage0: 0.1453 loss_cls_stage1: 0.1776 loss_cls_stage2: 0.2449
The box regression loss is very large, but the mAP on the validation set gets better. Do you have any suggestions?
Training log for approach 2 on COCO.
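
To make approach 2 concrete, here is a minimal pure-Python sketch of a classification loss that skips boxes marked with the -999 sentinel. It is not the code used in this thread or in CenterNet2; the name `masked_cross_entropy` and the list-of-lists input format are assumptions for illustration only. In PyTorch, the `ignore_index` argument of `torch.nn.functional.cross_entropy` serves the same purpose.

```python
import math

IGNORE_LABEL = -999  # sentinel from the comment above: box has no category


def masked_cross_entropy(logits, labels, ignore_label=IGNORE_LABEL):
    """Mean softmax cross-entropy over labelled boxes only.

    logits: one list of per-class scores per box.
    labels: one class index per box, or ignore_label for unlabelled boxes.
    """
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_label:
            continue  # unlabelled boxes contribute nothing to the loss
        top = max(row)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_z - row[label]
        count += 1
    # Averaging over labelled boxes only keeps the loss scale independent
    # of how many boxes were masked out.
    return total / count if count else 0.0


logits = [[0.0, 0.0], [5.0, -5.0], [0.0, 0.0]]
labels = [0, IGNORE_LABEL, 1]
print(masked_cross_entropy(logits, labels))
```

Note that this only masks the classification term. The box regression losses quoted above are computed separately, so masking categories by itself would not be expected to explain a large `loss_box_reg_*` value.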

@xingyizhou
Owner

The log shows you are using a batch size of 96. Can you use the original batch size (16), or modify the total iterations and learning rate accordingly? Please do specify any changes you made to the code when reporting reproducibility issues.
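
One common way to adjust the schedule "accordingly" is the linear scaling heuristic: multiply the learning rate by the batch-size ratio and divide the iteration counts by it. The sketch below applies that heuristic to the solver values posted at the top of this thread. The helper `scale_schedule` is hypothetical, and whether a learning rate this large trains stably for this model is not confirmed anywhere in the thread.

```python
def scale_schedule(base_lr, max_iter, steps, warmup_iters, base_batch, new_batch):
    """Rescale a solver schedule for a new total batch size (linear scaling)."""
    scale = new_batch / base_batch
    return {
        "IMS_PER_BATCH": new_batch,
        "BASE_LR": base_lr * scale,           # larger batch -> larger learning rate
        "MAX_ITER": round(max_iter / scale),  # same number of images seen overall
        "STEPS": tuple(round(s / scale) for s in steps),
        "WARMUP_ITERS": round(warmup_iters / scale),
    }


# Values from the config in the first comment, moved from batch size 16 to 96.
scaled = scale_schedule(
    base_lr=0.02,
    max_iter=90000,
    steps=(60000, 80000),
    warmup_iters=4000,
    base_batch=16,
    new_batch=96,
)
for key, value in scaled.items():
    print(f"{key}: {value}")
```

With a ratio of 6, this gives a learning rate of roughly 0.12 and a 15000-iteration schedule. Keeping the original batch size of 16 avoids the question entirely and is the simpler option for a reproduction run.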

@xingyizhou
Owner

@WangBoying you can download it here from the model zoo.

@bywang2018

This is very important to me! Thank you very much!

@xingyizhou

@liuheng0111
Author

@xingyizhou Here is the training log from the original code and config.

@LeonLab

LeonLab commented Oct 18, 2021

@liuheng0111 Did you find a solution? I have a similar problem.
