training error #15

Baby47 · 2019-04-26T00:39:28Z

Hi @tianzhi0549, thanks for your project.
I am trying to run this project on my own dataset. I changed the corresponding settings in the config file and started training. However, it runs for several iterations and then this error appears:

  File "tools/train_net.py", line 174, in <module>
    main()
  File "tools/train_net.py", line 167, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "/home/detection/FCOS/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
    for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
  File "/home/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 656, in _process_next_batch
    self._put_indices()
  File "/home/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 646, in _put_indices
    indices = next(self.sample_iter, None)
  File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/iteration_based_batch_sampler.py", line 24, in __iter__
    for batch in self.batch_sampler:
  File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py", line 107, in __iter__
    batches = self._prepare_batches()
  File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py", line 79, in _prepare_batches
    first_element_of_batch = [t[0].item() for t in merged]
  File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py", line 79, in <listcomp>
    first_element_of_batch = [t[0].item() for t in merged]
IndexError: index 0 is out of bounds for dimension 0 with size 0

I have checked the format of my dataset. Could you give me some suggestions about this error?
Thanks a lot.

tianzhi0549 · 2019-04-26T01:17:23Z

@Baby47 Do you have some images with no annotations? Please remove these images and try again.
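
One way to look for such images is sketched below. It assumes the dataset is stored as a COCO-style JSON file with `images` and `annotations` lists; the thread does not state the dataset format. The helper names are illustrative and are not part of FCOS or maskrcnn-benchmark.

```python
import json


def load_coco(path):
    """Load a COCO-style annotation file (assumed format) into a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def images_without_annotations(coco):
    """Return the ids of images that no annotation refers to."""
    annotated = {ann["image_id"] for ann in coco.get("annotations", [])}
    return [img["id"] for img in coco.get("images", []) if img["id"] not in annotated]


def drop_images_without_annotations(coco):
    """Return a shallow copy of the dataset without the unannotated images."""
    empty = set(images_without_annotations(coco))
    cleaned = dict(coco)
    cleaned["images"] = [img for img in coco.get("images", []) if img["id"] not in empty]
    return cleaned


# Tiny in-memory example: image 2 has no annotation.
example = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [{"id": 10, "image_id": 1}, {"id": 11, "image_id": 3}],
}
missing = images_without_annotations(example)
cleaned = drop_images_without_annotations(example)
```

For a real file, the same functions could be applied to the result of `load_coco(...)` and the cleaned dict written back with `json.dump`.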

Baby47 · 2019-04-26T01:59:23Z

@tianzhi0549 I have checked the image and annotation indices and fixed the incorrect ones, but I still get the same error.

tianzhi0549 · 2019-04-26T02:05:59Z

@Baby47 You need to debug your code line by line and figure out why it raises the error IndexError: index 0 is out of bounds for dimension 0 with size 0. The error means that something being indexed, such as a list or a tensor, has no elements.

Baby47 · 2019-04-26T12:05:53Z

I have checked the image that was being processed when the error occurred, along with its annotations, but I could not find any mistake. I wonder whether the error is related to min_size_train and max_size_train in the config file. I changed them to smaller values, but the failing image is still smaller than the values I set. @tianzhi0549

Baby47 · 2019-04-29T02:29:47Z

I have solved this problem by following an issue in maskrcnn-benchmark. However, when I start training and analyze the loss, I find that the value of loss_center stays nearly the same during the whole training process, and I wonder why. @tianzhi0549

tianzhi0549 · 2019-04-29T02:39:11Z

@Baby47 Happy to know that you have solved it. Could you post the issue that solved your problem here, in case someone else has a similar problem? The final loss value of the center-ness branch is about 0.57. This is because we use binary cross entropy (BCE), whose minimum is not near zero when the training target is some value between 0 and 1 (e.g., 0.5).
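
The floor of BCE under a soft target can be checked numerically. The sketch below assumes a single scalar prediction p and a single soft target t; `bce` and `bce_floor` are illustrative helpers, not functions from the FCOS code. For a fixed t, BCE is minimized at p = t, where it equals the entropy of t, so the loss cannot reach zero unless t is exactly 0 or 1.

```python
import math


def bce(p, t):
    """Binary cross entropy for one prediction p in (0, 1) and target t in [0, 1]."""
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))


def bce_floor(t):
    """Smallest BCE achievable for target t, reached when the prediction equals t."""
    if t in (0.0, 1.0):
        return 0.0  # hard targets are the only case where the floor is zero
    return bce(t, t)


floor_half = bce_floor(0.5)      # equals ln 2, roughly 0.693
floor_hard = bce_floor(1.0)      # 0.0
off_optimum = bce(0.4, 0.5)      # any p != t gives a larger loss than the floor
```

The value observed during training is an average over many locations with different targets, so it is not expected to match the floor for any single t.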

Baby47 · 2019-04-29T03:00:33Z

I solved my problem by this link: facebookresearch/maskrcnn-benchmark#656

Baby47 · 2019-04-29T03:09:04Z

The BCE loss is very close to 0.6 when training begins, so it quickly converges to its final value (about 0.56). I'm not sure of the true meaning of this loss, and I wonder where the code performs the multiplication (cls score * centerness). Moreover, can the BCE loss be replaced by another cross-entropy loss? Have you tried any other loss? @tianzhi0549

tianzhi0549 · 2019-04-29T03:43:12Z

@Baby47 0.56 should be OK.

The multiplication happens only when testing. The code is at line 71 of FCOS/maskrcnn_benchmark/modeling/rpn/fcos/inference.py (commit de2d65a):

    box_cls = box_cls * centerness[:, :, None]

We have tried an L1 loss for center-ness, but it yields similar performance.
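
To make the test-time multiplication concrete, here is a minimal sketch in plain Python for a single image. It assumes `box_cls` holds per-location, per-class scores and `centerness` holds one score per location, both already in [0, 1]. The function name `rescore` is hypothetical; the real code works on PyTorch tensors with a batch dimension and relies on broadcasting through `centerness[:, :, None]`.

```python
def rescore(box_cls, centerness):
    """Multiply every class score at a location by that location's center-ness.

    box_cls:    list of locations, each a list of per-class scores
    centerness: list with one center-ness score per location
    """
    if len(box_cls) != len(centerness):
        raise ValueError("need exactly one center-ness value per location")
    return [
        [score * ctr for score in scores]
        for scores, ctr in zip(box_cls, centerness)
    ]


# Two locations, two classes. The second location is far from an object
# center, so its scores are halved.
scores = [[0.9, 0.2], [0.8, 0.1]]
ctr = [1.0, 0.5]
rescored = rescore(scores, ctr)  # [[0.9, 0.2], [0.4, 0.05]]
```

Locations with low center-ness end up with lower final scores, which pushes their boxes down the ranking before later filtering steps.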

Cying212Jack · 2021-10-11T13:12:30Z

> @Baby47 0.56 should be OK.
>
> The multiplication happens only when testing and the code is at line 71 of FCOS/maskrcnn_benchmark/modeling/rpn/fcos/inference.py (commit de2d65a):
>
>     box_cls = box_cls * centerness[:, :, None]
>
> We have tried L1 for center-ness but it yields a similar performance.

The loss values of the center-ness, classification and regression branches are not balanced, e.g., 0.57, 0.1 and 0.15. Have you tried adding a loss weight to reduce the effect of the center-ness branch during backpropagation?
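
For readers wondering what such a weight would look like, here is a minimal sketch of a weighted sum of the three losses. The function and the `centerness_weight` parameter are hypothetical; this thread does not confirm that FCOS exposes such an option or that the authors tried it.

```python
def total_loss(loss_cls, loss_reg, loss_centerness, centerness_weight=1.0):
    """Sum the three branch losses, scaling the center-ness term.

    centerness_weight is a hypothetical knob: scaling a loss term by w
    scales the gradients coming from that term by the same factor w.
    """
    return loss_cls + loss_reg + centerness_weight * loss_centerness


# Loss values quoted in the comment above: 0.1 (cls), 0.15 (reg), 0.57 (center-ness).
unweighted = total_loss(0.1, 0.15, 0.57)
down_weighted = total_loss(0.1, 0.15, 0.57, centerness_weight=0.5)
```

Note that a large loss value does not by itself imply large gradients: as explained earlier in the thread, BCE with soft targets has a nonzero minimum, and the gradient at that minimum is zero.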

tianzhi0549 mentioned this issue Apr 29, 2019

centerness loss is larger than other loss #22

Closed

tianzhi0549 closed this as completed May 5, 2019

tianzhi0549 mentioned this issue Jun 23, 2019

Problem about centerness for trarning and inference #66

Open

milliema mentioned this issue Nov 1, 2019

center-ness loss #189

Closed

This was referenced Jun 27, 2021

loss_fcos_ctr is always around 0.6 aim-uofa/AdelaiDet#395

Closed

loss_fcos_ctr is always around 0.6 youngwanLEE/centermask2#48

Open
