training error #15

Closed
Baby47 opened this issue Apr 26, 2019 · 10 comments

Comments

@Baby47

Baby47 commented Apr 26, 2019

Hi @tianzhi0549, thanks for your project.
I am trying to run this project on my own dataset. I changed the corresponding settings in the config file and started training. However, it runs for several iterations and then this error appears:

  File "tools/train_net.py", line 174, in <module>
    main()
  File "tools/train_net.py", line 167, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "/home/detection/FCOS/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
    for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
  File "/home/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 656, in _process_next_batch
    self._put_indices()
  File "/home/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 646, in _put_indices
    indices = next(self.sample_iter, None)
  File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/iteration_based_batch_sampler.py", line 24, in __iter__
    for batch in self.batch_sampler:
  File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py", line 107, in __iter__
    batches = self._prepare_batches()
  File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py", line 79, in _prepare_batches
    first_element_of_batch = [t[0].item() for t in merged]
  File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py", line 79, in <listcomp>
    first_element_of_batch = [t[0].item() for t in merged]
IndexError: index 0 is out of bounds for dimension 0 with size 0

I have checked the format of my dataset. Could you give me some suggestions about this error?
Thanks a lot.

@tianzhi0549
Owner

@Baby47 Do you have some images with no annotations? Please remove these images and try again.
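One way to act on this suggestion, as a minimal sketch only: it assumes the dataset is stored in a COCO-style dictionary with `images` and `annotations` lists, which the thread does not confirm. The helper name `drop_images_without_annotations` and the tiny dataset are hypothetical.

```python
def drop_images_without_annotations(coco):
    """Return a copy of a COCO-style dict keeping only annotated images."""
    # Collect the ids of all images that are referenced by an annotation.
    annotated_ids = {ann["image_id"] for ann in coco["annotations"]}
    kept = [img for img in coco["images"] if img["id"] in annotated_ids]
    # Shallow copy with the filtered image list; the input is left untouched.
    return {**coco, "images": kept}


# Tiny hypothetical dataset: image 2 has no annotations.
dataset = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "annotations": [{"id": 10, "image_id": 1, "bbox": [0, 0, 5, 5], "category_id": 1}],
    "categories": [{"id": 1, "name": "object"}],
}
cleaned = drop_images_without_annotations(dataset)
print([img["file_name"] for img in cleaned["images"]])
```

The cleaned dictionary could then be written back to a JSON file and used as the training annotation file.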

@Baby47
Author

Baby47 commented Apr 26, 2019

@tianzhi0549 I have checked the indices of the images and annotations and corrected the wrong ones, but the same error still occurs.

@tianzhi0549
Owner

tianzhi0549 commented Apr 26, 2019

@Baby47 You need to debug your code line by line and figure out why it raised the error IndexError: index 0 is out of bounds for dimension 0 with size 0. The error means that something with no elements is being indexed: here, t[0] is evaluated on an empty entry of merged.
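As a rough illustration of that failure mode, the sketch below uses plain Python lists as stand-ins for the index tensors in `merged`. The helper names and the sample data are hypothetical and are not taken from `grouped_batch_sampler.py`.

```python
def first_elements(groups):
    """Mimic `[t[0] for t in merged]`: fails as soon as one group is empty."""
    return [g[0] for g in groups]


def find_empty_groups(groups):
    """Return the positions of groups that contain no elements."""
    return [i for i, g in enumerate(groups) if len(g) == 0]


merged = [[3, 7], [], [5]]  # hypothetical index groups; the middle one is empty

try:
    first_elements(merged)
except IndexError as exc:
    print("IndexError:", exc)

# Reporting which groups are empty narrows down where to look next.
print(find_empty_groups(merged))
```

Printing the empty positions right before the failing line is one way to find out which group of samples ends up empty.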

@Baby47
Author

Baby47 commented Apr 26, 2019

I have checked the image that is being processed when the error occurs, along with its annotations, but found no mistake. I wonder whether the error is related to min_size_train and max_size_train in the config file. I have changed them to smaller values, but the failing image is smaller than the values I set. @tianzhi0549

@Baby47
Author

Baby47 commented Apr 29, 2019

I have solved this problem with the help of an issue in the maskrcnn-benchmark repository. However, now that training runs and I analyze the loss, I find that the value of loss_center stays nearly the same during the whole training process, and I wonder why. @tianzhi0549

@tianzhi0549
Owner

@Baby47 Happy to know that you have solved it. Could you post the issue that solved your problem here, in case someone else has a similar problem? The final loss value of the center-ness branch is about 0.57, because we use binary cross entropy (BCE), whose minimum is not close to zero when the training target is a value strictly between 0 and 1 (e.g., 0.5).
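The point about the BCE minimum can be checked numerically. The following is a small sketch in plain Python (not the FCOS loss code): for a fixed soft target t, it searches a grid of predictions p for the lowest BCE value. The function names and the grid size are illustrative choices.

```python
import math


def bce(p, t):
    """Binary cross entropy for prediction p and (possibly soft) target t."""
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))


def best_bce(t, steps=999):
    """Smallest BCE over a grid of predictions p in (0, 1) for a fixed target t."""
    return min(bce(k / (steps + 1), t) for k in range(1, steps + 1))


# The minimum is reached at p = t and equals the entropy of the target,
# so it is ln(2) for t = 0.5 and only approaches 0 as t approaches 0 or 1.
for target in (0.5, 0.9, 0.999):
    print(target, round(best_bce(target), 4))
```

Since many center-ness targets lie strictly between 0 and 1, even a perfect predictor is left with a nonzero average loss.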

@Baby47
Author

Baby47 commented Apr 29, 2019

I solved my problem using this issue: facebookresearch/maskrcnn-benchmark#656

@Baby47
Author

Baby47 commented Apr 29, 2019

The BCE loss is already very close to 0.6 when training begins, so it quickly converges to its final value (about 0.56). I am not sure what this loss really means, and I wonder where the code performs the multiplication (cls score * centerness). Moreover, can the BCE loss be replaced by another cross-entropy loss? Have you tried any other loss? @tianzhi0549

@tianzhi0549
Owner

tianzhi0549 commented Apr 29, 2019

@Baby47 0.56 should be OK.

The multiplication happens only at test time, and the code is:

box_cls = box_cls * centerness[:, :, None]

We have tried an L1 loss for center-ness, but it yields similar performance.
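To make the effect of that test-time multiplication concrete, here is a plain-Python sketch for a single image. It assumes `box_cls` holds one list of class scores per location and `centerness` holds one value per location, so that `centerness[:, :, None]` broadcasts each value over the class dimension. The function name `rescore` and the numbers are hypothetical.

```python
def rescore(cls_scores, centerness):
    """Scale every class score at a location by that location's center-ness.

    cls_scores: per-location lists of class scores, shape (HW, C)
    centerness: per-location center-ness values, shape (HW,)
    """
    return [[s * c for s in scores] for scores, c in zip(cls_scores, centerness)]


# Hypothetical values for 2 locations and 3 classes of a single image.
cls_scores = [[0.9, 0.1, 0.2], [0.8, 0.3, 0.1]]
centerness = [1.0, 0.5]  # the second location is assumed to lie off-center

print(rescore(cls_scores, centerness))
```

Locations with low center-ness end up with lower final scores, so their boxes are more likely to be discarded later by thresholding or NMS.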

@Cying212Jack

Cying212Jack commented Oct 11, 2021

> @Baby47 0.56 should be OK.
>
> The multiplication happens only when testing and the code is at
>
> box_cls = box_cls * centerness[:, :, None]
>
> We have tried L1 for center-ness but it yields a similar performance.

The loss values of the center-ness, classification and regression branches are not balanced, e.g., 0.57, 0.1 and 0.15. Have you tried adding a loss weight to reduce the effect of the center-ness branch during backpropagation?
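For reference, the kind of weighting being asked about could look like the sketch below. It is not taken from the FCOS code base; the function name `total_loss` and the weight of 0.5 are purely illustrative, and the thread does not say whether such a weight helps.

```python
def total_loss(losses, weights):
    """Weighted sum of per-branch losses; missing weights default to 1.0."""
    return sum(weights.get(name, 1.0) * value for name, value in losses.items())


# Loss values quoted in the comment above; the 0.5 weight is an assumption.
losses = {"centerness": 0.57, "classification": 0.1, "regression": 0.15}

print(total_loss(losses, {}))                   # unweighted sum
print(total_loss(losses, {"centerness": 0.5}))  # center-ness down-weighted
```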
