training error #15
Comments
@Baby47 Do you have some images with no annotations? Please remove these images and try again.
@tianzhi0549 I have checked the corresponding indices of images and annotations and replaced the wrong ones, but it still raises the same error.
@Baby47 You need to debug your code line by line and figure out why it raised the error.
I have checked the image and its annotations at the iteration where the error occurs, but found no mistake. I wonder if it is related to min_size_train and max_size_train in the config file. I have changed them to smaller values, but the size of the problematic image is already smaller than the value I set. @tianzhi0549
I have solved this problem with the help of an issue in maskrcnn-benchmark, but when I start training and look at the losses, I find that the value of loss_centerness stays nearly constant over the whole training process, and I wonder why. @tianzhi0549
@Baby47 Happy to know that you have solved it. Could you post the issue that solved your problem here, in case someone else has a similar problem? The final loss value of the centerness branch is about
I solved my problem with the help of this issue: facebookresearch/maskrcnn-benchmark#656
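For anyone landing here with the same problem: the fix in that issue amounts to dropping images that have no annotations before training. Below is a minimal sketch of such a pre-filtering step for a COCO-style annotation file; the file paths and the function name are placeholders, not code from this repository.

import json

# Hedged sketch: write a new COCO-style json that keeps only images with at
# least one annotation. File paths are placeholders for your own dataset.
def filter_images_without_annotations(in_path, out_path):
    with open(in_path) as f:
        coco = json.load(f)

    annotated_ids = {ann["image_id"] for ann in coco["annotations"]}
    kept = [img for img in coco["images"] if img["id"] in annotated_ids]
    print(f"keeping {len(kept)} of {len(coco['images'])} images")

    coco["images"] = kept
    with open(out_path, "w") as f:
        json.dump(coco, f)

filter_images_without_annotations(
    "annotations/instances_train.json",
    "annotations/instances_train_filtered.json",
)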
The BCE loss is already very close to 0.6 when training begins, so it quickly converges to its final value (about 0.56). I'm not sure about the true meaning of this loss, and I wonder where in the code the multiplication (cls score * centerness) happens. Moreover, can the BCE loss be replaced by another cross-entropy-style loss? Have you tried any other loss? @tianzhi0549
@Baby47 The multiplication only happens at test time, and the code is at
We have tried L1 for center-ness but it yields a similar performance.
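To make that concrete, here is a rough sketch (not the repository's actual code) of how classification scores can be combined with the predicted center-ness at test time; the tensor shapes and variable names are assumptions.

import torch

# Hedged sketch of the test-time combination described above: classification
# scores are multiplied by the predicted center-ness before NMS.
cls_logits = torch.randn(1000, 80)        # per-location class logits
centerness_logits = torch.randn(1000, 1)  # per-location center-ness logit

cls_scores = cls_logits.sigmoid()
centerness = centerness_logits.sigmoid()

# Locations far from an object's center get their scores suppressed,
# so their boxes are more likely to be removed by thresholding and NMS.
final_scores = cls_scores * centerness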
The loss values of the centerness, classification, and regression branches are not balanced, e.g. 0.57, 0.1 and 0.15. Have you tried adding a loss weight to reduce the effect of the centerness branch during backpropagation?
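Such a weighting would look roughly like the sketch below; the 0.5 weight is hypothetical and not something the paper or this repository prescribes.

import torch

# Hedged sketch: down-weight the centerness loss before backpropagation.
# The numbers mirror the loss values quoted above; the 0.5 weight is hypothetical.
loss_cls = torch.tensor(0.10, requires_grad=True)
loss_reg = torch.tensor(0.15, requires_grad=True)
loss_centerness = torch.tensor(0.57, requires_grad=True)

centerness_weight = 0.5
total_loss = loss_cls + loss_reg + centerness_weight * loss_centerness
total_loss.backward()

# The gradient from the centerness term is scaled by 0.5, while the
# classification and regression branches are unchanged.
print(loss_cls.grad, loss_reg.grad, loss_centerness.grad)  # 1.0, 1.0, 0.5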
Hi @tianzhi0549, thanks for your project.
I am trying to run this project on my own dataset. I changed the corresponding settings in the config file and began to train. However, it runs for several iterations and then this error appears:
File "tools/train_net.py", line 174, in
main()
File "tools/train_net.py", line 167, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 73, in train
arguments,
File "/home/detection/FCOS/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/home/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 637, in next
return self._process_next_batch(batch)
File "/home/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 656, in _process_next_batch
self._put_indices()
File "/home/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 646, in _put_indices
indices = next(self.sample_iter, None)
File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/iteration_based_batch_sampler.py", line 24, in iter
for batch in self.batch_sampler:
File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py", line 107, in iter
batches = self._prepare_batches()
File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py", line 79, in _prepare_batches
first_element_of_batch = [t[0].item() for t in merged]
File "/home/detection/FCOS/maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py", line 79, in
first_element_of_batch = [t[0].item() for t in merged]
IndexError: index 0 is out of bounds for dimension 0 with size 0
I have checked the format of my dataset; could you give me some suggestions about this error?
Thanks a lot.
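As suggested in the first reply, images without annotations are a likely cause of this IndexError. A quick way to check for them in a COCO-style annotation file is sketched below; the path is a placeholder for your own file.

import json
from collections import Counter

# Hedged sketch: count annotations per image to find images with none.
with open("annotations/instances_train.json") as f:
    coco = json.load(f)

per_image = Counter(ann["image_id"] for ann in coco["annotations"])
empty = [img["file_name"] for img in coco["images"] if per_image[img["id"]] == 0]
print(f"{len(empty)} of {len(coco['images'])} images have no annotations")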