
Training error! #28

Closed

Byronnar opened this issue Nov 12, 2020 · 12 comments

@Byronnar

I want to train on VOC2012, but I get the error below:

Traceback (most recent call last):
  File "train.py", line 98, in <module>
    main(config, args.resume)
  File "train.py", line 82, in main
    trainer.train()
  File "/home/byronnar/bigfile/projects/CCT/base/base_trainer.py", line 91, in train
    results = self._train_epoch(epoch)
  File "/home/byronnar/bigfile/projects/CCT/trainer.py", line 76, in _train_epoch
    total_loss, cur_losses, outputs = self.model(x_l=input_l, target_l=target_l, x_ul=input_ul, curr_iter=batch_idx, target_ul=target_ul, epoch=epoch-1)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/byronnar/bigfile/projects/CCT/models/model.py", line 93, in forward
    output_l = self.main_decoder(self.encoder(x_l))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/byronnar/bigfile/projects/CCT/models/encoder.py", line 61, in forward
    x = self.psp(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/byronnar/bigfile/projects/CCT/models/encoder.py", line 36, in forward
    align_corners=False) for stage in self.stages])
  File "/home/byronnar/bigfile/projects/CCT/models/encoder.py", line 36, in <listcomp>
    align_corners=False) for stage in self.stages])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 1652, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])
  0%|                                                                                                         | 0/9118 [00:02<?, ?it/s]

What should I do? Thank you.

@yassouali
Owner

Hi @Byronnar, thank you for your interest in our work.

This error occurs at the end of the first epoch: in the last batch you have only one image and BatchNorm gives an error. To solve this, simply set drop_last = True in the dataloaders (or change the batch size, or add an if statement to skip the last step if the batch size is 1).
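
For reference, a minimal sketch of the drop_last fix with a plain PyTorch DataLoader (the dummy TensorDataset below is only a stand-in for VOCDataset, not the repo's actual class):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy data standing in for the real dataset: 17 samples, so with
# batch_size=8 the last batch would hold a single image if drop_last were False.
train_dataset = TensorDataset(torch.randn(17, 3, 320, 320),
                              torch.zeros(17, 320, 320).long())

train_loader = DataLoader(
    train_dataset,
    batch_size=8,
    shuffle=True,
    drop_last=True,  # drop the final incomplete batch so BatchNorm never sees a single image
)

# Alternative: keep drop_last=False and skip size-1 batches inside the training loop.
for images, targets in train_loader:
    if images.size(0) == 1:
        continue
    # ... forward / backward pass ...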

@Byronnar
Author

Thanks for your reply! Setting batch_size=2 solved this problem, but I cannot find where the dataloaders are (except in train_cam.py). Do you mean the dataloader in voc.py?

@Byronnar
Author

> Hi @Byronnar, thank you for your interest in our work.
>
> This error occurs at the end of the first epoch: in the last batch you have only one image and BatchNorm gives an error. To solve this, simply set drop_last = True in the dataloaders (or change the batch size, or add an if statement to skip the last step if the batch size is 1).

The error is below:

self.init_kwargs: 88888888  {'dataset': Dataset: VOCDataset
    # data: 9118
    Split: train_unsupervised
    Root: /home/byronnar/bigfile/datasets/img_detection/VOCdevkit/VOC2012, 'batch_size': 1, 'shuffle': True, 'num_workers': 8, 'pin_memory': True, 'drop_last': True}
self.init_kwargs: 88888888  {'dataset': Dataset: VOCDataset
    # data: 1449
    Split: val
    Root: /home/byronnar/bigfile/datasets/img_detection/VOCdevkit/VOC2012, 'batch_size': 1, 'shuffle': False, 'num_workers': 4, 'pin_memory': True, 'drop_last': True}
Loading pretrained model:models/backbones/pretrained/3x3resnet50-imagenet.pth


Nbr of trainable parameters: 47210776

Detected GPUs: 1 Requested: 1


  0%|                                                                                                         | 0/9118 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train.py", line 99, in <module>
    main(config, args.resume)
  File "train.py", line 83, in main
    trainer.train()
  File "/home/byronnar/bigfile/projects/CCT/base/base_trainer.py", line 91, in train
    results = self._train_epoch(epoch)
  File "/home/byronnar/bigfile/projects/CCT/trainer.py", line 75, in _train_epoch
    total_loss, cur_losses, outputs = self.model(x_l=input_l, target_l=target_l, x_ul=input_ul, curr_iter=batch_idx, target_ul=target_ul, epoch=epoch-1)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/byronnar/bigfile/projects/CCT/models/model.py", line 93, in forward
    output_l = self.main_decoder(self.encoder(x_l))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/byronnar/bigfile/projects/CCT/models/encoder.py", line 61, in forward
    x = self.psp(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/byronnar/bigfile/projects/CCT/models/encoder.py", line 36, in forward
    align_corners=False) for stage in self.stages])
  File "/home/byronnar/bigfile/projects/CCT/models/encoder.py", line 36, in <listcomp>
    align_corners=False) for stage in self.stages])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 1652, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])
  0%|                                                                                                         | 0/9118 [00:02<?, ?it/s]

I have added 'drop_last': True.

@yassouali
Owner

yassouali commented Nov 12, 2020

@Byronnar

Sorry, I thought you had a problem only at the end of the first epoch (multi-GPU training). In your case, I think you are training with a batch size of one, but the nn.BatchNorm layers expect more than one image per batch to compute the batch statistics. You'll need to set the batch size to be > 1.
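
A minimal reproduction of the failure in plain PyTorch (not CCT code): with a batch of one, the globally pooled PSP feature has shape [1, 512, 1, 1], so each channel sees a single value and BatchNorm cannot compute statistics in training mode.

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(512)
bn.train()  # training mode: batch statistics are computed from the input itself

x = torch.randn(1, 512, 1, 1)  # one image, pooled to 1x1 by the PSP global-pooling branch
try:
    bn(x)
except ValueError as e:
    print(e)  # Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])

# With a batch of two there is more than one value per channel, so it works:
print(bn(torch.randn(2, 512, 1, 1)).shape)  # torch.Size([2, 512, 1, 1])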

@Byronnar
Author

> @Byronnar
>
> Sorry, I thought you had a problem only at the end of the first epoch (multi-GPU training). In your case, I think you are training with a batch size of one, but the nn.BatchNorm layers expect more than one image per batch to compute the batch statistics. You'll need to set the batch size to be > 1.

Thanks for your reply.
I can now train on VOC2012, but I don't know how to generate the images in SegmentationClassAug (I mean the unsupervised images).
Looking forward to your reply.

@Byronnar
Author

@yassouali

@yassouali
Owner

yassouali commented Nov 26, 2020

Are you trying to use it on your own dataset? SegmentationClassAug is specific to VOC2012.

@Byronnar
Author

Yes, I am trying to use it on my own dataset, but I don't know how to prepare the unlabeled data. Can it just be the original RGB images?

@Byronnar
Author

> Are you trying to use it on your own dataset? SegmentationClassAug is specific to VOC2012.

Should my custom dataset be in the same format as SegmentationClassAug?

@yassouali
Owner

No need for SegmentationClassAug; simply use your own dataloaders. You'll need to create two: one for the supervised dataset and one for the unsupervised dataset. You can create them any way you like (see the closed issues, there is a lot of discussion on the usage of custom datasets).
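
As a rough sketch of what that two-loader setup could look like for a custom dataset (the directory layout, class names, and transforms here are assumptions for illustration, not the repo's actual API; CCT's own loaders also handle cropping and augmentation):

import os
from glob import glob

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms


class SupervisedSet(Dataset):
    """Labeled images: RGB image plus an integer mask with one class id per pixel."""
    def __init__(self, root):
        self.images = sorted(glob(os.path.join(root, "images", "*.jpg")))
        self.masks = sorted(glob(os.path.join(root, "masks", "*.png")))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        image = self.to_tensor(Image.open(self.images[i]).convert("RGB"))
        mask = torch.from_numpy(np.array(Image.open(self.masks[i]))).long()
        return image, mask


class UnsupervisedSet(Dataset):
    """Unlabeled images: plain RGB files, no masks needed."""
    def __init__(self, root):
        self.images = sorted(glob(os.path.join(root, "unlabeled", "*.jpg")))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        return self.to_tensor(Image.open(self.images[i]).convert("RGB"))


supervised_loader = DataLoader(SupervisedSet("my_dataset"), batch_size=8,
                               shuffle=True, drop_last=True)
unsupervised_loader = DataLoader(UnsupervisedSet("my_dataset"), batch_size=8,
                                 shuffle=True, drop_last=True)

During training the two loaders are iterated together: the supervised batches feed the cross-entropy loss on the main decoder, while the unsupervised batches feed the consistency losses of the auxiliary decoders.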

@Byronnar
Author

> No need for SegmentationClassAug; simply use your own dataloaders. You'll need to create two: one for the supervised dataset and one for the unsupervised dataset. You can create them any way you like (see the closed issues, there is a lot of discussion on the usage of custom datasets).

Thank you! Do you mean the image-level data are not needed? (In the paper, there are 9k image-level data in the unsupervised section.)

@SuzannaLin
Contributor

I used https://github.com/wkentaro/labelme to annotate my own dataset.
