
Training error! #28

Closed

Byronnar opened this issue Nov 12, 2020 · 12 comments

@Byronnar

I want to train on VOC2012, but I get the error below:

Traceback (most recent call last):
  File "train.py", line 98, in <module>
    main(config, args.resume)
  File "train.py", line 82, in main
    trainer.train()
  File "/home/byronnar/bigfile/projects/CCT/base/base_trainer.py", line 91, in train
    results = self._train_epoch(epoch)
  File "/home/byronnar/bigfile/projects/CCT/trainer.py", line 76, in _train_epoch
    total_loss, cur_losses, outputs = self.model(x_l=input_l, target_l=target_l, x_ul=input_ul, curr_iter=batch_idx, target_ul=target_ul, epoch=epoch-1)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/byronnar/bigfile/projects/CCT/models/model.py", line 93, in forward
    output_l = self.main_decoder(self.encoder(x_l))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/byronnar/bigfile/projects/CCT/models/encoder.py", line 61, in forward
    x = self.psp(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/byronnar/bigfile/projects/CCT/models/encoder.py", line 36, in forward
    align_corners=False) for stage in self.stages])
  File "/home/byronnar/bigfile/projects/CCT/models/encoder.py", line 36, in <listcomp>
    align_corners=False) for stage in self.stages])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 1652, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])
  0%|                                                                                                         | 0/9118 [00:02<?, ?it/s]

What should I do? Thank you.

@yassouali
Owner

Hi @Byronnar, thank you for your interest in our work.

This error occurs at the end of the first epoch: in the last batch you have only one image and BatchNorm gives an error. To solve this, simply set drop_last = True in the dataloaders (or change the batch size, or add an if statement to skip the last step if the batch size is 1).
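
For reference, a minimal sketch of the drop_last fix with a plain PyTorch DataLoader (the dummy TensorDataset below is only a stand-in for VOCDataset, not the repo's actual class):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy data standing in for the real dataset: 17 samples, so with
# batch_size=8 the last batch would hold a single image if drop_last were False.
train_dataset = TensorDataset(torch.randn(17, 3, 320, 320),
                              torch.zeros(17, 320, 320).long())

train_loader = DataLoader(
    train_dataset,
    batch_size=8,
    shuffle=True,
    drop_last=True,  # drop the final incomplete batch so BatchNorm never sees a single image
)

# Alternative: keep drop_last=False and skip size-1 batches inside the training loop.
for images, targets in train_loader:
    if images.size(0) == 1:
        continue
    # ... forward / backward pass ...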

@Byronnar
Author

Thanks for your reply! Setting batch_size=2 solved this problem, but I cannot find where the dataloaders are (except in train_cam.py). Do you mean the dataloader in voc.py?

@Byronnar
Author

> Hi @Byronnar, thank you for your interest in our work.
>
> This error occurs at the end of the first epoch: in the last batch you have only one image and BatchNorm gives an error. To solve this, simply set drop_last = True in the dataloaders (or change the batch size, or add an if statement to skip the last step if the batch size is 1).

The error is below:

self.init_kwargs: 88888888  {'dataset': Dataset: VOCDataset
    # data: 9118
    Split: train_unsupervised
    Root: /home/byronnar/bigfile/datasets/img_detection/VOCdevkit/VOC2012, 'batch_size': 1, 'shuffle': True, 'num_workers': 8, 'pin_memory': True, 'drop_last': True}
self.init_kwargs: 88888888  {'dataset': Dataset: VOCDataset
    # data: 1449
    Split: val
    Root: /home/byronnar/bigfile/datasets/img_detection/VOCdevkit/VOC2012, 'batch_size': 1, 'shuffle': False, 'num_workers': 4, 'pin_memory': True, 'drop_last': True}
Loading pretrained model:models/backbones/pretrained/3x3resnet50-imagenet.pth


Nbr of trainable parameters: 47210776

Detected GPUs: 1 Requested: 1


  0%|                                                                                                         | 0/9118 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train.py", line 99, in <module>
    main(config, args.resume)
  File "train.py", line 83, in main
    trainer.train()
  File "/home/byronnar/bigfile/projects/CCT/base/base_trainer.py", line 91, in train
    results = self._train_epoch(epoch)
  File "/home/byronnar/bigfile/projects/CCT/trainer.py", line 75, in _train_epoch
    total_loss, cur_losses, outputs = self.model(x_l=input_l, target_l=target_l, x_ul=input_ul, curr_iter=batch_idx, target_ul=target_ul, epoch=epoch-1)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/byronnar/bigfile/projects/CCT/models/model.py", line 93, in forward
    output_l = self.main_decoder(self.encoder(x_l))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/byronnar/bigfile/projects/CCT/models/encoder.py", line 61, in forward
    x = self.psp(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/byronnar/bigfile/projects/CCT/models/encoder.py", line 36, in forward
    align_corners=False) for stage in self.stages])
  File "/home/byronnar/bigfile/projects/CCT/models/encoder.py", line 36, in <listcomp>
    align_corners=False) for stage in self.stages])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 1652, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])
  0%|                                                                                                         | 0/9118 [00:02<?, ?it/s]

I have added 'drop_last': True.

@yassouali
Owner

yassouali commented Nov 12, 2020

@Byronnar

Sorry, I thought you had a problem only at the end of the first epoch (multi-GPU training). In your case, I think you are training with a batch size of one, but the nn.BatchNorm layers expect more than one image per batch to compute the batch statistics. You'll need to set the batch size to be > 1.
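
A minimal reproduction of the failure in plain PyTorch (not CCT code): with a batch of one, the globally pooled PSP feature has shape [1, 512, 1, 1], so each channel sees a single value and BatchNorm cannot compute statistics in training mode.

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(512)
bn.train()  # training mode: batch statistics are computed from the input itself

x = torch.randn(1, 512, 1, 1)  # one image, pooled to 1x1 by the PSP global-pooling branch
try:
    bn(x)
except ValueError as e:
    print(e)  # Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])

# With a batch of two there is more than one value per channel, so it works:
print(bn(torch.randn(2, 512, 1, 1)).shape)  # torch.Size([2, 512, 1, 1])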

@Byronnar
Author

> @Byronnar
>
> Sorry, I thought you had a problem only at the end of the first epoch (multi-GPU training). In your case, I think you are training with a batch size of one, but the nn.BatchNorm layers expect more than one image per batch to compute the batch statistics. You'll need to set the batch size to be > 1.

Thanks for your reply.
I can now train on VOC2012, but I don't know how to generate the images in SegmentationClassAug (I mean the unsupervised images).
Looking forward to your reply.

@Byronnar
Author

@yassouali

@yassouali
Owner

yassouali commented Nov 26, 2020

Are you trying to use it on your own dataset? SegmentationClassAug is specific to VOC2012.

@Byronnar
Author

Yes, I am trying to use it on my own dataset, but I don't know how to prepare the unlabeled data. Can it just be the original RGB images?

@Byronnar
Author

> Are you trying to use it on your own dataset? SegmentationClassAug is specific to VOC2012.

Should my custom dataset be in the same format as SegmentationClassAug?

@yassouali
Owner

No need for SegmentationClassAug; simply use your own dataloaders. You'll need to create two: one for the supervised dataset and one for the unsupervised dataset. You can create them any way you like (see the closed issues, there is a lot of discussion on the usage of custom datasets).
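
As a rough sketch of what that two-loader setup could look like for a custom dataset (the directory layout, class names, and transforms here are assumptions for illustration, not the repo's actual API; CCT's own loaders also handle cropping and augmentation):

import os
from glob import glob

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms


class SupervisedSet(Dataset):
    """Labeled images: RGB image plus an integer mask with one class id per pixel."""
    def __init__(self, root):
        self.images = sorted(glob(os.path.join(root, "images", "*.jpg")))
        self.masks = sorted(glob(os.path.join(root, "masks", "*.png")))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        image = self.to_tensor(Image.open(self.images[i]).convert("RGB"))
        mask = torch.from_numpy(np.array(Image.open(self.masks[i]))).long()
        return image, mask


class UnsupervisedSet(Dataset):
    """Unlabeled images: plain RGB files, no masks needed."""
    def __init__(self, root):
        self.images = sorted(glob(os.path.join(root, "unlabeled", "*.jpg")))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        return self.to_tensor(Image.open(self.images[i]).convert("RGB"))


supervised_loader = DataLoader(SupervisedSet("my_dataset"), batch_size=8,
                               shuffle=True, drop_last=True)
unsupervised_loader = DataLoader(UnsupervisedSet("my_dataset"), batch_size=8,
                                 shuffle=True, drop_last=True)

During training the two loaders are iterated together: the supervised batches feed the cross-entropy loss on the main decoder, while the unsupervised batches feed the consistency losses of the auxiliary decoders.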

@Byronnar
Author

> No need for SegmentationClassAug; simply use your own dataloaders. You'll need to create two: one for the supervised dataset and one for the unsupervised dataset. You can create them any way you like (see the closed issues, there is a lot of discussion on the usage of custom datasets).

Thank you! Do you mean the image-level data are not needed? (In the paper, there are 9k image-level data in the unsupervised section.)

@SuzannaLin
Contributor

I used https://github.com/wkentaro/labelme to annotate my own dataset.
