Is there any problem when using train_loader of scannetV2? #4

Yustarzzz · 2022-03-21T16:01:48Z

Hi!
I'm working with your code to train on ScanNetV2 for 3D object detection.
First, I want to test the code on a small subset of the data
(e.g. 2 train scenes, 1 val scene, and 1 test scene from the ScanNetV2 dataset).

So I modified train.txt, val.txt, and test.txt to list only these 4 scenes.
My dataset structure is shown below.

Fortunately, train.py starts when I run it.

However, it then fails with the error shown in the log below.
The error says that there is no 'loss' key in am_dict.
I printed am_dict, and it is empty.

I also tried iterating over the train_loader of this dataset by adding code like "for i, batch in enumerate(train_loader): print(i)".
However, this print statement produces no output. (Does that mean the train_loader is empty?)
Could you help me with this?
Thanks 👍

/content/drive/MyDrive/softgroup/SoftGroup/util/config.py:22: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
[2022-03-21 15:47:04,679 INFO log.py line 39 7139] ************************ Start Logging ************************
[2022-03-21 15:47:04,751 INFO train.py line 22 7139] Namespace(TEST_NMS_THRESH=0.3, TEST_NPOINT_THRESH=100, TEST_SCORE_THRESH=-1, batch_size=4, bg_thresh=0.0, block_reps=2, block_residual=True, class_numpoint_mean=[-1.0, -1.0, 3917.0, 12056.0, 2303.0, 8331.0, 3948.0, 3166.0, 5629.0, 11719.0, 1003.0, 3317.0, 4912.0, 10221.0, 3889.0, 4136.0, 2120.0, 945.0, 3967.0, 2589.0], classes=18, cluster_shift_meanActive=300, config='config/softgroup_default_scannet.yaml', data_root='dataset', dataset='scannetv2', dataset_dir='data/scannetv2_inst.py', dist=False, epochs=500, eval=True, exp_path='exp/scannetv2/softgroup/softgroup_default_scannet', fg_thresh=1.0, filename_suffix='_inst_nostuff.pth', fix_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear'], full_scale=[128, 512], ignore_label=-100, input_channel=3, iou_thr=0.5, local_rank=0, loss_weight=[1.0, 1.0, 1.0, 1.0, 1.0], lr=0.001, manual_seed=123, max_npoint=250000, max_proposal_num=200, mode=4, model_dir='model/softgroup/softgroup.py', model_name='softgroup', momentum=0.9, multiplier=0.5, optim='Adam', point_aggr_radius=0.04, prepare_epochs=-1, pretrain=None, pretrain_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear', 'intra_ins_unet', 'intra_ins_outputlayer'], pretrain_path='hais_ckpt.pth', save_dir='exp', save_freq=16, save_instance=False, save_pt_offsets=False, save_semantic=False, scale=50, score_fullscale=20, score_mode=4, score_scale=50, score_thr=0.2, semantic_classes=20, semantic_only=False, split='val', step_epoch=200, task='train', test_epoch=500, test_mask_score_thre=-0.5, test_seed=567, test_workers=16, train_workers=4, use_coords=True, using_NMS=False, weight_decay=0.0001, width=32)
[2022-03-21 15:47:04,757 INFO train.py line 153 7139] => creating model ...
Load pretrained input_conv: 1/1
Load pretrained unet: 390/390
Load pretrained output_layer: 5/5
Load pretrained semantic_linear: 9/9
Load pretrained offset_linear: 9/9
Load pretrained intra_ins_unet: 85/85
Load pretrained intra_ins_outputlayer: 5/5
[2022-03-21 15:47:09,078 INFO train.py line 164 7139] cuda available: True
[2022-03-21 15:47:09,130 INFO train.py line 168 7139] #classifier parameters: 30839600
[2022-03-21 15:47:09,311 INFO scannetv2_inst.py line 50 7139] Training samples: 2
[2022-03-21 15:47:09,375 INFO scannetv2_inst.py line 84 7139] Validation samples: 1
Traceback (most recent call last):
  File "train.py", line 221, in <module>
    train_epoch(dataset.train_data_loader, model, model_fn, optimizer, epoch)
  File "train.py", line 98, in train_epoch
    logger.info("epoch: {}/{}, train loss: {:.4f}, time: {}s".format(epoch, cfg.epochs, am_dict['loss'].avg, time.time() - start_epoch))
KeyError: 'loss'

thangvubk · 2022-03-22T10:26:46Z

It seems the dataloader cannot load the data. Could you please check whether your data is in the correct path? The train, val, and test folders should be in SoftGroup/dataset/scannetv2/.

Yustarzzz · 2022-03-22T10:42:32Z

Hi thangvubk!
Thanks for your help. However, they are already in the correct path...

thangvubk · 2022-03-22T10:50:28Z

The problem is that your training set has only 2 scans. The default batch size is 4 and drop_last=True, so the only batch is incomplete and gets dropped, and the loader yields nothing. See below.

SoftGroup/data/scannetv2_inst.py

Lines 53 to 54 in 5ac6485

    
    self.train_data_loader = DataLoader(train_set, batch_size=self.batch_size, collate_fn=self.trainMerge, num_workers=self.train_workers,
                                        shuffle=True, sampler=None, drop_last=True, pin_memory=True)

The solution is (1) set drop_last to False, or (2) reduce batch_size to 2.
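
To make the arithmetic concrete, here is a minimal sketch of how the number of batches per epoch follows from the dataset size, batch_size, and drop_last. It assumes the default sampler (no custom batch_sampler), and `num_batches` is a hypothetical helper written for illustration; it is not part of SoftGroup or PyTorch.

```python
import math


def num_batches(num_samples, batch_size, drop_last):
    """Batches per epoch for a plain DataLoader (hypothetical helper)."""
    if drop_last:
        # Only full batches are kept; a trailing partial batch is discarded.
        return num_samples // batch_size
    # The trailing partial batch is kept as a smaller final batch.
    return math.ceil(num_samples / batch_size)


# 2 scans with batch_size=4 and drop_last=True: nothing to iterate over.
print(num_batches(2, 4, drop_last=True))
# Option (1): keep the partial batch.
print(num_batches(2, 4, drop_last=False))
# Option (2): shrink the batch size to fit the data.
print(num_batches(2, 2, drop_last=True))
```

With zero batches the training loop body never runs, so am_dict is never filled, which is consistent with the KeyError: 'loss' above.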

Yustarzzz · 2022-03-22T12:41:42Z

Thanks thangvubk!
I think that works. I added 2 more scenes to the training set, so there are now 4 scenes.

However, another error now occurs, shown below.
Do you know what causes it?

/content/drive/MyDrive/AIA/softgroup/SoftGroup/util/config.py:22: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
[2022-03-22 12:37:56,288 INFO log.py line 39 7052] ************************ Start Logging ************************
[2022-03-22 12:37:58,464 INFO train.py line 22 7052] Namespace(TEST_NMS_THRESH=0.3, TEST_NPOINT_THRESH=100, TEST_SCORE_THRESH=-1, batch_size=4, bg_thresh=0.0, block_reps=2, block_residual=True, class_numpoint_mean=[-1.0, -1.0, 3917.0, 12056.0, 2303.0, 8331.0, 3948.0, 3166.0, 5629.0, 11719.0, 1003.0, 3317.0, 4912.0, 10221.0, 3889.0, 4136.0, 2120.0, 945.0, 3967.0, 2589.0], classes=18, cluster_shift_meanActive=300, config='config/softgroup_default_scannet.yaml', data_root='dataset', dataset='scannetv2', dataset_dir='data/scannetv2_inst.py', dist=False, epochs=500, eval=True, exp_path='exp/scannetv2/softgroup/softgroup_default_scannet', fg_thresh=1.0, filename_suffix='_inst_nostuff.pth', fix_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear'], full_scale=[128, 512], ignore_label=-100, input_channel=3, iou_thr=0.5, local_rank=0, loss_weight=[1.0, 1.0, 1.0, 1.0, 1.0], lr=0.001, manual_seed=123, max_npoint=250000, max_proposal_num=200, mode=4, model_dir='model/softgroup/softgroup.py', model_name='softgroup', momentum=0.9, multiplier=0.5, optim='Adam', point_aggr_radius=0.04, prepare_epochs=-1, pretrain=None, pretrain_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear', 'intra_ins_unet', 'intra_ins_outputlayer'], pretrain_path='hais_ckpt.pth', save_dir='exp', save_freq=16, save_instance=True, save_pt_offsets=False, save_semantic=False, scale=50, score_fullscale=20, score_mode=4, score_scale=50, score_thr=0.2, semantic_classes=20, semantic_only=False, split='val', step_epoch=200, task='train', test_epoch=500, test_mask_score_thre=-0.5, test_seed=567, test_workers=16, train_workers=4, use_coords=True, using_NMS=False, weight_decay=0.0001, width=32)
[2022-03-22 12:37:58,473 INFO train.py line 153 7052] => creating model ...
Load pretrained input_conv: 1/1
Load pretrained unet: 390/390
Load pretrained output_layer: 5/5
Load pretrained semantic_linear: 9/9
Load pretrained offset_linear: 9/9
Load pretrained intra_ins_unet: 85/85
Load pretrained intra_ins_outputlayer: 5/5
[2022-03-22 12:38:05,721 INFO train.py line 164 7052] cuda available: True
[2022-03-22 12:38:05,788 INFO train.py line 168 7052] #classifier parameters: 30839600
[2022-03-22 12:38:06,724 INFO scannetv2_inst.py line 50 7052] Training samples: 4
[2022-03-22 12:38:06,792 INFO scannetv2_inst.py line 84 7052] Validation samples: 1
Traceback (most recent call last):
  File "train.py", line 221, in <module>
    train_epoch(dataset.train_data_loader, model, model_fn, optimizer, epoch)
  File "train.py", line 61, in train_epoch
    loss, _, visual_dict, meter_dict = model_fn(batch, model, epoch, semantic_only=cfg.semantic_only)
  File "/content/drive/MyDrive/AIA/softgroup/SoftGroup/model/softgroup/softgroup.py", line 527, in model_fn
    ret = model(input, p2v_map, coords_float, coords[:, 0].int(), batch_offsets, epoch, 'train', semantic_only=semantic_only)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/AIA/softgroup/SoftGroup/model/softgroup/softgroup.py", line 316, in forward
    output = self.input_conv(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/spconv/modules.py", line 123, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/spconv/conv.py", line 151, in forward
    self.stride, self.padding, self.dilation, self.output_padding, self.subm, self.transposed, grid=input.grid)
  File "/usr/local/lib/python3.7/dist-packages/spconv/ops.py", line 89, in get_indice_pairs
    stride, padding, dilation, out_padding, int(subm), int(transpose))
RuntimeError: /content/drive/MyDrive/AIA/softgroup/SoftGroup/lib/spconv/src/spconv/indice.cu 120
cuda execution failed with error 98

thangvubk · 2022-03-22T12:47:57Z

Which GPU model are you using? This error is related to spconv. I found a related issue here: open-mmlab/OpenPCDet#442
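
One way to collect the GPU details asked about here is the short sketch below. It assumes PyTorch is importable in the same environment that runs train.py; `describe_gpu` is a hypothetical helper for illustration, not part of SoftGroup or spconv, and it returns None when no CUDA device can be queried.

```python
def describe_gpu():
    """Return basic GPU facts as a dict, or None if they cannot be queried."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(0)
    return {
        "name": torch.cuda.get_device_name(0),
        "compute_capability": "{}.{}".format(major, minor),
        # CUDA version PyTorch was built against (may differ from the local toolkit).
        "torch_cuda_version": torch.version.cuda,
    }


print(describe_gpu())
```

Posting this output together with the traceback makes it easier to compare the setup against the linked issue.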

thangvubk closed this as completed Apr 4, 2022

SijanNeupane49 mentioned this issue Apr 28, 2022

Spconv and Cuda error while training on my own dataset #56

Closed
