Is there any problem when using train_loader of scannetV2? #4

Closed
Yustarzzz opened this issue Mar 21, 2022 · 5 comments

@Yustarzzz
Yustarzzz commented Mar 21, 2022

Hi!
I'm using your code to train on ScanNetV2 for 3D object detection.
First, I want to test the code on a small subset of the dataset
(e.g., 2 train scenes, 1 val scene, and 1 test scene from ScanNetV2).

So I modified train.txt, val.txt, and test.txt to list these 4 scenes.
My dataset structure is shown below.
[Screenshot: dataset directory structure]

When I run train.py, it starts, but then fails with the error below.
The error says there is no 'loss' key in am_dict.
When I printed am_dict, it was empty.

I also tried printing the batches from train_loader by adding "for i, batch in enumerate(train_loader): print(i)".
However, nothing was printed. (Does this mean train_loader yields no batches?)
Could you help me with this?
Thanks 👍


/content/drive/MyDrive/softgroup/SoftGroup/util/config.py:22: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
[2022-03-21 15:47:04,679 INFO log.py line 39 7139] ************************ Start Logging ************************
[2022-03-21 15:47:04,751 INFO train.py line 22 7139] Namespace(TEST_NMS_THRESH=0.3, TEST_NPOINT_THRESH=100, TEST_SCORE_THRESH=-1, batch_size=4, bg_thresh=0.0, block_reps=2, block_residual=True, class_numpoint_mean=[-1.0, -1.0, 3917.0, 12056.0, 2303.0, 8331.0, 3948.0, 3166.0, 5629.0, 11719.0, 1003.0, 3317.0, 4912.0, 10221.0, 3889.0, 4136.0, 2120.0, 945.0, 3967.0, 2589.0], classes=18, cluster_shift_meanActive=300, config='config/softgroup_default_scannet.yaml', data_root='dataset', dataset='scannetv2', dataset_dir='data/scannetv2_inst.py', dist=False, epochs=500, eval=True, exp_path='exp/scannetv2/softgroup/softgroup_default_scannet', fg_thresh=1.0, filename_suffix='_inst_nostuff.pth', fix_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear'], full_scale=[128, 512], ignore_label=-100, input_channel=3, iou_thr=0.5, local_rank=0, loss_weight=[1.0, 1.0, 1.0, 1.0, 1.0], lr=0.001, manual_seed=123, max_npoint=250000, max_proposal_num=200, mode=4, model_dir='model/softgroup/softgroup.py', model_name='softgroup', momentum=0.9, multiplier=0.5, optim='Adam', point_aggr_radius=0.04, prepare_epochs=-1, pretrain=None, pretrain_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear', 'intra_ins_unet', 'intra_ins_outputlayer'], pretrain_path='hais_ckpt.pth', save_dir='exp', save_freq=16, save_instance=False, save_pt_offsets=False, save_semantic=False, scale=50, score_fullscale=20, score_mode=4, score_scale=50, score_thr=0.2, semantic_classes=20, semantic_only=False, split='val', step_epoch=200, task='train', test_epoch=500, test_mask_score_thre=-0.5, test_seed=567, test_workers=16, train_workers=4, use_coords=True, using_NMS=False, weight_decay=0.0001, width=32)
[2022-03-21 15:47:04,757 INFO train.py line 153 7139] => creating model ...
Load pretrained input_conv: 1/1
Load pretrained unet: 390/390
Load pretrained output_layer: 5/5
Load pretrained semantic_linear: 9/9
Load pretrained offset_linear: 9/9
Load pretrained intra_ins_unet: 85/85
Load pretrained intra_ins_outputlayer: 5/5
[2022-03-21 15:47:09,078 INFO train.py line 164 7139] cuda available: True
[2022-03-21 15:47:09,130 INFO train.py line 168 7139] #classifier parameters: 30839600
[2022-03-21 15:47:09,311 INFO scannetv2_inst.py line 50 7139] Training samples: 2
[2022-03-21 15:47:09,375 INFO scannetv2_inst.py line 84 7139] Validation samples: 1
Traceback (most recent call last):
File "train.py", line 221, in <module>
train_epoch(dataset.train_data_loader, model, model_fn, optimizer, epoch)
File "train.py", line 98, in train_epoch
logger.info("epoch: {}/{}, train loss: {:.4f}, time: {}s".format(epoch, cfg.epochs, am_dict['loss'].avg, time.time() - start_epoch))
KeyError: 'loss'
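The following minimal sketch shows why an empty loader surfaces as KeyError: 'loss' instead of a clearer message. It assumes train_epoch only fills am_dict inside its per-batch loop; train_epoch_sketch and AverageMeter here are hypothetical stand-ins, not SoftGroup's actual code. A guard on the batch count turns the silent failure into an explicit error.

```python
class AverageMeter:
    """Minimal running-average meter (hypothetical stand-in, not SoftGroup's class)."""

    def __init__(self):
        self.sum = 0.0
        self.count = 0

    def update(self, val, n=1):
        self.sum += val * n
        self.count += n

    @property
    def avg(self):
        return self.sum / self.count if self.count else 0.0


def train_epoch_sketch(loader, compute_loss):
    """Hypothetical epoch loop: meters are created only when a batch arrives."""
    am_dict = {}
    num_batches = 0
    for batch in loader:
        num_batches += 1
        loss = compute_loss(batch)
        if "loss" not in am_dict:
            am_dict["loss"] = AverageMeter()
        am_dict["loss"].update(loss)
    if num_batches == 0:
        # Without this guard, am_dict stays {} and am_dict['loss'] raises KeyError.
        raise RuntimeError(
            "train loader yielded no batches; check dataset size, batch_size and drop_last"
        )
    return am_dict["loss"].avg
```

For example, train_epoch_sketch([1.0, 3.0], lambda b: b) averages the two "losses" to 2.0, while an empty list raises the RuntimeError rather than a KeyError.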

@thangvubk
Owner

It seems the dataloader cannot load the data. Could you please check whether your data is in the correct path? The train, val, and test folders should be in SoftGroup/dataset/scannetv2/.
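A minimal sketch for checking this, assuming the loader looks for files matching <data_root>/<dataset>/<split>/*<filename_suffix> and that every split uses the same suffix. The defaults come from the logged config (data_root='dataset', dataset='scannetv2', filename_suffix='_inst_nostuff.pth'); count_scans is a hypothetical helper, and it should be run from the SoftGroup/ directory. If your version of the loader reads the split lists (train.txt etc.) instead, compare against those.

```python
import glob
import os


def count_scans(data_root="dataset", dataset="scannetv2", suffix="_inst_nostuff.pth"):
    """Count preprocessed scan files per split (hypothetical helper).

    Assumes a <data_root>/<dataset>/<split>/*<suffix> layout for every split.
    """
    counts = {}
    for split in ("train", "val", "test"):
        pattern = os.path.join(data_root, dataset, split, "*" + suffix)
        counts[split] = len(glob.glob(pattern))
    return counts


if __name__ == "__main__":
    # A split reporting 0 means no files matched the expected path and suffix.
    print(count_scans())
```

With the setup described in the first comment, this would be expected to report 2 train, 1 val, and 1 test file.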

@Yustarzzz
Author

Hi thangvubk!
Thanks for your help. However, the files are already in the correct path...

@thangvubk
Owner

The problem is that your training set has only 2 scans. The default batch size is 4 and drop_last=True, so the only (incomplete) batch is dropped and the loader yields no batches at all. See below:

self.train_data_loader = DataLoader(train_set, batch_size=self.batch_size, collate_fn=self.trainMerge, num_workers=self.train_workers,
shuffle=True, sampler=None, drop_last=True, pin_memory=True)

There are two solutions: (1) set drop_last to False, or (2) reduce batch_size to 2.
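The batch arithmetic can be sketched as below. It mirrors the length rule of torch.utils.data.BatchSampler (floor division when drop_last=True, ceiling division otherwise); num_batches is a hypothetical helper written for illustration, not part of SoftGroup.

```python
def num_batches(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Batches per epoch for a map-style loader (sketch of BatchSampler's length rule)."""
    if drop_last:
        # The final incomplete batch is discarded.
        return num_samples // batch_size
    # The final batch may be smaller than batch_size.
    return (num_samples + batch_size - 1) // batch_size


if __name__ == "__main__":
    for n, bs, drop in [(2, 4, True), (2, 4, False), (2, 2, True), (4, 4, True)]:
        print(f"samples={n} batch_size={bs} drop_last={drop} -> {num_batches(n, bs, drop)}")
```

With 2 scans, batch_size=4 and drop_last=True this gives 0 batches; either fix gives 1. With 4 training scans and the default settings it also gives 1 batch.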

@Yustarzzz
Author

Yustarzzz commented Mar 22, 2022

Thanks, thangvubk!
I think that fixed it: I added 2 more scenes to the training set, so there are now 4 scenes.

However, another error now occurs (see below).
Do you know what causes it?


/content/drive/MyDrive/AIA/softgroup/SoftGroup/util/config.py:22: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
[2022-03-22 12:37:56,288 INFO log.py line 39 7052] ************************ Start Logging ************************
[2022-03-22 12:37:58,464 INFO train.py line 22 7052] Namespace(TEST_NMS_THRESH=0.3, TEST_NPOINT_THRESH=100, TEST_SCORE_THRESH=-1, batch_size=4, bg_thresh=0.0, block_reps=2, block_residual=True, class_numpoint_mean=[-1.0, -1.0, 3917.0, 12056.0, 2303.0, 8331.0, 3948.0, 3166.0, 5629.0, 11719.0, 1003.0, 3317.0, 4912.0, 10221.0, 3889.0, 4136.0, 2120.0, 945.0, 3967.0, 2589.0], classes=18, cluster_shift_meanActive=300, config='config/softgroup_default_scannet.yaml', data_root='dataset', dataset='scannetv2', dataset_dir='data/scannetv2_inst.py', dist=False, epochs=500, eval=True, exp_path='exp/scannetv2/softgroup/softgroup_default_scannet', fg_thresh=1.0, filename_suffix='_inst_nostuff.pth', fix_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear'], full_scale=[128, 512], ignore_label=-100, input_channel=3, iou_thr=0.5, local_rank=0, loss_weight=[1.0, 1.0, 1.0, 1.0, 1.0], lr=0.001, manual_seed=123, max_npoint=250000, max_proposal_num=200, mode=4, model_dir='model/softgroup/softgroup.py', model_name='softgroup', momentum=0.9, multiplier=0.5, optim='Adam', point_aggr_radius=0.04, prepare_epochs=-1, pretrain=None, pretrain_module=['input_conv', 'unet', 'output_layer', 'semantic_linear', 'offset_linear', 'intra_ins_unet', 'intra_ins_outputlayer'], pretrain_path='hais_ckpt.pth', save_dir='exp', save_freq=16, save_instance=True, save_pt_offsets=False, save_semantic=False, scale=50, score_fullscale=20, score_mode=4, score_scale=50, score_thr=0.2, semantic_classes=20, semantic_only=False, split='val', step_epoch=200, task='train', test_epoch=500, test_mask_score_thre=-0.5, test_seed=567, test_workers=16, train_workers=4, use_coords=True, using_NMS=False, weight_decay=0.0001, width=32)
[2022-03-22 12:37:58,473 INFO train.py line 153 7052] => creating model ...
Load pretrained input_conv: 1/1
Load pretrained unet: 390/390
Load pretrained output_layer: 5/5
Load pretrained semantic_linear: 9/9
Load pretrained offset_linear: 9/9
Load pretrained intra_ins_unet: 85/85
Load pretrained intra_ins_outputlayer: 5/5
[2022-03-22 12:38:05,721 INFO train.py line 164 7052] cuda available: True
[2022-03-22 12:38:05,788 INFO train.py line 168 7052] #classifier parameters: 30839600
[2022-03-22 12:38:06,724 INFO scannetv2_inst.py line 50 7052] Training samples: 4
[2022-03-22 12:38:06,792 INFO scannetv2_inst.py line 84 7052] Validation samples: 1
Traceback (most recent call last):
File "train.py", line 221, in <module>
train_epoch(dataset.train_data_loader, model, model_fn, optimizer, epoch)
File "train.py", line 61, in train_epoch
loss, _, visual_dict, meter_dict = model_fn(batch, model, epoch, semantic_only=cfg.semantic_only)
File "/content/drive/MyDrive/AIA/softgroup/SoftGroup/model/softgroup/softgroup.py", line 527, in model_fn
ret = model(input_, p2v_map, coords_float, coords[:, 0].int(), batch_offsets, epoch, 'train', semantic_only=semantic_only)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/content/drive/MyDrive/AIA/softgroup/SoftGroup/model/softgroup/softgroup.py", line 316, in forward
output = self.input_conv(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/spconv/modules.py", line 123, in forward
input = module(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/spconv/conv.py", line 151, in forward
self.stride, self.padding, self.dilation, self.output_padding, self.subm, self.transposed, grid=input.grid)
File "/usr/local/lib/python3.7/dist-packages/spconv/ops.py", line 89, in get_indice_pairs
stride, padding, dilation, out_padding, int(subm), int(transpose))
RuntimeError: /content/drive/MyDrive/AIA/softgroup/SoftGroup/lib/spconv/src/spconv/indice.cu 120
cuda execution failed with error 98

@thangvubk
Owner

Which GPU model are you using? This error is related to spconv. I found a related issue here: open-mmlab/OpenPCDet#442
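A small sketch for collecting the details asked for here (GPU model, compute capability, and the CUDA version PyTorch was built with), using standard torch.cuda calls; format_arch and gpu_report are hypothetical helper names. As background, in CUDA 10.1 and later the runtime error code 98 is cudaErrorInvalidDeviceFunction, which usually means a kernel was not compiled for the running GPU's architecture. Comparing the reported sm_XY value with the architectures spconv was built for is therefore a reasonable first check, though this is an assumption about this particular failure, not a confirmed diagnosis.

```python
def format_arch(major: int, minor: int) -> str:
    """Format a CUDA compute capability, e.g. (7, 5) -> 'sm_75'."""
    return f"sm_{major}{minor}"


def gpu_report():
    """Collect GPU/CUDA details useful when reporting a CUDA extension error."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    report = {
        "torch": torch.__version__,
        "cuda_runtime": torch.version.cuda,  # CUDA version PyTorch was built with
        "devices": [],
    }
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            report["devices"].append((torch.cuda.get_device_name(i), format_arch(major, minor)))
    return report


if __name__ == "__main__":
    print(gpu_report())
```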
