Cannot run main.py but can run demo.py and test.py #404

Closed
WJ-Lai opened this issue Oct 14, 2019 · 1 comment

WJ-Lai commented Oct 14, 2019

Hi, I can run demo.py and test.py successfully with:
python test.py ctdet --exp_id coco_dla --keep_res --load_model ../models/ctdet_coco_dla_2x.pth
python demo.py ctdet --demo /home/vincent/Code/CenterNet/images/ --load_model /home/vincent/Code/CenterNet/models/ctdet_coco_dla_2x.pth

However, training with main.py fails:
python main.py ctdet --exp_id coco_dla --batch_size 5 --master_batch 5 --lr 1.25e-4 --gpus 0

Fix size testing.
training chunk_sizes: [5]
The output will be saved to  /home/vincent/Code/CenterNet/src/lib/../../exp/ctdet/coco_dla
heads {'hm': 80, 'wh': 2, 'reg': 2}
Namespace(K=100, aggr_weight=0.0, agnostic_ex=False, arch='dla_34', aug_ddd=0.5, aug_rot=0, batch_size=5, cat_spec_wh=False, center_thresh=0.1, chunk_sizes=[5], data_dir='/home/vincent/Code/CenterNet/src/lib/../../data', dataset='coco', debug=0, debug_dir='/home/vincent/Code/CenterNet/src/lib/../../exp/ctdet/coco_dla/debug', debugger_theme='white', demo='', dense_hp=False, dense_wh=False, dep_weight=1, dim_weight=1, down_ratio=4, eval_oracle_dep=False, eval_oracle_hm=False, eval_oracle_hmhp=False, eval_oracle_hp_offset=False, eval_oracle_kps=False, eval_oracle_offset=False, eval_oracle_wh=False, exp_dir='/home/vincent/Code/CenterNet/src/lib/../../exp/ctdet', exp_id='coco_dla', fix_res=True, flip=0.5, flip_test=False, gpus=[0], gpus_str='0', head_conv=256, heads={'hm': 80, 'wh': 2, 'reg': 2}, hide_data_time=False, hm_hp=True, hm_hp_weight=1, hm_weight=1, hp_weight=1, input_h=512, input_res=512, input_w=512, keep_res=False, kitti_split='3dop', load_model='', lr=0.000125, lr_step=[90, 120], master_batch_size=5, mean=array([[[0.40789655, 0.44719303, 0.47026116]]], dtype=float32), metric='loss', mse_loss=False, nms=False, no_color_aug=False, norm_wh=False, not_cuda_benchmark=False, not_hm_hp=False, not_prefetch_test=False, not_rand_crop=False, not_reg_bbox=False, not_reg_hp_offset=False, not_reg_offset=False, num_classes=80, num_epochs=140, num_iters=-1, num_stacks=1, num_workers=4, off_weight=1, output_h=128, output_res=128, output_w=128, pad=31, peak_thresh=0.2, print_iter=0, rect_mask=False, reg_bbox=True, reg_hp_offset=True, reg_loss='l1', reg_offset=True, resume=False, root_dir='/home/vincent/Code/CenterNet/src/lib/../..', rot_weight=1, rotate=0, save_all=False, save_dir='/home/vincent/Code/CenterNet/src/lib/../../exp/ctdet/coco_dla', scale=0.4, scores_thresh=0.1, seed=317, shift=0.1, std=array([[[0.2886383 , 0.27408165, 0.27809834]]], dtype=float32), task='ctdet', test=False, test_scales=[1.0], trainval=False, val_intervals=5, vis_thresh=0.3, wh_weight=0.1)
Creating model...
Setting up data...
==> initializing coco 2017 val data.
loading annotations into memory...
Done (t=0.31s)
creating index...
index created!
Loaded val 5000 samples
==> initializing coco 2017 train data.
loading annotations into memory...
Done (t=8.72s)
creating index...
index created!
Loaded train 118287 samples
Starting training...
ctdet/coco_dlaTHCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
Traceback (most recent call last):
  File "main.py", line 102, in <module>
    main(opt)
  File "main.py", line 70, in main
    log_dict_train, _ = trainer.train(epoch, train_loader)
  File "/home/vincent/Code/CenterNet/src/lib/trains/base_trainer.py", line 119, in train
    return self.run_epoch('train', epoch, data_loader)
  File "/home/vincent/Code/CenterNet/src/lib/trains/base_trainer.py", line 69, in run_epoch
    output, loss, loss_stats = model_with_loss(batch)
  File "/home/vincent/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vincent/Code/CenterNet/src/lib/trains/base_trainer.py", line 19, in forward
    outputs = self.model(batch['input'])
  File "/home/vincent/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vincent/Code/CenterNet/src/lib/models/networks/pose_dla_dcn.py", line 471, in forward
    x = self.base(x)
  File "/home/vincent/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vincent/Code/CenterNet/src/lib/models/networks/pose_dla_dcn.py", line 288, in forward
    x = self.base_layer(x)
  File "/home/vincent/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vincent/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/vincent/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vincent/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/THCGeneral.cpp:663

Setting --num_workers 0 does not help. The command below reports the same error:
python main.py ctdet --exp_id coco_dla --batch_size 5 --master_batch 5 --lr 1.25e-4 --gpus 0 --num_workers 0

My environment is:
Ubuntu 16.04
GPU RTX2070 Advanced OC 8G
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+

CUDA V9.0.176
CUDNN 7.1.4
python 3.6.9
pytorch 0.4.1

Could you please tell me how to solve this problem? Thanks.
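
For readers hitting the same error, it can help to print the versions that PyTorch itself reports, since the CUDA version shown by nvidia-smi (10.1 above) and the installed toolkit (9.0.176 above) can differ from the one the PyTorch binary was built with. The snippet below is a minimal sketch, not part of CenterNet; the function name `collect_env` is made up for illustration, and it falls back gracefully when PyTorch is not installed.

```python
import platform


def collect_env():
    """Return a dict describing the Python/PyTorch/CUDA setup (hypothetical helper)."""
    info = {"python": platform.python_version(), "torch": None}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; report only the Python version.
        return info

    info["torch"] = torch.__version__
    # CUDA version the PyTorch binary was built with (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    info["cudnn"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of GPU 0.
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Comparing `torch_cuda` and `capability` in this output is usually a quicker first check than reading the traceback.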

WJ-Lai commented Oct 17, 2019

Solved by changing the CUDA version: #356
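
The thread does not say why changing the CUDA version helped. One plausible explanation, stated here as an assumption rather than a confirmed diagnosis, is that the RTX 2070 is a Turing GPU with compute capability 7.5, which CUDA toolkits older than 10.0 cannot target, while the environment above lists CUDA 9.0.176. The sketch below encodes that rule for a few architectures; the table and the function names are illustrative only and are not part of PyTorch or CenterNet.

```python
# Minimum CUDA toolkit release able to target each compute capability.
# Illustrative subset only: Pascal (6.x), Volta (7.0), Turing (7.5).
MIN_CUDA_FOR_CAPABILITY = {
    (6, 0): (8, 0),
    (6, 1): (8, 0),
    (7, 0): (9, 0),
    (7, 5): (10, 0),
}


def parse_version(text):
    """Turn a version string such as '9.0.176' into a (major, minor) tuple."""
    return tuple(int(part) for part in text.split(".")[:2])


def cuda_build_supports(capability, cuda_version):
    """Return True/False if the rule is known, or None for an unlisted GPU."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    # RTX 2070 (assumed compute capability 7.5) with the CUDA version reported above.
    print(cuda_build_supports((7, 5), "9.0.176"))  # False
    print(cuda_build_supports((7, 5), "10.0"))     # True
```

If this reading is right, demo.py and test.py succeeding does not contradict it, but the thread gives no detail on that point, so treat the check as a starting hint only.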

WJ-Lai closed this as completed Oct 17, 2019