Training error on Windows #126

Closed
sounansu opened this issue Jun 5, 2019 · 9 comments
@sounansu commented Jun 5, 2019

I can already run demo.py on my Windows 10 PC.

Next, I tried to run the benchmark evaluation on my PC, but the error below occurred.

Please, can someone help me?

(CenterNet) F:\Users\sounansu\Anaconda3\CenterNet\src>python test.py ctdet --exp_id coco_dla --keep_res --load_model ../models/ctdet_coco_dla_2x.pth
Keep resolution testing.
training chunk_sizes: [32]
The output will be saved to  F:\Users\sounansu\Anaconda3\CenterNet\src\lib\..\..\exp\ctdet\coco_dla
heads {'hm': 80, 'wh': 2, 'reg': 2}
Namespace(K=100, aggr_weight=0.0, agnostic_ex=False, arch='dla_34', aug_ddd=0.5, aug_rot=0, batch_size=32, cat_spec_wh=False, center_thresh=0.1, chunk_sizes=[32], data_dir='F:\\Users\\sounansu\\Anaconda3\\CenterNet\\src\\lib\\..\\..\\data', dataset='coco', debug=0, debug_dir='F:\\Users\\sounansu\\Anaconda3\\CenterNet\\src\\lib\\..\\..\\exp\\ctdet\\coco_dla\\debug', debugger_theme='white', demo='', dense_hp=False, dense_wh=False, dep_weight=1, dim_weight=1, down_ratio=4, eval_oracle_dep=False, eval_oracle_hm=False, eval_oracle_hmhp=False, eval_oracle_hp_offset=False, eval_oracle_kps=False, eval_oracle_offset=False, eval_oracle_wh=False, exp_dir='F:\\Users\\sounansu\\Anaconda3\\CenterNet\\src\\lib\\..\\..\\exp\\ctdet', exp_id='coco_dla', fix_res=False, flip=0.5, flip_test=False, gpus=[0], gpus_str='0', head_conv=256, heads={'hm': 80, 'wh': 2, 'reg': 2}, hide_data_time=False, hm_hp=True, hm_hp_weight=1, hm_weight=1, hp_weight=1, input_h=512, input_res=512, input_w=512, keep_res=True, kitti_split='3dop', load_model='../models/ctdet_coco_dla_2x.pth', lr=0.000125, lr_step=[90, 120], master_batch_size=32, mean=array([[[0.40789655, 0.44719303, 0.47026116]]], dtype=float32), metric='loss', mse_loss=False, nms=False, no_color_aug=False, norm_wh=False, not_cuda_benchmark=False, not_hm_hp=False, not_prefetch_test=False, not_rand_crop=False, not_reg_bbox=False, not_reg_hp_offset=False, not_reg_offset=False, num_classes=80, num_epochs=140, num_iters=-1, num_stacks=1, num_workers=4, off_weight=1, output_h=128, output_res=128, output_w=128, pad=31, peak_thresh=0.2, print_iter=0, rect_mask=False, reg_bbox=True, reg_hp_offset=True, reg_loss='l1', reg_offset=True, resume=False, root_dir='F:\\Users\\sounansu\\Anaconda3\\CenterNet\\src\\lib\\..\\..', rot_weight=1, rotate=0, save_all=False, save_dir='F:\\Users\\sounansu\\Anaconda3\\CenterNet\\src\\lib\\..\\..\\exp\\ctdet\\coco_dla', scale=0.4, scores_thresh=0.1, seed=317, shift=0.1, std=array([[[0.2886383 , 0.27408165, 0.27809834]]], dtype=float32), task='ctdet', test=False, test_scales=[1.0], trainval=False, val_intervals=5, vis_thresh=0.3, wh_weight=0.1)
==> initializing coco 2017 val data.
loading annotations into memory...
Done (t=0.70s)
creating index...
index created!
Loaded val 5000 samples
Creating model...
loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
coco_dlaTHCudaCheck FAIL file=C:\w\1\s\tmp_conda_3.6_035809\conda\conda-bld\pytorch_1556683229598\work\torch/csrc/generic/StorageSharing.cpp line=245 error=71 : operation not supported
Traceback (most recent call last):
  File "test.py", line 126, in <module>
    prefetch_test(opt)
  File "test.py", line 69, in prefetch_test
    for ind, (img_id, pre_processed_images) in enumerate(data_loader):
  File "f:\users\sounansu\anaconda3\envs\centernet\lib\site-packages\torch\utils\data\dataloader.py", line 193, in __iter__
    return _DataLoaderIter(self)
  File "f:\users\sounansu\anaconda3\envs\centernet\lib\site-packages\torch\utils\data\dataloader.py", line 469, in __init__
    w.start()
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "f:\users\sounansu\anaconda3\envs\centernet\lib\site-packages\torch\multiprocessing\reductions.py", line 231, in reduce_tensor
    event_sync_required) = storage._share_cuda_()
RuntimeError: cuda runtime error (71) : operation not supported at C:\w\1\s\tmp_conda_3.6_035809\conda\conda-bld\pytorch_1556683229598\work\torch/csrc/generic/StorageSharing.cpp:245

(CenterNet) F:\Users\sounansu\Anaconda3\CenterNet\src>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
@xingyizhou (Owner)

It seems the problem is in the PyTorch dataloader. Make sure your PyTorch is installed correctly, i.e., you can run other PyTorch projects. For benchmark testing, you can disable the multi-process dataloader with --not_prefetch_test. Note that this will slow down the testing.
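
For background, a minimal sketch of the underlying mechanism (generic PyTorch, not CenterNet's actual loader code): on Windows, DataLoader workers are spawned processes, so the dataset object must be pickled and sent to each worker, and CUDA storages cannot be shared across processes at all, which is why the first traceback above fails inside storage._share_cuda_(). With num_workers=0, loading stays in the main process and both problems disappear.

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(16, dtype=torch.float32).view(8, 2))

# num_workers > 0: on Windows each worker is a *spawned* process, so the
# dataset must be picklable and must not hold CUDA tensors (sharing CUDA
# storage across processes is unsupported there -- cuda runtime error 71).
worker_loader = DataLoader(dataset, batch_size=4, num_workers=4)

# num_workers = 0: everything runs in the main process -- slower, but no
# pickling and no cross-process CUDA sharing is required.
main_loader = DataLoader(dataset, batch_size=4, num_workers=0)

for (batch,) in main_loader:
  print(batch.shape)  # torch.Size([4, 2])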

@sounansu (Author) commented Jun 9, 2019

Thank you!
I can run test.py with the command below.

python test.py ctdet --not_prefetch_test --exp_id coco_dla --keep_res --load_model ..\models\ctdet_coco_dla_2x.pth

But then I tried to run training with main.py, as below.

(CenterNet) F:\Users\sounansu\Anaconda3\CenterNet\src>python main.py ctdet --exp_id coco_dla --batch_size 32 --master_batch 15 --lr 1.25e-4  --gpus 0
Fix size testing.
training chunk_sizes: [15]
The output will be saved to  F:\Users\sounansu\Anaconda3\CenterNet\src\lib\..\..\exp\ctdet\coco_dla
heads {'hm': 80, 'wh': 2, 'reg': 2}
Namespace(K=100, aggr_weight=0.0, agnostic_ex=False, arch='dla_34', aug_ddd=0.5, aug_rot=0, batch_size=32, cat_spec_wh=False, center_thresh=0.1, chunk_sizes=[15], data_dir='F:\\Users\\sounansu\\Anaconda3\\CenterNet\\src\\lib\\..\\..\\data', dataset='coco', debug=0, debug_dir='F:\\Users\\sounansu\\Anaconda3\\CenterNet\\src\\lib\\..\\..\\exp\\ctdet\\coco_dla\\debug', debugger_theme='white', demo='', dense_hp=False, dense_wh=False, dep_weight=1, dim_weight=1, down_ratio=4, eval_oracle_dep=False, eval_oracle_hm=False, eval_oracle_hmhp=False, eval_oracle_hp_offset=False, eval_oracle_kps=False, eval_oracle_offset=False, eval_oracle_wh=False, exp_dir='F:\\Users\\sounansu\\Anaconda3\\CenterNet\\src\\lib\\..\\..\\exp\\ctdet', exp_id='coco_dla', fix_res=True, flip=0.5, flip_test=False, gpus=[0], gpus_str='0', head_conv=256, heads={'hm': 80, 'wh': 2, 'reg': 2}, hide_data_time=False, hm_hp=True, hm_hp_weight=1, hm_weight=1, hp_weight=1, input_h=512, input_res=512, input_w=512, keep_res=False, kitti_split='3dop', load_model='', lr=0.000125, lr_step=[90, 120], master_batch_size=15, mean=array([[[0.40789655, 0.44719303, 0.47026116]]], dtype=float32), metric='loss', mse_loss=False, nms=False, no_color_aug=False, norm_wh=False, not_cuda_benchmark=False, not_hm_hp=False, not_prefetch_test=False, not_rand_crop=False, not_reg_bbox=False, not_reg_hp_offset=False, not_reg_offset=False, num_classes=80, num_epochs=140, num_iters=-1, num_stacks=1, num_workers=4, off_weight=1, output_h=128, output_res=128, output_w=128, pad=31, peak_thresh=0.2, print_iter=0, rect_mask=False, reg_bbox=True, reg_hp_offset=True, reg_loss='l1', reg_offset=True, resume=False, root_dir='F:\\Users\\sounansu\\Anaconda3\\CenterNet\\src\\lib\\..\\..', rot_weight=1, rotate=0, save_all=False, save_dir='F:\\Users\\sounansu\\Anaconda3\\CenterNet\\src\\lib\\..\\..\\exp\\ctdet\\coco_dla', scale=0.4, scores_thresh=0.1, seed=317, shift=0.1, std=array([[[0.2886383 , 0.27408165, 0.27809834]]], dtype=float32), task='ctdet', test=False, test_scales=[1.0], trainval=False, val_intervals=5, vis_thresh=0.3, wh_weight=0.1)
Creating model...
Setting up data...
==> initializing coco 2017 val data.
loading annotations into memory...
Done (t=0.68s)
creating index...
index created!
Loaded val 5000 samples
==> initializing coco 2017 train data.
loading annotations into memory...
Done (t=18.41s)
creating index...
index created!
Loaded train 118287 samples
Starting training...
ctdet/coco_dlaTraceback (most recent call last):
  File "main.py", line 102, in <module>
    main(opt)
  File "main.py", line 70, in main
    log_dict_train, _ = trainer.train(epoch, train_loader)
  File "F:\Users\sounansu\Anaconda3\CenterNet\src\lib\trains\base_trainer.py", line 119, in train
    return self.run_epoch('train', epoch, data_loader)
  File "F:\Users\sounansu\Anaconda3\CenterNet\src\lib\trains\base_trainer.py", line 61, in run_epoch
    for iter_id, batch in enumerate(data_loader):
  File "f:\users\sounansu\anaconda3\envs\centernet\lib\site-packages\torch\utils\data\dataloader.py", line 193, in __iter__
    return _DataLoaderIter(self)
  File "f:\users\sounansu\anaconda3\envs\centernet\lib\site-packages\torch\utils\data\dataloader.py", line 469, in __init__
    w.start()
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_dataset.<locals>.Dataset'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "F:\Users\sounansu\Anaconda3\envs\CenterNet\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

An EOFError occurred.
I also tried running with the --not_prefetch_test option, but an error occurred there too.

@zzundark commented Jun 10, 2019

Same issue... is there any solution?

@zzundark

> Same issue... is there any solution?

It works with --num_workers 0, but then I can't use multi-process data loading on Windows...

@sounansu (Author) commented Jun 15, 2019

Thank you, zzundark!
I can train with the command below!

python main.py ctdet --exp_id coco_dla --batch_size 11 --master_batch 11 --lr 1.25e-4 --gpus 0 --num_workers 0

I have only one RTX 2070 with 8 GB of memory, so I changed the batch size and master batch to 11.
(I tried 12, but it ran out of memory.)

I will try res_18; it needs less training time.

@heartInsert

> Thank you, zzundark! I can train with the command below!
>
> python main.py ctdet --exp_id coco_dla --batch_size 11 --master_batch 11 --lr 1.25e-4 --gpus 0 --num_workers 0

Yes, num_workers must be 0; even when it is 1, it goes wrong.

@Ai-is-light

@sounansu why must num_workers be 0?

@TomsonBoylett

I have it training on Windows with num_workers > 0.

Basically, the error boils down to: Can't pickle local object 'get_dataset.<locals>.Dataset'.

From the Python pickle docs: only classes that are defined at the top level of a module are picklable.
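
A minimal illustration of that rule (an editor's example, not code from this repo):

import pickle

class TopLevel:        # importable as <module>.TopLevel, so picklable
  pass

def make_local():
  class Local:         # qualified name is 'make_local.<locals>.Local'
    pass
  return Local

pickle.dumps(TopLevel)         # works
try:
  pickle.dumps(make_local())
except (AttributeError, pickle.PicklingError) as e:
  print(e)                     # Can't pickle local object 'make_local.<locals>.Local'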

The Dataset class is defined dynamically in src\lib\datasets\dataset_factory.py, based on the options you pass on the command line.

def get_dataset(dataset, task):
  # Defined inside the function body, so the class's qualified name is
  # 'get_dataset.<locals>.Dataset' and pickle cannot look it up by name.
  class Dataset(dataset_factory[dataset], _sample_factory[task]):
    pass
  return Dataset

Notice that this class is defined inside a function, not at the top level! So to fix this I made a small workaround:

# Defined at module top level, so DataLoader workers can pickle it.
class MyDataset(dataset_factory['mydataset'], _sample_factory['ctdet']):
  pass

def get_dataset(dataset, task):
  if dataset == 'mydataset' and task == 'ctdet':
    return MyDataset
  # Fall back to the original dynamic class for other combinations.
  class Dataset(dataset_factory[dataset], _sample_factory[task]):
    pass
  return Dataset

Messy, but it works.
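
For what it's worth, the same idea generalizes (an editor's sketch over the existing dataset_factory and _sample_factory dicts; the name _dataset_classes is hypothetical): build every (dataset, task) class once at import time and bind it to a module-level name, so each class is importable and therefore picklable.

_dataset_classes = {}
for _ds in dataset_factory:
  for _task in _sample_factory:
    _name = 'Dataset_{}_{}'.format(_ds, _task)
    # type() gives the class __qualname__ == _name, and the globals()
    # binding makes it importable from this module -- which is exactly
    # what pickle needs to serialize the class by reference.
    _cls = type(_name, (dataset_factory[_ds], _sample_factory[_task]), {})
    globals()[_name] = _cls
    _dataset_classes[(_ds, _task)] = _cls

def get_dataset(dataset, task):
  return _dataset_classes[(dataset, task)]

This assumes every dataset/sample pairing is a valid mixin combination, which the original factory already implies.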

@kap2403 commented May 11, 2023

Please explain the procedure for the CenterNet dataloader.
