one GPU #39

Open
gh2517956473 opened this issue May 14, 2019 · 5 comments

Comments

@gh2517956473

Can I train with a single GPU with 12 GB of memory? Where does the code need to change?
Thank you very much!

@YuwenXiong
Contributor

Please follow this: #36 (comment)

@gh2517956473
Author

Thank you!

@lfdeep

lfdeep commented Jun 10, 2019

> Thank you!

Hello, I use one GPU, but this error occurred:
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 414, in <module>
    upsnet_train()
  File "upsnet/upsnet_end2end_train.py", line 268, in upsnet_train
    data, label, _ = train_iterator.next()
  File "/root/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 330, in __next__
    idx, batch = self._get_batch()
  File "/root/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 309, in _get_batch
    return self.data_queue.get()
  File "/root/anaconda3/lib/python3.7/multiprocessing/queues.py", line 352, in get
    res = self._reader.recv_bytes()
  File "/root/anaconda3/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/root/anaconda3/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/root/anaconda3/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 227, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 31613) is killed by signal: Bus error. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
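
For anyone hitting the same bus error: it usually means the DataLoader worker processes ran out of shared memory (/dev/shm), which is especially common inside Docker containers started without a larger --shm-size. Besides enlarging /dev/shm, a minimal sketch of two standard PyTorch-level workarounds (plain PyTorch API, not specific to UPSNet; the dummy dataset below is only a stand-in for the real one):

import torch
import torch.multiprocessing
from torch.utils.data import DataLoader, TensorDataset

# Workaround 1: share tensors between worker processes via the file system
# instead of shared-memory segments, which avoids exhausting /dev/shm.
torch.multiprocessing.set_sharing_strategy('file_system')

# Workaround 2: disable worker processes entirely (num_workers=0); data loading
# runs in the main process, uses no shared memory, and gives readable tracebacks.
dummy = TensorDataset(torch.zeros(8, 3, 32, 32))  # stand-in for the real dataset
loader = DataLoader(dummy, batch_size=1, num_workers=0)
for (batch,) in loader:
    pass  # the training step would go here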

@lfdeep

lfdeep commented Jun 11, 2019

> Can I train with a single GPU with 12 GB of memory? Where does the code need to change?
> Thank you very much!

Hello, can you run the code successfully on a single GPU?

@pkuCactus

pkuCactus commented Aug 6, 2019

Thank you for the great work. What happens if I use Horovod on a single-GPU machine? I tried it and found it faster than not using Horovod; is there any problem with that? Also, how could I run multiple Horovod workers to mimic multiple GPUs on a single-GPU machine? Thanks a lot, I look forward to your reply.
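
On the single-GPU Horovod question: with only one worker, Horovod's allreduce is essentially a no-op, so a modest speed difference is plausible but worth profiling rather than taking for granted. To mimic several workers on one physical GPU you can launch multiple processes with horovodrun and pin them all to device 0; a rough sketch using the standard Horovod PyTorch API (not UPSNet-specific, and the script name in the comment is hypothetical):

import horovod.torch as hvd
import torch

# Launch with, e.g.:  horovodrun -np 2 python train_sketch.py
hvd.init()

# On a real multi-GPU box each process would use hvd.local_rank() to pick its
# own device; on a single-GPU machine all processes share device 0 instead.
if torch.cuda.is_available():
    torch.cuda.set_device(0)

print(f"Horovod rank {hvd.rank()} of {hvd.size()} pinned to GPU 0")

Keep in mind that each process holds its own copy of the model, so GPU memory is the limiting factor, and sharing one GPU will not reproduce real multi-GPU throughput.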
