
How do I train on a single GPU? #7

Closed
xinhaojin opened this issue Nov 29, 2022 · 5 comments

Comments

@xinhaojin

As the title says.

@arkerman

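@xinhaojin Hi, can this not be trained on a single GPU? I only have one GPU, and running it throws an error that seems to be related to the graphics card.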

@jyqi
Collaborator

jyqi commented Nov 30, 2022

Hi, set --nproc_per_node to 1 to train on a single GPU.
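As a rough illustration of the suggestion above, the launch command can be assembled in Python. This is only a sketch: it assumes training is started through the `torch.distributed.launch` module, and the `-f` flag and config path are placeholders, not values taken from this thread.

```python
import sys


def build_launch_command(script, script_args=(), nproc_per_node=1):
    """Build a torch.distributed.launch command line as a list of arguments."""
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    return [
        sys.executable, "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",  # 1 process means 1 GPU
        script,
        *script_args,
    ]


if __name__ == "__main__":
    # Hypothetical arguments, used only for illustration.
    cmd = build_launch_command("tools/train.py", ["-f", "configs/your_config.py"])
    print(" ".join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would start the run. Newer PyTorch releases also provide `torchrun`, which accepts the same `--nproc_per_node` option.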

@arkerman

@jyqi Hi, I set it to single-GPU training as you suggested, but I got the following error:

2022-11-30 10:10:14.802 | ERROR | __main__:<module>:68 - An error has been caught in function '<module>', process 'MainProcess' (25404), thread 'MainThread' (8124):
Traceback (most recent call last):

File "tools\train.py", line 68, in <module>
main()
└ <function main at 0x00000253ED19A318>

File "tools\train.py", line 53, in main
torch.distributed.init_process_group(backend='gloo', init_method='env://')
│ │ └ <function init_process_group at 0x00000253EC862DC8>
│ └ <module 'torch.distributed' from 'D:\Anaconda3\envs\DAMO-YOLO\lib\site-packages\torch\distributed\__init__.py'>
└ <module 'torch' from 'D:\Anaconda3\envs\DAMO-YOLO\lib\site-packages\torch\__init__.py'>

File "D:\Anaconda3\envs\DAMO-YOLO\lib\site-packages\torch\distributed\distributed_c10d.py", line 421, in init_process_group
init_method, rank, world_size, timeout=timeout
│ │ │ └ datetime.timedelta(seconds=1800)
│ │ └ -1
│ └ -1
└ 'env://'

File "D:\Anaconda3\envs\DAMO-YOLO\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
│ └ <property object at 0x00000253E9FE0AE8>
└ ParseResult(scheme='env', netloc='', path='', params='', query='', fragment='')

RuntimeError: No rendezvous handler for env://

What is causing this?

@jyqi
Collaborator

jyqi commented Nov 30, 2022

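Hi, I see that you changed the backend to gloo. Is that because you are training on Windows? We tried single-GPU training on Linux and did not run into this error. It may be worth checking whether the platform is the cause.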
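One possible explanation, not confirmed in this thread, is that the installed PyTorch build on Windows does not register a handler for the `env://` init method. If so, a way to test the platform hypothesis is to initialize a one-process group with the documented `file://` init method instead. The sketch below does that. The helper names are hypothetical, and it has not been tried with this repository, whose training script may still expect launcher environment variables.

```python
import tempfile
from pathlib import Path


def single_process_init_kwargs(init_dir=None):
    """Return init_process_group keyword arguments for one process, no launcher."""
    directory = Path(init_dir) if init_dir else Path(tempfile.mkdtemp())
    # The rendezvous file must not exist yet, but its directory must.
    init_file = directory.resolve() / "dist_init"
    return {
        "backend": "gloo",
        "init_method": init_file.as_uri(),  # file:// instead of env://
        "rank": 0,
        "world_size": 1,
    }


def init_single_process():
    # Imported lazily so the helper above can be inspected without PyTorch.
    import torch.distributed as dist

    dist.init_process_group(**single_process_init_kwargs())
    return dist.get_rank(), dist.get_world_size()
```

If `init_single_process()` succeeds on the Windows machine while `init_method='env://'` still fails, that points to the platform or the PyTorch version rather than the single-GPU setting.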

@arkerman

arkerman commented Dec 1, 2022

@jyqi OK, thanks for the suggestion.
