torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 1 #37

Closed
longmalongma opened this issue Apr 16, 2021 · 5 comments

Comments

@longmalongma

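Xin, I have a server with four 2080 Ti GPUs. Another program is occupying part of the memory, and this is what remains. To reproduce your code, how should I configure resnet101_cfbi? How should I set batch_size and the number of GPUs? I have tried many combinations, and they all produce this error.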

@longmalongma
Author

Also, if I run on a supercomputer with four 16 GB Tesla GPUs, would your default settings work as-is?

@z-x-yang
Owner

According to the printed log, "your training has been finished." The training process therefore exited.
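
To illustrate how a message like this can end up as "terminated with exit code 1", here is a minimal sketch using only the Python standard library. It assumes the worker exits with a non-zero status once it detects that the run is already finished; that is an assumption, not something confirmed about this repository. `subprocess` stands in for `torch.multiprocessing.spawn`, and the `WORKER` script is hypothetical.

```python
import subprocess
import sys

# Hypothetical worker: prints the "finished" message, then exits with status 1.
WORKER = "import sys; print('your training has been finished'); sys.exit(1)"


def run_worker():
    """Launch the worker as a child process and return (exit code, last log line)."""
    result = subprocess.run(
        [sys.executable, "-c", WORKER],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


code, log = run_worker()
if code != 0:
    # A launcher only sees the exit status, so the log line is what explains it.
    print(f"process 0 terminated with exit code {code}; last log line: {log!r}")
```

So the exit code alone does not say what went wrong; the last lines of the log do.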

@longmalongma
Author

> As the printed log, "your training has been finished." Thus, the training process was exited.

Thanks for your reply. How do I start retraining?

@z-x-yang
Owner

Change the experiment name in your config.

An experiment can't be finished twice.
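
A rough sketch of why renaming helps, assuming the trainer stores its progress in a directory keyed by the experiment name and refuses to run once the saved step count reaches the total. The `Config` class, its fields, and the `state.json` file are hypothetical and are not taken from this repository's config.

```python
import json
import tempfile
from pathlib import Path


class Config:
    """Hypothetical config: results live in a directory named after the experiment."""

    def __init__(self, exp_name, total_steps=100, root="."):
        self.exp_name = exp_name
        self.total_steps = total_steps
        self.result_dir = Path(root) / "result" / exp_name


def train(cfg):
    """Return 'finished' if this experiment already completed, else train and return 'trained'."""
    state_file = cfg.result_dir / "state.json"
    step = 0
    if state_file.exists():
        step = json.loads(state_file.read_text())["step"]
    if step >= cfg.total_steps:
        return "finished"  # nothing left to do under this experiment name
    cfg.result_dir.mkdir(parents=True, exist_ok=True)
    # Stand-in for the real training loop: record that all steps are done.
    state_file.write_text(json.dumps({"step": cfg.total_steps}))
    return "trained"


root = tempfile.mkdtemp()
first = train(Config("default", root=root))           # fresh run
second = train(Config("default", root=root))          # same name: already finished
third = train(Config("default_retrain", root=root))   # new name: starts from scratch
print(first, second, third)
```

Under these assumptions, a new experiment name points at an empty result directory, so training starts from step 0 instead of exiting immediately.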

@longmalongma
Author

config

Ok, thanks!
