Is there any difference when using 'OpenMPI'? #15
@seongkyun, yes, you can run the code without MPI. Running the code with MPI splits the independent calculation jobs across different GPUs, and the results computed by each worker are collected and written to disk. I observed the hanging issue on a newly configured machine, while the same code works well on my own machine. Are you able to test the following simple command?
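The split-and-collect behavior described above can be illustrated with a small, serial Python sketch. It does not use MPI or the repository's code: it assumes `--x=-0.5:1.5:401` means 401 evenly spaced coordinates, and `evaluate()` is a hypothetical placeholder for the real per-coordinate computation. Because every job is independent, splitting the 401 jobs across 4 workers and merging the partial results gives the same output as a single worker computing all of them serially.

```python
# A serial sketch of the "split independent jobs, then collect" pattern.
# Assumption: --x=MIN:MAX:NUM means NUM evenly spaced coordinates.
# evaluate() is a hypothetical stand-in for the real loss computation.


def parse_range(spec):
    """Turn a 'MIN:MAX:NUM' string into NUM evenly spaced floats."""
    lo, hi, num = spec.split(":")
    lo, hi, num = float(lo), float(hi), int(num)
    if num == 1:
        return [lo]
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


def jobs_for_worker(n_jobs, rank, size):
    """Round-robin split: worker `rank` takes jobs rank, rank + size, ..."""
    return list(range(rank, n_jobs, size))


def evaluate(coord):
    """Hypothetical placeholder for evaluating the model at one coordinate."""
    return (coord - 0.5) ** 2


def run(spec, size):
    """Simulate `size` workers, then merge their partial results in order."""
    coords = parse_range(spec)
    # Each worker computes only its own share; no job depends on another.
    partials = [
        {i: evaluate(coords[i]) for i in jobs_for_worker(len(coords), rank, size)}
        for rank in range(size)
    ]
    # Collect step: under MPI this is where rank 0 would gather the partials
    # (e.g. with mpi4py's comm.gather) before writing them to disk.
    results = [None] * len(coords)
    for part in partials:
        for i, value in part.items():
            results[i] = value
    return results


if __name__ == "__main__":
    four_workers = run("-0.5:1.5:401", size=4)
    one_worker = run("-0.5:1.5:401", size=1)
    print(len(four_workers), four_workers == one_worker)  # prints: 401 True
```

The round-robin split is just one simple choice; contiguous chunks would work equally well, since only the final merge order matters.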
@ljk628, thank you for your reply. The installed requirements are below:
Thanks. I'll try that.
Hello.
I just tried to run this code on Ubuntu 16.04 LTS with a GeForce TITAN X GPU and PyTorch 0.4.1.
When I run the code with the following command, nothing happens:
mpirun -n 4 python plot_surface.py --mpi --cuda --model vgg9 --x=-0.5:1.5:401 --dir_type states --model_file cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7 --model_file2 cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=8192_wd=0.0_save_epoch=1/model_300.t7 --plot --show
But if I just delete 'mpirun -n 4', the code starts running. I think it is in the training process. After the training process finishes, I can see the plotted results.
Can I run this code without 'OpenMPI'?
As far as I know, OpenMPI is only used for parallel computation.
So can I run the code with the following command instead?
python plot_surface.py --mpi --cuda --model vgg9 --x=-0.5:1.5:401 --dir_type states --model_file cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7 --model_file2 cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=8192_wd=0.0_save_epoch=1/model_300.t7 --plot --show
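On the question of running without 'mpirun': a common pattern for scripts that support both modes is to import MPI only when it is requested and otherwise treat the process as a single worker with rank 0. The sketch below assumes mpi4py as the MPI binding; `get_rank_and_size()` and `collect()` are hypothetical helper names, and this is not necessarily how plot_surface.py is implemented. In general, starting an mpi4py program with plain `python` (no `mpirun`) typically runs it as a single process whose `COMM_WORLD` has size 1.

```python
# A sketch of how a script can run both under `mpirun -n 4 python ...`
# and as plain `python ...`. This is NOT the repository's code;
# get_rank_and_size() and collect() are hypothetical helper names.


def get_rank_and_size(use_mpi):
    """Return (rank, size, comm); fall back to a single worker without MPI."""
    if not use_mpi:
        return 0, 1, None
    try:
        from mpi4py import MPI  # only imported when MPI is requested
    except ImportError:
        # mpi4py is not installed: behave like a single serial worker.
        return 0, 1, None
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size(), comm


def collect(local_results, comm):
    """Gather every worker's results on rank 0 (a one-item list when serial)."""
    if comm is None:
        return [local_results]
    # mpi4py's lowercase gather returns the list on root and None elsewhere.
    return comm.gather(local_results, root=0)


if __name__ == "__main__":
    rank, size, comm = get_rank_and_size(use_mpi=False)
    gathered = collect({rank: "done"}, comm)
    print(rank, size, gathered)  # prints: 0 1 [{0: 'done'}]
```

With a single worker the collect step degenerates to a one-element list, so the same downstream code can write the results to disk whether or not MPI is in use.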