test error with multi gpu #249

Closed
liuxiao214 opened this issue Dec 29, 2018 · 1 comment


liuxiao214 commented Dec 29, 2018

Hi Mr. Xiong, I have a problem testing with multiple GPUs.
I am using my own Caffe version. To test on UCF101, I changed the code in tools/eval_net.py like this:

caffe.init_gpu_scope([device_id])
caffe.set_device(device_id)
caffe.set_mode_gpu()

Testing on one GPU gives no error, but when I test with multiple GPUs I get these errors:

F1229 13:51:41.069020 9588 gpu_memory.cpp:143] Check failed: error == cudaSuccess (3 vs. 0) initialization error
*** Check failure stack trace: ***
[172-16-30-14:09588] *** Process received signal ***
[172-16-30-14:09588] Signal: Aborted (6)
[172-16-30-14:09588] Signal code: (-6)
[172-16-30-14:09588] [ 0] /usr/lib64/libpthread.so.0(+0xf5e0)[0x7f5ded3965e0]
[172-16-30-14:09588] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x7f5dec8f01f7]
[172-16-30-14:09588] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x7f5dec8f18e8]
[172-16-30-14:09588] [ 3] /usr/lib64/libglog.so.0(+0xa7d9)[0x7f5d832c47d9]
[172-16-30-14:09588] [ 4] /usr/lib64/libglog.so.0(+0xbe6d)[0x7f5d832c5e6d]
[172-16-30-14:09588] [ 5] /usr/lib64/libglog.so.0(_ZN6google10LogMessage9SendToLogEv+0x24d)[0x7f5d832c7ced]
[172-16-30-14:09588] [ 6] /usr/lib64/libglog.so.0(_ZN6google10LogMessage5FlushEv+0x9c)[0x7f5d832c5a5c]
[172-16-30-14:09588] [ 7] /usr/lib64/libglog.so.0(_ZN6google15LogMessageFatalD2Ev+0xe)[0x7f5d832c863e]
[172-16-30-14:09588] [ 8] /home/caffe/python/caffe/../../build/lib/libcaffe.so.1.0.0(_ZN5caffe9GPUMemory7Manager15update_dev_infoEi+0x19a)[0x7f5cf47514da]

I printed my_id (my_id = multiprocessing.current_process()._identity[0]) and it takes the values 0, 1, 2, ... up to 200+, but I only have 8 GPUs.
I have read issues 198 and 160 but still have no idea what is wrong. Could you tell me? Thanks very much.
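
To illustrate the mismatch described above (a worker index that keeps growing against a fixed set of 8 GPUs), here is a minimal sketch of one common way to keep the device id in range: wrap the index with a modulo. The helper names `gpu_for_worker` and `current_worker_index` are hypothetical and not part of eval_net.py. This only bounds the device id; it is not necessarily the cause of, or the fix for, the CUDA initialization error in the log.

```python
import multiprocessing


def gpu_for_worker(worker_index, gpu_ids):
    """Map a 1-based worker index onto one of the available GPU ids."""
    if not gpu_ids:
        raise ValueError("gpu_ids must not be empty")
    # Modulo keeps the result in range even when the index exceeds the GPU count.
    return gpu_ids[(worker_index - 1) % len(gpu_ids)]


def current_worker_index():
    """Return the pool worker's 1-based index, or 1 in the main process."""
    # _identity is a private attribute; it is an empty tuple in the main process.
    identity = multiprocessing.current_process()._identity
    return identity[0] if identity else 1


if __name__ == "__main__":
    gpu_ids = list(range(8))  # assumption: 8 GPUs, as in the report
    for index in (1, 8, 9, 200):
        print(index, gpu_for_worker(index, gpu_ids))
```

In a pool initializer, the value returned by `gpu_for_worker(current_worker_index(), gpu_ids)` would be the one passed to `caffe.set_device`.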

@liuxiao214
Author

I have solved it.
