
CUDA error when trying to use multiple GPUs: nnconv_forward_blas: CUDA error [cuda: forward: pointer does not correspond to a registered memory region] #50

Open
junaidmalik09 opened this issue Mar 7, 2017 · 5 comments

Comments

@junaidmalik09

I'm getting a CUDA memory error when trying to run my FCN code on more than one GPU. As soon as training finishes in the first epoch and validation is about to start, I get the following error:

[screenshot: nnconv_forward_blas: CUDA error [cuda: forward: pointer does not correspond to a registered memory region]]

The error is trailed by a string of "cudaHostUnregister failed" warnings, as shown below:

[screenshot: repeated "cudaHostUnregister failed" warnings]

Has anyone faced a similar issue? If yes, any help would be much appreciated. Thanks!

@junaidmalik09
Author

Apparently, this only happens when using MatConvNet 23 (the latest). I switched back to MatConvNet 17, which I had used earlier, and the error disappears.

@nicjac

nicjac commented Apr 12, 2017

I got the same issue. Any pointers on how to fix this?

@jotaf98

jotaf98 commented Apr 13, 2017

A couple of things to try:

  1. The multi-GPU code has changed a bit recently; do you still have the same issue with MatConvNet 24?
  2. Try switching between opts.parameterServer.method = 'mmap' and 'tmove' (see the sketch after this list).
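
For context, a minimal sketch of where that option usually lives when training on multiple GPUs (assumptions: MatConvNet's examples/ directory is on the path so cnn_train_dag is available, and net, imdb, and getBatch are defined elsewhere; the values shown are illustrative, not a definitive setup):

    % Training options for a multi-GPU run (illustrative values).
    trainOpts.gpus = [1 2] ;                     % two GPUs (MATLAB 1-based device indices)
    trainOpts.parameterServer.method = 'mmap' ;  % try 'mmap' or 'tmove' (item 2 above)

    % cnn_train_dag parses an options struct via vl_argparse.
    [net, info] = cnn_train_dag(net, imdb, getBatch, trainOpts) ;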

@zoraZHJ

zoraZHJ commented Apr 28, 2017

On Windows, set opts.parameterServer.method = 'mmap'; on Linux (Ubuntu), set it to 'tmove'. A sketch of selecting the method by OS follows.
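
One way to follow this suggestion is to pick the method from the operating system at run time; a minimal sketch, assuming the resulting trainOpts struct is then passed to cnn_train_dag as in the earlier example:

    % Choose the parameter-server method per OS, as suggested above.
    if ispc
        trainOpts.parameterServer.method = 'mmap' ;   % Windows
    else
        trainOpts.parameterServer.method = 'tmove' ;  % Linux
    end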

@Mirsadeghi

I got the same issue! Any solution?
