CUDA out of memory issue #40

Open

dryman opened this issue Sep 17, 2019 · 8 comments

@dryman

dryman commented Sep 17, 2019

Sorry to bother again.
What is the minimum memory requirement for the GPU?

Creating 500000 random states... done in 4.35 seconds
ERROR: LoadError: CUDA error: out of memory (code #2, ERROR_OUT_OF_MEMORY)
Stacktrace:
 [1] macro expansion at /usr/local/google/home/fchern/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /usr/local/google/home/fchern/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
 [3] alloc at /usr/local/google/home/fchern/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
 [4] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/memory.jl:251
 [5] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
 [6] macro expansion at ./util.jl:213 [inlined]
 [7] alloc(::Int64) at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
 [8] CuArrays.CuArray{Float32,2}(::Tuple{Int64,Int64}) at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/array.jl:45
 [9] similar at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/array.jl:61 [inlined]
 [10] gemm at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/blas/wrap.jl:903 [inlined]
 [11] encode_icm_cuda_single(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Bool, ::Bool) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/src/LSQ_GPU.jl:71
 [12] encode_icm_cuda(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Bool, ::Int64, ::Bool) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/src/LSQ_GPU.jl:249
 [13] experiment_lsq_cuda(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{UInt32,1}, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/src/LSQ_GPU.jl:352
 [14] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/demos/demos_train_query_base.jl:72
 [15] top-level scope at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/demos/demos_train_query_base.jl:171 [inlined]
 [16] top-level scope at ./none:0
 [17] include at ./boot.jl:317 [inlined]
 [18] include_relative(::Module, ::String) at ./loading.jl:1038
 [19] include(::Module, ::String) at ./sysimg.jl:29
 [20] include(::String) at ./client.jl:398
 [21] top-level scope at none:0
in expression starting at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/demos/demos_train_query_base.jl:170
@una-dinosauria
Owner

From our README:

Requirements
This package is written in Julia 1.0, with some extensions in C++ and CUDA. You also need a CUDA-ready GPU. We have tested this code on an Nvidia Titan Xp GPU.

@dryman
Author

dryman commented Sep 17, 2019

Our CUDA GPU has 8 GB, and we thought that would be enough.

@una-dinosauria
Owner

You could try increasing the number of splits (i.e., how many chunks the data is split into before being passed to the GPU) to reduce the GPU memory requirement.

(Sorry, this is a bit hardcoded for now.)

nsplits_train = m <= 8 ? 1 : 1
nsplits_base = m <= 8 ? 2 : 4
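
For illustration, here is a rough sketch of what splitting does (not Rayuela's actual code; `encode_fn` is a hypothetical stand-in for the GPU encoder). Each call only ever sees `n / nsplits` columns, so the peak device memory is roughly `1 / nsplits` of the unsplit version:

```julia
# Hypothetical sketch: encode X in `nsplits` column chunks so that only one
# chunk's worth of data is on the GPU at any given time.
function encode_in_splits(X::Matrix{Float32}, nsplits::Int, encode_fn)
    n = size(X, 2)
    chunks = Iterators.partition(1:n, cld(n, nsplits))
    return hcat([encode_fn(X[:, cols]) for cols in chunks]...)
end
```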

@dryman
Author

dryman commented Sep 17, 2019

Cool. Setting it as follows seems to work for 8 GB:

nsplits_train =  2
nsplits_base  =  4

@dryman dryman closed this as completed Sep 17, 2019
@una-dinosauria
Owner

I'm glad it's working. Was this the reason behind issue #38?

@dryman
Author

dryman commented Sep 17, 2019

I restarted Julia and wasn't able to reproduce #38.

It turns out that fixing the partition size doesn't solve the issue: the CuArrays are not freed, and I watched the GPU memory keep increasing until it ran out of memory again.
https://discourse.julialang.org/t/freeing-memory-in-the-gpu-with-cudadrv-cudanative-cuarrays/10946/8

Calling GC.gc() doesn't free the underlying CUDA memory. Any clues?
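
For anyone trying to reproduce this, one way to watch what the device memory is doing between iterations is to poll nvidia-smi from Julia (a rough sketch, not part of Rayuela; it assumes a single GPU and that nvidia-smi is on the PATH):

```julia
using CuArrays

# Report the device memory usage seen by the driver.
gpu_mem_used() = strip(read(`nvidia-smi --query-gpu=memory.used --format=csv,noheader`, String))

for it in 1:5
    x = CuArray(rand(Float32, 10_000, 10_000))  # roughly 400 MB on the device
    x = nothing                                  # drop the only reference
    GC.gc()                                      # run the Julia GC...
    # ...but nvidia-smi typically still reports the memory as used, because
    # CuArrays' pool retains the freed buffers rather than returning them
    # to the driver.
    println("iteration $it: GPU memory used = $(gpu_mem_used())")
end
```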

@una-dinosauria
Owner

Yes, this is definitely an open issue. The Julia GC is a bit of a black box to me, so I never really figured out how to fix this (other than using a larger GPU, which happens to have enough memory for the GC to kick in just in time...).

I know this is less than ideal. It might be worth calling CuArrays' unsafe_free! function to see whether it alleviates the issue.

https://github.com/JuliaGPU/CuArrays.jl/blob/9892999533fa4c234516d777c0978576b3b3ff39/src/array.jl#L26-L32

But I'm sorry I can't provide a better fix.
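
For what it's worth, here is a minimal sketch of how that could look inside a chunked loop (hypothetical names, not Rayuela's actual variables; the matrix multiply just stands in for the real GPU work):

```julia
using CuArrays

# Hypothetical chunked GPU computation that frees device buffers eagerly
# with unsafe_free! instead of waiting for the Julia GC to collect them.
function process_chunks(chunks::Vector{Matrix{Float32}})
    results = Matrix{Float32}[]
    for X in chunks
        d_X = CuArray(X)            # upload one chunk to the device
        d_Y = d_X' * d_X            # stand-in for the real GPU work
        push!(results, Array(d_Y))  # copy the result back to the host
        CuArrays.unsafe_free!(d_X)  # release the device buffers right away
        CuArrays.unsafe_free!(d_Y)
    end
    return results
end
```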

@una-dinosauria
Owner

Related: JuliaGPU/CuArrays.jl#275
