CUDA out of memory issue #40

Open

dryman opened this issue Sep 17, 2019 · 8 comments

@dryman

dryman commented Sep 17, 2019

Sorry to bother again.
What is the minimum memory requirement for the GPU?

Creating 500000 random states... done in 4.35 seconds
ERROR: LoadError: CUDA error: out of memory (code #2, ERROR_OUT_OF_MEMORY)
Stacktrace:
 [1] macro expansion at /usr/local/google/home/fchern/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /usr/local/google/home/fchern/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
 [3] alloc at /usr/local/google/home/fchern/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
 [4] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/memory.jl:251
 [5] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
 [6] macro expansion at ./util.jl:213 [inlined]
 [7] alloc(::Int64) at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
 [8] CuArrays.CuArray{Float32,2}(::Tuple{Int64,Int64}) at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/array.jl:45
 [9] similar at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/array.jl:61 [inlined]
 [10] gemm at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/blas/wrap.jl:903 [inlined]
 [11] encode_icm_cuda_single(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Bool, ::Bool) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/src/LSQ_GPU.jl:71
 [12] encode_icm_cuda(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Bool, ::Int64, ::Bool) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/src/LSQ_GPU.jl:249
 [13] experiment_lsq_cuda(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{UInt32,1}, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/src/LSQ_GPU.jl:352
 [14] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/demos/demos_train_query_base.jl:72
 [15] top-level scope at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/demos/demos_train_query_base.jl:171 [inlined]
 [16] top-level scope at ./none:0
 [17] include at ./boot.jl:317 [inlined]
 [18] include_relative(::Module, ::String) at ./loading.jl:1038
 [19] include(::Module, ::String) at ./sysimg.jl:29
 [20] include(::String) at ./client.jl:398
 [21] top-level scope at none:0
in expression starting at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/demos/demos_train_query_base.jl:170
@una-dinosauria
Owner

From our README:

Requirements
This package is written in Julia 1.0, with some extensions in C++ and CUDA. You also need a CUDA-ready GPU. We have tested this code on an Nvidia Titan Xp GPU.

@dryman
Author

dryman commented Sep 17, 2019

Our CUDA GPU has 8 GB, and we thought that would be enough.

@una-dinosauria
Owner

You could try increasing the number of splits (i.e., how many chunks the data is split into before being passed to the GPU) to reduce the GPU memory requirement.

(Sorry, this is a bit hardcoded for now.)

nsplits_train = m <= 8 ? 1 : 1
nsplits_base = m <= 8 ? 2 : 4
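
For illustration, here is a rough sketch of what splitting does (not Rayuela's actual code; `encode_fn` is a hypothetical stand-in for the GPU encoder). Each call only ever sees `n / nsplits` columns, so the peak device memory is roughly `1 / nsplits` of the unsplit version:

```julia
# Hypothetical sketch: encode X in `nsplits` column chunks so that only one
# chunk's worth of data is on the GPU at any given time.
function encode_in_splits(X::Matrix{Float32}, nsplits::Int, encode_fn)
    n = size(X, 2)
    chunks = Iterators.partition(1:n, cld(n, nsplits))
    return hcat([encode_fn(X[:, cols]) for cols in chunks]...)
end
```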

@dryman
Author

dryman commented Sep 17, 2019

Cool. Setting it as follows seems to work for 8 GB:

nsplits_train =  2
nsplits_base  =  4

@dryman dryman closed this as completed Sep 17, 2019
@una-dinosauria
Owner

I'm glad it's working. Was this the reason behind issue #38?

@dryman
Author

dryman commented Sep 17, 2019

I restarted Julia and wasn't able to reproduce #38.

It turns out that fixing the partition size doesn't solve the issue: the CuArrays are not freed, and I watched the GPU memory keep increasing until it ran out of memory again.
https://discourse.julialang.org/t/freeing-memory-in-the-gpu-with-cudadrv-cudanative-cuarrays/10946/8

Calling GC.gc() doesn't free the underlying CUDA memory. Any clues?
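
For anyone trying to reproduce this, one way to watch what the device memory is doing between iterations is to poll nvidia-smi from Julia (a rough sketch, not part of Rayuela; it assumes a single GPU and that nvidia-smi is on the PATH):

```julia
using CuArrays

# Report the device memory usage seen by the driver.
gpu_mem_used() = strip(read(`nvidia-smi --query-gpu=memory.used --format=csv,noheader`, String))

for it in 1:5
    x = CuArray(rand(Float32, 10_000, 10_000))  # roughly 400 MB on the device
    x = nothing                                  # drop the only reference
    GC.gc()                                      # run the Julia GC...
    # ...but nvidia-smi typically still reports the memory as used, because
    # CuArrays' pool retains the freed buffers rather than returning them
    # to the driver.
    println("iteration $it: GPU memory used = $(gpu_mem_used())")
end
```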

@una-dinosauria
Owner

Yes, this is definitely an open issue. The Julia GC is a bit of a black box to me, so I never really figured out how to fix this (other than using a larger GPU, which happens to have enough memory for the GC to kick in just in time...).

I know this is less than ideal. It might be worth calling CuArrays' unsafe_free! function to see whether it alleviates the issue.

https://github.com/JuliaGPU/CuArrays.jl/blob/9892999533fa4c234516d777c0978576b3b3ff39/src/array.jl#L26-L32

But I'm sorry I can't provide a better fix.
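
For what it's worth, here is a minimal sketch of how that could look inside a chunked loop (hypothetical names, not Rayuela's actual variables; the matrix multiply just stands in for the real GPU work):

```julia
using CuArrays

# Hypothetical chunked GPU computation that frees device buffers eagerly
# with unsafe_free! instead of waiting for the Julia GC to collect them.
function process_chunks(chunks::Vector{Matrix{Float32}})
    results = Matrix{Float32}[]
    for X in chunks
        d_X = CuArray(X)            # upload one chunk to the device
        d_Y = d_X' * d_X            # stand-in for the real GPU work
        push!(results, Array(d_Y))  # copy the result back to the host
        CuArrays.unsafe_free!(d_X)  # release the device buffers right away
        CuArrays.unsafe_free!(d_Y)
    end
    return results
end
```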

@una-dinosauria
Owner

Related: JuliaGPU/CuArrays.jl#275
