LoadError: context should be active #51

Open

JakubHanko opened this issue Sep 8, 2022 · 4 comments


@JakubHanko

Hello, I occasionally get this error when using CUDA.

Training a chain quantizer
 -2 2.394080e+04... 0.24 secs updating C
ERROR: LoadError: context should be active
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] device at /home/xhanko1/.julia/packages/CUDAdrv/JWljj/src/context.jl:165 [inlined]
 [3] (::getfield(CuArrays.CUBLAS, Symbol("##3#5")))() at /home/xhanko1/.julia/packages/CuArrays/PD3UJ/src/blas/CUBLAS.jl:25
 [4] get!(::getfield(CuArrays.CUBLAS, Symbol("##3#5")), ::Dict{CUDAdrv.CuContext,Ptr{Nothing}}, ::CUDAdrv.CuContext) at ./dict.jl:453
 [5] handle at /home/xhanko1/.julia/packages/CuArrays/PD3UJ/src/blas/CUBLAS.jl:20 [inlined]
 [6] macro expansion at /home/xhanko1/.julia/packages/CuArrays/PD3UJ/src/blas/error.jl:43 [inlined]
 [7] gemm!(::Char, ::Char, ::Float32, ::CuArrays.CuArray{Float32,2}, ::CuArrays.CuArray{Float32,2}, ::Float32, ::CuArrays.CuArray{Float32,2}) at /home/xhanko1/.julia/packages/CuArrays/PD3UJ/src/blas/wrappers.jl:888
 [8] gemm at /home/xhanko1/.julia/packages/CuArrays/PD3UJ/src/blas/wrappers.jl:903 [inlined]
 [9] quantize_chainq_cuda!(::Array{Int16,2}, ::Array{Float32,2}, ::Array{Array{Float32,2},1}, ::Array{Array{Float32,2},1}, ::UnitRange{Int64}) at /home/xhanko1/.julia/dev/Rayuela/src/ChainQ.jl:239
 [10] quantize_chainq(::Array{Float32,2}, ::Array{Array{Float32,2},1}, ::Bool, ::Bool) at /home/xhanko1/.julia/dev/Rayuela/src/ChainQ.jl:325
 [11] train_chainq(::Array{Float32,2}, ::Int64, ::Int64, ::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Int64, ::Bool) at /home/xhanko1/.julia/dev/Rayuela/src/ChainQ.jl:401
 [12] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /home/xhanko1/.julia/dev/Rayuela/demos/demos_train_query_base.jl:57
 [13] top-level scope at /home/xhanko1/.julia/dev/Rayuela/demos/demos_train_query_base.jl:171 [inlined]
 [14] top-level scope at ./none:0
 [15] include at ./boot.jl:326 [inlined]
 [16] include_relative(::Module, ::String) at ./loading.jl:1038
 [17] include(::Module, ::String) at ./sysimg.jl:29
 [18] include(::String) at ./client.jl:403
 [19] top-level scope at none:0
in expression starting at /home/xhanko1/.julia/dev/Rayuela/demos/demos_train_query_base.jl:170

What is weird is that sometimes it lets me train both ChainQ and LSQ, and sometimes I get this error. Does anyone have any pointers as to what the cause could be?

@una-dinosauria (Owner)

Oh jeez, it seems like the CUDA context is getting garbage collected or something.

To be honest, the CUDA ecosystem in Julia was quite unstable back then, and I had to hack a bunch of things to make it work. Could you please share your OS, Julia version, the command you ran, and any other details that could help me reproduce this issue on my end?
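
In the meantime, if it really is the context getting collected, one thing you could try is keeping a top-level reference to the device and context, so the finalizer can never run mid-training. A minimal sketch, assuming the CUDAdrv v1.x API (`CuDevice`/`CuContext`); I haven't tested this against your exact package versions:

```julia
# Hypothetical workaround sketch (CUDAdrv v1.x-style API, untested here):
# rooting the context in top-level consts keeps it from being garbage
# collected while kernels are still using it.
using CUDAdrv

const DEV = CuDevice(0)     # long-lived handle to the first GPU
const CTX = CuContext(DEV)  # global reference: the context stays alive
                            # for the whole session

# ... run the Rayuela training demos here, with CTX still rooted ...
```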

@JakubHanko (Author)

Sure: the OS is Red Hat Enterprise Linux 8.6 with kernel Linux 4.18.0-372.19.1.el8_6.x86_64, and I am using Julia 1.1.1.

There was some wonky behavior when building Rayuela for the first time, where some of the libraries' versions didn't match the versions in Manifest.toml, so I am providing the currently installed versions as well:

 Installed Requires ───────────── v0.5.2
 Installed Adapt ──────────────── v0.4.2
 Installed Rmath ──────────────── v0.6.0
 Installed AbstractFFTs ───────── v0.4.1
 Installed NaNMath ────────────── v0.3.7
 Installed HDF5 ───────────────── v0.12.5
 Installed QuadGK ─────────────── v2.5.0
 Installed JSON ───────────────── v0.21.3
 Installed StatsAPI ───────────── v1.5.0
 Installed CommonSubexpressions ─ v0.3.0
 Installed DataAPI ────────────── v1.10.0
 Installed FFTW ───────────────── v0.3.0
 Installed GPUArrays ──────────── v0.6.1
 Installed CMakeWrapper ───────── v0.2.4
 Installed BinDeps ────────────── v1.0.2
 Installed Arpack ─────────────── v0.3.2
 Installed DataStructures ─────── v0.17.20
 Installed Distributions ──────── v0.21.9
 Installed NearestNeighbors ───── v0.4.11
 Installed NNlib ──────────────── v0.5.0
 Installed Distances ──────────── v0.10.7
 Installed DiffResults ────────── v1.0.3
 Installed MacroTools ─────────── v0.5.9
 Installed BinaryProvider ─────── v0.5.10
 Installed StaticArrays ───────── v0.12.5
 Installed ForwardDiff ────────── v0.10.18
 Installed Missings ───────────── v0.4.5
 Installed SortingAlgorithms ──── v0.3.1
 Installed CMake ──────────────── v1.2.0
 Installed URIParser ──────────── v0.4.1
 Installed UnPack ─────────────── v1.0.2
 Installed RecipesBase ────────── v1.2.1
 Installed CUDAdrv ────────────── v1.0.1
 Installed CUDAnative ─────────── v1.0.1
 Installed PDMats ─────────────── v0.9.12
 Installed FillArrays ─────────── v0.5.0
 Installed Parsers ────────────── v2.4.0
 Installed StatsFuns ──────────── v0.9.8
 Installed Compat ─────────────── v2.2.1
 Installed VersionParsing ─────── v1.3.0
 Installed Clustering ─────────── v0.14.2
 Installed CuArrays ───────────── v0.9.1
 Installed LLVM ───────────────── v1.1.0
 Installed Parameters ─────────── v0.12.3
 Installed Reexport ───────────── v0.2.0
 Installed CUDAapi ────────────── v0.6.3
 Installed Blosc ──────────────── v0.5.1
 Installed SpecialFunctions ───── v0.8.0
 Installed LogExpFunctions ────── v0.2.5
 Installed DocStringExtensions ── v0.8.6
 Installed IterativeSolvers ───── v0.8.5
 Installed DiffRules ──────────── v0.1.0
 Installed Conda ──────────────── v1.5.2
 Installed OrderedCollections ─── v1.4.1
 Installed StatsBase ──────────── v0.32.2

I am running demos_train_query_base.jl in the Julia REPL using include(...). I commented out lines 29 through 48 and ran the program. I also applied the fix from the other issue; otherwise I get an error much sooner. On top of that, I am timing OPQ.jl:186 with @time.
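
Concretely, the repro is just:

```julia
# Julia 1.1.1 REPL, with lines 29-48 of the demo commented out
include("/home/xhanko1/.julia/dev/Rayuela/demos/demos_train_query_base.jl")
```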

What is annoying is that sometimes it gets through both ChainQ and LSQ training, and sometimes it crashes with this error. Seemingly nondeterministically. Weird.

@JakubHanko (Author)

I got one more CUDA-related error, on a custom dataset (100k × 4096) which I unfortunately cannot share, so I will understand if you are unable to help here.

Running CUDA LSQ training...
**********************************************************************************************
Training LSQ GPU with 7 codebooks, 4 perturbations, 4 icm iterations and random order = true
**********************************************************************************************
Doing fast bin codebook update... done in 1.438 seconds.
 -2 1.823259e+03
Creating 50000 random states... done in 0.02 seconds
ERROR: LoadError: CUDA error: invalid argument (code #1, ERROR_INVALID_VALUE)
Stacktrace:
 [1] macro expansion at /home/xhanko1/.julia/packages/CUDAdrv/JWljj/src/base.jl:147 [inlined]
 [2] macro expansion at /home/xhanko1/.julia/packages/CUDAdrv/JWljj/src/execution.jl:90 [inlined]
 [3] macro expansion at ./gcutils.jl:87 [inlined]
 [4] macro expansion at /home/xhanko1/.julia/packages/CUDAdrv/JWljj/src/execution.jl:88 [inlined]
 [5] _launch at /home/xhanko1/.julia/packages/CUDAdrv/JWljj/src/execution.jl:68 [inlined]
 [6] launch at /home/xhanko1/.julia/packages/CUDAdrv/JWljj/src/execution.jl:60 [inlined]
 [7] macro expansion at ./gcutils.jl:87 [inlined]
 [8] macro expansion at /home/xhanko1/.julia/packages/CUDAdrv/JWljj/src/execution.jl:171 [inlined]
 [9] #_cudacall#24(::Int64, ::Tuple{Int64,Int64}, ::Int64, ::CUDAdrv.CuStream, ::typeof(CUDAdrv._cudacall), ::CUDAdrv.CuFunction, ::Type{Tuple{Ptr{Float32},Ptr{Float32},Ptr{UInt8},Ptr{Float32},Int32,Int32,Int32}}, ::Tuple{CUDAdrv.Mem.Buffer,CUDAdrv.Mem.Buffer,CUDAdrv.Mem.Buffer,CUDAdrv.Mem.Buffer,Int32,Int32,Int32}) at /home/xhanko1/.julia/packages/CUDAdrv/JWljj/src/execution.jl:154
 [10] (::getfield(CUDAdrv, Symbol("#kw##_cudacall")))(::NamedTuple{(:blocks, :threads, :shmem),Tuple{Int64,Tuple{Int64,Int64},Int64}}, ::typeof(CUDAdrv._cudacall), ::CUDAdrv.CuFunction, ::Type, ::Tuple{CUDAdrv.Mem.Buffer,CUDAdrv.Mem.Buffer,CUDAdrv.Mem.Buffer,CUDAdrv.Mem.Buffer,Int32,Int32,Int32}) at ./none:0
 [11] #cudacall#22 at /home/xhanko1/.julia/packages/CUDAdrv/JWljj/src/execution.jl:139 [inlined]
 [12] (::getfield(CUDAdrv, Symbol("#kw##cudacall")))(::NamedTuple{(:blocks, :threads, :shmem),Tuple{Int64,Tuple{Int64,Int64},Int64}}, ::typeof(CUDAdrv.cudacall), ::CUDAdrv.CuFunction, ::NTuple{7,DataType}, ::CUDAdrv.Mem.Buffer, ::CUDAdrv.Mem.Buffer, ::CUDAdrv.Mem.Buffer, ::CUDAdrv.Mem.Buffer, ::Int32, ::Int32, ::Int32) at ./none:0
 [13] veccost2(::Int64, ::Tuple{Int64,Int64}, ::CUDAdrv.Mem.Buffer, ::CUDAdrv.Mem.Buffer, ::CUDAdrv.Mem.Buffer, ::CUDAdrv.Mem.Buffer, ::Int32, ::Int32, ::Int32) at /home/xhanko1/.julia/dev/Rayuela/src/CudaUtilsModule.jl:106
 [14] encode_icm_cuda_single(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Bool, ::Bool) at /home/xhanko1/.julia/dev/Rayuela/src/LSQ_GPU.jl:116
 [15] encode_icm_cuda(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Bool, ::Int64, ::Bool) at /home/xhanko1/.julia/dev/Rayuela/src/LSQ_GPU.jl:249
 [16] train_lsq_cuda(::Array{Float32,2}, ::Int64, ::Int64, ::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Int64, ::Int64, ::Int64, ::Bool, ::Int64, ::Int64, ::Bool) at /home/xhanko1/.julia/dev/Rayuela/src/LSQ_GPU.jl:300
 [17] experiment_lsq_cuda(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{UInt32,1}, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool) at /home/xhanko1/.julia/dev/Rayuela/src/LSQ_GPU.jl:345
 [18] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /home/xhanko1/.julia/dev/Rayuela/demos/demo_profiset.jl:70
 [19] top-level scope at /home/xhanko1/.julia/dev/Rayuela/demos/demo_profiset.jl:98 [inlined]
 [20] top-level scope at ./none:0
 [21] include at ./boot.jl:326 [inlined]
 [22] include_relative(::Module, ::String) at ./loading.jl:1038
 [23] include(::Module, ::String) at ./sysimg.jl:29
 [24] include(::String) at ./client.jl:403
 [25] top-level scope at none:0
in expression starting at /home/xhanko1/.julia/dev/Rayuela/demos/demo_profiset.jl:97

This never happens on SIFT1M, where everything runs fine as long as I don't hit the context error. Do you have any idea what the issue could be here? I did successfully run the previous methods (PQ, OPQ, RVQ, ERVQ), albeit much more slowly than on SIFT1M, which makes me believe that the LSQ implementation cannot handle data of this dimensionality. Could that be correct?

Cheers.

@una-dinosauria (Owner)

Regarding the last comment: the CUDA kernels expect some hardcoded values, e.g. in the data dimensionality. You kind of have to do that if you want to squeeze out the last bits of performance, so yes, that could be the issue.
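
As a purely hypothetical illustration (this is not Rayuela's actual kernel setup): when launch parameters are derived from the data dimensionality, values tuned for SIFT-like inputs (d = 128) can exceed hardware limits at d = 4096, and the launch then fails with ERROR_INVALID_VALUE, much like your second trace:

```julia
# Hypothetical sketch, not Rayuela's code: a kernel that uses one thread
# per input dimension works for d = 128 but violates the per-block thread
# limit for d = 4096, so the launch is rejected with ERROR_INVALID_VALUE.
const MAX_THREADS_PER_BLOCK = 1024  # typical CUDA hardware limit

function launch_params(d::Integer, n::Integer)
    d <= MAX_THREADS_PER_BLOCK ||
        error("d = $d threads per block exceeds the limit of $MAX_THREADS_PER_BLOCK")
    return (blocks = n, threads = d)
end

launch_params(128, 50_000)    # fine for SIFT1M-sized dimensionality
# launch_params(4096, 50_000) # errors: the hardcoded tuning no longer fits
```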
