Any way to run on Windows? #70

Open
jaccarmac opened this issue Jun 23, 2016 · 52 comments
Comments

@jaccarmac

Setting up cl-cuda seems to hook into gcc to create the FFI. GCC is well and good thanks to MSYS2/MinGW64, but apparently the CUDA toolkit and MinGW don't play nice together. Is there any way to set up cl-cuda to use the Windows CUDA toolchain?

@takagi
Owner

takagi commented Jun 23, 2016

I have not tried cl-cuda on Windows, but I suppose that if the following points are satisfied, cl-cuda would run on Windows natively, even without help from MSYS/MinGW. How about these?

  • Running NVCC on the command line.
  • Running external commands from Common Lisp.
  • Calling libcuda.dll via CFFI.

@jaccarmac
Author

jaccarmac commented Jun 24, 2016

nvcc works, haven't tried it with actual input files but it is on the PATH. Will try to compile samples and see what happens.

nvcc can be run through SBCL (sb-ext:run-program "nvcc" nil :search t).
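The call above is SBCL-specific. As a rough, portable sketch of the same check, UIOP (which ships with ASDF) can run the program and capture what it prints. The function name `nvcc-version-string` is made up for illustration, and the sketch assumes `nvcc` is on the PATH and accepts `--version`.

```lisp
(require :asdf) ; UIOP ships with ASDF 3

(defun nvcc-version-string (&optional (program "nvcc"))
  "Return the text printed by `PROGRAM --version`, or NIL if it cannot be run.
PROGRAM is a hypothetical parameter; it defaults to nvcc looked up on PATH."
  (handler-case
      (uiop:run-program (list program "--version")
                        :output :string
                        :ignore-error-status t)
    ;; A failure to launch the program at all is signalled as an error.
    (error () nil)))

;; Hypothetical usage: print the version banner, or a hint if nothing ran.
(let ((banner (nvcc-version-string)))
  (if banner
      (write-string banner)
      (format t "nvcc could not be started; check PATH.~%")))
```

This only shows that the binary starts; it says nothing about whether NVCC can find a host compiler.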

Can't find libcuda.dll on my system, even though I have a CUDA card and have installed the developer SDK. Is that a secondary dependency? I'll do some more research momentarily.

@jaccarmac
Author

I can find cuda.lib but no cuda.dll.

@takagi
Owner

takagi commented Jun 24, 2016

CuPy (https://github.com/pfnet/chainer/tree/master/cupy) does almost the same thing as cl-cuda, but in Python: it generates CUDA C code, compiles it with NVCC, and launches kernels. It works on Windows as well, so this should be possible.

@jaccarmac
Author

My installation was slightly borked due to the lack of a valid Visual Studio version. That problem is fixed and my environment is actually working now, but I still can't find the right DLL(s). Haven't taken a look at exactly what cupy does yet. Here are the DLLs I can find.

cublas64_75.dll
cudart32_75.dll
cudart64_75.dll
cufft64_75.dll
cufftw64_75.dll
cuinj32_75.dll
cuinj64_75.dll
curand64_75.dll
cusolver64_75.dll
cusparse64_75.dll
nppc64_75.dll
nppi64_75.dll
npps64_75.dll
nvblas64_75.dll
nvrtc64_75.dll
nvrtc-builtins64_75.dll

@takagi
Owner

takagi commented Jun 24, 2016

The CUDA FAQ (https://developer.nvidia.com/cuda-faq) says that the file needed to use the driver API is "nvcuda.dll", and that it is included as part of the standard NVIDIA driver install. Can you find it in a Windows system folder such as System32? cl-cuda uses only the driver API.

@jaccarmac
Author

Appears to work on SBCL for me.

* (ql:quickload :cffi)
To load "cffi":
  Load 1 ASDF system:
    cffi
; Loading "cffi"
........
(:CFFI)
* (cffi:load-foreign-library "nvcuda")

#<CFFI:FOREIGN-LIBRARY NVCUDA-523 "nvcuda">

@takagi
Owner

takagi commented Jun 25, 2016

Okay, then you should be able to load cl-cuda with nvcuda.dll.

(ql:quickload :cl-cuda)

Please set *nvcc-binary* to the path to the NVCC compiler and try to run some sample programs.

(setf cl-cuda:*nvcc-binary* #P"path/to/nvcc")

(ql:quickload :cl-cuda-examples)
(cl-cuda-examples.vector-add:main)

You may need to pass some options to nvcc via *nvcc-options*. Please let me know what you get.

@jaccarmac
Author

I can't even load the system in the first place, thanks to an error groveling a file in cl-cuda. Here is the full stack trace from SLIME, since I'm very unfamiliar with native integration in SBCL.

Couldn't execute "gcc": The system cannot find the file specified.
   [Condition of type CFFI-GROVEL:GROVEL-ERROR]

Restarts:
 0: [RETRY] Retry PROCESS-OP on #<CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">.
 1: [ACCEPT] Continue, treating PROCESS-OP on #<CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel"> as having been successful.
 2: [RETRY] Retry ASDF operation.
 3: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the configuration.
 4: [ABORT] Give up on "cl-cuda"
 5: [RETRY] Retry SLIME REPL evaluation request.
 --more--

Backtrace:
  0: (CFFI-GROVEL:GROVEL-ERROR "~a" #<SIMPLE-ERROR "Couldn't execute ~S: ~A" {1006781B33}>)
  1: ((FLET #:THUNK :IN CFFI-GROVEL:PROCESS-GROVEL-FILE))
  2: (SB-IMPL::%WITH-STANDARD-IO-SYNTAX #<CLOSURE (FLET #:THUNK :IN CFFI-GROVEL:PROCESS-GROVEL-FILE) {9F2DDBB}>)
  3: (CFFI-GROVEL:PROCESS-GROVEL-FILE #P"C:/Users/jaccarmac/software/quicklisp/local-projects/cl-cuda/src/driver-api/type-grovel.lisp" #P"C:/Users/jaccarmac/AppData/Local/cache/common-lisp/sbcl-1.3.6-win-x..
  4: ((:METHOD ASDF/ACTION:PERFORM (CFFI-GROVEL::PROCESS-OP CFFI-GROVEL:GROVEL-FILE)) #<CFFI-GROVEL::PROCESS-OP > #<CL-CUDA-ASD::CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">) [fast-method]
  5: ((SB-PCL::EMF ASDF/ACTION:PERFORM) #<unavailable argument> #<unavailable argument> #<CFFI-GROVEL::PROCESS-OP > #<CL-CUDA-ASD::CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">)
  6: ((:METHOD ASDF/ACTION:PERFORM-WITH-RESTARTS :AROUND (T T)) #<CFFI-GROVEL::PROCESS-OP > #<CL-CUDA-ASD::CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">) [fast-method]
  7: ((:METHOD ASDF/PLAN:PERFORM-PLAN (LIST)) ((#1=#<ASDF/LISP-ACTION:PREPARE-OP > . #2=#<ASDF/SYSTEM:SYSTEM "uiop">) (#<ASDF/LISP-ACTION:COMPILE-OP > . #2#) (#3=#<ASDF/LISP-ACTION:LOAD-OP > . #2#) (#1# . ..
  8: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
  9: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) ((#1=#<ASDF/LISP-ACTION:PREPARE-OP > . #2=#<ASDF/SYSTEM:SYSTEM "uiop">) (#<ASDF/LISP-ACTION:COMPILE-OP > . #2#) (#3=#<ASDF/LISP-ACTION:LOAD-OP > . #2#) (#..
 10: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
 11: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) #<ASDF/PLAN:SEQUENTIAL-PLAN {1003E29C63}> :VERBOSE NIL) [fast-method]
 12: ((:METHOD ASDF/OPERATE:OPERATE (ASDF/OPERATION:OPERATION ASDF/COMPONENT:COMPONENT)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-cuda"> :VERBOSE NIL) [fast-method]
 13: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) #<unused argument> #<unused argument> #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-cuda"> :VERBOSE NIL)
 14: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
 15: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-cuda"> :VERBOSE NIL) [fast-method]
 16: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) #<unused argument> #<unused argument> ASDF/LISP-ACTION:LOAD-OP "cl-cuda" :VERBOSE NIL)
 17: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
 18: (ASDF/CACHE:CALL-WITH-ASDF-CACHE #<CLOSURE (LAMBDA NIL :IN ASDF/OPERATE:OPERATE) {1003E1B22B}> :OVERRIDE NIL :KEY NIL)
 19: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "cl-cuda" :VERBOSE NIL) [fast-method]
 20: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "cl-cuda" :VERBOSE NIL) [fast-method]
 21: (ASDF/OPERATE:LOAD-SYSTEM "cl-cuda" :VERBOSE NIL)
 22: (QUICKLISP-CLIENT::CALL-WITH-MACROEXPAND-PROGRESS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT::APPLY-LOAD-STRATEGY) {1003D8125B}>)
 23: (QUICKLISP-CLIENT::AUTOLOAD-SYSTEM-AND-DEPENDENCIES "cl-cuda" :PROMPT NIL)
 24: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION (T T)) #<unavailable argument> #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {1004559C2B}>) [fast-method]
 25: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION :AROUND (QL-IMPL:SBCL T)) #<QL-IMPL:SBCL {10066F0833}> #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {1004559C2B}>) [fast-me..
 26: ((:METHOD QUICKLISP-CLIENT:QUICKLOAD (T)) #<unavailable argument> :PROMPT NIL :SILENT NIL :VERBOSE NIL) [fast-method]
 27: (QL-DIST::CALL-WITH-CONSISTENT-DISTS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT:QUICKLOAD) {100453EAFB}>)
 28: (SB-INT:SIMPLE-EVAL-IN-LEXENV (QUICKLISP-CLIENT:QUICKLOAD :CL-CUDA) #<NULL-LEXENV>)
 29: (EVAL (QUICKLISP-CLIENT:QUICKLOAD :CL-CUDA))
 30: (SWANK::EVAL-REGION "(ql:quickload :cl-cuda) ..)
 31: ((LAMBDA NIL :IN SWANK-REPL::REPL-EVAL))
 32: (SWANK-REPL::TRACK-PACKAGE #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {100453E25B}>)
 33: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME REPL evaluation request." #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {100453E1BB}>)
 34: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {100453E19B}>)
 35: (SWANK-REPL::REPL-EVAL "(ql:quickload :cl-cuda) ..)
 36: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK-REPL:LISTENER-EVAL "(ql:quickload :cl-cuda) ..)
 37: (EVAL (SWANK-REPL:LISTENER-EVAL "(ql:quickload :cl-cuda) ..)
 38: (SWANK:EVAL-FOR-EMACS (SWANK-REPL:LISTENER-EVAL "(ql:quickload :cl-cuda) ..)
 39: (SWANK::PROCESS-REQUESTS NIL)
 40: ((LAMBDA NIL :IN SWANK::HANDLE-REQUESTS))
 41: ((LAMBDA NIL :IN SWANK::HANDLE-REQUESTS))
 42: (SWANK/SBCL::CALL-WITH-BREAK-HOOK #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA NIL :IN SWANK::HANDLE-REQUESTS) {1003DD000B}>)
 43: ((FLET SWANK/BACKEND:CALL-WITH-DEBUGGER-HOOK :IN "c:/Users/jaccarmac/.emacs.d/elpa/slime-20160614.1214/swank/sbcl.lisp") #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA NIL :IN SWANK::HANDLE-R..
 44: (SWANK::CALL-WITH-BINDINGS ((*STANDARD-INPUT* . #1=#<SWANK/GRAY::SLIME-INPUT-STREAM {1003C7EB13}>) (*STANDARD-OUTPUT* . #2=#<SWANK/GRAY::SLIME-OUTPUT-STREAM {1003D8F743}>) (*TRACE-OUTPUT* . #2#) (*ERR..
 45: (SWANK::HANDLE-REQUESTS #<SWANK::MULTITHREADED-CONNECTION {1003220523}> NIL)
 46: ((FLET #:WITHOUT-INTERRUPTS-BODY-1161 :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 47: ((FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 48: ((FLET #:WITHOUT-INTERRUPTS-BODY-359 :IN SB-THREAD::CALL-WITH-MUTEX))
 49: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE) {9F2FB5B}> #<SB-THREAD:MUTEX "thread result lock" owner: #<SB-THREAD:THREAD "..
 50: (SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE #<SB-THREAD:THREAD "repl-thread" RUNNING {1003DC8033}> NIL #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::SPAWN-REPL-THREAD) {1003DBFF9B}> (#<SB-THREAD:THREAD "re..
 51: ("foreign function: #x42E6FC")
 52: ("foreign function: #x40334E")
 53: ("foreign function: #x8B6FE0")

@takagi
Owner

takagi commented Jun 26, 2016

Ah... grovel... I missed what you mentioned at first. While I think of a workaround, how did it fail on MSYS2/MinGW64 at first?

but apparently the CUDA toolkit and MinGW don't play nice together.

@jaccarmac
Author

AFAICT (definitely not an expert systems programmer :-), NVIDIA distributes its dev environment as binaries, but provides .libs for MSVC instead of DLLs, which means you have to do low-level lib twiddling to get them to link against MinGW's libc.

@takagi
Owner

takagi commented Jun 26, 2016

Is it possible to call nvcuda.dll from SBCL on MinGW?

  • Running NVCC on the command line.
  • Running external commands from Common Lisp.
  • Calling libcuda.dll via CFFI.
  • Groveling cuda.h with gcc. 

@takagi
Owner

takagi commented Jun 26, 2016

I suppose that MinGW has a way to call DLLs as well as GNU libraries, though I am not familiar with its calling convention.

@jaccarmac
Author

MinGW does use DLLs as its shared library format, but as I understand it they are linked to an old msvcr.dll. In any case, here are the results from running SBCL from inside a MinGW64 shell.

Subprocess (:PROCESS #<SB-IMPL::PROCESS :EXITED 1>)
 with command ("gcc" "-m64" "-o"
               "C:\\Users\\jaccarmac\\AppData\\Local\\cache\\common-lisp\\sbcl-1.3.6-win-x64\\C\\Users\\jaccarmac\\software\\quicklisp\\local-projects\\cl-cuda\\src\\driver-api\\type-grovel__grovel-tmpGHU3ALSV.exe"
               "-IC:/Users/jaccarmac/software/quicklisp/dists/quicklisp/software/cffi_0.17.1/"
               "C:\\Users\\jaccarmac\\AppData\\Local\\cache\\common-lisp\\sbcl-1.3.6-win-x64\\C\\Users\\jaccarmac\\software\\quicklisp\\local-projects\\cl-cuda\\src\\driver-api\\type-grovel__grovel.c")
 exited with error code 1
   [Condition of type CFFI-GROVEL:GROVEL-ERROR]

Restarts:
 0: [RETRY] Retry PROCESS-OP on #<CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">.
 1: [ACCEPT] Continue, treating PROCESS-OP on #<CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel"> as having been successful.
 2: [RETRY] Retry ASDF operation.
 3: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the configuration.
 4: [ABORT] Give up on "cl-cuda"
 5: [RETRY] Retry SLIME REPL evaluation request.
 --more--

Backtrace:
  0: (CFFI-GROVEL:GROVEL-ERROR "~a" #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {100614BC93}>)
  1: ((FLET #:THUNK :IN CFFI-GROVEL:PROCESS-GROVEL-FILE))
  2: (SB-IMPL::%WITH-STANDARD-IO-SYNTAX #<CLOSURE (FLET #:THUNK :IN CFFI-GROVEL:PROCESS-GROVEL-FILE) {9EEDDBB}>)
  3: (CFFI-GROVEL:PROCESS-GROVEL-FILE #P"C:/Users/jaccarmac/software/quicklisp/local-projects/cl-cuda/src/driver-api/type-grovel.lisp" #P"C:/Users/jaccarmac/AppData/Local/cache/common-lisp/sbcl-1.3.6-win-x..
  4: ((:METHOD ASDF/ACTION:PERFORM (CFFI-GROVEL::PROCESS-OP CFFI-GROVEL:GROVEL-FILE)) #<CFFI-GROVEL::PROCESS-OP > #<CL-CUDA-ASD::CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">) [fast-method]
  5: ((SB-PCL::EMF ASDF/ACTION:PERFORM) #<unavailable argument> #<unavailable argument> #<CFFI-GROVEL::PROCESS-OP > #<CL-CUDA-ASD::CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">)
  6: ((:METHOD ASDF/ACTION:PERFORM-WITH-RESTARTS :AROUND (T T)) #<CFFI-GROVEL::PROCESS-OP > #<CL-CUDA-ASD::CUDA-GROVEL-FILE "cl-cuda" "src" "driver-api" "type-grovel">) [fast-method]
  7: ((:METHOD ASDF/PLAN:PERFORM-PLAN (LIST)) ((#1=#<ASDF/LISP-ACTION:PREPARE-OP > . #2=#<ASDF/SYSTEM:SYSTEM "uiop">) (#<ASDF/LISP-ACTION:COMPILE-OP > . #2#) (#3=#<ASDF/LISP-ACTION:LOAD-OP > . #2#) (#1# . ..
  8: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
  9: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) ((#1=#<ASDF/LISP-ACTION:PREPARE-OP > . #2=#<ASDF/SYSTEM:SYSTEM "uiop">) (#<ASDF/LISP-ACTION:COMPILE-OP > . #2#) (#3=#<ASDF/LISP-ACTION:LOAD-OP > . #2#) (#..
 10: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
 11: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) #<ASDF/PLAN:SEQUENTIAL-PLAN {1003781C63}> :VERBOSE NIL) [fast-method]
 12: ((:METHOD ASDF/OPERATE:OPERATE (ASDF/OPERATION:OPERATION ASDF/COMPONENT:COMPONENT)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-cuda"> :VERBOSE NIL) [fast-method]
 13: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) #<unused argument> #<unused argument> #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-cuda"> :VERBOSE NIL)
 14: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
 15: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-cuda"> :VERBOSE NIL) [fast-method]
 16: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) #<unused argument> #<unused argument> ASDF/LISP-ACTION:LOAD-OP "cl-cuda" :VERBOSE NIL)
 17: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
 18: (ASDF/CACHE:CALL-WITH-ASDF-CACHE #<CLOSURE (LAMBDA NIL :IN ASDF/OPERATE:OPERATE) {100377322B}> :OVERRIDE NIL :KEY NIL)
 19: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "cl-cuda" :VERBOSE NIL) [fast-method]
 20: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "cl-cuda" :VERBOSE NIL) [fast-method]
 21: (ASDF/OPERATE:LOAD-SYSTEM "cl-cuda" :VERBOSE NIL)
 22: (QUICKLISP-CLIENT::CALL-WITH-MACROEXPAND-PROGRESS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT::APPLY-LOAD-STRATEGY) {100371125B}>)
 23: (QUICKLISP-CLIENT::AUTOLOAD-SYSTEM-AND-DEPENDENCIES "cl-cuda" :PROMPT NIL)
 24: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION (T T)) #<unavailable argument> #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {1003DE55FB}>) [fast-method]
 25: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION :AROUND (QL-IMPL:SBCL T)) #<QL-IMPL:SBCL {10066F0833}> #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {1003DE55FB}>) [fast-me..
 26: ((:METHOD QUICKLISP-CLIENT:QUICKLOAD (T)) #<unavailable argument> :PROMPT NIL :SILENT NIL :VERBOSE NIL) [fast-method]
 27: (QL-DIST::CALL-WITH-CONSISTENT-DISTS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT:QUICKLOAD) {1003DC330B}>)
 28: (SB-INT:SIMPLE-EVAL-IN-LEXENV (QUICKLISP-CLIENT:QUICKLOAD :CL-CUDA) #<NULL-LEXENV>)
 29: (EVAL (QUICKLISP-CLIENT:QUICKLOAD :CL-CUDA))
 30: (SWANK::EVAL-REGION "(ql:quickload :cl-cuda) ..)
 31: ((LAMBDA NIL :IN SWANK-REPL::REPL-EVAL))
 32: (SWANK-REPL::TRACK-PACKAGE #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {1003DC2A6B}>)
 33: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME REPL evaluation request." #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {1003DC29CB}>)
 34: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {1003DC29AB}>)
 35: (SWANK-REPL::REPL-EVAL "(ql:quickload :cl-cuda) ..)
 36: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK-REPL:LISTENER-EVAL "(ql:quickload :cl-cuda) ..)
 37: (EVAL (SWANK-REPL:LISTENER-EVAL "(ql:quickload :cl-cuda) ..)
 38: (SWANK:EVAL-FOR-EMACS (SWANK-REPL:LISTENER-EVAL "(ql:quickload :cl-cuda) ..)
 39: (SWANK::PROCESS-REQUESTS NIL)
 40: ((LAMBDA NIL :IN SWANK::HANDLE-REQUESTS))
 41: ((LAMBDA NIL :IN SWANK::HANDLE-REQUESTS))
 42: (SWANK/SBCL::CALL-WITH-BREAK-HOOK #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA NIL :IN SWANK::HANDLE-REQUESTS) {1003DC000B}>)
 43: ((FLET SWANK/BACKEND:CALL-WITH-DEBUGGER-HOOK :IN "c:/Users/jaccarmac/.emacs.d/elpa/slime-20160614.1214/swank/sbcl.lisp") #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA NIL :IN SWANK::HANDLE-R..
 44: (SWANK::CALL-WITH-BINDINGS ((*STANDARD-INPUT* . #1=#<SWANK/GRAY::SLIME-INPUT-STREAM {1003C76B13}>) (*STANDARD-OUTPUT* . #2=#<SWANK/GRAY::SLIME-OUTPUT-STREAM {1003D87DF3}>) (*TRACE-OUTPUT* . #2#) (*ERR..
 45: (SWANK::HANDLE-REQUESTS #<SWANK::MULTITHREADED-CONNECTION {1003220523}> NIL)
 46: ((FLET #:WITHOUT-INTERRUPTS-BODY-1161 :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 47: ((FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 48: ((FLET #:WITHOUT-INTERRUPTS-BODY-359 :IN SB-THREAD::CALL-WITH-MUTEX))
 49: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE) {9EEFB5B}> #<SB-THREAD:MUTEX "thread result lock" owner: #<SB-THREAD:THREAD "..
 50: (SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE #<SB-THREAD:THREAD "repl-thread" RUNNING {1003DB8033}> NIL #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::SPAWN-REPL-THREAD) {1003DB7F9B}> (#<SB-THREAD:THREAD "re..
 51: ("foreign function: #x42E6FC")
 52: ("foreign function: #x40334E")
 53: ("foreign function: #x2637DA0")

@takagi
Owner

takagi commented Jun 26, 2016

What does gcc return if executed directly? You should be able to find some error messages.

gcc -m64 -o C:\Users\jaccarmac\AppData\Local\cache\common-lisp\sbcl-1.3.6-win-x64\C\Users\jaccarmac\software\quicklisp\local-projects\cl-cuda\src\driver-api\type-grovel__grovel-tmpGHU3ALSV.exe -IC:/Users/jaccarmac/software/quicklisp/dists/quicklisp/software/cffi_0.17.1/ C:\Users\jaccarmac\AppData\Local\cache\common-lisp\sbcl-1.3.6-win-x64\C\Users\jaccarmac\software\quicklisp\local-projects\cl-cuda\src\driver-api\type-grovel__grovel.c

@jaccarmac
Author

Command as written fails because C:\Users\jaccarmac\AppData\Local\cache\common-lisp\sbcl-1.3.6-win-x64\C\Users\jaccarmac\software\quicklisp\local-projects\cl-cuda\src\driver-api\type-grovel__grovel-tmpGHU3ALSV.exe is not a valid path.

@jaccarmac
Author

(Note the C\ in the middle of the pathname.)

@takagi
Owner

takagi commented Jun 26, 2016

Do you not have the path C:\Users\jaccarmac\AppData\Local\cache\common-lisp\sbcl-1.3.6-win-x64\C\? Or might it be because of escape sequences? How about this?

gcc -m64 -o C:\\Users\\jaccarmac\\AppData\\Local\\cache\\common-lisp\\sbcl-1.3.6-win-x64\\C\\Users\\jaccarmac\\software\\quicklisp\\local-projects\\cl-cuda\\src\\driver-api\\type-grovel__grovel-tmpGHU3ALSV.exe -IC:/Users/jaccarmac/software/quicklisp/dists/quicklisp/software/cffi_0.17.1/ C:\\Users\\jaccarmac\\AppData\\Local\\cache\\common-lisp\\sbcl-1.3.6-win-x64\\C\\Users\\jaccarmac\\software\\quicklisp\\local-projects\\cl-cuda\\src\\driver-api\\type-grovel__grovel.c
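A note on where the doubled backslashes come from, as a small sketch rather than a diagnosis: the GROVEL-ERROR report prints its command strings in readable form, so each backslash appears twice even though the string holds only one. Whether a single backslash then survives depends on the shell the command is pasted into (an MSYS2 shell is assumed here, where an unquoted backslash acts as an escape character).

```lisp
;; In Lisp source, a backslash inside a string literal must itself be escaped.
(let ((path "C:\\Users\\jaccarmac"))
  (format t "~S~%" path)            ; readable form, as the debugger shows it
  (format t "~A~%" path)            ; the characters the string really holds
  (format t "~D~%" (length path)))  ; 18: each backslash counts once

;; An unescaped backslash silently drops out of a literal:
(format t "~A~%" "path\to\nvcc")    ; prints pathtonvcc
```

The same trap applies to pathname literals written with #P"...", so forward slashes are usually the safer choice in Lisp source.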

@jaccarmac
Author

Aha, that was it. cuda.h is missing from gcc's search path.

@takagi
Owner

takagi commented Jun 26, 2016

Please add the directory containing cuda.h to the environment variable C_INCLUDE_PATH. Can you find cuda.h in your environment?
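To check from the Lisp side whether the variable is set usefully, something like the following sketch could help. The helper name `find-on-include-path` is hypothetical, and the choice of ";" as the separator on Windows and ":" elsewhere is an assumption about how the local gcc reads C_INCLUDE_PATH.

```lisp
(require :asdf) ; for UIOP

(defun find-on-include-path (header
                             &optional (search-path (uiop:getenv "C_INCLUDE_PATH")))
  "Return the pathname of HEADER in the first directory of SEARCH-PATH
that contains it, or NIL if no directory does. Hypothetical helper."
  (when search-path
    (loop for dir in (uiop:split-string
                      search-path
                      :separator (if (uiop:os-windows-p) ";" ":"))
          for candidate = (and (plusp (length dir))
                               (merge-pathnames
                                header
                                (uiop:ensure-directory-pathname
                                 (uiop:parse-native-namestring dir))))
          when (and candidate (probe-file candidate))
            return candidate)))

;; Hypothetical usage: NIL means gcc would not find the header this way either.
(print (find-on-include-path "cuda.h"))
```

The environment is read when SBCL starts, so a change to C_INCLUDE_PATH made in another shell only shows up after restarting the Lisp image.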

@jaccarmac
Author

That seems to work! Thanks!

@jaccarmac
Author

I see you have a list of supported environments in the README. If you let me know how to run the test suite, I can verify it passes and submit a PR with my specifics.

@jaccarmac
Author

Both the test programs and (asdf:oos 'asdf:test-op '#:cl-cuda) fail with an error saying that the alien function cuInit is undefined.

@takagi
Owner

takagi commented Jun 26, 2016

That would help me a lot. You can run the tests with (ql:quickload :cl-cuda-test).

@jaccarmac
Author

jaccarmac commented Jun 26, 2016

I'm still getting: The alien function "cuInit" is undefined. This happens natively or from the MSYS console, with or without changes to PATH or C_INCLUDE_PATH.

@takagi
Owner

takagi commented Jun 26, 2016

cuInit should be defined via CFFI:DEFCFUN in cl-cuda/src/driver-api/function.lisp. Something may still be missing for calling the API in nvcuda.dll. Let me think for a while.

@takagi
Owner

takagi commented Jun 26, 2016

Would you try it again with the following fix in cl-cuda/src/driver-api/library.lisp ?

  (cffi:define-foreign-library libcuda
+   (:windows "nvcuda.dll")
+   ; (:windows "nvcuda.dll" :convention :stdcall)
    (:darwin (:framework "CUDA"))
    (:unix (:or "libcuda.so" "libcuda64.so")))

At least, the error The alien function "cuInit" is undefined. is caused by a missing Windows line in the foreign library definition, but I do not know whether :convention :stdcall is required or not.
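One way to test this outside cl-cuda is to define a separate probe library with the same clauses and ask CFFI whether the symbol resolves. This is only a sketch: the names `libcuda-probe` and `cuinit-resolvable-p` are invented here so they do not clash with cl-cuda's own definitions, and Quicklisp is assumed to be installed.

```lisp
(ql:quickload :cffi :silent t) ; assumes Quicklisp is available

;; Hypothetical probe library mirroring the clauses of the patch above.
(cffi:define-foreign-library libcuda-probe
  (:windows "nvcuda.dll")
  (:darwin (:framework "CUDA"))
  (:unix (:or "libcuda.so" "libcuda64.so")))

(defun cuinit-resolvable-p ()
  "True if the driver library loads and the symbol cuInit can be found in it."
  (handler-case
      (progn
        (cffi:load-foreign-library 'libcuda-probe)
        (not (null (cffi:foreign-symbol-pointer "cuInit"))))
    ;; Signalled when none of the candidate files can be loaded.
    (cffi:load-foreign-library-error () nil)))

(format t "cuInit resolvable: ~A~%" (cuinit-resolvable-p))
```

A true result only shows that the symbol can be looked up; it does not settle the calling-convention question.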

@jaccarmac
Author

Both versions seem to work, and further into the test suite we get The function OSICAT-POSIX:MKTEMP is undefined..

@jaccarmac
Author

This test also fails further up the chain.

 ? basic case 4
    "float3_add( __make_float3( 1.0f, 1.0f, 1.0f ), __make_float3( 2.0f, 2.0f, 2.0f ) )" is expected to be "float3_add( make_float3( 1.0f, 1.0f, 1.0f ), make_float3( 2.0f, 2.0f, 2.0f ) )"

@takagi
Owner

takagi commented Jun 26, 2016

Both versions seem to work, and further into the test suite we get The function OSICAT-POSIX:MKTEMP is undefined..

OSICAT-POSIX:MKTEMP might not work on Windows. Please apply this patch as a workaround. I use MKTEMP only to make a temporary file name. I will fix it later.

cl-cuda/src/api/nvcc.lisp

  (defun get-cu-path ()
+   (let ((name "cl-cuda.tmp"))
-   (let ((name (format nil "cl-cuda.~A" (osicat-posix:mktemp))))
      (make-pathname :name name :type "cu" :defaults (get-tmp-path))))
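The workaround above always uses the same file name. If distinct names are wanted without osicat, a random suffix can be generated in plain Common Lisp. The sketch below is not the maintainer's eventual fix; `unique-tmp-name` and `*name-random-state*` are hypothetical names, and unlike a real mkstemp-style call this does not guard against two processes picking the same name.

```lisp
(defvar *name-random-state* (make-random-state t)
  "Freshly seeded random state, so names differ between Lisp sessions.")

(defun unique-tmp-name (&optional (prefix "cl-cuda"))
  "Return PREFIX, a dot, and 8 random base-36 digits, e.g. cl-cuda.0K3F9ZQ1."
  ;; ~36,8,'0R prints an integer in radix 36, zero-padded to 8 columns.
  (format nil "~A.~36,8,'0R"
          prefix
          (random (expt 36 8) *name-random-state*)))

;; Hypothetical usage inside a function such as get-cu-path:
;; (make-pathname :name (unique-tmp-name) :type "cu" :defaults (get-tmp-path))
(print (unique-tmp-name))
```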

@takagi
Owner

takagi commented Jun 26, 2016

This failure is because the test itself is wrong; I will fix it. You can ignore it.

 ? basic case 4
    "float3_add( __make_float3( 1.0f, 1.0f, 1.0f ), __make_float3( 2.0f, 2.0f, 2.0f ) )" is expected to be "float3_add( make_float3( 1.0f, 1.0f, 1.0f ), make_float3( 2.0f, 2.0f, 2.0f ) )"

@jaccarmac
Author

All right, now the command nvcc -arch=sm_30 -I C:/Users/jaccarmac/software/quicklisp/local-projects/cl-cuda/include -ptx -o /tmp/cl-cuda.tmp.ptx /tmp/cl-cuda.tmp.cu is failing with nvcc fatal : Cannot find compiler 'cl.exe' in PATH.

It seems like NVCC is designed to run from inside Visual Studio on Windows. I'll see what I can do to fix the path.

@jaccarmac
Author

Indeed, running SBCL from Visual Studio's CMD makes many of the tests run. Then, nvcc -arch=sm_30 -I C:/Users/jaccarmac/software/quicklisp/local-projects/cl-cuda/include -ptx -o /tmp/cl-cuda.tmp.ptx /tmp/cl-cuda.tmp.cu fails.

cl-cuda.tmp.cu
C:/tmp/cl-cuda.tmp.cu(42): warning: variable "i" was declared but never referenced

C:/tmp/cl-cuda.tmp.cu(46): warning: dynamic initialization in unreachable code

C:/tmp/cl-cuda.tmp.cu(46): warning: variable "i" was declared but never referenced

C:/tmp/cl-cuda.tmp.cu(105): error: expected an expression

1 error detected in the compilation of "C:/Users/JACCAR~1/AppData/Local/Temp/tmpxft_000009cc_00000000-8_cl-cuda.tmp.cpp1.ii".

@takagi
Owner

takagi commented Jun 27, 2016

Can I see the failing part of /tmp/cl-cuda.tmp.cu? It might be caused by a difference between gcc and cl.exe.

@jaccarmac
Author

@takagi
Owner

takagi commented Jun 27, 2016

It seems that cl.exe does not accept struct initializers as expressions. I will give you a patch; I want cl.exe to compile these tests without errors.

@takagi
Owner

takagi commented Jun 27, 2016

Please wait a while; I'm at work now.

@jaccarmac
Author

In that case, I'll take a moment to thank you for the manner in which you're handling the ticket. Really appreciate the promptness of responses and willingness to direct my exploration of the problem :-)!

@takagi
Owner

takagi commented Jun 27, 2016

You are also helping me a lot by letting me know how cl-cuda behaves on Windows. Thanks.

@takagi
Owner

takagi commented Jun 27, 2016

Would you try this patch? This disables compiling to compound literals.

cl-cuda/src/lang/built-in.lisp

  ;; built-in vector constructor
- float3 (((float float float) float3 nil "__make_float3"))
- float4 (((float float float float) float4 nil "__make_float4"))
- double3 (((double double double) double3 nil "__make_double3"))
- double4 (((double double double double) double4 nil "__make_double4"))
+ float3 (((float float float) float3 nil "make_float3"))
+ float4 (((float float float float) float4 nil "make_float4"))
+ double3 (((double double double) double3 nil "make_double3"))
+ double4 (((double double double double) double4 nil "make_double4"))

cl-cuda/t/api/defkernel.lisp

  ;;;
  ;;; Initializers
  ;;;

+ #+nil
  (defglobal c (float3 3.0 2.0 1.0))

+ #+nil
  (defkernel initializer (float3 ())
    (let ((x 1.0))
      (return (float3 x 2.0 3.0))))

+ #+nil
  (defkernel use-initializer (void ((x float3*) (y float3*)))
    (set (aref x 0) (initializer))
    (set (aref y 0) c)
    (return))

+ #+nil
  (subtest "Initializers"

    (with-cuda (0)
      (with-memory-blocks ((x 'float3 1)
                           (y 'float3 1))
        (use-initializer x y :grid-dim (list 1 1 1)
                             :block-dim (list 1 1 1))
        (sync-memory-block x :device-to-host)
        (sync-memory-block y :device-to-host)
        (is (memory-block-aref x 0)
          (make-float3 1.0 2.0 3.0)
            :test #'float3-=
            "Ok. - returning with initializer")
        (is (memory-block-aref y 0)
          (make-float3 3.0 2.0 1.0)
            :test #'float3-=
            "Ok. - initializing with initializer"))))

@jaccarmac
Author

Test suite loads now, but with tons of errors. The colors don't render properly in cmd.exe, so give me a few minutes to figure out how to get the VS2013 dev tools onto a PATH inside a better terminal emulator.

@jaccarmac
Author

No red tests. There is a ton of grey, however, and several places red times show up. I'm assuming grey is passed tests and red times in ms mean the test ran long, but not familiar enough with the test framework to say.

@takagi
Owner

takagi commented Jul 1, 2016

Thanks. Sounds good. Would you try this sample code? It computes the elementwise sum of two arrays: c[i] = a[i] + b[i].

(ql:quickload :cl-cuda)
(load #P"cl-cuda/examples/vector-add.lisp")
(cl-cuda-examples.vector-add:main)
(setf cl-cuda:*show-messages* nil)

If you get the following message, cl-cuda does work on Windows.

verification succeed.
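For reference, the check that such an example presumably performs can be written on the host in a few lines of plain Common Lisp. This is a sketch of the idea c[i] = a[i] + b[i], not the code in vector-add.lisp; the function names and the tolerance value are assumptions.

```lisp
(defun vector-add-reference (a b)
  "Elementwise sum of two equal-length sequences, returned as a vector."
  (assert (= (length a) (length b)))
  (map 'vector #'+ a b))

(defun verify-sum (a b c &key (tolerance 1e-5))
  "True if every C[i] is within TOLERANCE of A[i] + B[i]."
  (every (lambda (x y z) (<= (abs (- (+ x y) z)) tolerance))
         a b c))

;; Hypothetical usage: C would normally be copied back from the GPU.
(let* ((a #(1.0 2.0 3.0))
       (b #(10.0 20.0 30.0))
       (c (vector-add-reference a b)))
  (format t "verified: ~A~%" (verify-sum a b c)))
```

A tolerance is used rather than exact equality because single-precision results from a GPU need not match host arithmetic bit for bit in general.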

@jaccarmac
Author

That does indeed show up!

@takagi
Owner

takagi commented Jul 1, 2016

Great! I hope you will enjoy CUDA in Common Lisp.

And could you tell me your environment?

  • Windows and Visual Studio versions
  • Native or MSYS2/MinGW64 (if the latter, its version)
  • GPU
  • CUDA toolkit version
  • CL implementation and its version

@takagi
Owner

takagi commented Jul 1, 2016

I will note it in the README.

@jaccarmac
Author

  • Windows 10 Insider Preview 14379.
  • Microsoft Visual Studio Community 2013 with Update 5.
  • MSYS2/MinGW64 with GCC 5.4.0.
  • NVIDIA Quadro K2100M.
  • NVIDIA CUDA Toolkit 7.5.
  • SBCL 1.3.6.

@takagi
Owner

takagi commented Jul 2, 2016

Is this result on MSYS2/MinGW64?

That does indeed show up!

@jaccarmac
Author

No, that's the trick of this method. You have to load the system in MSYS2/MinGW64 first so that GCC can do Lisp's groveling work. Subsequent loads, on the other hand, need to be done from the VS2013 CMD so NVCC has access to MSBuild. In theory you could use the MSYS2 shell and source vcvarsall.bat or something, but I didn't try it.

@takagi
Owner

takagi commented Jul 2, 2016

I got it, thanks.

@jaccarmac
Author

Quick update: You can get the MinGW shell running with VS 2013 PATH augmentations, but not the way I thought. The shell can't source batch files AFAICT, so you need to launch MinGW64 from CMD.

First, make sure the following line is uncommented in $MSYS_ROOT/msys2_shell.cmd (where $MSYS_ROOT is where MSYS2 is installed, not an actual environment variable).

set MSYS2_PATH_TYPE=inherit

Then launch the Developer Command Prompt for Visual Studio 2013. From there, run %MSYS_ROOT%\msys2_shell.cmd -mingw64.

You will have a decent terminal emulator with Unix utilities and NVCC on the path properly. This allows using cl-cuda from Emacs, etc, or just a more nicely formatted SBCL prompt.

@Symbolics

What is the current status on this? It looks like MS Windows works, but I cannot find reference to it in the README. Looking at the issues, it seems that some of the patches mentioned have not been applied to master (?).

Just taking stock of the situation before I get started. I need cl-cuda on Windows as well.
