Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GTXDirectSamplerEngine with large sample sizes crashes #7

Closed
gurvesh opened this issue Jul 17, 2019 · 11 comments
Closed

GTXDirectSamplerEngine with large sample sizes crashes #7

gurvesh opened this issue Jul 17, 2019 · 11 comments

Comments

@gurvesh
Copy link

gurvesh commented Jul 17, 2019

Hi Dragan. The implementation of GTXDirectSamplerEngine in uncomplicate.bayadera.internal.device seems a bit off. Basically if you pass in a sample size more than a certain number, it crashes due to some Cuda error (CUDA error: CUDA_ERROR_ILLEGAL_ADDRESS). This is only true for the Gaussian and Uniform distributions, which use this implementation . I know you've implemented Uniform and Gaussian directly into the Neanderthal library v. 25, but Bayadera is incompatible with it. Would you mind please taking a look, when you can? Thanks

@blueberry
Copy link
Member

yes. can you please create a midje test that demonstrates the error?

@gurvesh
Copy link
Author

gurvesh commented Jul 17, 2019

Sure. Here you go:

`
(ns testing-uniform
(:use (clojure repl))
(:require [uncomplicate.neanderthal
[core :refer :all]
[native :refer :all]]
[midje.sweet :refer :all]
[uncomplicate.bayadera
[core :refer [sampler dataset density distribution evidence sample! sample]]
[library :as nl]
[cuda :refer [with-default-bayadera]]]
[uncomplicate.commons.core :refer [with-release]]))

(defn check-sample [n]
(facts
(with-default-bayadera
(with-release [uniform-dist (nl/uniform 1 2)
test-sampler (sampler uniform-dist)
test-sample (sample test-sampler n)]

   ((native (row test-sample 0))) => n ))))

`

Note that:

  1. The above will pass for small values of n (on my computer, n <= 1,000,000 works fine) - but anything higher will kill it (and once the error is triggered, any lower of value of n will still keep triggering it - you need to restart lein).

Exception thrown: clojure.lang.ExceptionInfo (CUDA error: CUDA_ERROR_ILLEGAL_ADDRESS.)

error - uncomplicate.clojurecuda.internal.utils - (utils.clj:37)
fn - uncomplicate.clojurecuda.internal.impl.CUModule - (impl.clj:91)
release - uncomplicate.clojurecuda.internal.impl.CUModule - (impl.clj:91)
release - uncomplicate.bayadera.internal.device.nvidia-gtx.GTXDatasetEngine - (nvidia_gtx.clj:155)
release - uncomplicate.bayadera.internal.device.nvidia-gtx.GTXBayaderaFactory - (nvidia_gtx.clj:763)
check-sample/fn/fn - testing-uniform - (form-init10386720772629001284.clj:14)
with-altered-roots* - midje.util.thread-safe-var-nesting - (thread_safe_var_nesting.clj:32)
check-sample/fn - testing-uniform - (form-init10386720772629001284.clj:14)
applyToHelper - (AFn.java:152)
applyTo - (AFn.java:144)
doInvoke - (AFunction.java:31)
invoke - (RestFn.java:397)
check-one/fn - midje.checking.facts - (facts.clj:32)
check-one - midje.checking.facts - (facts.clj:31)
creation-time-check - midje.checking.facts - (facts.clj:35)
check-sample - testing-uniform - (form-init10386720772629001284.clj:13)
eval60767 - testing-uniform - (form-init10386720772629001284.clj:1)
eval60767 - testing-uniform - (form-init10386720772629001284.clj:1)
eval - (Compiler.java:7176)
eval - (Compiler.java:7131)
eval - clojure.core - (core.clj:3214)
eval - clojure.core - (core.clj:3210)
repl/read-eval-print/fn - clojure.main - (main.clj:414)
repl/read-eval-print - clojure.main - (main.clj:414)
repl/fn - clojure.main - (main.clj:435)
repl - clojure.main - (main.clj:435)
evaluate - nrepl.middleware.interruptible-eval - (interruptible_eval.clj:106)
interruptible-eval/fn/fn - nrepl.middleware.interruptible-eval - (interruptible_eval.clj:140)
run - (AFn.java:22)
session-exec/main-loop/fn - nrepl.middleware.session - (session.clj:171)
session-exec/main-loop - nrepl.middleware.session - (session.clj:170)
run - (AFn.java:22)
run - (Thread.java:748)
  1. If you replace nl/uniform with nl/beta or any other distribution that doesn't use the GTXDirectSampler, it will work for values of n upto around 400,000,000 (provided the code in the pull request I created is implemented first - otherwise it will fail for other reasons). After that it gives another error of the form:
    Actual:
    java.lang.IllegalArgumentException: Value out of range for int: 4000000000

Hope this helps!

@blueberry
Copy link
Member

OK. Let me finish a book chapter (1-2 days) and then I'll take a look at this and your pull request, and probably update bayadera to the latest neanderthal, too.

@gurvesh
Copy link
Author

gurvesh commented Jul 18, 2019

Look forward to it!

@blueberry
Copy link
Member

I'm working on this. Currently, there are many updates that I need to add to bayadera to make it keep up to the latest neanderthal.

Particularly, switching my AMD GPU from 290X to Vega64 a couple months ago required switching the drivers from old proprietary catalyst to the newest open-source ROCm raised many OpenCL compatibility breakages (their new OpenCL C compiler does not swallow many things that were allowed by catalyst).

This error might be caused by many changes that I made to bayadera in this migration process months ago. Now I'll see to make a thorough pass and update everything. I hope that it will reveal what makes this particular error. Anyway, this might take more than a few afternoons, but expect updates in a few days.

@gurvesh
Copy link
Author

gurvesh commented Jul 21, 2019

Ah OK. That would be great (the "upgrade everything" part). But I'll just say that my error was on Cuda, not OpenCL. And if you look at the GTXDirectSamplerEngine code in the uncomplicate.bayadera.internal.device.nvidia_gtx.clj, it just seems like an older version of the code (lacks with-release etc), while the OpenCL version in the Amd_gcn.clj file (GCNDirectSamplerEngine) has a lot more idiomatic code - as in, similar to the rest of of the library. But yes, getting everything to work with the latest Neanderthal would be awesome!

Btw - any plans of making v1.0.0 of everything anytime soon? :)

@blueberry
Copy link
Member

The CUDA/OpenCL discrepancy is not due to and older version of code, but by constraints on what is possible with cuda and OpenCL. That is exactly why I need to tidy everything up first and then fix this error. Otherwise, CUDA and OpenCL implementation would diverge, which would make them much harder to properly maintain. I am already using some advanced code there, and the behavior is highly dependent on OpenCL drivers, that I regularly do not have anywhere to look up for a solution, and am left to experimenting. In fact, this migration to ROCm caused a block where some AMD code won't run as expected even with fixes that run in other places, and there are zero results for the error and the wierd behavior that I get...

@blueberry
Copy link
Member

Update: I've solved other showstoppers, so this issue will be next to look at (but not today).

@blueberry
Copy link
Member

blueberry commented Jul 24, 2019

About this particular issue: there is a bug in the GTXSamplerEngines that launches too many threads. Instead of (dim res), the number of threads should be 4 times less (let's assume that you use power of two size increments, so this does not support dim=1234567). So, instead of:

(do (launch! sample-kernel (grid-1d (dim res) WGS) hstream

(do (launch! sample-kernel (grid-1d (/ (dim res) 4) WGS) hstream

As I decided to change direct sampler substantially, the update will not come that soon, so please make this change in your own copy of the project.

I am not sure why this error does not manifest with smaller arrays. It should fail with any size, but it doesn't.

@gurvesh
Copy link
Author

gurvesh commented Jul 25, 2019

Thanks a lot for your time, and it works!

@blueberry
Copy link
Member

Fixed in latest commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants