mnist example panics #1
Comments
I believe this is due to the OpenCL lib bug. I tried the example in gpuarray-rs, and it still doesn't work. It emits the same CL_INVALID_EVENT_WAIT_LIST error. |
What OS, OpenCL driver, and OpenCL device are you using? EDIT: Also, can you run |
In order to build on Win64, I tweaked the rust-opencl repo a little. cargo test outputs the following:
running 25 tests
failures:
---- ops::tensor_divide_axis1 stdout ----
---- ops::tensor_add stdout ----
---- ops::tensor_add_reuse stdout ----
---- ops::tensor_add_axis stdout ----
---- ops::tensor_dtanh stdout ----
thread 'ops::tensor_dtanh' panicked at 'assertion failed: b.buffer() ==
---- ops::tensor_matmul stdout ----
---- ops::tensor_multiply_axis1 stdout ----
---- ops::tensor_tanh stdout ----
thread 'ops::tensor_tanh' panicked at 'assertion failed: b.buffer() ==
---- ops::test_add_slice stdout ----
---- ops::test_multiply_slice stdout ----
failures:
test result: FAILED. 15 passed; 10 failed; 0 ignored; 0 measured
error: test failed |
It seems that it's failing on operations that wait on more than one OpenCL event (events that take only one input like |
I ran a quick test, and it doesn't change anything. I tried git pull later, and now it doesn't compile. It complains: |
Ah, it was a git pull conflict I hadn't resolved. All the OpenCL tests pass now. gpuarray improves a lot, with the following:
running 25 tests
failures:
---- ops::tensor_dtanh stdout ----
thread 'ops::tensor_dtanh' panicked at 'assertion failed: b.buffer() ==
---- ops::tensor_tanh stdout ----
thread 'ops::tensor_tanh' panicked at 'assertion failed: b.buffer() ==
failures:
test result: FAILED. 23 passed; 2 failed; 0 ignored; 0 measured
error: test failed |
Also, please change the gpuarray context.rs with the corresponding:
and change the rust-opencl src/cl.rs with the following:
|
Hmm, interesting. There are some rounding errors. And I don't understand why the code in gpuarray/ops.rs doesn't need to call queue.write(args) the way the OpenCL examples do. |
I applied your patches. In the future you can always do a pull request ;) Yeah the rounding errors make sense. I should probably change those tanh and dtanh unit tests to compare with an epsilon tolerance. The gpuarray operations operate on Tensors, which are basically just a wrapper around CLBuffers. It is the user's responsibility to Tensor::set the buffer with the right data. Does the mnist example work for you now? If so, please close the issue. Thanks for working through it with me! |
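The epsilon comparison mentioned above could look roughly like the following. This is only a sketch: `approx_eq` is a hypothetical helper, not something in gpuarray-rs, and the right tolerance depends on the device and its tanh implementation.

```rust
/// Hypothetical helper (not part of gpuarray-rs): returns true when every
/// pair of elements differs by at most `eps`.
/// Slices of different lengths never compare equal.
fn approx_eq(a: &[f32], b: &[f32], eps: f32) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| (x - y).abs() <= eps)
}
```

A test would then read the buffer back from the device and assert `approx_eq(&got, &expected, eps)` rather than comparing the two buffers with `==`.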
Yeah, mnist works, but the accuracy is just 90-something percent. Is that expected? If so, you can close it. Also, why on earth doesn't waiting on 2 events work? I'm curious why you wrote it that way and why it doesn't work in my case. |
Yes, right now the validation accuracy is just 92%. It's just a simple linear model. I have it set up so each Tensor has an associated event that represents the last kernel that wrote to the Tensor's buffer. Operations using a given Tensor as an input need to wait on its event to make sure the last kernel that wrote to it is done executing. The problem was that the event was an Option<Rc>, so it was possible to get event lists with NULL events (if the event was None in the Tensor, for instance after creating the Tensor but before any use in kernels). I guessed that your OpenCL implementation didn't like those NULL events, so I created user events and set them as complete when creating Tensors, so events are no longer Options in Tensor. I hope that makes sense! |
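To make the difference concrete, here is a rough model of the two layouts in plain Rust. Every name in it (`Event`, `OldTensor`, `NewTensor`, `old_wait_list`, `new_wait_list`) is a hypothetical stand-in, not the actual gpuarray-rs or rust-opencl types, and it does not call OpenCL at all. It only shows how an `Option` event can leak a NULL slot into a wait list, which the OpenCL spec allows an implementation to reject with CL_INVALID_EVENT_WAIT_LIST.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for an OpenCL event handle (not the rust-opencl type).
#[derive(Debug, PartialEq)]
struct Event {
    id: u32,
    complete: bool,
}

/// Old layout: a tensor has no event until some kernel writes to it.
struct OldTensor {
    last_write: Option<Rc<Event>>,
}

/// New layout: every tensor carries an event from the moment it is created.
struct NewTensor {
    last_write: Rc<Event>,
}

impl NewTensor {
    /// Models "create a user event and mark it complete" at construction.
    fn new(id: u32) -> Self {
        NewTensor {
            last_write: Rc::new(Event { id, complete: true }),
        }
    }
}

/// Old wait list: a fresh tensor contributes a `None`, i.e. a NULL slot.
fn old_wait_list(inputs: &[OldTensor]) -> Vec<Option<Rc<Event>>> {
    inputs.iter().map(|t| t.last_write.clone()).collect()
}

/// New wait list: every slot is a real event, so no NULLs can appear.
fn new_wait_list(inputs: &[NewTensor]) -> Vec<Rc<Event>> {
    inputs.iter().map(|t| Rc::clone(&t.last_write)).collect()
}
```

With the new layout, the "is there an event yet?" case is ruled out by the type itself, so operations with two or more inputs always hand the driver a fully populated list.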
stack backtrace:
0: 0x7ff7336a72cc - std::rt::lang_start::hfe4efe1fc39e4a30
1: 0x7ff7336a661d - std::rt::lang_start::hfe4efe1fc39e4a30
2: 0x7ff73369a8fd - std::panicking::rust_panic_with_hook::h587239a80cad02d2
3: 0x7ff7336a996b - rust_begin_unwind
4: 0x7ff73369b99f - std::panicking::begin_panic_fmt::hb3024643f3039337
5: 0x7ff7336942f4 - check
at D:\Dev\deeplearn-rs:8
6: 0x7ff733658be3 - doubledeeplearn::var_store::VarIndex
7: 0x7ff7336584c9 - as_event_list<core::cell::Ref<opencl::hl::Event>,opencl::hl::Event,closure>
at C:\Users\davel.cargo\git\checkouts\rust-opencl-34f1354d1798ac72\master\src\hl.rs:864
8: 0x7ff7336582d7 - enqueue_async_kernel<(usize, usize),&[core::option::Option<core::cell::Ref<opencl::hl::Event>>]>
at C:\Users\davel.cargo\git\checkouts\rust-opencl-34f1354d1798ac72\master\src\hl.rs:461
9: 0x7ff733669e1c - matmul
at C:\Users\davel.cargo\git\checkouts\gpuarray-rs-23536b5f6e829730\master\src\ops.rs:129
10: 0x7ff7336698ba - forward
at D:\Dev\deeplearn-rs\src\op.rs:72
11: 0x7ff733653ad7 - forward
at D:\Dev\deeplearn-rs\src\graph.rs:171
12: 0x7ff733647575 - train
at D:\Dev\deeplearn-rs\src\train.rs:24
13: 0x7ff73362232a - main
at D:\Dev\deeplearn-rs\examples\mnist.rs:136
14: 0x7ff7336a5fac - std::rt::lang_start::hfe4efe1fc39e4a30
15: 0x7ff7336abd41 - _rust_maybe_catch_panic
16: 0x7ff7336a5cee - std::rt::lang_start::hfe4efe1fc39e4a30
17: 0x7ff7336477d9 - main
18: 0x7ff7336b427f - __scrt_common_main_seh
at f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:255
19: 0x7ffcc4488101 - BaseThreadInitThunk
error: Process didn't exit successfully:
target\debug\examples\mnist.exe
(exit code: 101)