Max pooling cause error on empty batch #21338
Comments
Interesting findings! The error disappears when you use the NCHW data format. However, the NHWC code path uses a custom CUDA kernel which does not explicitly handle empty input tensors.
Adding an if-else somewhere in the op to check for empty inputs can solve this issue. But before that, I think it's worth fixing the unit test framework first: there is actually a test that should have triggered this error: tensorflow/tensorflow/python/kernel_tests/pooling_ops_test.py, lines 570 to 578 at 1a13c4f.
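For reference, here is a minimal sketch of the kind of empty-batch pooling test being referred to. This is not the quoted test body from pooling_ops_test.py; the class/method names, shapes, and pooling parameters are assumptions.

```python
# Sketch of an empty-batch max-pooling test (hypothetical; not the actual
# test at the referenced lines). It runs the NHWC pooling op on a
# zero-sized batch and checks that an empty result comes back instead of
# a failed kernel launch.
import numpy as np
import tensorflow as tf


class EmptyBatchPoolingTest(tf.test.TestCase):

  def testMaxPoolEmptyBatch(self):
    # Zero-sized batch; the NHWC GPU kernel should return an empty result
    # instead of leaving a failed kernel launch behind.
    with self.test_session(use_gpu=True):
      x = tf.constant(np.zeros((0, 8, 8, 3), dtype=np.float32))
      y = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                         padding="VALID", data_format="NHWC")
      self.assertAllEqual(y.eval().shape, (0, 4, 4, 3))


if __name__ == "__main__":
  tf.test.main()
```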
Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
@ppwwyyxx Thanks for your nice notes! The error disappears after I change to NCHW. I went through some test cases and found that most of them just do a session.run() and check the result with functions like assertAllClose(). cudaGetLastError() or cudaPeekAtLastError() are rarely used directly in the TF repo (does that mean a kernel error will somehow "broadcast" into the execution engine?). In the WhereOp case, the error seems to finally get detected in the NVIDIA cub library: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/where_op_gpu.cu.h#L21. Any idea about a mechanism to do this kind of checking in Python?
I don't think the error can be checked from Python. I hope the TF team will find a way to fix the tests soon. It would be even better if there were a way to identify ops that do not support empty inputs; similar errors probably exist in many places. I referenced a horovod issue above which may be related (I was also using empty inputs when I hit that issue, but not an NHWC maxpool). Others have seen similar issues, for example: tensorpack/tensorpack#760, CharlesShang/FastMaskRCNN#159, #16035, tensorflow/serving#627. None of these issues have been clearly resolved, and they are all object detection use cases where empty inputs appear quite often. So it sounds like they are all related to the bug you found.
Thanks @ppwwyyxx! I marked this as "Contributions Welcome" since you're looking at it.
@angersson No, I'm not. I can fix the pooling ops, but I expect the TF team to find out why the tests did not detect such a failure.
At least for synchronous execution on GPU devices, whether the computation failed seems to be checked via the OpContext's status, which requires op developers to follow certain error-handling conventions, or else errors may "leak". I tried adding an additional check here and recompiling, and I do get a nonzero CUDA error code (9, i.e. invalid configuration). I wonder why this kind of post-op check is not done currently.
To summarize, what happened is: the NHWC max-pooling CUDA kernel is launched on an empty input, the launch fails, the op itself never checks the resulting CUDA error, and the error then surfaces later from an unrelated op (here the WhereOp, inside cub::DeviceReduce::Sum).
I think it's a quite serious issue: unchecked kernel-launch errors silently propagate and get attributed to the wrong op, which makes them extremely hard to debug, and the existing unit tests did not catch this case.
@asimshankar Can you review this with the team? I was not sure which GitHub account was Alek's. This was marked as contributions welcome, and that should be reviewed.
Thanks Yuxin, I've approved your fix PR. I will investigate what we can do about checking CUDA errors where they happen. The correct way to prevent CUDA errors from tunneling to other places is to check cudaGetLastError after each kernel launch. Most kernel launches don't currently do that. We are considering adding infrastructure code that all kernel launches would go through, which would allow for a central place to fix this (among other things related to launching kernels directly through the CUDA runtime). Until this is in place, it might be worthwhile to just add those checks where they are missing.
Thanks @chsigg!
Hi @wrongtest! Closing this issue as it has been resolved in version 2.8.
System information
Describe the problem
When batch_size is 0, the max pooling operation seems to produce an unhandled cudaError_t status. This can cause subsequent operations to fail with odd error messages, which is extremely difficult to debug.
(This corner case bothers us: we first extract some bounding boxes and then run ordinary convolution operations on the areas they specify. The error occurs when no bounding boxes are detected, so batch_size becomes 0. However, the Python exception is thrown seemingly at random, at a subsequent operation or at a later session run step.)
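The original reproduction script is not preserved here; the following is a minimal sketch of the kind of graph described, where the shapes, pooling parameters, and the trailing tf.where are assumptions.

```python
# Hypothetical reconstruction of the failure mode (not the original script
# from this issue): an NHWC max pool over an empty batch, followed by a
# WhereOp that ends up surfacing the leaked CUDA error.
import numpy as np
import tensorflow as tf  # TF 1.x style graph/session API

x = tf.placeholder(tf.float32, shape=[None, 7, 7, 256])  # batch may be 0

# The NHWC path runs a custom CUDA kernel that does not guard against
# empty inputs.
pooled = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                        padding="SAME", data_format="NHWC")

# Some later op; tf.where here matches the WhereOp error reported below.
indices = tf.where(tf.reduce_sum(pooled, axis=[1, 2, 3]) > 0)

with tf.Session() as sess:
    # An empty batch triggers the problem; a non-empty batch runs fine.
    sess.run(indices, feed_dict={x: np.zeros((0, 7, 7, 256), np.float32)})
```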
The code above reports the following error:
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true / nonzero indices. temp_storage_bytes: 1, status: invalid configuration argument
"invalid configuration argument" appears to be the message returned by cudaGetLastError, and it indicates a failed kernel launch due to a zero or too-large number of blocks or threads.
Source code / logs
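The original script and logs are not preserved in this extract. As a schematic of why an empty input yields an "invalid configuration argument" error, here is a sketch of the typical grid-size arithmetic (illustrative only, not TF's actual launch-configuration code):

```python
# Illustrative only: how a zero-element input yields an invalid CUDA
# launch configuration (not TF's actual launch-config code).
threads_per_block = 1024
num_elements = 0  # empty batch => zero output elements to process

num_blocks = (num_elements + threads_per_block - 1) // threads_per_block
print(num_blocks)  # 0 -- launching a kernel with a zero-sized grid fails
                   # with "invalid configuration argument" (CUDA error 9)
```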