Nondeterministic behaviour in SpatialMaxPooling #84
Comments
This is expected. It's extremely hard to write a max pooling kernel that is deterministic when kW > dW or kH > dH. cunn, cudnn, and caffe all have non-deterministic max pooling for that case. When kW <= dW, the kernel can be made deterministic with some effort (I think it is deterministic, but I haven't verified).
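To illustrate why the overlapping case is hard, here is a minimal sketch of a scatter-style backward kernel, with illustrative names and signature rather than the actual cunn code. Each thread owns one output gradient and scatters it to the argmax input location recorded in the forward pass; with overlapping windows, several outputs can share the same argmax, so atomicAdd is required and the accumulation order varies between runs.

```cuda
// Hypothetical scatter-style backward pass (illustrative, not the cunn
// source). gradInput is assumed zero-initialized. With kW > dW, multiple
// output elements can record the same argmax index, so several threads
// may add to the same gradInput slot; float addition is non-associative,
// so the result depends on the (nondeterministic) ordering of the atomics.
__global__ void maxPoolBackwardScatter(const float* gradOutput,
                                       const int* argmaxIndices,
                                       float* gradInput,
                                       int nOutputElements) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < nOutputElements) {
    atomicAdd(&gradInput[argmaxIndices[i]], gradOutput[i]);
  }
}
```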
Looking at the code in Caffe (https://github.com/BVLC/caffe/blob/master/src/caffe/layers/pooling_layer.cu), I believe their backward passes are deterministic: each thread handles a single input element and gathers the gradient contributions from every output window that could have pooled it, so no atomic writes are needed.
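For contrast with the scatter sketch above, here is a gather-style backward in the spirit of Caffe's MaxPoolBackward kernel, simplified to one dimension with illustrative names. One thread writes each gradInput slot in a fixed loop order, which makes the result bitwise reproducible:

```cuda
// Gather-style backward pass (1-D simplification; names are illustrative).
// Each thread owns one *input* element, loops over the output windows that
// cover it, and sums gradOutput wherever the stored argmax matches. No
// atomics: exactly one thread writes gradInput[i], in a fixed order.
__global__ void maxPoolBackwardGather(const float* gradOutput,
                                      const int* argmaxIndices,
                                      float* gradInput,
                                      int inputWidth, int outputWidth,
                                      int kW, int dW) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= inputWidth) return;
  // Output window o covers inputs [o*dW, o*dW + kW - 1]; invert that range
  // to find the windows covering input position i.
  int oStart = (i < kW) ? 0 : (i - kW) / dW + 1;
  int oEnd = min(i / dW + 1, outputWidth);
  float sum = 0.0f;
  for (int o = oStart; o < oEnd; ++o) {
    if (argmaxIndices[o] == i) {
      sum += gradOutput[o];
    }
  }
  gradInput[i] = sum;
}
```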
So after empirical testing, I can say that all other implementations (caffe, cudnn V2, convnet2) behave deterministically. Therefore, I would like to reopen the issue. If this is not going to be fixed, it should at least be documented, similarly to the "Reproducibility" chapter in the cudnn user guide. It would also be interesting to document how the convolution modules behave.
@wickedfoo investigated this, and this is not what he found. He found that cudnn's max pooling is clearly non-deterministic. @wickedfoo comments?
The behavior of cudnn's maxpool seems to have changed: while experimentally non-deterministic in V2R2, it appears to be OK in V2.
Oh, I see. Good to know. The last investigation was indeed probably done in V1 or V2Rxx.
I wouldn't put my hand in the fire for that, though :).
I think this has been addressed in dc77610
I have encountered nondeterministic behavior in the backpropagation step of SpatialMaxPooling when `kW ~= dW` or `kH ~= dH` (i.e. when the `atomicmaxgradinput` kernel is run). I suspect the issue is caused by `atomicAdd()` and the general non-associativity of floats. Thus, I'm fairly sure that this is an inherent feature of the parallelism, but I wanted to make you aware of it anyway (it's scary)... The following code reproduces the problem on my machine (GTX Titan, driver 346.46) when the GPU is under some load. If I let the computation run on the CPU with FloatTensors, it's deterministic.
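(The reporter's snippet is not preserved above; the following is a hypothetical self-contained stand-in, not the original repro, showing the same effect at the CUDA level. Many threads `atomicAdd` into one slot, mimicking overlapping windows whose argmax coincides, and two identical runs can disagree in the low-order bits.)

```cuda
// Hypothetical stand-in for the missing repro (not the reporter's code).
// Because float addition is not associative and the atomic accumulation
// order varies between launches, two identical runs can differ slightly.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scatterAdd(const float* grad, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) atomicAdd(out, grad[i]);
}

int main() {
  const int n = 1 << 20;
  float *grad, *out;
  cudaMallocManaged(&grad, n * sizeof(float));
  cudaMallocManaged(&out, 2 * sizeof(float));
  for (int i = 0; i < n; ++i) grad[i] = 1.0f / (float)(i + 1);

  // Run the same accumulation twice and compare bitwise.
  for (int run = 0; run < 2; ++run) {
    out[run] = 0.0f;
    scatterAdd<<<(n + 255) / 256, 256>>>(grad, &out[run], n);
    cudaDeviceSynchronize();
  }
  printf("run0=%.9g run1=%.9g bitwise-equal=%d\n",
         out[0], out[1], out[0] == out[1]);
  cudaFree(grad);
  cudaFree(out);
  return 0;
}
```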
On a slightly related note, is there any reason for calling `atomicmaxgradinput` instead of `maxgradinput` on SpatialMaxPooling.cu:320?