Nondeterministic behaviour in SpatialMaxPooling #84

Closed

mys007 opened this issue May 1, 2015 · 8 comments
@mys007
Contributor

mys007 commented May 1, 2015

I have encountered nondeterministic behavior in the backpropagation step of SpatialMaxPooling when kW~=dW or kH~=dH (i.e. when the atomicmaxgradinput kernel is run). I suspect the issue is caused by atomicAdd() and the general non-associativity of floating-point addition. I'm fairly sure this is an inherent consequence of the parallelism, but I wanted to make you aware of it anyway (it's scary)...

The following code reproduces the problem on my machine (GTX Titan, driver 346.46) when the GPU is under some load. If I run the computation on the CPU with FloatTensors, it is deterministic.

require 'nn'
require 'cunn'

local inp = torch.CudaTensor(1,18,18):zero()
inp[1][3][3] = 1 -- will force adding up 3 numbers in the backward pass

local fw = torch.CudaTensor(1,8,8):fill(0)
fw[1][2][1] = -0.00055786536540836
fw[1][2][2] = 0.00075417151674628
fw[1][1][2] = -0.00029314149287529

local model = nn.SpatialMaxPooling(3, 3, 2, 2):cuda()
model:forward(inp)
local bw = model:backward(inp, fw):clone()

for i=1,100 do
    local diff = bw - model:backward(inp, fw)
    print(diff:sum()) -- sometimes the difference is not zero!
end

Note further that

local a = (-0.00055786536540836 + 0.00075417151674628) -0.00029314149287529
local b = 0.00075417151674628 + (-0.00029314149287529 -0.00055786536540836)
print(a-b)  -- not zero

On a slightly related note, is there any reason for calling atomicmaxgradinput instead of maxgradinput at SpatialMaxPooling.cu:320?

@soumith
Member

soumith commented May 1, 2015

This is expected. It's extremely hard to write a max pooling kernel that is deterministic when kW > dW or kH > dH. cunn, cudnn, caffe: all of these have non-deterministic max pooling in that case.

When kW <= dW, the kernel can be made deterministic with some effort (I think it is deterministic, but I haven't verified).
The atomic adds were introduced at some point precisely to handle the kW != dW case correctly, and they are needed; otherwise two threads might race.
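
To make the race concrete, here is a minimal sketch of a scatter-style backward pass. This is a hypothetical simplified kernel for illustration only, not the actual code in SpatialMaxPooling.cu; all names and the flat indexing are made up:

// Hypothetical simplified scatter-style backward kernel (illustrative
// names and layout only, not the actual cunn SpatialMaxPooling.cu code).
// One thread per output gradient element: add it to the input element
// that won the max in the forward pass.
__global__ void maxpool_backward_scatter(const float* gradOutput,
                                         const int* argmaxIndices,
                                         float* gradInput,
                                         int nOutput)
{
  int o = blockIdx.x * blockDim.x + threadIdx.x;
  if (o >= nOutput) return;

  // With overlapping windows (kW > dW), several outputs can share the same
  // argmax input element, so a plain += would be a data race. atomicAdd is
  // correct, but the order in which the adds land is scheduling-dependent,
  // and float addition is not associative, so the sum can differ per run.
  atomicAdd(&gradInput[argmaxIndices[o]], gradOutput[o]);
}

Because the scheduler decides whose atomicAdd lands first, the same three numbers can be summed in a different order on every run, which is exactly the non-associativity effect shown in the snippet above.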

@soumith soumith closed this as completed May 1, 2015
@mys007
Contributor Author

mys007 commented May 12, 2015

Looking at the code in Caffe (https://github.com/BVLC/caffe/blob/master/src/caffe/layers/pooling_layer.cu), I believe their backward pass is deterministic: for each bottom element, all the top elements that could possibly contribute to it are scanned by a single thread, so the summing order is fixed. As another benefit, they don't need atomics.
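
In the same illustrative style as above, here is a hedged sketch of that gather approach (not Caffe's actual kernel; 1-D indexing over a single feature map, no padding, made-up names):

// Hypothetical gather-style backward kernel in the spirit of Caffe's
// PoolingLayer (simplified; names and layout are illustrative).
__global__ void maxpool_backward_gather(const float* gradOutput,
                                        const int* argmaxIndices,
                                        float* gradInput,
                                        int inH, int inW,
                                        int outH, int outW,
                                        int kH, int kW, int dH, int dW)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x; // one thread per input elem
  if (i >= inH * inW) return;

  int h = i / inW;
  int w = i % inW;

  // Range of output windows whose receptive field covers input (h, w).
  int phStart = (h < kH) ? 0 : (h - kH) / dH + 1;
  int phEnd   = min(h / dH + 1, outH);
  int pwStart = (w < kW) ? 0 : (w - kW) / dW + 1;
  int pwEnd   = min(w / dW + 1, outW);

  // A single thread accumulates in a fixed loop order:
  // deterministic, and no atomics needed.
  float grad = 0.f;
  for (int ph = phStart; ph < phEnd; ++ph) {
    for (int pw = pwStart; pw < pwEnd; ++pw) {
      if (argmaxIndices[ph * outW + pw] == i)
        grad += gradOutput[ph * outW + pw];
    }
  }
  gradInput[i] = grad;
}

The trade-off is that every input element pays for scanning its covering windows even if it never won a max, but each per-element sum is computed by one thread in a fixed order, so the result is bit-for-bit reproducible.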

@mys007
Contributor Author

mys007 commented May 20, 2015

So after empirical testing, I can say that all the other implementations (caffe, cudnn V2, convnet2) behave deterministically. Therefore, I would like to reopen the issue. If this is not going to be fixed, it should at least be documented, similarly to the "Reproducibility" chapter in the cudnn user guide. It would also be interesting to document how the convolution modules behave.

@soumith soumith reopened this May 20, 2015
@soumith
Member

soumith commented Jun 11, 2015

@wickedfoo investigated this, and that is not what he found: he found that cudnn's max pooling is clearly non-deterministic. @wickedfoo, comments?

@mys007
Contributor Author

mys007 commented Jun 11, 2015

The behavior of cudnn's maxpool seems to have changed: while experimentally non-deterministic in V2R2, it appears to be OK in V2.

@soumith
Member

soumith commented Jun 11, 2015

Oh, I see. Good to know. The last investigation was indeed probably done in V1 or V2Rxx.

@mys007
Contributor Author

mys007 commented Jun 11, 2015

I wouldn't swear to it, though :).
UPDATE: Actually, I now believe that cudnn.SpatialMaxPooling is not deterministic in V2 either. Sorry for the confusion. But Caffe's max pooling definitely is.

@fmassa
Contributor

fmassa commented Sep 22, 2015

I think this has been addressed in dc77610.
