Nondeterministic behaviour in SpatialMaxPooling #84
Comments
This is expected. It's extremely hard to write a max pooling kernel that is deterministic when kW > dW or kH > dH. cunn, cudnn, and caffe all have non-deterministic max pooling for that case. When kW <= dW, the kernel can be made deterministic with some effort (I think it is deterministic, but I haven't verified).
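To illustrate why the overlapping case is hard, here is a minimal sketch of a scatter-style backward kernel, with illustrative names and signature rather than the actual cunn code. Each thread owns one output gradient and scatters it to the argmax input location recorded in the forward pass; with overlapping windows, several outputs can share the same argmax, so atomicAdd is required and the accumulation order varies between runs.

```cuda
// Hypothetical scatter-style backward pass (illustrative, not the cunn
// source). gradInput is assumed zero-initialized. With kW > dW, multiple
// output elements can record the same argmax index, so several threads
// may add to the same gradInput slot; float addition is non-associative,
// so the result depends on the (nondeterministic) ordering of the atomics.
__global__ void maxPoolBackwardScatter(const float* gradOutput,
                                       const int* argmaxIndices,
                                       float* gradInput,
                                       int nOutputElements) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < nOutputElements) {
    atomicAdd(&gradInput[argmaxIndices[i]], gradOutput[i]);
  }
}
```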
Looking at the code in Caffe (https://github.com/BVLC/caffe/blob/master/src/caffe/layers/pooling_layer.cu), I believe their backward passes are deterministic: each thread handles a single input element and gathers the gradient contributions from every output window that could have pooled it, so no atomic writes are needed.
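For contrast with the scatter sketch above, here is a gather-style backward in the spirit of Caffe's MaxPoolBackward kernel, simplified to one dimension with illustrative names. One thread writes each gradInput slot in a fixed loop order, which makes the result bitwise reproducible:

```cuda
// Gather-style backward pass (1-D simplification; names are illustrative).
// Each thread owns one *input* element, loops over the output windows that
// cover it, and sums gradOutput wherever the stored argmax matches. No
// atomics: exactly one thread writes gradInput[i], in a fixed order.
__global__ void maxPoolBackwardGather(const float* gradOutput,
                                      const int* argmaxIndices,
                                      float* gradInput,
                                      int inputWidth, int outputWidth,
                                      int kW, int dW) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= inputWidth) return;
  // Output window o covers inputs [o*dW, o*dW + kW - 1]; invert that range
  // to find the windows covering input position i.
  int oStart = (i < kW) ? 0 : (i - kW) / dW + 1;
  int oEnd = min(i / dW + 1, outputWidth);
  float sum = 0.0f;
  for (int o = oStart; o < oEnd; ++o) {
    if (argmaxIndices[o] == i) {
      sum += gradOutput[o];
    }
  }
  gradInput[i] = sum;
}
```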
So after empirical testing, I can say that all other implementations (caffe, cudnn V2, convnet2) behave deterministically. Therefore, I would like to reopen the issue. If this is not going to be fixed, it should at least be documented, similarly to the "Reproducibility" chapter in the cudnn user guide. It would also be interesting to document how the convolution modules behave.
@wickedfoo investigated this, and this is not what he found. He found that cudnn's max pooling is clearly non-deterministic. @wickedfoo comments?
The behavior of cudnn's maxpool seems to have changed: while experimentally non-deterministic in V2R2, it appears to be OK in V2.
Oh, I see. Good to know. The last investigation was indeed probably done in V1 or V2Rxx.
I wouldn't put my hand in the fire for that, though :).
I think this has been addressed in dc77610
I have encountered nondeterministic behavior in the backpropagation step of SpatialMaxPooling when `kW ~= dW` or `kH ~= dH` (i.e. when the `atomicmaxgradinput` kernel is run). I suspect the issue is caused by `atomicAdd()` and the general non-associativity of floats. Thus, I'm fairly sure that this is an inherent feature of the parallelism, but I wanted to make you aware of it anyway (it's scary)... The following code reproduces the problem on my machine (GTX Titan, driver 346.46) when the GPU is under some load. If I let the computation run on the CPU with FloatTensors, it's deterministic.
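(The reporter's snippet is not preserved above; the following is a hypothetical self-contained stand-in, not the original repro, showing the same effect at the CUDA level. Many threads `atomicAdd` into one slot, mimicking overlapping windows whose argmax coincides, and two identical runs can disagree in the low-order bits.)

```cuda
// Hypothetical stand-in for the missing repro (not the reporter's code).
// Because float addition is not associative and the atomic accumulation
// order varies between launches, two identical runs can differ slightly.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scatterAdd(const float* grad, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) atomicAdd(out, grad[i]);
}

int main() {
  const int n = 1 << 20;
  float *grad, *out;
  cudaMallocManaged(&grad, n * sizeof(float));
  cudaMallocManaged(&out, 2 * sizeof(float));
  for (int i = 0; i < n; ++i) grad[i] = 1.0f / (float)(i + 1);

  // Run the same accumulation twice and compare bitwise.
  for (int run = 0; run < 2; ++run) {
    out[run] = 0.0f;
    scatterAdd<<<(n + 255) / 256, 256>>>(grad, &out[run], n);
    cudaDeviceSynchronize();
  }
  printf("run0=%.9g run1=%.9g bitwise-equal=%d\n",
         out[0], out[1], out[0] == out[1]);
  cudaFree(grad);
  cudaFree(out);
  return 0;
}
```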
On a slightly related note, is there any reason for calling `atomicmaxgradinput` instead of `maxgradinput` on SpatialMaxPooling.cu:320?