
Writing to tensor is 9 times slower with torch.Tensor compared to torch.Storage #474

Closed
ivankreso opened this issue Nov 26, 2015 · 5 comments

@ivankreso
Is this an expected performance difference?

require 'torch'
require 'xlua'  -- needed for xlua.print below

local dim1 = 385
local dim2 = 1252
local dim3 = 200

local data = torch.FloatTensor(dim1, dim2, dim3)

local timer = torch.Timer()
local s = data:storage()
for i = 1,s:size() do
  s[i] = 0
end
xlua.print('time: ' .. timer:time().real .. ' secs')
timer:reset()
for i = 1, dim1 do
  for j = 1, dim2 do
    for k = 1, dim3 do
      data[i][j][k] = 0
    end
  end
end
xlua.print('time: ' .. timer:time().real .. ' secs')

Output:

time: 14.446336984634 secs
time: 126.32362508774 secs
@albanD
Contributor

albanD commented Nov 26, 2015

I think this comes from the Lua-level loop overhead.

If you replace the loop for the tensor by

data_view = data:view(-1)
for i = 1, data_view:nElement() do
  data_view[i] = 0
end

you should get similar timings.

@fmassa
Contributor

fmassa commented Nov 26, 2015

This is expected behaviour. There are three things affecting performance:

  • Every element access performs a bounds check on the indices.
  • Each use of the __index operator ([]) creates a new tensor that points to the specified memory position, so calling it three times as [i][j][k] is much slower than the single call [{i,j,k}]. I replaced it on my side and got a speedup of a factor of 3, but that is still 3x slower than iterating through the storage.
  • When you index a tensor, it computes the absolute offset from the tensor's strides, so you are indeed doing more arithmetic: accessing one element of your tensor requires computing stride1*i + stride2*j + stride3*k, whereas accessing the storage only requires the index i.
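The single-call indexing mentioned in the second point can be sketched like this (a minimal illustration reusing the dim1/dim2/dim3 variables from the original snippet; exact timings will vary):

```lua
-- One __index call with a table of indices instead of three chained
-- calls: [{i,j,k}] avoids creating two intermediate sub-tensors per element.
for i = 1, dim1 do
  for j = 1, dim2 do
    for k = 1, dim3 do
      data[{i, j, k}] = 0
    end
  end
end
```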

Now, if you want super fast memory access for CPU tensors, you can use the :data() function and iterate over the raw C data. On my machine, this corresponds to a speedup of 130x compared to accessing the storage elements sequentially.
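A minimal sketch of the :data() approach, assuming `data` is the contiguous FloatTensor from the original snippet (under LuaJIT, :data() returns a raw float* accessed via the FFI):

```lua
-- Iterate over the raw C buffer backing the tensor.
-- Note: C indexing starts at 0, not 1.
local raw = data:data()
local n = data:nElement()
for i = 0, n - 1 do
  raw[i] = 0
end
```

This skips both the bounds checks and the stride arithmetic of tensor indexing, which is where the large speedup comes from; it is only safe when the tensor is contiguous.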

@ivankreso
Author

@fmassa Thank you very much for a great answer. I didn't know about :data(), that will solve my problems. :)

@albanD The LuaJIT loop itself is super fast; if I replace the data tensor with just a float variable I get:

time: 0.03554105758667 secs

@fmassa
Contributor

fmassa commented Nov 26, 2015

@ivankreso just one remark, if you are using the C array from :data(), don't forget that indexing starts at 0.

@ivankreso
Author

@fmassa I just saw it in the docs, thanks.
