
Writing to tensor is 9 times slower with torch.Tensor compared to torch.Storage #474

Closed
ivankreso opened this issue Nov 26, 2015 · 5 comments

@ivankreso
Is this an expected performance difference?

require 'torch'
require 'xlua'  -- needed for xlua.print below

local dim1 = 385
local dim2 = 1252
local dim3 = 200

local data = torch.FloatTensor(dim1, dim2, dim3)

local timer = torch.Timer()
local s = data:storage()
for i = 1,s:size() do
  s[i] = 0
end
xlua.print('time: ' .. timer:time().real .. ' secs')
timer:reset()
for i = 1, dim1 do
  for j = 1, dim2 do
    for k = 1, dim3 do
      data[i][j][k] = 0
    end
  end
end
xlua.print('time: ' .. timer:time().real .. ' secs')

Output:

time: 14.446336984634 secs
time: 126.32362508774 secs
@albanD
Contributor

albanD commented Nov 26, 2015

I think this comes from the Lua-level loop overhead.

If you replace the loop for the tensor by

data_view = data:view(-1)
for i = 1, data_view:nElement() do
  data_view[i] = 0
end

you should get similar timings.

@fmassa
Contributor

fmassa commented Nov 26, 2015

This is expected behaviour. There are three things affecting performance:

  • Every element access performs a bounds check on the indices.
  • Each use of the __index operator ([]) creates a new tensor that points to the specified memory position, so calling it three times as [i][j][k] is much slower than the single call [{i,j,k}]. I replaced it on my side and got a speedup of a factor of 3, but that is still 3x slower than iterating through the storage.
  • When you index a tensor, it computes the absolute offset from the tensor's strides, so you are indeed doing more arithmetic: accessing one element of your tensor requires computing stride1*i + stride2*j + stride3*k, whereas accessing the storage only requires the index i.
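The single-call indexing mentioned in the second point can be sketched like this (a minimal illustration reusing the dim1/dim2/dim3 variables from the original snippet; exact timings will vary):

```lua
-- One __index call with a table of indices instead of three chained
-- calls: [{i,j,k}] avoids creating two intermediate sub-tensors per element.
for i = 1, dim1 do
  for j = 1, dim2 do
    for k = 1, dim3 do
      data[{i, j, k}] = 0
    end
  end
end
```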

Now, if you want super fast memory access for CPU tensors, you can use the :data() function and iterate over the raw C data. On my machine, this corresponds to a speedup of 130x compared to accessing the storage elements sequentially.
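A minimal sketch of the :data() approach, assuming `data` is the contiguous FloatTensor from the original snippet (under LuaJIT, :data() returns a raw float* accessed via the FFI):

```lua
-- Iterate over the raw C buffer backing the tensor.
-- Note: C indexing starts at 0, not 1.
local raw = data:data()
local n = data:nElement()
for i = 0, n - 1 do
  raw[i] = 0
end
```

This skips both the bounds checks and the stride arithmetic of tensor indexing, which is where the large speedup comes from; it is only safe when the tensor is contiguous.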

@ivankreso
Author

@fmassa Thank you very much for a great answer. I didn't know about :data(), that will solve my problems. :)

@albanD The LuaJIT loop itself is super fast; if I replace the data tensor with just a float variable I get:

time: 0.03554105758667 secs

@fmassa
Contributor

fmassa commented Nov 26, 2015

@ivankreso just one remark, if you are using the C array from :data(), don't forget that indexing starts at 0.

@ivankreso
Author

@fmassa I just saw it in the docs, thanks.
