Speedup and reduce memory usage in Normalize #426

Merged: 1 commit (from the normalize_opt branch) into torch:master on Oct 13, 2015

Conversation

@fmassa (Contributor) commented on Oct 11, 2015

It's almost 40x faster on CPU, and doesn't use extra memory. It should be faster on GPU as well, but I haven't benchmarked it.

  • Instead of creating a temporary matrix b1*b2 of size (n, d, d), rearrange the computations so this huge matrix, which was eating all the memory, is never materialized (see the sketch after this list).
  • forward and backward now support varying input dimensionality, which wasn't the case before because of the eye matrix.
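
For concreteness, here is a minimal sketch of the rearrangement for the L2 case; this is hypothetical illustration code, not the PR's actual implementation. The old backward materialized each sample's Jacobian, I/norm - x*x^T/norm^3, as an (n, d, d) buffer and multiplied it by gradOutput; folding gradOutput in first, gradInput = gradOutput/norm - x * (x . gradOutput)/norm^3, needs only (n, d) and (n, 1) temporaries:

require 'torch'

-- Hypothetical sketch of the rearranged backward for the L2 case
-- (illustration only, not the PR's exact code).
-- input, gradOutput: (n, d) tensors; returns gradInput of the same shape.
local function l2NormalizeBackward(input, gradOutput)
    local norm = input:norm(2, 2)                      -- (n, 1) row-wise 2-norms
    local dot  = torch.cmul(input, gradOutput):sum(2)  -- (n, 1) row-wise <x, gradOutput>
    -- gradInput = gradOutput / norm - input * dot / norm^3, element-wise per row
    local gradInput = torch.cdiv(gradOutput, norm:expandAs(gradOutput))
    local scale = torch.cdiv(dot, torch.pow(norm, 3))  -- (n, 1)
    gradInput:add(-1, torch.cmul(input, scale:expandAs(input)))
    return gradInput
end

The outer product x*x^T never exists in memory; the expandAs calls are stride-0 views, so the largest temporaries are only (n, d).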

Should be of interest to @ffmpbgrnn and @bamos.

@fmassa (Contributor, Author) commented on Oct 13, 2015

Just did a quick test on the GPU; backward seems to be much faster than the previous version as well.
For an input of size (32, 4096), this version is 100x faster than the old one. The old version allocated the full (32, 4096, 4096) float buffer, i.e. 32 * 4096 * 4096 * 4 bytes = 2 GB of temporary storage.

@bamos (Contributor) commented on Oct 13, 2015

Hi, I ran @ffmpbgrnn's test from #341 for some simple profiling of this PR on a Tesla K40 GPU and a 3.70 GHz CPU. The speedup is amazing!

require 'nn'
require 'cutorch'
require 'cunn'

local module = nn.Normalize(2):cuda()
module:fastMode(false)
local input = torch.rand(64, 2400):cuda()
local t = torch.Timer()
for i = 1, 100 do
    module:forward(input)
    module:backward(input, input)
    print(i)
end
print(t:time().real / 100)  -- mean seconds per forward+backward iteration
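
For what it's worth, CUDA kernel launches are asynchronous, so timing with torch.Timer alone can understate kernel execution time; a variant that brackets the loop with cutorch.synchronize() (cutorch's barrier for pending GPU work) and drops the per-iteration print gives a stricter measurement:

-- Timing variant with explicit GPU synchronization (sketch; same module
-- and input setup as above).
cutorch.synchronize()            -- make sure no prior GPU work is pending
local t = torch.Timer()
for i = 1, 100 do
    module:forward(input)
    module:backward(input, input)
end
cutorch.synchronize()            -- wait for all queued kernels to finish
print(t:time().real / 100)       -- mean seconds per forward+backward pair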

Current Master

  • Commit: b80bda2
  • md5sum of Normalize.lua: 4b1f217a796f00ff8cfe575da8e4a409
  • CPU mean execution time: 3.82 s
  • K40 mean execution time: 0.108 s

This PR

  • md5sum of Normalize.lua: a9c7d3f642c08de361369be201234fca
  • CPU mean execution time: 0.00634 s
  • K40 mean execution time: 0.000624 s

@soumith (Member) commented on Oct 13, 2015

This is super awesome. If unit tests pass, and it computes exactly the same thing as before, why not!!!

soumith added a commit that referenced this pull request on Oct 13, 2015: "Speedup and reduce memory usage in Normalize"
@soumith merged commit 0c6c2f4 into torch:master on Oct 13, 2015
bamos pushed a commit to cmusatyalab/openface that referenced this pull request on Oct 13, 2015: "torch/nn#426, by @fmassa. Also change from 50 -> 500 runs."
@fmassa deleted the normalize_opt branch on October 13, 2015.