Speedup and reduce memory usage in Normalize #426

Merged: 1 commit (from the normalize_opt branch) into torch:master on Oct 13, 2015

Conversation

@fmassa (Contributor) commented on Oct 11, 2015

It's almost 40x faster on CPU, and doesn't use extra memory. It should be faster on GPU as well, but I haven't benchmarked it.

  • Instead of creating a temporary matrix b1*b2 of size (n, d, d), rearrange the computations so this huge matrix, which was eating all the memory, is never materialized (see the sketch after this list).
  • forward and backward now support varying input dimensionality, which wasn't the case before because of the eye matrix.
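
For concreteness, here is a minimal sketch of the rearrangement for the L2 case; this is hypothetical illustration code, not the PR's actual implementation. The old backward materialized each sample's Jacobian, I/norm - x*x^T/norm^3, as an (n, d, d) buffer and multiplied it by gradOutput; folding gradOutput in first, gradInput = gradOutput/norm - x * (x . gradOutput)/norm^3, needs only (n, d) and (n, 1) temporaries:

require 'torch'

-- Hypothetical sketch of the rearranged backward for the L2 case
-- (illustration only, not the PR's exact code).
-- input, gradOutput: (n, d) tensors; returns gradInput of the same shape.
local function l2NormalizeBackward(input, gradOutput)
    local norm = input:norm(2, 2)                      -- (n, 1) row-wise 2-norms
    local dot  = torch.cmul(input, gradOutput):sum(2)  -- (n, 1) row-wise <x, gradOutput>
    -- gradInput = gradOutput / norm - input * dot / norm^3, element-wise per row
    local gradInput = torch.cdiv(gradOutput, norm:expandAs(gradOutput))
    local scale = torch.cdiv(dot, torch.pow(norm, 3))  -- (n, 1)
    gradInput:add(-1, torch.cmul(input, scale:expandAs(input)))
    return gradInput
end

The outer product x*x^T never exists in memory; the expandAs calls are stride-0 views, so the largest temporaries are only (n, d).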

Should be of interest to @ffmpbgrnn and @bamos.

@fmassa (Contributor, Author) commented on Oct 13, 2015

Just did a quick test on the GPU; backward seems to be much faster than the previous version as well.
For an input of size (32, 4096), this version is 100x faster than the old one. The old version allocated the full (32, 4096, 4096) float buffer, i.e. 32 * 4096 * 4096 * 4 bytes = 2 GB of temporary storage.

@bamos (Contributor) commented on Oct 13, 2015

Hi, I ran @ffmpbgrnn's test from #341 for some simple profiling of this PR on a Tesla K40 GPU and a 3.70 GHz CPU. The speedup is amazing!

require 'nn'
require 'cutorch'
require 'cunn'

local module = nn.Normalize(2):cuda()
module:fastMode(false)
local input = torch.rand(64, 2400):cuda()
local t = torch.Timer()
for i = 1, 100 do
    module:forward(input)
    module:backward(input, input)
    print(i)
end
print(t:time().real / 100)  -- mean seconds per forward+backward iteration
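
For what it's worth, CUDA kernel launches are asynchronous, so timing with torch.Timer alone can understate kernel execution time; a variant that brackets the loop with cutorch.synchronize() (cutorch's barrier for pending GPU work) and drops the per-iteration print gives a stricter measurement:

-- Timing variant with explicit GPU synchronization (sketch; same module
-- and input setup as above).
cutorch.synchronize()            -- make sure no prior GPU work is pending
local t = torch.Timer()
for i = 1, 100 do
    module:forward(input)
    module:backward(input, input)
end
cutorch.synchronize()            -- wait for all queued kernels to finish
print(t:time().real / 100)       -- mean seconds per forward+backward pair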

Current Master

  • Commit: b80bda2
  • md5sum of Normalize.lua: 4b1f217a796f00ff8cfe575da8e4a409
  • CPU mean execution time: 3.82 s
  • K40 mean execution time: 0.108 s

This PR

  • md5sum of Normalize.lua: a9c7d3f642c08de361369be201234fca
  • CPU mean execution time: 0.00634 s
  • K40 mean execution time: 0.000624 s

@soumith (Member) commented on Oct 13, 2015

This is super awesome. If unit tests pass, and it computes exactly the same thing as before, why not!!!

soumith added a commit that referenced this pull request on Oct 13, 2015: "Speedup and reduce memory usage in Normalize"
@soumith merged commit 0c6c2f4 into torch:master on Oct 13, 2015
bamos pushed a commit to cmusatyalab/openface that referenced this pull request on Oct 13, 2015: "torch/nn#426, by @fmassa. Also change from 50 -> 500 runs."
@fmassa deleted the normalize_opt branch on October 13, 2015.