Adds nn.Normalize #341

Merged
merged 1 commit into torch:master from fmassa:norm on Sep 14, 2015


@fmassa
Contributor
fmassa commented Aug 1, 2015

Generalizes #260 from @karpathy to accept arbitrary L_p norms.
A few remarks:

  • Maybe LpNormalize or Normalize is a better name?
  • Only accepts 1D/2D inputs. @nicholas-leonard proposed adding a dim parameter to make it equivalent to torch.norm. Is it worth it given that we have SpatialBatchNormalization?
  • updateGradInput is quite memory-consuming for large d.
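
To make the Lp generalization concrete, here is a rough sketch of the forward computation for a 2D (batchSize x d) input, written with plain tensor ops rather than the actual module code (lpNormalizeForward is just an illustrative name):

require 'torch'

-- illustrative only: divide every row of a (batchSize x d) input by its own p-norm
local function lpNormalizeForward(input, p)
    local norm = input:norm(p, 2)                   -- (batchSize x 1) p-norm of each row
    return torch.cdiv(input, norm:expandAs(input))  -- broadcast the division over each row
end

local x = torch.randn(8, 5)
local y = lpNormalizeForward(x, 2)                  -- every row of y has unit L2 norm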

cc: @bamos

@szagoruyko
Member

Definitely the name has to be Normalize, not Norm.
Is it hard to extend to other dimensions? SpatialBatchNormalization is different.

@fmassa
Contributor
fmassa commented Aug 2, 2015

@szagoruyko Maybe it's not too difficult to extend it to work for other dimensions, by viewing the input as 2D with the last dimension being the normalized dimension, but one also has to take care of batched/non-batched inputs. Maybe we could add a setNumInputDims function, as in nn.View?
I'll spend some more time trying to make it more generic.
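
For illustration, one way that reshaping could look, assuming a hypothetical dim argument and leaving the batched/non-batched question aside (normalizeAlongDim is not code from this PR):

require 'torch'

-- sketch only: fold everything except `dim` into a leading "batch" dimension,
-- normalize the rows of the resulting 2D view, then restore the original layout
local function normalizeAlongDim(input, p, dim)
    local last = input:dim()
    local perm = input:transpose(dim, last):contiguous()  -- move `dim` to the last position
    local flat = perm:view(-1, perm:size(last))           -- 2D view: (everythingElse x sizeOfDim)
    local norm = flat:norm(p, 2)
    flat:cdiv(norm:expandAs(flat))                        -- in-place normalization of each row
    return perm:transpose(dim, last)                      -- undo the transpose; shares storage with flat
end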

Is everybody ok with renaming it to Normalize?

@karpathy
Contributor
karpathy commented Aug 2, 2015

Why is it obvious that the name should be Normalize and not Norm? As far as I can tell, almost all operations from torch are ported to nn layers without name changes. E.g. analogous to the max operation there is nn.Max, so why is it obvious that the norm operation should be nn.Normalize? Shouldn't we have nn.Maximize then? My first instinct would be to stick with the current naming for consistency.

@soumith
Member
soumith commented Aug 2, 2015

torch.norm returns the p-norm. Isn't this module also normalizing the input according to the returned norm?

@karpathy
Contributor
karpathy commented Aug 2, 2015

Good point. I'd almost suggest that there should be both an nn.Norm that does exactly what norm does, and an nn.Normalize that also does a div right afterwards. But perhaps that gets a bit too hairy then :)
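
For illustration, the two behaviours in terms of plain tensor ops (there is no nn.Norm module at this point; this is only to show the distinction):

require 'torch'

local x = torch.randn(4, 10)
local p = 2

-- what a module that "does exactly what norm does" would output: one p-norm per row
local norms = x:norm(p, 2)                            -- (4 x 1)

-- what nn.Normalize in this PR outputs: each row divided by its own p-norm
local normalized = torch.cdiv(x, norms:expandAs(x))   -- (4 x 10)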

@soumith
Member
soumith commented Aug 2, 2015

That's actually a good idea :)

@soumith soumith changed the title from Adds Norm to Adds Normalize Sep 4, 2015
@soumith soumith changed the title from Adds Normalize to Adds nn.Normalize Sep 4, 2015
@soumith
Member
soumith commented Sep 4, 2015

@fmassa looks like it's good to go, is it?

@fmassa
Contributor
fmassa commented Sep 4, 2015

It's good. I just need to squash the commits.
I didn't have the time to add a dimension parameter though; it would complicate the logic a bit because of the setNumInputDims function. But it could be added later if needed.

@soumith
Member
soumith commented Sep 11, 2015

squash, good to go.

@fmassa
Contributor
fmassa commented Sep 12, 2015

@soumith I just squashed the commits.

@soumith soumith merged commit 5fa1a8b into torch:master Sep 14, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
@soumith
Member
soumith commented Sep 14, 2015

Thanks a lot, Francisco, for the excellent PR.

@fmassa fmassa deleted the fmassa:norm branch Sep 14, 2015
@ffmpbgrnn

Lots of memory is needed in the backprop of this module. One reason might be the creation of the eyeExpand matrix and the multiplication that follows. In my case, with a batch size of 64 and an input dimension of 4800, two Normalize layers run out of memory on a 4GB GPU. Any idea how to implement a more space-efficient Normalize layer?

@fmassa
Contributor
fmassa commented Oct 3, 2015

@ffmpbgrnn here is a version of Normalize which uses much less memory (it doesn't depend on the batch size anymore). It should be slower on GPU. The tests pass, so it should be fine. Use it with fastMode(false).
fmassa@015ba9c
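
For the p = 2 case, the reason the memory cost can stop depending on the batch size is that the full Jacobian never has to be built: with y = x / ||x||, the chain rule gives dL/dx = dL/dy / ||x|| - x * <x, dL/dy> / ||x||^3 per row. A rough sketch of that identity (illustrative, not the code in fmassa@015ba9c):

require 'torch'

-- sketch of an L2-only backward that needs O(batchSize x d) buffers instead of
-- the d x d eyeExpand matrix; illustrative, not the patched module itself
local function normalizeBackwardL2(input, gradOutput)
    local norm = input:norm(2, 2)                       -- (batchSize x 1) L2 norm of each row
    local dot = torch.cmul(input, gradOutput):sum(2)    -- (batchSize x 1) <x, dL/dy> per row
    local gradInput = torch.cdiv(gradOutput, norm:expandAs(gradOutput))
    local scale = torch.cdiv(dot, torch.pow(norm, 3))   -- <x, dL/dy> / ||x||^3
    gradInput:add(-1, torch.cmul(input, scale:expandAs(input)))
    return gradInput
end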

@ffmpbgrnn

@fmassa thanks for your work. I will give it a try!

@fmassa
Contributor
fmassa commented Oct 5, 2015

@ffmpbgrnn so, does this patch work fine for you? Is it much slower than the previous version?
Maybe we could push this simplified version to master (taking out the fast mode to keep things simple)?
cc @soumith

@ffmpbgrnn

@fmassa, sorry for my late response. Yes, it passes the test. But when I test fastMode(false) on GPU, it's even faster than fastMode(true).

require 'nn'
require 'cutorch'
require 'cunn'

local module = nn.Normalize(2):cuda()
module:fastMode(false)                  -- false selects the memory-light code path from the patch
local input = torch.rand(64, 2400):cuda()
local t = torch.Timer()
for i = 1, 100 do                       -- 100 forward/backward passes
    module:forward(input)
    module:backward(input, input)
    print(i)
end
print(t:time().real/100)                -- average time per iteration, in seconds

fastMode(false), output: 0.09 s per iteration
fastMode(true), output: 0.12 s per iteration

@dineshj1

This line could cause division by 0 when self.p < 2, e.g. L1 normalization. Perhaps clamp to have a minimum value eps? This could be done only for self.p < 2 if necessary.

@fmassa
Contributor

Thanks for the remark. I've added a clamping factor; check #444.
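
As a rough illustration of the clamp-to-eps idea (eps here is illustrative, and where exactly the clamp is applied inside the module is what #444 does):

require 'torch'

-- sketch only: keep the per-row norms away from zero before dividing by them
local eps = 1e-10
local p = 1
local input = torch.randn(4, 10)
local norm = input:norm(p, 2):clamp(eps, math.huge)    -- norms can no longer be exactly 0
local output = torch.cdiv(input, norm:expandAs(input))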
