
Slight modification of adam.lua causing different training losses with the same seed #158

Open

szhengac opened this issue Jun 15, 2017 · 0 comments

szhengac commented Jun 15, 2017

I just came across a strange problem. I slightly modified some parts of adam.lua as follows:

   -- Initialization
   state.t = state.t or 0
   -- Exponential moving average of gradient values
   state.m = state.m or x.new(x:size()):zero()
   -- Exponential moving average of squared gradient values
   state.v = state.v or x.new(x:size()):zero()
   -- A tmp tensor to hold the sqrt(v) + epsilon
   state.denom = state.denom or x.new(x:size()):zero()

   -- (3) learning rate decay (annealing)
   local clr = lr / (1 + state.t*lrd)

   state.t = state.t + 1
   local biasCorrection1 = 1 - beta1^state.t 
   local biasCorrection2 = 1 - beta2^state.t 

   -- (1) evaluate f(x) and df/dx
   local fx, dfdx = opfunc(x)

   -- (2) weight decay
   if wd ~= 0 then
      dfdx:add(wd, x)
   end
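
For comparison, the corresponding section of the unmodified adam.lua performs the same steps in this order (abridged from memory of the stock torch/optim file, so treat it as a sketch rather than a verbatim copy):

   -- (1) evaluate f(x) and df/dx
   local fx, dfdx = opfunc(x)

   -- (2) weight decay
   if wd ~= 0 then
      dfdx:add(wd, x)
   end

   -- Initialization
   state.t = state.t or 0
   state.m = state.m or x.new(dfdx:size()):zero()
   state.v = state.v or x.new(dfdx:size()):zero()
   state.denom = state.denom or x.new(dfdx:size()):zero()

   -- (3) learning rate decay (annealing)
   local clr = lr / (1 + state.t*lrd)

   state.t = state.t + 1
   local biasCorrection1 = 1 - beta1^state.t
   local biasCorrection2 = 1 - beta2^state.t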

I changed the order of (1), (2), and (3), and placed

   local biasCorrection1 = 1 - beta1^state.t
   local biasCorrection2 = 1 - beta2^state.t

after state.t = state.t + 1. With these changes, the training losses are no longer reproducible across runs, even though I use the same seed. However, if I add a print() between state.t = state.t + 1 and local biasCorrection1 = 1 - beta1^state.t, I get identical training losses across multiple runs. The unmodified adam.lua also produces identical results across multiple runs.
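
For concreteness, this is the kind of harness I use to compare runs; the toy quadratic objective here is only illustrative (my actual model is more complex, and this minimal setup may be too simple to trigger the divergence):

   require 'torch'
   require 'optim'

   -- Run a fixed number of Adam steps on f(x) = 0.5 * ||x||^2 with a
   -- fixed seed, recording the loss at every step.
   local function run(seed)
      torch.manualSeed(seed)
      local x = torch.randn(10)
      local config = {learningRate = 1e-2}
      local state = {}
      local losses = {}
      local feval = function(p)
         return 0.5 * p:dot(p), p:clone()  -- f(x) and df/dx = x
      end
      for t = 1, 100 do
         local _, fs = optim.adam(feval, x, config, state)
         losses[t] = fs[1]
      end
      return losses
   end

   -- Two runs with the same seed should produce bitwise-identical losses.
   local a, b = run(1234), run(1234)
   for t = 1, #a do
      assert(a[t] == b[t], 'losses diverge at step ' .. t)
   end

With the stock adam.lua the two loss sequences always match exactly; with my reordered version they do not (on my real training setup).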

Does anyone have any idea what might be happening?
