
model file too big after batch learning #112

Closed
nagadomi opened this issue Nov 25, 2014 · 6 comments

Comments

@nagadomi

This issue is probably caused by Module#output and Module#gradInput.

require 'nn'

function batch_learning(model, criterion, x, y)
   local z = model:forward(x)
   local df_do = torch.Tensor(z:size(1), y:size(2)):zero()
   for i = 1, z:size(1) do
      local f = criterion:forward(z[i], y[i])
      df_do[i]:copy(criterion:backward(z[i], y[i]))
   end
   model:backward(x, df_do)
end
function online_learning(model, criterion, x, y)
   for i = 1, x:size(1) do
      local z = model:forward(x[i])
      local f = criterion:forward(z, y[i])
      model:backward(x[i], criterion:backward(z, y[i]))
   end
end
function model_file_too_big()
   local EXAMPLES = 10000 -- 1000000
   local FEATS = 1000
   local x = torch.Tensor(EXAMPLES, FEATS):uniform()
   local y = torch.Tensor(EXAMPLES, 1):uniform()
   local criterion = nn.MSECriterion()

   local model = nn.Sequential()
   model:add(nn.Linear(x:size(2), 1))
   model:add(nn.Reshape(1))
   model:add(nn.Sigmoid())

   online_learning(model, criterion, x, y)
   torch.save("online.model", model)

   batch_learning(model, criterion, x, y)
   torch.save("batch.model", model)
end
model_file_too_big()
% th model_too_big.lua
% ls -laS
total 78424
-rw-rw-r-- 1 nagadomi nagadomi 80257915 Nov 26 01:15 batch.model
-rw-rw-r-- 1 nagadomi nagadomi    25891 Nov 26 01:15 online.model
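The file sizes line up with the batch-sized intermediates the model keeps around: after a full-batch pass, the Linear layer's gradInput alone is an EXAMPLES x FEATS DoubleTensor. A back-of-the-envelope check (illustrative, assuming the default DoubleTensor storage of 8 bytes per element):

```lua
-- One retained EXAMPLES x FEATS intermediate accounts for almost all of
-- batch.model's ~80 MB; online.model never holds batch-sized tensors.
local EXAMPLES, FEATS = 10000, 1000
print(EXAMPLES * FEATS * 8)  -- 80000000 bytes per retained intermediate
```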
@jonathantompson
Contributor

Below is a little code snippet I use to recursively clean up the model before saving it to disk. It's not overly robust (it fails for some container classes), but it works on 99% of my models, so you may need to modify it a little.

function zeroDataSize(data)
  if type(data) == 'table' then
    for i = 1, #data do
      data[i] = zeroDataSize(data[i])
    end
  elseif type(data) == 'userdata' then
    data = torch.Tensor():typeAs(data)
  end
  return data
end

-- Resize the output, gradInput, etc temporary tensors to zero (so that the
-- on disk size is smaller)
function cleanupModel(node)
  if node.output ~= nil then
    node.output = zeroDataSize(node.output)
  end
  if node.gradInput ~= nil then
    node.gradInput = zeroDataSize(node.gradInput)
  end
  if node.finput ~= nil then
    node.finput = zeroDataSize(node.finput)
  end
  -- Recurse on nodes with 'modules'
  if (node.modules ~= nil) then
    if (type(node.modules) == 'table') then
      for i = 1, #node.modules do
        local child = node.modules[i]
        cleanupModel(child)
      end
    end
  end

  collectgarbage()
end

FYI: I don't think this is an issue, and you should probably close it. There are plenty of examples I can think of where I would want to save the gradInput and output tensors.

@nagadomi
Author

Thanks. I am using the following code for now.

model:get(1).output = torch.Tensor()
model:get(1).gradInput = torch.Tensor()

@jonathantompson
Contributor

Then you should close the issue.

@nagadomi
Author

Is this by design?

@jonathantompson
Contributor

Yes. Torch will serialize anything in those module instances before writing to disk, including the output and gradInput values and any other tensors in the table. It's up to the user to clean up the module if all they want are the weight and bias values.
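An alternative to cleaning the module in place is to serialize only the parameters and rebuild the architecture at load time. A minimal sketch using nn's `getParameters` (the `build_model` constructor here is hypothetical, assumed to recreate the same architecture):

```lua
-- Save only the flattened parameters instead of the whole module, so no
-- output/gradInput intermediates end up on disk.
local params = model:getParameters()   -- flat view of all weights and biases
torch.save("weights.t7", params)

-- Later: reconstruct the model, then copy the saved weights back in.
local model2 = build_model()           -- hypothetical, user-supplied constructor
model2:getParameters():copy(torch.load("weights.t7"))
```

Note that `getParameters` flattens storage, so both save and load sides must build the model the same way for the copy to line up.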

@nagadomi
Author

Thanks
