Crazy idea about reusing pre-trained char-based RNNLM #747

Closed
coodoo opened this issue Apr 4, 2016 · 6 comments

coodoo commented Apr 4, 2016

This might be a crazy idea. Normally we only have a limited amount of labelled data for training a char-based RNN LM for sentiment analysis, so I'm wondering: would it be possible to first train an RNNLM on a large corpus such as the Penn Treebank or a billion words from Wikipedia, and then, starting from that model (possibly via its nn.LookupTable), continue training on our labelled data for the sentiment analysis task?

Has anyone done this before?

soumith (Member) commented Apr 4, 2016

Yes, using pretrained language model embeddings is a very standard practice.

coodoo (Author) commented Apr 4, 2016

Mind shedding some light on how to implement this with Torch and/or nn.LookupTable? :)

P.S. After a quick scan I didn't see any API to read weights out of, or load weights into, nn.LookupTable.

coodoo (Author) commented Apr 4, 2016

@soumith I hacked together a modified version of nn.LookupTable that takes in pre-trained weights here; mind having a look?

The goal is to later use that LookupTable in a model structure like the one below for sentiment analysis. I'd really love to hear your thoughts on it, thanks!

nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> output]
  (1): nn.LookupTable
  (2): nn.SplitTable
  (3): nn.Sequencer @ nn.LSTM(100 -> 100)
  (4): nn.SelectTable
  (5): nn.Linear(100 -> 7)
  (6): nn.LogSoftMax
}
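
For reference, a minimal sketch of how a model with this structure might be assembled, assuming the Element-Research rnn package for nn.Sequencer/nn.LSTM and a placeholder vocabulary size:

require 'nn'
require 'rnn'  -- assumed: Element-Research rnn package, which provides Sequencer and LSTM

local vocabSize = 10000  -- placeholder; should match the pre-trained LookupTable's vocabulary

local model = nn.Sequential()
model:add(nn.LookupTable(vocabSize, 100))   -- pre-trained weights get copied into this layer
model:add(nn.SplitTable(1))                 -- split along the time dimension (depends on your input layout)
model:add(nn.Sequencer(nn.LSTM(100, 100)))  -- apply the LSTM to every timestep
model:add(nn.SelectTable(-1))               -- keep only the last timestep's output
model:add(nn.Linear(100, 7))                -- 7 output classes, as in the structure above
model:add(nn.LogSoftMax())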

fmassa (Contributor) commented Apr 4, 2016

@coodoo one usually simply copies the weights from a pre-trained model. For example:

old_lookup = torch.load(...)           -- the pre-trained nn.LookupTable (or extract it from a loaded full model)
lookup = nn.LookupTable(...)           -- same vocabulary size and embedding dimension as the pre-trained one
lookup.weight:copy(old_lookup.weight)  -- overwrite the random initialization with the pre-trained embeddings
-- then add lookup to your network

Another way (if, for example, you only want to fine-tune the last layer) is to load the full network, remove the last layer, and add a new randomly-initialized layer on top of the network:

net = torch.load(...)        -- the full pre-trained network (an nn.Sequential)
net:remove()                 -- with no index given, removes the last layer
net:add(nn.Linear(4096,21))  -- add a new randomly-initialized layer replacing the old one

Your approach also seems to work, but it would require adding an extra constructor argument to every module whose weights one wants to reuse.
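
For illustration only, a hypothetical sketch of such a constructor-level option (nn.PretrainedLookup and its pretrainedWeight argument are made-up names, not an existing nn API):

require 'nn'

-- Hypothetical subclass of nn.LookupTable that accepts pre-trained weights at construction time.
local PretrainedLookup, parent = torch.class('nn.PretrainedLookup', 'nn.LookupTable')

function PretrainedLookup:__init(nIndex, embeddingSize, pretrainedWeight)
   parent.__init(self, nIndex, embeddingSize)
   if pretrainedWeight then
      -- overwrite the random initialization with the supplied embedding matrix
      self.weight:copy(pretrainedWeight)
   end
end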

coodoo (Author) commented Apr 4, 2016

@fmassa Thanks for the detailed instructions; those are obviously much better ways to do the job. Appreciate it! 😀

coodoo (Author) commented Apr 5, 2016

Verified that the approach works and uploaded an updated version here. @fmassa thanks again for your kind instructions, they were very helpful!
