Great Stuff but Needs Better Usability #1

Open
kayuksel opened this issue Feb 22, 2021 · 12 comments

Comments

@kayuksel

Hello,

Thanks for such great work. Auto-initializing a DNN in a proper way definitely sounds amazing.

Yet, the usability needs to be significantly improved so that I can plug this into my existing networks.

It would be great if that could be as easy as installing and then importing an additional package.

We should maybe open a feature request in PyTorch so that they integrate this into the framework.

@kayuksel
Author

kayuksel commented Feb 22, 2021

FYI, I have just opened a feature request in the PyTorch repository as well: pytorch/pytorch#52626

@zhuchen03
Owner

Thank you for your interest in our work, and for opening the feature request! It is a good idea to make the code applicable to any network without having to add functions to the network class to switch between states.

I just looked at the documentation and think this is achievable without adding new features to PyTorch. We can use register_forward_pre_hook to switch in the updated parameters or let the BN layers switch to training behavior without affecting dropout, and use remove (pytorch/pytorch#5037) to switch back between the states. I will look into the details later and hopefully release a version soon.
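For illustration, here is a rough sketch of the hook-based switching I have in mind (not the final code; the function and variable names below are made up):

```python
import torch
import torch.nn as nn

# Rough sketch (not the released GradInit code): temporarily switch a module to
# rescaled parameters via a forward pre-hook, then restore the original state by
# removing the hook handle. "attach_rescale_hook" and "scale" are made-up names.
def attach_rescale_hook(module, scale):
    original = module.weight.data.clone()

    def pre_hook(mod, inputs):
        # switch in the updated (rescaled) weight right before the forward pass
        mod.weight.data = original * scale

    handle = module.register_forward_pre_hook(pre_hook)
    return handle, original

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
handle, original = attach_rescale_hook(model[0], scale=0.5)
out = model(torch.randn(3, 8))        # this forward pass uses the rescaled weight
handle.remove()                        # stop intercepting forward calls
model[0].weight.data = original        # switch back to the original parameters
```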

@kayuksel
Author

Sounds great!!! If we could also use it with any given optimizer, that would be perfect.
Nobody uses plain Adam anymore; most people use e.g. the optimizers from the torch_optimizer package.
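Those optimizers are already drop-in replacements for torch.optim, e.g. (a minimal sketch, assuming the torch_optimizer package is installed):

```python
import torch
import torch_optimizer  # third-party package with QHAdam, RAdam, Lookahead wrappers, etc.

model = torch.nn.Linear(10, 2)
# QHAdam (or any other optimizer from the package) used just like torch.optim.Adam
optimizer = torch_optimizer.QHAdam(model.parameters(), lr=1e-3)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()
```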

@danarte

danarte commented Feb 28, 2021

Hi,
I'm also really interested in a general PyTorch implementation. I'll try it out as soon as some basic example code is available, and I will be sure to cite the work in our upcoming publication.
I'm not even sure a package installation is the best route here, since it sounds like it could be sufficient to add a few functions from gradinit_modules at certain points in the model code.
Our group works with custom models (a mix of RNN, CNN, DNN, attention, and transformer layers), so an explanation of where gradinit_modules should be applied would help us immensely (instead of premade code for specific models).

In the end we will probably try to implement it ourselves, but we would rather follow official instructions to ensure it works properly.

Great work! Thanks for the publication,
Artem.

@zhuchen03
Owner

zhuchen03 commented Mar 9, 2021

@kayuksel I have just pushed a new version that supports any CNN whose only parameterized layers are nn.Conv2d, nn.Linear and nn.BatchNorm2d. Please refer to the note in the README for more details, and feel free to ask if you have any further questions.

@danarte Thanks for the interest in our work! Basically, we use GradInit on all parameters of the network: we learn a scale factor for each weight and bias (if it exists and is non-zero at initialization). Please refer to the notes in the updated README.md to see how to extend it to other models like Transformers. Essentially, we just need to be able to iterate over all trainable modules in a fixed order and take gradient steps (to compute the objective of Eq. 1) for all their parameters. I will release the code for fairseq ASAP. Feel free to open a new issue if you have any questions.
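As a rough illustration of the idea (not the actual GradInit code; the helper name below is made up), extending to a new architecture mostly means being able to enumerate its parameterized modules in a fixed order and attach a learnable scale to each weight/bias:

```python
import torch
import torch.nn as nn

# Rough illustration only (not the actual GradInit code): enumerate the
# parameterized modules in a fixed order and create one learnable scale per
# weight/bias that is non-zero at initialization; the GradInit procedure then
# optimizes these scales with gradient steps on the objective of Eq. 1.
def collect_scales(model):
    scales = {}
    for mod_name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear, nn.BatchNorm2d)):
            for p_name, param in module.named_parameters(recurse=False):
                if param.requires_grad and param.abs().sum() > 0:  # skip zero-initialized biases
                    scales[f"{mod_name}.{p_name}"] = nn.Parameter(torch.ones(()))
    return scales

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.Flatten(), nn.Linear(8 * 30 * 30, 10))
print(list(collect_scales(model).keys()))
```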

@kayuksel
Author

@zhuchen03 I see that it requires a dataloader and seems to be specific to classification.
Unfortunately, that is too specific for me to use in some novel problems I am working on
(where initialization is crucial). But thank you for helping and for letting me know about the update.

@zhuchen03
Owner

@kayuksel I'm curious, what is your problem like? I think you can try it out as long as your model can be optimized with SGD; you just need to replace the loss function with yours. I agree the current version is restricted to image classification, but it shouldn't be too difficult to adapt it to other tasks. Happy to assist, or maybe improve the API, if you could provide more details.
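To make that concrete, here is pseudocode with made-up names (not the released API): the classification loss is just a callable on a batch, so any task can plug in its own objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Pseudocode with made-up names (not the released API): the init procedure only
# needs a callable mapping (model, batch) -> scalar loss, so the cross-entropy
# used for image classification can be swapped for any differentiable objective.
def classification_loss(model, batch):
    inputs, targets = batch
    return F.cross_entropy(model(inputs), targets)

def my_custom_loss(model, batch):
    # e.g. a generative / RL-style surrogate objective
    return model(batch).pow(2).mean()

def init_step(model, batch, loss_fn):
    loss = loss_fn(model, batch)
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    # ...the gradients then drive the simulated first optimizer step and the Eq. 1 objective...
    return loss, grads
```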

@kayuksel
Author

@zhuchen03 In my case, it is a generative model trained with QHAdam (with an adaptive gradient clipping wrapper), which learns to continuously generate populations of solutions to e.g. a mathematical function.

In these types of reinforcement learning problems, the network initialization can be an important factor, as it affects how the agent starts taking actions and hence how experiences are acquired to update the policy.

(This leads to the severe reproducibility issues and random-seed sensitivity of RL.)

@zhuchen03
Owner

@kayuksel I see. I do not have much background in the problem you are trying to solve. From your description, it looks like you are using some Adam-like optimizer, and GradInit should be applicable as long as we can write down its update rule for the first step.

I can check whether there are other issues that would hinder implementing GradInit for your problem if you could share some simple sample code.
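For example, plain Adam's first step has a simple closed form (a sketch; an Adam-like optimizer such as QHAdam would substitute its own rule):

```python
import torch

# First step of plain Adam: m_1 = (1 - b1) * g and v_1 = (1 - b2) * g**2, and the
# bias corrections give m_hat = g, v_hat = g**2, so the first update is simply
# delta = -lr * g / (|g| + eps), i.e. roughly a sign step.
def adam_first_step(grad, lr=1e-3, eps=1e-8):
    return -lr * grad / (grad.abs() + eps)

g = torch.tensor([0.5, -2.0, 0.0])
print(adam_first_step(g))   # approximately -lr * sign(g)
```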

@kayuksel
Author

Thanks @zhuchen03, how can I send you the sample code? Can I use the (cs.umd.edu) e-mail mentioned in your resume?

@zhuchen03
Owner

Yes that works. Thank you!

@bknyaz

bknyaz commented Jan 4, 2022

Hi @kayuksel, @danarte. Just in case, I wanted to point you to our recent work https://github.com/facebookresearch/ppuda. With it, you should be able to initialize almost any neural net in a single function call.
