How did you use the LAMB optimizer with ZeRO CPU offload? #13

Closed
ghosthamlet opened this issue Jun 25, 2022 · 2 comments

Comments

@ghosthamlet

Thanks for this great project.
In your blog post (https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6) you say you used the LAMB optimizer together with ZeRO offload. But doesn't ZeRO CPU offload need DeepSpeedCPUAdam for good performance?
Also, I could not find any LAMB optimizer code in this project.

@MichaelEk

Hi! We did use LAMB, but we used ZeRO-3, not ZeRO Offload. Thus, we were able to use the GPU version of LAMB from Apex.
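
For readers who want to try a similar setup, here is a minimal sketch of what "ZeRO-3 without offload, plus Apex LAMB" could look like. It is not the YaLM training configuration: the helper names `build_zero3_config` and `build_lamb`, the batch size, and the learning rate are hypothetical placeholders. The config keys (`zero_optimization.stage`, `offload_optimizer`) and `apex.optimizers.FusedLAMB` come from DeepSpeed and Apex respectively.

```python
import json


def build_zero3_config(micro_batch_size=1):
    """Return a DeepSpeed-style config dict with ZeRO stage 3 and no CPU offload.

    The values are illustrative placeholders, not the settings used for YaLM.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "fp16": {"enabled": True},
        # Stage 3 partitions parameters, gradients and optimizer state across GPUs.
        # No "offload_optimizer" / "offload_param" sections are present, so the
        # optimizer state stays on the GPU and a GPU optimizer can be used.
        "zero_optimization": {"stage": 3},
    }


def build_lamb(params, lr=1e-3):
    """Create Apex's GPU LAMB optimizer (needs Apex and a CUDA device)."""
    # Imported lazily so the config part of this sketch runs without Apex.
    from apex.optimizers import FusedLAMB

    return FusedLAMB(params, lr=lr)


ds_config = build_zero3_config()
print(json.dumps(ds_config, indent=2))
```

The optimizer returned by `build_lamb(model.parameters())` would then typically be passed to `deepspeed.initialize` along with the config; check the DeepSpeed documentation for the exact arguments in your version.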

Interestingly, in the case of ZeRO-3, LAMB's behavior is slightly different. But, according to our experiments, this does not affect the quality of training in any way.
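
The reply does not say why the behavior differs. One possible explanation, offered here only as an assumption, is that LAMB scales each layer's step by a trust ratio (weight norm divided by update norm), while a sharded optimizer may see flattened partitions rather than whole layers, so the norms are taken over a different set of values. The toy sketch below only illustrates that per-layer and flattened trust ratios can disagree. The helper names and numbers are made up, and this is not Apex's or DeepSpeed's actual code.

```python
import math


def l2_norm(values):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in values))


def trust_ratio(weights, update):
    """LAMB-style trust ratio: ||weights|| / ||update||, falling back to 1.0."""
    w_norm = l2_norm(weights)
    u_norm = l2_norm(update)
    if w_norm == 0.0 or u_norm == 0.0:
        return 1.0
    return w_norm / u_norm


# Two hypothetical layers with very different weight and update scales.
layer_a_w, layer_a_u = [1.0, 1.0], [0.1, 0.1]
layer_b_w, layer_b_u = [0.1, 0.1], [1.0, 1.0]

# Standard LAMB: one trust ratio per layer (roughly 10 and 0.1 here).
per_layer = [
    trust_ratio(layer_a_w, layer_a_u),
    trust_ratio(layer_b_w, layer_b_u),
]

# Assumed sharded view: both layers flattened into a single buffer,
# giving one shared trust ratio (roughly 1 here).
flat = trust_ratio(layer_a_w + layer_b_w, layer_a_u + layer_b_u)

print("per-layer trust ratios:", per_layer)
print("flattened trust ratio:", flat)
```

In this toy case the flattened ratio sits between the two per-layer ratios, so each layer would take a different step than under standard LAMB.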

@ghosthamlet
Author

Thanks for the detailed reply.
