Hi! We did use LAMB, but we used ZeRO-3, not ZeRO Offload. Thus, we were able to use the GPU version of LAMB from Apex.
Interestingly, LAMB behaves slightly differently under ZeRO-3. According to our experiments, though, this does not affect training quality in any way.
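The reply does not say why LAMB behaves differently under ZeRO-3. One plausible explanation, offered here only as an assumption and not confirmed by the authors, is that LAMB scales each update by a trust ratio `||w|| / ||update||` computed per parameter tensor, while a partitioned optimizer may only see a shard of each tensor. The pure-Python sketch below uses made-up numbers and hypothetical helper names (`l2_norm`, `trust_ratio`) to show that per-shard ratios need not match the full-tensor ratio. It is not the Apex or DeepSpeed implementation.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def trust_ratio(weights, update):
    """LAMB-style trust ratio ||w|| / ||update|| (1.0 if either norm is zero)."""
    w_norm = l2_norm(weights)
    u_norm = l2_norm(update)
    if w_norm == 0.0 or u_norm == 0.0:
        return 1.0
    return w_norm / u_norm


# Illustrative values only: one "layer" with four weights and its raw update.
weights = [1.0, 2.0, 3.0, 4.0]
update = [0.1, 0.1, 0.4, 0.4]

# Ratio computed over the whole tensor, as in unpartitioned LAMB.
full_ratio = trust_ratio(weights, update)

# Assumption: the tensor is split into two equal shards, and each shard
# computes its own ratio from local norms only.
shard_size = 2
shard_ratios = [
    trust_ratio(weights[i:i + shard_size], update[i:i + shard_size])
    for i in range(0, len(weights), shard_size)
]

print("full tensor ratio:", full_ratio)
print("per-shard ratios: ", shard_ratios)
```

If this is indeed the mechanism, the effective step size differs slightly per shard, which fits the observation that behavior changes while training quality stays the same.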
Thanks for this great project.
In your blog post, https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6, you say you used the LAMB optimizer and ZeRO Offload. But doesn't ZeRO CPU offload need DeepSpeedCPUAdam for good performance?
Also, I could not find the LAMB optimizer code in this project.