How did you use the LAMB optimizer with ZeRO CPU offload? #13

ghosthamlet · 2022-06-25T12:09:00Z

Thanks for this great project.
In your blog post (https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6) you say you used the LAMB optimizer together with ZeRO offload. But doesn't ZeRO CPU offload require DeepSpeedCPUAdam for good performance?
Also, I could not find any LAMB optimizer code in this project.

MichaelEk · 2022-06-25T19:39:48Z

Hi! We did use LAMB, but we used ZeRO-3, not ZeRO Offload. Thus, we were able to use the GPU version of LAMB from Apex.
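For readers who want to try a similar setup, here is a minimal sketch of ZeRO stage 3 without CPU offload, paired with Apex's GPU LAMB. This is not the configuration used for the actual training run: the batch size, learning rate, fp16 setting, and the `build_engine` helper are illustrative assumptions, and the exact flags DeepSpeed needs may differ between versions.

```python
# Hypothetical DeepSpeed config: ZeRO stage 3, optimizer state kept on GPU.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,      # assumed value
    "fp16": {"enabled": True},                # assumed value
    "zero_optimization": {
        "stage": 3,
        # No "offload_optimizer" entry here: that key is what would move
        # optimizer state to the CPU (ZeRO Offload).
    },
    # DeepSpeed normally asks for this flag when ZeRO is combined with a
    # client optimizer it has not been validated against.
    "zero_allow_untested_optimizer": True,
}


def build_engine(model):
    """Hypothetical helper: wrap `model` with ZeRO-3 and Apex FusedLAMB."""
    # Imported lazily so the config above can be inspected without a GPU stack.
    import deepspeed
    from apex.optimizers import FusedLAMB

    optimizer = FusedLAMB(model.parameters(), lr=1e-4)  # lr is an assumed value
    engine, optimizer, _, _ = deepspeed.initialize(
        model=model, optimizer=optimizer, config=ds_config
    )
    return engine, optimizer
```

Since the optimizer state stays on the GPU in this setup, the question of needing DeepSpeedCPUAdam does not come up.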

Interestingly, in the case of ZeRO-3, LAMB's behavior is slightly different. But, according to our experiments, this does not affect the quality of training in any way.
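The thread does not say why the behavior differs. One plausible explanation, which is my assumption rather than something confirmed here, is that LAMB scales each update by a trust ratio computed from per-tensor norms, and under ZeRO-3 each rank may only see its own shard of a tensor, so the norms (and therefore the ratio) are taken over a shard instead of the whole layer. The toy sketch below uses made-up numbers and plain Python to show how the two ratios can diverge.

```python
import math


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def trust_ratio(weights, update):
    """LAMB-style trust ratio: ||w|| / ||update||, falling back to 1.0."""
    w_norm, u_norm = l2_norm(weights), l2_norm(update)
    return w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0


# Made-up weights and Adam-style update for a single "layer".
weights = [0.5, -0.25, 2.0, 1.0]
update = [0.01, 0.02, 0.5, 0.1]

# Ratio computed over the whole tensor, as in unsharded LAMB.
full_ratio = trust_ratio(weights, update)

# Ratios computed independently on two equal shards (assumed sharding).
half = len(weights) // 2
shard_ratios = [
    trust_ratio(weights[:half], update[:half]),
    trust_ratio(weights[half:], update[half:]),
]

print("whole tensor:", full_ratio)
print("per shard:   ", shard_ratios)
```

Each shard ends up with its own effective step size, which would match the "slightly different" behavior described above if this assumption holds.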

ghosthamlet · 2022-06-26T04:10:58Z

Thanks for the detailed reply.

ghosthamlet closed this as completed Jun 26, 2022
