
Separate model/optimizer step, implement basic optimizers and batch normalization #73

Open
Ebanflo42 opened this issue Apr 4, 2024 · 0 comments

@Ebanflo42 (Contributor):

Models and optimizers are generally thought of as separate objects, although currently they are executed in the same context.

Separating them might be appropriate as a second step, after forward/backward pass separation.
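
As a rough sketch of what the split could look like (written in JAX purely for illustration; the function names and shapes are hypothetical and not this project's API), the model context would return only the loss and gradients, and a separate optimizer context would turn gradients into new parameters:

```python
import jax
import jax.numpy as jnp

# Hypothetical toy model: a linear layer with a mean-squared-error loss.
def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

# Model context: returns only the training loss and the gradients.
grad_step = jax.jit(jax.value_and_grad(loss_fn))

LEARNING_RATE = 1e-2

# Optimizer context: turns (params, grads) into new params, nothing else.
@jax.jit
def sgd_step(params, grads):
    return jax.tree_util.tree_map(lambda p, g: p - LEARNING_RATE * g, params, grads)

params = {"w": jnp.zeros((3, 1)), "b": jnp.zeros((1,))}
x, y = jnp.ones((8, 3)), jnp.ones((8, 1))
loss, grads = grad_step(params, x, y)   # forward/backward pass
params = sgd_step(params, grads)        # separate weight update
```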

There are two critical reasons we want the optimizer separate: 1) if the updated weights are returned by the same context that returns the training loss (which we want to print), then the weights are bussed to and from the GPU at every step, and 2) XLA supports Send and Recv operations, which would allow us to compute gradient updates while simultaneously bussing the next model inputs and labels to the GPU.
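
Point 2 is essentially about overlapping data transfer with compute. As a framework-agnostic illustration only (JAX-flavoured host-side staging; the actual design here would use XLA's Send/Recv inside the compiled program), a prefetch generator that stages the next batch on the device while the current step runs might look like:

```python
import jax

def prefetch_to_device(batches):
    """Yield batches that were copied to the device one step ahead."""
    it = iter(batches)
    staged = jax.device_put(next(it))    # stage the first batch
    for nxt in it:
        nxt = jax.device_put(nxt)        # this transfer overlaps the caller's compute
        yield staged
        staged = nxt
    yield staged
```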

We should support the SGD, RMSProp, and Adam optimizers. It would also make sense to fold batch normalization into this issue, since it is effectively its own sort of custom optimizer.
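
For reference, a minimal sketch of the three update rules plus a batch-norm forward pass with running statistics (plain jax.numpy, standalone, not tied to this codebase; the hyperparameter defaults are just the usual textbook values):

```python
import jax.numpy as jnp

def sgd(param, grad, velocity, lr=1e-2, momentum=0.9):
    # Classic SGD with momentum; `velocity` is carried between steps.
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity

def rmsprop(param, grad, sq_avg, lr=1e-3, decay=0.9, eps=1e-8):
    # `sq_avg` is the running average of squared gradients.
    sq_avg = decay * sq_avg + (1 - decay) * grad ** 2
    return param - lr * grad / (jnp.sqrt(sq_avg) + eps), sq_avg

def adam(param, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # `state` is (step count, first moment, second moment).
    t, m, v = state
    t = t + 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # bias correction
    v_hat = v / (1 - b2 ** t)
    return param - lr * m_hat / (jnp.sqrt(v_hat) + eps), (t, m, v)

def batch_norm(x, gamma, beta, running_mean, running_var,
               momentum=0.1, eps=1e-5, training=True):
    # Batch norm's running statistics are updated outside gradient descent,
    # which is why it fits naturally alongside the optimizers in this issue.
    if training:
        mean, var = x.mean(axis=0), x.var(axis=0)
        running_mean = (1 - momentum) * running_mean + momentum * mean
        running_var = (1 - momentum) * running_var + momentum * var
    else:
        mean, var = running_mean, running_var
    x_hat = (x - mean) / jnp.sqrt(var + eps)
    return gamma * x_hat + beta, running_mean, running_var
```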
