
Separate model/optimizer step, implement basic optimizers and batch normalization #73

Open
Ebanflo42 opened this issue Apr 4, 2024 · 0 comments

@Ebanflo42 (Contributor):

Models and optimizers are generally thought of as separate objects, although currently they are executed in the same context.

Separating them might be appropriate as a second step, after forward/backward pass separation.
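
As a rough sketch of what the split could look like (written in JAX purely for illustration; the function names and shapes are hypothetical and not this project's API), the model context would return only the loss and gradients, and a separate optimizer context would turn gradients into new parameters:

```python
import jax
import jax.numpy as jnp

# Hypothetical toy model: a linear layer with a mean-squared-error loss.
def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

# Model context: returns only the training loss and the gradients.
grad_step = jax.jit(jax.value_and_grad(loss_fn))

LEARNING_RATE = 1e-2

# Optimizer context: turns (params, grads) into new params, nothing else.
@jax.jit
def sgd_step(params, grads):
    return jax.tree_util.tree_map(lambda p, g: p - LEARNING_RATE * g, params, grads)

params = {"w": jnp.zeros((3, 1)), "b": jnp.zeros((1,))}
x, y = jnp.ones((8, 3)), jnp.ones((8, 1))
loss, grads = grad_step(params, x, y)   # forward/backward pass
params = sgd_step(params, grads)        # separate weight update
```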

There are two critical reasons we want the optimizer separate: 1) if the updated weights are returned by the same context that returns the training loss (which we want to print), then the weights are bussed to and from the GPU at every step, and 2) XLA supports Send and Recv operations, which would allow us to compute gradient updates while simultaneously bussing the next model inputs and labels to the GPU.
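
Point 2 is essentially about overlapping data transfer with compute. As a framework-agnostic illustration only (JAX-flavoured host-side staging; the actual design here would use XLA's Send/Recv inside the compiled program), a prefetch generator that stages the next batch on the device while the current step runs might look like:

```python
import jax

def prefetch_to_device(batches):
    """Yield batches that were copied to the device one step ahead."""
    it = iter(batches)
    staged = jax.device_put(next(it))    # stage the first batch
    for nxt in it:
        nxt = jax.device_put(nxt)        # this transfer overlaps the caller's compute
        yield staged
        staged = nxt
    yield staged
```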

We should support the SGD, RMSProp, and Adam optimizers. It would also make sense to fold batch normalization into this issue, since it is effectively its own sort of custom optimizer.
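
For reference, a minimal sketch of the three update rules plus a batch-norm forward pass with running statistics (plain jax.numpy, standalone, not tied to this codebase; the hyperparameter defaults are just the usual textbook values):

```python
import jax.numpy as jnp

def sgd(param, grad, velocity, lr=1e-2, momentum=0.9):
    # Classic SGD with momentum; `velocity` is carried between steps.
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity

def rmsprop(param, grad, sq_avg, lr=1e-3, decay=0.9, eps=1e-8):
    # `sq_avg` is the running average of squared gradients.
    sq_avg = decay * sq_avg + (1 - decay) * grad ** 2
    return param - lr * grad / (jnp.sqrt(sq_avg) + eps), sq_avg

def adam(param, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # `state` is (step count, first moment, second moment).
    t, m, v = state
    t = t + 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # bias correction
    v_hat = v / (1 - b2 ** t)
    return param - lr * m_hat / (jnp.sqrt(v_hat) + eps), (t, m, v)

def batch_norm(x, gamma, beta, running_mean, running_var,
               momentum=0.1, eps=1e-5, training=True):
    # Batch norm's running statistics are updated outside gradient descent,
    # which is why it fits naturally alongside the optimizers in this issue.
    if training:
        mean, var = x.mean(axis=0), x.var(axis=0)
        running_mean = (1 - momentum) * running_mean + momentum * mean
        running_var = (1 - momentum) * running_var + momentum * var
    else:
        mean, var = running_mean, running_var
    x_hat = (x - mean) / jnp.sqrt(var + eps)
    return gamma * x_hat + beta, running_mean, running_var
```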
