Implement Multi-GPU training #33
Comments
Is multi-GPU training with the DistributedDataParallel function something you plan to implement in future work?
I'm sorry, I'm not sure I understood your comment @BUAAers. Are you asking us to implement multi-GPU with DistributedDataParallel instead of, or in addition to, implementing it with the local DataParallel class?
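For context, here is a minimal sketch of the two PyTorch approaches being discussed. This is not OpenKiwi code; the model and GPU IDs are placeholders, and it assumes at least two CUDA devices are available.

```python
import torch.nn as nn

# Hypothetical model; any nn.Module would do.
model = nn.Linear(512, 1)

# Option 1: single-process DataParallel. Each batch is split across the
# listed devices and the outputs are gathered back on device_ids[0].
dp_model = nn.DataParallel(model.cuda(), device_ids=[0, 1])

# Option 2: DistributedDataParallel, one process per GPU, typically
# launched via `torchrun`. Requires the process group to be initialised
# first, so it is shown commented out here:
# import os, torch.distributed as dist
# dist.init_process_group(backend="nccl")
# local_rank = int(os.environ["LOCAL_RANK"])
# ddp_model = nn.parallel.DistributedDataParallel(
#     model.cuda(local_rank), device_ids=[local_rank]
# )
```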
Hi guys,
Yes, we paused development of this feature for a while, but as of right now it is under active development on our side. Meanwhile, PRs are very welcome; thanks for your fix!
Update on this. Our initial impression was that we'd get multi-GPU for "free" by adopting PyTorch Lightning. Unfortunately, while it does make multi-GPU much easier to support, there are issues with metrics calculation (computing corpus-level metrics when the data is split across GPUs) that made it harder to implement. This is not a priority for us and has been paused on our side. In case anyone would like to take a stab at it, feel free to comment on this issue, as I have made some (non-public) progress on making it work with kiwi >=2.0.0.
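To illustrate the metrics issue mentioned above: corpus-level metrics such as Pearson correlation cannot simply be averaged per GPU; the per-device predictions have to be gathered first. Below is a rough sketch of one way to do this with torch.distributed. It is not the in-progress OpenKiwi implementation, the tensor names are illustrative, and it assumes every process holds an equal-sized shard.

```python
import torch
import torch.distributed as dist

def gather_for_corpus_metric(local_preds: torch.Tensor,
                             local_targets: torch.Tensor) -> torch.Tensor:
    """Collect predictions/targets from every process so a corpus-level
    metric can be computed over the full data rather than per GPU."""
    world_size = dist.get_world_size()
    preds_list = [torch.zeros_like(local_preds) for _ in range(world_size)]
    targets_list = [torch.zeros_like(local_targets) for _ in range(world_size)]
    dist.all_gather(preds_list, local_preds)       # assumes equal shard sizes
    dist.all_gather(targets_list, local_targets)
    preds = torch.cat(preds_list)
    targets = torch.cat(targets_list)

    # Pearson correlation over the whole corpus, not a mean of per-GPU scores.
    preds = preds - preds.mean()
    targets = targets - targets.mean()
    return (preds * targets).sum() / (preds.norm() * targets.norm() + 1e-8)
```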
Is your feature request related to a problem? Please describe.
Currently, there is no solution for training QE models on multiple GPUs, which could significantly speed up training of large models.
There have been several issues from people requesting this feature (#31, #29).
Describe the solution you'd like
Ideally, it should be possible to pass several GPU IDs to the gpu-id YAML flag, and OpenKiwi should use all of them in parallel to train the model (see the sketch below).
Additional context
An important thing to take into account is that other parts of the pipeline might become a bottleneck when using multiple GPUs, e.g. data ingestion, tokenisation, etc.
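As a rough illustration of how several GPU IDs could be consumed once parsed from the config, here is a sketch against PyTorch Lightning's generic Trainer API (not OpenKiwi's actual pipeline). The gpu_ids value stands in for whatever the gpu-id flag would yield; older Lightning versions took a list of device indices via a gpus argument, newer ones use accelerator/devices as shown.

```python
import pytorch_lightning as pl

# Hypothetical: the list parsed from a `gpu-id: [0, 1]` entry in the YAML config.
gpu_ids = [0, 1]

trainer = pl.Trainer(
    accelerator="gpu",
    devices=gpu_ids,
    strategy="ddp",  # one process per GPU; metrics must be synced across ranks
)
# trainer.fit(model, datamodule=dm)  # model and datamodule omitted in this sketch
```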