
Implement Multi-GPU training #33

Open · captainvera opened this issue Jun 25, 2019 · 5 comments
Labels: enhancement New feature or request

Comments

@captainvera
Contributor

captainvera commented Jun 25, 2019

Is your feature request related to a problem? Please describe.
Currently, there is no way to train QE models on multiple GPUs, which could significantly speed up training of large models.
There have been several issues requesting this feature (#31, #29).

Describe the solution you'd like
Ideally, it should be possible to pass several GPU IDs to the gpu-id YAML flag, and OpenKiwi should use all of them in parallel to train the model.
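For reference, a minimal sketch of what consuming a list of GPU IDs with PyTorch's built-in DataParallel could look like (the model and config parsing here are placeholders, not OpenKiwi's actual API):

```python
import torch
import torch.nn as nn

# Hypothetical: GPU IDs as they might be parsed from a YAML config,
# e.g. `gpu-id: [0, 1]`.
gpu_ids = [0, 1]

model = nn.Linear(512, 2)  # stand-in for a QE model
if torch.cuda.is_available() and len(gpu_ids) > 1:
    model = model.to(f"cuda:{gpu_ids[0]}")
    # DataParallel scatters each batch across the listed devices and
    # gathers the outputs back on gpu_ids[0].
    model = nn.DataParallel(model, device_ids=gpu_ids)

device = f"cuda:{gpu_ids[0]}" if torch.cuda.is_available() else "cpu"
output = model(torch.randn(32, 512).to(device))
```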

Additional context
An important thing to take into account is that other parts of the pipeline, such as data ingestion and tokenisation, might become a bottleneck when using multiple GPUs.
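One common mitigation, purely as an illustration (the dataset below is a toy placeholder), is to push ingestion/tokenisation into DataLoader worker processes so the GPUs are not starved:

```python
from torch.utils.data import DataLoader, Dataset

class ToyQEDataset(Dataset):
    def __init__(self, n=1024):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # In a real pipeline, tokenisation would happen here,
        # inside a worker process rather than the training loop.
        return idx

loader = DataLoader(
    ToyQEDataset(),
    batch_size=64,
    num_workers=4,    # parallel workers for ingestion/tokenisation
    pin_memory=True,  # faster host-to-GPU transfer
)
```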

@captainvera captainvera added the enhancement New feature or request label Jun 25, 2019
@captainvera captainvera mentioned this issue Jun 25, 2019
@BUAAers

BUAAers commented Jul 18, 2019

Is it planned future work for you to implement multi-GPU training with the DistributedDataParallel function?

@captainvera
Contributor Author

I'm sorry, I'm not sure I understood your comment, @BUAAers. Are you asking us to implement multi-GPU with DistributedDataParallel instead of, or in addition to, implementing it with the local DataParallel class?
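For context, the difference between the two: DataParallel is a single-process wrapper that scatters each batch across devices, while DistributedDataParallel runs one process per GPU and all-reduces gradients, which generally scales better. A rough, hypothetical sketch of the DDP setup (not Kiwi code):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def run_ddp(rank: int, world_size: int):
    # One process per GPU; NCCL backend for GPU collectives.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    model = nn.Linear(512, 2).to(rank)  # stand-in for a QE model
    ddp_model = DDP(model, device_ids=[rank])
    # ... training loop: gradients are all-reduced across processes ...

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    torch.multiprocessing.spawn(run_ddp, args=(world_size,), nprocs=world_size)
```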

@francoishernandez
Contributor

Hi guys,
Just starting to get hands-on with your toolkit.
Multi-GPU training would surely be useful to speed up experiments. Is this under active development on your side?
Thanks!

@captainvera
Contributor Author

Hi @francoishernandez,

Yes, we paused development of this feature for a while, but as of right now it is under active development on our side.
We are making some structural changes to the Kiwi framework (mostly data loading and how we handle it; see the additional context of the original issue [spoiler: it was a huge bottleneck]) and intend to bump the version shortly to include both these changes and multi-GPU training. Stay tuned!

Meanwhile, PRs are very welcome, thanks for your fix!

@captainvera
Contributor Author

Update on this.

Our initial impression was that we'd get multi-GPU for "free" by adopting PyTorch-Lightning. Unfortunately, while it does make multi-GPU much easier to support, there are issues with metric calculation (computing corpus-level metrics when the data is split across GPUs) that made it harder to implement.
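To make the metrics problem concrete, here is a rough sketch (assumed names, not Kiwi code) of the extra gathering step a corpus-level metric needs once data is sharded across processes:

```python
import torch
import torch.distributed as dist

def gather_for_corpus_metric(local_preds: torch.Tensor) -> torch.Tensor:
    """Collect every process's predictions so a corpus-level metric
    (e.g. Pearson r over the whole dev set) can be computed once.

    Assumes an initialized process group and equally sized shards;
    unequal shards would need padding or all_gather_object.
    """
    world_size = dist.get_world_size()
    buffers = [torch.zeros_like(local_preds) for _ in range(world_size)]
    dist.all_gather(buffers, local_preds)  # every rank receives all shards
    return torch.cat(buffers, dim=0)
```

Without a step like this, each GPU would compute the metric on its own shard only, and averaging shard-level values is not the same as the true corpus-level metric.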

This is not a priority for us, and development has been paused on our side.

In case anyone would like to take a stab at it, feel free to comment on this issue, as I have made some (non-public) progress on making it work with kiwi >=2.0.0.
