Implement Multi-GPU training #33
Comments
Is multi-GPU training with the DistributedDataParallel function something you plan to implement in future work?
I'm sorry, I'm not sure I understood your comment @BUAAers. Are you asking us to implement multi-GPU with DistributedDataParallel instead of, or in addition to, implementing it with the local DataParallel class?
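For context, here is a minimal sketch of the two PyTorch approaches being discussed. This is not OpenKiwi code; the model and GPU IDs are placeholders, and it assumes at least two CUDA devices are available.

```python
import torch.nn as nn

# Hypothetical model; any nn.Module would do.
model = nn.Linear(512, 1)

# Option 1: single-process DataParallel. Each batch is split across the
# listed devices and the outputs are gathered back on device_ids[0].
dp_model = nn.DataParallel(model.cuda(), device_ids=[0, 1])

# Option 2: DistributedDataParallel, one process per GPU, typically
# launched via `torchrun`. Requires the process group to be initialised
# first, so it is shown commented out here:
# import os, torch.distributed as dist
# dist.init_process_group(backend="nccl")
# local_rank = int(os.environ["LOCAL_RANK"])
# ddp_model = nn.parallel.DistributedDataParallel(
#     model.cuda(local_rank), device_ids=[local_rank]
# )
```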
Hi guys,
Yes, we paused development of this feature for a while, but as of right now it is under active development on our side. Meanwhile, PRs are very welcome; thanks for your fix!
Update on this. Our initial impression was that we'd get multi-GPU for "free" by adopting PyTorch Lightning. Unfortunately, while it does make multi-GPU much easier to support, there are issues with metrics calculation (computing corpus-level metrics when the data is split across GPUs) that made it harder to implement. This is not a priority for us and has been paused on our side. In case anyone would like to take a stab at it, feel free to comment on this issue, as I have made some (non-public) progress on making it work with kiwi >=2.0.0.
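To illustrate the metrics issue mentioned above: corpus-level metrics such as Pearson correlation cannot simply be averaged per GPU; the per-device predictions have to be gathered first. Below is a rough sketch of one way to do this with torch.distributed. It is not the in-progress OpenKiwi implementation, the tensor names are illustrative, and it assumes every process holds an equal-sized shard.

```python
import torch
import torch.distributed as dist

def gather_for_corpus_metric(local_preds: torch.Tensor,
                             local_targets: torch.Tensor) -> torch.Tensor:
    """Collect predictions/targets from every process so a corpus-level
    metric can be computed over the full data rather than per GPU."""
    world_size = dist.get_world_size()
    preds_list = [torch.zeros_like(local_preds) for _ in range(world_size)]
    targets_list = [torch.zeros_like(local_targets) for _ in range(world_size)]
    dist.all_gather(preds_list, local_preds)       # assumes equal shard sizes
    dist.all_gather(targets_list, local_targets)
    preds = torch.cat(preds_list)
    targets = torch.cat(targets_list)

    # Pearson correlation over the whole corpus, not a mean of per-GPU scores.
    preds = preds - preds.mean()
    targets = targets - targets.mean()
    return (preds * targets).sum() / (preds.norm() * targets.norm() + 1e-8)
```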
Is your feature request related to a problem? Please describe.
Currently, there is no solution for training QE models on multiple GPUs, which could significantly speed up training of large models.
There have been several issues from people requesting this feature (#31, #29).
Describe the solution you'd like
Ideally, it should be possible to pass several GPU IDs to the gpu-id YAML flag, and OpenKiwi should use all of them in parallel to train the model (see the sketch below).
Additional context
An important thing to take into account is that other parts of the pipeline might become a bottleneck when using multiple GPUs, e.g. data ingestion, tokenisation, etc.
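As a rough illustration of how several GPU IDs could be consumed once parsed from the config, here is a sketch against PyTorch Lightning's generic Trainer API (not OpenKiwi's actual pipeline). The gpu_ids value stands in for whatever the gpu-id flag would yield; older Lightning versions took a list of device indices via a gpus argument, newer ones use accelerator/devices as shown.

```python
import pytorch_lightning as pl

# Hypothetical: the list parsed from a `gpu-id: [0, 1]` entry in the YAML config.
gpu_ids = [0, 1]

trainer = pl.Trainer(
    accelerator="gpu",
    devices=gpu_ids,
    strategy="ddp",  # one process per GPU; metrics must be synced across ranks
)
# trainer.fit(model, datamodule=dm)  # model and datamodule omitted in this sketch
```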