Reproducing training, recommended hardware setup #9

pvoigtlaender · 2022-04-07T09:33:02Z

Hi,

Thanks for releasing the great code!
I'm trying to reproduce the training. For now, I start from the pre-trained model and only do the fine-tuning on YouTube-VOS. What is the recommended number of GPUs, and what run-time should I expect?
Is the currently released code able to support multi-node training?
So far, I was able to run the YouTube-VOS training on a single machine with 4 A100 GPUs. It took ~2 hours per epoch, so 12 hours in total.
Please let me know about the recommended hardware setup for the YouTube-VOS fine-tuning and also for the pre-training step (I think for this maybe not all code is released yet?).
And if I use a different number of GPUs, can I expect the same result quality, just longer run-time, or will this maybe lead to problems/worse results?

Thank you!

Best,

Paul

wjn922 · 2022-04-07T12:43:56Z

Hi,

Thanks for your interest.

We do not support multi-node training in this repo. If you are interested, you can refer to the Deformable-DETR repo and make the necessary modifications.
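For readers who want to try multi-node training anyway, the first step is usually working out each process's rank and the world size from the launcher's environment. The sketch below is an illustration only, not code from this repo or from Deformable-DETR: `resolve_distributed_env` is a hypothetical helper. It reads the `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` variables that `torchrun` sets, and falls back to the `SLURM_PROCID`, `SLURM_NTASKS`, and `SLURM_LOCALID` variables set by SLURM.

```python
import os


def resolve_distributed_env(env=None):
    """Hypothetical helper: derive rank info from launcher environment variables."""
    env = os.environ if env is None else env

    # torchrun-style launchers export RANK / WORLD_SIZE / LOCAL_RANK.
    if "RANK" in env and "WORLD_SIZE" in env:
        return {
            "rank": int(env["RANK"]),
            "world_size": int(env["WORLD_SIZE"]),
            "local_rank": int(env.get("LOCAL_RANK", 0)),
            "distributed": True,
        }

    # SLURM exports one set of variables per task (process).
    if "SLURM_PROCID" in env:
        return {
            "rank": int(env["SLURM_PROCID"]),
            "world_size": int(env.get("SLURM_NTASKS", 1)),
            "local_rank": int(env.get("SLURM_LOCALID", 0)),
            "distributed": True,
        }

    # No launcher detected: fall back to a single process.
    return {"rank": 0, "world_size": 1, "local_rank": 0, "distributed": False}
```

The returned values would then be passed on to whatever process-group initialisation the training script uses; that part depends on the repo and is not shown here.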

We run the fine-tuning stage on 8 V100 GPUs with 32 GB of memory each. It also takes ~2 hours per epoch.
For the pretraining stage, we run on 32 V100 GPUs for around 1 to 2 days. In fact, we have already released the data conversion code and the main file for pretraining. We may provide more detailed instructions on pretraining in the future.

As the model has been pretrained, the results should be stable. Because evaluation is not convenient, we have not tested other configurations. Empirically, longer training leads to slightly higher performance.
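One common way to keep results comparable when using fewer GPUs is to hold the effective batch size constant with gradient accumulation. The sketch below is not something the thread confirms for this repo: `accumulation_steps` is a hypothetical helper, and the per-GPU batch size of 1 in the usage line is an assumed value, since the thread does not state it.

```python
def accumulation_steps(ref_gpus, ref_per_gpu_batch, gpus, per_gpu_batch):
    """Hypothetical helper: steps to accumulate so that the effective
    batch size matches a reference setup."""
    ref_total = ref_gpus * ref_per_gpu_batch  # reference effective batch size
    per_step = gpus * per_gpu_batch           # samples processed per step now
    if ref_total % per_step != 0:
        raise ValueError("reference batch size is not a multiple of the current one")
    return ref_total // per_step


# Assumed per-GPU batch size of 1: going from 8 GPUs to 4 GPUs
# would need 2 accumulation steps per optimizer update.
steps = accumulation_steps(ref_gpus=8, ref_per_gpu_batch=1, gpus=4, per_gpu_batch=1)
```

Whether matching the effective batch size is needed for this model has not been tested in the thread, so treat this as a starting point rather than a recommendation.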

pvoigtlaender · 2022-04-07T13:26:28Z

Great, thank you for the details :)
I might come back with more questions at a later point.

pvoigtlaender closed this as completed Apr 7, 2022
