
What is the configuration of your computer, such as GPU model and GPU memory size #430

Open · zhongqiu1245 opened this issue Jul 17, 2020 · 5 comments

zhongqiu1245 commented Jul 17, 2020

Thank you for your amazing job!
What is the configuration of your computer, such as GPU model and GPU memory size?
I'm looking forward to your reply.

zylo117 (Owner) commented Jul 17, 2020

2080 Ti, 11 GB

zhongqiu1245 (Author) commented

If I want to train D7, how many GPUs (2080 Ti, 11 GB) should I prepare?

NikZak commented Aug 7, 2020

I believe you would not be able to fit the D7 model on a 2080 Ti. It does not matter how many you have: D7 would not fit into the memory of a single 2080 Ti even with a batch size of 1 (and the whole model has to fit into the memory of each GPU), unless you:

  1. train head-only (a rough sketch of this is included below)
  2. decrease the default resize resolution of input images in train.py
  3. switch the relevant calculations to FP16 (and thus make use of your tensor cores) - I would appreciate it if @zylo117 could give some tips on how to do that in the code

I have a 2080 Ti and I already hit this wall at D5.
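
As a rough illustration of option 1, head-only training in PyTorch could be set up like this. The stand-in model and the 'regressor'/'classifier' name filters are assumptions for the sketch, not this repo's actual identifiers:

  import torch

  # Stand-in for the detector; in the repo this would be the EfficientDet model.
  model = torch.nn.ModuleDict({
      'backbone': torch.nn.Linear(8, 8),
      'regressor': torch.nn.Linear(8, 4),
      'classifier': torch.nn.Linear(8, 4),
  })

  # Freeze everything that does not look like a detection head, so no gradients
  # or optimizer state are kept for the backbone/BiFPN parameters.
  for name, param in model.named_parameters():
      if 'regressor' not in name and 'classifier' not in name:
          param.requires_grad = False

  # Only the still-trainable (head) parameters go to the optimizer.
  optimizer = torch.optim.AdamW(
      (p for p in model.parameters() if p.requires_grad), lr=1e-4)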

zylo117 (Owner) commented Aug 7, 2020

@NikZak It seems training in FP16 is much harder for effdet, based on my previous experiments.
Most models like YOLO and R-CNN can perform well in FP16 even if they are trained in FP32, but effdet doesn't. You can try running coco_eval in FP16 with FP32 weights and the mAP will be half of what it used to be. Probably because there are too many shared parameters, so they have to be more precise.

You can modify train.py following coco_eval.py to train in FP16. BTW, FP16 is not supported when using DataParallel.
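
For illustration, a minimal sketch of what "following coco_eval.py" could look like: cast the model and the inputs to FP16 with .half() and run the training step in half precision. The names below (the toy model, inputs, targets) are placeholders, not the repo's code, and in practice some form of loss scaling is usually needed to keep FP16 gradients from underflowing:

  import torch

  model = torch.nn.Linear(16, 4).cuda().half()       # stand-in for the detector, cast to FP16
  criterion = torch.nn.MSELoss()
  optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

  inputs = torch.randn(2, 16).cuda().half()          # inputs must be cast to FP16 as well
  targets = torch.randn(2, 4).cuda().half()

  optimizer.zero_grad()
  loss = criterion(model(inputs), targets)           # forward and backward run entirely in FP16
  loss.backward()
  optimizer.step()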

NikZak commented Aug 7, 2020

Just investigating the memory issue further:
Here is a thread from the official EfficientDet repository. It looks like memory for GPU training is a big problem:

  • D0 with batch size 8 not trainable on an 11 GB GPU
  • D5 with batch size 1 not trainable on an 11 GB GPU
  • D6 with batch size 1 not trainable on a 24 GB(!) GPU

However, it can be trained on TPU without the same memory issues:

  • D7 with batch size 4 trainable on TPUv3, where each core has 16 GB

Obviously Google cares about TPU, and the nuances of the code could be different.

Someone suggested that the trouble is caused by lines like these:

  # Sum per level losses to total loss.
  cls_loss = tf.add_n(cls_losses)
  box_loss = tf.add_n(box_losses)

since TensorFlow has to keep the activations from each layer alive in order to aggregate a single gradient.

For TensorFlow, there is a possible solution (passing aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N when computing gradients), but it does not seem to work for the EfficientDet implementation, as people are still complaining.
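
For reference, a toy TF1-style sketch of where that argument would go; the loss and shapes are placeholders, not the official repo's code:

  import tensorflow.compat.v1 as tf
  tf.disable_eager_execution()

  # Toy stand-ins for the per-level losses that get summed with tf.add_n.
  x = tf.placeholder(tf.float32, shape=[None, 4])
  w = tf.Variable(tf.ones([4, 1]))
  per_level_losses = [tf.reduce_mean(tf.square(tf.matmul(x, w))) for _ in range(5)]
  total_loss = tf.add_n(per_level_losses)

  # The suggested fix: ask TF to accumulate gradient terms as it goes instead of
  # keeping all of them alive before summing.
  optimizer = tf.train.GradientDescentOptimizer(1e-3)
  grads_and_vars = optimizer.compute_gradients(
      total_loss,
      aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N,
  )
  train_op = optimizer.apply_gradients(grads_and_vars)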

However, a solution must exist, as training (of the official TensorFlow model) works on TPU.
