
What is the configuration of your computer, such as GPU model and GPU memory size #430

Open · zhongqiu1245 opened this issue Jul 17, 2020 · 5 comments

zhongqiu1245 commented Jul 17, 2020

Thank you for your amazing job!
What is the configuration of your computer, such as GPU model and GPU memory size?
I'm looking forward to your reply.

zylo117 (Owner) commented Jul 17, 2020

2080 Ti, 11 GB

zhongqiu1245 (Author) commented

If I want to train D7, how many GPUs (2080 Ti, 11 GB) should I prepare?

NikZak commented Aug 7, 2020

I believe you would not be able to fit the D7 model on a 2080 Ti. It does not matter how many you have: D7 would not fit into the memory of a single 2080 Ti even with a batch size of 1 (and the whole model has to fit into the memory of each GPU), unless you:

  1. train head-only (a rough sketch of this is included below)
  2. decrease the default resize resolution of input images in train.py
  3. switch the relevant calculations to FP16 (and thus make use of your tensor cores) - I would appreciate it if @zylo117 could give some tips on how to do that in the code

I have a 2080 Ti and I already hit this wall at D5.
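
As a rough illustration of option 1, head-only training in PyTorch could be set up like this. The stand-in model and the 'regressor'/'classifier' name filters are assumptions for the sketch, not this repo's actual identifiers:

  import torch

  # Stand-in for the detector; in the repo this would be the EfficientDet model.
  model = torch.nn.ModuleDict({
      'backbone': torch.nn.Linear(8, 8),
      'regressor': torch.nn.Linear(8, 4),
      'classifier': torch.nn.Linear(8, 4),
  })

  # Freeze everything that does not look like a detection head, so no gradients
  # or optimizer state are kept for the backbone/BiFPN parameters.
  for name, param in model.named_parameters():
      if 'regressor' not in name and 'classifier' not in name:
          param.requires_grad = False

  # Only the still-trainable (head) parameters go to the optimizer.
  optimizer = torch.optim.AdamW(
      (p for p in model.parameters() if p.requires_grad), lr=1e-4)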

zylo117 (Owner) commented Aug 7, 2020

@NikZak It seems training in FP16 is much harder for effdet, based on my previous experiments.
Most models like YOLO and R-CNN can perform well in FP16 even if they are trained in FP32, but effdet doesn't. You can try running coco_eval in FP16 with FP32 weights and the mAP will be half of what it used to be. Probably because there are too many shared parameters, so they have to be more precise.

You can modify train.py following coco_eval.py to train in FP16. BTW, FP16 is not supported when using DataParallel.
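
For illustration, a minimal sketch of what "following coco_eval.py" could look like: cast the model and the inputs to FP16 with .half() and run the training step in half precision. The names below (the toy model, inputs, targets) are placeholders, not the repo's code, and in practice some form of loss scaling is usually needed to keep FP16 gradients from underflowing:

  import torch

  model = torch.nn.Linear(16, 4).cuda().half()       # stand-in for the detector, cast to FP16
  criterion = torch.nn.MSELoss()
  optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

  inputs = torch.randn(2, 16).cuda().half()          # inputs must be cast to FP16 as well
  targets = torch.randn(2, 4).cuda().half()

  optimizer.zero_grad()
  loss = criterion(model(inputs), targets)           # forward and backward run entirely in FP16
  loss.backward()
  optimizer.step()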

NikZak commented Aug 7, 2020

Just investigating the memory issue further:
Here is a thread from the official EfficientDet repository. It looks like memory for GPU training is a big problem:

  • D0 with batch size 8 not trainable on an 11 GB GPU
  • D5 with batch size 1 not trainable on an 11 GB GPU
  • D6 with batch size 1 not trainable on a 24 GB(!) GPU

However, it can be trained on TPU without the same memory issues:

  • D7 with batch size 4 trainable on TPUv3, where each core has 16 GB

Obviously Google cares about TPU, and the nuances of the code could be different.

Someone suggested that the trouble is caused by lines like these:

  # Sum per level losses to total loss.
  cls_loss = tf.add_n(cls_losses)
  box_loss = tf.add_n(box_losses)

since TensorFlow has to keep the activations from each layer alive in order to aggregate a single gradient.

For TensorFlow, there is a possible solution (passing aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N when computing gradients), but it does not seem to work for the EfficientDet implementation, as people are still complaining.
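
For reference, a toy TF1-style sketch of where that argument would go; the loss and shapes are placeholders, not the official repo's code:

  import tensorflow.compat.v1 as tf
  tf.disable_eager_execution()

  # Toy stand-ins for the per-level losses that get summed with tf.add_n.
  x = tf.placeholder(tf.float32, shape=[None, 4])
  w = tf.Variable(tf.ones([4, 1]))
  per_level_losses = [tf.reduce_mean(tf.square(tf.matmul(x, w))) for _ in range(5)]
  total_loss = tf.add_n(per_level_losses)

  # The suggested fix: ask TF to accumulate gradient terms as it goes instead of
  # keeping all of them alive before summing.
  optimizer = tf.train.GradientDescentOptimizer(1e-3)
  grads_and_vars = optimizer.compute_gradients(
      total_loss,
      aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N,
  )
  train_op = optimizer.apply_gradients(grads_and_vars)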

However, a solution must exist, as training (of the official TensorFlow model) works on TPU.
