Why do you get the result more slowly than in the official paper? #77
Comments
I have no idea why the TF version is so slow, but it's true, it is slow. And it's not just me, the community also found that the TF version is much slower. Check this out: it's 16 ms in the paper, but it runs at 200-300 ms, or up to 700 ms on a complex image. My best bet is that there is a bug or an environment issue in TF EffDet.
@mingxingtan Hi, could you please explain the cause of this problem?
@zylo117 Great to see you made the PyTorch version much faster! Congratulations! @AlexeyAB Our paper mainly focuses on the network architecture, so the reported latency is for the network architecture only, excluding post-processing / NMS. When open-sourcing it, I was lazy and just copied the same Python-based post-processing from TF-RetinaNet, which is slow since it runs purely in Python on the CPU. See google/automl#97; I will try to prioritize speeding up post-processing (+ @fsx950223, who is the main contributor for this part).
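For illustration, here is a minimal NumPy NMS along the lines of that classic Python post-processing path. This is not the actual automl code, just a sketch of why a per-image Python/CPU loop becomes the bottleneck once thousands of candidate boxes survive the score threshold:

```python
import numpy as np

def py_nms(boxes, scores, iou_thr=0.5):
    """Plain NumPy NMS: an O(N^2) loop over boxes, entirely on the CPU."""
    order = scores.argsort()[::-1]   # boxes sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the current top box against every remaining box.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thr]   # drop boxes overlapping the kept one
    return keep
```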
@mingxingtan Thanks! Did you develop EfficientDet primarily for GPU or for TPU? So are the reported results measured on GPU or on TPU?
@AlexeyAB @mingxingtan I think the large differences in timing are due to NMS. Python NMS can often be >10X slower than C/CUDA NMS, so if Python NMS is used it can easily dominate the total detection time. PyTorch updated their CPU/GPU NMS code last year, and now NMS operations are very fast, which is probably why this repo is showing faster speeds. Over on https://github.com/ultralytics/yolov3, averaged over the 5000 COCO test images on a V100 using yolov3-spp (at 43 mAP), NMS takes only about 2 ms per image.
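For comparison, the compiled NMS that torchvision exposes (likely the fast path referred to above) is a one-liner; a minimal sketch with random placeholder boxes, where the shapes and threshold are illustrative rather than the values used in the repo:

```python
import torch
from torchvision.ops import nms

# Random placeholder detections: [x1, y1, x2, y2] corner boxes and scores.
boxes = torch.rand(5000, 4, device='cuda') * 512
boxes[:, 2:] += boxes[:, :2]      # ensure x2 > x1 and y2 > y1
scores = torch.rand(5000, device='cuda')

# Runs inside torchvision's C++/CUDA kernel; no Python loop over boxes.
keep = nms(boxes, scores, iou_threshold=0.5)
print(keep.numel(), 'boxes kept')
```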
In the C Darknet implementation https://github.com/AlexeyAB/darknet NMS takes ~0.0 ms, but it depends on the number of detected bboxes. On an RTX 2070: AlexeyAB/darknet#4497 (comment)
@AlexeyAB ah, this is very fast too! In the past, when I had Python NMS, it might take up to 10-30 ms or more per image, so this reduction down to 2 ms is a great speedup for me. The time I posted is for testing (i.e. a very low --conf 0.001), which generates many hundreds or thousands of boxes needing NMS. For regular inference (i.e. --conf 0.50) the NMS cost should be closer to zero, as your number shows. Maybe this makes it especially odd that the TF EfficientDet post-detection time is slow for regular inference. Part of the slowdown is surely due to the grouped convolutions, but that number should already be baked into Table 2, I assume.
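As a sketch of that point, the score threshold is applied before NMS, so it directly controls how many boxes the NMS kernel has to process. The box count and the uniform random scores below are placeholders; a real detector's scores are heavily skewed toward zero, so --conf 0.50 typically leaves only a handful of boxes:

```python
import torch
from torchvision.ops import nms

# Placeholder raw outputs for one image: many low-quality candidates.
boxes = torch.rand(100_000, 4) * 512
boxes[:, 2:] += boxes[:, :2]
scores = torch.rand(100_000)

for conf in (0.001, 0.50):        # test-time vs. regular-inference thresholds
    mask = scores > conf
    keep = nms(boxes[mask], scores[mask], iou_threshold=0.5)
    print(f'--conf {conf}: {int(mask.sum())} candidates -> {keep.numel()} kept')
```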
Could you explain why your official result is much slower than running on my CPU?
I think it's just cuDNN's heuristics making a mistake in algorithm selection. You need to override cuDNN's heuristics to use the "unfused" algorithm. I have seen models like MobileNet, EfficientNet, YOLO, etc. become 10x faster after forcing cuDNN to use it. Since this model uses depthwise convolution, I suspect that this is what's causing the problem.
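One concrete way to sidestep the heuristic in PyTorch is cuDNN benchmark mode, which times every available convolution algorithm for each input shape and caches the fastest one instead of trusting the heuristic choice. A minimal sketch, where the toy depthwise block and shapes are placeholders and this does not pin a specific "unfused" kernel:

```python
import torch
import torch.nn as nn

# Replace cuDNN's heuristic algorithm choice with exhaustive timing per shape.
torch.backends.cudnn.benchmark = True

# Toy depthwise-separable block (placeholder, not the EfficientDet backbone).
model = nn.Sequential(
    nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32),  # depthwise
    nn.Conv2d(32, 64, kernel_size=1),                         # pointwise
).cuda().eval()

x = torch.randn(1, 32, 512, 512, device='cuda')

with torch.no_grad():
    for _ in range(10):                # warm-up triggers the autotuning pass
        model(x)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):
        model(x)
    end.record()
    torch.cuda.synchronize()
    print(f'{start.elapsed_time(end) / 100:.2f} ms per forward pass')
```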
Do you use Yet-Another-EfficientDet-D0 and get the 36.2 FPS using the efficientdet_test code?
@zylo117 Hi!
Nice work!
Can you please explain why you get the result more slowly than in the official paper?
Does the official code https://github.com/google/automl/tree/master/efficientdet not reproduce the results from the article?
From the official paper, D0 runs at 41.67 FPS on a Titan V, which is roughly ~33.33 FPS on an RTX 2080 Ti, while you get only 2.09 FPS with the official D0 on an RTX 2080 Ti.
https://arxiv.org/abs/1911.09070v4
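For reference, converting the quoted FPS figures into per-image latency (the ~33.33 FPS Titan V to RTX 2080 Ti figure above is a rough scaling estimate, not a measurement):

```python
def fps_to_ms(fps: float) -> float:
    """Per-image latency in milliseconds for a given frames-per-second rate."""
    return 1000.0 / fps

print(f'{fps_to_ms(41.67):.1f} ms')  # ~24 ms: the paper's D0 rate on a Titan V
print(f'{fps_to_ms(33.33):.1f} ms')  # ~30 ms: the rough RTX 2080 Ti equivalent
print(f'{fps_to_ms(2.09):.1f} ms')   # ~478 ms: the official D0 speed observed here
```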