This repository was archived by the owner on Jun 5, 2024. It is now read-only.

Commit 51a136d (1 parent: 943795e)

Added EfficientDet

File tree

3 files changed: +49 −2 lines

COMMANDS.md (+40)
@@ -14,3 +14,43 @@ $ python obj_detect_tracking.py \
```
This is for processing AVI videos. For MP4 videos, run without `--use_lijun_video_loader`.
Add `--log_time_and_gpu` to get GPU utilization and a time profile.


## 04-2020, added EfficientDet

[EfficientDet (CVPR 2020)](https://github.com/google/automl/tree/master/efficientdet) is reported to be more than 14 mAP better on COCO than the Resnet-50 FPN model we used.

I have made the following changes based on the code from early March:

+ The original code assumes width == height and pads a (1280x720) frame to (1280x1280) at the beginning, which wastes a lot of computation; see [this issue](https://github.com/google/automl/issues/162). This is an easy fix. Note that I make sure the image sizes are multiples of 128 (2^7), with some padding, so a (1280x720) input becomes (1280x768).
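The padding rule described above can be sketched as follows (a minimal illustration, not the repository's actual code; the function name is hypothetical):

```python
def pad_to_multiple(height, width, multiple=128):
    """Round each dimension up to the nearest multiple (here 128 = 2^7),
    so repeated downsampling by 2 up to seven times still yields integer
    feature-map sizes at every FPN level."""
    pad_h = (multiple - height % multiple) % multiple
    pad_w = (multiple - width % multiple) % multiple
    return height + pad_h, width + pad_w

# A 720x1280 frame is padded to 768x1280 instead of 1280x1280.
print(pad_to_multiple(720, 1280))  # → (768, 1280)
```

Compared with square padding, this keeps the wasted area to at most one stride's worth per dimension.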
+ Added multi-level ROI align on the final detection boxes, since we need the FPN box features for deep-SORT tracking. Because one-stage object detection models make box predictions at each feature level, I added a level index variable that keeps track of each box's feature level, so that in the end each box can be efficiently traced back to the feature map it came from to crop its features.
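The level-index bookkeeping can be sketched like this (an illustrative sketch with hypothetical names, not the repository's actual implementation):

```python
import numpy as np

def gather_level_indices(boxes_per_level):
    """Concatenate per-level box predictions while recording which FPN
    level each box came from, so that after NMS the surviving boxes can
    be traced back to the right feature map for ROI align.

    boxes_per_level: list of (N_l, 4) arrays, one per FPN level.
    Returns (all_boxes, levels) with levels[i] = FPN level of box i."""
    boxes = np.concatenate(boxes_per_level, axis=0)
    levels = np.concatenate([
        np.full(len(b), l, dtype=np.int64)
        for l, b in enumerate(boxes_per_level)
    ])
    return boxes, levels

# Two levels with 3 and 2 boxes; levels[keep] after NMS tells us
# which feature map to crop each surviving box's features from.
boxes, levels = gather_level_indices([np.zeros((3, 4)), np.zeros((2, 4))])
print(levels)  # → [0 0 0 1 1]
```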
+ Similar to the MaskRCNN model, I modified EfficientDet so that NMS can be run on only a subset of the COCO classes (currently we only care about person and vehicle), which saves computation.
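One simple way to realize partial-class NMS is to zero out the score columns of unwanted classes so per-class NMS has nothing to process for them. This is only an illustrative sketch under that assumption (the class ids and names below are examples, not the repository's mapping):

```python
import numpy as np

KEEP_IDS = {0, 2, 5, 7}  # e.g. person and some vehicle classes (example ids)

def keep_partial_classes(class_scores, keep_ids):
    """class_scores: (num_boxes, num_classes) array of per-class scores.
    Classes not in keep_ids get score 0, so any downstream per-class NMS
    and thresholding skip them entirely."""
    mask = np.zeros(class_scores.shape[1], dtype=bool)
    mask[list(keep_ids)] = True
    return class_scores * mask  # broadcasts the column mask over boxes

scores = np.random.rand(10, 90)
filtered = keep_partial_classes(scores, KEEP_IDS)
```

Kept columns pass through unchanged, so detection quality on the classes of interest is unaffected.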
+ Separated out the tf.py_func parts, since that part of the graph cannot be saved to a .pb model. (The official EfficientDet code is still actively being developed and this problem seems to have been solved. Will look into this later.)

Example command \[[d0 model from early March](https://aladdin-eax.inf.cs.cmu.edu/shares/diva_obj_detect_models/models/efficientdet-d0.tar.gz)\]:
```
$ python obj_detect_tracking.py \
--model_path efficientdet-d0 \
--efficientdet_modelname efficientdet-d0 --is_efficientdet \
--efficientdet_max_detection_topk 1000 \
--video_dir videos --tracking_dir output/ --video_lst_file videos.lst \
--version 2 --is_coco_model --use_partial_classes --frame_gap 8 \
--get_tracking --tracking_objs Person,Vehicle --min_confidence 0.6 \
--max_size 1280 --short_edge_size 720 \
--use_lijun_video_loader --nms_max_overlap 0.85 --max_iou_distance 0.5 \
--max_cosine_distance 0.5 --nn_budget 5
```
This is for processing AVI videos. For MP4 videos, run without `--use_lijun_video_loader`.
Add `--log_time_and_gpu` to get GPU utilization and a time profile.

Example command with a partial frozen graph \[[d0-TFv1.15](https://aladdin-eax.inf.cs.cmu.edu/shares/diva_obj_detect_models/models/efficientd0_tfv1.15_1280x720.pb)\]:
```
$ python obj_detect_tracking.py \
--model_path efficientd0_tfv1.15_1280x720.pb --is_load_from_pb \
--efficientdet_modelname efficientdet-d0 --is_efficientdet \
--efficientdet_max_detection_topk 1000 \
--video_dir videos --tracking_dir output/ --video_lst_file videos.lst \
--version 2 --is_coco_model --use_partial_classes --frame_gap 8 \
--get_tracking --tracking_objs Person,Vehicle --min_confidence 0.6 \
--max_size 1280 --short_edge_size 720 \
--use_lijun_video_loader --nms_max_overlap 0.85 --max_iou_distance 0.5 \
--max_cosine_distance 0.5 --nn_budget 5
```

README.md (+4 −2)
@@ -35,9 +35,11 @@ We utilize state-of-the-art object detection and tracking algorithm in surveilla
</div>

## Updates
+ [04/2020] Added [EfficientDet (CVPR 2020)](https://github.com/google/automl/tree/master/efficientdet) for inference. It is reported to be more than 14 mAP better than the Resnet-50 FPN model we used. Modified to be more efficient, and tested with Python 2 & 3 and TF 1.15. See example commands [here](COMMANDS.md).

+ [02/2020] We used a Resnet-50 FPN model trained on MS-COCO for [MEVA](http://mevadata.org/) activity detection and got a competitive pAUDC of [0.49](images/inf_actev_0.49audc_02-2020.png) on the [leaderboard](https://actev.nist.gov/sdl), with a total processing speed of 0.64x real-time on a 4-GPU machine. The object detection module's processing speed is about 0.125x real-time. \[[Frozen Model](https://aladdin-eax.inf.cs.cmu.edu/shares/diva_obj_detect_models/models/obj_coco_resnet50_partial_tfv1.14_1280x720_rpn300.pb)\] \[[Example Command](COMMANDS.md)\]

+ [01/2020] We discovered a problem with using OpenCV to extract frames from AVI videos. Some AVI videos have duplicate frames that are not physically present in the files; they are only instructions to repeat previous frames. The problem is that OpenCV skips these frames without warning, according to [this bug report](https://github.com/opencv/opencv/issues/9053) and [this discussion](https://stackoverflow.com/questions/44488636/opencv-reading-frames-from-videocapture-advances-the-video-to-bizarrely-wrong-l/44551037). Therefore with OpenCV you may get fewer frames, which makes the frame indices of the detection results incorrect. Solutions: 1. convert the AVI videos to MP4 format; 2. use MoviePy or [Lijun's video loader](https://github.com/Lijun-Yu/diva_io) (based on PyAV), although they are 10% ~ 30% slower than OpenCV frame extraction. See `obj_detect_tracking.py` for the implementation.

## Dependencies
The code was originally written for Tensorflow v1.10 with Python 2/3, but it works on v1.13.1, too. Note that I didn't change the code for v1.13.1; I just disabled Tensorflow warnings and logging. I have also tested this on tf v1.14.0 (the ResNeXt backbone needs >=1.14 for group convolution support).

obj_detect_tracking.py (+5)
@@ -20,6 +20,11 @@
import logging
logging.getLogger("tensorflow").disabled = True

import matplotlib
# avoid the warning "gdk_cursor_new_for_display:
# assertion 'GDK_IS_DISPLAY (display)' failed" with Python 3
matplotlib.use('Agg')

from tqdm import tqdm

import numpy as np
