This repository was archived by the owner on Jun 5, 2024. It is now read-only.

This is for processing AVI videos. For MP4 videos, run without `--use_lijun`.

Add `--log_time_and_gpu` to get GPU utilization and time profile.

## 04-2020, added EfficientDet

[EfficientDet (CVPR 2020)](https://github.com/google/automl/tree/master/efficientdet) is reported to be more than 14 mAP better on COCO than the ResNet-50 FPN model we used.

I have made the following changes based on the code from early March:

+ The original code assumes width == height and pads a 1280x720 frame to 1280x1280 at the beginning, which wastes a lot of computation. See [this issue](https://github.com/google/automl/issues/162). This is an easy fix. Note that I make sure the image sizes are multiples of 128 (2^7) by padding, so 1280x720 inputs become 1280x768.
+ Added multi-level ROI Align on the final detection boxes, since we need the FPN box features for deep-SORT tracking. Because one-stage object detection models predict boxes at every feature level, I added a level-index variable that records each box's feature level, so that at the end every box can be efficiently traced back to its original feature map to crop its features.
+ Similar to the MaskRCNN model, I modified EfficientDet to run NMS on only a subset of the COCO classes (currently we only care about person and vehicle) to save computation.
+ Separated the `tf.py_func` parts, since that part of the graph cannot be saved to a .pb model. (The official EfficientDet code is still under active development and this problem seems to have been solved there. I will look into this later.)
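
The padding rule from the first change can be sketched as follows. This is a minimal NumPy sketch rather than the repo's code: the helper names `padded_size` and `pad_to_multiple` are hypothetical, and zero-filling on the bottom and right edges is an assumption.

```python
import math

import numpy as np


def padded_size(width, height, stride=128):
    """Round each side up to the nearest multiple of `stride` (128 = 2**7)."""
    new_w = int(math.ceil(width / float(stride))) * stride
    new_h = int(math.ceil(height / float(stride))) * stride
    return new_w, new_h


def pad_to_multiple(image, stride=128):
    """Zero-pad an HxW(xC) frame so both sides are multiples of `stride`.

    Hypothetical helper: the zero fill and bottom/right placement are assumptions.
    """
    h, w = image.shape[:2]
    new_w, new_h = padded_size(w, h, stride)
    padded = np.zeros((new_h, new_w) + image.shape[2:], dtype=image.dtype)
    padded[:h, :w] = image  # original pixels stay in the top-left corner
    return padded


frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # a 1280x720 frame (H=720, W=1280)
print(padded_size(1280, 720))        # (1280, 768)
print(pad_to_multiple(frame).shape)  # (768, 1280, 3)
```

Compared with padding to a 1280x1280 square, the 1280x768 input has 60% as many pixels.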
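
The level-index bookkeeping from the second change can be illustrated with a small NumPy sketch. It assumes that level `l` has stride `2**l` (EfficientDet's P3 to P7, i.e. strides 8 to 128) and that every kept box carries the level it was predicted at. For brevity it average-pools the box region instead of doing true bilinear ROI Align, so treat it as an illustration of the backtracking step only; `crop_box_features` is a hypothetical name.

```python
import numpy as np


def crop_box_features(boxes, levels, feature_maps):
    """Gather one feature vector per box from the FPN level that predicted it.

    boxes:        (N, 4) array of [x1, y1, x2, y2] in input-image pixels (after NMS).
    levels:       (N,) int array; levels[i] is the FPN level that produced boxes[i].
    feature_maps: dict {level: (H_l, W_l, C) array}; stride at level l assumed 2**l.
    Returns an (N, C) array in the same order as `boxes`.

    Average pooling over the box region stands in for bilinear ROI Align here.
    """
    channels = next(iter(feature_maps.values())).shape[-1]
    out = np.zeros((len(boxes), channels), dtype=np.float32)
    # Group boxes by their recorded level so each feature map is looked up once.
    for level in np.unique(levels):
        fmap = feature_maps[int(level)]
        stride = 2 ** int(level)
        fh, fw = fmap.shape[:2]
        for i in np.where(levels == level)[0]:
            x1, y1, x2, y2 = boxes[i] / float(stride)  # image -> feature-map coords
            c0 = int(np.clip(np.floor(x1), 0, fw - 1))
            r0 = int(np.clip(np.floor(y1), 0, fh - 1))
            c1 = int(np.clip(np.ceil(x2), c0 + 1, fw))  # at least one cell wide
            r1 = int(np.clip(np.ceil(y2), r0 + 1, fh))  # at least one cell tall
            out[i] = fmap[r0:r1, c0:c1].mean(axis=(0, 1))
    return out


# Toy feature maps for a 1280x768 input: level 3 is all ones, level 4 all twos.
fmaps = {3: np.ones((96, 160, 4)), 4: np.full((48, 80, 4), 2.0)}
boxes = np.array([[0, 0, 64, 64], [100, 100, 300, 300]], dtype=np.float32)
levels = np.array([3, 4])
feats = crop_box_features(boxes, levels, fmaps)
print(feats[:, 0])  # [1. 2.]
```

Because the level travels with each box through NMS, no search over feature maps is needed at the end; the box's own index says where to crop.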
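
Running NMS on only some classes, as in the third change, can be sketched with a plain greedy NMS in NumPy. This is not the modified EfficientDet graph code: `class_subset_nms` and the class ids in the usage lines are hypothetical, and which ids mean person or vehicle depends on the model's label map. Detections of classes outside `keep_classes` are dropped without ever entering NMS, which is where the computation is saved.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (M, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)


def nms(boxes, scores, iou_thresh):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return np.array(keep, dtype=np.int64)


def class_subset_nms(boxes, scores, classes, keep_classes, iou_thresh=0.5):
    """Per-class NMS restricted to `keep_classes`; other classes are discarded."""
    kept = []
    for c in keep_classes:
        idx = np.where(classes == c)[0]
        if idx.size == 0:
            continue
        kept.extend(idx[nms(boxes[idx], scores[idx], iou_thresh)])
    kept = np.array(kept, dtype=np.int64)
    return kept[np.argsort(-scores[kept])]  # final list sorted by score


boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [0, 0, 10, 10], [50, 50, 60, 60]],
                 dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7, 0.95])
classes = np.array([1, 1, 3, 5])  # hypothetical ids; class 5 is not wanted
keep = class_subset_nms(boxes, scores, classes, keep_classes=[1, 3])
print(keep)  # [0 2]
```

Box 1 is suppressed by box 0 (same class, IoU 0.81), box 2 survives because it belongs to another class, and box 3 is dropped before NMS because its class is not requested.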

Example command \[[d0 model from early March](https://aladdin-eax.inf.cs.cmu.edu/shares/diva_obj_detect_models/models/efficientdet-d0.tar.gz)\]:

This is for processing AVI videos. For MP4 videos, run without `--use_lijun`.

Add `--log_time_and_gpu` to get GPU utilization and time profile.

Example command with a partial frozen graph \[[d0-TFv1.15](https://aladdin-eax.inf.cs.cmu.edu/shares/diva_obj_detect_models/models/efficientd0_tfv1.15_1280x720.pb)\]:
</div>
## Updates

[04/2020] Added [EfficientDet (CVPR 2020)](https://github.com/google/automl/tree/master/efficientdet) for inference. It is reported to be more than 14 mAP better than the ResNet-50 FPN model we used. Modified for efficiency and tested with Python 2 & 3 and TF 1.15. See example commands [here](COMMANDS.md).

[02/2020] We used a ResNet-50 FPN model trained on MS-COCO for [MEVA](http://mevadata.org/) activity detection and got a competitive pAUDC of [0.49](images/inf_actev_0.49audc_02-2020.png) on the [leaderboard](https://actev.nist.gov/sdl) with a total processing speed of 0.64x real-time on a 4-GPU machine. The object detection module's processing speed is about 0.125x real-time. \[[Frozen Model](https://aladdin-eax.inf.cs.cmu.edu/shares/diva_obj_detect_models/models/obj_coco_resnet50_partial_tfv1.14_1280x720_rpn300.pb)\]\[[Example Command](COMMANDS.md)\]

[01/2020] We discovered a problem with using OpenCV to extract frames from AVI videos. Some AVI videos have duplicate frames that are not physically present in the files; the files only contain instructions to duplicate the previous frame. According to [this bug report](https://github.com/opencv/opencv/issues/9053) and [this Stack Overflow thread](https://stackoverflow.com/questions/44488636/opencv-reading-frames-from-videocapture-advances-the-video-to-bizarrely-wrong-l/44551037), OpenCV skips these frames without warning. Therefore, with OpenCV you may get fewer frames, which makes the frame indices of the detection results incorrect. Solution: 1. convert the AVI videos to MP4 format; or 2. use MoviePy or [Lijun's video loader](https://github.com/Lijun-Yu/diva_io) (based on PyAV), although they are 10% ~ 30% slower than OpenCV frame extraction. See `obj_detect_tracking.py` for implementation.
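
One way to check whether a reader dropped frames, as described above, is to derive each frame's index from its presentation timestamp instead of from a running counter. The sketch below assumes you already have per-frame timestamps in seconds and a constant frame rate; `frame_indices_from_timestamps` and `find_missing_frames` are hypothetical helpers, and whether a given reader reports trustworthy timestamps is something to verify for that reader.

```python
def frame_indices_from_timestamps(timestamps_sec, fps):
    """Map each decoded frame's presentation time (seconds) to its frame index."""
    return [int(round(t * fps)) for t in timestamps_sec]


def find_missing_frames(timestamps_sec, fps):
    """Return indices up to the last decoded frame that never showed up.

    A running counter (0, 1, 2, ...) would silently shift every index after a gap;
    deriving the index from the timestamp keeps the later indices correct.
    """
    seen = set(frame_indices_from_timestamps(timestamps_sec, fps))
    if not seen:
        return []
    return [i for i in range(max(seen) + 1) if i not in seen]


# Hypothetical 30 fps clip where the decoder skipped the duplicated frame 2.
fps = 30.0
timestamps = [0.0, 1 / 30.0, 3 / 30.0, 4 / 30.0]
print(frame_indices_from_timestamps(timestamps, fps))  # [0, 1, 3, 4]
print(find_missing_frames(timestamps, fps))            # [2]
```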
## Dependencies

The code was originally written for TensorFlow v1.10 with Python 2/3, but it also works on v1.13.1. Note that I didn't change the code for v1.13.1; I just disabled TensorFlow warnings and logging. I have also tested it on TF v1.14.0 (the ResNeXt backbone needs >=1.14 for group convolution support).
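
For reference, one common way to silence TensorFlow 1.x logging looks like the sketch below; it is not necessarily what this repo does. `TF_CPP_MIN_LOG_LEVEL` must be set before TensorFlow is imported, and the Python-side logger lives at `tf.compat.v1.logging` on newer releases or `tf.logging` on older 1.x releases.

```python
import os
import warnings

# Must be set before TensorFlow is imported: "2" hides C++ INFO and WARNING messages.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
# Hide Python-level deprecation noise (e.g. FutureWarning from older NumPy/TF combos).
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)

try:
    import tensorflow as tf
except ImportError:  # lets the sketch run where TensorFlow is not installed
    tf = None

if tf is not None:
    # Prefer tf.compat.v1.logging when it exists, else fall back to tf.logging.
    compat_v1 = getattr(getattr(tf, "compat", None), "v1", None)
    tf_logging = compat_v1.logging if compat_v1 is not None else tf.logging
    tf_logging.set_verbosity(tf_logging.ERROR)  # only report errors from Python logging
```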