461_YOLOv9-Phone

Lightweight phone detection models.

output_phone_e.mp4

Demo output with `Objects score threshold >= 0.35`

Training with CD-COCO (Complex Distorted COCO database for Scene-Context-Aware computer vision) has also greatly improved robustness to the following types of noise.

Global distortions
- Noise
- Contrast
- Compression
- Photorealistic Rain
- Photorealistic Haze
- Motion-Blur
- Defocus-Blur
- Backlight illumination
Local distortions
- Motion-Blur
- Defocus-Blur
- Backlight illumination

1. Dataset

2. Test

Python 3.10
onnx 1.16.1+
onnxruntime-gpu v1.18.1 (TensorRT Execution Provider Enabled Binary. See: onnxruntime-gpu v1.18.1 + CUDA 12.5 + TensorRT 10.2.0 build (RTX3070))
opencv-contrib-python 4.10.0.84+
numpy 1.24.3

TensorRT 10.2.0.19-1+cuda12.5

# Common ############################################
pip install opencv-contrib-python numpy onnx

# For ONNX ##########################################
pip uninstall onnxruntime onnxruntime-gpu

pip install onnxruntime
# or
pip install onnxruntime-gpu
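
Which package is installed determines which execution providers ONNX Runtime can actually use. The sketch below is a hypothetical helper, not part of the demo script: it maps the three `-ep` choices to ONNX Runtime provider names and keeps only those reported by `onnxruntime.get_available_providers()`. The fallback order (TensorRT, then CUDA, then CPU) is an assumption made for illustration.

```python
# Hypothetical helper: map the demo's -ep choices to ONNX Runtime provider names.
# The fallback order below is an assumption, not taken from the demo script.
EP_TO_PROVIDERS = {
    "cpu": ["CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "tensorrt": [
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
}


def available_providers():
    """Return the providers supported by the installed ONNX Runtime build.

    Returns an empty list when onnxruntime is not installed.
    """
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return ort.get_available_providers()


def select_providers(execution_provider, available):
    """Return the requested providers that are available, in priority order."""
    wanted = EP_TO_PROVIDERS[execution_provider]
    usable = [name for name in wanted if name in available]
    if not usable:
        raise RuntimeError(
            f"No provider for '{execution_provider}' found in {available}"
        )
    return usable
```

For example, `select_providers("tensorrt", available_providers())` would drop `TensorrtExecutionProvider` on a build that was compiled without TensorRT support.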

Demonstration of models with built-in post-processing (Float32/Float16)

usage:
  demo_yolov9_onnx_phone.py \
  [-h] \
  [-m MODEL] \
  (-v VIDEO | -i IMAGES_DIR) \
  [-ep {cpu,cuda,tensorrt}] \
  [-it {fp16,int8}] \
  [-dvw] \
  [-dwk] \
  [-ost] \
  [-ast] \
  [-dnm] \
  [-dgm] \
  [-dlr] \
  [-dhm] \
  [-oyt]

options:
  -h, --help
    show this help message and exit
  -m MODEL, --model MODEL
    ONNX/TFLite file path for YOLOv9.
  -v VIDEO, --video VIDEO
    Video file path or camera index.
  -i IMAGES_DIR, --images_dir IMAGES_DIR
    jpg, png images folder path.
  -ep {cpu,cuda,tensorrt}, \
      --execution_provider {cpu,cuda,tensorrt}
    Execution provider for ONNXRuntime.
  -it {fp16,int8}, --inference_type {fp16,int8}
    Inference type. Default: fp16
  -dvw, --disable_video_writer
    Disable video writer. Eliminates the file I/O load associated with automatic
    recording to MP4. Devices that use a MicroSD card or similar for main
    storage can speed up overall processing.
  -dwk, --disable_waitKey
    Disable cv2.waitKey(). When you want to process a batch of still images,
    disable key-input wait and process them continuously.
  -dnm, --disable_generation_identification_mode
    Disable generation identification mode.
    (Press N on the keyboard to switch modes)
  -dgm, --disable_gender_identification_mode
    Disable gender identification mode.
    (Press G on the keyboard to switch modes)
  -dlr, --disable_left_and_right_hand_identification_mode
    Disable left and right hand identification mode.
    (Press H on the keyboard to switch modes)
  -dhm, --disable_headpose_identification_mode
    Disable HeadPose identification mode.
    (Press P on the keyboard to switch modes)
  -oyt, --output_yolo_format_text
    Output YOLO format texts and images.
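
The usage line `(-v VIDEO | -i IMAGES_DIR)` means that exactly one of the two input sources must be given. A minimal `argparse` sketch of that rule is shown below. It is a reconstruction for illustration only, not the demo script's actual code; it covers a subset of the options, and the default for `-ep` is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring a subset of the options listed above."""
    parser = argparse.ArgumentParser(prog="demo_yolov9_onnx_phone.py")
    parser.add_argument("-m", "--model", type=str,
                        help="ONNX/TFLite file path for YOLOv9.")

    # Exactly one input source is required: a video/camera or an image folder.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-v", "--video", type=str,
                        help="Video file path or camera index.")
    source.add_argument("-i", "--images_dir", type=str,
                        help="jpg, png images folder path.")

    parser.add_argument("-ep", "--execution_provider",
                        choices=["cpu", "cuda", "tensorrt"],
                        default="cpu",  # assumed default, not stated in the help text
                        help="Execution provider for ONNXRuntime.")
    parser.add_argument("-it", "--inference_type",
                        choices=["fp16", "int8"], default="fp16",
                        help="Inference type. Default: fp16")
    parser.add_argument("-dvw", "--disable_video_writer", action="store_true",
                        help="Disable video writer.")
    parser.add_argument("-dwk", "--disable_waitKey", action="store_true",
                        help="Disable cv2.waitKey().")
    return parser
```

With this sketch, `build_parser().parse_args(["-v", "0", "-ep", "cuda", "-dvw"])` selects camera index `0` on CUDA without MP4 recording, while passing both `-v` and `-i` exits with a usage error.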

YOLOv9-Phone - N - Swish/SiLU (PINTO original implementation, 2.4 MB)

Class Images Instances     P     R mAP50 mAP50-95
  all   2007      2676 0.706 0.517 0.582    0.389

YOLOv9-Phone - T - Swish/SiLU

Class Images Instances     P     R mAP50 mAP50-95
  all   2007      2676 0.792 0.647 0.711    0.505

YOLOv9-Phone - S - Swish/SiLU

Class Images Instances     P     R mAP50 mAP50-95
  all   2007      2676 0.885 0.704 0.792    0.590

YOLOv9-Phone - E - Swish/SiLU

Class Images Instances     P     R mAP50 mAP50-95
  all   2007      2676 0.896 0.732 0.805    0.612

Pre-Process

To ensure fair benchmark comparisons with YOLOX, BGR-to-RGB conversion and normalization (division by 255.0) are built into the model's input stage. A resize step for input images has also been added to improve operational flexibility, so every model can run inference on images of any size.

The string 1x3x{H}x{W} at the end of the file name does not indicate the input image size; it indicates the processing resolution inside the model. Smaller values of {H} and {W} lower the computational cost and speed up inference, while larger values raise the cost and slow inference down. Because this resolution is independent of the image resolution, images of any size can be batch processed, e.g. 240x320, 480x640, 720x1280, ...
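
As a small illustration of the naming convention, the sketch below extracts the internal processing resolution from a model file name and compares two models by pixel count. The function names are hypothetical, the 240x320 file name in the usage note is made up for the example, and treating `H * W` as a proxy for computational cost is only a rough assumption.

```python
import re

# Matches a trailing "_{N}x{C}x{H}x{W}.onnx" in a model file name.
_SHAPE_SUFFIX = re.compile(r"_(\d+)x(\d+)x(\d+)x(\d+)\.onnx$")


def parse_processing_shape(model_path):
    """Return (N, C, H, W), the processing resolution encoded in the file name."""
    match = _SHAPE_SUFFIX.search(model_path)
    if match is None:
        raise ValueError(f"No NxCxHxW suffix found in: {model_path}")
    n, c, h, w = (int(group) for group in match.groups())
    return n, c, h, w


def relative_pixel_cost(model_a, model_b):
    """Return the ratio of internal pixel counts (H * W) of model_a to model_b.

    This is only a rough proxy for relative computational cost.
    """
    _, _, h_a, w_a = parse_processing_shape(model_a)
    _, _, h_b, w_b = parse_processing_shape(model_b)
    return (h_a * w_a) / (h_b * w_b)
```

For `yolov9_e_phone_post_0100_1x3x480x640.onnx` this yields `(1, 3, 480, 640)`; compared with a hypothetical `..._1x3x240x320.onnx` variant, the 480x640 model processes four times as many pixels internally.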

Post-Process

Because I append my own post-processing to the end of the model, and that post-processing can run on TensorRT, CUDA, and CPU, the benchmarked inference speed is the end-to-end speed including all pre-processing and post-processing. EfficientNMS in TensorRT is very slow and should be offloaded to the CPU.

NMS default parameter

|param|value|note|
|:-|-:|:-|
|max_output_boxes_per_class|20|Maximum number of output boxes per class. `20` means that at most `20` objects are detected for each class (for example, up to `20` people, `20` heads, and `20` hands). The larger the number, the more objects can be detected, but inference slows down slightly because of the larger overhead of NMS processing on the CPU. In addition, a larger final output tensor means more data transferred between hardware devices and therefore higher transfer costs. Set this value to the minimum necessary.|
|iou_threshold|0.40|Maximum overlap (IoU) allowed between bounding boxes of the same class. With `0.40`, for example, a box that overlaps a higher-scoring box of the same class by more than 40% is excluded from the detection results. The larger the value, the more overlap is tolerated, but over-detection may increase.|
|score_threshold|0.25|Bounding box confidence threshold. Specify a value from `0.00` to `1.00`. The larger the value, the stricter the filtering and the lower the NMS processing load, but only bounding boxes with high confidence remain in the detection results. This parameter has a very large impact on NMS overhead.|
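
To show how the three parameters interact, here is a simplified greedy NMS for a single class in plain Python. It is a sketch for intuition only, not the ONNX `NonMaxSuppression` implementation embedded in the models; the `(x1, y1, x2, y2)` box format and the function names are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_single_class(boxes, scores,
                     max_output_boxes_per_class=20,
                     iou_threshold=0.40,
                     score_threshold=0.25):
    """Greedy NMS for one class; returns kept indices, highest score first."""
    # 1. score_threshold: discard low-confidence boxes before any IoU work.
    candidates = [i for i, score in enumerate(scores) if score > score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)

    kept = []
    for i in candidates:
        # 2. max_output_boxes_per_class: stop once the output budget is used.
        if len(kept) >= max_output_boxes_per_class:
            break
        # 3. iou_threshold: keep a box only if it does not overlap an
        #    already kept (higher-scoring) box by more than the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept
```

Raising `score_threshold` shrinks the candidate list before the pairwise IoU loop runs, which is why that parameter has the largest effect on NMS overhead.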

Change NMS parameters

Use PINTO0309/sam4onnx to rewrite the NonMaxSuppression parameter in the ONNX file.

For example,

pip install onnxsim==0.4.33 \
&& pip install -U simple-onnx-processing-tools \
&& pip install -U onnx \
&& python -m pip install -U onnx_graphsurgeon \
    --index-url https://pypi.ngc.nvidia.com

### max_output_boxes_per_class
### Example of changing the maximum number of detections per class to 100.
sam4onnx \
--op_name main01_nonmaxsuppression13 \
--input_onnx_file_path yolov9_e_phone_post_0100_1x3x480x640.onnx \
--output_onnx_file_path yolov9_e_phone_post_0100_1x3x480x640.onnx \
--input_constants main01_max_output_boxes_per_class int64 [100]

### iou_threshold
### Example of changing the allowable area of occlusion to 20%.
sam4onnx \
--op_name main01_nonmaxsuppression13 \
--input_onnx_file_path yolov9_e_phone_post_0100_1x3x480x640.onnx \
--output_onnx_file_path yolov9_e_phone_post_0100_1x3x480x640.onnx \
--input_constants main01_iou_threshold float32 [0.20]

### score_threshold
### Example of changing the bounding box score threshold to 15%.
sam4onnx \
--op_name main01_nonmaxsuppression13 \
--input_onnx_file_path yolov9_e_phone_post_0100_1x3x480x640.onnx \
--output_onnx_file_path yolov9_e_phone_post_0100_1x3x480x640.onnx \
--input_constants main01_score_threshold float32 [0.15]
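
When several models need the same change, the three commands above can be generated from Python. The sketch below only assembles the argument lists, using the flags, op name, and constant names shown above; the helper name is hypothetical, and like the commands above it overwrites the input file in place.

```python
import shlex


def build_sam4onnx_command(onnx_path, constant_name, dtype, value,
                           op_name="main01_nonmaxsuppression13"):
    """Return the sam4onnx argument list that rewrites one NMS constant."""
    return [
        "sam4onnx",
        "--op_name", op_name,
        "--input_onnx_file_path", onnx_path,
        "--output_onnx_file_path", onnx_path,  # overwrite in place, as above
        "--input_constants", constant_name, dtype, f"[{value}]",
    ]


MODEL = "yolov9_e_phone_post_0100_1x3x480x640.onnx"

commands = [
    build_sam4onnx_command(MODEL, "main01_max_output_boxes_per_class", "int64", 100),
    build_sam4onnx_command(MODEL, "main01_iou_threshold", "float32", 0.20),
    build_sam4onnx_command(MODEL, "main01_score_threshold", "float32", 0.15),
]

# Print the shell-quoted commands for review before running anything.
for command in commands:
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` once `sam4onnx` is installed.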

Post-processing structure

PyTorch alone cannot generate this post-processing. For operational flexibility, EfficientNMS is not used.

INT8 quantization (YOLOv9-QAT)

3. Citation

If this work has contributed in any way to your research or business, I would be happy if you cited it in your literature.

@software{YOLOv9-Phone,
  author={Katsuya Hyodo},
  title={Lightweight phone detection models.},
  url={https://github.com/PINTO0309/PINTO_model_zoo/tree/main/461_YOLOv9-Phone},
  year={2024},
  month={11},
  doi={10.5281/zenodo.10229410}
}

4. Cited

I am very grateful for their excellent work.

MS-COCO

CD-COCO: Complex Distorted COCO database for Scene-Context-Aware computer vision

@INPROCEEDINGS{10323035,
  author={Beghdadi, Ayman and Beghdadi, Azeddine and Mallem, Malik and Beji, Lotfi and Cheikh, Faouzi Alaya},
  booktitle={2023 11th European Workshop on Visual Information Processing (EUVIP)},
  title={CD-COCO: A Versatile Complex Distorted COCO Database for Scene-Context-Aware Computer Vision},
  year={2023},
  volume={},
  number={},
  pages={1-6},
  doi={10.1109/EUVIP58404.2023.10323035}
}

YOLOv9

https://github.com/WongKinYiu/yolov9

@article{wang2024yolov9,
  title={{YOLOv9}: Learning What You Want to Learn Using Programmable Gradient Information},
  author={Wang, Chien-Yao  and Liao, Hong-Yuan Mark},
  booktitle={arXiv preprint arXiv:2402.13616},
  year={2024}
}

YOLOv9-QAT

https://github.com/levipereira/yolov9-qat

5. License

GPLv3
