462_Gaze-LLE

Gaze-LLE provides a streamlined gaze architecture that learns only a lightweight gaze decoder on top of a frozen, pretrained visual encoder (DINOv2). It learns 1-2 orders of magnitude fewer parameters than prior work and does not require extra input modalities such as depth or pose.


The processing time in the demo video appears to be around 30 ms per frame because heatmap rendering with Pillow is slow. If you disable heatmap rendering by pressing the A key on the keyboard, the processing time drops to around 17 ms, of which about 10 ms is YOLOv9-E inference. If processing speed is a priority, it is better to use YOLOv9-N or YOLOv9-T.
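
For reference, the heatmap overlay corresponds roughly to the following kind of Pillow processing. This is only a sketch of the assumed flow (resize, tint, alpha blend), not the demo's exact implementation; the per-frame image conversion and blending in Python is what makes this step comparatively expensive.

    # Rough sketch of a Pillow-based heatmap overlay (assumed flow, not the demo's exact code).
    import numpy as np
    from PIL import Image

    def overlay_heatmap(frame_rgb: np.ndarray, heatmap: np.ndarray, alpha: float = 0.4) -> np.ndarray:
        """frame_rgb: HxWx3 uint8 frame, heatmap: hxw float map in [0, 1]."""
        h, w = frame_rgb.shape[:2]
        base = Image.fromarray(frame_rgb)
        # Upscale the low-resolution heatmap to the frame size and tint it red.
        hm = Image.fromarray((heatmap * 255.0).astype(np.uint8)).resize((w, h), Image.BILINEAR)
        zero = Image.new("L", (w, h), 0)
        tint = Image.merge("RGB", (hm, zero, zero))
        # The per-frame conversion + blend here accounts for the ~13 ms difference noted above.
        return np.asarray(Image.blend(base, tint, alpha))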

  • Single person test - gazelle_dinov2_vitb14_inout_1x3x448x448_1xNx4.onnx + ONNX-TensorRT

    output_tensorrt.mp4
  • Multi person test - gazelle_dinov2_vitb14_inout_1x3x448x448_1xNx4.onnx + ONNX-TensorRT

    output_perf_multiperson.mp4
  • Gaze estimation test when facing backwards

    output_backword.mp4
  • Switch Heatmap mode (A key)

    output_perf.mp4
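
Outside the demo script, the exported model can also be driven directly with ONNX Runtime. The sketch below is a minimal example for gazelle_dinov2_vitb14_inout_1x3x448x448_1xNx4.onnx; the input order, normalization, and head-box format (assumed here to be one [x1, y1, x2, y2] row per detected head) are assumptions, so verify them with session.get_inputs()/get_outputs() or Netron before relying on them.

    # Minimal ONNX Runtime sketch (tensor names, preprocessing, and box format are assumptions).
    import cv2
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(
        "gazelle_dinov2_vitb14_inout_1x3x448x448_1xNx4.onnx",
        providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    input_names = [i.name for i in session.get_inputs()]    # assumed order: [image, head boxes]
    output_names = [o.name for o in session.get_outputs()]  # assumed: gaze heatmap (+ in/out score)

    frame = cv2.imread("sample.jpg")                          # hypothetical test image
    img = cv2.cvtColor(cv2.resize(frame, (448, 448)), cv2.COLOR_BGR2RGB)
    img = (img.astype(np.float32) / 255.0).transpose(2, 0, 1)[np.newaxis, ...]  # 1x3x448x448

    # One row per detected head. Whether coordinates are pixels or normalized to [0, 1]
    # must be checked against the exported model; a single dummy box is used here.
    head_boxes = np.array([[[100.0, 80.0, 220.0, 200.0]]], dtype=np.float32)     # 1xNx4 (N=1)

    outputs = session.run(output_names, {input_names[0]: img, input_names[1]: head_boxes})
    print([o.shape for o in outputs])  # inspect heatmap / in-out score shapes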

1. Test

  • Python 3.10

  • onnx 1.16.1+

  • onnxruntime-gpu v1.18.1 (TensorRT Execution Provider enabled binary. See: onnxruntime-gpu v1.18.1 + CUDA 12.5 + TensorRT 10.2.0 build (RTX3070))

  • opencv-contrib-python 4.10.0.84+

  • numpy 1.24.3

  • TensorRT 10.2.0.19-1+cuda12.5

  • Pillow or pillow-simd

    # Common ############################################
    pip install opencv-contrib-python numpy onnx
    
    # For ONNX ##########################################
    pip uninstall onnxruntime onnxruntime-gpu
    
    pip install onnxruntime
    or
    pip install onnxruntime-gpu
  • Demonstration of models with built-in post-processing (Float32/Float16). An example invocation follows the options list below.

    usage: demo_yolov9_onnx_gazelle.py
    [-h]
    [-om OBJECT_DETECTION_MODEL]
    [-gm GAZELLE_MODEL]
    (-v VIDEO | -i IMAGES_DIR)
    [-ep {cpu,cuda,tensorrt}]
    [-it {fp16,int8}]
    [-dvw]
    [-dwk]
    [-ost OBJECT_SOCRE_THRESHOLD]
    [-ast ATTRIBUTE_SOCRE_THRESHOLD]
    [-cst CENTROID_SOCRE_THRESHOLD]
    [-dnm]
    [-dgm]
    [-dlr]
    [-dhm]
    [-dah]
    [-drc [DISABLE_RENDER_CLASSIDS ...]]
    [-oyt]
    [-bblw BOUNDING_BOX_LINE_WIDTH]
    
    options:
      -h, --help
        show this help message and exit
      -om OBJECT_DETECTION_MODEL, --object_detection_model OBJECT_DETECTION_MODEL
        ONNX/TFLite file path for YOLOv9.
      -gm GAZELLE_MODEL, --gazelle_model GAZELLE_MODEL
        ONNX/TFLite file path for Gaze-LLE.
      -v VIDEO, --video VIDEO
        Video file path or camera index.
      -i IMAGES_DIR, --images_dir IMAGES_DIR
        jpg, png images folder path.
      -ep {cpu,cuda,tensorrt}, --execution_provider {cpu,cuda,tensorrt}
        Execution provider for ONNXRuntime.
      -it {fp16,int8}, --inference_type {fp16,int8}
        Inference type. Default: fp16
      -dvw, --disable_video_writer
        Disable video writer. Eliminates the file I/O load associated with automatic recording to MP4.
        On devices that use a MicroSD card or similar for main storage, this can speed up overall processing.
      -dwk, --disable_waitKey
        Disable cv2.waitKey(). When you want to process a batch of still images,
        disable key-input wait and process them continuously.
      -ost OBJECT_SOCRE_THRESHOLD, --object_socre_threshold OBJECT_SOCRE_THRESHOLD
        The detection score threshold for object detection. Default: 0.35
      -ast ATTRIBUTE_SOCRE_THRESHOLD, --attribute_socre_threshold ATTRIBUTE_SOCRE_THRESHOLD
        The attribute score threshold for object detection. Default: 0.70
      -cst CENTROID_SOCRE_THRESHOLD, --centroid_socre_threshold CENTROID_SOCRE_THRESHOLD
        The heatmap centroid score threshold. Default: 0.30
      -dnm, --disable_generation_identification_mode
        Disable generation identification mode. (Press N on the keyboard to switch modes)
      -dgm, --disable_gender_identification_mode
        Disable gender identification mode. (Press G on the keyboard to switch modes)
      -dlr, --disable_left_and_right_hand_identification_mode
        Disable left and right hand identification mode. (Press H on the keyboard to switch modes)
      -dhm, --disable_headpose_identification_mode
        Disable HeadPose identification mode. (Press P on the keyboard to switch modes)
      -dah, --disable_attention_heatmap_mode
        Disable Attention Heatmap mode. (Press A on the keyboard to switch modes)
      -drc [DISABLE_RENDER_CLASSIDS ...], --disable_render_classids [DISABLE_RENDER_CLASSIDS ...]
        Class ID to disable bounding box drawing. List[int]. e.g. -drc 17 18 19
      -oyt, --output_yolo_format_text
        Output YOLO format texts and images.
      -bblw BOUNDING_BOX_LINE_WIDTH, --bounding_box_line_width BOUNDING_BOX_LINE_WIDTH
        Bounding box line width. Default: 2
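
    For example, a run against a webcam with the TensorRT execution provider might look like the following (the YOLOv9 ONNX filename is a placeholder; use the detection model you actually exported or downloaded).

    # Example invocation; replace the YOLOv9 filename with your actual detection model.
    python demo_yolov9_onnx_gazelle.py \
      -om yolov9_e_xxxx.onnx \
      -gm gazelle_dinov2_vitb14_inout_1x3x448x448_1xNx4.onnx \
      -v 0 \
      -ep tensorrt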
    

2. Cited

I am very grateful for their excellent work.

  • Gaze-LLE

    https://github.com/fkryan/gazelle

    @article{ryan2024gazelle,
      author       = {Ryan, Fiona and Bati, Ajay and Lee, Sangmin and Bolya, Daniel and Hoffman, Judy and Rehg, James M},
      title        = {Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders},
      journal      = {arXiv preprint arXiv:2412.09586},
      year         = {2024},
    }

3. License

MIT

4. Ref