[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
video-understanding
multimodal-learning
vision-and-language
visual-grounding
spatio-temporal-video-grounding
stvg
vidstg
hc-stvg
Updated Sep 24, 2023 - Python