The official PyTorch implementation of Spatio-Temporal Context Prompting for Zero-Shot Action Detection (WACV'25).
Wei-Jhe Huang1, Min-Hung Chen2, Shang-Hong Lai1
1National Tsing Hua University, 2NVIDIA Research Taiwan
This work proposes Spatio-Temporal Context Prompting for Zero-Shot Action Detection (ST-CLIP), which aims to adapt the pretrained image-language model to detect unseen actions. We propose the Person-Context Interaction which employs pretrained knowledge to model the relationship between people and their surroundings, and the Context Prompting module which can utilize visual information to augment the text content. To address multi-action videos, we further introduce the Interest Token Spotting mechanism to identify the visual tokens most relevant to each individual action. To evaluate the ability to detect unseen actions, we propose a comprehensive benchmark on different datasets. The experiments show that our method achieves superior results compared to previous approaches and can be further extended to multi-action videos.
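As a rough illustration of the general idea behind zero-shot recognition with an image-language model, the sketch below scores one person-level visual feature against text embeddings of action labels using cosine similarity. It is not the ST-CLIP implementation: the function names, the toy 3-dimensional vectors, the label set, and the temperature value are all assumptions made for illustration, and it does not model Person-Context Interaction, Context Prompting, or Interest Token Spotting.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_scores(person_feature, text_features, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities.

    `temperature` is an assumed value, not one taken from this repository.
    """
    logits = {
        label: cosine(person_feature, feature) / temperature
        for label, feature in text_features.items()
    }
    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(logits.values())
    exps = {label: math.exp(value - max_logit) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Hypothetical text embeddings for action labels that were never seen in training.
text_features = {
    "golf": [1.0, 0.0, 0.0],
    "pour": [0.0, 1.0, 0.0],
    "climb stairs": [0.0, 0.0, 1.0],
}

# Hypothetical visual feature for one detected person.
person_feature = [0.1, 0.9, 0.2]

scores = zero_shot_scores(person_feature, text_features)
best = max(scores, key=scores.get)
print(best, scores)
```

Because the label set is supplied only as text at inference time, new actions can be scored without retraining, which is the property the zero-shot benchmark is meant to evaluate.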
For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.
Please check INSTALL.md to install the environment.
Please check DATA.md to prepare the data needed for training or inference. Data for J-HMDB is provided now; data for the other datasets will be released as soon as possible.
Please check GETTING_STARTED.md for the training/inference instructions.
We are very grateful to the authors of HIT and X-CLIP for open-sourcing their code, on which this repository is heavily based. If you find these works useful, please consider citing their papers as well.
If this project helps you in your research or project, please cite this paper:
@article{huang2024spatio,
  title={Spatio-Temporal Context Prompting for Zero-Shot Action Detection},
  author={Huang, Wei-Jhe and Chen, Min-Hung and Lai, Shang-Hong},
  journal={arXiv preprint arXiv:2408.15996},
  year={2024}
}
Copyright © 2025, NVIDIA Corporation. All rights reserved.
This work is made available under the NVIDIA Source Code License-NC. Click here to view a copy of this license.
