[WACV'25] Spatio-Temporal Context Prompting for Zero-Shot Action Detection (ST-CLIP)

The Official PyTorch implementation of Spatio-Temporal Context Prompting for Zero-Shot Action Detection (WACV'25).

Wei-Jhe Huang¹, Min-Hung Chen², Shang-Hong Lai¹
¹National Tsing Hua University, ²NVIDIA Research Taiwan

[Paper] [Website] [BibTeX]

This work proposes Spatio-Temporal Context Prompting for Zero-Shot Action Detection (ST-CLIP), which aims to adapt the pretrained image-language model to detect unseen actions. We propose the Person-Context Interaction which employs pretrained knowledge to model the relationship between people and their surroundings, and the Context Prompting module which can utilize visual information to augment the text content. To address multi-action videos, we further introduce the Interest Token Spotting mechanism to identify the visual tokens most relevant to each individual action. To evaluate the ability to detect unseen actions, we propose a comprehensive benchmark on different datasets. The experiments show that our method achieves superior results compared to previous approaches and can be further extended to multi-action videos.
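At its core, the zero-shot recipe described above follows the CLIP paradigm: a detected person's visual feature is compared against text embeddings of candidate action names, so unseen actions can be scored without retraining. The sketch below illustrates that general matching step only; the function name, shapes, and temperature value are illustrative assumptions, not ST-CLIP's actual API.

```python
import numpy as np

def zero_shot_action_scores(person_feat, text_feats, temperature=0.07):
    """Score candidate actions for one detected person (illustrative sketch).

    person_feat: (D,) visual feature for the person.
    text_feats:  (C, D) text embeddings, one per action name.
    Returns a (C,) probability distribution over actions, obtained by
    cosine similarity followed by a temperature-scaled softmax --
    the standard CLIP-style zero-shot classification step.
    """
    # L2-normalize both sides so the dot product is cosine similarity.
    p = person_feat / np.linalg.norm(person_feat)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = (t @ p) / temperature
    # Numerically stable softmax over the candidate actions.
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

Because the action classes enter only through their text embeddings, swapping in embeddings of unseen action names is all that is needed at inference time.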

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.

Installation

Please check INSTALL.md to install the environment.

Data Preparation

Please check DATA.md to prepare data needed for training or inference. We provide the data for J-HMDB first, and the others will be released as soon as possible.

Training and Inference

Please check GETTING_STARTED.md for the training/inference instructions.

Acknowledgement

We are very grateful to the authors of HIT and X-CLIP for open-sourcing their code, on which this repository is heavily based. If you find their research useful, please consider citing their papers as well.

Citation

If this project helps you in your research or project, please cite this paper:

@article{huang2024spatio,
  title={Spatio-Temporal Context Prompting for Zero-Shot Action Detection},
  author={Huang, Wei-Jhe and Chen, Min-Hung and Lai, Shang-Hong},
  journal={arXiv preprint arXiv:2408.15996},
  year={2024}
}

Licenses

Copyright © 2025, NVIDIA Corporation. All rights reserved.

This work is made available under the NVIDIA Source Code License-NC. Click here to view a copy of this license.
