[AAAI 2026 Oral] See, Rank and Filter: Important Word-Aware Clip Filtering via Scene Understanding for Moment Retrieval and Highlight Detection

Official Repository for "See, Rank and Filter: Important Word-Aware Clip Filtering via Scene Understanding for Moment Retrieval and Highlight Detection".

Accepted at AAAI 2026 Oral🔥

by YuEun Lee, Jung Uk Kim*

(* indicates corresponding author)


🛠️ Installation

0. Clone this repository

git clone https://github.com/VisualAIKHU/SRF.git
cd SRF

1. Prepare datasets

QVHighlights

TVSum

Charades-STA

2. Install requirements

conda create -n srf python=3.11.8
conda activate srf

pip install -r requirements.txt

🚀 Training

QVHighlights

You can train the model using only video features (train.sh) or both video and audio features (train_audio.sh) by running the corresponding script below.

bash srf/scripts/train.sh
bash srf/scripts/train_audio.sh

Before running either script, set results_root, exp_id, and feat_root, and make sure each feature directory (v_feat_dirs, t_feat_dir, and c_feat_dir) points to the correct location.
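
A wrong feature path is easier to catch before training starts than halfway through a run. The following is a minimal sketch of such a pre-flight check, not part of the repository: the helper name check_feature_dirs and the example paths are hypothetical, and you would pass in whatever values you set for the feature directory variables named above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check. The helper name and all paths are
# placeholders, not the repository's actual defaults.

# Report every argument that is not an existing directory on stderr.
# Returns 0 only when all directories exist.
check_feature_dirs() {
  local missing=0
  local dir
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "missing feature directory: $dir" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example usage (placeholder paths; substitute the values from your script):
# feat_root=/path/to/features
# check_feature_dirs "$feat_root/video" "$feat_root/text" \
#   && bash srf/scripts/train.sh
```

Chaining the check with && means training only starts when every listed directory is present.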

📊 Evaluation

QVHighlights

After training, you can generate hl_val_submission.jsonl and hl_test_submission.jsonl by running the commands below.

bash srf/scripts/inference.sh {results_path}/model_best.ckpt 'val'
bash srf/scripts/inference.sh {results_path}/model_best.ckpt 'test'

where results_path is the path to the saved checkpoint.

For more details on submission, see standalone_eval/README.md
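
Before submitting, it can help to confirm that each generated file exists and contains predictions. The sketch below is an assumption-laden convenience, not part of the repository: count_predictions is a hypothetical helper that relies only on the JSON Lines convention of one record per line, and the file location in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for a JSON Lines submission file.

# Print the number of non-empty lines in the given file.
# Returns non-zero if the file is missing or has zero size.
count_predictions() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  grep -c . "$file"
}

# Example usage (placeholder directory; use your own results path):
# count_predictions /path/to/results/hl_val_submission.jsonl
# count_predictions /path/to/results/hl_test_submission.jsonl
```

This only checks that records are present; it does not validate their contents against the submission format described in standalone_eval/README.md.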

🔖 Citation

@article{lee2025see,
  title={See, Rank, and Filter: Important Word-Aware Clip Filtering via Scene Understanding for Moment Retrieval and Highlight Detection},
  author={Lee, YuEun and Kim, Jung Uk},
  journal={arXiv preprint arXiv:2511.22906},
  year={2025}
}

💛 Acknowledgement

Our code benefits from the excellent TR-DETR.
