[AAAI 2026 Oral] See, Rank and Filter: Important Word-Aware Clip Filtering via Scene Understanding for Moment Retrieval and Highlight Detection
Official Repository for "See, Rank and Filter: Important Word-Aware Clip Filtering via Scene Understanding for Moment Retrieval and Highlight Detection".
Accepted at AAAI 2026 Oral🔥
(* indicates corresponding author)
git clone https://github.com/VisualAIKHU/SRF.git
cd SRF
QVHighlights:
- Download the official feature files for the QVHighlights dataset from Moment-DETR.
- Download `moment_detr_features.tar.gz` (8GB) and extract it under the `../features` directory.
- Additionally, you can download `caption_features_internvl.tar.gz`.

TVSum:
- Download the feature files from UMT.
- Additionally, you can download `TVSum_caption_features.tar.gz`.

Charades-STA:
- Download the feature files from UMT.
- Additionally, you can download `Charades-STA_caption_features.tar.gz`.
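The extraction step above can be sketched in Python (the archive name and `../features` destination follow the instructions above; treat this helper as an illustrative sketch, not part of the repository):

```python
import os
import tarfile

# Hypothetical helper: extract a downloaded feature archive
# (e.g. moment_detr_features.tar.gz) into the ../features directory
# expected by the training scripts.
def extract_features(archive_path, dest="../features"):
    """Extract a .tar.gz archive into dest and return its top-level entries."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    return sorted(os.listdir(dest))
```

For example, `extract_features("moment_detr_features.tar.gz")` unpacks the QVHighlights features so that the feature sub-directories appear under `../features`.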
conda create -n srf python=3.11.8
conda activate srf
pip install -r requirements.txt
You can train the model using only video features, or both video and audio features, by running one of the scripts below.
bash srf/scripts/train.sh
bash srf/scripts/train_audio.sh
You need to modify results_root, exp_id, and feat_root before running the scripts, and make sure each feature directory (v_feat_dirs, t_feat_dir, and c_feat_dir) is set correctly.
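A quick pre-flight check for these directories can be sketched as below. The variable names come from the training scripts, but the sub-directory names under `feat_root` are assumptions — match them to the contents of your extracted feature archives:

```python
import os

# Assumed layout under feat_root; adjust to your extracted archives.
feat_root = "../features"
v_feat_dirs = [os.path.join(feat_root, "slowfast_features"),
               os.path.join(feat_root, "clip_features")]
t_feat_dir = os.path.join(feat_root, "clip_text_features")
c_feat_dir = os.path.join(feat_root, "caption_features_internvl")

def missing_dirs(dirs):
    """Return the subset of directories that do not exist on disk."""
    return [d for d in dirs if not os.path.isdir(d)]

if __name__ == "__main__":
    for d in missing_dirs(v_feat_dirs + [t_feat_dir, c_feat_dir]):
        print(f"missing feature directory: {d}")
```

Running this before `bash srf/scripts/train.sh` surfaces path mistakes early instead of partway through training.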
You can generate hl_val_submission.jsonl and hl_test_submission.jsonl after training by running the commands below.
bash srf/scripts/inference.sh {results_path}/model_best.ckpt 'val'
bash srf/scripts/inference.sh {results_path}/model_best.ckpt 'test'
where {results_path} is the directory containing the saved checkpoint.
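To sanity-check a generated submission file, a minimal JSONL reader like the following can help (the file is one JSON object per line; the field names in the comment are illustrative, not confirmed by this repository):

```python
import json

# Minimal sketch: load a JSONL submission file such as
# hl_val_submission.jsonl into a list of dicts. Each line is one JSON
# object (typically carrying fields like a query id and predictions).
def load_jsonl(path):
    """Parse a JSONL file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `len(load_jsonl("hl_val_submission.jsonl"))` should match the number of validation queries.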
For more details on submission, check standalone_eval/README.md.
@article{lee2025see,
title={See, Rank, and Filter: Important Word-Aware Clip Filtering via Scene Understanding for Moment Retrieval and Highlight Detection},
author={Lee, YuEun and Kim, Jung Uk},
journal={arXiv preprint arXiv:2511.22906},
year={2025}
}
Our code benefits from the excellent TR-DETR.