CLIP-based Context-aware Academic Emotion Recognition

This repo is the official implementation of CLIP-based Context-aware Academic Emotion Recognition [arXiv]. The paper has been accepted to ICCV 2025.

Introduction

In this paper, we propose CLIP-CAER, a context-aware academic emotion recognition method based on CLIP. By leveraging contextual information from learning scenarios, our method significantly improves the model’s ability to recognize students’ learning states (i.e., focused or distracted). Notably, it achieves an approximately 20% improvement in accuracy for the distraction category.
Like the vision-language model CLIP, our framework consists of two main components: a text block and a visual block.
In the text block, a fixed text is pre-generated for each academic emotion category to describe the associated facial expressions and learning contexts. It is complemented by a learnable text prompt that captures additional relevant details during training. The fixed text and the learnable text prompt are fed together into the CLIP text encoder, which yields one text feature token per emotion category.
In the visual block, given an input video, the CLIP image encoder separately extracts facial expression features and context features from each video frame. A temporal encoder module then processes these features to capture their sequential relationships, producing a single visual feature token that represents both the facial expression and the context information in the video.
Because the visual and text feature spaces are aligned in the pre-trained CLIP model, we classify the input video by computing the similarity between its visual feature token and the text feature token of each academic emotion category.
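
The classification step described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the repository's implementation: it assumes the visual feature token and the per-category text feature tokens have already been computed, and the names `classify`, `cosine_similarity`, and `logit_scale`, as well as the toy vectors, are hypothetical. Scaling the cosine similarities by a temperature before the softmax follows the usual CLIP recipe; the value 100.0 is an assumed placeholder.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature tokens of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    visual_token: Sequence[float],
    text_tokens: Dict[str, Sequence[float]],
    logit_scale: float = 100.0,  # assumed temperature, not taken from the repo
) -> Tuple[str, Dict[str, float]]:
    """Pick the category whose text token is most similar to the visual token."""
    labels = list(text_tokens)
    logits = [
        logit_scale * cosine_similarity(visual_token, text_tokens[label])
        for label in labels
    ]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


if __name__ == "__main__":
    # Toy 3-dimensional tokens; real CLIP features are much higher-dimensional.
    visual = [0.9, 0.1, 0.2]
    texts = {"focused": [1.0, 0.0, 0.1], "distracted": [0.0, 1.0, 0.3]}
    label, probabilities = classify(visual, texts)
    print(label, probabilities)
```

With these toy vectors the visual token points almost in the same direction as the "focused" text token, so that category is selected.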

Weights Download

We provide model weights trained with the method described in the paper; they can be downloaded here.

Important: By downloading, accessing, or using the model weights, you agree to be bound by the terms and conditions of the license agreement located in the file weights/LICENSE-WEIGHTS.md, which permits use only for non-commercial research purposes. Please read the license carefully before downloading or using the weights.

If you do not agree to the non-commercial use terms of the license, please do not download or use the model weights.

Performance

Visualizations

Environment

The code is developed and tested under the following environment:

Python 3.8
PyTorch 2.2.2
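CUDA 12.4 (the install command below uses the cu121 wheels, which bundle their own CUDA 12.1 runtime and run on a driver that supports CUDA 12.4)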

conda create -n clip-caer python=3.8
conda activate clip-caer
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
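
To confirm that an existing environment matches the versions listed above, a small check like the following may help. It is a sketch and not part of the repository: `check_environment` and the `EXPECTED_*` constants are hypothetical names, and the script only reports what it finds rather than failing when PyTorch is missing.

```python
import importlib.util
import sys
from typing import Any, Dict

EXPECTED_PYTHON = (3, 8)   # version listed in this README
EXPECTED_TORCH = "2.2.2"   # version listed in this README


def check_environment() -> Dict[str, Any]:
    """Collect interpreter and PyTorch details without requiring PyTorch."""
    report: Dict[str, Any] = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "python_matches": tuple(sys.version_info[:2]) == EXPECTED_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch

        # Strip a local build suffix such as "+cu121" before comparing.
        base_version = str(torch.__version__).split("+")[0]
        report["torch"] = str(torch.__version__)
        report["torch_matches"] = base_version == EXPECTED_TORCH
        report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print("{}: {}".format(key, value))
```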

Usage

Training

bash train.sh

Evaluation

bash valid.sh

Citations

If you find our paper useful in your research, please consider citing:

@InProceedings{Zhao_2025_ICCV,
    author    = {Zhao, Luming and Xuan, Jingwen and Lou, Jiamin and Yu, Yonghui and Yang, Wenwu},
    title     = {Context-Aware Academic Emotion Dataset and Benchmark},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year      = {2025}
}

Acknowledgment

Our code is mainly based on DFER-CLIP. Many thanks to the authors!
