Facial Expression Recognition Using Vision Transformers and Convolutional Neural Networks

This repository contains the implementation and resources for a facial expression recognition system developed as part of my graduation project at Yeditepe University. The project explores the use of efficient transformer models for vision tasks in the context of recognizing facial expressions from images.

Abstract

Facial expression recognition (FER) is a subtask of emotion recognition that focuses on categorizing human expressions from face images. It has numerous practical applications, including security, advertising, healthcare, and recommendation systems. Recent advances in deep learning have led to significant progress in the field. This project seeks a lightweight and efficient method that combines convolutional neural networks (CNNs) and transformers to achieve high performance on the FER task. The proposed approach aims to improve on the vision transformer by adding a backbone network as a feature extractor that obtains fine-level features from images. Various variants were tested on the AffectNet database to find an effective approach, with the best reaching 55.54% accuracy on the 8-class task while using only 5.67M parameters.
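
As a rough illustration of this idea, here is a minimal sketch assuming PyTorch: a small CNN backbone produces a feature map, which is flattened into a token sequence and fed to a transformer encoder with a classification head. The class name, layer sizes, and hyperparameters are illustrative assumptions, not the project's actual implementation.

```python
# A minimal sketch of the idea, assuming PyTorch. Layer sizes and the class
# name are illustrative; this is not the project's actual implementation.
import torch
import torch.nn as nn

class ConvBackboneViT(nn.Module):
    def __init__(self, num_classes=8, embed_dim=128, depth=4, heads=4):
        super().__init__()
        # CNN backbone: extracts fine-level local features before the transformer
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, embed_dim, 3, stride=2, padding=1),
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        feats = self.backbone(x)                   # (B, C, H', W') feature map
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H'*W', C) token sequence
        encoded = self.encoder(tokens)
        return self.head(encoded.mean(dim=1))      # mean-pool tokens, then classify

model = ConvBackboneViT()
logits = model(torch.randn(2, 3, 224, 224))  # e.g. two 224x224 face crops
print(logits.shape)  # torch.Size([2, 8]) -> one logit per expression class
```

The key point of the sketch is that convolutional feature-map locations play the role of the fixed patch embedding in a plain vision transformer, which is what lets a small backbone supply fine-level features to the encoder.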

Research Objective

The goal of this project is to develop a lightweight and efficient method for facial expression recognition (FER). The proposed approach aims to improve the performance of FER systems by incorporating a backbone network into the vision transformer, achieving high performance with fewer parameters and thereby making the model more practical for real-world applications. Specifically, the project investigates the effectiveness of combining CNNs and transformers for FER, evaluates different backbone networks as feature extractors, and experiments with different architectures and training techniques to improve the proposed method. The method is evaluated on the AffectNet [1] database and compared to state-of-the-art approaches to analyse its strengths and weaknesses.
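
To make the "different backbone networks as feature extractors" objective concrete, the following is a hedged sketch using off-the-shelf torchvision models. The backbone choices and the helper function are assumptions for illustration, not the backbones actually evaluated in the project.

```python
# Hypothetical sketch of swapping torchvision backbones as feature extractors.
# The backbone choices and helper below are illustrative, not the project's code.
import torch
import torchvision.models as models

def build_feature_extractor(name: str) -> torch.nn.Module:
    """Return a backbone with its classifier removed, keeping only the feature maps."""
    if name == "mobilenet_v3_small":
        net = models.mobilenet_v3_small(weights=None)  # or pass pretrained weights
        return net.features                            # (B, 576, H', W')
    if name == "resnet18":
        net = models.resnet18(weights=None)
        return torch.nn.Sequential(*list(net.children())[:-2])  # drop avgpool + fc
    raise ValueError(f"unknown backbone: {name}")

for name in ["mobilenet_v3_small", "resnet18"]:
    backbone = build_feature_extractor(name)
    feats = backbone(torch.randn(1, 3, 224, 224))
    print(name, feats.shape)  # spatial feature map handed to the patch extractor
```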

Project Overview

This project seeks a lightweight and efficient method that achieves high performance on the FER task. It builds on the methods proposed by Hassani et al. [2]: a backbone network extracts fine-level features, a patch extractor pulls relevant features from the facial patches, and a transformer encoder is implemented as described in [2]. Training and evaluation are conducted on AffectNet [1].
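
A minimal training-and-evaluation sketch for the 8-class setup might look like the following (PyTorch assumed). The random stand-in data, placeholder model, and hyperparameters are illustrative only; AffectNet itself must be obtained separately and loaded in place of the dummy tensors.

```python
# Hypothetical training/evaluation sketch for the 8-class FER setup (PyTorch assumed).
# Random tensors stand in for AffectNet; the model below is a placeholder classifier.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: replace with an AffectNet loader yielding face crops and 8-way labels.
train_ds = TensorDataset(torch.randn(64, 3, 224, 224), torch.randint(0, 8, (64,)))
train_loader = DataLoader(train_ds, batch_size=16, shuffle=True)

model = nn.Sequential(  # placeholder; substitute the CNN-backbone + transformer model
    nn.Conv2d(3, 16, 3, stride=4), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, 8),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

# Top-1 accuracy on the last batch (a real setup would use a held-out test split)
model.eval()
with torch.no_grad():
    preds = model(images).argmax(dim=1)
    print("top-1 accuracy:", (preds == labels).float().mean().item())
```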

Results

The proposed method achieves an accuracy of 55.54% on the 8-class facial expression recognition task using the AffectNet database, while utilizing only 5.67M parameters.

Top-1 test set accuracy comparisons. Variants marked with * were trained without oversampling. For more, please see here.

Model   | Top-1 Accuracy | #Params | Checkpoint
--------|----------------|---------|-----------
VCCT-1  | 53.49%         | 11.26M  | ckpt
VCCT-2  | 53.41%         | 11.71M  | ckpt
*MCCT-1 | 48.19%         | 3.00M   | ckpt
MCCT-1  | 53.24%         | 3.00M   | ckpt
*MCCT-2 | 44.16%         | 3.44M   | ckpt
MCCT-2  | 54.19%         | 3.44M   | ckpt
*MCCT-3 | 41.49%         | 3.89M   | ckpt
MCCT-3  | 51.21%         | 3.89M   | ckpt
*MCCT-6 | 45.26%         | 5.23M   | ckpt
MCCT-6  | 52.79%         | 5.23M   | ckpt
*MCCT-7 | 40.79%         | 5.67M   | ckpt
MCCT-7  | 55.54%         | 5.67M   | ckpt
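
To sanity-check a downloaded checkpoint against the #Params column, something like the sketch below could be used. The file name and state-dict layout are assumptions; adapt them to the actual checkpoint files.

```python
# Hypothetical sketch for inspecting a downloaded checkpoint; the file name and
# state-dict layout are assumptions, so adapt them to the actual ckpt files.
import torch

state = torch.load("mcct7.ckpt", map_location="cpu")
state_dict = state.get("state_dict", state)  # unwrap if the checkpoint nests a state_dict
n_params = sum(t.numel() for t in state_dict.values())
# Note: this counts every tensor in the state dict, including buffers such as
# BatchNorm running statistics, so it can slightly exceed the trainable-parameter figure.
print(f"{n_params / 1e6:.2f}M parameters")
```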

Upcoming Updates

Currently, only notebooks and some scripts are shared. More structured model and training code will be released soon!

License

This project is licensed under the MIT License.
