Feature Aggregation Network (FANet) is a scene segmentation model which is designed for resource constrained embedded devices. By leveraging a new multi-scale feature fusion technique, it maps rich semantic features at different labels and improves model performance. This model has less parameters and FLOPS compare to many existing real-time scene segmentation model and it also produces better prediction accuracy. Model performance is evaluated on cityscapes and CamVid dataset. FANet can handle high resolution input images and can use less memory footprint. To compare our model performance with other existing semantice segmentation models, we also trained FAST-SCNN, bayes-segnet, Deeplab and separable UNet models. Separable UNet is basically an UNet model in which all convolution layers are replaced by separable convolution layers to reduce number of parameters and FLOPS. In DeepLab model, we have used exception as a backbone network. Separable UNet and Deeplab are off-line segmentation model whereas FAST-SCNN and Bayes-segnet are real-time segmentation model. Our experiment exhibits that FANet outperforms than these models. We have achieved 65.9% class mean IoU and 83.6% category mean IoU on Cityscapes vlidation dataset. Here, we are uploading scripts for existing models and our results.
To refer this paper, visit the foloowing link: https://ieeexplore.ieee.org/abstract/document/9363370
For this research work, we have used cityscapes benchmark datasets and CamVid dataset.
- Cityscapes - To access this benchmark, user needs an account. https://www.cityscapes-dataset.com/downloads/
- CamVid - To access this benchmark, visit this link: http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/
To understand the metrics used for model performance evaluation, please refer here: https://www.cityscapes-dataset.com/benchmarks/#pixel-level-results
For performance comparison, we trained few off-line and real-time existing models under same configuration and compared their performance with FANet. Some existing models require the use of ImageNet pretrained models to initialize their weights. Details will be given soon.
- TensorFlow 2.1
- This requires CUDA >= 10.1
- TensorRT will require an NVIDIA GPU with Tensor Cores.
- Horovod framework (for effective utilization of resources and speed up GPUs)
- Keras 2.3.1
- Python >= 3.7
We trained our model with different input resolutions for cityscapes dataset. Cityscapes provides 1024 x 2048 px resolution images. We mainly focus full resolution of cityscapes images. For CamVid dataset, we use 512 x 1024px resolution altough original image size is 720 x 960px. We trained other models with different resolutions of cityscapes images.
IoU Over Classes on Validation Set
classes | IoU | nIoU |
---|---|---|
road | 0.943 | nan |
sidewalk | 0.761 | nan |
building | 0.876 | nan |
wall | 0.444 | nan |
fence | 0.433 | nan |
pole | 0.434 | nan |
traffic light | 0.511 | nan |
traffic sign | 0.595 | nan |
vegetation | 0.889 | nan |
terrain | 0.546 | nan |
sky | 0.908 | nan |
person | 0.667 | 0.396 |
rider | 0.437 | 0.228 |
car | 0.899 | 0.787 |
truck | 0.552 | 0.196 |
bus | 0.650 | 0.365 |
train | 0.451 | 0.197 |
motorcycle | 0.395 | 0.186 |
bicycle | 0.631 | 0.351 |
Score Average | 0.633 | 0.338 |
IoU Over Categories
categories | IoU | nIoU |
---|---|---|
flat | 0.955 | nan |
construction | 0.882 | nan |
object | 0.529 | nan |
nature | 0.891 | nan |
sky | 0.908 | nan |
human | 0.708 | 0.426 |
vehicle | 0.878 | 0.756 |
Score Average | 0.822 | 0.591 |
To see the performance of FAST-SCNN on test dataset, you can view the .csv file from here: (https://github.com/tanmaysingha/FANet/blob/master/Predicted_images/FAST_SCNN_Test_Results_Evaluated_By_Cityscapes_Server.csv)
IoU Over Classes on Validation Set of Cityscapes
classes | IoU | nIoU |
---|---|---|
road | 0.962 | nan |
sidewalk | 0.751 | nan |
building | 0.893 | nan |
wall | 0.527 | nan |
fence | 0.473 | nan |
pole | 0.470 | nan |
traffic light | 0.535 | nan |
traffic sign | 0.646 | nan |
vegetation | 0.898 | nan |
terrain | 0.552 | nan |
sky | 0.925 | nan |
person | 0.702 | 0.459 |
rider | 0.456 | 0.272 |
car | 0.909 | 0.785 |
truck | 0.470 | 0.201 |
bus | 0.704 | 0.358 |
train | 0.615 | 0.311 |
motorcycle | 0.388 | 0.186 |
bicycle | 0.652 | 0.403 |
Score Average | 0.659 | 0.372 |
IoU Over Categories on validation set of Cityscapes
categories | IoU | nIoU |
---|---|---|
flat | 0.967 | nan |
construction | 0.894 | nan |
object | 0.556 | nan |
nature | 0.901 | nan |
sky | 0.925 | nan |
human | 0.715 | 0.489 |
vehicle | 0.892 | 0.767 |
Score Average | 0.836 | 0.628 |
To see the performance of FANet on test dataset, you can view the .csv file from here: (https://github.com/tanmaysingha/FANet/blob/master/Predicted_images/FANet_Test_Results_evaluated_by_Cityscapes_server.csv)
cff-version: 1.2.0
If FANet is useful for your research work, please consider for citing the paper:
@inproceedings{singha2020fanet,
title={FANet: Feature Aggregation Network for Semantic Segmentation},
author={Singha, Tanmay and Pham, Duc-Son and Krishna, Aneesh},
booktitle={2020 Digital Image Computing: Techniques and Applications (DICTA)},
pages={1--8},
year={2020},
organization={IEEE}
}
Refer the following link for FANet paper: https://ieeexplore.ieee.org/abstract/document/9363370