SFRSeg: Shared Feature Reuse Segmentation Model for Resource Constrained Devices

This is the official repository for the SFRSeg model. The paper is available at: https://www.sciencedirect.com/science/article/pii/S0031320323002571

Datasets

This work uses the Cityscapes, KITTI, CamVid and Indoor Objects datasets.

Cityscapes - Accessing this benchmark requires an account. For test set evaluation, all test set results must be uploaded to the evaluation server. https://www.cityscapes-dataset.com/downloads/
KITTI - Accessing this benchmark requires an account. As with Cityscapes, the test set results must be submitted to the evaluation server. http://www.cvlibs.net/datasets/kitti/eval_semseg.php?benchmark=semantics2015
CamVid - To access this benchmark, visit this link: http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/
Indoor Objects - To access this benchmark, visit this link: https://data.mendeley.com/datasets/hs5w7xfzdk/3

Class mapping

Different datasets provide different class annotations. For instance, the CamVid dataset has 32 class labels, all of which are listed here: http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/#ClassLabels. However, the literature shows that existing models are trained on 11 CamVid classes (Sky, Building, Pole, Road, Sidewalk, Tree, TrafficLight, Fence, Car, Pedestrian, Bicyclist). We therefore first convert the 32-class CamVid annotations into 11-class annotations and then train the model on the 11-class annotations. To improve model performance, we also converted the Cityscapes 19-class annotations to the same 11 classes, trained the model first on the 11-class Cityscapes annotations, and then used the resulting pre-trained weights to train the model on the 11-class CamVid annotations. The following table shows the conversion of the 32 CamVid classes to 11 classes.

TrainId	Camvid 11 classes	Camvid 32 classes
0	Sky	Sky
1	Building	Archway, Bridge, Building, Tunnel, Wall
2	Column_Pole	Column_Pole, Traffic Cone
3	Road	Road, LaneMkgsDriv, LaneMkgsNonDriv
4	Sidewalk	Sidewalk, ParkingBlock, RoadShoulder
5	Tree	Tree, VegetationMisc
6	TrafficLight	TrafficLight, Misc_Text, SignSymbol
7	Fence	Fence
8	Car	Car, OtherMoving, SUVPickupTruck, Train, Truck_Bus
9	Pedestrian	Animal, CartLuggagePram, Child, Pedestrian
10	Bicyclist	Bicyclist, MotorcycleScooter

Note: Void class is not included in the set of 11 classes.
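The grouping in the CamVid table can be expressed as a simple name-to-TrainId lookup. The following is a minimal sketch, not the conversion script used for the paper: the names `CAMVID_11_GROUPS`, `to_train_id` and `remap`, and the ignore value 255 for Void, are assumptions made for illustration.

```python
# Grouping copied from the CamVid table above (TrainId = position in the list).
# VOID_ID = 255 is an assumed ignore value; the README does not specify one.
VOID_ID = 255

CAMVID_11_GROUPS = [
    ("Sky", ["Sky"]),
    ("Building", ["Archway", "Bridge", "Building", "Tunnel", "Wall"]),
    ("Column_Pole", ["Column_Pole", "Traffic Cone"]),
    ("Road", ["Road", "LaneMkgsDriv", "LaneMkgsNonDriv"]),
    ("Sidewalk", ["Sidewalk", "ParkingBlock", "RoadShoulder"]),
    ("Tree", ["Tree", "VegetationMisc"]),
    ("TrafficLight", ["TrafficLight", "Misc_Text", "SignSymbol"]),
    ("Fence", ["Fence"]),
    ("Car", ["Car", "OtherMoving", "SUVPickupTruck", "Train", "Truck_Bus"]),
    ("Pedestrian", ["Animal", "CartLuggagePram", "Child", "Pedestrian"]),
    ("Bicyclist", ["Bicyclist", "MotorcycleScooter"]),
]

# Invert the grouping: original 32-class name -> 11-class TrainId.
NAME_TO_TRAIN_ID = {
    name: train_id
    for train_id, (_, members) in enumerate(CAMVID_11_GROUPS)
    for name in members
}


def to_train_id(class_name):
    """Return the 11-class TrainId, or VOID_ID for Void and unlisted names."""
    return NAME_TO_TRAIN_ID.get(class_name, VOID_ID)


def remap(label_rows):
    """Remap a 2D grid (list of rows) of 32-class names to TrainIds."""
    return [[to_train_id(name) for name in row] for row in label_rows]
```

The table covers 31 named classes; Void, the 32nd, falls through to the ignore value so that it can be excluded from the loss and from evaluation.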

The following table shows the mapping of Cityscapes 19 classes to Camvid 11 classes.

TrainId	Camvid 11 classes	Cityscapes classes
0	Sky	Sky
1	Building	Building, Wall
2	Column_Pole	Pole, Polegroup
3	Road	Road
4	Sidewalk	Sidewalk
5	Tree	Vegetation
6	TrafficLight	Traffic Light, Traffic Sign
7	Fence	Fence
8	Car	Car, Truck, Bus, Caravan
9	Pedestrian	Person
10	Bicyclist	Rider, Bicycle, MotorCycle
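For Cityscapes, the same idea is usually applied to the integer label IDs stored in the annotation images. The sketch below builds a 256-entry lookup table from the table above. The numeric label IDs are assumed to match `labels.py` in the official cityscapesScripts package and should be verified against it; `IGNORE_ID = 255` and the function name `remap_label_ids` are likewise assumptions for illustration.

```python
IGNORE_ID = 255  # assumed ignore value for classes outside the 11-class set

# Cityscapes label id -> 11-class TrainId, following the table above.
# The ids are assumed to match labels.py in cityscapesScripts.
CITYSCAPES_ID_TO_TRAIN_ID = {
    23: 0,                        # sky
    11: 1, 12: 1,                 # building, wall
    17: 2, 18: 2,                 # pole, polegroup
    7: 3,                         # road
    8: 4,                         # sidewalk
    21: 5,                        # vegetation
    19: 6, 20: 6,                 # traffic light, traffic sign
    13: 7,                        # fence
    26: 8, 27: 8, 28: 8, 29: 8,   # car, truck, bus, caravan
    24: 9,                        # person
    25: 10, 33: 10, 32: 10,       # rider, bicycle, motorcycle
}

# Lookup table over every possible uint8 label value; unmapped ids are ignored.
LUT = [IGNORE_ID] * 256
for label_id, train_id in CITYSCAPES_ID_TO_TRAIN_ID.items():
    LUT[label_id] = train_id


def remap_label_ids(label_rows):
    """Remap a 2D grid (list of rows) of Cityscapes label ids to TrainIds."""
    return [[LUT[value] for value in row] for row in label_rows]
```

With NumPy, the same table can be applied to a whole annotation image at once by converting `LUT` to an array and indexing it with the label array.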

Metrics

To understand the metrics used for model performance evaluation, please refer here: https://www.cityscapes-dataset.com/benchmarks/#pixel-level-results
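The main pixel-level metric on that page is the class-wise intersection-over-union, IoU = TP / (TP + FP + FN), averaged over classes to give mIoU. The following is a minimal sketch of that computation on flat label sequences, not the evaluation server's code. The function name `mean_iou`, the ignore value 255, and the choice to skip classes that appear in neither ground truth nor prediction are assumptions.

```python
def mean_iou(y_true, y_pred, num_classes, ignore_id=255):
    """Mean IoU over classes; pixels whose ground truth is ignore_id are skipped.

    Assumes every prediction is a valid class id in range(num_classes).
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for truth, pred in zip(y_true, y_pred):
        if truth == ignore_id:
            continue
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # the true class was missed
            fp[pred] += 1   # the predicted class was wrongly assigned
    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        if denom > 0:  # skip classes absent from both truth and prediction
            ious.append(tp[c] / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Tiny example: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1, 255], [0, 1, 1, 1, 0], num_classes=2)
print(f"mIoU = {score:.4f}")
```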

Results

We trained our model on the above-mentioned benchmarks at different input resolutions. Cityscapes provides images at 1024 * 2048 px, and we mainly focus on this full resolution. For the CamVid dataset, we use 640 * 896 px inputs although the original image size is 720 * 960 px. Similarly, we use 384 * 1280 px inputs for the KITTI dataset although the original image size is 375 * 1280 px. For the Cityscapes and KITTI datasets we use 19 classes, whereas for the CamVid dataset we trained the model with 11 classes, as suggested by the literature.

Dataset	No. of classes	Test mIoU	No. of parameters	FLOPs
Cityscapes	19	70.6%	1.6 million	37.9 G
KITTI	19	49.3%	1.6 million	8.9 G
Camvid	11	74.7%	1.6 million	10.2 G
Indoor Objects	9	61.4%	1.6 million	19.3 G

Cityscapes, KITTI and CamVid are urban street scene datasets, so the first three rows of the table above show the model's performance on outdoor scenes. For indoor scene analysis and indoor navigation, mainly for wheelchair users and service robots, we also trained the model on the Indoor Objects dataset at 768 * 1408 px resolution. The FLOPs count varies because the input resolutions differ.
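The relationship between input resolution and FLOPs can be checked directly from the numbers reported above. The short sketch below divides each reported FLOPs figure by the number of input pixels. The helper name `gflops_per_megapixel` is hypothetical; the resolutions and FLOPs values are taken from the text and the results table.

```python
# (height, width, reported GFLOPs), taken from the text and results table above.
RUNS = {
    "Cityscapes": (1024, 2048, 37.9),
    "KITTI": (384, 1280, 8.9),
    "CamVid": (640, 896, 10.2),
    "Indoor Objects": (768, 1408, 19.3),
}


def gflops_per_megapixel(height, width, gflops):
    """Normalise a FLOPs figure by the number of input pixels (in millions)."""
    return gflops / (height * width / 1e6)


for name, (h, w, g) in RUNS.items():
    megapixels = h * w / 1e6
    print(f"{name}: {megapixels:.2f} MP, "
          f"{gflops_per_megapixel(h, w, g):.1f} GFLOPs/MP")
```

All four datasets come out at roughly 18 GFLOPs per megapixel, which is consistent with the cost of a fully convolutional model scaling approximately linearly with the number of input pixels.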

FPS (Frames Per Second) count

A model's FPS depends not only on model size but also on input resolution, model type and, most importantly, hardware configuration. Hence, we reproduced some of the existing models based on the literature and trained each model under the same system configuration for 10 epochs. We saved the checkpoints in .hdf5/.h5 format and optimized them into TensorRT FP16 models using TensorRT 6.0.1. The code for measuring FPS and the TRT-FP16 models are available in the fps directory.
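A typical FPS measurement runs a few warm-up inferences and then times a fixed number of repeated calls. The following is a generic sketch of that pattern, not the script in the fps directory: `measure_fps` and `dummy_infer` are hypothetical names, and the dummy callable only stands in for a real TensorRT inference call.

```python
import time


def measure_fps(infer, sample, warmup=10, runs=100):
    """Time `runs` calls of infer(sample) after `warmup` untimed calls."""
    for _ in range(warmup):
        infer(sample)  # warm-up: caches, lazy initialisation, clock ramp-up
    start = time.perf_counter()
    for _ in range(runs):
        infer(sample)
    elapsed = time.perf_counter() - start
    return runs / elapsed


def dummy_infer(batch):
    """Placeholder workload standing in for a real model's forward pass."""
    return sum(batch)


fps = measure_fps(dummy_infer, list(range(1000)), warmup=2, runs=20)
print(f"{fps:.1f} FPS")
```

When the real inference call runs asynchronously on a GPU, the device should be synchronised before each timestamp is read; otherwise the loop measures only how quickly work is queued.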

Cityscapes test results

The test set output was submitted to the Cityscapes evaluation server. The result evaluated by the server can be viewed through an anonymous link provided by the Cityscapes server. Upon acceptance of the paper, the test result will be cited by the paper and published on the evaluation server.

Color map of Cityscapes dataset and model prediction using validation sample

SFRSeg prediction on Cityscapes test samples

Color map of CamVid dataset and model prediction using validation sample

SFRSeg prediction on CamVid validation sample

Color map of KITTI dataset and model prediction using validation sample

SFRSeg prediction on KITTI test samples

KITTI test set results

As with Cityscapes, the KITTI test set result was also submitted to the evaluation server. The result is available at: https://github.com/tanmaysingha/SFRSeg/blob/main/Supplementary/KITTI_Test_Results.pdf

Color map of Indoor objects dataset

SFRSeg prediction on Indoor objects scenes

Citation

If this work is useful for your research, please consider citing the paper:
@article{SINGHA2023109557,
title = {A real-time semantic segmentation model using iteratively shared features in multiple sub-encoders},
journal = {Pattern Recognition},
volume = {140},
pages = {109557},
year = {2023},
issn = {0031-3203},
doi = {https://doi.org/10.1016/j.patcog.2023.109557},
url = {https://www.sciencedirect.com/science/article/pii/S0031320323002571},
author = {Tanmay Singha and Duc-Son Pham and Aneesh Krishna}
}
