This project is work that I did for a course on deep learning at UCSD (CSE 251B). It contains two different convolutional neural networks that I implemented - a standard fully convolutional neural network (basic_fcn.py) and U-Net (UNet.py).
These neural networks were trained to perform semantic segmentation on images taken from car dashcams. I used part of the [India Driving Dataset] (https://idd.insaan.iiit.ac.in/) to train, validate, and test the models.
Here are some labelings generated by the trained model on images in the test set. Each strip contains the actual image, ground truth labels for that image, and model predictions, in that order.
A trained model, which can be loaded from latest_model.pt
was evaluated on the test set, and gave a pixel accuracy of 0.8134 as well as the following intersection-over-union (IoU) values on each category
0. Road - 0.903
- Drivable fallback - 0.416
- Sidewalk - 0.101
- Non-drivable fallback - 0.246
- Person/animal - 0.151
- Rider - 0.253
- Motorcycle - 0.317
- Bicycle - 0.034
- Autorickshaw - 0.395
- Car - 0.461
- Truck - 0.238
- Bus - 0.179
- Vehicle Fallback - 0.246
- Curb - 0.426
- Wall - 0.221
- Fence - 0.065
- Guard Rail - 0.137
- Billboard - 0.132
- Traffic Sign - 0.021
- Traffic Light - 0
- Pole - 0.156
- Obs-str-bar-fallback - 0.130
- Building - 0.409
- Bridge/tunnel - 0.344
- Vegetation - 0.756
- Sky - 0.942
basic_fcn.py
contains the class for the basic fully convolutional network.
UNet.py
contains the class for U-Net.
dataloader.py
contains code for loading training, validation, and test datasets.
latest_model.pt
contains a trained model which can be used to make predictions on test images.
starter.py
contains code needed to train a model. If you would like to run this, you'll need to download (some portion of) the India Driving Dataset and save links to each of the training, validation, and test images to the files train.csv
, val.csv
, and test.csv
, respectively, in your working directory.
utils.py
contains functions used to compute pixel accuracy and IoU for each category, as well as a DiceLoss class, which can be used to train a model using dice loss rather than cross entropy loss.
get_weights.py
contains code for computing weights for each of the classes. These weights can be used to train a model using weighted cross entropy loss.
While the trained model has a relatively high pixel accuracy on the test set, this is likely due to several categories (road, sky, vegetation, etc.) dominating the majority of the pictures. I would like to try to improve the model's predictions on some of the other categories by using either dice loss or weighted cross entropy loss. Both of these loss functions have been implemented in the files above, but unfortunately I don't currently have access to computing power necessary to train these models.