This repository presents implementations of Image Segmentation models. Image segmentation is a Computer Vision task that involves classifying each pixel in an image into a category or into a specific instance of a category. The task can be divided into three types:
- Semantic segmentation: Assigns a class label to each pixel in an image without distinguishing between different instances of the same class.
- Instance segmentation: Goes beyond Object Detection by labeling each pixel that belongs to a detected object with a specific class and instance. In this way, the models not only provide bounding-box coordinates, class labels, and confidence scores, but also generate a binary mask for each detected instance in an image.
- Panoptic segmentation: Combines semantic and instance segmentation by assigning each pixel in an image both a class label and an instance label, which allows for detailed segmentation of complex scenes.
Image segmentation is currently used in a wide range of fields. It plays a key role in medicine, helping to identify and analyze tissues and tumors in diagnostic images; in autonomous driving, where it aids in detecting and classifying roads, pedestrians, and obstacles; in environmental monitoring, where satellite images are used to classify terrain types and detect changes; in robotics, enabling precise object localization and manipulation; and in augmented reality and video editing, improving the integration of digital elements into real-world scenes.
Some of the models in this repository are built and trained from scratch using Convolutional Neural Networks (CNNs). Others are fine-tuned through transfer learning, starting from high-performing pretrained models, such as Transformer-based architectures and YOLO11-seg, that were trained on large datasets. The projects use frameworks such as TensorFlow, PyTorch, Hugging Face, and Ultralytics.
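As an illustration of the transfer-learning workflow, the following is a minimal sketch using the Ultralytics API; the dataset file `my_dataset.yaml` and the hyperparameters are placeholders, and each notebook uses its own configuration:

```python
from ultralytics import YOLO

# Load YOLO11 segmentation weights pretrained on COCO
# (downloaded automatically on first use).
model = YOLO("yolo11l-seg.pt")

# Fine-tune on a custom dataset; "my_dataset.yaml" is a hypothetical
# dataset description file (train/val image paths and class names).
results = model.train(data="my_dataset.yaml", epochs=50, imgsz=640)

# Evaluate on the validation split and report mask mAP metrics.
metrics = model.val()
print(metrics.seg.map50, metrics.seg.map)  # mAP@50 and mAP@50-95 (mask)
```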
In addition, training and fine-tuning are carried out using hardware resources such as TPUs or GPUs available in Google Colab, depending on the project's requirements.
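For the TensorFlow-based notebooks, hardware selection typically follows the standard Colab pattern sketched below; this is illustrative rather than copied from any specific notebook, and the PyTorch and Ultralytics projects select a CUDA device analogously:

```python
import tensorflow as tf

# Use a TPU if the Colab runtime provides one; otherwise fall back
# to the default strategy (single GPU or CPU).
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except ValueError:
    strategy = tf.distribute.get_strategy()

print("Replicas in sync:", strategy.num_replicas_in_sync)
```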
Most of the notebooks in this repository apply data augmentation to the training set to improve the models' generalization ability. These techniques are implemented either manually, using libraries such as Albumentations, or automatically (e.g., by YOLO11's built-in augmentation pipeline), as sketched below. Strategies such as callbacks and learning rate schedulers are also used to prevent overfitting and reach optimal performance.
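As an example of the manual approach, a typical Albumentations pipeline for image-mask pairs looks like the sketch below, together with the kind of Keras callbacks used against overfitting; the specific transforms, patience values, and factors vary from notebook to notebook:

```python
import numpy as np
import albumentations as A
from tensorflow import keras

# Geometric and photometric augmentations applied jointly to image and mask.
train_transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=15, p=0.5),
    A.RandomBrightnessContrast(p=0.3),
])

# Dummy image/mask pair just to demonstrate the call signature.
image = np.zeros((256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)
augmented = train_transform(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]

# Callbacks of the kind mentioned above: stop training when validation
# loss stalls, and lower the learning rate on plateaus.
callbacks = [
    keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=3),
]
```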
Below are the evaluation results of the models implemented to date, grouped by task. In cases where the validation or test set is unavailable or not publicly accessible, the evaluation was performed exclusively on the available split.
**Panoptic segmentation**

Dataset | Domain | Model | PQ | SQ | RQ | Eval. Set
---|---|---|---|---|---|---
LaRS | Maritime obstacle detection | Mask2Former-Swin-Tiny | 0.564 | 0.791 | 0.686 | Validation
**Instance segmentation**

Dataset | Domain | Model | mAP@50 | mAP@50-95 | Eval. Set
---|---|---|---|---|---
SBD | General object segmentation | YOLO11l-seg | 0.895 | 0.719 | Validation
PanNuke | Histopathology (nucleus segmentation) | YOLO11s-seg | 0.700 / 0.692 | 0.464 / 0.455 | Validation / Test
USIS10K | Underwater scene analysis | YOLO11l-seg | 0.634 / 0.635 | 0.490 / 0.495 | Validation / Test
BDD100K | Autonomous driving | YOLO11m-seg | 0.464 | 0.266 | Validation
**Semantic segmentation**

Dataset | Domain | Model | Mean IoU | Dice | Eval. Set
---|---|---|---|---|---
UW-Madison GI Tract | Medical imaging (gastrointestinal tract) | SegFormer-B3 | 0.900 | 0.946 | Validation
LandCover.ai | Aerial land-cover classification | SegFormer-B3 | 0.870 / 0.872 | 0.928 / 0.929 | Validation / Test
CamVid | Autonomous driving | SegFormer-B2 | 0.869 | 0.927 | Validation
Carvana | Binary segmentation of cars | U-Net | 0.995 | 0.997 | Validation
CUB-200-2011 | Binary segmentation of 200 bird species | ConvNeXt-Base U-Net | 0.955 | 0.977 | Test
Caltech-101 | Binary segmentation of 101 object classes | ConvNeXt-Base U-Net | 0.932 | 0.965 | Test
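For reference, the Mean IoU and Dice columns are related by Dice = 2·IoU / (1 + IoU) for a single mask (the table values are averages over images or classes, so the relationship holds only approximately there). The short sketch below, not taken from the notebooks, shows how both metrics are computed for binary masks:

```python
import numpy as np

def iou_and_dice(pred, target, eps=1e-7):
    """IoU and Dice coefficient for binary masks (arrays of 0/1)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = (intersection + eps) / (union + eps)
    dice = (2 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return float(iou), float(dice)

# Sanity check of Dice = 2*IoU / (1 + IoU):
# IoU = 0.955 gives Dice = 1.91 / 1.955 ≈ 0.977 (the CUB-200-2011 row).
```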