# Creating Binary Masks Using Image Segmentation

> Create binary masks for objects in an image using image segmentation

A tobias.fyi blog post by...you guessed it: _Tobias Reaper_.

---

Notebook Outline

* [Resources](#resources)
* [Outline](#outline)
* [The Problem](#the-problem)
  * [Other use-cases](#other-use-cases)
* [Solution](#solution)
  * [Computer vision](#computer-vision)
  * [Mask R-CNN](#mask-r-cnn)

---

## Resources

* [Mask R-CNN](https://github.com/matterport/Mask_RCNN/)
* Other models / frameworks
  * [CenterMask2](https://github.com/youngwanLEE/centermask2)
  * [Yolact](https://github.com/dbolya/yolact)

---

## Outline

* Intro
  * Goal
  * Solution
    * Computer vision models
    * Mask R-CNN
* Content
  * Using pre-trained weights
  * Training a custom model
  * Extracting the data from the predictions:
    * Masks
    * Bounding boxes
* Conclusion

---

## The Problem

Selecting and separating parts of an image can be a tedious, time-consuming process. Anyone who's done a fair amount of tinkering with image manipulation using a program like Photoshop knows the struggle.

As an example, say I'd like to cut out a person from a photo in order to "Photoshop" that person into a different image. A classic example is to put a friend who missed a get-together into the group photo from the event (I'm looking at you, BeeTee... =P).

[[ImageBlock :: Missing Friend]]

To accomplish this, I'd spend anywhere from a few minutes to an hour outlining that person in the original image, mostly by hand. The time investment depends on how easily separable that person is from the rest of the image, how accurate I want the cut to be, and what tools are available to me. Regarding that last point, the magicians at Adobe have done some rather impressive black magic with Photoshop, giving users very quick and very effective methods for selecting parts of an image.

A mask is basically a way of marking which pixels of an image belong to a selection and which do not. That outline that I made to select my friend in Photoshop is an example of a mask. A binary mask is a mask that uses a two-tone color scheme, usually black and white, to indicate the different areas of an image. By overlaying a binary mask on top of the original image, the boundaries between the two colors can be used to treat the different areas of the image differently, whether that is making pixels transparent (removing them) or applying some sort of effect or transformation.
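To make the idea concrete, here is a minimal sketch of a binary mask being used to cut pixels out of an image. It uses NumPy only, and both the toy image and the `apply_binary_mask` helper are made up for illustration; they are not part of any segmentation library.

```python
import numpy as np

def apply_binary_mask(image, mask):
    """Return an RGBA copy of `image` where pixels outside `mask` are transparent.

    `image` is an (H, W, 3) uint8 array; `mask` is an (H, W) boolean array.
    (Hypothetical helper, written for this example.)
    """
    if mask.shape != image.shape[:2]:
        raise ValueError("mask must match the image's height and width")
    # White (True) areas of the mask stay opaque; black (False) areas vanish.
    alpha = np.where(mask, 255, 0).astype(np.uint8)
    return np.dstack([image, alpha])

# Toy data: a flat gray 4x4 image with a 2x2 "object" in the middle.
image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

cutout = apply_binary_mask(image, mask)
```

The result has a fourth (alpha) channel, so saving it as a PNG would give the object on a transparent background.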

The goal of this article is to describe a possible method of creating binary masks for one or more objects in an image by using an image segmentation (computer vision) model. Rather than drawing binary masks completely by hand, or using proprietary software like Photoshop, I will show you how to automate the process using completely free, open-source tools.

> [[ImageBlock :: Binary Mask of Missing Friend - half regular image, half binary mask]]

### Other use-cases

Another important use-case for something like this is to apply different effects or transformations to the foreground and background of the image, making it possible to easily manipulate background pixels without affecting the foreground. For example, blurring the background to mimic a shallow depth of field, or removing the color from the background to make the foreground really pop out.
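As a rough sketch of the second example, the snippet below removes the color from the background while leaving the foreground untouched. It assumes a mask already exists; the `desaturate_background` helper and the toy data are hypothetical, and the channel average is just a simple stand-in for a proper grayscale conversion.

```python
import numpy as np

def desaturate_background(image, mask):
    """Gray out every pixel where `mask` is False; keep the foreground in color.

    `image` is an (H, W, 3) array; `mask` is an (H, W) boolean array.
    (Hypothetical helper, written for this example.)
    """
    # Average the three channels to get a crude grayscale value per pixel.
    gray = image.mean(axis=2, keepdims=True).astype(image.dtype)
    gray_rgb = np.repeat(gray, 3, axis=2)
    # Broadcast the (H, W) mask across the color channels.
    return np.where(mask[..., None], image, gray_rgb)

# Toy data: a 2x2 red-only image where only the top-left pixel is foreground.
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[..., 0] = 90
mask = np.array([[True, False], [False, False]])

result = desaturate_background(image, mask)
```

Swapping the grayscale step for a blur would give the background-blur effect in the same way: compute the effect on the whole image, then use the mask to choose which version each pixel comes from.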

---

## Solution

### Computer vision

In order to generate binary masks based on the content of the image, the algorithm must be somewhat intelligent. That is, it must be able to process the image in such a way that it can recognize where the foreground is and draw a polygon around it with some degree of accuracy.

Luckily, there are a number of machine learning models that will do just that. The field is called computer vision. More specifically, the models described in this article are known as image segmentation models.

Don't worry if you don't have any experience with this type of thing, or even if you don't want to _get_ experience with it. Modern machine learning tooling makes it incredibly quick and easy to get a model up and predicting with pre-trained weights.

One caveat though: the pre-trained models will do great with the kinds of objects that were in their training data. Depending on which foreground object you are trying to extract, you may need to extend the model with a custom dataset and training session.

### Mask R-CNN

Although the primary framework used in this article is Matterport's TensorFlow-based implementation of Mask R-CNN, this process should translate to other image segmentation models as well. In fact, at the end of this post I go over using a PyTorch-based image segmentation framework called Detectron2 to accomplish the exact same thing.
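Whichever framework produces the predictions, the last step looks roughly the same: collapse the per-object masks into a single binary mask. Here is a minimal sketch of that step, assuming the model hands back a boolean array of shape `(height, width, instances)` plus one class ID per instance. That layout is my understanding of what the Matterport implementation returns, so treat it as an assumption and check your model's actual output. The `instances_to_binary_mask` helper, the class IDs, and the stand-in data are all hypothetical.

```python
import numpy as np

def instances_to_binary_mask(masks, class_ids, keep_class=None):
    """Collapse (H, W, N) instance masks into one (H, W) uint8 mask (0 or 255).

    `masks` is assumed to be a boolean array with one channel per detected
    object; `class_ids` holds one integer label per channel.
    (Hypothetical helper, written for this example.)
    """
    masks = np.asarray(masks, dtype=bool)
    class_ids = np.asarray(class_ids)
    if keep_class is not None:
        # Keep only the instances whose label matches the requested class.
        masks = masks[:, :, class_ids == keep_class]
    # A pixel is foreground if any remaining instance covers it.
    combined = masks.any(axis=2)
    return combined.astype(np.uint8) * 255

# Stand-in for real predictions: two one-pixel "objects" of different classes.
masks = np.zeros((3, 3, 2), dtype=bool)
masks[0, 0, 0] = True
masks[2, 2, 1] = True
class_ids = np.array([1, 2])

only_class_1 = instances_to_binary_mask(masks, class_ids, keep_class=1)
everything = instances_to_binary_mask(masks, class_ids)
```

Filtering by class is what lets you pull out, say, only the people in a photo while ignoring everything else the model detected.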