Given a segmentation model, Universal Adversarial Perturbations Against Semantic Image Segmentation by Metzen et al. describes how to find a universal perturbation such that adding this perturbation to any image in the Cityscapes data set tends to make the model output a desired target segmentation.
This repo attempts to reproduce that result with a small segmentation model on a subset of the COCO 2017 data set.
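The core optimization can be sketched as signed-gradient descent on one perturbation shared across all training images, projected onto a small L-infinity ball after every step. Everything below is an illustrative stand-in, assuming a toy per-pixel "person" scorer in place of a real segmentation model; the function names, `grad_fn`, `w`, and all hyperparameter values are assumptions, not the repo's actual model or settings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def universal_perturbation(images, grad_fn, eps=0.04, step=0.004, epochs=5):
    """Accumulate one shared perturbation over all images.

    grad_fn(x) must return dLoss/dInput at image x, where the loss measures
    distance from the attacker's target segmentation. The perturbation is
    clipped to the L-infinity ball of radius eps after every step.
    """
    delta = np.zeros_like(images[0])
    for _ in range(epochs):
        for x in images:
            g = grad_fn(x + delta)             # gradient at the perturbed image
            delta = delta - step * np.sign(g)  # signed step toward the target
            delta = np.clip(delta, -eps, eps)  # keep the perturbation small
    return delta

# Toy stand-in for a segmentation model: per-pixel logistic "person" score
# p = sigmoid(w * x), attacked toward the all-background target t = 0.
w = 2.0
def grad_fn(x):
    # Gradient of per-pixel binary cross-entropy with target 0: (p - 0) * w
    return sigmoid(w * x) * w

rng = np.random.default_rng(0)
images = [rng.uniform(0.2, 0.8, size=(4, 4)) for _ in range(8)]
delta = universal_perturbation(images, grad_fn, eps=0.05, step=0.01, epochs=5)
```

With this toy setup, the shared `delta` stays within the `eps` budget while lowering the mean "person" score across all images at once, which is the defining property of a universal (rather than per-image) perturbation.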
In the table below, the first column shows a sample image and its segmentation, with blue highlights marking people and green highlights marking bicycles. The middle column applies a perturbation trained to make the model output a fixed static segmentation displaying the word "foo." The right column applies a perturbation trained to make the model segment no people.
| | Original image | Static target | People hidden |
|---|---|---|---|
| Image | | | |
| Segmentation | | | |
| Perturbation | | | |
| Perturbation amplified 12.75x | | | |
The perturbation in the right column was only partially successful: in the displayed image, ideally the segmentation would still include the bicycle while omitting the person. Because the training data contained few recognized objects, the perturbation achieved a low loss by making the model fail to segment anything, rather than only failing to segment people. This could likely be improved by adjusting the loss function or the data set.
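One way to push the loss toward preserving non-person classes is to train against a per-image target that copies the model's original prediction everywhere except on person pixels. Below is a minimal sketch of that target construction; the function name and the plain background fill are assumptions (Metzen et al. instead fill hidden pixels with the nearest non-person label):

```python
import numpy as np

def hide_class_target(pred, class_id, fill_id=0):
    """Build a per-image attack target: keep every predicted label except class_id.

    pred is an (H, W) array of predicted class ids. Pixels predicted as
    class_id are rewritten to fill_id (plain background here for simplicity).
    """
    target = pred.copy()
    target[pred == class_id] = fill_id
    return target

# Example: class 1 = person, class 2 = bicycle, 0 = background.
pred = np.array([[1, 1, 0],
                 [2, 1, 0]])
target = hide_class_target(pred, class_id=1)
```

Training the perturbation against such a target penalizes losing the bicycle pixels just as much as failing to hide the person pixels, which should discourage the degenerate "segment nothing" solution seen above.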
Run

```shell
conda env create -f environment.yml && conda activate adversarial-segmentation
```

to set up the Conda environment, then run

```shell
jupyter lab
```

to launch JupyterLab.