# 1 - Foreground Separation (Sample Implementation)

[Please note: this is not intended to be the best approach to the problem. It is a worked example that sets out the problem and shows how it can be solved to some extent. Better approaches may be available, and a current overview of the best in class is given in ...]

As discussed in the introduction, the task here is to separate the pixels representing the cultural heritage object in a picture from the pixels of the background/backdrop.

The data a model was trained on, and the classes/labels it knows about, really show up the difference between the domain of modern photography and that of detecting artworks.

## Challenge

Separate the pixels of the cultural heritage object (painting, sculpture, kimono, etc.) from the background pixels.

## Sample Implementation

The code below is heavily based on the example segmentation code in https://pytorch.org/vision/stable/models.html#semantic-segmentation


## Code Setup

In [1]:
%pip install Pillow requests torch torchvision

Note: you may need to restart the kernel to use updated packages.


## Model Setup (FCN ResNet-50)

First we need to retrieve the pre-trained model weights, trained on the ...

In [2]:
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights

# Download (on first use) the default pre-trained weights and build the model
weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights)
# Switch to inference mode (disables dropout, uses running batch-norm statistics)
model.eval()


## Trump - Hogarth's Dog (photographed against a gradient background)

The first test object is Hogarth's dog Trump in sculptural form, photographed against a gradient background.

We need to retrieve the image via the V&A's IIIF endpoint and convert it into a tensor, the format PyTorch expects for processing.

In [42]:
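from PIL import Image
import requests
import torchvision.transforms as transforms

# Fetch the full-size image from the V&A IIIF Image API endpoint
url = "https://framemark.vam.ac.uk/collections/2006AT0614/full/full/0/default.jpg"
response = requests.get(url, stream=True)
response.raise_for_status()
# Convert to 3-channel RGB so later steps see a consistent image mode
dog_image_pil = Image.open(response.raw).convert("RGB")

# Convert the PIL image to a uint8 tensor of shape (C, H, W)
transform = transforms.Compose([
    transforms.PILToTensor()])

dog_image_torch = transform(dog_image_pil)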
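
The URL in the cell above follows the IIIF Image API request pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. A small helper such as the hypothetical `iiif_image_url` below makes it easy to swap in other V&A object identifiers or to ask for a smaller image. The base URL comes from the notebook; the `"!800,800"` best-fit size is IIIF version 2 syntax, and whether the V&A server honours every size option is an assumption.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "full", rotation: int = 0,
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API (version 2 style) request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers must be URL-encoded; safe="" also escapes any "/" inside them
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


VAM_BASE = "https://framemark.vam.ac.uk/collections"

# Reproduces the URL used for the Trump sculpture
print(iiif_image_url(VAM_BASE, "2006AT0614"))

# A smaller request: fit the image within 800x800 pixels (IIIF v2 "!w,h" size syntax)
print(iiif_image_url(VAM_BASE, "2006AT0614", size="!800,800"))
```

Requesting a reduced size can cut download time noticeably, and the segmentation preprocessing resizes the image anyway.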

In [44]:
import torch
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights
from torchvision.transforms.functional import to_pil_image

# Step 1: Initialize model with the best available weights
weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights)
model.eval()

# Step 2: Initialize the inference transforms (resize and normalise as the weights expect)
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms and add a batch dimension
batch = preprocess(dog_image_torch).unsqueeze(0)

# Step 4: Use the model and visualize the prediction
with torch.no_grad():  # inference only, so skip gradient tracking
    prediction = model(batch)["out"]  # shape (1, num_classes, H, W)
normalized_masks = prediction.softmax(dim=1)  # per-pixel class probabilities
class_to_idx = {cls: idx for (idx, cls) in enumerate(weights.meta["categories"])}
mask = normalized_masks[0, class_to_idx["dog"]]  # probability of "dog" at each pixel
to_pil_image(mask).show()
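
The cell above turns the model's raw output into a per-pixel probability for the "dog" class. The NumPy sketch below mimics those steps on a toy array so the shapes are easier to follow: the output has shape (N, C, H, W), the softmax runs over the class axis (`dim=1`), and selecting one channel gives a soft mask. It also shows a hard, boolean alternative that keeps a pixel only where "dog" is the most likely class. The three class names and the logit values are made up for illustration.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    # Subtract the per-pixel maximum for numerical stability before exponentiating
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


# Toy model output: batch of 1, 3 hypothetical classes, a 2x2 image -> shape (N, C, H, W)
logits = np.array([[
    [[2.0, 0.0], [0.0, 0.0]],   # class 0: background
    [[0.0, 3.0], [0.0, 1.0]],   # class 1: dog
    [[0.0, 0.0], [4.0, 0.5]],   # class 2: person
]])

probs = softmax(logits, axis=1)   # like prediction.softmax(dim=1)
dog_idx = 1
soft_mask = probs[0, dog_idx]     # like normalized_masks[0, class_to_idx["dog"]]

# Hard (boolean) mask: True where "dog" is the most likely class at that pixel
hard_mask = probs[0].argmax(axis=0) == dog_idx

print(np.round(soft_mask, 3))
print(hard_mask)
```

The soft mask keeps uncertainty (useful for the multiply step that follows), while the boolean mask gives crisp edges but discards it.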

In [47]:
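from PIL import ImageChops

# Turn the soft "dog" probability mask into a greyscale PIL image
mask_image_pil = to_pil_image(mask)
# The mask has the preprocessed size, so resize the photo to match it
dog_image_pil_resized = dog_image_pil.convert('RGB').resize(mask_image_pil.size)

# Multiply pixel by pixel: background pixels (mask near 0) go dark
masked_image = ImageChops.multiply(dog_image_pil_resized, mask_image_pil.convert('RGB'))
masked_image.show()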
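
`ImageChops.multiply` computes roughly image × mask / 255 per pixel, so background pixels fade towards black rather than being removed. The NumPy sketch below shows that arithmetic (using floor division, which may differ from Pillow by rounding) and a second option for foreground separation: keep the original colours and store the mask as an alpha channel, giving a cut-out with a transparent background. `apply_soft_mask` and `cutout_rgba` are hypothetical helpers, not part of Pillow or torchvision.

```python
import numpy as np


def apply_soft_mask(rgb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Scale every channel by mask/255 (uint8 in, uint8 out)."""
    # Widen to uint16 so 255 * 255 does not overflow, then floor-divide back to 0..255
    scaled = rgb.astype(np.uint16) * mask[..., None].astype(np.uint16) // 255
    return scaled.astype(np.uint8)


def cutout_rgba(rgb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep the original colours and use the mask as the alpha (transparency) channel."""
    return np.dstack([rgb, mask]).astype(np.uint8)


# Toy 2x2 uniform grey image and an 8-bit mask: 255 = keep, 0 = background
rgb = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.array([[255, 0],
                 [128, 255]], dtype=np.uint8)

darkened = apply_soft_mask(rgb, mask)
rgba = cutout_rgba(rgb, mask)
print(darkened[..., 0])  # red channel after masking
print(rgba.shape)
```

In the notebook the inputs would be `np.asarray(dog_image_pil_resized)` and `np.asarray(mask_image_pil)`, and the RGBA result could be saved as a PNG with `Image.fromarray(rgba).save("cutout.png")`, which keeps the background transparent instead of black.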

## Lion - Landseer's Dog (dog within a painting)

Let's compare this with a dog *within* an artwork, that is, a painting of a dog. Now we have three levels of segmentation to deal with:

  * The white background around the artwork (a painting, unframed in this image)
  * The background in the painting
  * The dog in the foreground of the painting

Let's see what the model makes of this.

In [48]:
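from PIL import Image
import requests
import torchvision.transforms as transforms

# Fetch the Landseer painting from the V&A IIIF Image API endpoint
landseer_url = "https://framemark.vam.ac.uk/collections/2006AU1447/full/full/0/default.jpg"
response = requests.get(landseer_url, stream=True)
response.raise_for_status()
# Convert to 3-channel RGB so later steps see a consistent image mode
landseer_image_pil = Image.open(response.raw).convert("RGB")

# Convert to a uint8 tensor; the name dog_image_torch is reused so the next cell runs unchanged
transform = transforms.Compose([
    transforms.PILToTensor()])

dog_image_torch = transform(landseer_image_pil)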

In [49]:
import torch
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights
from torchvision.transforms.functional import to_pil_image

# Step 1: Initialize model with the best available weights
weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights)
model.eval()

# Step 2: Initialize the inference transforms (resize and normalise as the weights expect)
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms and add a batch dimension
batch = preprocess(dog_image_torch).unsqueeze(0)

# Step 4: Use the model and visualize the prediction
with torch.no_grad():  # inference only, so skip gradient tracking
    prediction = model(batch)["out"]  # shape (1, num_classes, H, W)
normalized_masks = prediction.softmax(dim=1)  # per-pixel class probabilities
class_to_idx = {cls: idx for (idx, cls) in enumerate(weights.meta["categories"])}
mask = normalized_masks[0, class_to_idx["dog"]]  # probability of "dog" at each pixel
to_pil_image(mask).show()

In [50]:
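from PIL import ImageChops

# Turn the soft "dog" probability mask into a greyscale PIL image
mask_image_pil = to_pil_image(mask)
# Resize the Landseer painting (not the earlier sculpture photo) to the mask size
landseer_image_pil_resized = landseer_image_pil.convert('RGB').resize(mask_image_pil.size)

# Multiply pixel by pixel: pixels the model does not label "dog" go dark
masked_image = ImageChops.multiply(landseer_image_pil_resized, mask_image_pil.convert('RGB'))
masked_image.show()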
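
One crude way to compare the two runs is to measure how much of the frame the model assigns to "dog". The hypothetical `mask_coverage` helper below thresholds the soft mask (0.5 is an arbitrary choice) and reports the fraction of pixels above it. Without a hand-labelled reference mask this says nothing about accuracy, only about how much of the image would survive the masking step.

```python
import numpy as np


def mask_coverage(soft_mask: np.ndarray, threshold: float = 0.5) -> float:
    """Fraction of pixels whose class probability is above `threshold`."""
    return float((soft_mask > threshold).mean())


# Toy 2x2 probability mask standing in for the model's "dog" channel
toy_mask = np.array([[0.9, 0.2],
                     [0.6, 0.4]])
print(mask_coverage(toy_mask))        # 0.5
print(mask_coverage(toy_mask, 0.1))   # 1.0
```

In the notebook, `mask_coverage(mask.detach().cpu().numpy())` would apply it to the `mask` tensor produced for each image.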

The model is much more successful at recognising the dog here, but of course it has also removed the rest of the painting, so the result is closer to object recognition than to separating the artwork from its background.

This is to be expected, because the classes/labels in the model's training data do not include cultural heritage objects. Training for this may not be possible in general, because of the 'what is art?' question, although it would seem possible to train a model specifically on common features of some artworks (paintings in frames, sculptures on a plinth, dress on a mannequin, etc.).
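
The label gap can be checked directly: in the notebook, `weights.meta["categories"]` holds the class names the model can output. The copy below is the Pascal VOC set (20 classes plus background) that we understand these torchvision weights to use; treat it as an assumption and prefer the list from `weights.meta` when torchvision is available. The `missing_labels` helper and the list of heritage labels are hypothetical.

```python
# Assumed copy of weights.meta["categories"] for FCN_ResNet50_Weights.DEFAULT
VOC_CATEGORIES = [
    "__background__", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def missing_labels(wanted: list[str], categories: list[str]) -> list[str]:
    """Return the labels in `wanted` that the model has no class for, in order."""
    known = set(categories)
    return [label for label in wanted if label not in known]


heritage_labels = ["painting", "sculpture", "kimono", "dog", "person"]
print(missing_labels(heritage_labels, VOC_CATEGORIES))
# ['painting', 'sculpture', 'kimono']
```

The model can find a dog or a person inside an artwork, but it has no class for the artwork itself.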

Alternative implementations that may be more successful than this simple example are given in ...

## Ethical Considerations

  * MS COCO dataset
  * FCN ResNet-50 model (?)

## Environmental Considerations

 * Creating the model
 * Running the model 

## Social/Economic Considerations

If this problem is solved successfully, it will reduce the work needed in graphic design and related content creation tasks, which may or may not contribute to job losses in those industries. Alternatively, it removes a tedious manual task and frees up time for more creative work.

# See also 

## References

  * https://pyimagesearch.com/2020/09/28/image-segmentation-with-mask-r-cnn-grabcut-and-opencv/