Osnabrück University - Computer Vision (Winter Term 2020/21) - Prof. Dr.-Ing. G. Heidemann, Ulf Krumnack, Axel Schaffland

# Exercise Sheet 05: Segmentation 2

## Introduction

This week's sheet should be solved and handed in before the end of **Saturday, December 5, 2020**. If you need help (and Google and other resources were not enough), feel free to contact your group's designated tutor or whomever of us you run into first. Please upload your results to your group's Stud.IP folder.

## Assignment 0: Math recap (Periodic functions) [0 Points]

This exercise is supposed to be very easy, does not give any points, and is voluntary. There will be a similar exercise on every sheet. It is intended to revise some basic mathematical notions that are assumed throughout this class and to allow you to check if you are comfortable with them. Usually you should have no problem answering these questions offhand, but if you feel unsure, this is a good time to look them up again. You are always welcome to discuss questions with the tutors or in the practice session. Also, if you have a (math) topic you would like to recap, please let us know.

**a)** What are periodic functions? Can you provide a definition?

YOUR ANSWER HERE

**b)** What are *amplitude*, *frequency*, *wave length*, and *phase* of a sine function? How can you change these properties?

YOUR ANSWER HERE

**c)** How are sine and cosine defined for complex arguments? In what sense does this generalize the real case?

YOUR ANSWER HERE

## Assignment 1: Edge-based segmentation  [5 Points]

### a) Gradients
What is the gradient of a pixel? How do we calculate the first, how the second derivative of an image?  

The gradient at a pixel is the vector of the partial derivatives of the image intensity in the $x$- and $y$-directions. It points in the direction of the steepest increase in intensity, and its magnitude measures how strongly the intensity changes there, so large magnitudes indicate edges. This follows from viewing a grayscale image as a function of two variables ($x$ and $y$) whose value is the brightness of each pixel, i.e. as a landscape of hills and valleys. Since pixels are discrete, derivatives are approximated by differences between neighboring pixels: the first derivative by filters such as Sobel (one kernel per direction), the second derivative by the Laplace filter, which sums the second partial derivatives.
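
As an illustration of the answer above, here is a minimal sketch that approximates both derivatives with plain finite differences instead of Sobel and Laplace filters. The function name `gradient_and_laplacian` and the toy image are assumptions made for this example only.

```python
import numpy as np

def gradient_and_laplacian(img):
    """Finite-difference gradient (first derivative) and Laplacian (second derivative)."""
    img = img.astype(float)
    # np.gradient returns one derivative per axis: rows (y) first, then columns (x)
    gy, gx = np.gradient(img)
    magnitude = np.hypot(gx, gy)      # edge strength
    direction = np.arctan2(gy, gx)    # direction of steepest ascent, in radians
    # Laplacian: sum of the second partial derivatives (4-neighbor stencil, interior only)
    lap = np.zeros_like(img)
    lap[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] +
                       img[1:-1, :-2] + img[1:-1, 2:] - 4 * img[1:-1, 1:-1])
    return magnitude, direction, lap

# toy image: brightness grows quadratically from left to right
cols = np.arange(6, dtype=float)
toy = np.tile(cols ** 2, (5, 1))
magnitude, direction, lap = gradient_and_laplacian(toy)
print(magnitude[2, 3], direction[2, 3], lap[2, 3])
```

In the toy image the gradient points along the $x$-axis (direction 0) and grows with the column index, while the Laplacian is constant in the interior because the brightness is quadratic.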

### b) Edge linking

Describe in your own words the idea of edge linking. What is the goal? Why does it not necessarily yield closed
edge contours?

Edge linking is a variant of **edge-based segmentation** that uses gradient magnitude to link edges.  
The stronger the gradient value at position $(x, y)$, the higher the probability that it is a real edge and not noise.  
If $(x, y)$ belongs to an edge, the idea is that there should be more edge pixels orthogonal to the gradient direction.

**Goal:** Find segments by a search for boundaries between regions of different features.

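**Why not closed contours?** A neighboring pixel is only linked if its gradient magnitude and direction are similar enough to those of the current edge pixel. Where an edge becomes weak or noisy, e.g. at low contrast, no neighbor passes this test and the chain simply ends, so gaps remain.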
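
The linking idea can be sketched as follows. This is a simplified illustration, not the procedure from the lecture slides: the function `link_edges`, its parameters, and the one-row test image are assumptions for this example. A chain is grown from a seed over 8-neighbors whose gradient magnitude is large enough and whose gradient direction is similar.

```python
import numpy as np

def link_edges(magnitude, direction, seed, mag_thresh, ang_thresh):
    """Grow an edge chain from `seed` over 8-neighbors with similar gradients."""
    rows, cols = magnitude.shape
    linked = np.zeros((rows, cols), dtype=bool)
    if magnitude[seed] < mag_thresh:
        return linked
    linked[seed] = True
    stack = [seed]
    while stack:
        r, c = stack.pop()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                # skip positions outside the image and pixels that are already linked
                if not (0 <= nr < rows and 0 <= nc < cols) or linked[nr, nc]:
                    continue
                # smallest angle between the two gradient directions
                diff = np.angle(np.exp(1j * (direction[nr, nc] - direction[r, c])))
                if magnitude[nr, nc] >= mag_thresh and abs(diff) <= ang_thresh:
                    linked[nr, nc] = True
                    stack.append((nr, nc))
    return linked

# one strong edge with a weak spot in the middle
magnitude = np.array([[5.0, 5.0, 5.0, 0.5, 5.0, 5.0]])
direction = np.zeros_like(magnitude)
chain = link_edges(magnitude, direction, seed=(0, 0), mag_thresh=1.0, ang_thresh=0.3)
print(chain)
```

The chain stops at the weak pixel and never reaches the strong pixels behind it, which is the kind of gap that keeps linked contours from being closed.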

### c) Zero crossings

Explain what zero crossings are. Why does the detection of zero crossings always lead to closed contours?

A zero-crossing in general is a point where the sign of a function changes, represented by an intercept of the axis in the graph of the function.  
In our context, zero crossings of the second derivative correspond to edges.

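**Why closed contours?** The filter response (e.g. of the Laplacian of a Gaussian) is a continuous function that splits the image into regions of positive and of negative sign. Zero crossings are exactly the boundaries between these regions, and the boundary of a region is always a closed curve (or runs to the image border); it cannot end in the middle of the image.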
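
A one-dimensional sketch of the idea, assuming a smooth step edge modeled by `tanh` (the signal and all variable names are chosen for this example only): the second difference changes its sign exactly where the edge is steepest.

```python
import numpy as np

# smooth step edge, sampled at half-integer positions around the edge at x = 0
x = np.arange(-5, 5) + 0.5
signal = np.tanh(x)

# discrete second derivative at the interior samples x[1:-1]
second = signal[:-2] - 2 * signal[1:-1] + signal[2:]

# a zero crossing lies between two neighboring samples of opposite sign
crossings = np.nonzero(second[:-1] * second[1:] < 0)[0]
print(crossings, x[1:-1][crossings])
```

The single crossing is found between the samples at $x = -0.5$ and $x = 0.5$, i.e. at the center of the edge.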

### d) Zero crossings (implementation)

Provide an implementation of the zero crossing procedure described in (CV-07 slide 71). To get sensible results you should smooth the image before applying the Laplacian filter, e.g. using the Laplacian of a Gaussian (you may use built-in functions for the filtering steps).

In [None]:
from skimage import filters
from imageio import imread
import matplotlib.pyplot as plt
from scipy.ndimage import shift
import numpy as np
%matplotlib inline

img = imread('images/swampflower.png').astype(float)
img /= img.max()

# Now compute edges and then zero crossings using the 4-neighborhood and the 8-neighborhood
# YOUR CODE HERE

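def four_shift(edges):
    # shift by whole pixels without interpolation; repeat border values instead of padding with 0
    x_shift = shift(edges, (1, 0), order=0, mode='nearest')
    y_shift = shift(edges, (0, 1), order=0, mode='nearest')
    # a zero crossing is a sign change between a pixel and its neighbor
    return (edges * x_shift < 0) | (edges * y_shift < 0)

def eight_shift(edges):
    # additionally compare each pixel with two diagonal neighbors
    xy_shift_one = shift(edges, (1, -1), order=0, mode='nearest')
    xy_shift_two = shift(edges, (1, 1), order=0, mode='nearest')
    return four_shift(edges) | (edges * xy_shift_one < 0) | (edges * xy_shift_two < 0)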

smooth_img = filters.gaussian(img, sigma=5)
edges = filters.laplace(smooth_img)

zero_crossings_n4 = four_shift(edges)
zero_crossings_n8 = eight_shift(edges)

plt.figure(figsize=(12, 12))
plt.gray()

plt.subplot(2,2,1); plt.axis('off'); plt.imshow(img); plt.title('original')
plt.subplot(2,2,2); plt.axis('off'); plt.imshow(edges); plt.title('edges')
plt.subplot(2,2,3); plt.axis('off'); plt.imshow(zero_crossings_n4); plt.title('zero crossings (N4)')
plt.subplot(2,2,4); plt.axis('off'); plt.imshow(zero_crossings_n8); plt.title('zero crossings (N8)' )

plt.show()

## Assignment 2: Watershed transform  [5 Points]



### a) Watershed transform

Explain in your own words the idea of watershed transform. How do the two different approaches from the lecture work? Why does watershed transform always give a closed contour?



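Watershed transform finds segments that are enclosed by edges. The gradient magnitude image is interpreted as a height relief whose ridges (the watersheds) are the segment boundaries.  
Water flows downhill to a local minimum; all pixels that drain into the same minimum form one catchment basin, i.e. one segment. The strength of an edge is ignored, so noise easily leads to over-segmentation.

Two methods:
- **rain**: for each pixel, follow the steepest descent to the local minimum where its water gathers
- **flood**: starting at the local minima, the rising groundwater floods the relief; watersheds are built where different basins would merge

**Why closed contours?** Every pixel is assigned to exactly one catchment basin, and the watersheds are the boundaries between basins. The boundary of a region is always closed, so the contours cannot have gaps.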
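
The rain approach can be sketched on a one-dimensional relief. This is a toy illustration under simplifying assumptions (1D, no plateau handling); the function name `rain_labels` and the relief values are made up for this example.

```python
import numpy as np

def rain_labels(relief):
    """Label every position of a 1D relief by the local minimum its raindrop reaches."""
    n = len(relief)
    labels = np.empty(n, dtype=int)
    for start in range(n):
        pos = start
        while True:
            # candidate positions: the current one and its direct neighbors
            candidates = [p for p in (pos - 1, pos, pos + 1) if 0 <= p < n]
            lowest = min(candidates, key=lambda p: relief[p])
            if relief[lowest] >= relief[pos]:
                break  # no strictly lower neighbor: local minimum reached
            pos = lowest
        labels[start] = pos
    return labels

relief = np.array([3, 1, 2, 5, 4, 0, 2])
labels = rain_labels(relief)
# a watershed lies wherever two neighboring positions drain into different minima
watersheds = np.nonzero(np.diff(labels))[0]
print(labels, watersheds)
```

Each position ends up in exactly one basin, so the basins partition the relief and the only boundary lies between positions 3 and 4.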

### b) Implementation

Now implement the watershed transform using the flooding approach (CV-07 slide 76, but note that the algorithm presented there is somewhat simplified!). Obviously, built-in functions for computing the watershed transform are not allowed, but all other functions may be used. In this example we apply the watershed transform to a distance transformed image, so you **do not** have to take the gradient image, but can apply the watershed transform directly.

In [None]:
import numpy as np
import imageio
import matplotlib.pyplot as plt
%matplotlib inline


def watershed(img, step=1):
    """
    Perform watershed transform on a grayscale image.
    
    Args:
        img (ndarray): The grayscale image.
        step (int): The rise of the waterlevel at each step. Default 1.
        
    Returns:
        edges (ndarray): A binary image containing the watersheds.
    """

    NO_LABEL = 0
    WATERSHED = 1
    new_label = 2

    # initialize labels
    label = np.zeros(img.shape, np.uint16)

    # YOUR CODE HERE

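    # raise the water level step by step until the highest pixel is flooded
    for h in range(0, int(img.max()) + step, step):
        # skip the image border so that every pixel has a full 8-neighborhood
        for x in range(1, img.shape[0] - 1):
            for y in range(1, img.shape[1] - 1):
                if img[x, y] <= h and label[x, y] == NO_LABEL:
                    # newly flooded pixel: collect the basin labels of its neighbors
                    # (unlabeled and watershed pixels are ignored)
                    nl = {int(l) for l in get_neighbor_labels(label, x, y) if l > WATERSHED}
                    if len(nl) == 0:
                        # isolated: start a new basin
                        label[x, y] = new_label
                        new_label += 1
                    elif len(nl) == 1:
                        # segment: extend the only adjacent basin
                        label[x, y] = nl.pop()
                    else:
                        # watershed: two or more basins meet here
                        label[x, y] = WATERSHED

    # binary result: 0 on the watersheds, 1 everywhere else
    return (label != WATERSHED).astype(np.uint8)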


def get_neighbor_labels(label, x, y):
    return [
        label[x - 1][y - 1], label[x][y - 1], label[x + 1][y - 1], label[x - 1][y],
        label[x + 1][y], label[x - 1][y + 1], label[x][y + 1], label[x + 1][y + 1]
    ]

img = imageio.imread('images/dist_circles.png', pilmode='L')

plt.gray()
plt.subplot(1,2,1)
plt.axis('off')
plt.imshow(img)

plt.subplot(1,2,2)
plt.axis('off')
plt.imshow(watershed(img))
plt.show()

### c) Application: maze

You can use watershed transform to find your way through a maze. To do so, first apply a distance transform to the maze and then flood the result. The watershed will show you the way through the maze. Explain why this works.
You can use built-in functions instead of your own watershed function.

In [None]:
import numpy as np
import imageio
import matplotlib.pyplot as plt
from scipy.ndimage import distance_transform_edt
from skimage.segmentation import watershed
%matplotlib inline

img = imageio.imread('images/maze2.png', pilmode = 'L') # 'maze1.png' or 'maze2.png'

result = img[:, :, np.newaxis].repeat(3, 2)
# YOUR CODE HERE
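dt = distance_transform_edt(img)
# watershed_line=True keeps a one-pixel wide line (label 0) between the basins
water = watershed(dt, watershed_line=True)
# draw the watershed line, i.e. the way through the maze, in red
result[water == 0] = (255, 0, 0)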

plt.figure(figsize=(10, 10))
plt.title('Solution')
plt.axis('off')
plt.gray()
plt.imshow(result)
plt.show()

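Why this works: in a maze with a single solution, the path from entrance to exit separates the walls into two connected parts, one on each side of the path. The distance transform is zero on the walls and grows towards the middle of the corridors. Flooding therefore starts at the two wall parts and produces two catchment basins. They can only meet in the middle of corridors that have a different wall part on each side, which is exactly the solution path; dead ends are surrounded by a single wall part and stay inside one basin. The solution path is thus the watershed between the two catchment basins.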

## Assignment 3: $k$-means segmentation [5 Points]


**a)** Explain the idea of $k$-means clustering and how it can be used for segmentation.

Color segmentation in general is used to find segments of constant color.  
$k$-means in general is used to separate data into $k$ clusters of similar properties, each represented by a cluster center.

$k$-means for color segmentation starts with $k$ random RGB values as cluster centers and assigns each pixel of the image to its closest cluster center, measured by the (Euclidean) distance in RGB space. Afterwards, each center is recomputed as the average RGB value of the pixels assigned to it.  
These two steps, 'cluster assignment' and 'center computation', are repeated until the centers change by no more than a given threshold. Pixels that share a cluster then form the segments of the image.
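
The two alternating steps can be sketched on a handful of colors. This is a minimal illustration with fixed (not random) initial centers so that the result is reproducible, and without re-initialization of empty clusters; the function `kmeans` and the toy colors are assumptions for this example.

```python
import numpy as np

def kmeans(points, centers, threshold=1e-6):
    """Plain k-means: alternate assignment and center update until the centers settle."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    while True:
        # assignment step: index of the nearest center for every point
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # update step: move each center to the mean of its points
        new_centers = centers.copy()
        for i in range(len(centers)):
            if np.any(assignment == i):
                new_centers[i] = points[assignment == i].mean(axis=0)
        moved = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if moved <= threshold:
            return assignment, centers

# two reddish and two bluish colors, one initial center near each group
colors = [[250, 10, 10], [240, 20, 0], [10, 10, 250], [0, 30, 240]]
assignment, centers = kmeans(colors, centers=[[255, 0, 0], [0, 0, 255]])
print(assignment, centers)
```

After the first update the centers are the mean red and the mean blue; the second pass changes nothing, so the loop stops.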



**b)** Implement k-means clustering for color segmentation of an RGB image (no use of `scipy.cluster.vq.kmeans` or similar functions allowed here, but you may use functions like `numpy.mean`, `scipy.spatial.distance.pdist` and similar utility functions). Stop calculation when center vectors do not change more than a predefined threshold. Avoid empty clusters by re-initializing the corresponding center vector. (Empirically) determine a good value for $k$ for clustering the image 'peppers.png'.
**Bonus** If you want you can visualize the intermediate steps of the clustering process.

First, let's take a look at how our image looks in RGB colorspace.

In [None]:
from mpl_toolkits.mplot3d import Axes3D
from imageio import imread
import matplotlib.pyplot as plt
%matplotlib inline

img = imread('images/peppers.png')
vec = img.reshape((-1, img.shape[2]))
vec_scaled = vec / 255
fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(111, projection='3d')
ret = ax.scatter(vec[:, 0], vec[:, 1], vec[:, 2], c=vec_scaled, marker='.')

In [None]:
import numpy as np
from scipy.spatial import distance
from IPython import display
from imageio import imread
import time
import matplotlib.pyplot as plt
%matplotlib inline


def kmeans_rgb(img, k, threshold=0, do_display=None):
    """
    k-means clustering in RGB space.

    Args:
        img (numpy.ndarray): an RGB image
        k (int): the number of clusters
        threshold (float): Maximal change for convergence criterion.
        do_display (bool): Whether or not to plot, intermediate steps.
        
    Results:
        cluster (numpy.ndarray): an array of the same size as `img`,
            containing for each pixel the cluster it belongs to
        centers (numpy.ndarray): 'number of clusters' x 3 array. 
            RGB color for each cluster center.
    """
    # YOUR CODE HERE

    # list of RGB values in img (a possible alpha channel is dropped)
    rgb_list = img[..., :3].reshape(-1, 3).astype(float)
    # initialize random cluster centers (k random RGB tuples with values in 0..255)
    centers = np.random.randint(256, size=(k, 3)).astype(float)

    change = np.inf

    while change > threshold:
        # assign each RGB value to its closest cluster center (Euclidean distance)
        cluster_for_each_rgb = np.argmin(distance.cdist(rgb_list, centers), axis=1)

        new_centers = centers.copy()
        for i in range(k):
            # colors that are assigned to the currently considered cluster
            members = rgb_list[cluster_for_each_rgb == i]
            if len(members) > 0:
                # update cluster center to the mean color of its members
                new_centers[i] = members.mean(axis=0)
            else:
                # re-initialize the center of an empty cluster
                new_centers[i] = np.random.randint(256, size=3)

        # total distance all centers moved in this iteration
        change = np.linalg.norm(new_centers - centers, axis=1).sum()
        centers = new_centers

    # return the centers as displayable 8-bit RGB colors
    return cluster_for_each_rgb.reshape(img.shape[:2]), centers.astype(np.uint8)

img = imread('images/peppers.png')

cluster, centers = kmeans_rgb(img, k=7, threshold=0, do_display=True)
plt.imshow(centers[cluster])
plt.show()

**c)** Now do the same in the HSV space (remember its special topological structure). Check if you can improve the results by ignoring some of the HSV channels.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import distance
from skimage import color
from imageio import imread
%matplotlib inline
# from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

img = imread('images/peppers.png', pilmode = 'RGB')

def kmeans_hsv(img, k, threshold = 0):
    """
    k-means clustering in HSV space.

    Args:
        img (numpy.ndarray): an HSV image (or a subset of its channels)
        k (int): the number of clusters
        threshold (float): Maximal change for convergence criterion.
        
    Results:
        cluster (numpy.ndarray): an array of the same size as `img`,
            containing for each pixel the cluster it belongs to
        centers (numpy.ndarray): 'number of clusters' x 'number of channels' array.
            Channel values of each cluster center.
    """
    # YOUR CODE HERE

    # list of HSV values in img; works for any number of channels
    n_channels = img.shape[2]
    hsv_list = img.reshape(-1, n_channels)
    # initialize random cluster centers (k random tuples with values in [0, 1))
    centers = np.random.uniform(0, 1, size=(k, n_channels))

    change = np.inf

    while change > threshold:
        # assign each HSV value to its closest cluster center (Euclidean distance;
        # note that this does not account for the circular hue channel)
        cluster_for_each_hsv = np.argmin(distance.cdist(hsv_list, centers), axis=1)

        new_centers = centers.copy()
        for i in range(k):
            # colors that are assigned to the currently considered cluster
            members = hsv_list[cluster_for_each_hsv == i]
            if len(members) > 0:
                # update cluster center to the mean of its members
                new_centers[i] = members.mean(axis=0)
            else:
                # re-initialize the center of an empty cluster
                new_centers[i] = np.random.uniform(0, 1, size=n_channels)

        # total distance all centers moved in this iteration
        change = np.linalg.norm(new_centers - centers, axis=1).sum()
        centers = new_centers

    return cluster_for_each_hsv.reshape(img.shape[:2]), centers


img_hsv = color.rgb2hsv(img)
k = 7
theta = 0.01

cluster, centers_hsv = kmeans_hsv(img_hsv[:,:,:], k, theta)
if (centers_hsv.shape[1] == 3):
    plt.imshow(color.hsv2rgb(centers_hsv[cluster]))
else:
    plt.gray()
    plt.imshow(np.squeeze(centers_hsv[cluster]))
plt.show()
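
The "special topological structure" of HSV is that hue is an angle: hue values close to 0 and close to 1 are both red, yet their plain Euclidean distance is large. One possible way to account for this, shown here as a sketch and not as the intended solution, is to map HSV to Cartesian cone coordinates before clustering and to map the centers back afterwards. The helper names `hsv_to_cone` and `cone_to_hsv` are made up for this example.

```python
import numpy as np

def hsv_to_cone(hsv):
    """Map HSV values in [0, 1] to Cartesian cone coordinates (x, y, v)."""
    hsv = np.asarray(hsv, dtype=float)
    angle = 2 * np.pi * hsv[..., 0]
    return np.stack([hsv[..., 1] * np.cos(angle),
                     hsv[..., 1] * np.sin(angle),
                     hsv[..., 2]], axis=-1)

def cone_to_hsv(cone):
    """Inverse mapping: recover hue as an angle and saturation as a radius."""
    cone = np.asarray(cone, dtype=float)
    hue = (np.arctan2(cone[..., 1], cone[..., 0]) / (2 * np.pi)) % 1.0
    sat = np.hypot(cone[..., 0], cone[..., 1])
    return np.stack([hue, sat, cone[..., 2]], axis=-1)

red_a = np.array([0.01, 1.0, 1.0])  # hue just above 0
red_b = np.array([0.99, 1.0, 1.0])  # hue just below 1: almost the same red
dist_hsv = np.linalg.norm(red_a - red_b)
dist_cone = np.linalg.norm(hsv_to_cone(red_a) - hsv_to_cone(red_b))
print(dist_hsv, dist_cone)
```

In cone coordinates the two reds are close together, so a Euclidean k-means would put them into the same cluster. A side effect of this mapping is that hue differences count less for colors with low saturation.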


## Assignment 4: Interactive Region Growing [5 Points]

Implement flood fill as described in (CV07 slides 123ff.).

In a recursive implementation, the `floodfill` function is first called for the seed pixel. If the color of the pixel the function is called with is similar to the seed color, the pixel is added to the region and the function is called recursively for its four neighbouring pixels. [Other](https://en.wikipedia.org/wiki/Flood_fill), more elegant solutions exist as well.

The function `on_press` is called when a mouse button is pressed inside the canvas. From there call `floodfill`. Use the filtered HSV image `img_filtered` for your computation, and show the computed region around the seed point (the position where the mouse button was pressed) in the original image. You may use a mask to save which pixels belong to the region (and to save which pixels you already visited).

Hint: If you cannot see the image, try restarting the kernel.
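
Deep recursion can exhaust the interpreter stack for large regions. As one of the alternatives mentioned above, the same region growing can be written with an explicit stack. The following is a sketch using the 4-neighborhood; the function name `floodfill_iterative` and the toy image are assumptions for this example.

```python
import numpy as np

def floodfill_iterative(img, seed, threshold):
    """Grow a region from `seed` over 4-neighbors whose color is close to the seed color."""
    rows, cols = img.shape[:2]
    seed_color = img[seed].astype(float)
    mask = np.zeros((rows, cols), dtype=bool)
    stack = [seed]
    while stack:
        r, c = stack.pop()
        # skip positions outside the image and pixels already in the region
        if not (0 <= r < rows and 0 <= c < cols) or mask[r, c]:
            continue
        # skip pixels whose color is too far from the seed color
        if np.linalg.norm(img[r, c] - seed_color) >= threshold:
            continue
        mask[r, c] = True
        stack.extend([(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)])
    return mask

# toy image: dark left half, bright right half
toy = np.zeros((3, 4, 3))
toy[:, 2:] = 1.0
region = floodfill_iterative(toy, seed=(1, 0), threshold=0.08)
print(region)
```

The region covers the left half only, since the bright pixels differ from the seed color by more than the threshold.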

In [None]:
%matplotlib widget
import imageio
import math
import numpy as np
from matplotlib import pyplot as plt
from skimage import color
import scipy.ndimage as ndimage
from sys import setrecursionlimit
from scipy.spatial import distance

threshold = .08

setrecursionlimit(100000)

def floodfill(img, mask, x, y, color):
    """Recursively grows region around seed point
    
    Args: 
        img (ndarray): The image in which the region is grown
        mask (boolean ndarray): Visited pixels which belong to the region.
        x (uint): X coordinate of the pixel. Checks if this pixel belongs to the region
        y (uint): Y coordinate of the pixel.
        color (list): The color at the seed position

    Return:
        mask (boolean ndarray): mask containing region
    """
    # YOUR CODE HERE
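    # stop at the image border and at pixels that already belong to the region
    if not (0 <= x < img.shape[0] and 0 <= y < img.shape[1]) or mask[x, y]:
        return mask
    # Euclidean distance between this pixel's color and the seed color
    if np.linalg.norm(img[x, y] - color) < threshold:
        mask[x, y] = True
        # mask is modified in place, so the recursive calls share the visited state
        for nx, ny in get_neighbors(x, y):
            floodfill(img, mask, nx, ny, color)
    return mask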
    
def get_neighbors(x, y):
    return [
        (x - 1, y - 1), (x, y - 1), (x + 1, y - 1), (x - 1, y),
        (x + 1, y), (x - 1, y + 1), (x, y + 1), (x + 1, y + 1)
    ]

def on_press(event):
    """Mouse button press event handler
    
    Args:
        event: The mouse event
    """
    y = math.floor(event.xdata)
    x = math.floor(event.ydata)
    color = img_filtered[x, y, :]

    # YOUR CODE HERE
    # boolean mask so that it can be used directly for indexing
    mask = floodfill(img_filtered, np.zeros(img.shape[:2], dtype=bool), x, y, color)
    img[mask] = (255, 255, 255)

    plt.imshow(img)
    fig.canvas.draw()
    

def fill_from_pixel(img, img_filtered, x,y):
    """ Calls floodfill from a pixel position
    
    Args:
        img (ndarray): IO image on which fill is drawn.
        img_filtered (ndarray): Processing image on which floodfill is computed.
        x (uint): Coordinates of pixel position.
        y (uint): Coordinates of pixel position.

    Returns:
        img (ndarray): Image with grown area in white
    """
    mask = np.zeros(img.shape[:2], dtype=bool)
    color = img_filtered[x,y, :]
    mask = floodfill(img_filtered, mask, x, y, color)
    img[mask] = (255, 255, 255)
    
    return img


img = imageio.imread('images/peppers.png')
img_hsv = color.rgb2hsv(img)
img_filtered = ndimage.median_filter(img_hsv, 5)
#img = fill_from_pixel(img, img_filtered, 200, 300) # Uncomment to activate simple testing at a fixed position
fig = plt.figure()
ax = fig.add_subplot(111)
plt.imshow(img)
fig.canvas.mpl_connect('button_press_event', on_press)
plt.show()