Osnabrück University - Computer Vision (Winter Term 2020/21) - Prof. Dr.-Ing. G. Heidemann, Ulf Krumnack, Axel Schaffland

# Exercise Sheet 01: Basic Operations - Convolution

## Introduction

This is the first "real" homework sheet.

The homework sheets will usually be available on Saturday and are supposed to be solved in groups of three. They have to be handed in before Sunday morning of the following week. The exercises are then presented to your tutor in a small feedback session. To acquire the admission for the final exam, you will have to pass 𝑁−2 of the weekly provided exercise sheets.

Sign up for a group on Stud.IP (See Participants -> Functions/Groups). The times mentioned there are the times for the feedback session of your group. If none of them fits, send any of the tutors an e-mail so we can try to arrange something.

Your group will have a group folder in Stud.IP under Documents. Upload your solutions there to hand them in.

This week's sheet should be solved and handed in before the end of **Saturday, November 07, 2020**. Please upload your results to your group's Stud.IP folder. In case you cannot do this first sheet (due to technical or organizational problems) please upload a description of your problem instead. Your tutor will help you to solve the problems in the first feedback session and you may hand in this sheet together with the second sheet one week later.

## Assignment 1: Two-dimensional Convolution [8 Points]

This exercise is purely theoretical and does not require implementation.

### a) Definition

Describe in your own words how convolution works.

- in general, a convolution is an operation that combines two functions to get a new one
- in an image processing context, it's an operation that takes in two different grids of values and combines them to get a new grid of values
- typically, one of the two grids is smaller and is called the kernel
- the result of applying the kernel (smaller matrix) to the image (bigger matrix) is another image; we say that we convolved the original image with the kernel
- so, a convolution in the image processing context is defined by a filter kernel (mask, receptive field), which is just a matrix
- we replace each pixel value of an image by a new value that is computed from its neighboring pixels
- linear kernels are the easiest (and most efficient)
    - replace each pixel by a linear combination of its neighbors
    - the kernel is moved across the input image; for each pixel, the kernel is multiplied element-wise with the image region currently under it, and the products are summed up to get the new pixel value
    - example: smoothing an image with a Gaussian kernel
        - a Gaussian kernel takes a weighted average of the neighboring pixels, with higher weights for the pixels towards the center of the kernel
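
The "multiply element-wise and sum" step described above can be sketched for a single pixel as follows. This is only an illustration: the toy image, the pixel position and the variable names are assumptions, and border handling is ignored.

```python
import numpy as np

# Toy 5x5 "image" and a 3x3 box kernel (values chosen only for illustration)
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0

# New value for the pixel at (x, y) = (2, 2): multiply the 3x3 neighborhood
# element-wise with the kernel and sum up the products
x, y = 2, 2
patch = image[x - 1:x + 2, y - 1:y + 2]
new_value = np.sum(patch * kernel)
print(new_value)
```

Because the box kernel is symmetric, it does not matter here whether the kernel is flipped before it is applied.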

### b) Properties
Is convolution linear or non-linear? Is it homogeneous or inhomogeneous? Prove your answers.

There are linear and non-linear convolutions based on the functional evaluation of the neighboring pixels.

In the lecture, we talked about linear, homogeneous convolutions.  
The proof for homogeneity is already part of the proof for linearity.

**Linearity Proof:**

The convolution defined by the local operator $L$ is linear if:

**homogeneity**: $L(ag) = a L(g)$:  

$L(a \cdot g(x, y)) = \sum_{i \in [-m, m]} \sum_{j \in [-n, n]} k(i+m, j+n) \cdot a \cdot g(x+i, y+j)$  
$\quad\quad\quad\quad\quad\quad = a \sum_{i \in [-m, m]} \sum_{j \in [-n, n]} k(i+m, j+n) \cdot g(x+i, y+j)$  
$\quad\quad\quad\quad\quad\quad = a \cdot L(g(x, y))$  

**additivity**: $L(g_1 + g_2) = L(g_1) + L(g_2)$:  

$L(g_1(x, y) + g_2(x, y)) = \sum_{i \in [-m, m]} \sum_{j \in [-n, n]} k(i+m, j+n) \cdot (g_1(x+i, y+j) + g_2(x+i, y+j))$  
$\quad\quad\quad\quad\quad\quad\quad\quad\quad = \sum_{i \in [-m, m]} \sum_{j \in [-n, n]} \left( k(i+m, j+n) \cdot g_1(x+i, y+j) + k(i+m, j+n) \cdot g_2(x+i, y+j) \right)$  
$\quad\quad\quad\quad\quad\quad\quad\quad\quad = \sum_{i \in [-m, m]} \sum_{j \in [-n, n]} k(i+m, j+n) \cdot g_1(x+i, y+j) + \sum_{i \in [-m, m]} \sum_{j \in [-n, n]} k(i+m, j+n) \cdot g_2(x+i, y+j)$  
$\quad\quad\quad\quad\quad\quad\quad\quad\quad = L(g_1) + L(g_2)$
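
Both properties can also be checked numerically. The following is a minimal sketch and not a replacement for the proof: the random test images, the random kernel, the factor `a` and the helper `L` are assumptions made for illustration, and zero padding is used at the border.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
g1 = rng.random((6, 7))   # two arbitrary test images
g2 = rng.random((6, 7))
k = rng.random((3, 3))    # arbitrary 3x3 kernel
a = 2.5                   # arbitrary scalar

def L(g):
    """Local operator: convolution with the fixed kernel k (zero padding)."""
    return ndimage.convolve(g, k, mode='constant', cval=0.0)

# homogeneity: L(a * g) == a * L(g)
homogeneous = np.allclose(L(a * g1), a * L(g1))
# additivity: L(g1 + g2) == L(g1) + L(g2)
additive = np.allclose(L(g1 + g2), L(g1) + L(g2))
print(homogeneous, additive)
```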

### c) Complexity

Assume an image $g$ of size $M\times N$ and a kernel $k$ of size $(2m+1)\times(2n+1)$. How many operations (additions and multiplications) are required to compute a convolved image $g\ast k$ (of the same size as $g$)?

$M \cdot N$ applications of the kernel (once for each pixel). For each application we have $(2m + 1)(2n + 1)$ multiplications and $(2m + 1)(2n + 1) - 1$ additions.  
So, in total we have $MN(2((2m + 1)(2n+1)) - 1)$ operations.  
$\rightarrow$ $MN$ applications of complexity $O(mn)$
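
As a worked example, the count can be written as a small function. The function name and the example sizes are assumptions; the formula is the one derived above.

```python
def convolution_operations(M, N, m, n):
    """Operations for convolving an M x N image with a (2m+1) x (2n+1) kernel."""
    taps = (2 * m + 1) * (2 * n + 1)   # kernel entries
    multiplications = taps             # one multiplication per kernel entry
    additions = taps - 1               # summing up `taps` products
    return M * N * (multiplications + additions)

# 512 x 512 image with a 3 x 3 kernel (m = n = 1): 17 operations per pixel
print(convolution_operations(512, 512, 1, 1))
```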

### d) Separability

What is a separable kernel? Describe, how it can be applied more efficiently. Compute the number of operations for getting $g\ast k$ (as in (c), but with a separable kernel $k$) and compare the results. Assume that the kernel is of size $m \times n$ and the image is of size $M \times N$. Compute the number of operations first for a single pixel and then extend your answer to the whole image. Ignore the normalization of the kernel, i.e. the fraction in front.

Note that here we define the kernel size as $m \times n$ as opposed to Assignment *1c)*. This is a shorter notation.

Some kernels are separable which means that the kernel $k$ is a product of a row vector and a column vector:

$k(i,j) = k^C(i) \cdot k^R(j)$ with $k^R \in \mathbb{R}^{1 \times n}, k^C \in \mathbb{R}^{m \times 1}$

That leads to a more efficient convolution by first applying a $1 \times n$ kernel and then an $m \times 1$ kernel, which reduces the computational effort per pixel from $O(mn)$ to $O(m + n)$.

We still have $M \cdot N$ applications of each kernel (once for each pixel), but we only need $n$ multiplications and $n-1$ additions for the row kernel and $m$ multiplications and $m-1$ additions for the column kernel. In total, we have $MN(n + n - 1 + m + m - 1) = MN(2m + 2n - 2)$ operations, compared to $MN(2mn - 1)$ for the non-separable $m \times n$ kernel.  
$\rightarrow$ $MN$ applications of complexity $O(n + m)$
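
The equivalence of the two-pass and the one-pass computation can be checked with a short sketch. It assumes the unnormalized binomial vector $(1, 2, 1)$ for both $k^R$ and $k^C$, a random test image and zero padding. The helper names `ops_full` and `ops_separable` are made up for this example and use the $m \times n$ kernel-size convention of this part.

```python
import numpy as np
from scipy import ndimage

k_col = np.array([1.0, 2.0, 1.0])   # k^C (column vector)
k_row = np.array([1.0, 2.0, 1.0])   # k^R (row vector)
k_2d = np.outer(k_col, k_row)       # k(i, j) = k^C(i) * k^R(j)

rng = np.random.default_rng(0)
g = rng.random((8, 9))

# One pass with the full 2D kernel
full = ndimage.convolve(g, k_2d, mode='constant', cval=0.0)
# Two passes: first along the rows, then along the columns
rows = ndimage.convolve1d(g, k_row, axis=1, mode='constant', cval=0.0)
separable = ndimage.convolve1d(rows, k_col, axis=0, mode='constant', cval=0.0)
same = np.allclose(full, separable)

def ops_full(M, N, m, n):
    """Operations with a non-separable m x n kernel."""
    return M * N * (2 * m * n - 1)

def ops_separable(M, N, m, n):
    """Operations with a 1 x n row kernel followed by an m x 1 column kernel."""
    return M * N * ((2 * n - 1) + (2 * m - 1))

print(same, ops_full(1, 1, 5, 5), ops_separable(1, 1, 5, 5))
```

For a $5 \times 5$ kernel this gives 49 operations per pixel for the full kernel and 18 for the separable version.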

## Assignment 2: Applying Convolution [4 Points]

In this exercise you will apply convolution with different kernels. You may use the function `scipy.ndimage.convolve` to solve this task. Check the documentation to learn how to use this function. In this assignment you do not have to implement the convolution yourself. Realize the following filters, describe their effect and possible applications.

### a) Box filter

- linear filter in which each pixel in the resulting image has a value equal to the average value of its neighboring pixels in the input image
- can be used for smoothing the image, e.g. for noise reduction (at cost of sharpness)
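- problem: the hard cut-off at the kernel boundary (all neighbors inside the box are weighted equally, everything outside is ignored) can lead to artifacts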
- with increasing filter size, the image gets increasingly blurry (bigger average)
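
The effect of the filter size can be illustrated on a one-dimensional step edge. This is a sketch under assumptions: the helper `box_kernel`, the toy signal and the variable names are not part of the exercise, and `np.convolve` pads with zeros at the ends.

```python
import numpy as np

def box_kernel(size):
    """Return a size x size averaging kernel whose entries sum to 1."""
    return np.full((size, size), 1.0 / size**2)

# A 1D step edge, smoothed with box filters of width 3 and 5
step = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
smooth3 = np.convolve(step, np.full(3, 1 / 3), mode='same')
smooth5 = np.convolve(step, np.full(5, 1 / 5), mode='same')

print(box_kernel(3))
print(smooth3)
print(smooth5)
```

The wider filter spreads the transition from 0 to 1 over more samples, which corresponds to the stronger blur in the image.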


In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from imageio import imread
from skimage import data
from scipy import ndimage

# Load an image
#image = imread('some_file.png', pilmode = 'F')
image = data.coins().astype(np.float32)

# YOUR CODE HERE

# box filter
kernel = 1/9 * np.array([[1, 1, 1],
                         [1, 1, 1],
                         [1, 1, 1]])

filtered_image = ndimage.convolve(image, kernel)

fig = plt.figure(figsize=(15,7))

a=fig.add_subplot(1,2,1)
plt.imshow(image, cmap = 'gray')
plt.title('original')
plt.axis('off')

a=fig.add_subplot(1,2,2)
plt.imshow(filtered_image, cmap = 'gray')
plt.title('box filter')
plt.axis('off')

plt.show()

### b) Gaussian filter

You may try different filter sizes.

- it's also averaging, but the central pixels are playing a stronger role compared to the rest (pixels that are farther away will not be taken into account as much)
- a.k.a. gaussian blur / smoothing
- reduce image noise (at cost of sharpness)
- reduce detail
- pre-processing for other algorithms
- no artifacts, but blurring is not as strong as with the box filter
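
One common way to obtain such center-weighted kernels is the binomial approximation of a Gaussian, where the weights come from a row of Pascal's triangle. The helper below is a sketch of that idea, and the name `binomial_kernel` is an assumption. It builds the row by repeated convolution with $(1, 1)$ and forms the 2D kernel as an outer product.

```python
import numpy as np

def binomial_kernel(size):
    """Return a normalized size x size binomial kernel."""
    row = np.array([1.0])
    for _ in range(size - 1):
        # each convolution with (1, 1) yields the next row of Pascal's triangle
        row = np.convolve(row, [1.0, 1.0])
    row /= row.sum()
    # the 2D kernel is separable: outer product of the 1D weights
    return np.outer(row, row)

print(binomial_kernel(3) * 16)
print(binomial_kernel(5) * 256)
```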


In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from imageio import imread
from skimage import data
from scipy import ndimage

# Load an image
#image = imread('some_file.png', pilmode = 'F')
image = data.coins().astype(np.float32)

# YOUR CODE HERE

# binomials are approximations of a gaussian distribution (the larger the better)
# you can use pascal's triangle to generate binomial filters which can be used as approximations of gaussian filters

# 2d binomial filters
kernel_three = np.array([[1/16, 1/8, 1/16],
                         [1/8, 1/4, 1/8],
                         [1/16, 1/8, 1/16]])

# not so good, because no central pixel
kernel_four = np.array([[1/64, 3/64, 3/64, 1/64],
                        [3/64, 9/64, 9/64, 3/64],
                        [3/64, 9/64, 9/64, 3/64],
                        [1/64, 3/64, 3/64, 1/64]])

kernel_five = np.array([[1/256, 4/256, 6/256, 4/256, 1/256],
                        [4/256, 16/256, 24/256, 16/256, 4/256],
                        [6/256, 24/256, 36/256, 24/256, 6/256],
                        [4/256, 16/256, 24/256, 16/256, 4/256],
                        [1/256, 4/256, 6/256, 4/256, 1/256]])

fig = plt.figure(figsize=(30,14))

fig.add_subplot(1,4,1)
plt.imshow(image, cmap = 'gray')
plt.title('original')
plt.axis('off')

fig.add_subplot(1,4,2)

plt.imshow(ndimage.convolve(image, kernel_three), cmap = 'gray')
plt.title('3x3 gaussian')
plt.axis('off')

fig.add_subplot(1,4,3)
plt.imshow(ndimage.convolve(image, kernel_four), cmap = 'gray')
plt.title('4x4 gaussian')
plt.axis('off')

fig.add_subplot(1,4,4)
plt.imshow(ndimage.convolve(image, kernel_five), cmap = 'gray')
plt.title('5x5 gaussian')
plt.axis('off')

plt.show()

### c) Sobel filter

Try horizontal, vertical, and diagonal sobel filters.

- edge detection $\rightarrow$ emphasizes edges (horizontally, vertically, diagonally)
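
Why these kernels respond to edges of one orientation can be seen on a synthetic image. The sketch below makes several assumptions: a toy image with a single vertical edge, evaluation at interior pixels only, and the kernel applied without flipping, so the sign of the response may differ from `ndimage.convolve`.

```python
import numpy as np

sobel_vertical = np.array([[-1, 0, 1],
                           [-2, 0, 2],
                           [-1, 0, 1]], dtype=float)
sobel_horizontal = sobel_vertical.T

# Synthetic 5x6 image: dark left half, bright right half (one vertical edge)
image = np.zeros((5, 6))
image[:, 3:] = 1.0

def interior_response(img, kern):
    """Weighted sum of every full 3x3 patch (no border handling)."""
    rows, cols = img.shape[0] - 2, img.shape[1] - 2
    out = np.zeros((rows, cols))
    for x in range(rows):
        for y in range(cols):
            out[x, y] = np.sum(img[x:x + 3, y:y + 3] * kern)
    return out

resp_vertical = interior_response(image, sobel_vertical)
resp_horizontal = interior_response(image, sobel_horizontal)
print(resp_vertical)
print(resp_horizontal)
```

The vertical kernel responds only in the columns next to the edge, while the horizontal kernel gives zero everywhere because the image does not change from row to row.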

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from imageio import imread
from skimage import data
from scipy import ndimage

# Load an image
#image = imread('some_file.png', pilmode = 'F')
image = data.coins().astype(np.float32)

# YOUR CODE HERE

sobel_horizontal = np.array([[-1, -2, -1],
                             [0, 0, 0],
                             [1, 2, 1]])

sobel_vertical = np.array([[-1, 0, 1],
                           [-2, 0, 2],
                           [-1, 0, 1]])

sobel_diag_one = np.array([[0, 1, 2],
                           [-1, 0, 1],
                           [-2, -1, 0]])

sobel_diag_two = np.array([[-2, -1, 0],
                           [-1, 0, 1],
                           [0, 1, 2]])

fig = plt.figure(figsize=(30,14))

a=fig.add_subplot(1,5,1)
plt.imshow(image, cmap = 'gray')
plt.title('original')
plt.axis('off')

a=fig.add_subplot(1,5,2)
plt.imshow(ndimage.convolve(image, sobel_horizontal, mode='constant', cval=0.0), cmap = 'gray')
plt.title('sobel - horizontal')
plt.axis('off')

a=fig.add_subplot(1,5,3)
plt.imshow(ndimage.convolve(image, sobel_vertical), cmap = 'gray')
plt.title('sobel - vertical')
plt.axis('off')

a=fig.add_subplot(1,5,4)
plt.imshow(ndimage.convolve(image, sobel_diag_one), cmap = 'gray')
plt.title('sobel - diagonal1')
plt.axis('off')

a=fig.add_subplot(1,5,5)
plt.imshow(ndimage.convolve(image, sobel_diag_two), cmap = 'gray')
plt.title('sobel - diagonal2')
plt.axis('off')

plt.show()

### d) Unsharp Mask

One method to sharpen images is Unsharp Mask in which a negative unsharp mask is added to the original image as follows:

$$\text{Sharpened Image} = \text{Original Image} + (\text{Original Image} - \text{Unsharp Image}) \cdot \text{Amount}$$

The unsharp image can be computed by convolution with a Gaussian Kernel. Implement unsharp masking with a $5\times5$ Gaussian Kernel and a sharpening amount of $1.5$. Use the already defined Gaussian kernel `gauss_5`.

Hint: To get good results, the final image needs to be clipped to values between $0$ and $255$, i.e. all negative values are set to zero and all values bigger than $255$ are set to $255$.

You may experiment with large or negative sharpening amounts.

* Why is Unsharp Masking sharpening an image?
* What is the difference between normalizing and clipping an image?


- When we subtract the unsharp (blurred) image from the original, the smooth parts cancel out and we end up with the high-contrast details such as edges (the difference image).  
If we add that difference back to the original (weighted with the amount), we strengthen exactly these sharp parts.

- Normalizing an image means rescaling the whole range of pixel intensities (e.g. linearly mapping the minimum to $0$ and the maximum to $255$), so every pixel value may change. Clipping, on the other hand, only changes the values outside the representable range, i.e. values below $0$ are set to $0$ and values above $255$ are set to $255$.

notes:
- increasing the sharpening amount only leads to good results up to a certain point (artifacts etc.)
- negative sharpening amounts: blurring with artifacts (weakening the sharp parts)
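
The difference between clipping and normalizing can be seen on a handful of values. The example values are made up, and the normalization shown is a plain linear min-max rescaling to $[0, 255]$, which is only one possible choice.

```python
import numpy as np

# Example intensities, some of them outside the valid range [0, 255]
values = np.array([-50.0, 0.0, 100.0, 255.0, 400.0])

# Clipping: only out-of-range values change
clipped = np.clip(values, 0, 255)

# Normalizing (min-max): the whole range is rescaled, so all values may change
normalized = (values - values.min()) / (values.max() - values.min()) * 255

print(clipped)
print(normalized)
```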




In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from imageio import imread
from skimage import data
from scipy import ndimage

# Load an image
#image = imread('some_file.png', pilmode = 'F')
image = data.coins().astype(np.float32)

# Define sharpening amount
amount = 1.5

# Define the filters:
gauss_5 = 1/256 * np.asarray([[1,4,6,4,1],[4,16,24,16,4],[6,24,36,24,6],[4,16,24,16,4],[1,4,6,4,1]])

# YOUR CODE HERE

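# Blur the image with the Gaussian kernel to obtain the unsharp image
unsharped_mask_image = ndimage.convolve(image, gauss_5)
# Keep the signed difference: negative values are needed to darken the dark side of an edge
diff_img = image - unsharped_mask_image
sharpened_img = image + diff_img * amount
# Clip only the final result to the valid intensity range [0, 255]
np.clip(sharpened_img, 0, 255, out=sharpened_img)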

fig = plt.figure(figsize=(30,14))

a=fig.add_subplot(1,4,1)
plt.imshow(image, cmap = 'gray')
plt.title('original')
plt.axis('off')

a=fig.add_subplot(1,4,2)
plt.imshow(unsharped_mask_image, cmap = 'gray')
plt.title('unsharp mask')
plt.axis('off')

a=fig.add_subplot(1,4,3)
plt.imshow(diff_img, cmap = 'gray')
plt.title('diff')
plt.axis('off')

a=fig.add_subplot(1,4,4)
plt.imshow(sharpened_img, cmap = 'gray')
plt.title('sharpened image')
plt.axis('off')

plt.show()

## Assignment 3: Implementing Convolution [8 Points]

Now implement your own 2-dimensional convolution function. The function should take an image and a kernel as argument and return an image of the same size, containing the result of convolving the image with the kernel.

You may notice a problem at the boundaries of the image. Describe the problem and possible solutions. Implement at least one of them.

Then apply your function with different kernels. Compare the results with [Assignment 2](#Assignment-2:-Applying-Convolution-[4-Points]).


**Boundary Problem**  

The problem occurs when applying the kernel to pixels at or near the image border: there, parts of the kernel extend beyond the image, so some of the required neighboring pixel values are undefined. A common solution for the problem is to extend the image beyond its boundaries.
There are several common approaches for that, e.g.:

- reflect: input is extended by reflecting about the edge of the last pixel
- constant: input is extended by filling all values beyond the edge with the same constant value
- nearest: input is extended by replicating the last pixel
- mirror: input is extended by reflecting about the center of the last pixel
- wrap: input is extended by wrapping around to the opposite edge
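
The five extension modes can be illustrated on a single image row with `np.pad`. The mode names in the list above follow `scipy.ndimage`, and NumPy names some of them differently: `'symmetric'` in NumPy corresponds to `reflect` above, and `'reflect'` in NumPy corresponds to `mirror` above. The example row and the dictionary `extended` are assumptions for this sketch.

```python
import numpy as np

row = np.array([1, 2, 3, 4])
pad = 2  # number of pixels added on each side

# keys: mode names as used in the list above; values: the extended row
extended = {
    'reflect':  np.pad(row, pad, mode='symmetric'),
    'constant': np.pad(row, pad, mode='constant', constant_values=0),
    'nearest':  np.pad(row, pad, mode='edge'),
    'mirror':   np.pad(row, pad, mode='reflect'),
    'wrap':     np.pad(row, pad, mode='wrap'),
}

for name, values in extended.items():
    print(f"{name:9s}", values)
```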

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from skimage import data

def my_convolve2d(img, kern):
    """Convolve an image with a kernel.

    Args:
        img (np.ndarray): The image, provided as a two-dimensional array.
        kern (np.ndarray): The kernel, also a two-dimensional array.
        
    Returns:
        result (np.ndarray): The convolved image. 
        
    """
    
    # store the image size for easier access
    M,N = img.shape
    # store the kernel size
    m,n = kern.shape
    # and also the half kernel size
    mh, nh = (m//2, n//2)
    
    # Initialize the result matrix
    result = np.zeros((M,N))
    
    # Compute the convolution
    # YOUR CODE HERE

    # Note: this assumes odd kernel sizes, so that the kernel has a central pixel
    for x in range(M):
        for y in range(N):
            # (x,y) is the current pixel in the original image to apply the kernel to
            pixel = 0
            for i in range(-mh, mh + 1):
                for j in range(-nh, nh + 1):
                    # use >= 0 so that the first row and column are included
                    if 0 <= x + i < M and 0 <= y + j < N:
                        pixel += kern[i + mh, j + nh] * img[x + i, y + j]
                    # otherwise handle array borders like mode 'constant' with cval 0.0:
                    # pixels outside the image contribute nothing to the sum
            result[x, y] = pixel

    return result

# Apply your function to an image:
# Try different filters, compare the results with Assignment 2

# Load the image
image = data.coins().astype(np.float32)

box_filter = 1/9 * np.array([[1, 1, 1],
                            [1, 1, 1],
                            [1, 1, 1]])

gaussian_filter = np.array([[1/16, 1/8, 1/16],
                            [1/8, 1/4, 1/8],
                            [1/16, 1/8, 1/16]])

sobel_horizontal = np.array([[-1, -2, -1],
                             [0, 0, 0],
                             [1, 2, 1]])

fig = plt.figure(figsize=(30,14))

a=fig.add_subplot(1,4,1)
plt.imshow(image, cmap = 'gray')
plt.title('original')
plt.axis('off')

a=fig.add_subplot(1,4,2)
plt.imshow(my_convolve2d(image, box_filter), cmap = 'gray')
plt.title('box filter')
plt.axis('off')

a=fig.add_subplot(1,4,3)
plt.imshow(my_convolve2d(image, gaussian_filter), cmap = 'gray')
plt.title('gaussian filter')
plt.axis('off')

a=fig.add_subplot(1,4,4)
plt.imshow(my_convolve2d(image, sobel_horizontal), cmap = 'gray')
plt.title('horizontal sobel')
plt.axis('off')

plt.show()
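
For the comparison with Assignment 2, note that the lecture formula used in Assignment 1 applies the kernel without flipping it, whereas `scipy.ndimage.convolve` flips the kernel and `scipy.ndimage.correlate` does not. The following sketch is an assumption-laden alternative to the loop above: the function name `correlate2d_padded` is made up, it uses zero padding, a random test image and odd kernel sizes, and it checks how the results relate.

```python
import numpy as np
from scipy import ndimage

def correlate2d_padded(img, kern):
    """Apply kern without flipping, using zero padding (odd kernel sizes assumed)."""
    M, N = img.shape
    m, n = kern.shape
    mh, nh = m // 2, n // 2
    # extend the image with zeros so that every pixel has a full neighborhood
    padded = np.pad(img, ((mh, mh), (nh, nh)), mode='constant', constant_values=0)
    result = np.zeros((M, N), dtype=float)
    # accumulate one shifted, weighted copy of the image per kernel entry
    for i in range(m):
        for j in range(n):
            result += kern[i, j] * padded[i:i + M, j:j + N]
    return result

rng = np.random.default_rng(0)
img = rng.random((10, 12))
sobel_horizontal = np.array([[-1, -2, -1],
                             [0, 0, 0],
                             [1, 2, 1]], dtype=float)

ours = correlate2d_padded(img, sobel_horizontal)
scipy_corr = ndimage.correlate(img, sobel_horizontal, mode='constant', cval=0.0)
scipy_conv = ndimage.convolve(img, sobel_horizontal, mode='constant', cval=0.0)

matches_correlate = np.allclose(ours, scipy_corr)
# flipping the Sobel kernel negates it, so the convolution has the opposite sign
matches_negated_convolve = np.allclose(ours, -scipy_conv)
print(matches_correlate, matches_negated_convolve)
```

For symmetric kernels such as the box and binomial filters, flipping changes nothing, so both SciPy functions give the same result as the unflipped version.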