Osnabrück University - Computer Vision (Winter Term 2020/21) - Prof. Dr.-Ing. G. Heidemann, Ulf Krumnack, Axel Schaffland, Ludwig Schallner, Artem Petrov

# Recap I

This sheet is a recap of the first half of the term. Neither do you have to present it to your tutors nor will it count toward the number of passed sheets required for the exam. I.e. you do not have to complete this sheet, but we highly recommend that you solve the assignments as part of your preparation for the exam. 

## Introduction

**a)** What are the goals of *computer vision* and *image processing*? Name some subtasks. Give one example problem and describe how to solve it with the algorithms presented in this course. 

**Goal of CV**: Automatic interpretation of image content by a computer, i.e. recognizing what an image shows
- detection of regions of interest
- boundary detection
- feature extraction
- classification of colors, shapes, objects
- 3D representations of real scenes
- reconstruction of 3D surfaces
- motion detection
    - object / background separation
    - direction and velocity computation
    - object tracking

**Goal of image processing**: Enhance images to facilitate analysis by a human
- repair corrupted images
- compensation of bad acquisition conditions (e.g. noise reduction, deblurring)
- improve perceptibility (e.g. contrast enhancement)
- 'highlight' information

**Example problem: Basic object recognition**

A simple approach for object recognition is **template matching**:
- construct a template (prototypical model of the object you'd like to find in the image)
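- search for the template in the image by computing the similarity between the template and the underlying image patch at every position
    - two similarity measures
        - mean absolute difference (MAD)
        - correlation coefficient (better: invariant to changes of brightness and contrast, i.e. gray-value changes $a \cdot g + b$ with $a > 0$)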
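
A minimal NumPy sketch of template matching with both measures. The function names (`mad`, `corr_coeff`, `match`) are hypothetical, the search is a brute-force loop over all positions, and the template is a brightness- and contrast-changed copy of an image patch, chosen to show why the correlation coefficient is the more robust measure.

```python
import numpy as np

def mad(patch, template):
    """Mean absolute difference: 0 for a perfect match, larger is worse."""
    return np.mean(np.abs(patch - template))

def corr_coeff(patch, template):
    """Correlation coefficient in [-1, 1]: 1 for a perfect (positive linear) match."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt(np.sum(p ** 2) * np.sum(t ** 2))
    return np.sum(p * t) / denom if denom > 0 else 0.0

def match(image, template, measure):
    """Slide the template over the image; return the measure at every valid position."""
    th, tw = template.shape
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = measure(image[y:y + th, x:x + tw], template)
    return out

rng = np.random.default_rng(0)
image = rng.random((20, 20))
# same pattern as the patch at (5, 7), but with doubled contrast and a brightness offset
template = 2.0 * image[5:10, 7:12] + 0.5

cc_map = match(image, template, corr_coeff)
mad_map = match(image, template, mad)
best = np.unravel_index(np.argmax(cc_map), cc_map.shape)  # best match by correlation
print(best, cc_map[5, 7], mad_map[5, 7])
```

At the true position the correlation coefficient is 1, while the MAD stays clearly above 0 because it compares raw gray values.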

**b)** Describe the difference between *top down* and *bottom up* strategies. What are they called from another perspective?

Those are processing strategies:

**bottom up**:
- starting from data
- looking for increasingly complex features and connections until they match the model
- aka **data driven**

**top down**:
- start from a model (hypothesis) and try to find it within the data
- aka **model driven**

Commonly a mixture of both is used!

**c)** What is the semantic gap?

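The **semantic gap** is the gap between the low-level features that can be computed from the pixel data (e.g. colors, edges, textures) and the high-level concepts (objects, scenes, meaning) that a human associates with an image. Bridging it relies on the hope that low-level features correlate with high-level concepts.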

## Image Acquisition

**a)** Draw (on paper) the concept of a pinhole camera. Draw at least an object, rays, the pinhole, and the image plane.

<img src="img/pinhole.png" width="600"/>

**b)** Explain how human color vision works.

- visible wavelengths: $\approx 380\,\text{nm} - 750\,\text{nm}$
- the retina contains three types of cone receptors with different spectral sensitivities, peaking at short, medium and long wavelengths (S, M, L cones, roughly blue, green and red)
- these receptors are arranged side by side in the retina
- so the incoming spectrum is reduced to just three stimuli ($\infty \rightarrow 3$ dimensions)
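
A small numerical illustration of the reduction $\infty \rightarrow 3$, assuming the cone sensitivities can be approximated by Gaussians (the peak wavelengths of 420, 534 and 564 nm and the width of 40 nm are rough illustrative values, not measured curves). A sampled spectrum is projected onto three numbers; any change of the spectrum that lies in the null space of the $3 \times N$ sensitivity matrix is invisible. This produces *metamers*: physically different spectra that are perceived as the same color.

```python
import numpy as np

wavelengths = np.arange(380, 751, 1.0)  # visible range in nm, 1 nm steps

def gaussian(center, width=40.0):
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

# Assumed (S, M, L) cone sensitivities as the rows of a 3 x N matrix
sensitivities = np.stack([gaussian(420), gaussian(534), gaussian(564)])

def cone_response(spectrum):
    """Reduce a sampled spectrum (N values) to three stimuli."""
    return sensitivities @ spectrum

# The last right-singular vector lies in the null space of the 3 x N matrix
_, _, vt = np.linalg.svd(sensitivities)
null_vec = vt[-1]

spectrum_a = np.ones_like(wavelengths)      # flat spectrum
spectrum_b = spectrum_a + 0.5 * null_vec    # different, but still >= 0.5 everywhere
print(cone_response(spectrum_a))
print(cone_response(spectrum_b))            # numerically the same three values
```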

**c)** Is a Bayer-Filter a local operator? Explain your answer!

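No, it is not a global operator. The Bayer filter itself acts point-wise: each sensor element records only one color channel (R, G or B, with twice as many G as R or B elements). Reconstructing the full color image (demosaicing) computes the two missing channels of each pixel from neighboring pixels, which is a **local operator**. That the filter covers the whole image does not make it global, since every operator is applied to all pixels; an operator is global only if each result pixel depends on all input pixels.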
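
A sketch of the simplest reconstruction, bilinear demosaicing, assuming an RGGB layout of the Bayer pattern (real cameras may use other layouts and more elaborate interpolation). Each missing color value is the weighted average of the known values of that channel in the $3 \times 3$ neighborhood, computed as a normalized convolution with `scipy.ndimage.convolve`. The function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import convolve

def bayer_masks(h, w):
    """Boolean masks of the R, G and B sensor elements for an (assumed) RGGB pattern."""
    r = np.zeros((h, w), dtype=bool)
    g = np.zeros((h, w), dtype=bool)
    b = np.zeros((h, w), dtype=bool)
    r[0::2, 0::2] = True
    g[0::2, 1::2] = True
    g[1::2, 0::2] = True
    b[1::2, 1::2] = True
    return r, g, b

def mosaic(rgb):
    """Simulate the sensor: each pixel keeps only one color value (point-wise)."""
    h, w, _ = rgb.shape
    raw = np.zeros((h, w))
    for c, mask in enumerate(bayer_masks(h, w)):
        raw[mask] = rgb[..., c][mask]
    return raw

def demosaic_bilinear(raw):
    """Estimate missing color values from known values in the 3x3 neighborhood."""
    h, w = raw.shape
    weights = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float)
    rgb = np.zeros((h, w, 3))
    for c, mask in enumerate(bayer_masks(h, w)):
        known = np.where(mask, raw, 0.0)
        # normalized convolution: weighted sum of known values / sum of their weights
        num = convolve(known, weights, mode='mirror')
        den = convolve(mask.astype(float), weights, mode='mirror')
        rgb[..., c] = np.where(mask, raw, num / den)
    return rgb
```

Changing a single raw value only changes the reconstruction within its $3 \times 3$ neighborhood, which is exactly what makes demosaicing a local operator.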

**d)**  What is the smallest distance between two pixels under 4-/8-neighborhood?

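$4$-neighborhood: **Manhattan (city block) distance**: $|x_1 - x_2| + |y_1 - y_2|$  
$\rightarrow$ In a $4$-neighborhood you can only move up/down/left/right, so the shortest path has the length of the Manhattan distance

$8$-neighborhood: **chessboard distance**: $\max (|x_1 - x_2|, |y_1 - y_2|)$  
$\rightarrow$ In an $8$-neighborhood you can also move diagonally, like the king in chess

Adjacent pixels thus have distance $1$ in both cases; a diagonal neighbor has distance $2$ under the $4$-neighborhood but $1$ under the $8$-neighborhood.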
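
A small check in plain Python that the two formulas equal the length of the shortest pixel path when only 4- or 8-neighbor steps are allowed. The shortest path is found by breadth-first search on a hypothetical $10 \times 10$ grid; all names are illustrative.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]             # up/down/left/right
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]      # plus diagonals (king moves)

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def chessboard(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def path_length(p, q, moves, size=10):
    """Fewest steps from pixel p to pixel q using the given moves (breadth-first search)."""
    dist = {p: 0}
    queue = deque([p])
    while queue:
        cur = queue.popleft()
        if cur == q:
            return dist[cur]
        for dx, dy in moves:
            nxt = (cur[0] + dx, cur[1] + dy)
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None

p, q = (1, 2), (4, 7)
print(manhattan(p, q), path_length(p, q, N4))    # 8 8
print(chessboard(p, q), path_length(p, q, N8))   # 5 5
```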

**e)** Name the two types of loss of information and give an example for each.

**Stochastic loss:** E.g. noise

**Deterministic loss:** E.g. projection and sampling, bad camera parameters (over-/underexposure, bad focus), motion blur

## Basic Operators

**a)** What is a *point operator*, a *local operator*, a *global operator*? Provide examples for each of them. Which are *linear* which are not? Give an example for a *non-homogenous* operator. Describe application scenarios for different operators. What is a *rank filter*?

**point operator:** $g'(x, y) = O(g(x, y))$ $\rightarrow$ result pixel depends only on input pixel
- e.g. **thresholding**: $g'(x, y) = \Theta (g(x, y) - \vartheta)$ with threshold $\vartheta$ and $\Theta (x) = 0$ for $x < 0$ and $\Theta (x) = 1$ otherwise
    - non-linear: scaling the image by a scalar and then thresholding generally gives a different result than thresholding and then scaling (homogeneity does not hold)
    - application example: binarization of an image with a bimodal histogram (e.g. dark objects on a bright background)
- another example is a **linear transform** of the gray values, e.g. $g'(x, y) = a \cdot g(x, y) + b$
    - linear for $b = 0$; for $b \neq 0$ it is strictly speaking affine (additivity and homogeneity only hold up to the offset $b$)
    - application example: brightness ($b$) and contrast ($a$) adjustment
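
A minimal NumPy sketch of both point operators with a numerical homogeneity test $O(c \cdot g) = c \cdot O(g)$; the sample image, the scalar $c = 2$ and the parameters are arbitrary illustrative values.

```python
import numpy as np

def threshold(g, theta):
    """g' = Theta(g - theta): 1 where g >= theta, 0 elsewhere."""
    return (g - theta >= 0).astype(float)

def linear_transform(g, a, b=0.0):
    """g' = a * g + b (linear for b = 0, affine otherwise)."""
    return a * g + b

g = np.array([[0.1, 0.4], [0.6, 0.9]])
c = 2.0   # scalar for the homogeneity test O(c * g) == c * O(g)

print(np.array_equal(threshold(c * g, 0.5), c * threshold(g, 0.5)))                       # False
print(np.allclose(linear_transform(c * g, 1.5), c * linear_transform(g, 1.5)))              # True
print(np.allclose(linear_transform(c * g, 1.5, 0.1), c * linear_transform(g, 1.5, 0.1)))    # False
```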

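**local operator:** $g'(x, y) = O(g(x, y), g(\text{surroundings}(x, y)))$ $\rightarrow$ result pixel depends on the input pixel and its neighboring pixels
- e.g. **convolution** defined by a $(2m+1) \times (2n+1)$ filter kernel $k$: $g'(x, y) = \sum_{i \in [-m, m]} \sum_{j \in [-n, n]} k(i+m, j+n) \cdot g(x+i, y+j)$ (scalar product of kernel and image patch)
    - strictly speaking this formula is a correlation; a true convolution mirrors the kernel, i.e. uses $k(m-i, n-j)$, which makes no difference for symmetric kernels
    - linear, since each result pixel is a weighted sum (scalar product) of input pixels with fixed weights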
    - application example: smoothing, edge detection
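
A direct (slow) implementation of the formula above for the interior pixels, compared against `scipy.signal.correlate2d` and `scipy.signal.convolve2d` in `'valid'` mode, plus a numerical linearity check. The function name `apply_kernel` is hypothetical; no border handling is done.

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

def apply_kernel(g, k):
    """g'(x, y) = sum_{i,j} k(i+m, j+n) * g(x+i, y+j) for all pixels where the
    (2m+1) x (2n+1) kernel fits completely into the image."""
    m, n = k.shape[0] // 2, k.shape[1] // 2
    H, W = g.shape
    out = np.zeros((H - 2 * m, W - 2 * n))
    for x in range(m, H - m):
        for y in range(n, W - n):
            s = 0.0
            for i in range(-m, m + 1):
                for j in range(-n, n + 1):
                    s += k[i + m, j + n] * g[x + i, y + j]
            out[x - m, y - n] = s
    return out

rng = np.random.default_rng(0)
g1, g2 = rng.random((6, 7)), rng.random((6, 7))
k = rng.random((3, 3))   # deliberately non-symmetric kernel

# the formula equals correlation, i.e. convolution with the mirrored kernel
print(np.allclose(apply_kernel(g1, k), correlate2d(g1, k, mode='valid')))              # True
print(np.allclose(apply_kernel(g1, k), convolve2d(g1, k[::-1, ::-1], mode='valid')))   # True
# linearity: O(2*g1 + 3*g2) == 2*O(g1) + 3*O(g2)
print(np.allclose(apply_kernel(2 * g1 + 3 * g2, k),
                  2 * apply_kernel(g1, k) + 3 * apply_kernel(g2, k)))                  # True
```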

**global operator:** $g'(x, y) = O(g(\text{all pixels}))$ $\rightarrow$ result pixel depends on all pixels of the input image
- e.g. **Fourier transform**: Transforms image $g$ from the spatial domain to the frequency domain
    - linear (additivity and homogeneity hold)
    - application example: 
        - fast computation of convolution in Fourier space (just a multiplication)
        - detect texture in images
        - compression
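
A sketch of the first application using NumPy's FFT: by the convolution theorem, multiplying the spectra of the image and of the zero-padded kernel and transforming back gives the same result as a direct *circular* convolution. Note the assumptions: the image is treated as periodic, and the kernel's origin sits at its top-left corner rather than at its center. The function names are hypothetical.

```python
import numpy as np

def circular_convolve_direct(g, k):
    """out(x, y) = sum_{i,j} k(i, j) * g((x - i) mod H, (y - j) mod W)."""
    H, W = g.shape
    out = np.zeros((H, W))
    for x in range(H):
        for y in range(W):
            for i in range(k.shape[0]):
                for j in range(k.shape[1]):
                    out[x, y] += k[i, j] * g[(x - i) % H, (y - j) % W]
    return out

def circular_convolve_fft(g, k):
    """Same result via the convolution theorem: a pointwise product in Fourier space."""
    k_padded = np.zeros(g.shape)
    k_padded[:k.shape[0], :k.shape[1]] = k   # zero-pad the kernel to image size
    return np.real(np.fft.ifft2(np.fft.fft2(g) * np.fft.fft2(k_padded)))

rng = np.random.default_rng(0)
g = rng.random((8, 8))
k = rng.random((3, 3))
print(np.allclose(circular_convolve_direct(g, k), circular_convolve_fft(g, k)))  # True
```

For an $N \times N$ image and a $K \times K$ kernel the direct version needs $O(N^2 K^2)$ operations and the FFT version $O(N^2 \log N)$, which pays off for large kernels.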

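**Non-homogeneous operator:** an operator whose behavior depends explicitly on the pixel position $(x, y)$, i.e. it is not applied in the same way everywhere in the image. Example: vignetting (shading) correction $g'(x, y) = c(x, y) \cdot g(x, y)$ with a position-dependent gain $c(x, y)$ that brightens the darker image corners.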

**Rank filter:**
- non-linear (cannot be implemented as a convolution)
- sort gray values covered by kernel
- select gray value from sorted list that replaces the current pixel (result)
- the selection of the position determines the type of rank filter:
    - **min filter**: select min gray value (first position)
    - **median filter**: select center of the list
    - **max filter**: select max gray value (last position)
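
A minimal sketch of the three rank filters as described above: sort the gray values under a square window and pick the first, middle or last entry. Border handling by replicating edge pixels is an assumption; with it, the results should match SciPy's `ndimage.minimum_filter`, `median_filter` and `maximum_filter` with `mode='nearest'`. The function name `rank_filter` is hypothetical.

```python
import numpy as np
from scipy import ndimage

def rank_filter(g, size=3, rank='median'):
    """Replace each pixel by an entry of the sorted gray values in its size x size window."""
    r = size // 2
    padded = np.pad(g, r, mode='edge')   # replicate border pixels
    pos = {'min': 0, 'median': (size * size) // 2, 'max': size * size - 1}[rank]
    out = np.empty(g.shape)
    for x in range(g.shape[0]):
        for y in range(g.shape[1]):
            values = np.sort(padded[x:x + size, y:y + size], axis=None)  # flattened and sorted
            out[x, y] = values[pos]
    return out

# a single "salt" pixel on a flat background: the median removes it completely,
# whereas a 3x3 box filter only spreads it out
g = np.full((5, 5), 0.5)
g[2, 2] = 1.0
print(rank_filter(g, rank='median')[2, 2])                        # 0.5
print(ndimage.uniform_filter(g, size=3, mode='nearest')[2, 2])    # box filter: about 0.556
```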

**b)** Load an image and apply different local operators (convolution, nonlinear smoothing, morphological) and display the results. Explain their effects and possible applications.

In [None]:
%matplotlib inline
from scipy import ndimage
from skimage import filters, morphology
import matplotlib.pyplot as plt
import numpy as np
import random

def get_test_img():
    return plt.imread('img/test.JPG')

def get_test_img_gray():
    # convert to gray by averaging the color channels, then scale to [0, 1]
    img = plt.imread('img/test.JPG').astype(float)
    gray = img[..., :3].mean(axis=2) if img.ndim == 3 else img
    return gray / gray.max()

def generate_noisy_img(img, prob=0.5):
    # salt-and-pepper-like noise: on every 2nd row and every 4th column, set a
    # vertical streak of 3 pixels to black (0) or white (1) with probability prob
    noisy_img = img.copy()
    for x in range(1, noisy_img.shape[0] - 1, 2):
        for y in range(1, noisy_img.shape[1] - 1, 4):
            if random.random() < prob:
                noisy_img[x - 1:x + 2, y] = random.choice([0, 1])
    return noisy_img

def apply_laplace(img):
    # Laplace filter (discrete second derivative); not used in the plots below
    return filters.laplace(img.astype(float), ksize=3)

def get_binarized_img(img, thresh):
    # inverted thresholding: dark pixels (< thresh) become 1, all others 0
    return (img < thresh).astype(float)

img = get_test_img_gray()
noisy = generate_noisy_img(img)   # generate once, so the median filter is applied to the displayed noisy image
binary = get_binarized_img(img, 0.2)

plt.figure(figsize=(20, 12))

plt.subplot(2, 5, 1); plt.title('original image'); plt.imshow(img, cmap='gray')
plt.subplot(2, 5, 2); plt.title('conv gaussian'); plt.imshow(filters.gaussian(img, sigma=2.5), cmap='gray')
plt.subplot(2, 5, 3); plt.title('conv sobel'); plt.imshow(filters.sobel(img), cmap='gray')
plt.subplot(2, 5, 4); plt.title('noisy img'); plt.imshow(noisy, cmap='gray')
# median filter = non-linear smoothing (rank filter): suppresses the noise streaks, preserves edges better than linear smoothing
plt.subplot(2, 5, 5); plt.title('median filter of noisy img'); plt.imshow(filters.median(noisy), cmap='gray')
# max filter = rank filter (equivalent to grayscale dilation), not a convolution
plt.subplot(2, 5, 6); plt.title('max filter'); plt.imshow(ndimage.maximum_filter(img, size=3), cmap='gray')
plt.subplot(2, 5, 7); plt.title('binarized img'); plt.imshow(binary, cmap='gray')
plt.subplot(2, 5, 8); plt.title('erosion of binarized img'); plt.imshow(morphology.binary_erosion(binary), cmap='gray')
plt.subplot(2, 5, 9); plt.title('dilation of binarized img'); plt.imshow(morphology.binary_dilation(binary), cmap='gray')
plt.subplot(2, 5, 10); plt.title('binary opening'); plt.imshow(morphology.binary_opening(binary), cmap='gray')

plt.show()

YOUR ANSWER HERE

**c)**
With pen and paper: Generate a random $5 \times 5$ image and filter this image with a $3 \times 3$ Laplace filter. Select a border handling mode of your choice.

In [None]:
from scipy import signal
import matplotlib.pyplot as plt
import numpy as np

def normalize(img):
    # linearly rescale to 0..255 for display
    img = (img - np.min(img)) / np.ptp(img) * 255
    return img.astype(int)

img = np.array([[0, 100, 200, 100, 0], [0, 0, 200, 0, 0], [0, 0, 200, 0, 0], [0, 0, 100, 0, 0], [0, 0, 100, 0, 0]])
laplace_kernel = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])
# border handling: zero padding; mode='same' keeps the 5x5 output size
laplace = signal.convolve2d(img, laplace_kernel, mode='same', boundary='fill', fillvalue=0)

# get_test_img_gray() is defined in the previous cell
test_img = get_test_img_gray() * 255
test_img = test_img.astype(int)[100:400, 100:500]   # crop rows 100-399 and columns 100-499
laplace_test = signal.convolve2d(test_img, laplace_kernel, mode='same', boundary='fill', fillvalue=0)

plt.figure(figsize=(20, 5))
plt.subplot(1, 4, 1); plt.title('original img'); plt.imshow(img, cmap='gray')
plt.subplot(1, 4, 2); plt.title('laplace filtered'); plt.imshow(normalize(laplace), cmap='gray')
plt.subplot(1, 4, 3); plt.title('original test'); plt.imshow(test_img, cmap='gray')
plt.subplot(1, 4, 4); plt.title('laplace filtered test'); plt.imshow(normalize(laplace_test), cmap='gray')

plt.show()
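
As a pen-and-paper check, two entries of the Laplace-filtered $5 \times 5$ image from the cell above can be computed by hand, assuming zero padding at the border (values outside the image are $0$) and (row, column) indices counted from $0$. Since the Laplace kernel is symmetric, convolution and correlation coincide, so each result is the sum of the four direct neighbors minus four times the center:

$$
\begin{aligned}
g'(1, 2) &= g(0, 2) + g(2, 2) + g(1, 1) + g(1, 3) - 4 \cdot g(1, 2) = 200 + 200 + 0 + 0 - 800 = -400 \\
g'(0, 0) &= 0 + g(1, 0) + 0 + g(0, 1) - 4 \cdot g(0, 0) = 0 + 0 + 0 + 100 - 0 = 100
\end{aligned}
$$

In the second line, the two explicit zeros are the padded values above and to the left of the image.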

**d)** Give an example $3\times3$ kernel for the following filters and briefly explain their use:
* Box
* Binomial
* Sobel (one direction of your choice)
* Laplace

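**Box Filter**: averages the $3 \times 3$ neighborhood with equal weights; simple and fast smoothing / noise reduction

$\frac{1}{9} \cdot
\begin{bmatrix}
1 & 1 & 1 \\
1 & 1 & 1 \\
1 & 1 & 1
\end{bmatrix}$

**Binomial Filter**: smoothing that weights the center pixel most and decreases towards the border; discrete approximation of a Gaussian filter

$\frac{1}{16} \cdot
\begin{bmatrix}
1 & 2 & 1 \\
2 & 4 & 2 \\
1 & 2 & 1
\end{bmatrix}$

**Sobel Filter** ($x$ direction): derivative in $x$ direction combined with binomial smoothing in $y$ direction; responds to vertical edges, used for edge detection and gradient images

$\frac{1}{4} \cdot
\begin{bmatrix}
1 & 0 & -1 \\
2 & 0 & -2 \\
1 & 0 & -1
\end{bmatrix}$

**Laplace Filter**: discrete second derivative $\frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}$; responds to edges in all directions (edges lie at the zero crossings), but is sensitive to noise

$\begin{bmatrix}
0 & 1 & 0 \\
1 & -4 & 1 \\
0 & 1 & 0
\end{bmatrix}$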
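
A NumPy sketch that builds the four kernels and tests which of them are separable, i.e. can be written as the outer product of a vertical and a horizontal 1D kernel; for a 2D kernel this is the case exactly when its matrix rank is 1. The decomposition of the binomial and Sobel kernels into `[1, 2, 1]` and `[1, 0, -1]` factors is shown explicitly, and the Sobel kernel is applied to a synthetic vertical step edge (an illustrative test image).

```python
import numpy as np
from scipy.signal import convolve2d

smooth_1d = np.array([1, 2, 1]) / 4                  # 1D binomial kernel
box = np.ones((3, 3)) / 9                            # = outer([1,1,1]/3, [1,1,1]/3)
binomial = np.outer(smooth_1d, smooth_1d)            # = [[1,2,1],[2,4,2],[1,2,1]] / 16
sobel_x = np.outer([1, 2, 1], [1, 0, -1]) / 4        # smoothing in y times derivative in x
laplace = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])

for name, k in [('box', box), ('binomial', binomial), ('sobel_x', sobel_x), ('laplace', laplace)]:
    # rank 1 <=> separable; the Laplace kernel has rank 2
    print(name, np.linalg.matrix_rank(k))

# the Sobel kernel responds to a vertical edge (gray value step from left to right)
step = np.zeros((5, 6))
step[:, 3:] = 1.0
print(convolve2d(step, sobel_x, mode='valid')[0])
```

Although the Laplace kernel is not separable, it is the sum of two separable kernels: $[1, -2, 1]$ in $x$ direction plus $[1, -2, 1]$ in $y$ direction.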


**e)** What are separable filter kernels?

YOUR ANSWER HERE

## Image Enhancement

**a)**  What is the histogram of an image? What is a gradient image and how is it computed? What is a histogram of gradients? Name some applications.

YOUR ANSWER HERE

YOUR ANSWER HERE

**b)** Give formulae for information content and average information content. What do information content and entropy measure? On the slides $\log_n$ is used for information content and $\log_2$ is used for entropy. Why?

YOUR ANSWER HERE

**c)** Discuss histogram equalization. Name some problems and explain how they can be addressed.

YOUR ANSWER HERE

YOUR ANSWER HERE

## Morphological operators

**a)** What is a structuring element? How is it applied in erosion and dilation?

YOUR ANSWER HERE

**b)** Give pseudocode for the distance transform using morphological operators

YOUR ANSWER HERE

## Color

**a)** Which of the following use additive color mixing and which use subtractive color mixing:
* Printer
* Cathode ray tube (old screens)
* LCD Screen
* Van Gogh
* Analog Cinema Projector
* Digital Projector (DLP)

YOUR ANSWER HERE

**b)** Name two color spaces and list their advantages.

YOUR ANSWER HERE

## Segmentation

**a)** Explain *region based* and *edge based* *segmentation*. What are the differences between *split and merge* and *region merging*? What is the idea of *color segmentation* and does it give any advantage?

YOUR ANSWER HERE

YOUR ANSWER HERE

YOUR ANSWER HERE

**b)**  Provide pseudocode for the $k$-means clustering algorithm in color space.

YOUR ANSWER HERE

**c)** Give two examples for interactive segmentation and discuss them.

YOUR ANSWER HERE

YOUR ANSWER HERE

YOUR ANSWER HERE

YOUR ANSWER HERE

## Hough Transform

**a)** What is the idea of *Hough transform*? What is an *accumulator space*? How to determine its dimensionality? Can you interpret the linear Hough space? How many dimensions has the accumulator space for circular Hough transform?

YOUR ANSWER HERE

# Recap II

This sheet is a recap of the second half of the term. Neither do you have to present it to your tutors nor will it count toward the number of passed sheets required for the exam. I.e. you do not have to complete this sheet, but we highly recommend that you solve the assignments as part of your preparation for the exam. We will discuss the results in the last practice session on February 11. Also, if you have questions on any of the topics, please send them to us and we will discuss them in that session.

## Fourier Transform

**a)** What is the idea of *Fourier Transform*, and why is it useful for image processing? Can you provide a formula? Why is it called an orthogonal transformation? Which aspects of an image can be recognized in its Fourier transform?

YOUR ANSWER HERE

## Template Matching

**a)** Explain the principle of template matching.

YOUR ANSWER HERE

**b)** When and why does the correlation coefficient perform better than the mean absolute difference?

YOUR ANSWER HERE

## Pattern Recognition

**a)** What are the principal components of a 2-dimensional data distribution? What are the principal components of an image?

YOUR ANSWER HERE

## Local Features

**a)** Describe the *Moravec* and the *Harris corner detectors*. What are the differences?

YOUR ANSWER HERE

**b)** What are *local features* and what are they used for? Name some examples. Describe the main steps of SIFT and explain how invariances of the features are achieved.

YOUR ANSWER HERE

## Compression

**a)** How does Huffman-Coding work?

YOUR ANSWER HERE

**b)** What is the Gray code and what is its relation to run length encoding?

YOUR ANSWER HERE

## Understanding the Wireframe-Model


**a)** Explain in your own words the functions on slide CV-12 slide 9. Also explain when and why it may make sense to use $m$ instead of $m'$.

YOUR ANSWER HERE