Osnabrück University - Computer Vision (Winter Term 2020/21) - Prof. Dr.-Ing. G. Heidemann, Ulf Krumnack, Axel Schaffland, Ludwig Schallner, Artem Petrov

# Exercise Sheet 10: Model Based Recognition / Motion

## Introduction

This week's sheet should be solved and handed in before the end of **Saturday, January 23, 2021**. If you need help (and Google and other resources were not enough), feel free to contact your group's designated tutor or whomever of us you run into first. Please upload your results to your group's Stud.IP folder.

## Assignment 1: Understanding the Wireframe-Model [5 points]

This exercise addresses the matching procedure described on (CV-12 slides 9-17).

**a)** Explain in your own words the functions on (CV-12 slide 9). Also explain when and why it may make sense to use $m$ instead of $m'$.


From the initial pose, the wire frame model is iteratively adapted to the image based on gradients.  

$m$: magnitude, $\beta$: orientation, $g(x, y)$: image  

**$x$-gradient**: $\Delta_x g = g(x+1, y) - g(x-1, y) \rightarrow$ for a fixed $y$, it's the difference between the pixels to the left and to the right  
**$y$-gradient**: $\Delta_y g = g(x, y+1) - g(x, y-1) \rightarrow$ for a fixed $x$, it's the difference between the pixels above and below  
**gradient magnitude**: $m'(x, y) = \sqrt{(\Delta_x g)^2 + (\Delta_y g)^2}$  
**orientation:** Use the inverse tangent: $\beta(x, y) = \arctan\left(\frac{\Delta_y g}{\Delta_x g}\right)$  

For the gradient magnitude, there's an alternative computation which is thresholded:  
$m(x, y) = \Theta(m'(x, y) - T)$ (only takes magnitudes that are sufficiently large)  
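
The definitions above can be sketched with NumPy. This is only an illustration, not the lecture's implementation: the function name `gradient_features` is made up, border pixels are simply left at zero, `np.arctan2` replaces the plain arctangent to avoid a division by zero, and $\Theta$ is read here as a cut-off that zeroes magnitudes not exceeding $T$ (if the slides define it as a step function, $m$ would be a 0/1 mask instead).

```python
import numpy as np

def gradient_features(g, T=0.0):
    """Return (m_prime, m, beta) for a 2-D image indexed as g[y, x]."""
    g = np.asarray(g, dtype=float)
    dx = np.zeros_like(g)
    dy = np.zeros_like(g)
    # Central differences g(x+1, y) - g(x-1, y) and g(x, y+1) - g(x, y-1);
    # border pixels keep a gradient of zero for simplicity.
    dx[:, 1:-1] = g[:, 2:] - g[:, :-2]
    dy[1:-1, :] = g[2:, :] - g[:-2, :]
    m_prime = np.sqrt(dx**2 + dy**2)
    # arctan2 handles dx == 0, where arctan(dy / dx) is undefined
    beta = np.arctan2(dy, dx)
    # Assumed reading of the threshold: keep m' above T, zero otherwise
    m = np.where(m_prime > T, m_prime, 0.0)
    return m_prime, m, beta
```

Calling `gradient_features(image, T=...)` with a larger `T` suppresses more of the weak gradients, which is the effect the thresholded magnitude is meant to have.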

**TODO**: When and Why $m$ instead of $m'$?

**b)** Explain the fitness score $E_{S_i}$ and $E_l$. What do the arrows (CV-13 slide 11), e.g. $\beta_j$ and $S_j$, indicate? What is the idea of $G(d)$?

$E_{S_i}$ - Fitness score for pixel $S_i$ in search rectangle $R$: $E_{S_i} = |m(x, y) \cdot (\sin (\beta(x, y) - \alpha))|$
- fitness must be proportional to the gradient magnitude $m(x, y)$ (only strong gradients should contribute, not noise)
- $\beta$ (gradient orientation), $\alpha$ (direction of the line) - we take their difference
    - if the difference is $0$, the gradient points along the line, so the line does not follow an edge; the $\sin$ is $0$ and so is the fitness score
    - the ideal case is a gradient perpendicular to the line ($90°$), as it occurs along an edge; then the $\sin$ is $1$


$E_l$ - Total fitness score of line segment $l$: $E_l = \sum_{S_i \in R} E_{S_i} \cdot G_{\mu = 0, \sigma = W} (d_i)$
- sum of fitness scores over all pixels $S_i$ of search rectangle $R$
- weighted by Gaussian distance function
- why the search rectangle?
    - ideally, all pixels of the line segment lie exactly on the edge
    - but if only pixels on the segment were scored, nothing would guide the search towards a nearby edge
    - so we also have to look to the left and right of the line segment
    - pixels to the left and right contribute a little, but not as much as the ones directly on the line
    - that's why we have this Gaussian weighting ($\sigma$ is chosen appropriately based on the width of the rectangle)
    - if we have a certain fitness, the fitness score $E_{S_j}$ of pixel $S_j$ can be improved by moving the line segment in the direction $\beta_j$
- the Gaussian distance function is there to guide the search in the parameter space
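
The two scores can be sketched as follows, assuming the arrays `m` and `beta` are already available. The function name `line_fitness`, the rectangle half-width equal to `W`, and the unnormalised Gaussian $\exp(-d^2 / 2W^2)$ are assumptions made for this illustration; the slides may define the rectangle and the normalisation differently.

```python
import numpy as np

def line_fitness(m, beta, p0, p1, W=2.0):
    """Fitness E_l of the segment p0 -> p1, with points given as (x, y)."""
    p0, p1 = np.asarray(p0, dtype=float), np.asarray(p1, dtype=float)
    seg = p1 - p0
    length = np.hypot(seg[0], seg[1])
    t_hat = seg / length                          # unit vector along the line
    alpha = np.arctan2(seg[1], seg[0])            # direction of the line
    ys, xs = np.mgrid[:m.shape[0], :m.shape[1]]
    rel_x, rel_y = xs - p0[0], ys - p0[1]
    along = rel_x * t_hat[0] + rel_y * t_hat[1]   # position along the segment
    d = -rel_x * t_hat[1] + rel_y * t_hat[0]      # signed distance to the line
    # Search rectangle R: within the segment's extent, at most W away from it
    in_rect = (along >= 0) & (along <= length) & (np.abs(d) <= W)
    e_pixel = np.abs(m * np.sin(beta - alpha))    # E_{S_i}
    weight = np.exp(-d**2 / (2.0 * W**2))         # unnormalised G(d), sigma = W
    return float(np.sum(e_pixel[in_rect] * weight[in_rect]))
```

For a vertical edge, a vertical segment placed on the edge scores highest, a segment shifted sideways scores somewhat lower, and a horizontal segment scores zero.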

**c)** Explain the goal of EDA (Estimation of Distribution Algorithm) and how it is performed in the context of the matching procedure.

The goal is to optimize the $15$ shape, position, and pose parameters to fit the local gradients.

We start with a generation of individuals (several wireframe models) which have different position, pose, and shape parameters.  
This generation needs to be optimized. Each of the individuals is just a point in the 15D space,
so we start with a point cloud in the 15D space.

Now, the next generation will be sampled from this distribution and with a random process new individuals will be produced.  
We project those into the plane and compute the fitness scores.  
Finally, the selection comes into play: We choose a certain number of individuals with highest fitness scores  
and only from these compute the next generation (survival of the fittest).

Generation of new individuals: Gaussian density estimation of remaining point cloud.  
We use the density estimation as a biased
random number generator to produce new points. The parents give a bias to the offspring,  
but the offspring does not perfectly conform to that. There is a chance for the next generation to be different (better).

The process is repeated iteratively until the specified stop criterion is met.
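
The loop described above can be sketched for a generic fitness function. Everything specific here is an assumption for illustration: the name `eda_maximise`, the population and elite sizes, the fixed number of generations as stop criterion, and the small diagonal term that keeps the estimated covariance from collapsing.

```python
import numpy as np

def eda_maximise(fitness, dim, pop_size=200, n_keep=40, generations=30, seed=0):
    """Maximise `fitness` with a Gaussian estimation of distribution algorithm."""
    rng = np.random.default_rng(seed)
    # Initial generation: a point cloud in the dim-dimensional parameter space
    population = rng.normal(0.0, 3.0, size=(pop_size, dim))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in population])
        # Selection: keep the individuals with the highest fitness
        elite = population[np.argsort(scores)[-n_keep:]]
        # Gaussian density estimate of the remaining point cloud
        mean = elite.mean(axis=0)
        cov = np.cov(elite, rowvar=False) + 1e-6 * np.eye(dim)
        # Sample the next generation from the estimated density
        population = rng.multivariate_normal(mean, cov, size=pop_size)
    scores = np.array([fitness(ind) for ind in population])
    return population[np.argmax(scores)]
```

In the matching procedure, `fitness` would be the summed line-segment scores of the projected wireframe and `dim` would be 15; a toy call such as `eda_maximise(lambda p: -np.sum((p - 1.0)**2), dim=3)` is enough to try the loop out.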

## Assignment 2: Histogram of Oriented Gradients (HOG) [5 points]

The *Histogram of Oriented Gradients (HOG)* applied in the initial step of the wireframe matching procedure is also applied in other computer vision algorithms, especially in the context of object recognition. This exercise will examine this tool in a bit more detail.

**a)** Explain the idea of the histogram of oriented gradients. How can it be applied to analyze images? Think about how this idea may be used to recognize objects.

The technique counts occurrences of gradient orientations in an image. It is essentially a feature descriptor that can be used for object detection.  

Gradients ($x$ and $y$ derivatives) of an image are useful because the magnitude of gradients is large around edges and corners  
(regions of abrupt intensity changes) and edges and corners provide a lot more information about object shape than flat regions.  

To calculate a HOG descriptor, we need to first calculate the horizontal and vertical gradients.  
This is easily achieved by filtering the image with the derivative kernels $[-1, 0, 1]$ and $[-1, 0, 1]^T$ (Sobel filters are a smoothed alternative).  
Afterwards, we can get the magnitude and direction of the gradient using the formulas from the previous task.

At every pixel, the gradient has a magnitude and a direction. The next step is to create a histogram of gradients.  
A bin is selected based on the direction, and the vote (the value that goes into the bin) is selected based on the magnitude. 

Finally, we have all the information we need in a kind of compressed way. Such a histogram of oriented gradients can be  
further processed and become a feature vector that can for example be used in classification.  
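
The voting step for a single cell can be sketched with `np.histogram`. This is a simplified illustration: the helper name `cell_histogram` is made up, orientations are treated as unsigned (folded into 0 to 180 degrees), and each pixel votes into exactly one bin, without the interpolation between neighbouring bins that full HOG implementations often use.

```python
import numpy as np

def cell_histogram(magnitude, direction, bins=9):
    """Magnitude-weighted orientation histogram of one cell.

    `direction` is expected in radians, `magnitude` has the same shape.
    """
    # Unsigned orientation: fold all angles into [0, 180) degrees
    angles = np.degrees(direction) % 180.0
    # The bin is chosen by the angle, the vote is the gradient magnitude
    hist, edges = np.histogram(angles, bins=bins, range=(0.0, 180.0),
                               weights=magnitude)
    return hist, edges
```

Applying this helper to every cell of the image and stacking the results gives an array of shape (cell rows, cell columns, bins), which can then be flattened into a feature vector.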

**b)** The Scikit-image library provides the function [`hog`](https://scikit-image.org/docs/dev/auto_examples/features_detection/plot_hog.html) that can compute histograms of oriented gradients and offers also an option to construct a visualization. Run the following code cell and then describe your observations in the text cell below:

In [None]:
%matplotlib notebook

import matplotlib.pyplot as plt
from skimage.feature import hog
from skimage import data, exposure
from skimage.transform import resize
import numpy as np
import imageio

image = imageio.imread('./images/truck.jpeg')
image = resize(image,(700,1000),preserve_range=True).astype(np.uint8)

# Note: scikit-image 0.19 and later use `channel_axis=-1` instead of `multichannel=True`
fd, hog_image = hog(image,feature_vector=False,visualize=True, multichannel=True)

# Display the result
fig, (ax1, ax2, ax3) = plt.subplots(3,1, figsize=(8, 12))

ax1.axis('off')
ax1.imshow(image, cmap=plt.cm.gray)
ax1.set_title('Input image')

# Rescale histogram for better display
hog_image_rescaled = exposure.rescale_intensity(hog_image, in_range=(0, 10))

ax2.axis('off')
ax2.imshow(hog_image_rescaled, cmap=plt.cm.gray)
ax2.set_title('Histogram of Oriented Gradients')
plt.show()

bars = ax3.bar(np.linspace(0,180,fd.shape[-1]),fd[0,0,0,0],width=(180/fd.shape[-1]))

plt.tight_layout()
plt.show()

def on_press(event):
    """Mouse button press event handler
    Args:
    event: The mouse event
    """
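    # Ignore clicks outside the two image axes (xdata/ydata are None there)
    if event.inaxes not in (ax1, ax2):
        return
    x, y = int(event.xdata)//8, int(event.ydata)//8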
    
    cell_x = x - fd.shape[1] if x >= fd.shape[1] else 0
    x = min(x,fd.shape[1]-1)
    cell_y = y - fd.shape[0] if y >= fd.shape[0] else 0
    y = min(y,fd.shape[0]-1)
    ax3.clear()
    ax3.set_title(f"x={x} [{cell_x}], y={y} [{cell_y}], {fd.shape}")
    ax3.bar(np.linspace(0,180,fd.shape[-1]),fd[y,x,cell_y,cell_x],width=(180/fd.shape[-1]))
    fig.canvas.draw()

cid = fig.canvas.mpl_connect('button_press_event', on_press)

YOUR ANSWER HERE

**c)** Implement your own version of the histogram of oriented gradients function. You may proceed in the following steps:
1. Compute the gradient image and determine magnitude and direction of gradients.
2. Divide the image into cells and compute a weighted histogram for each cell.
3. Use the function [`plt.quiver`](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.quiver.html) to display your results.

In [None]:
# Step 1: compute gradients

%matplotlib notebook
import numpy as np
import imageio
import matplotlib.pyplot as plt
from skimage.color import rgb2gray


def image_gradients(image):
    """Compute gradient magnitudes and directions for a given image.
    
    Input:
        image: a numpy.ndarray of shape (HEIGHT, WIDTH)
    Result:
        magnitude, direction: two numpy.ndarrays of the same shape as image,
        holding gradient magnitudes and directions, respectively.
    """
    # Hint: you may use the sobel function to obtain x- and y- gradients
    magnitude = np.zeros_like(image, dtype=np.float32)
    direction = np.zeros_like(image, dtype=np.float32)
    # YOUR CODE HERE
    
    
    
    
    return magnitude, direction

image = rgb2gray(imageio.imread('./images/car.png').astype(np.uint8))
magnitude, direction = image_gradients(image)

plt.figure(figsize=(8,3))
plt.gray()
plt.subplot(1,2,1); plt.title("Image")
plt.imshow(image)
plt.subplot(1,2,2); plt.title("Gradient magnitude")
plt.imshow(magnitude)
plt.show()

In [None]:
# Step 2: compute the histograms

def histogram_of_oriented_gradients(image, cell_size=(16,16), bins=9):
    """Compute histograms of oriented gradients for an image.
    Input:
        image: a numpy.ndarray of shape (HEIGHT, WIDTH)
        cell_size: the size of individual cells into which the image is divided
        bins: the number of bins per histogram
    Result:
        An np.ndarray of shape (CELL_ROWS, CELL_COLUMNS, BINS) containing
        the histograms for the individual cells
    """
    # Hint: you may use np.histogram() here
    rows, columns = image.shape[0]//cell_size[0], image.shape[1]//cell_size[1]
    hog = np.zeros((rows, columns, bins))
    magnitude, direction = image_gradients(image)
    # YOUR CODE HERE
    return hog

hog = histogram_of_oriented_gradients(image)

In [None]:
# Step 3: display your results

%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np

cell_size=(8,8)
hog = histogram_of_oriented_gradients(image, cell_size=cell_size)

plt.figure(figsize=(12,12))
plt.gca().set_aspect('equal')
plt.gca().invert_yaxis()
# YOUR CODE HERE
plt.show()

## Assignment 3: Understanding Optical Flow [4 Points]

**a)** What is *optical flow*? Explain the concept on an intuitive level. Contrast it with physical movement and visual displacement.

Optical flow describes how the brightness pattern moves between the frames of an image sequence (at the pixel level).  
It is essentially a vector field that assigns a detected motion to every pixel, and it does not necessarily reflect the true motion in the real world.

The true displacement cannot always be detected, e.g. due to the aperture problem.

**b)** Explain the optical flow equation. What is that line depicted on (CV-13 slide 21)? What do different points on this line have in common?

$v_x \cdot g_x + v_y \cdot g_y + g_t = 0$
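
For fixed derivatives $g_x$, $g_y$, $g_t$ at one pixel, this is one linear equation in the two unknowns $v_x$ and $v_y$, so its solutions form a straight line in velocity space. The following sketch (the function name `constraint_line` and the parametrisation by `s` are chosen only for illustration, and a non-zero spatial gradient is assumed) generates points on that line: they all share the same component along the gradient and differ only in the component along the edge.

```python
import numpy as np

def constraint_line(g_x, g_y, g_t, s):
    """Points (v_x, v_y) with v_x*g_x + v_y*g_y + g_t = 0, one per value in s."""
    grad = np.array([g_x, g_y], dtype=float)
    norm2 = grad @ grad                      # squared gradient length, must be > 0
    # Point of the line closest to the origin: flow along the gradient only
    v_normal = -g_t * grad / norm2
    # Unit vector perpendicular to the gradient, i.e. along the edge
    tangent = np.array([-g_y, g_x], dtype=float) / np.sqrt(norm2)
    return v_normal + np.outer(s, tangent)
```

Each row of `constraint_line(2.0, 1.0, -3.0, np.linspace(-2, 2, 5))` is a velocity that satisfies the equation equally well, which is why a single pixel cannot determine the flow uniquely.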

**c)** What is the aperture problem?

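The aperture problem arises when the visible image region does not contain enough information to determine how an object is moving, e.g. when only a straight edge is seen through a small window and its ends are hidden. In that case only the motion component perpendicular to the edge can be measured.  
A typical example of the aperture problem is the 'barber pole illusion'.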

**d)** Execute the following demo. Vary the value for `direction` (valid values are `None`, `'horizontal'`, and `'vertical'`). What do you see? Discuss your observations in the text field below.

In [None]:
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np

# Choose one of the three directions:
direction = 'horizontal'
#direction = 'vertical'
#direction = None

image = np.ndarray((100,100,3), dtype=np.uint8)

def barbers_pole(image, time=0, direction=None):
    image[:,:] = (255,255,255)
    height, width = image.shape[:2]
    strip = width//4
    xx, yy = np.meshgrid(range(strip), range(height))
    image[yy,(xx + yy + time) % width] = (255,0,0)
    image[yy,(xx + yy + time + 2*strip)% width] = (0,0,255)
    if direction == 'vertical':
        image[:,:strip] = 0
        image[:,3*strip:] = 0
    elif direction == 'horizontal':
        image[:strip] = 0
        image[3*strip:] = 0

barbers_pole(image)

fig, ax = plt.subplots()
im = ax.imshow(image)
plt.show()

for i in range(500):
    ax.set_title(f"frame={i}")
    barbers_pole(image, i, direction=direction)
    im.set_data(image)
    fig.canvas.draw()

plt.close()

YOUR ANSWER HERE

## Assignment 4: Implementing Optical Flow [6 Points]

This exercise aims at obtaining the optical flow from a video. The following two cells provide code to create simple demo videos. You may use either of these cells, but be aware that the second video may result in large movies, requiring heavy computation (you may reduce duration or frame size). Hence you are recommended to start developing your code using the first video.

In [None]:
import numpy as np

def make_movie1(size=(20, 20), duration=50):
    """Create a small movie showing a moving dot.
    
    Result:
        a numpy.ndarray of shape (FRAMES, HEIGHT, WIDTH)
    """
    foreground = 0
    background = 1

    movie = np.ones(shape=(duration,)+size, dtype=np.float32) * background

    for t in range(duration):
        position = (t%size[0], t%size[1])
        movie[t, position[0], position[1]] = foreground
    return movie

In [None]:
%matplotlib notebook
import numpy as np
import imageio
import matplotlib.pyplot as plt
from skimage.color import rgb2gray

def make_movie2(duration = 400, show=False):
    """A simple movie created from the example from the lecture slides.

    Result:
        a numpy.ndarray of shape (FRAMES, HEIGHT, WIDTH)
    """
    image = imageio.imread('images/movie.png')
    car = image[200:300,340:480].copy()
    image[200:300,340:480] = 255

    if show:
        fig, ax = plt.subplots()
        plt.title(f"{image.shape}")
        im = ax.imshow(image)
        plt.show()

    movie = np.ndarray((duration,)+image.shape[:2], dtype=np.float32)

    for t in range(duration):
        # do not clear everything, just adapt the artists
        frame = image.copy()
        x, y = t, 200
        box = frame[y:y+car.shape[0],x:x+car.shape[1]] 
        box[car!=255] = car[car!=255]
        movie[t] = rgb2gray(frame)
        if show:
            ax.set_title(f"Creating movie frame={t+1}/{duration}")
            im.set_data(frame)
            fig.canvas.draw()

    return movie

In [None]:
# Select and display the video
%matplotlib notebook
import matplotlib.pyplot as plt
import time

def show_movie(movie, delay=0.1):
    """Show a movie using matplotlib.
    Arguments:
        movie: a numpy.ndarray of shape (FRAMES, HEIGHT, WIDTH)
        delay: time to sleep between frames (in seconds)
    """
    fig, ax = plt.subplots()
    plt.axis('off')
    plt.gray()
    im = ax.imshow(movie[0])

    for t, frame in enumerate(movie):
        ax.set_title(f"frame={t}")
        im.set_data(frame)
        fig.canvas.draw()
        time.sleep(delay)
    plt.close()

movie = make_movie1()
show_movie(movie)

**a)** Explain the idea of the Horn-Schunck algorithm. What are the *intensity constancy assumption* and the *spatial motion constancy assumption* and how do they enter into the algorithm? Explain the ideas and the individual steps for computing the optical flow. Then provide an implementation in the code cell below.

The Horn-Schunck algorithm is a basic method for optical flow detection from an image sequence.
It's a global method which introduces a global constraint of smoothness to solve the aperture problem.

**Intensity constancy assumption**  
If during the (small) time $\Delta t$ between frames $t$ and $t + \Delta t$ a pixel moves from $(x,y)$ to $(x+ \Delta x, y + \Delta y)$, then its intensity remains constant: $g(x, y, t) = g(x + \Delta x, y + \Delta y, t + \Delta t)$.
That means, we assume there is no change in intensity due to changes of illumination etc.

**Spatial motion constancy assumption**  
Adjacent pixels have the same optical flow. This holds for most pixels, because the area of moving edges is usually smaller than the area of objects.
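
How the two assumptions interact can be sketched for a single pair of frames. This follows the common textbook form of the Horn-Schunck iteration and is not necessarily the variant from the lecture: the names `neighbour_mean` and `horn_schunck_pair` are made up, `alpha` is the textbook smoothness weight rather than the sheet's `λ`, and derivatives are taken with `np.gradient` and a plain frame difference.

```python
import numpy as np

def neighbour_mean(a):
    """Mean of the four direct neighbours; border values are replicated."""
    p = np.pad(a, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0

def horn_schunck_pair(frame0, frame1, iterations=100, alpha=0.1):
    """Flow (v_x, v_y) between two frames; alpha weights the smoothness term."""
    frame0 = np.asarray(frame0, dtype=float)
    frame1 = np.asarray(frame1, dtype=float)
    g_y, g_x = np.gradient(frame0)   # spatial derivatives (rows = y, columns = x)
    g_t = frame1 - frame0            # temporal derivative
    v_x = np.zeros_like(frame0)
    v_y = np.zeros_like(frame0)
    for _ in range(iterations):
        # Spatial motion constancy: start from the average flow of the neighbours
        vx_bar = neighbour_mean(v_x)
        vy_bar = neighbour_mean(v_y)
        # Intensity constancy: how much the averaged flow violates the
        # optical flow equation, damped by alpha
        residual = (g_x * vx_bar + g_y * vy_bar + g_t) / (alpha**2 + g_x**2 + g_y**2)
        v_x = vx_bar - g_x * residual
        v_y = vy_bar - g_y * residual
    return v_x, v_y
```

Each iteration corrects the neighbourhood average along the gradient direction only, so flow information spreads from textured pixels into regions where the optical flow equation alone says nothing.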

In [None]:
%matplotlib notebook
import time
import numpy as np
import matplotlib.pyplot as plt


def horn_schunck(movie, iterations=10, λ=0.5):
    """The Horn-Schunck algorithm. 
    Input:
         movie: a numpy.ndarray of shape (FRAMES, HEIGHT, WIDTH)
         iterations: number of iterations to run
         λ: the lambda parameter of the algorithm (0<λ<=1).
    Output:
         v_x, v_y: two movies of the same shape as `movie`, 
                   providing the x and y component of the optical flow
    """
    v_x = np.zeros_like(movie)
    v_y = np.zeros_like(movie)

    # YOUR CODE HERE

    return v_x, v_y


movie = make_movie1()
v_x, v_y = horn_schunck(movie)

fig, (ax1, ax2) = plt.subplots(1,2,figsize=(8,4))
plt.gray()

ax1.set_title("Movie")
im_frame = ax1.imshow(movie[0], cmap='gray')

ax2.set_title("Optical flow")
ax2.set_aspect('equal')
ax2.invert_yaxis()
flow = ax2.quiver(np.arange(movie.shape[2]), np.arange(movie.shape[1]), -v_x[0], v_y[0], scale=2.0)

plt.tight_layout()
plt.show()

for t, frame in enumerate(movie):
    fig.suptitle(f"frame={t}")
    im_frame.set_data(movie[t])    
    flow.set_UVC(-v_x[t], v_y[t])
    fig.canvas.draw()
    time.sleep(0.1)
plt.close()

**b)** What is the idea of the Lucas-Kanade algorithm? Point out differences to the Horn-Schunck algorithm. Explain why the problem of overdetermination occurs and how the algorithm deals with that problem. Then implement the algorithm in the code cell below.

YOUR ANSWER HERE

In [None]:
%matplotlib notebook
import time
import numpy as np
import matplotlib.pyplot as plt


def lucas_kanade(movie):
    """The Lucas-Kanade algorithm. 
    Input:
         movie: a numpy.ndarray of shape (FRAMES, HEIGHT, WIDTH)
    Output:
         v_x, v_y: two movies of the same shape as `movie`, 
                   providing the x and y component of the optical flow
    """
    v_x = np.zeros_like(movie)
    v_y = np.zeros_like(movie)

    # YOUR CODE HERE

    return v_x, v_y


movie = make_movie1()
v_x, v_y = lucas_kanade(movie)

fig, (ax1, ax2) = plt.subplots(1,2 ,figsize=(8,8))
plt.gray()

ax1.set_title("Movie")
im_frame = ax1.imshow(movie[0], cmap='gray')

ax2.set_title("Optical flow")
ax2.set_aspect('equal')
ax2.invert_yaxis()
flow = ax2.quiver(np.arange(movie.shape[2]), np.arange(movie.shape[1]), -v_x[0], v_y[0], scale=2.0)

plt.tight_layout()
plt.show()

for t, frame in enumerate(movie):
    fig.suptitle(f"frame={t}")
    im_frame.set_data(movie[t])    
    flow.set_UVC(-v_x[t], v_y[t])
    fig.canvas.draw()
    time.sleep(0.1)
plt.close()