# Multi-view Geometry

## Camera Intrinsics and Extrinsics

### Basics
The projection chain is shown below, where $S$ is the sensor frame, $I$ is the image frame, $C$ is the camera frame, and $W$ is the world frame. The $I$ frame normally differs from $C$ only by a constant offset, the so-called camera constant.

$$
\begin{align}
\begin{bmatrix} u \\ v \\ 1
\end{bmatrix} = T_{SC} T_{IC} T_{CW} \begin{bmatrix} X \\ Y \\ Z \\ 1
\end{bmatrix}
\end{align}
$$

The extrinsic parameters are contained in $T_{CW}$, which typically has 6 DoF: 3 for position and 3 for orientation. $O^W_C$ is the world origin expressed in the camera frame $C$, and $O^C_W$ is the camera origin expressed in the world frame $W$, so the translation column can be written in either of the two forms below.

$$
\begin{align}
T_{CW} = 
\begin{bmatrix}
R_{CW},& O^W_{C} \\
0,& 1
\end{bmatrix}
\end{align}
$$

$$
\begin{align}
T_{CW} = 
\begin{bmatrix}
R_{CW},& -R_{CW} O^C_W \\
0,& 1
\end{bmatrix}
\end{align}
$$
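The two forms of $T_{CW}$ can be checked numerically. The following is a minimal sketch with a made-up pose; the helper `make_T_cw` and the example values are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def make_T_cw(R_cw, cam_origin_w):
    """Build the 4x4 extrinsic matrix from R_CW and the camera origin in world coordinates."""
    T = np.eye(4)
    T[:3, :3] = R_cw
    T[:3, 3] = -R_cw @ cam_origin_w  # world origin expressed in the camera frame
    return T

# Hypothetical pose: rotation of 90 degrees about z, camera located at (1, 2, 3) in the world.
R_cw = np.array([[0.0, 1.0, 0.0],
                 [-1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0]])
cam_origin_w = np.array([1.0, 2.0, 3.0])
T_cw = make_T_cw(R_cw, cam_origin_w)

# The camera origin must map to the zero vector in the camera frame.
origin_in_cam = T_cw @ np.append(cam_origin_w, 1.0)

# The camera origin can be recovered from the translation column.
recovered = -R_cw.T @ T_cw[:3, 3]
```

Here `origin_in_cam[:3]` should be zero and `recovered` should equal `cam_origin_w`, which is the relation $O^W_C = -R_{CW} O^C_W$ read in both directions.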

The intrinsic parameters are $T_{SC} T_{IC}$ when non-linear distortions are ignored. If the image plane is taken to lie in the negative z direction, the camera constant $c$ is negative. $x_H$ and $y_H$ are the translation from the image frame center to the sensor frame center. The second line below generalizes the first by adding a shear $s$ and a scale difference $m$ between the two axes.

$$
\begin{align}
T_{SC} T_{IC} & = 
\begin{bmatrix}
1,& 0, & x_H \\
0,& 1, & y_H \\
0,& 0, & 1
\end{bmatrix}\begin{bmatrix}
c,& 0, & 0 \\
0,& c, & 0 \\
0,& 0, & 1
\end{bmatrix} \\ 
& = \begin{bmatrix}
1,& s, & x_H \\
0,& 1+m, & y_H \\
0,& 0, & 1
\end{bmatrix}\begin{bmatrix}
c,& 0, & 0 \\
0,& c, & 0 \\
0,& 0, & 1
\end{bmatrix}
\end{align}
$$

$$
\begin{align}
K = T_{SC} T_{IC} & = 
\begin{bmatrix}
c_x,& s_{xy}, & x_H \\
0,& c_y, & y_H \\
0,& 0, & 1
\end{bmatrix}
\end{align}
$$
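Multiplying out the factored form gives $c_x = c$, $s_{xy} = s\,c$ and $c_y = (1+m)\,c$. A small sketch that builds $K$ this way is shown below; the function name `make_K` and the numeric values are assumptions chosen for illustration.

```python
import numpy as np

def make_K(c, x_H, y_H, s=0.0, m=0.0):
    """Compose K from the camera constant c, principal point, shear s and scale difference m."""
    offset_shear = np.array([[1.0, s, x_H],
                             [0.0, 1.0 + m, y_H],
                             [0.0, 0.0, 1.0]])
    scale = np.diag([c, c, 1.0])
    return offset_shear @ scale

# Hypothetical values; c is negative because the image plane is placed at negative z.
K = make_K(c=-500.0, x_H=200.0, y_H=200.0, s=0.01, m=0.02)
```

With these values the entries are $c_x = -500$, $s_{xy} = -5$ and $c_y = -510$.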

Non-linear distortion can be caused by an imperfect lens, so that each point projected onto the sensor plane is shifted slightly depending on its position. $q$ are the parameters of the distortion model, such as barrel distortion or tangential distortion.

$$
\begin{align}
x_a =& x_s + \Delta(x_s, q) \\
y_a =& y_s + \Delta(y_s, q)
\end{align}
$$

$$
\begin{align}
x_a = H(x_s)x_s
\end{align}
$$

### Mapping

#### Inverse map from $x_a$ to $x_s$

$$
\begin{align}
[x_{s}]_{i+1}= [H([x_{s}]_i)]^{-1}x_a
\end{align}
$$
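The fixed-point iteration above can be sketched as follows. This assumes a one-parameter radial model $x_a = (1 + q_1 r^2)\,x_s$ on centered, normalized coordinates, so that $H(x_s)$ reduces to a scalar factor; the model, the function names and the numbers are illustrative assumptions, not the model used by any particular camera.

```python
import numpy as np

def distort(x_s, q1):
    """Forward model: ideal point x_s -> distorted point x_a (assumed radial model)."""
    r2 = np.sum(x_s**2)
    return (1.0 + q1 * r2) * x_s

def undistort(x_a, q1, iterations=20):
    """Invert the model by iterating x_s <- H(x_s)^-1 x_a, starting from x_s = x_a."""
    x_s = x_a.copy()
    for _ in range(iterations):
        r2 = np.sum(x_s**2)
        x_s = x_a / (1.0 + q1 * r2)
    return x_s

x_s_true = np.array([0.3, -0.2])
q1 = -0.1                       # mild barrel distortion (hypothetical value)
x_a = distort(x_s_true, q1)
x_s_est = undistort(x_a, q1)
```

For mild distortion the iteration contracts quickly, so a handful of iterations is usually enough; strong distortion may need a damped or Newton-style update instead.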

#### Inverse map from $x_s$ to $X_c$

$$
\begin{align}
X_c = \lambda K^{-1}\begin{bmatrix} u \\ v \\ 1
\end{bmatrix}
\end{align}
$$
$$
\begin{align}
X_W = O^C_W + \lambda R_{CW}^{-1}K^{-1}\begin{bmatrix} u \\ v \\ 1
\end{bmatrix}
\end{align}
$$
Here $\lambda$ is the depth, i.e. the $z$ coordinate of the point in the camera frame.

```python
# Mapping and inverse mapping

import numpy as np

print('Map world points to sensor frame')
p_w = np.array([1, 0, -5, 1])  # homogeneous world point
print(f'p_w: {p_w}')
# Extrinsics: identity rotation, world origin at (1, 0, 0) in the camera frame
T_cw = np.array([[1, 0, 0, 1],
                 [0, 1, 0, 0],
                 [0, 0, 1, 0],
                 [0, 0, 0, 1]])
R_cw = T_cw[0:3, 0:3]
O_w = -R_cw.T.dot(T_cw[0:3, 3])  # camera origin expressed in the world frame

p_c = T_cw.dot(p_w)

depth = p_c[2]
p_c = p_c[0:3]
print(f'p_c: {p_c}', f'depth: {depth}')

# Negative camera constant: image plane in the negative z direction
K = np.array([[-500, 0, 200], [0, -500, 200], [0, 0, 1]])
p_i = K.dot(p_c)
uv = np.array([p_i[0]/p_i[2], p_i[1]/p_i[2], 1])
print(f'p_s: {uv}')

print('\nMap sensor points to world frame')
ray = np.linalg.inv(K).dot(uv)  # direction with unit z component
print(f'ray: {ray}')

p_c = depth * ray
print(f'p_c: {p_c}')

p_w_rec = O_w + depth * R_cw.T.dot(ray)  # recovered world point
print(f'p_w: {p_w_rec}')
```

```
Map world points to sensor frame
p_w: [ 1  0 -5  1]
p_c: [ 2  0 -5] depth: -5
p_s: [400. 200.   1.]

Map sensor points to world frame
ray: [-0.4  0.   1. ]
p_c: [ 2. -0. -5.]
p_w: [ 1.  0. -5.]
```


### Calibration

#### DLT: Direct Linear Transform

Key ideas:
- Projection matrix $P = T_{SC} T_{IC} T_{CW}$ has 11 DoF.
- Each 2D-to-3D correspondence gives two constraints on $P$.
- We need at least 6 correspondences, giving 12 constraints, to solve for $P$ with 11 DoF.
- Use SVD to find the solution: the right singular vector associated with the smallest singular value.
- $P = [KR \mid -KRO^C_W]$, so once we have $P$, we can recover $KR$ and the camera origin $O^C_W$.
- Use QR decomposition on $(KR)^{-1}$ to find $K$ and $R$
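The steps above can be sketched on synthetic, noise-free data as follows. This is a minimal illustration rather than a production implementation: it skips the coordinate normalization that is usually added for numerical stability, the names `dlt` and `decompose` are hypothetical, and the sign handling assumes the convention used in these notes (negative camera constant, $K_{33} = 1$, $\det R = +1$).

```python
import numpy as np

def dlt(X_w, uv):
    """Estimate the 3x4 projection matrix P (up to scale) from n >= 6 correspondences."""
    rows = []
    for X, (u, v) in zip(X_w, uv):
        Xh = np.append(X, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))  # p1.X - u p3.X = 0
        rows.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))  # p2.X - v p3.X = 0
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)  # right singular vector of the smallest singular value

def decompose(P):
    """Split P = [KR | -KR O] into K, R and the camera origin O in world coordinates."""
    M = P[:, :3]                                 # M equals K R up to scale
    cam_origin_w = -np.linalg.solve(M, P[:, 3])  # independent of the scale of P
    Q, U = np.linalg.qr(np.linalg.inv(M))        # inv(M) = R^T K^-1 up to scale
    R, K = Q.T.copy(), np.linalg.inv(U)
    K = K / K[2, 2]
    for i in range(2):                           # assumed convention: negative camera constant
        if K[i, i] > 0:
            K[:, i] *= -1
            R[i, :] *= -1
    if np.linalg.det(R) < 0:                     # the overall sign of P is arbitrary
        R = -R
    return K, R, cam_origin_w

# Synthetic ground truth (hypothetical values).
rng = np.random.default_rng(0)
X_w = rng.uniform(-1.0, 1.0, size=(10, 3))
K_true = np.array([[-500.0, 0.0, 200.0], [0.0, -500.0, 200.0], [0.0, 0.0, 1.0]])
a = 0.1
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
O_true = np.array([0.2, -0.1, 5.0])
x = (K_true @ R_true @ (X_w - O_true).T).T
uv = x[:, :2] / x[:, 2:3]

P = dlt(X_w, uv)
K_est, R_est, O_est = decompose(P)
```

The QR factors are only unique up to the signs of the diagonal of the triangular factor, which is why a sign convention has to be imposed before $K$ and $R$ can be compared with the ground truth.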

#### Zhang's homography approach

Key ideas:
- Use a planar object so that $Z_W$ is always 0.
- Instead of solving for $P$, we solve for a homography matrix $H$ per image.
- We need at least 4 points in each image to solve for $H$.
- Each homography gives 2 constraints on $B = K^{-T}K^{-1}$.
- $B$ is a symmetric matrix with 6 distinct entries, so we need at least 3 images, each giving 2 constraints on $B$.
- Find $B$, then use Cholesky decomposition to recover $K$.
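The two constraints per homography and the final Cholesky step can be illustrated with a short sketch. It assumes the constraint matrix is $B = K^{-T}K^{-1}$ as in Zhang's formulation, builds one homography directly from a made-up pose instead of estimating it from points, and uses the sign convention of these notes (negative camera constant). The function name `K_from_B` and all values are hypothetical.

```python
import numpy as np

K_true = np.array([[-500.0, 0.0, 200.0],
                   [0.0, -500.0, 200.0],
                   [0.0, 0.0, 1.0]])
K_inv = np.linalg.inv(K_true)
B = K_inv.T @ K_inv

# Hypothetical pose of the planar target: rotation about x by 0.3 rad plus a translation.
a = 0.3
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a), np.cos(a)]])
t = np.array([0.1, -0.2, -4.0])
H = K_true @ np.column_stack([R[:, 0], R[:, 1], t])  # homography for points with Z_W = 0
h1, h2 = H[:, 0], H[:, 1]

# The two constraints contributed by this homography.
c1 = h1 @ B @ h2                 # r1 and r2 are orthogonal
c2 = h1 @ B @ h1 - h2 @ B @ h2   # r1 and r2 have equal norm

def K_from_B(B):
    """Recover K from a positive multiple of B = K^-T K^-1 via Cholesky."""
    L = np.linalg.cholesky(B)    # B = L L^T with L lower triangular
    K = np.linalg.inv(L.T)       # upper triangular; K up to scale and column signs
    K = K / K[2, 2]
    for i in range(2):           # assumed convention: negative camera constant
        if K[i, i] > 0:
            K[:, i] *= -1
    return K

K_rec = K_from_B(3.7 * B)        # B is only known up to scale in practice
```

In a full calibration, $B$ would be obtained by stacking these two constraints from at least 3 homographies and solving the resulting homogeneous system, for example with SVD.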

#### Non-linear Optimization with Gauss-Newton or LM 

Key ideas:
- Use Zhang's approach for initialization.
- $K, q, R_i, t_i = \underset{K, q, R_i, t_i}{\operatorname{argmin}} {\sum_n\sum_i}\|x_{ni} - \hat{x}(K, q, R_i, t_i, X_{ni})\|^2$.
- Note that we can set a corner of the planar object as the world origin (0, 0, 0).
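A reduced version of this refinement can be sketched with plain Gauss-Newton. To keep it short, the example assumes the poses are already known, so the points are given as normalized camera coordinates, and it refines only $c$, $x_H$, $y_H$ and a single radial parameter $q_1$. The full problem also optimizes $R_i$ and $t_i$. The distortion model, the function names and the numbers are illustrative assumptions, and the Jacobian is approximated by forward differences.

```python
import numpy as np

def project(theta, x_n):
    """Predict pixel coordinates from normalized points under an assumed radial model."""
    c, x_H, y_H, q1 = theta
    r2 = np.sum(x_n**2, axis=1)
    d = 1.0 + q1 * r2
    u = c * d * x_n[:, 0] + x_H
    v = c * d * x_n[:, 1] + y_H
    return np.column_stack([u, v]).ravel()

def gauss_newton(theta0, x_n, x_obs, iterations=10):
    """Minimize the squared reprojection error with Gauss-Newton steps."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(iterations):
        r = x_obs - project(theta, x_n)          # reprojection residual
        J = np.empty((r.size, theta.size))
        for j in range(theta.size):              # forward-difference Jacobian of the prediction
            step = 1e-6 * max(1.0, abs(theta[j]))
            pert = theta.copy()
            pert[j] += step
            J[:, j] = (project(pert, x_n) - project(theta, x_n)) / step
        delta = np.linalg.lstsq(J, r, rcond=None)[0]
        theta = theta + delta
    return theta

# Synthetic, noise-free observations on a 5x5 grid of normalized points.
g = np.linspace(-0.4, 0.4, 5)
x_n = np.array([[x, y] for x in g for y in g])
theta_true = np.array([-500.0, 200.0, 200.0, -0.05])
x_obs = project(theta_true, x_n)

# Start from a rough initial guess, as an initialization step would provide.
theta = gauss_newton([-450.0, 190.0, 210.0, 0.0], x_n, x_obs)
```

Levenberg-Marquardt differs from this sketch by adding a damping term to the normal equations, which makes the iteration more robust when the initial guess is poor.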