# Vision

Using 2D projection to build 3D geometry

## World coordinate system

A point $\mathbf{P}$ in the world coordinate system is represented by its coordinates:

$\mathbf{P} = (X_w,Y_w,Z_w)$

<img src="https://learnopencv.com/wp-content/uploads/2020/02/world-camera-image-coordinates.png">

[from https://learnopencv.com]

## Camera coordinate system

The same point $\mathbf{P}$ can also be represented in a coordinate system attached to the camera:

$\mathbf{P} = (X_c,Y_c,Z_c)$

The camera is treated as a rigid body with a center of projection $O_c$. Its pose relative to the world coordinate system is defined by a translation $\mathbf{T}$ and a rotation $\mathbf{R}$:

$\mathbf{T} = (t_X,t_Y,t_Z)$

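The rotation $\mathbf{R}$ in 3D space is defined by three values (e.g. Euler angles) but is represented in the form of a $3 \times 3$ matrix. Every 3D rotation can be represented as a $3 \times 3$ matrix, but not every $3 \times 3$ matrix is a rotation: a rotation matrix must be orthogonal ($\mathbf{R}^T\mathbf{R} = \mathbf{I}$) with determinant $+1$.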


Example, composing rotations about the $z$, $y$ and $x$ axes:

$\mathbf{R} = R_z(\alpha) \, R_y(\beta) \, R_x(\gamma)$

$R_z(\alpha) = \begin{bmatrix}
    \cos \alpha & -\sin \alpha & 0 \\
    \sin \alpha &  \cos \alpha & 0 \\
              0 &            0 & 1 \\
  \end{bmatrix}$
  
$R_y(\beta)=\begin{bmatrix}
     \cos \beta & 0 & \sin \beta \\
              0 & 1 &          0 \\
    -\sin \beta & 0 & \cos \beta \\
  \end{bmatrix}$
  
$R_x(\gamma)=\begin{bmatrix}
    1 &  0          &            0 \\
    0 & \cos \gamma & -\sin \gamma \\
    0 & \sin \gamma &  \cos \gamma \\
  \end{bmatrix}$
  
$\mathbf{R} = \begin{bmatrix}
        \cos\alpha\cos\beta &
          \cos\alpha\sin\beta\sin\gamma - \sin\alpha\cos\gamma &
          \cos\alpha\sin\beta\cos\gamma + \sin\alpha\sin\gamma \\
        \sin\alpha\cos\beta &
          \sin\alpha\sin\beta\sin\gamma + \cos\alpha\cos\gamma &
          \sin\alpha\sin\beta\cos\gamma - \cos\alpha\sin\gamma \\
       -\sin\beta & \cos\beta\sin\gamma & \cos\beta\cos\gamma \\
\end{bmatrix}$
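The composition above can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation; the function names (`rot_z`, `rot_y`, `rot_x`, `euler_to_matrix`, `is_rotation`) are chosen here for illustration. It builds $\mathbf{R} = R_z(\alpha) R_y(\beta) R_x(\gamma)$ and tests the two properties that distinguish a rotation from an arbitrary $3 \times 3$ matrix.

```python
import numpy as np

def rot_z(a):
    """Rotation by angle a (radians) about the z axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(b):
    """Rotation by angle b (radians) about the y axis."""
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_x(g):
    """Rotation by angle g (radians) about the x axis."""
    c, s = np.cos(g), np.sin(g)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def euler_to_matrix(alpha, beta, gamma):
    """Compose R = Rz(alpha) @ Ry(beta) @ Rx(gamma)."""
    return rot_z(alpha) @ rot_y(beta) @ rot_x(gamma)

def is_rotation(M, tol=1e-9):
    """True if M is 3x3, orthogonal, and has determinant +1."""
    M = np.asarray(M, dtype=float)
    if M.shape != (3, 3):
        return False
    orthogonal = np.allclose(M.T @ M, np.eye(3), atol=tol)
    unit_det = np.isclose(np.linalg.det(M), 1.0, atol=tol)
    return bool(orthogonal and unit_det)

alpha, beta, gamma = 0.3, -0.2, 0.1
R = euler_to_matrix(alpha, beta, gamma)
print(is_rotation(R))              # the composed matrix passes the check
print(is_rotation(2 * np.eye(3)))  # a pure scaling matrix does not
```

The entries of `R` can also be compared with the closed-form matrix above, for instance `R[2, 0]` against $-\sin\beta$.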

The coordinates of a point can be converted from the world coordinate system to the camera coordinate system using:

$\begin{bmatrix}
    X_c  \\
    Y_c  \\
    Z_c  \\
  \end{bmatrix} = \mathbf{R} \begin{bmatrix}
    X_w  \\
    Y_w  \\
    Z_w  \\
  \end{bmatrix} + \mathbf{T}$
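As a small illustration of this change of coordinates, here is a NumPy sketch. The helper names `world_to_camera` and `camera_to_world` and the numeric values are assumptions made for the example. The inverse uses the fact that the inverse of a rotation matrix is its transpose.

```python
import numpy as np

def world_to_camera(P_w, R, T):
    """Apply P_c = R @ P_w + T."""
    return R @ np.asarray(P_w, dtype=float) + np.asarray(T, dtype=float)

def camera_to_world(P_c, R, T):
    """Invert the transform: P_w = R^T @ (P_c - T), valid because R is a rotation."""
    return R.T @ (np.asarray(P_c, dtype=float) - np.asarray(T, dtype=float))

# Example pose (assumed values): 90 degree rotation about z, shift of 5 along Z_c.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
T = np.array([0.0, 0.0, 5.0])

P_w = np.array([1.0, 0.0, 0.0])
P_c = world_to_camera(P_w, R, T)
print(P_c)                         # [0. 1. 5.]
print(camera_to_world(P_c, R, T))  # recovers P_w
```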

## Extrinsic parameters

Together, the translation and the rotation form the **extrinsic parameters** of the camera. In homogeneous coordinates:


$\begin{bmatrix}
    X_c  \\
    Y_c  \\
    Z_c  \\
    1
  \end{bmatrix} = \begin{bmatrix}
\mathbf{R} | \mathbf{T}\end{bmatrix}
    \begin{bmatrix}
    X_w  \\
    Y_w  \\
    Z_w  \\
    1
  \end{bmatrix}$
  
  and
  
 $\begin{bmatrix}
\mathbf{R} | \mathbf{T}\end{bmatrix} = \begin{bmatrix}\mathbf{R}_{3 \times 3} & \mathbf{T}_{3 \times 1} \\
0_{1 \times 3} & 1\end{bmatrix}_{4 \times 4}$
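The $4 \times 4$ block matrix can be assembled directly. This is a sketch with NumPy; `extrinsic_matrix` is a name introduced here and the pose values are arbitrary.

```python
import numpy as np

def extrinsic_matrix(R, T):
    """Build the 4x4 matrix [[R, T], [0, 1]] from a 3x3 rotation and a 3-vector."""
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = T
    return E

# Example pose (assumed values): 90 degree rotation about z, shift of 5 along Z_c.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
T = np.array([0.0, 0.0, 5.0])
E = extrinsic_matrix(R, T)

P_w_h = np.array([1.0, 0.0, 0.0, 1.0])  # homogeneous world point
P_c_h = E @ P_w_h                       # homogeneous camera point
print(P_c_h)
```

The first three components of `P_c_h` equal $\mathbf{R}\,\mathbf{P}_w + \mathbf{T}$ and the last component stays 1, so a single matrix product replaces the rotate-then-translate step.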

## Image coordinate system

The light coming from the scene is projected onto the sensor, which is a 2D surface. The camera has its optical center at $O_c$ and is looking at $\mathbf{P}=(X_c,Y_c,Z_c)$. By convention, the optical axis of the camera points in the $Z_c$ direction.




### The pinhole camera model

<img src="https://upload.wikimedia.org/wikipedia/en/f/f8/Pinhole2.svg">

[from wikimedia]

The $(x,y)$ position of the point $\mathbf{P}$ projected on the sensor is given by:

$x=f\frac{X_c}{Z_c}$

$y=f\frac{Y_c}{Z_c}$

where $f$ is the focal length of the camera optics.
In matrix form, this can be written as:

$\begin{bmatrix}
    x'  \\
    y' \\
    z'  
  \end{bmatrix}=\mathbf{K}\begin{bmatrix}
    X_c  \\
    Y_c \\
    Z_c  
  \end{bmatrix}$
  
  with $x = \frac{x'}{z'}$ and $y = \frac{y'}{z'}$
  
  and 
  
  $\mathbf{K}=\begin{bmatrix}
    f &  0          &            0 \\
    0 & f & 0 \\
    0 & 0 & 1 \\
  \end{bmatrix}$
  
  The above $3 \times 3$ matrix is called the **intrinsic matrix** of the camera, in this case for a *pinhole camera*.
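The matrix form and the direct formulas give the same result, which the short NumPy sketch below illustrates. The function name `pinhole_project` and the numeric values (a 50 mm focal length, a point 2 m in front of the camera) are assumptions for the example.

```python
import numpy as np

def pinhole_project(P_c, f):
    """Project a camera-frame point with the ideal pinhole intrinsic matrix."""
    K = np.array([[f,   0.0, 0.0],
                  [0.0, f,   0.0],
                  [0.0, 0.0, 1.0]])
    xp, yp, zp = K @ np.asarray(P_c, dtype=float)  # (x', y', z')
    return np.array([xp / zp, yp / zp])            # x = x'/z', y = y'/z'

f = 0.05                           # focal length in metres (assumed)
P_c = np.array([0.2, -0.1, 2.0])   # point in the camera frame, Z_c > 0
xy = pinhole_project(P_c, f)
print(xy)  # approximately (0.005, -0.0025), in the same unit as f
```

The division by $z'$ is what makes the projection non-linear in Cartesian coordinates. It is only defined for $Z_c \neq 0$.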

## Intrinsic parameters

Not all cameras are ideal pinhole cameras. In general, pixels might not be square, so two different focal lengths $f_x$ and $f_y$ have to be considered, and $\mathbf{K}$ becomes:

$\mathbf{K}=\begin{bmatrix}
    f_x & 0 & 0 \\
    0 & f_y & 0 \\
    0 & 0 & 1 \\
  \end{bmatrix}$

In addition, the $(0,0)$ coordinate on the sensor might not coincide with the intersection of the $Z_c$ axis and the projection plane. If this intersection (the principal point) is located at $(c_x,c_y)$, the matrix becomes:

$\mathbf{K}=\begin{bmatrix}
    f_x & 0 & c_x \\
    0 & f_y & c_y \\
    0 & 0 & 1 \\
  \end{bmatrix}$

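Finally, the camera sensor might also have a skew between the $x$ and $y$ axes. With a skew coefficient $\gamma$ (unrelated to the rotation angle $\gamma$ used earlier), the intrinsic matrix becomes:

$\mathbf{K}=\begin{bmatrix}
    f_x & \gamma & c_x \\
    0 & f_y & c_y \\
    0 & 0 & 1 \\
  \end{bmatrix}$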
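To see the effect of each intrinsic parameter, the general matrix can be built and applied to a point in the camera frame. This is a NumPy sketch; `intrinsic_matrix`, `to_pixel` and the numeric values (focal lengths in pixels, a principal point at the center of a 640 x 480 image) are assumptions for illustration.

```python
import numpy as np

def intrinsic_matrix(fx, fy, cx, cy, skew=0.0):
    """General 3x3 intrinsic matrix with focal lengths, principal point and skew."""
    return np.array([[fx,  skew, cx],
                     [0.0, fy,   cy],
                     [0.0, 0.0,  1.0]])

def to_pixel(K, P_c):
    """Project a camera-frame point and divide by the third homogeneous coordinate."""
    xp, yp, zp = K @ np.asarray(P_c, dtype=float)
    return np.array([xp / zp, yp / zp])

K = intrinsic_matrix(fx=800.0, fy=820.0, cx=320.0, cy=240.0)
P_c = np.array([0.5, -0.25, 2.0])
uv = to_pixel(K, P_c)
print(uv)  # approximately (520.0, 137.5)
```

With zero skew this reduces to $x = f_x X_c / Z_c + c_x$ and $y = f_y Y_c / Z_c + c_y$. A non-zero skew adds $\gamma\, Y_c / Z_c$ to $x$ only.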


## From the world coordinate to the pixel

Putting everything together, we have:

$\begin{bmatrix}
x'\\
y'\\
z'\end{bmatrix}=K\, \begin{bmatrix}
R | T\end{bmatrix}\begin{bmatrix}
X_{w}\\
Y_{w}\\
Z_{w}\\
1\end{bmatrix}
=M \begin{bmatrix}
X_{w}\\
Y_{w}\\
Z_{w}\\
1\end{bmatrix}$

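where $M = K\, \begin{bmatrix} R | T\end{bmatrix}$ is a $3 \times 4$ matrix,

$K=\begin{bmatrix}
f_{x} & \gamma & c_{x} & 0\\
0 & f_{y} & c_{y} & 0\\
0 & 0 & 1 &0 \end{bmatrix}$

is the **intrinsic matrix** in homogeneous coordinates (the $3 \times 3$ intrinsic matrix padded with a zero column),

and $\begin{bmatrix} R | T\end{bmatrix}$ is the $4 \times 4$ matrix of **extrinsic parameters**. As before, the pixel coordinates are $x = \frac{x'}{z'}$ and $y = \frac{y'}{z'}$.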
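The full chain from world coordinates to pixel coordinates can be sketched in a few lines of NumPy. The function names `projection_matrix` and `project`, as well as the camera parameters, are hypothetical and only serve the example.

```python
import numpy as np

def projection_matrix(fx, fy, cx, cy, skew, R, T):
    """Build M = K [R|T] with a 3x4 intrinsic matrix and a 4x4 extrinsic matrix."""
    K = np.array([[fx,  skew, cx,  0.0],
                  [0.0, fy,   cy,  0.0],
                  [0.0, 0.0,  1.0, 0.0]])
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = T
    return K @ E  # shape (3, 4)

def project(M, P_w):
    """Map a 3D world point to pixel coordinates (x'/z', y'/z')."""
    P_h = np.append(np.asarray(P_w, dtype=float), 1.0)  # homogeneous world point
    xp, yp, zp = M @ P_h
    return np.array([xp / zp, yp / zp])

# Assumed camera: no rotation, world origin 5 units in front of the camera.
R = np.eye(3)
T = np.array([0.0, 0.0, 5.0])
M = projection_matrix(800.0, 800.0, 320.0, 240.0, 0.0, R, T)

uv = project(M, [1.0, 0.5, 0.0])
print(uv)  # approximately (480.0, 320.0)
```

Because $M$ is only defined up to a scale factor in this formulation, multiplying it by any non-zero constant yields the same pixel coordinates.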



## Camera calibration

Camera calibration is needed to determine the intrinsic parameters. Furthermore, other distortions, due to imperfections of the sensor or the optics, should be taken into account.


<img src="https://upload.wikimedia.org/wikipedia/commons/8/80/Nikon_1_V1_%2B_Fisheye_FC-E9_01.jpg">

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Circular_fisheye_view_of_Oude_Kerk_Amsterdam_Daniel_D._Teoli_Jr..jpg/1024px-Circular_fisheye_view_of_Oude_Kerk_Amsterdam_Daniel_D._Teoli_Jr..jpg">

[from wikimedia]

Camera calibration can be done by acquiring images of a known geometry from several angles, e.g. using a chessboard.
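To illustrate the idea of recovering camera parameters from known geometry, here is a simplified sketch that estimates the full projection matrix $M$ from known 3D points and their pixel positions with the direct linear transform (DLT). This is not the chessboard procedure itself, which in practice is often carried out with OpenCV's `cv2.calibrateCamera` and also models lens distortion. The function name `estimate_projection_dlt` and the synthetic, noise-free data are assumptions for the example.

```python
import numpy as np

def estimate_projection_dlt(points_w, points_px):
    """Estimate the 3x4 matrix M from at least 6 non-coplanar 3D-2D correspondences.

    Each correspondence gives two linear equations in the 12 entries of M:
      m1.P - u * m3.P = 0   and   m2.P - v * m3.P = 0
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(points_w, points_px):
        P = [X, Y, Z, 1.0]
        rows.append(P + [0.0] * 4 + [-u * c for c in P])
        rows.append([0.0] * 4 + P + [-v * c for c in P])
    A = np.array(rows, dtype=float)
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1].reshape(3, 4)
    return M / M[2, 3]  # fix the free scale; assumes M[2, 3] != 0

# Synthetic ground truth (assumed values): K [R|T] with R = I and T = (0, 0, 5).
M_true = np.array([[800.0, 0.0, 320.0, 1600.0],
                   [0.0, 800.0, 240.0, 1200.0],
                   [0.0, 0.0, 1.0, 5.0]])

rng = np.random.default_rng(0)
points_w = rng.uniform(-1.0, 1.0, size=(10, 3))       # known 3D geometry
proj = (M_true @ np.c_[points_w, np.ones(10)].T).T    # homogeneous pixels
points_px = proj[:, :2] / proj[:, 2:3]                # observed pixel positions

M_est = estimate_projection_dlt(points_w, points_px)
print(np.allclose(M_est, M_true / M_true[2, 3], rtol=1e-6, atol=1e-6))
```

With real images the pixel positions are noisy, so more points are used and the linear estimate usually serves as a starting point for a non-linear refinement.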

demo/calibration