# Camera Calibration

## Image Formation: From 3D point to 2D pixel position

### Homogeneous Coordinates

For reasons explained further below, we will use homogeneous coordinates to express both pixel coordinates and 3D object space coordinates. This means we simply append a 1 to the bottom of each coordinate column vector. The motivation is the ambiguity in going from a pixel coordinate back to a point in 3D object space: a pixel tells us only the direction of the ray from the camera, not how far along that ray the point lies.

In pixel space, we write this as: 
$[x_{p}, y_{p}, 1]^{T}$

In object space, we write this as: 
$[X_{c}, Y_{c}, Z_{c}, 1]^{T}$ 

We use the subscript $p$ for pixel space, and the subscript $c$ for camera object space. 
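
As a small illustration of this convention, the following Python sketch appends the extra 1 and then recovers the Euclidean point by dividing by the last element. The helper names `to_homogeneous` and `from_homogeneous` are made up for this example, and plain Python lists stand in for column vectors.

```python
def to_homogeneous(point):
    """Append a 1 to a Euclidean point given as a list of numbers."""
    return list(point) + [1.0]


def from_homogeneous(vector):
    """Divide by the last element and drop it."""
    w = vector[-1]
    if w == 0:
        raise ValueError("last element is zero; no finite Euclidean point")
    return [v / w for v in vector[:-1]]


pixel = to_homogeneous([320.0, 240.0])   # [320.0, 240.0, 1.0]
scaled = [2.5 * v for v in pixel]        # same pixel, different overall scale
print(from_homogeneous(scaled))          # [320.0, 240.0]
```

Scaling a homogeneous vector does not change the point it represents, which mirrors the ambiguity along the ray described above.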

### Geometry of Image Formation: Focal Length 

The figure shows how a point in 3D object space maps to a pixel position in image space.

Using similar triangles, we see that the pixel position depends on the focal length $f$ and on the point's distance $Z_{c}$ from the camera.

We have 

$x_{p} = f \frac{X_{c}}{Z_{c}}$ and $y_{p} = f \frac{Y_{c}}{Z_{c}}$.

Unfortunately, this is a non-linear transformation, since we are dividing by $Z_c$. To obtain a linear transformation, we use our homogeneous coordinate representation, which lets us express the projection in matrix form:

\begin{equation}
\begin{bmatrix}
x'_{p} \\
y'_{p} \\
Z_{c} 
\end{bmatrix} = 
\begin{bmatrix}
f & 0 & 0 & 0 \\
0 & f & 0 & 0 \\ 
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
X_{c} \\
Y_{c} \\
Z_{c} \\
1 
\end{bmatrix}
\end{equation}

To recover the point in Euclidean image space, we divide by $Z_{c}$, which normalizes the homogeneous vector so that its last element is 1:

\begin{equation}
\begin{bmatrix}
x'_{p} \\
y'_{p} \\
Z_{c} 
\end{bmatrix} \rightarrow 
\begin{bmatrix}
x_{p} \\
y_{p} \\
1 
\end{bmatrix}
\end{equation} 

where $x_{p} = \frac{x'_{p}}{Z_{c}}$ and $y_{p} = \frac{y'_{p}}{Z_{c}}$.
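
The two steps above (a linear matrix product followed by division by $Z_{c}$) can be sketched in Python as follows. This is only an illustrative sketch: `matvec` and `project_pinhole` are hypothetical helper names, and the focal length and points are arbitrary values in consistent units.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def project_pinhole(f, point_c):
    """Project a camera-space point [X_c, Y_c, Z_c] to [x_p, y_p]."""
    projection = [
        [f, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
    # Linear step: the result is [x'_p, y'_p, Z_c].
    x_prime, y_prime, z_c = matvec(projection, list(point_c) + [1.0])
    if z_c == 0:
        raise ValueError("Z_c is zero; the point cannot be projected")
    # Non-linear step: divide by Z_c so the last element becomes 1.
    return [x_prime / z_c, y_prime / z_c]


near = project_pinhole(2.0, [1.0, 0.5, 4.0])
far = project_pinhole(2.0, [2.0, 1.0, 8.0])   # twice as far along the same ray
print(near, far)  # [0.5, 0.25] [0.5, 0.25]
```

Both points lie on the same ray through the camera center, so they land on the same image position.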

### Adjusting for Principal Point

The projection so far places the origin at the center of the image, which is not where we want it: we want the origin at the bottom left corner of the image. To adjust for this, we write

$x_{p} = f \frac{X_{c}}{Z_{c}} + c_{x}$ and $y_{p} = f \frac{Y_{c}}{Z_{c}} + c_{y}$

where $c_{x}$ and $c_{y}$ are offsets in image space that shift the origin. Our matrix expression now takes the form:


\begin{equation}
\begin{bmatrix}
x'_{p} \\
y'_{p} \\
Z_{c} 
\end{bmatrix} = 
\begin{bmatrix}
f & 0 & c_{x} & 0 \\
0 & f & c_{y} & 0 \\ 
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
X_{c} \\
Y_{c} \\
Z_{c} \\
1 
\end{bmatrix}
\end{equation}

Notice this now gives
$x'_{p} = f X_{c} + c_{x} Z_{c}$. Dividing by $Z_{c}$ to normalize the homogeneous vector, we obtain
$x_{p} = f \frac{X_{c}}{Z_{c}} + c_{x}$, which is what we expect in image space.
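
As a quick numeric check of this step, take the arbitrary illustrative values $f = 2$, $c_{x} = 3$, $X_{c} = 1$ and $Z_{c} = 4$ (not taken from any real camera):

\begin{equation}
x'_{p} = f X_{c} + c_{x} Z_{c} = 2 \cdot 1 + 3 \cdot 4 = 14, \qquad
\frac{x'_{p}}{Z_{c}} = \frac{14}{4} = 3.5 = f \frac{X_{c}}{Z_{c}} + c_{x}
\end{equation}

The offset enters the matrix multiplied by $Z_{c}$, so the later division leaves exactly $c_{x}$.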

### Adjusting for Meter-to-Pixel Factor

Note that every value in object space is in meters, so with the focal length $f$ also in meters, the image-plane coordinate $f \frac{X_c}{Z_c}$ is in meters as well. We need pixel coordinates, which requires a conversion from meters to pixels:

$x_{p} = f k \frac{X_c}{Z_c} + c_{x}$

Note that we first convert meters to pixels and then add the pixel offset. The conversion factor $k$ has units of pixels per meter. A similar conversion applies in the y direction, and the conversion factor need not be the same in x and y.

We can now account for this conversion in matrix form as follows.

\begin{equation}
\begin{bmatrix}
x'_{p} \\
y'_{p} \\
Z_{c} 
\end{bmatrix} = 
\begin{bmatrix}
\alpha & 0 & c_{x} & 0 \\
0 & \beta & c_{y} & 0 \\ 
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
X_{c} \\
Y_{c} \\
Z_{c} \\
1 
\end{bmatrix}
\end{equation}

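where $\alpha = f k$ and $\beta$ is defined in the same way using the conversion factor for the y direction. Both $\alpha$ and $\beta$ are therefore expressed in pixels.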
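
To make the units concrete, here is a small Python sketch of the conversion. All numbers are assumed for illustration only (a 4 mm lens, a pixel pitch of 2 micrometers, a principal point offset of 960 by 540 pixels), and `project_to_pixels` is a hypothetical helper name.

```python
def project_to_pixels(alpha, beta, c_x, c_y, point_c):
    """Map a camera-space point in meters to pixel coordinates."""
    X_c, Y_c, Z_c = point_c
    x_p = alpha * X_c / Z_c + c_x
    y_p = beta * Y_c / Z_c + c_y
    return [x_p, y_p]


f = 0.004          # focal length in meters (assumed)
k_x = 500_000.0    # pixels per meter in x, i.e. 1 / (2e-6 m pixel pitch)
k_y = 500_000.0    # pixels per meter in y; may differ from k_x in general

alpha = f * k_x    # focal length in pixels, about 2000
beta = f * k_y

# A point 2 m in front of the camera, 0.3 m to the side and 0.15 m below.
print(project_to_pixels(alpha, beta, 960.0, 540.0, [0.3, -0.15, 2.0]))
```

The result should come out close to (1260, 390) pixels: the meter-valued ratio $X_c / Z_c$ is scaled by $\alpha$ before the pixel offset is added.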

### Adjusting for Skew 

In some cases, there may be skew to account for, for instance when the image sensor is not aligned properly, so that its pixel axes are not exactly perpendicular. We will not go into the details, but skew adds a term $s$ to the matrix as follows.

\begin{equation}
\begin{bmatrix}
x'_{p} \\
y'_{p} \\
Z_{c} 
\end{bmatrix} = 
\begin{bmatrix}
\alpha & s & c_{x} & 0 \\
0 & \beta & c_{y} & 0 \\ 
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
X_{c} \\
Y_{c} \\
Z_{c} \\
1 
\end{bmatrix}
\end{equation}

### Intrinsics 

Finally, we can define our camera matrix as

\begin{equation}
K = \begin{bmatrix}
\alpha & s & c_{x} & 0 \\
0 & \beta & c_{y} & 0 \\ 
0 & 0 & 1 & 0
\end{bmatrix}
\end{equation}

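This matrix holds the intrinsic parameters of the camera: the scaled focal lengths $\alpha$ and $\beta$, the skew $s$, and the principal point offsets $c_{x}$ and $c_{y}$.

As before, the pixel position is obtained by dividing the resulting vector by its last element, $Z_c$.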
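
A minimal Python sketch of applying the intrinsics is shown below. It uses the $3 \times 4$ form of $K$ defined above; the function names and the numeric values are assumptions chosen only to keep the arithmetic easy to follow.

```python
def intrinsic_matrix(alpha, beta, c_x, c_y, s=0.0):
    """Build the 3x4 camera matrix K used in this section."""
    return [
        [alpha, s, c_x, 0.0],
        [0.0, beta, c_y, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]


def project_camera_point(K, point_c):
    """Apply K to [X_c, Y_c, Z_c, 1] and divide by the last element."""
    X_h = list(point_c) + [1.0]
    x_prime, y_prime, z_c = [sum(k * x for k, x in zip(row, X_h)) for row in K]
    if z_c == 0:
        raise ValueError("Z_c is zero; the point cannot be projected")
    return [x_prime / z_c, y_prime / z_c]


K = intrinsic_matrix(800.0, 800.0, 320.0, 240.0)
print(project_camera_point(K, [0.5, 0.25, 2.0]))         # [520.0, 340.0]

K_skewed = intrinsic_matrix(800.0, 800.0, 320.0, 240.0, s=4.0)
print(project_camera_point(K_skewed, [0.5, 0.25, 2.0]))  # [520.5, 340.0]
```

With a non-zero skew, the x pixel coordinate picks up a contribution from $Y_{c}$, while the y coordinate is unchanged.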

## Coordinate Transformations: World to Camera System

### Transform World Point to Camera System

In practice, we will most likely not define 3D object points in the camera coordinate system, but in a separate world coordinate system. We therefore need a transformation that takes a point from world coordinates to camera coordinates.

We can express the transformation as 

\begin{equation}
\mathbf{X}_{c} = 
\begin{bmatrix}
\mathbf{R} & \mathbf{T} \\ 
0 & 1
\end{bmatrix}
\mathbf{X}_{w} 
\end{equation}

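where $\mathbf{R}$ is a $3\times 3$ rotation matrix, $\mathbf{T}$ is a $3 \times 1$ translation column vector, and $\mathbf{X}_{c}$ and $\mathbf{X}_{w}$ are homogeneous $4 \times 1$ vectors. The subscript $w$ refers to the world coordinate system. See the reference image for clarity.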
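
The block matrix above can be assembled and applied as in the following Python sketch. The rotation (90 degrees about the z axis) and the translation are arbitrary example values, and `extrinsic_matrix` and `world_to_camera` are hypothetical helper names.

```python
def extrinsic_matrix(R, T):
    """Stack a 3x3 rotation R and a 3x1 translation T into a 4x4 matrix."""
    return [list(R[i]) + [T[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def world_to_camera(E, point_w):
    """Apply the 4x4 transform E to the homogeneous world point."""
    X_w = list(point_w) + [1.0]
    return [sum(e * x for e, x in zip(row, X_w)) for row in E]


# Rotation by 90 degrees about the z axis, written out exactly.
R = [
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
T = [1.0, 2.0, 3.0]

E = extrinsic_matrix(R, T)
print(world_to_camera(E, [1.0, 0.0, 0.0]))  # [1.0, 3.0, 3.0, 1.0]
```

The last row of the matrix keeps the trailing 1 in place, so the output is again a homogeneous vector: the world point is first rotated to $(0, 1, 0)$ and then shifted by $\mathbf{T}$.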


### Extrinsics 

This matrix, which contains the rotation and the translation, represents the extrinsic parameters (extrinsics) of our system.

## Full Projection Matrix 

We can now define our full projection matrix to take us from a world coordinate system to image pixel space. 

\begin{equation}
\mathbf{x}'_{p} = \mathbf{K} \begin{bmatrix}
\mathbf{R} & \mathbf{T} \\ 
0 & 1
\end{bmatrix} 
\mathbf{X}_{w}
\end{equation}

We divide $\mathbf{x}'_{p}$ by the last element in the column to get our pixel position, $\mathbf{x}_{p}$. 

The combined matrix is termed the projection matrix:
\begin{equation}
\mathbf{P}= \mathbf{K} \begin{bmatrix}
\mathbf{R} & \mathbf{T} \\ 
0 & 1
\end{bmatrix} 
\end{equation}
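
Putting the pieces together, the Python sketch below forms $\mathbf{P}$ as the product of the $3 \times 4$ matrix $\mathbf{K}$ and the $4 \times 4$ extrinsic matrix, then projects a world point. All values are assumed for illustration, and `matmul` and `project_world_point` are hypothetical helper names.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def project_world_point(P, point_w):
    """Apply the 3x4 projection matrix and divide by the last element."""
    X_w = list(point_w) + [1.0]
    x_prime, y_prime, w = [sum(p * x for p, x in zip(row, X_w)) for row in P]
    if w == 0:
        raise ValueError("last element is zero; the point cannot be projected")
    return [x_prime / w, y_prime / w]


# Intrinsics: alpha = beta = 800 pixels, principal point (320, 240), no skew.
K = [
    [800.0, 0.0, 320.0, 0.0],
    [0.0, 800.0, 240.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
# Extrinsics: 90 degree rotation about z, translation of 4 along z.
E = [
    [0.0, -1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, 0.0, 1.0],
]

P = matmul(K, E)
print(project_world_point(P, [1.0, 0.0, 0.0]))  # [320.0, 440.0]
```

Here the world point maps to the camera-space point $(0, 1, 4)$, so $x_{p} = 320$ and $y_{p} = 800 \cdot \frac{1}{4} + 240 = 440$.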

## Homography 

### World Points on a Plane

### Number of Correspondences 