# Camera Extrinsics and Intrinsics

These notes are based on [this video](https://youtu.be/DX2GooBIESs).

## Motivation

To estimate the geometry of a scene from images, we need to understand how the images are acquired.

## Coordinate Systems
- **World/Object coordinate system**, $S_{O}$

written as: $[X, Y, Z]^{T}$
- **Camera coordinate system**, $S_{k}$

written as: $[^{k}X,\,^{k}Y,\,^{k}Z]^{T}$
- **Image (Plane) coordinate system**, $S_{c}$

written as: $[^{c}x,\,^{c}y]^{T}$
- **Sensor coordinate system**, $S_{s}$

written as: $[^{s}x,\,^{s}y]^{T}$

## Transformation

We want to compute the mapping,
$$
\begin{bmatrix}^{s}x\\^{s}y\\1\end{bmatrix}=\,^{s}H_{c}\,^{c}H_{k}\,^{k}H_{O}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}\text{,}
$$
where the left-hand side is a point in the sensor coordinate system, $S_{s}$, in homogeneous coordinates, and the right-hand side applies three mappings to a point in the object coordinate system, $S_{O}$. Reading from right to left: $^{k}H_{O}$ maps the object coordinate system to the camera coordinate system, $^{c}H_{k}$ maps the camera coordinate system to the image coordinate system, and $^{s}H_{c}$ maps the image coordinate system to the sensor coordinate system.

![CameraIntrinsics-01](assets/CameraIntrinsics-01.png)

Looking at the image, we see that the *directions* of $x$ and $y$ in the camera and image coordinate systems are the same. The only difference is that the image coordinate system has a different origin. This origin lets us express that the image plane is some distance, $c$, away from the camera origin:
$$
^{k}O_{c}=\,^{k}[0, 0, -c]^{T}\text{.}
$$
![CameraIntrinsics-02](assets/CameraIntrinsics-02.png)

    TODO
    - Create block diagrams for the different transformations

## Extrinsic and Intrinsic Parameters

**Extrinsic parameters** describe the pose of the camera in the real world.

**Intrinsic parameters** describe the mapping of the scene in front of the camera to the pixels in the image.

## Extrinsic Parameters

The extrinsic parameters express the pose of the camera in the real world. This pose consists of the position and heading (direction) of the camera with respect to the world. This can be expressed as a rigid body transformation, and this transformation is invertible.

We can express the transformation with 6 variables: 3 for the position and 3 for the heading.
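As a rough illustration of how six numbers can pin down a pose, the sketch below builds a rotation from three angles and pairs it with a position. The Euler-angle parameterization, the Z-Y-X composition order, the helper name `rotation_from_euler`, and all numeric values are assumptions made for this example; the video does not prescribe them.

```python
import numpy as np

def rotation_from_euler(omega, phi, kappa):
    """Hypothetical helper: R = Rz(kappa) @ Ry(phi) @ Rx(omega), angles in radians."""
    cx, sx = np.cos(omega), np.sin(omega)
    cy, sy = np.cos(phi), np.sin(phi)
    cz, sz = np.cos(kappa), np.sin(kappa)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx

# Six extrinsic parameters: three for the position, three for the heading.
X_O = np.array([1.0, 2.0, 0.5])          # made-up camera position in world coordinates
R = rotation_from_euler(0.1, -0.2, 0.3)  # made-up heading angles

# A rotation matrix is orthonormal with determinant +1, so its inverse is R.T.
is_rotation = np.allclose(R @ R.T, np.eye(3)) and np.isclose(np.linalg.det(R), 1.0)
```

Because the inverse of `R` is simply `R.T`, the rigid body transformation built from `R` and `X_O` can always be undone.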

A point, $\mathcal{P}$, can be expressed with coordinates in the world coordinates,
$$\boldsymbol{X}_{\mathcal{P}} = [X_{\mathcal{P}}, Y_{\mathcal{P}}, Z_{\mathcal{P}}]^{T}\text{,}$$
while the origin of the camera frame, $O$, can be expressed in the world coordinates,
$$
\boldsymbol{X}_{O} = [X_{O}, Y_{O}, Z_{O}]^{T}\text{.}
$$

## Transformation

A point in the object coordinate system can be transformed into the camera coordinate system. This transformation consists of a translation and a rotation. The translation is given by the origin of the camera frame expressed in world coordinates:
$$
\boldsymbol{X}_{O} = [X_{O}, Y_{O}, Z_{O}]^{T}\text{.}
$$
The rotation, $R$, is from the object coordinate system, $S_{O}$, to the camera coordinate system, $S_{k}$. Using Euclidean coordinates, this yields
$$
^{k}\boldsymbol{X}_{\mathcal{P}} = R(\boldsymbol{X}_{\mathcal{P}} - \boldsymbol{X}_{O})\text{.}
$$

So the point, $\boldsymbol{X}_{\mathcal{P}}$, is first expressed relative to the camera origin by subtracting $\boldsymbol{X}_{O}$, and the result is then rotated by $R$. This maps the point from the object coordinate system, $S_{O}$, to the camera coordinate system, $S_{k}$.
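The Euclidean transformation can be sketched in a few lines of NumPy. The function names `world_to_camera` and `camera_to_world` and the numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def world_to_camera(X_P, R, X_O):
    """Map a world point into the camera frame: kX_P = R (X_P - X_O)."""
    return R @ (np.asarray(X_P, dtype=float) - np.asarray(X_O, dtype=float))

def camera_to_world(kX_P, R, X_O):
    """Inverse mapping: X_P = R^T kX_P + X_O (R is orthonormal, so R^-1 = R^T)."""
    return R.T @ np.asarray(kX_P, dtype=float) + np.asarray(X_O, dtype=float)

# Illustrative pose: camera at (1, 0, 0), rotated a quarter turn about the Z axis.
R = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
X_O = np.array([1.0, 0.0, 0.0])
X_P = np.array([1.0, 2.0, 3.0])

kX_P = world_to_camera(X_P, R, X_O)     # [2, 0, 3]
X_back = camera_to_world(kX_P, R, X_O)  # recovers X_P
```

The round trip back to `X_P` reflects the fact that the rigid body transformation is invertible.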

## Transformation in Homogeneous Coordinates

We can express the Euclidean transformation in homogeneous coordinates:
$$
\begin{align}
\begin{bmatrix}^{k}\boldsymbol{X}_{\mathcal{P}}\\\boldsymbol{1}\end{bmatrix} =& \begin{bmatrix}R&\boldsymbol{0}\\\boldsymbol{0}^{T}&1\end{bmatrix}
\begin{bmatrix}I_{3}&-\boldsymbol{X}_{O}\\\boldsymbol{0}^{T}&1\end{bmatrix} \begin{bmatrix}\boldsymbol{X}_{\mathcal{P}}\\\boldsymbol{1}\end{bmatrix}\\
=& \begin{bmatrix}R&-R\boldsymbol{X}_{O}\\\boldsymbol{0}^{T}&1\end{bmatrix} \begin{bmatrix}\boldsymbol{X}_{\mathcal{P}}\\\boldsymbol{1}\end{bmatrix}\text{.}
\end{align}
$$

Another way to write this equation would be 
$$^{k}\boldsymbol{\mathrm{X}}_{\mathcal{P}} =\,^{k}\mathcal{H}\,\boldsymbol{\mathrm{X}}_{\mathcal{P}}$$
with
$$^{k}\mathcal{H} = \begin{bmatrix}R&-R\boldsymbol{X}_{O}\\\boldsymbol{0}^{T}&1\end{bmatrix}\text{.}
$$

Note that both $^{k}\boldsymbol{\mathrm{X}}_{\mathcal{P}}$ and $\boldsymbol{\mathrm{X}}_{\mathcal{P}}$ in this equation are in homogeneous coordinates.
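A minimal NumPy sketch of the homogeneous form follows. It builds the $4\times4$ matrix and checks that it equals the product of the rotation and translation matrices. The helper name `extrinsic_matrix` and the numeric values are assumptions for illustration.

```python
import numpy as np

def extrinsic_matrix(R, X_O):
    """Build the 4x4 matrix [[R, -R X_O], [0^T, 1]]."""
    H = np.eye(4)
    H[:3, :3] = R
    H[:3, 3] = -R @ X_O
    return H

# Illustrative pose (same made-up values as a quarter turn about Z, camera at (1, 0, 0)).
R = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
X_O = np.array([1.0, 0.0, 0.0])

H = extrinsic_matrix(R, X_O)

# The same matrix as a product: rotation block times translation block.
H_rot = np.eye(4)
H_rot[:3, :3] = R
H_trans = np.eye(4)
H_trans[:3, 3] = -X_O

X_P_h = np.array([1.0, 2.0, 3.0, 1.0])  # homogeneous world point
kX_P_h = H @ X_P_h                      # [2, 0, 3, 1]
```

The first three entries of `kX_P_h` agree with the Euclidean result $R(\boldsymbol{X}_{\mathcal{P}} - \boldsymbol{X}_{O})$, and the last entry stays 1.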

## Intrinsic Parameters

    TODO
    - Create block diagrams for the different transformations

## Ideal Perspective Projection

The mapping from the camera coordinate system to the sensor coordinate system can be split into 3 steps:
1. Ideal perspective projection to the image plane
2. Mapping to the sensor coordinate frame (pixels)
3. Compensation for the fact that the two previous maps are idealized

## Coordinate Frame

There are two ways to explain the perspective the camera has with respect to the object in the object coordinate system. The first is the physically motivated coordinate frame where the distance, $c$, is positive.
![CoordinateFrame-01](assets/CoordinateFrame-01.png)

The other framing of the coordinates is where the distance, $c$, is negative.
![CoordinateFrame-02](assets/CoordinateFrame-02.png)

The coordinate frame where the distance is negative is the most commonly used. Both frames use the same methods, but it is important to show both in order to get a firm understanding of how the point is framed with respect to the camera.


## Ideal Perspective Projection

We make several assumptions to idealize the camera's perspective. The first assumption is that we are using a distortion-free lens. This allows us to assume that the camera's coordinate system is consistent. The second assumption is that the focal point, $\mathcal{F}$, and the principal point, $\mathcal{H}$, are on the optical axis. The last assumption is that the distance from the camera origin to the image plane is constant, $c$.

We can find the projected point, $\overline{\mathcal{P}}$, through the intercept theorem. The projected point lies in the image plane, which is spanned by the coordinates $^{c}x_{\overline{\mathcal{P}}}$ and $^{c}y_{\overline{\mathcal{P}}}$:
$$
\begin{align}
^{c}x_{\overline{\mathcal{P}}}:=\,^{k}X_{\overline{\mathcal{P}}} =& c\frac{^{k}X_{\mathcal{P}}}{^{k}Z_{\mathcal{P}}}\\
^{c}y_{\overline{\mathcal{P}}}:=\,^{k}Y_{\overline{\mathcal{P}}} =& c\frac{^{k}Y_{\mathcal{P}}}{^{k}Z_{\mathcal{P}}}
\end{align}
$$
where
$$c =\,^{k}Z_{\overline{\mathcal{P}}}= c\frac{^{k}Z_{\mathcal{P}}}{^{k}Z_{\mathcal{P}}}\text{.}
$$
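The intercept-theorem relations can be tried out numerically. The sketch below assumes NumPy; the function name `project_ideal` and the numeric values are hypothetical. It works for either sign convention of $c$.

```python
import numpy as np

def project_ideal(kX_P, c):
    """Ideal perspective projection: cx = c * kX / kZ, cy = c * kY / kZ."""
    kX, kY, kZ = np.asarray(kX_P, dtype=float)
    if np.isclose(kZ, 0.0):
        # The point lies in the plane through the projection center.
        raise ValueError("projection is undefined for kZ = 0")
    return np.array([c * kX / kZ, c * kY / kZ])

c = 0.05                           # illustrative camera constant
kX_P = np.array([0.2, -0.1, 2.0])  # made-up point in the camera frame

x_img = project_ideal(kX_P, c)            # approximately [0.005, -0.0025]
x_img_far = project_ideal(3.0 * kX_P, c)  # a point three times farther along the same ray
```

Both points project to the same image coordinates, which shows that the projection discards the depth along each ray.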

## In Homogeneous Coordinates

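The intercept-theorem relations above can be written as a linear mapping in homogeneous coordinates:
$$
\begin{bmatrix}
^{k}U_{\overline{\mathcal{P}}}\\
^{k}V_{\overline{\mathcal{P}}}\\
^{k}W_{\overline{\mathcal{P}}}\\
^{k}T_{\overline{\mathcal{P}}}
\end{bmatrix}=
\begin{bmatrix}
c&0&0&0\\
0&c&0&0\\
0&0&c&0\\
0&0&1&0
\end{bmatrix}
\begin{bmatrix}
^{k}X_{\mathcal{P}}\\
^{k}Y_{\mathcal{P}}\\
^{k}Z_{\mathcal{P}}\\
1
\end{bmatrix}
$$

Dividing by $^{k}T_{\overline{\mathcal{P}}}=\,^{k}Z_{\mathcal{P}}$ recovers the Euclidean coordinates of the projected point. The third coordinate always normalizes to $c$, so we can drop the third row and keep only the image coordinates:
$$
^{c}\mathrm{x}_{\overline{\mathcal{P}}}=
\begin{bmatrix}
^{k}u_{\overline{\mathcal{P}}}\\
^{k}v_{\overline{\mathcal{P}}}\\
^{k}w_{\overline{\mathcal{P}}}
\end{bmatrix}=
\begin{bmatrix}
c&0&0&0\\
0&c&0&0\\
0&0&1&0
\end{bmatrix}
\begin{bmatrix}
^{k}X_{\mathcal{P}}\\
^{k}Y_{\mathcal{P}}\\
^{k}Z_{\mathcal{P}}\\
1
\end{bmatrix}
$$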

Thus, we can write for any point
$$
^{c}\mathrm{x}_{\overline{\mathcal{P}}} =\,^{c}P_{k}
\,^{k}\boldsymbol{\mathrm{X}}_{\mathcal{P}}
$$
with
$$
^{c}P_{k}=\begin{bmatrix}
c&0&0&0\\
0&c&0&0\\
0&0&1&0
\end{bmatrix}
$$
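A short NumPy sketch of this $3\times4$ projection, followed by the normalization back to Euclidean image coordinates. The helper names `projection_cPk` and `dehomogenize` and the numeric values are hypothetical.

```python
import numpy as np

def projection_cPk(c):
    """3x4 ideal projection matrix from the camera frame to the image plane."""
    return np.array([[c, 0.0, 0.0, 0.0],
                     [0.0, c, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])

def dehomogenize(x_h):
    """Divide by the last component and drop it."""
    x_h = np.asarray(x_h, dtype=float)
    return x_h[:-1] / x_h[-1]

c = 0.05                                  # illustrative camera constant
kX_P_h = np.array([0.2, -0.1, 2.0, 1.0])  # made-up homogeneous point in the camera frame

x_h = projection_cPk(c) @ kX_P_h  # [c*kX, c*kY, kZ]
x = dehomogenize(x_h)             # [c*kX/kZ, c*kY/kZ]
```

After normalization, `x` matches the intercept-theorem result $c\,{}^{k}X/{}^{k}Z$ and $c\,{}^{k}Y/{}^{k}Z$.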

After making all of the assumptions of the "ideal camera," we can map the different coordinates using both the intrinsic and extrinsic parameters:
$$^{c}\boldsymbol{\mathrm{x}}=\,^{c}\mathrm{P}\,\boldsymbol{\mathrm{X}}$$
with
$$
^{c}\mathrm{P}=\,^{c}\mathrm{P}_{k}\,^{k}\mathrm{H}=\begin{bmatrix}
c&0&0&0\\
0&c&0&0\\
0&0&1&0
\end{bmatrix}
\begin{bmatrix}R&-R\boldsymbol{X}_{O}\\\boldsymbol{0}^{T}&1\end{bmatrix}
$$

## Calibration Matrix

This now leads us to the **calibration matrix** of an ideal camera:
$$^{c}\mathrm{K}=\begin{bmatrix}c&0&0\\0&c&0\\0&0&1\end{bmatrix}\text{.}$$

This **calibration matrix** contains the intrinsic parameters of the ideal camera. Combined with the extrinsic parameters, the overall mapping is
$$
^{c}\mathrm{P}=\,^{c}\mathrm{K}[R|-R\boldsymbol{X}_{O}]=\,^{c}\mathrm{K}R[I_{3}|-\boldsymbol{X}_{O}]
$$
where the result is a $3\times4$ matrix.
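The equivalent ways of writing the projection matrix can be checked numerically. This is a sketch with made-up values for $c$, $R$, and $\boldsymbol{X}_{O}$, assuming NumPy; the variable names are chosen for this example only.

```python
import numpy as np

c = 0.05  # illustrative camera constant
R = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])  # made-up rotation (quarter turn about Z)
X_O = np.array([1.0, 0.0, 0.0])  # made-up camera position

K = np.diag([c, c, 1.0])  # calibration matrix of the ideal camera

# Form 1: K [R | -R X_O]
P_a = K @ np.hstack([R, (-R @ X_O).reshape(3, 1)])

# Form 2: K R [I_3 | -X_O]
P_b = K @ R @ np.hstack([np.eye(3), -X_O.reshape(3, 1)])

# Form 3: the 3x4 ideal projection [K | 0] times the 4x4 extrinsic matrix
P_k = np.hstack([K, np.zeros((3, 1))])
H = np.eye(4)
H[:3, :3] = R
H[:3, 3] = -R @ X_O
P_c = P_k @ H

# Project a homogeneous world point and normalize.
X_h = np.array([1.0, 2.0, 3.0, 1.0])
x_h = P_a @ X_h        # [0.1, 0, 3]
x = x_h[:2] / x_h[2]   # Euclidean image coordinates
```

All three forms produce the same $3\times4$ matrix, so which one to use is mostly a matter of convenience.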

So the projection,
$$^{c}\mathrm{P}=\,^{c}\mathrm{K}R[I_{3}|-\boldsymbol{X}_{O}]\text{,}$$
maps a point in the object coordinate system to a point in the image plane:
$$^{c}\mathrm{x}=\,^{c}\mathrm{K}R[I_{3}|-\boldsymbol{X}_{O}]\boldsymbol{\mathrm{X}}\text{.}$$

This yields the homogeneous coordinates of the point, $^{c}\boldsymbol{\mathrm{x}}$:
$$
\begin{bmatrix}^{c}u^{\prime}\\^{c}v^{\prime}\\^{c}w^{\prime}\end{bmatrix} = \begin{bmatrix}c&0&0\\0&c&0\\0&0&1\end{bmatrix} 
\begin{bmatrix}r_{11}&r_{12}&r_{13}\\r_{21}&r_{22}&r_{23}\\r_{31}&r_{32}&r_{33}\end{bmatrix} 
\begin{bmatrix}X-X_{O}\\Y-Y_{O}\\Z-Z_{O}\end{bmatrix}\text{.}
$$

## Calibration Matrix (Euclidean Coordinates)

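Normalizing the homogeneous coordinates, that is, dividing $^{c}u^{\prime}$ and $^{c}v^{\prime}$ by $^{c}w^{\prime}$, gives the point's Euclidean coordinates in the image coordinate system. The result is the **collinearity equation**: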
$$
\begin{align}
^{c}x=&\,c\frac{r_{11}(X-X_{O})+r_{12}(Y-Y_{O})+r_{13}(Z-Z_{O})}{r_{31}(X-X_{O})+r_{32}(Y-Y_{O})+r_{33}(Z-Z_{O})}\\
^{c}y=&\,c\frac{r_{21}(X-X_{O})+r_{22}(Y-Y_{O})+r_{23}(Z-Z_{O})}{r_{31}(X-X_{O})+r_{32}(Y-Y_{O})+r_{33}(Z-Z_{O})}
\end{align}
$$
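As a sanity check, the collinearity equation can be evaluated term by term and compared against the matrix form $^{c}\mathrm{K}R[I_{3}|-\boldsymbol{X}_{O}]$. The function name `collinearity` and all numeric values below are assumptions for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def collinearity(X, X_O, R, c):
    """Evaluate the collinearity equation for a world point X."""
    d = np.asarray(X, dtype=float) - np.asarray(X_O, dtype=float)
    num_x = R[0] @ d  # r11*(X-X_O) + r12*(Y-Y_O) + r13*(Z-Z_O)
    num_y = R[1] @ d  # r21*(X-X_O) + r22*(Y-Y_O) + r23*(Z-Z_O)
    den = R[2] @ d    # r31*(X-X_O) + r32*(Y-Y_O) + r33*(Z-Z_O)
    if np.isclose(den, 0.0):
        raise ValueError("denominator is zero; the point cannot be projected")
    return np.array([c * num_x / den, c * num_y / den])

c = 0.05  # illustrative camera constant
R = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])  # made-up rotation
X_O = np.array([1.0, 0.0, 0.0])  # made-up camera position
X = np.array([1.0, 2.0, 3.0])    # made-up world point

x_collinear = collinearity(X, X_O, R, c)

# Cross-check with the homogeneous matrix form followed by normalization.
K = np.diag([c, c, 1.0])
P = K @ R @ np.hstack([np.eye(3), -X_O.reshape(3, 1)])
x_h = P @ np.append(X, 1.0)
x_matrix = x_h[:2] / x_h[2]
```

Both routes give the same image coordinates, since the collinearity equation is just the matrix product written out and normalized.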