# Camera Extrinsics and Intrinsics

These notes are based on [this video](https://youtu.be/DX2GooBIESs).

## Motivation

To estimate the geometry of a scene from images, we need to understand how the images are acquired.

## Coordinate Systems
- **World/Object coordinate system**, $S_{O}$

written as: $[X, Y, Z]^{T}$
- **Camera coordinate system**, $S_{k}$

written as: $[^{k}X,\,^{k}Y,\,^{k}Z]^{T}$
- **Image (Plane) coordinate system**, $S_{c}$

written as: $[^{c}x,\,^{c}y]^{T}$
- **Sensor coordinate system**, $S_{s}$

written as: $[^{s}x,\,^{s}y]^{T}$

## Transformation

We want to compute the mapping,
$$
\begin{bmatrix}^{s}x\\^{s}y\\1\end{bmatrix}=\,^{s}H_{c}\,^{c}H_{k}\,^{k}H_{O}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}\text{,}
$$
where the left hand side of the equation is the sensor system, $S_{s}$, and the right hand side of the equation is constructed with the object coordinate system, $S_{O}$, and different mappings: the image coordinate system to the sensor coordinate system, $^{s}H_{c}$, the camera coordinate system to the image coordinate system, $\,^{c}H_{k}$, and the object coordinate system to the camera coordinate system $\,^{k}H_{O}$.

![CameraIntrinsics-01](assets/CameraIntrinsics-01.png)

After looking at the image, we see that the *directions* of $x$ and $y$ in the camera and image coordinate systems are the same. The only difference is that the image coordinate system has a different origin. This origin lets us express that the image plane is some distance, $c$, away from the camera origin:
$$
^{k}O_{c}=\,^{k}[0, 0, -c]^{T}\text{.}
$$
![CameraIntrinsics-02](assets/CameraIntrinsics-02.png)

## From the World to the Sensor

We would like to convert the object coordinate system, $S_{O}$, into the sensor coordinate system, $S_{s}$. We can use multiple transformations to obtain these results.
![BlockDiagram-01](assets/BlockDiagram-01.png)
The first transformation is from the object coordinate system, $S_{O}$, to the camera coordinate system, $S_{k}$.
![BlockDiagram-02](assets/BlockDiagram-02.png)
The second transformation is from the camera coordinate system, $S_{k}$, to the image coordinate system, $S_{c}$. Since this converts our 3D points to 2D points, we will need to make some assumptions.
![BlockDiagram-03](assets/BlockDiagram-03.png)
Then we can map the image coordinate system, $S_{c}$, to the sensor coordinate system, $S_{s}$, through an affine transformation.
![BlockDiagram-04](assets/BlockDiagram-04.png)
Eventually, we will find some non-linear errors. We will need to apply an additional transformation to account for these deviations.
![BlockDiagram-05](assets/BlockDiagram-05.png)

## Extrinsic and Intrinsic Parameters

**Extrinsic parameters** describe the pose of the camera in the real world.
![BlockDiagram-06](assets/BlockDiagram-06.png)

**Intrinsic parameters** describe the mapping of the scene in front of the camera to the pixels in the image.
![BlockDiagram-07](assets/BlockDiagram-07.png)

## Extrinsic Parameters

The extrinsic parameters express the pose of the camera in the real world. This pose consists of the position and heading (direction) of the camera with respect to the world. This can be expressed as a rigid body transformation, and this transformation is invertible.

We can express the transformation with 6 variables: 3 for the position and 3 for the heading.

A point, $\mathcal{P}$, can be expressed with coordinates in the world coordinates,
$$\boldsymbol{X}_{\mathcal{P}} = [X_{\mathcal{P}}, Y_{\mathcal{P}}, Z_{\mathcal{P}}]^{T}\text{,}$$
while the origin of the camera frame, $O$, can be expressed in the world coordinates,
$$
\boldsymbol{X}_{O} = [X_{O}, Y_{O}, Z_{O}]^{T}\text{.}
$$

## Transformation

A point in the object coordinate system can be transformed into the camera coordinate system. This transformation has both a translation and a rotation. The translation is between the origin of the world frame and the origin of the camera frame:
$$
\boldsymbol{X}_{O} = [X_{O}, Y_{O}, Z_{O}]^{T}\text{.}
$$
The rotation, $R$, is from the object coordinate system, $S_{O}$, to the camera coordinate system, $S_{k}$. Using Euclidean coordinates, this yields
$$
^{k}\boldsymbol{X}_{\mathcal{P}} = R(\boldsymbol{X}_{\mathcal{P}} - \boldsymbol{X}_{O})\text{.}
$$

So the point, $\boldsymbol{X}_{\mathcal{P}}$, is first shifted by the camera origin, $\boldsymbol{X}_{O}$, and then rotated by $R$. This allows us to map the point from the object coordinate system, $S_{O}$, to the camera coordinate system, $S_{k}$.
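
The Euclidean form of this transformation can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the helper names `rotation_z` and `world_to_camera` and the sample values are made up for the example and do not come from the video.

```python
import numpy as np

def rotation_z(angle_rad):
    """Rotation about the Z axis (an illustrative choice for R)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def world_to_camera(X_P, R, X_O):
    """Apply kX_P = R (X_P - X_O): shift by the camera origin, then rotate."""
    X_P = np.asarray(X_P, dtype=float)
    X_O = np.asarray(X_O, dtype=float)
    return R @ (X_P - X_O)

R = rotation_z(np.pi / 2)            # hypothetical camera heading
X_O = np.array([1.0, 2.0, 3.0])      # hypothetical camera origin in the world
X_P = np.array([2.0, 2.0, 3.0])      # hypothetical world point

X_k = world_to_camera(X_P, R, X_O)   # the point in the camera frame
origin_k = world_to_camera(X_O, R, X_O)  # the camera origin maps to zero
```

Note that the camera origin itself always maps to the zero vector, which is a quick sanity check for any choice of $R$.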

## Transformation in Homogeneous Coordinates

We can express the Euclidean transformation in homogeneous coordinates:
$$
\begin{align}
\begin{bmatrix}^{k}\boldsymbol{X}_{\mathcal{P}}\\1\end{bmatrix} =& \begin{bmatrix}R&\boldsymbol{0}\\\boldsymbol{0}^{T}&1\end{bmatrix}
\begin{bmatrix}I_{3}&-\boldsymbol{X}_{O}\\\boldsymbol{0}^{T}&1\end{bmatrix} \begin{bmatrix}\boldsymbol{X}_{\mathcal{P}}\\1\end{bmatrix}\\
=& \begin{bmatrix}R&-R\boldsymbol{X}_{O}\\\boldsymbol{0}^{T}&1\end{bmatrix} \begin{bmatrix}\boldsymbol{X}_{\mathcal{P}}\\1\end{bmatrix}\text{.}
\end{align}
$$

Another way to write this equation would be 
$$^{k}\boldsymbol{\mathrm{X}}_{\mathcal{P}} =\,^{k}\mathcal{H}\,\boldsymbol{\mathrm{X}}_{\mathcal{P}}$$
with
$$^{k}\mathcal{H} = \begin{bmatrix}R&-R\boldsymbol{X}_{O}\\\boldsymbol{0}^{T}&1\end{bmatrix}\text{.}
$$

Note that the left hand side of the equation is in homogeneous coordinates.
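
As a rough sketch of the homogeneous form, the $4\times4$ matrix can be assembled from $R$ and $\boldsymbol{X}_{O}$ and applied to a homogeneous point. NumPy is assumed, and `extrinsic_matrix` and the numbers are illustrative names and values, not part of the source.

```python
import numpy as np

def extrinsic_matrix(R, X_O):
    """Build the 4x4 matrix [[R, -R X_O], [0^T, 1]]."""
    H = np.eye(4)
    H[:3, :3] = R
    H[:3, 3] = -R @ np.asarray(X_O, dtype=float)
    return H

# Hypothetical pose: 90 degree rotation about Z, camera origin at (1, 2, 3).
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
X_O = np.array([1.0, 2.0, 3.0])
H = extrinsic_matrix(R, X_O)

X_P_h = np.array([2.0, 2.0, 3.0, 1.0])   # homogeneous world point
X_k_h = H @ X_P_h                        # homogeneous camera-frame point

# The rigid body transformation is invertible, so we can map back.
X_back = np.linalg.inv(H) @ X_k_h
```

The last coordinate stays equal to 1 under this transformation, so the first three entries can be read off directly as Euclidean coordinates.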

## Intrinsic Parameters

With the camera's extrinsic parameters out of the way, we can find the intrinsic parameters by projecting points from the camera frame onto the camera's sensor.

![BlockDiagram-07](assets/BlockDiagram-07.png)

Here, we see that the transformation from the image coordinate system, $S_{c}$, to the sensor coordinate system, $S_{s}$, and the final deviation transformation are invertible. This means that these transformations can be applied in both directions. We also see that the transformation from the camera coordinate system, $S_{k}$, to the image coordinate system, $S_{c}$, is not invertible. This means that we can only transform the points from $S_{k}$ to $S_{c}$ and not the other way around. *We lose information with this transformation.*

## Coordinate Frame

    TODO
    - Rename and add figure for "ideal camera"

There are two ways to explain the perspective the camera has with respect to the object in the object coordinate system. The first is the physically motivated coordinate frame where the distance, $c$, is positive.
![CoordinateFrame-01](assets/CoordinateFrame-01.png)

The other framing of the coordinates is where the distance, $c$, is negative.
![CoordinateFrame-02](assets/CoordinateFrame-02.png)

The coordinate frame where the distance is negative is the most commonly used. Both use the same methods, but it is important to show this in order to get a firm understanding of how the point is framed with respect to the camera.


## Ideal Perspective Projection

The mapping can be split into 3 steps:
1. Ideal perspective projection to the image plane
2. Mapping to the sensor coordinate frame (pixels)
3. Compensation for the fact that the two previous maps are idealized
![BlockDiagram-08](assets/BlockDiagram-08.png)

We have many assumptions to idealize the camera's perspective. The first assumption is that we are using a distortion-free lens. This allows us to assume that the camera's coordinate system is consistent. The second assumption is that the focal point, $\mathcal{F}$, and the principal point, $\mathcal{H}$, are on the optical axis. The last assumption is that the distance from the camera origin to the image plane is constant, $c$.
![BlockDiagram-09](assets/BlockDiagram-09.png)

We can find the projected point, $\overline{\mathcal{P}}$, through the intercept theorem. The intercept theorem uses the image plane spanned by the coordinates $^{c}x_{\overline{\mathcal{P}}}$ and $^{c}y_{\overline{\mathcal{P}}}$:
$$
\begin{align}
^{c}x_{\overline{\mathcal{P}}}:=\,^{k}X_{\overline{\mathcal{P}}} =& c\frac{^{k}X_{\mathcal{P}}}{^{k}Z_{\mathcal{P}}}\\
^{c}y_{\overline{\mathcal{P}}}:=\,^{k}Y_{\overline{\mathcal{P}}} =& c\frac{^{k}Y_{\mathcal{P}}}{^{k}Z_{\mathcal{P}}}
\end{align}
$$
where
$$c =\,^{k}Z_{\overline{\mathcal{P}}}= c\frac{^{k}Z_{\mathcal{P}}}{^{k}Z_{\mathcal{P}}}\text{.}$$
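
The intercept theorem above amounts to a division by depth. A small sketch in plain Python, where the function name `project_ideal` and the sample values are illustrative assumptions:

```python
def project_ideal(X_k, Y_k, Z_k, c):
    """Ideal perspective projection: (c * X / Z, c * Y / Z)."""
    if Z_k == 0:
        raise ValueError("point lies in the plane Z = 0 of the camera frame")
    return c * X_k / Z_k, c * Y_k / Z_k

# Hypothetical camera constant and camera-frame point.
near = project_ideal(2.0, 1.0, 4.0, c=0.5)

# A point twice as far along the same ray lands on the same image point,
# which is why the depth cannot be recovered from a single image.
far = project_ideal(4.0, 2.0, 8.0, c=0.5)
```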

## In Homogeneous Coordinates

The projected point can be expressed in terms of homogeneous coordinates:
$$
\begin{bmatrix}
^{k}U_{\overline{\mathcal{P}}}\\
^{k}V_{\overline{\mathcal{P}}}\\
^{k}W_{\overline{\mathcal{P}}}\\
^{k}T_{\overline{\mathcal{P}}}
\end{bmatrix}=
\begin{bmatrix}
c&0&0&0\\
0&c&0&0\\
0&0&c&0\\
0&0&1&0
\end{bmatrix}
\begin{bmatrix}
^{k}X_{\mathcal{P}}\\
^{k}Y_{\mathcal{P}}\\
^{k}Z_{\mathcal{P}}\\
1
\end{bmatrix}\text{.}
$$
We can drop the third coordinate because of the projective nature of the transformation (we don't know how far away the object is from the camera):
$$
^{c}\mathrm{x}_{\overline{\mathcal{P}}}=
\begin{bmatrix}
^{k}u_{\overline{\mathcal{P}}}\\
^{k}v_{\overline{\mathcal{P}}}\\
^{k}w_{\overline{\mathcal{P}}}
\end{bmatrix}=
\begin{bmatrix}
c&0&0&0\\
0&c&0&0\\
0&0&1&0
\end{bmatrix}
\begin{bmatrix}
^{k}X_{\mathcal{P}}\\
^{k}Y_{\mathcal{P}}\\
^{k}Z_{\mathcal{P}}\\
1
\end{bmatrix}\text{.}
$$

Now, we can transform any point in the camera coordinate system, $^{k}\boldsymbol{\mathrm{X}}_{\mathcal{P}}$, with a projective transformation, $^{c}P_{k}$, into the image coordinate system:
$$
^{c}x_{\overline{\mathcal{P}}} =\,^{c}P_{k}
\,^{k}\boldsymbol{\mathrm{X}}_{\mathcal{P}}
$$
where
$$
^{c}P_{k}=\begin{bmatrix}
c&0&0&0\\
0&c&0&0\\
0&0&1&0
\end{bmatrix}\text{.}
$$

After making all of the assumptions of the "ideal camera," we can map the different coordinates using both the intrinsic and extrinsic parameters. 
$$^{c}\boldsymbol{\mathrm{x}}=\,^{c}\mathrm{P}\,\boldsymbol{\mathrm{X}}$$
with
$$
^{c}\mathrm{P}=\,^{c}\mathrm{P}_{k}\,^{k}\mathrm{H}=\begin{bmatrix}
c&0&0&0\\
0&c&0&0\\
0&0&1&0
\end{bmatrix}
\begin{bmatrix}R&-R\boldsymbol{X}_{O}\\\boldsymbol{0}^{T}&1\end{bmatrix}
$$
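
To see the two factors working together, here is a minimal NumPy sketch that builds the $3\times4$ projection of the ideal camera and maps a homogeneous world point to Euclidean image coordinates. The pose, the camera constant, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

c = 0.5                                   # hypothetical camera constant
P_ck = np.array([[c, 0.0, 0.0, 0.0],      # projection from S_k to S_c
                 [0.0, c, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])

R = np.eye(3)                             # hypothetical heading (no rotation)
X_O = np.array([0.0, 0.0, -4.0])          # hypothetical camera origin

H = np.eye(4)                             # extrinsics [[R, -R X_O], [0^T, 1]]
H[:3, :3] = R
H[:3, 3] = -R @ X_O

P = P_ck @ H                              # overall 3x4 projection

X = np.array([2.0, 1.0, 0.0, 1.0])        # homogeneous world point
x_h = P @ X                               # homogeneous image point
x_e = x_h[:2] / x_h[2]                    # Euclidean image coordinates
```

The final division by the third homogeneous coordinate is the step that turns the linear matrix product into the perspective division of the intercept theorem.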

## Calibration Matrix

This now leads us to the **calibration matrix** of an ideal camera:
$$^{c}\mathrm{K}=\begin{bmatrix}c&0&0\\0&c&0\\0&0&1\end{bmatrix}\text{.}$$

This **calibration matrix** can be used to map the different coordinate systems. The overall mapping is
$$
^{c}\mathrm{P}=\,^{c}\mathrm{K}[R|-R\boldsymbol{X}_{O}]=\,^{c}\mathrm{K}R[I_{3}|-\boldsymbol{X}_{O}]
$$
where the result is a $3\times4$ matrix:
$$^{c}\mathrm{K}R[I_{3}|-\boldsymbol{X}_{O}] =\,^{c}\mathrm{K}R\begin{bmatrix}1&0&0&-X_{O}\\0&1&0&-Y_{O}\\0&0&1&-Z_{O}\\\end{bmatrix}\text{.}$$

So the projection, $$^{c}\mathrm{P}=\,^{c}\mathrm{K}R[I_{3}|-\boldsymbol{X}_{O}]\text{,}$$
helps us map the point in the object coordinate system, $\boldsymbol{\mathrm{X}}$, to the point in the image plane:
$$^{c}\boldsymbol{\mathrm{x}}=\,^{c}\mathrm{K}R[I_{3}|-\boldsymbol{X}_{O}]\boldsymbol{\mathrm{X}}\text{.}$$

The process yields the coordinates of the point in the image plane, $^{c}\boldsymbol{\mathrm{x}}$:
$$
\begin{bmatrix}^{c}u^{\prime}\\^{c}v^{\prime}\\^{c}w^{\prime}\end{bmatrix} = \begin{bmatrix}c&0&0\\0&c&0\\0&0&1\end{bmatrix} 
\begin{bmatrix}r_{11}&r_{12}&r_{13}\\r_{21}&r_{22}&r_{23}\\r_{31}&r_{32}&r_{33}\end{bmatrix} 
\begin{bmatrix}X-X_{O}\\Y-Y_{O}\\Z-Z_{O}\end{bmatrix}\text{.}
$$

## Calibration Matrix (Euclidean Coordinates)

The solution for the point's coordinates in the image coordinate system produces the **collinearity equation**:
$$
\begin{align}
^{c}x=&\,c\frac{r_{11}(X-X_{O})+r_{12}(Y-Y_{O})+r_{13}(Z-Z_{O})}{r_{31}(X-X_{O})+r_{32}(Y-Y_{O})+r_{33}(Z-Z_{O})}\\
^{c}y=&\,c\frac{r_{21}(X-X_{O})+r_{22}(Y-Y_{O})+r_{23}(Z-Z_{O})}{r_{31}(X-X_{O})+r_{32}(Y-Y_{O})+r_{33}(Z-Z_{O})}
\end{align}
$$
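
The collinearity equation can be evaluated directly, without any matrix library. The following plain-Python sketch uses an illustrative function name, `collinearity`, and made-up sample values; $R$ is passed as a nested list of its entries $r_{ij}$.

```python
def collinearity(X, X_O, R, c):
    """Evaluate the collinearity equation for one world point.

    X and X_O are 3-element sequences, R is a 3x3 nested list, c is the
    camera constant. Returns the Euclidean image coordinates (x, y).
    """
    d = [X[i] - X_O[i] for i in range(3)]            # X - X_O
    rows = [sum(R[r][i] * d[i] for i in range(3))    # R (X - X_O)
            for r in range(3)]
    if rows[2] == 0:
        raise ValueError("denominator is zero: point has no finite image")
    return c * rows[0] / rows[2], c * rows[1] / rows[2]

identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
x, y = collinearity((2.0, 1.0, 0.0), (0.0, 0.0, -4.0), identity, c=0.5)
```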

This is the start of the [second video](https://youtu.be/cxB6NLk2zgk).

## Linear Errors

First, let's talk about how we can map from the image coordinate system, $S_{c}$, to the sensor coordinate system, $S_{s}$. In order to do this, we must consider where the sensor is within the camera, the size of the sensor, and how the lens(es) affect the light obtained by the sensor.

## Location of the Principal Point

The origin of the image plane is at the principal point, near the center of the image, while the origin of the sensor coordinate system is typically in the top left corner of the sensor.

![ImageToSensor-01](assets/ImageToSensor-01.png)

We must transform the image coordinate system, $S_{c}$, to the sensor coordinate system, $S_{s}$:
$$^{s}H_{c}=\begin{bmatrix}1&0&x_{H}\\0&1&y_{H}\\0&0&1\end{bmatrix}\text{.}$$

## Shear and Scale Difference

The sensor's $x$ and $y$ axes may be scaled differently, so we scale the $y$ axis relative to the $x$ axis with a scale difference, $m$. The axes may be sheared as well, so we will use a shear compensation, $s$, to take this into consideration:
$$^{s}H_{c}=\begin{bmatrix}1&s&x_{H}\\0&1+m&y_{H}\\0&0&1\end{bmatrix}\text{.}$$

The shift of the **principal point**, the scale from $S_{c}$ to $S_{s}$, and the shear compensation can be combined to produce the transformation from the object coordinate system, $S_{O}$, to the sensor coordinate system, $S_{s}$:
$$^{s}\boldsymbol{\mathrm{x}}=\,^{s}\mathrm{H}_{c}\,^{c}\mathrm{K}R[I_{3}|-\boldsymbol{X}_{O}]\boldsymbol{\mathrm{X}}\text{.}
$$

## Calibration Matrix

To simplify the equation, let's combine the transformation, $^{s}\mathrm{H}_{c}$, with the calibration matrix, $^{c}\mathrm{K}$:

$$
\begin{align}
\mathrm{K} =& ^{s}\mathrm{H}_{c}\,^{c}\mathrm{K}\\
=&\begin{bmatrix}1&s&x_{H}\\0&1+m&y_{H}\\0&0&1\end{bmatrix} \begin{bmatrix}c&0&0\\0&c&0\\0&0&1\end{bmatrix}\\ 
=&\begin{bmatrix}c&cs&x_{H}\\0&c(1+m)&y_{H}\\0&0&1\end{bmatrix}\text{.}
\end{align}
$$

Here, we see that this **calibration matrix** is an affine transformation with 5 parameters:
1. Camera constant, $c$
2. Principal point ($x$), $x_{H}$
3. Principal point ($y$), $y_{H}$
4. Scale difference, $m$
5. Shear compensation, $s$
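
The five parameters can be turned into a matrix by multiplying the two factors, $^{s}\mathrm{H}_{c}$ and $^{c}\mathrm{K}$, numerically. This is a small NumPy sketch; the function name `calibration_matrix`, the default arguments, and the pixel values are assumptions for illustration only.

```python
import numpy as np

def calibration_matrix(c, x_H, y_H, m=0.0, s=0.0):
    """Compose K = sH_c @ cK from the five intrinsic parameters."""
    H_sc = np.array([[1.0, s, x_H],        # shift, shear, and scale difference
                     [0.0, 1.0 + m, y_H],
                     [0.0, 0.0, 1.0]])
    K_c = np.diag([c, c, 1.0])             # calibration matrix of the ideal camera
    return H_sc @ K_c

# Hypothetical values: camera constant and principal point in pixels.
K = calibration_matrix(c=800.0, x_H=320.0, y_H=240.0, m=0.01, s=0.001)

# Map a homogeneous image-plane point to sensor coordinates.
x_c = np.array([0.25, 0.125, 1.0])
x_s_h = K @ x_c
x_s = x_s_h[:2] / x_s_h[2]
```

Building $\mathrm{K}$ as a product, rather than typing its entries by hand, keeps the code aligned with the derivation and makes the upper triangular structure easy to check.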

## Direct Linear Transform (DLT)

We can use a **Direct Linear Transform** to solve for these parameters by using the relations of the points in the object coordinate system, $\boldsymbol{\mathrm{X}}$, and the points in the sensor coordinate system, $^{s}\boldsymbol{\mathrm{x}}$. We know that the point in the object coordinate system, $S_{O}$, can be transformed into the sensor coordinate system, $S_{s}$:
$$^{s}\boldsymbol{\mathrm{x}}=\mathrm{P}\boldsymbol{\mathrm{X}}$$
with
$$\mathrm{P}=\mathrm{K}R[I_{3}|-\boldsymbol{X}_{O}]$$
and
$$\mathrm{K}=\begin{bmatrix}c&cs&x_{H}\\0&c(1+m)&y_{H}\\0&0&1\end{bmatrix}\text{.}$$

If we know both the points in the object coordinate system, $\boldsymbol{\mathrm{X}}$, and the points in the sensor coordinate system, $^{s}\boldsymbol{\mathrm{x}}$, we can solve for the parameters in the **calibration matrix** by using a **Direct Linear Transform**.

![BlockDiagram-11](assets/BlockDiagram-11.png)

The calibration matrix, $\mathrm{K}$, is an **affine transformation** because it preserves lines and parallelism but loses distances and angles; the full mapping, $\mathrm{P}$, is a projective transformation that preserves only straight lines. So we not only have to solve for the 5 parameters of the **calibration matrix**, but we also must solve for the 6 extrinsic parameters in $R$ and $\boldsymbol{X}_{O}$, for 11 parameters in total.

After solving for $\mathrm{P}$, we can use its elements to solve for the points in the sensor coordinate system, $^{s}\boldsymbol{\mathrm{x}}$:
$$
\begin{align}
^{s}x=&\frac{p_{11}X+p_{12}Y+p_{13}Z+p_{14}}{p_{31}X+p_{32}Y+p_{33}Z+p_{34}}\\
^{s}y=&\frac{p_{21}X+p_{22}Y+p_{23}Z+p_{24}}{p_{31}X+p_{32}Y+p_{33}Z+p_{34}}\text{.}
\end{align}
$$
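
The notes do not spell out how $\mathrm{P}$ is estimated, so the following is only a sketch of one common formulation, stated here as an assumption: each correspondence contributes two linear equations in the 12 entries of $\mathrm{P}$ (obtained by multiplying the two fractions above by their denominator), and the stacked homogeneous system is solved with a singular value decomposition. NumPy is assumed; `dlt_projection_matrix`, `project`, and all sample values are illustrative.

```python
import numpy as np

def dlt_projection_matrix(X_world, x_sensor):
    """Estimate P (3x4, defined up to scale) from at least 6 correspondences."""
    X_world = np.asarray(X_world, dtype=float)
    x_sensor = np.asarray(x_sensor, dtype=float)
    if len(X_world) < 6:
        raise ValueError("at least 6 point correspondences are required")
    rows = []
    zero = np.zeros(4)
    for (X, Y, Z), (x, y) in zip(X_world, x_sensor):
        Xh = np.array([X, Y, Z, 1.0])
        rows.append(np.concatenate([-Xh, zero, x * Xh]))   # from the x equation
        rows.append(np.concatenate([zero, -Xh, y * Xh]))   # from the y equation
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)        # null vector of A, reshaped row-wise

def project(P, X_world):
    """Apply P and divide by the third homogeneous coordinate."""
    X_world = np.asarray(X_world, dtype=float)
    Xh = np.hstack([X_world, np.ones((len(X_world), 1))])
    x_h = Xh @ P.T
    return x_h[:, :2] / x_h[:, 2:3]

# Synthetic check with a hypothetical camera and non-coplanar points.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
X_O = np.array([0.0, 0.0, -5.0])
P_true = K @ R @ np.hstack([np.eye(3), -X_O.reshape(3, 1)])

X_world = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.5], [0.0, 1.0, 1.0],
                    [1.0, 1.0, 2.0], [-1.0, 0.5, 1.5], [0.5, -1.0, 0.2],
                    [-0.7, -0.3, 2.5], [0.3, 0.8, -0.5]])
x_sensor = project(P_true, X_world)

P_est = dlt_projection_matrix(X_world, x_sensor)
reprojected = project(P_est, X_world)
```

Because $\mathrm{P}$ is homogeneous, the estimate agrees with the true matrix only up to a scale factor; comparing reprojected points avoids that ambiguity. The sketch does not handle noise, coordinate normalization, or degenerate point configurations such as coplanar points.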

## Non-Linear Errors

Previously, we only covered linear errors using the **Direct Linear Transform**. The real world is non-linear because of imperfect lenses, planarity of the sensor, and much more. So let's focus on the transformation that can handle these non-linear errors.
![BlockDiagram-10](assets/BlockDiagram-10.png)

## General Mapping

Finally, we need to handle the non-linear effects on the image. We have a location-dependent shift of the sensor frame. This means that the mapping of the points in the sensor coordinate system, $^{s}\boldsymbol{x}$, depends on where the points are located in the image, $\boldsymbol{x}$:
$$
\begin{align}
^{a}x=&\,^{s}x+\Delta x(\boldsymbol{x}, \boldsymbol{q})\\
^{a}y=&\,^{s}y+\Delta y(\boldsymbol{x}, \boldsymbol{q})\text{.}
\end{align}
$$
We can rewrite the **general mapping**,
$$^{a}\boldsymbol{\mathrm{x}}=\,^{a}\mathrm{H}_{s}(\boldsymbol{x})\,^{s}\boldsymbol{\mathrm{x}}\text{,}$$
with
$$
^{a}\mathrm{H}_{s}(\boldsymbol{x})=\begin{bmatrix}1&0&\Delta x(\boldsymbol{x}, \boldsymbol{q})\\0&1&\Delta y(\boldsymbol{x}, \boldsymbol{q})\\0&0&1\end{bmatrix}
$$
so that the overall mapping becomes
$$^{a}\boldsymbol{\mathrm{x}}=\,^{a}\mathrm{H}_{s}(\boldsymbol{x})\mathrm{K}R[I_{3}|-\boldsymbol{X}_{O}]\boldsymbol{\mathrm{X}}\text{.}$$

## General Calibration Matrix

We can combine the **calibration matrix** and the **general mapping** to produce the **general calibration matrix**:
$$
\begin{align}
^{a}\mathrm{K}(\boldsymbol{x}, \boldsymbol{q})=&\,^{a}\mathrm{H}_{s}(\boldsymbol{x}, \boldsymbol{q})\mathrm{K}\\
=&\begin{bmatrix}c&cs&x_{H}+\Delta x(\boldsymbol{x}, \boldsymbol{q})\\0&c(1+m)&y_{H}+\Delta y(\boldsymbol{x}, \boldsymbol{q})\\0&0&1\end{bmatrix}\text{.}
\end{align}
$$

We can use the **general calibration matrix** to produce a generalized camera model:
$$
\begin{align}
^{a}\mathrm{x}=\,&^{a}\mathrm{P}(\boldsymbol{x}, \boldsymbol{q})\boldsymbol{\mathrm{X}}\\
&^{a}\mathrm{P}(\boldsymbol{x}, \boldsymbol{q})=\,^{a}\mathrm{K}(\boldsymbol{x}, \boldsymbol{q})R[I_{3}|-\boldsymbol{X}_{O}]\text{.}
\end{align}
$$

## Modeling Non-Linear Errors

There are many approaches to modeling non-linear errors. Some are physically motivated, while others are phenomenological and simply describe the observed deviations.

## Barrel Distortion

Wide angle lenses distort the projections of light before the rays of light touch the image sensor. We can model these distortions as **barrel distortions** with the idealized points using a pin-hole camera, $[x,y]^{T}$, the distance, $r$, of the pixel in the image with respect to the **principal point**, and additional parameters, $q_{1}$ and $q_{2}$, that consider the general mapping:

$$
\begin{align}
^{a}x=&\,x(1+q_{1}\,r^{2}+q_{2}\,r^{4})\\
^{a}y=&\,y(1+q_{1}\,r^{2}+q_{2}\,r^{4})\text{.}
\end{align}
$$
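
The radial model above is straightforward to evaluate. The sketch below uses plain Python and assumes the coordinates are already measured relative to the principal point; the function name `barrel_distort` and the parameter values are illustrative.

```python
def barrel_distort(x, y, q1, q2):
    """Apply ax = x (1 + q1 r^2 + q2 r^4), and likewise for y.

    x and y are ideal pin-hole coordinates relative to the principal point.
    """
    r2 = x * x + y * y                      # squared distance to the principal point
    factor = 1.0 + q1 * r2 + q2 * r2 * r2
    return x * factor, y * factor

# Hypothetical distortion parameters.
center = barrel_distort(0.0, 0.0, q1=-0.1, q2=0.01)   # the principal point is unaffected
edge = barrel_distort(1.0, 0.0, q1=-0.1, q2=0.01)     # a point at r = 1 is pulled inward
```

With a negative $q_{1}$, points far from the principal point move toward it more strongly than points near it, which produces the characteristic barrel shape.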

## Mapping as a Two Step Process

The mapping can be split into the projection of the affine camera,
$$^{s}\boldsymbol{\mathrm{x}}=\mathrm{P}\boldsymbol{\mathrm{X}}\text{,}$$
and the consideration of the non-linear errors,
$$^{a}\mathrm{x}=\,^{a}\mathrm{H}_{s}(\boldsymbol{x})^{s}\boldsymbol{\mathrm{x}}\text{.}$$

## Inversion of the Mapping

After calculating the mapping from $\boldsymbol{\mathrm{X}}$ to $^{a}\mathrm{x}$, we would like to invert the mapping so that we can find where the points are in the object coordinate system, $S_{O}$, when we know where the objects are in the sensor coordinate system, $S_{s}$. The steps are
1. $^{a}\mathrm{x}\rightarrow\,^{s}\mathrm{x}$
2. $^{s}\mathrm{x}\rightarrow\,\boldsymbol{\mathrm{X}}$

## Inversion of the Mapping (Step 1)

The first step inverts the non-linear correction to recover the point in the sensor coordinate system, $S_{s}$. The correction itself depends on the location of the point, which is exactly what we are trying to find:
$$^{a}\mathrm{x}=\,^{a}\mathrm{H}_{s}(\boldsymbol{x})^{s}\boldsymbol{\mathrm{x}}\text{.}$$
This means that the transformation requires an iterative solution.

We can start with $^{a}\mathrm{x}$ as the initial guess,
$$\mathrm{x}^{(1)}=[\,^{a}\mathrm{H}_{s}(\,^{a}\mathrm{x})]^{-1}\,^{a}\mathrm{x}\text{,}$$
and iterate
$$\mathrm{x}^{(v+1)}=[\,^{a}\mathrm{H}_{s}(\mathrm{x}^{(v)})]^{-1}\,^{a}\mathrm{x}\text{.}$$

The solution converges quickly because $^{a}\mathrm{x}$ is a good initial guess.
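
The iteration can be sketched as a fixed-point loop. Since $^{a}\mathrm{H}_{s}$ is a pure shift, applying its inverse means subtracting the shift evaluated at the current estimate. The example below is plain Python and assumes, purely for illustration, that the shift follows the radial polynomial from the barrel distortion section; the function names, the parameter values, and the fixed iteration count are all assumptions.

```python
def distortion_shift(x, y, q1, q2):
    """Location-dependent shift (dx, dy), here a radial polynomial model."""
    r2 = x * x + y * y
    g = q1 * r2 + q2 * r2 * r2
    return x * g, y * g

def distort(x, y, q1, q2):
    """Forward mapping: ax = x + dx(x), ay = y + dy(x)."""
    dx, dy = distortion_shift(x, y, q1, q2)
    return x + dx, y + dy

def undistort(ax, ay, q1, q2, iterations=20):
    """Invert the shift by iterating x <- ax - dx(x), starting from ax."""
    x, y = ax, ay                          # the distorted point is the initial guess
    for _ in range(iterations):
        dx, dy = distortion_shift(x, y, q1, q2)
        x, y = ax - dx, ay - dy
    return x, y

# Round trip with hypothetical parameters.
ax, ay = distort(0.3, -0.2, q1=-0.1, q2=0.01)
x_rec, y_rec = undistort(ax, ay, q1=-0.1, q2=0.01)
```

A fixed iteration count keeps the sketch short; a practical implementation would more likely stop once the update falls below a tolerance.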

## Inversion of the Mapping (Step 2)

The next step is the inversion of the projective mapping. We cannot reconstruct the 3D point because we lost information through the original transformations. Luckily, we can still reconstruct the direction towards the 3D point by calculating the ray from the camera to the object in the object coordinate system, $S_{O}$. With the known matrix, $\mathrm{P}$, we know,
$$
\begin{align}
\lambda\mathrm{x}=&\,\mathrm{P}\boldsymbol{\mathrm{X}}=\mathrm{K}R[I_{3}|-\boldsymbol{X}_{O}]\boldsymbol{\mathrm{X}}\\
=&\,[\mathrm{K}R|-\mathrm{K}R\boldsymbol{X}_{O}]\begin{bmatrix}\boldsymbol{X}\\1\end{bmatrix}\\
=&\,\mathrm{K}R\boldsymbol{X}-\mathrm{K}R\boldsymbol{X}_{O}\text{.}
\end{align}
$$

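Solving this equation for the 3D point gives $\boldsymbol{X}=\boldsymbol{X}_{O}+\lambda(\mathrm{K}R)^{-1}\mathrm{x}$. So the ray from the camera origin, $\boldsymbol{X}_{O}$, to the 3D point, $\boldsymbol{X}$, has the direction $(\mathrm{K}R)^{-1}\mathrm{x}$, and the unknown factor, $\lambda$, determines how far along the ray the point lies.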
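
A short NumPy sketch of this back-projection follows. The function name `backproject_ray` and the camera values are illustrative assumptions; the depth factor $\lambda$ is taken from the forward projection here only to verify the result, since it is not available from a single image.

```python
import numpy as np

def backproject_ray(x_sensor, K, R, X_O):
    """Return the ray origin X_O and the direction (K R)^-1 x."""
    x_h = np.array([x_sensor[0], x_sensor[1], 1.0])   # homogeneous sensor point
    direction = np.linalg.solve(K @ R, x_h)           # avoids forming the inverse
    return np.asarray(X_O, dtype=float), direction

# Hypothetical camera.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
X_O = np.array([0.0, 0.0, -5.0])

# Forward projection of a known point, to have something to invert.
X = np.array([1.0, 0.5, 0.0])
x_h = K @ R @ (X - X_O)
lam = x_h[2]                      # lambda in the equation lambda x = K R (X - X_O)
x = x_h[:2] / lam

origin, direction = backproject_ray(x, K, R, X_O)
X_rec = origin + lam * direction  # recovers X only because lambda is known here
```

Every positive multiple of `direction` added to `origin` is a candidate for the 3D point, which matches the statement that only the direction can be reconstructed.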

## Analog Cameras

There is a similar process for analog cameras. Analog cameras use fiducial markers rather than the sensor frame because additional framing is required for external measurement.