# Triangulation and Absolute Orientation
This is from a [video](https://youtu.be/qS7GaaXwW4c) by Stachniss.

## Motivation

Given the relative orientation of two images, compute the points in 3D.

## Table of Contents

**Triangulation**
1. Geometric approach
2. Stereo normal case
3. Quality of the 3D Points

**Absolute Orientation**

**Discussion of Orientation Solutions**

## Geometric Solution

![Triangulation-01](assets/Triangulation-01.png)

The two projection rays, which start at the projection centers $P$ and $Q$ and pass through the corresponding image points, can be written as lines
$$
\begin{align}
f(\lambda) =&\, P + \lambda \cdot r \\
g(\mu) =&\, Q + \mu \cdot s
\end{align}
$$

where

$$
\begin{align}
P=&\,X_{O^{\prime}}\quad r=(R^{\prime})^{T}\,^{k}x^{\prime} \\
Q=&\,X_{O^{\prime\prime}}\quad s=(R^{\prime\prime})^{T}\,^{k}x^{\prime\prime}
\end{align}
$$

and 

$$
^{k}x^{\prime} = (x^{\prime},y^{\prime},c)^{T}\quad ^{k}x^{\prime\prime} = (x^{\prime\prime},y^{\prime\prime},c)^{T}\text{.}
$$

The points $F=f(\lambda)$ and $G=g(\mu)$ are closest to each other when the connecting segment $\overline{FG}$ is orthogonal to both rays,
$$
(F - G) \cdot r = 0 \quad (F - G) \cdot s = 0\text{,}
$$
which leads to
$$
\begin{align}
(P + \lambda \cdot r - Q - \mu \cdot s) \cdot r =&\,0 \\
(P + \lambda \cdot r - Q - \mu \cdot s) \cdot s =&\,0\text{.}
\end{align}
$$

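These are two linear equations in the two unknown ray parameters $\lambda$ and $\mu$. Solving them gives the closest points $F=f(\lambda)$ and $G=g(\mu)$, and the triangulated point $H$ is their midpoint, $H=\frac{F+G}{2}$.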
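Here is a minimal sketch of this midpoint triangulation in Python with NumPy. The function name `triangulate_midpoint` is hypothetical. `P` and `Q` are the projection centers, and `r` and `s` are the ray directions, which would be computed as $r=(R^{\prime})^{T}\,^{k}x^{\prime}$ and $s=(R^{\prime\prime})^{T}\,^{k}x^{\prime\prime}$. The two orthogonality conditions are rearranged into a 2×2 linear system in $\lambda$ and $\mu$.

```python
import numpy as np


def triangulate_midpoint(P, r, Q, s):
    """Midpoint triangulation of the rays f(lam) = P + lam*r and g(mu) = Q + mu*s.

    Returns (H, F, G): the closest points F, G on both rays and their midpoint H.
    """
    P, r, Q, s = (np.asarray(v, dtype=float) for v in (P, r, Q, s))
    d = P - Q
    # (d + lam*r - mu*s) . r = 0  ->  lam*(r.r) - mu*(s.r) = -d.r
    # (d + lam*r - mu*s) . s = 0  ->  lam*(r.s) - mu*(s.s) = -d.s
    A = np.array([[r @ r, -(s @ r)],
                  [r @ s, -(s @ s)]])
    b = -np.array([d @ r, d @ s])
    lam, mu = np.linalg.solve(A, b)  # raises LinAlgError for parallel rays
    F = P + lam * r
    G = Q + mu * s
    return (F + G) / 2.0, F, G


# Two skew rays: the x-axis and a vertical line through (0, 1, *).
H, F, G = triangulate_midpoint([0, 0, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1])
print(F, G, H)
```

When the rays actually intersect, $F$, $G$, and $H$ coincide. When they are skew (the usual case with noisy measurements), $H$ lies halfway along the shortest connection.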

## Stereo Normal Case

We can use triangulation similarly in the stereo normal case.
![StereoNormal-02](assets/StereoNormal-02.png)

The triangulation will use the plane spanned by the points $x^{\prime}_{P}$, $x^{\prime\prime}_{P}$, and $P$.
![StereoNormal-01](assets/StereoNormal-01.png)

![StereoNormal-03](assets/StereoNormal-03.png)

The $Z$-coordinate can be found from the intercept theorem:
$$\frac{Z}{c} = \frac{B}{-(x^{\prime\prime} - x^{\prime})} = \frac{B}{-p_{x}}\text{,}$$
with the $x$-parallax $p_{x}=x^{\prime\prime}-x^{\prime}$, which gives
$$Z = c\frac{B}{-(x^{\prime\prime} - x^{\prime})}\text{.}$$

The $X$-coordinate follows from the same construction:
$$\frac{X}{x^{\prime}} = \frac{Z}{c}\text{,}$$
which gives
$$X=x^{\prime}\frac{B}{-(x^{\prime\prime} - x^{\prime})}\text{.}$$

The figure below shows the geometry in the $X,Z$-plane.
![Triangulation-02](assets/Triangulation-02.png)

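The $Y$-coordinate is found the same way. Because $y^{\prime}$ and $y^{\prime\prime}$ should be equal in the stereo normal case, we use the average of both measurements:
$$\frac{Y}{X}=\frac{\frac{y^{\prime} + y^{\prime\prime}}{2}}{x^{\prime}}\text{,}$$
which gives
$$Y=\frac{y^{\prime}+y^{\prime\prime}}{2}\frac{B}{-(x^{\prime\prime}-x^{\prime})}\text{.}$$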

## Intersection of Two Rays for the Stereo Normal Case

In summary, for calibrated cameras in the stereo normal case, the coordinates of a point are:

$$
\begin{align}
X=&\,x^{\prime}\frac{B}{-(x^{\prime\prime} - x^{\prime})}\\
Y=&\,\frac{y^{\prime}+y^{\prime\prime}}{2}\frac{B}{-(x^{\prime\prime}-x^{\prime})}\\
Z=&\,c\frac{B}{-(x^{\prime\prime} - x^{\prime})}\text{.}
\end{align}
$$

The $X$-parallax, $p_{x}=x^{\prime\prime}-x^{\prime}$, determines the depth, $Z$. The $Y$-parallax, $p_{y}=y^{\prime\prime}-y^{\prime}$, should be zero for correctly oriented images and exact measurements, so it serves as a consistency check in the $Y$ direction. The parallaxes are also known as disparities.
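The closed-form expressions above translate directly into code. This is a minimal sketch under several assumptions: both images are already in the stereo normal configuration, image coordinates are measured relative to the principal point, and the sign convention above holds, so $p_x < 0$ for points with $Z > 0$. The name `stereo_normal_point` is hypothetical. The function also returns the $Y$-parallax so the consistency check can be done.

```python
def stereo_normal_point(x1, y1, x2, y2, B, c):
    """Triangulate one point in the stereo normal case.

    (x1, y1) are x', y' in the first image, (x2, y2) are x'', y'' in the second.
    B is the baseline, c the camera constant (same unit as the image coordinates).
    Returns (X, Y, Z, p_y); p_y is the Y-parallax used as a consistency check.
    """
    p_x = x2 - x1                      # X-parallax (negative for points in front)
    if p_x == 0:
        raise ValueError("zero X-parallax: the point lies at infinity")
    factor = B / -p_x                  # common factor B / (-(x'' - x'))
    X = factor * x1
    Y = factor * (y1 + y2) / 2.0       # average of both y measurements
    Z = factor * c
    p_y = y2 - y1
    return X, Y, Z, p_y


# Hypothetical values: B = 0.5 m, c = 1000 px, image coordinates in px.
print(stereo_normal_point(100.0, 50.0, 75.0, 50.0, B=0.5, c=1000.0))
```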

## X-Parallax

The $X$-parallax is therefore the key element in computing the $X$, $Y$, and $Z$ coordinates. All three are scaled by the same ratio: the baseline, $B$, divided by the negated $X$-parallax. We call this ratio the image scale number,
$$M=\frac{-B}{x^{\prime\prime}-x^{\prime}}=\frac{Z}{c}\text{,}$$
and rewrite the $X$, $Y$, and $Z$ coordinates as
$$
\begin{align}
X=&\,Mx^{\prime}\\
Y=&\,M\frac{y^{\prime}+y^{\prime\prime}}{2}\\
Z=&\,Mc\text{.}
\end{align}
$$

## Y-Parallax

When the $Y$-parallax is zero, $y^{\prime\prime}=y^{\prime}$, and the solutions for the $X$, $Y$, and $Z$ coordinates simplify to
$$
\begin{align}
X=&\,x^{\prime}\frac{B}{-p_{x}}\\
Y=&\,y^{\prime}\frac{B}{-p_{x}}\\
Z=&\,c\frac{B}{-p_{x}}\text{.}
\end{align}
$$
We can rewrite these solutions in matrix form:
$$
\begin{bmatrix}X\\Y\\Z\end{bmatrix}=\begin{bmatrix}\frac{B}{-p_{x}}&0&0\\0&\frac{B}{-p_{x}}&0\\0&0&\frac{B}{-p_{x}}\end{bmatrix}\begin{bmatrix}x^{\prime}\\y^{\prime}\\c\end{bmatrix}\text{.}
$$

Using homogeneous coordinates, with the parallax as an input, we can write
$$
\begin{bmatrix}U\\V\\W\\T\end{bmatrix}=\begin{bmatrix}B&0&0&0\\0&B&0&0\\0&0&Bc&0\\0&0&0&-1\end{bmatrix}\begin{bmatrix}x^{\prime}\\y^{\prime}\\1\\p_{x}\end{bmatrix}\text{,}
$$
where $U$, $V$, $W$, and $T$ are homogeneous coordinates, so that $X=U/T$, $Y=V/T$, and $Z=W/T$. Storing the $x$-parallax for every point $(x^{\prime}, y^{\prime})$ of the first image gives a **parallax map**, a set of triples $\{(x^{\prime}, y^{\prime}, p_{x})\}$. This matrix turns every entry of the parallax map into a 3D point, and it only requires the baseline, $B$, and the camera constant, $c$.
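Here is a short NumPy sketch that applies the $4\times4$ matrix to every entry of a parallax map and then dehomogenizes the result. The function name `parallax_map_to_points` and the array layout (one row per point) are assumptions for illustration.

```python
import numpy as np


def parallax_map_to_points(xy, px, B, c):
    """Convert parallax-map entries (x', y', p_x) to 3D points (stereo normal case).

    xy: (N, 2) image coordinates in the first image, px: (N,) X-parallaxes.
    """
    M4 = np.array([[B, 0.0, 0.0, 0.0],
                   [0.0, B, 0.0, 0.0],
                   [0.0, 0.0, B * c, 0.0],
                   [0.0, 0.0, 0.0, -1.0]])
    xy = np.asarray(xy, dtype=float)
    px = np.asarray(px, dtype=float)
    h = np.column_stack([xy, np.ones(len(px)), px])   # rows (x', y', 1, p_x)
    UVWT = h @ M4.T                                   # rows (U, V, W, T)
    return UVWT[:, :3] / UVWT[:, 3:4]                 # (X, Y, Z) = (U, V, W) / T


# Hypothetical values: B = 0.5 m, c = 1000 px.
points = parallax_map_to_points([[100.0, 50.0], [-40.0, 10.0]], [-25.0, -50.0], B=0.5, c=1000.0)
print(points)
```

Since the matrix is the same for every pixel, converting a whole parallax map is a single matrix product followed by one division per point.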

## Quality of the 3D Points

What influences the quality of the 3D points obtained in the stereo normal case?
1. The quality of the orientation parameters
2. The quality of the measured image coordinates

Let's take a deeper dive into the quality of the measured image coordinates.

Assume that the image coordinates are measured independently and with equal uncertainty in $x$ and $y$, $\sigma_{x^{\prime}} = \sigma_{y^{\prime}}$, and treat the image scale number $M$ as error-free. Propagating these uncertainties through the world $X$ and $Y$ coordinates,
$$
\begin{align}
X=&\,Mx^{\prime}\\
Y=&\,M\frac{y^{\prime}+y^{\prime\prime}}{2}\text{,}
\end{align}
$$
gives the uncertainty of the world's $X$ and $Y$ coordinates. The factor $\frac{\sqrt{2}}{2}$ comes from averaging the two independent measurements $y^{\prime}$ and $y^{\prime\prime}$:
$$
\begin{align}
\sigma_{X}=&\,M\sigma_{x^{\prime}}=\frac{Z}{c}\sigma_{x^{\prime}}\\
\sigma_{Y}=&\,\frac{\sqrt{2}}{2}M\sigma_{y^{\prime}}=\frac{\sqrt{2}}{2}\frac{Z}{c}\sigma_{y^{\prime}}\text{.}
\end{align}
$$
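A quick Monte Carlo simulation can confirm these two formulas. This sketch assumes independent, zero-mean Gaussian noise on $x^{\prime}$, $y^{\prime}$, and $y^{\prime\prime}$ and an error-free $M$. The true image coordinates are set to zero because only the noise affects the spread. The function name and the values of `M` and `sigma` are hypothetical.

```python
import math
import random


def std(values):
    """Population standard deviation, computed directly to keep it fast."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def simulate_sigma_XY(M, sigma, n=100_000, seed=0):
    """Simulate X = M*x' and Y = M*(y' + y'')/2 with noisy image coordinates."""
    rng = random.Random(seed)
    X = [M * rng.gauss(0.0, sigma) for _ in range(n)]
    Y = [M * (rng.gauss(0.0, sigma) + rng.gauss(0.0, sigma)) / 2.0 for _ in range(n)]
    return std(X), std(Y)


M, sigma = 50.0, 0.01          # hypothetical image scale number and image noise
sX, sY = simulate_sigma_XY(M, sigma)
print("sigma_X:", sX, "predicted:", M * sigma)
print("sigma_Y:", sY, "predicted:", math.sqrt(2) / 2 * M * sigma)
```

With this many samples, the simulated standard deviations should agree with the closed-form predictions to within a fraction of a percent.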

For the point's $Z$-coordinate, we know that
$$Z=Mc\text{,}$$
which yields
$$Zp_{x}=-Bc\text{.}$$
Differentiating $Z=-Bc/p_{x}$ gives $\frac{dZ}{dp_{x}}=\frac{Bc}{p_{x}^{2}}=-\frac{Z}{p_{x}}$, so the *relative* precision of the $Z$-coordinate is
$$\frac{\sigma_{Z}}{Z}=\frac{\sigma_{p_{x}}}{|p_{x}|}\text{.}$$

The relative precision of the depth (the height, for aerial images) therefore equals the relative precision of the $x$-parallax.

We can rewrite the uncertainty of the $Z$-coordinate to obtain:
$$
\sigma_{Z}=\frac{Z}{|p_{x}|}\sigma_{p_{x}} = \frac{cB}{p_{x}^{2}}\sigma_{p_{x}}=\frac{Z^{2}}{cB}\sigma_{p_{x}}=\frac{Z}{c \frac{B}{Z}}\sigma_{p_{x}}\text{.}
$$

The standard deviation of the world's $Z$-coordinate depends
- linearly on the standard deviation of the $x$-parallax, $\sigma_{p_{x}}$,
- inversely quadratically on the $x$-parallax, $p_{x}$,
- quadratically on the depth, $Z$, and
- inversely on the base-to-depth ratio, $\frac{B}{Z}$.
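To make the quadratic growth with depth concrete, here is a small sketch of the formulas above in two equivalent forms. The setup values (baseline 0.5 m, camera constant 1000 px, parallax precision 0.5 px) are hypothetical, and the function names are ours. The depths $Z$ and $B$ must share a length unit, and $c$ and $\sigma_{p_x}$ must share an image unit.

```python
def sigma_Z(Z, B, c, sigma_px):
    """Depth standard deviation in the stereo normal case: Z**2 / (c*B) * sigma_px."""
    return Z ** 2 / (c * B) * sigma_px


def sigma_Z_from_parallax(p_x, B, c, sigma_px):
    """Same quantity expressed with the x-parallax: c*B / p_x**2 * sigma_px."""
    return c * B / p_x ** 2 * sigma_px


B, c, s_px = 0.5, 1000.0, 0.5          # hypothetical stereo setup
for Z in (5.0, 10.0, 20.0):
    p_x = -B * c / Z                   # from Z * p_x = -B * c
    print(f"Z = {Z:5.1f} m  p_x = {p_x:7.1f} px  "
          f"sigma_Z = {sigma_Z(Z, B, c, s_px):.3f} m  "
          f"relative = {s_px / abs(p_x):.4f}")
```

For these values, $\sigma_Z$ is 0.025 m, 0.1 m, and 0.4 m: each doubling of the depth quadruples the depth uncertainty.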


There is a break halfway through the class [here](https://youtu.be/qS7GaaXwW4c?t=2579).

## Relative Orientation

The result of the relative orientation is the **photogrammetric model**. This contains the
- parameters of the relative orientation of both cameras
- 3D coordinates of $N$ points in a local coordinate frame
$$
^{m}X_{n} = \left(^{m}X_{n}, ^{m}Y_{n}, ^{m}Z_{n}\right)^{T}\quad n=1, \dots, N
$$
For calibrated cameras, this is known up to a **similarity transform**.

## Absolute Orientation

This **similarity transform** maps the **photogrammetric model** into the object reference frame:
$$
^{O}X_{n}=\lambda R\,^{m}X_{n}+T\text{.}
$$

The **similarity transform** has seven degrees of freedom:
- 3 for rotation
- 3 for translation
- 1 for scale

To find the absolute orientation, we need **control points**, i.e., points whose coordinates in the object reference frame are known.

## Least Squares Solution

The absolute orientation can be estimated with a non-linear least squares approach. We need at least three full control points, i.e., points whose $X$, $Y$, and $Z$ coordinates are all known, and they must not lie on a common line.

## Sketch of the Solution
We want to map the **control points** given in the local model frame, $x_{n}$, onto their known coordinates in the world frame, $y_{n}$:
$$
y_{n}=\lambda R x_{n} + T\quad n=1,\dots,N
$$
with a rotation, $R$, translation, $T$, and scale, $\lambda$.

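Reducing both point sets to reference points $x_{O}$ and $y_{O}$ (which turn out to be the weighted centroids) removes the translation. Splitting the scale symmetrically between both sides, we can rewrite this relationship as
$$
\begin{align}
\lambda^{-\frac{1}{2}}(y_{n}-y_{O})=&\,R\,\lambda^{\frac{1}{2}}(x_{n}-x_{O})\\
b_{n}=&\,R\,a_{n}\text{,}
\end{align}
$$
where $b_{n}=\lambda^{-\frac{1}{2}}(y_{n}-y_{O})$ are the reduced points in the world frame and $a_{n}=\lambda^{\frac{1}{2}}(x_{n}-x_{O})$ are the reduced points in the local frame.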
We estimate the parameters by minimizing the weighted sum of squared differences, $\Phi$, between the world points and the rotated points from the local frame:
$$
\Phi(x_{O}, \lambda, R)=\sum_{n=1}^{N}\left[b_{n}-Ra_{n}\right]^{T}\left[b_{n}-Ra_{n}\right]p_{n}\text{,}
$$
where the weights $p_{n}$ give more influence to points that we know to be more accurate than others.

## Minimization

After setting the first derivatives to zero,
$$
\begin{align}
\frac{\partial \Phi}{\partial x_{O}}=0&\quad\rightarrow\quad x_{O}=\frac{\sum_{n} x_{n}p_{n}}{\sum_{n} p_{n}}\\
\frac{\partial \Phi}{\partial \lambda}=0&\quad\rightarrow\quad
\lambda^{2}=\frac{
\sum_{n}(y_{n}-y_{O})^{T}(y_{n}-y_{O})p_{n}
}{
\sum_{n}(x_{n}-x_{O})^{T}(x_{n}-x_{O})p_{n}
}\text{,}
\end{align}
$$
we see that $x_{O}$ is the centroid of the local points weighted with $p_{n}$ (and $y_{O}$, analogously, that of the world points). The squared scale $\lambda^{2}$ is the ratio of the weighted spread of the points in the world frame, $y_{n}$, to their weighted spread in the local frame, $x_{n}$.

The rotation matrix, $R$, can be found by using the **singular value decomposition** (SVD):
$$
H = \sum_{n=1}^{N}a_{n}b_{n}^{T}p_{n},\quad\texttt{SVD}(H)=UDV^{T}\quad\rightarrow\quad R=VU^{T}\text{.}
$$
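Here is a minimal NumPy sketch of this closed-form solution, assuming the model $y_n = \lambda R x_n + T$ with optional weights $p_n$. The function name `absolute_orientation` is hypothetical. Because $a_n b_n^T = (x_n - x_O)(y_n - y_O)^T$, the scale cancels out of $H$, so the rotation can be computed directly from the reduced coordinates. The translation then follows from the centroids as $T = y_O - \lambda R x_O$. The determinant check, which keeps the SVD from returning a reflection instead of a rotation, is not part of the formulas above; it follows Umeyama's variant of the SVD method.

```python
import numpy as np


def absolute_orientation(x, y, p=None):
    """Estimate lam, R, T such that y_n ≈ lam * R @ x_n + T.

    x: (N, 3) model (local frame) coordinates, y: (N, 3) object coordinates,
    p: optional (N,) weights.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = np.ones(len(x)) if p is None else np.asarray(p, dtype=float)

    # Weighted centroids x_O and y_O, then reduced coordinates.
    x_O = (p[:, None] * x).sum(axis=0) / p.sum()
    y_O = (p[:, None] * y).sum(axis=0) / p.sum()
    xc, yc = x - x_O, y - y_O

    # Scale: lam^2 = weighted spread in the world / weighted spread in the model.
    lam = np.sqrt((p * (yc ** 2).sum(axis=1)).sum() / (p * (xc ** 2).sum(axis=1)).sum())

    # H = sum_n p_n a_n b_n^T (the factors lam^(1/2) and lam^(-1/2) cancel).
    H = (p[:, None, None] * xc[:, :, None] * yc[:, None, :]).sum(axis=0)
    U, D, Vt = np.linalg.svd(H)
    V = Vt.T
    # Flip the axis of the smallest singular value if V @ U.T is a reflection.
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(V @ U.T))])
    R = V @ S @ U.T

    T = y_O - lam * R @ x_O
    return lam, R, T


# Synthetic, noise-free control points with a known similarity transform.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))
th = np.deg2rad(30.0)
R_true = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th), np.cos(th), 0.0],
                   [0.0, 0.0, 1.0]])
lam_true, T_true = 2.5, np.array([10.0, -3.0, 4.0])
y = lam_true * x @ R_true.T + T_true
lam, R, T = absolute_orientation(x, y)
print(lam, T)
```

With noise-free control points, the sketch recovers the transform up to floating-point error. With real, noisy control points, it returns the weighted least squares estimate for $\Phi$.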

## 2-Step Solution

By combining the techniques discussed so far, we obtain a two-step solution:
1. **Relative Orientation** without control points, plus the 3D locations of the corresponding points in a local frame
2. **Absolute Orientation** of the cameras and the corresponding points using control points

## Control Points

There are different types of control points:
1. Full control points: $X$, $Y$, and $Z$ are known
2. Planimetric control points: $X$ and $Y$ are known
3. Height control points: $Z$ is known

## Other Orientation Approaches
So far, we have discussed:
- Direct linear transform (DLT)
- Spatial Resection (P3P, RRS)
- Relative orientation
- Triangulation
- Absolute orientation

There *are* other ways to solve for the orientation:
1. Option 1
    - DLT for each camera using control points
    - Triangulation for all corresponding points
2. Option 2
    - P3P for each camera using control points
    - Triangulation for all corresponding points
3. Option 3
    - One big least squares approach (bundle adjustment)


## The Best Solution

Can we say that there is an approach that is better than the others? 

## Relevant Properties

In order to find the best solution, we must ask a few questions. First, is the solution statistically optimal? When asking this, we must consider the precision of the estimated parameters, the object coordinates, and the orientation. 

We must also consider how well we can check the correspondences of the matching points. This depends on the number of points and on the redundancy, $R$, which is determined by the number of observations, $N$, unknowns, $U$, and constraints, $H$:

$$
R = N-U+H\text{.}
$$
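As a small worked example, we can count the redundancy of the absolute orientation step alone. This is a simplified count that treats the model coordinates as fixed. It assumes that each full control point contributes three observations, each planimetric control point two, and each height control point one, with $U = 7$ unknowns for the similarity transform. The function name is hypothetical. A non-negative redundancy is necessary but not sufficient; collinear control points, for example, still leave the rotation undetermined.

```python
def absolute_orientation_redundancy(full=0, planimetric=0, height=0, constraints=0):
    """Redundancy R = N - U + H for the absolute orientation from control points.

    Simplified count: full points give X, Y, Z (3 observations), planimetric
    points give X, Y (2), height points give Z (1); U = 7 similarity parameters.
    """
    N = 3 * full + 2 * planimetric + 1 * height
    U = 7
    return N - U + constraints


print(absolute_orientation_redundancy(full=3))             # three full control points
print(absolute_orientation_redundancy(full=2, height=1))   # minimal configuration
print(absolute_orientation_redundancy(full=2))             # under-determined
```

Three full control points give $R = 9 - 7 = 2$. Two full control points plus one height control point give $R = 0$, so there is no redundancy left to check the correspondences.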

    TODO
    - Draw the figure comparing the different points
    - Draw the flow diagrams for the different approaches