# Activity: Two-View Reconstruction

This activity has the following goal:
* Implement the *eight-point algorithm* for two-view reconstruction

Two-view reconstruction is the standard way to initialize structure-from-motion. Given feature matches from a pair of images, we want to find the relative camera pose and the relative position in 3D of the points to which the feature matches correspond.

# Theory

## Frames

We work with three different reference frames in this notebook:
* Frame $W$ is the world frame
* Frame $A$ is the camera frame from which a first image $A$ was taken
* Frame $B$ is the camera frame from which a second image $B$ was taken

We describe (for example) the orientation of frame $A$ in frame $W$ as

$$ R^W_A \in SO(3) $$

and the position of frame $A$ in frame $W$ as

$$ p^W_A \in \mathbb{R}^3. $$

The inverse transformation is:

$$ \begin{align*} R^A_W &= (R^W_A)^\top \\ p^A_W &= -(R^W_A)^\top p^W_A. \end{align*} $$

The sequential transformation is:

$$ \begin{align*} R^B_A &= (R^W_B)^\top R^W_A \\ p^B_A &= (R^W_B)^\top \left( p^W_A - p^W_B \right). \end{align*}$$
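
As a quick numerical check of these frame conventions, the sketch below builds two poses in frame $W$, composes them to get the pose of frame $A$ in frame $B$, and confirms that mapping a point from $A$ to $B$ directly agrees with mapping it through $W$. The helper names (`rot_x`, `rot_y`, `invert_pose`) and the sample point are illustrative assumptions, not part of the activity code.

```python
import numpy as np

def rot_x(t):
    # Rotation by angle t (radians) about the x axis
    c, s = np.cos(t), np.sin(t)
    return np.array([[1., 0., 0.], [0., c, -s], [0., s, c]])

def rot_y(t):
    # Rotation by angle t (radians) about the y axis
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0., s], [0., 1., 0.], [-s, 0., c]])

def invert_pose(R_inW_ofA, p_inW_ofA):
    # Pose of W in A, given the pose of A in W
    return R_inW_ofA.T, -R_inW_ofA.T @ p_inW_ofA

# Poses of A and B in W (illustrative values)
R_inW_ofA, p_inW_ofA = rot_x(0.05 * np.pi), np.array([0.0, 0.0, -1.0])
R_inW_ofB, p_inW_ofB = rot_y(0.05 * np.pi), np.array([0.5, 0.0, -1.1])

# Poses of W in A and of W in B
R_inA_ofW, p_inA_ofW = invert_pose(R_inW_ofA, p_inW_ofA)
R_inB_ofW, p_inB_ofW = invert_pose(R_inW_ofB, p_inW_ofB)

# Pose of A in B: express the axes and the origin of A in frame B
R_inB_ofA = R_inB_ofW @ R_inW_ofA
p_inB_ofA = R_inB_ofW @ p_inW_ofA + p_inB_ofW

# One point, expressed in W, in A, and in B
p_inW = np.array([0.2, -0.3, 1.5])
p_inA = R_inA_ofW @ p_inW + p_inA_ofW
p_inB = R_inB_ofW @ p_inW + p_inB_ofW

# Going from A to B directly should agree with going through W
print(np.allclose(R_inB_ofA @ p_inA + p_inB_ofA, p_inB))
```
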

## Points in the world

We assume there are $n$ points

$$ \mathrm{p}_1, \dotsc, \mathrm{p}_n $$

in the world that are visible in both images (i.e., that there are $n$ matches between these two images). The coordinates of these points in frame $W$ (for example) are

$$ p^W_1, \dotsc, p^W_n \in \mathbb{R}^3. $$

Given the coordinates $p^W_i$ of the $i$'th point $\mathrm{p}_i$ in frame $W$, the coordinates $p^A_i$ of this same point in frame $A$ (for example) are

$$ p^A_i = R^A_W p^W_i + p^A_W. $$

## Points in the images

### Camera models

The projection of the points

$$ \mathrm{p}_1, \dotsc, \mathrm{p}_n $$

into image $A$ is

$$ a_1, \dotsc, a_n \in \mathbb{R}^2. $$

Our camera model tells us that

$$ \lambda_{a_i} \begin{bmatrix} a_i \\ 1 \end{bmatrix} = K p^A_i $$

where $\lambda_{a_i}$ is the $z$ coordinate (i.e., the third element) of $p^A_i$. That is,

$$ p^A_i = \begin{bmatrix} x \\ y \\ \lambda_{a_i} \end{bmatrix} $$

for some $x, y \in \mathbb{R}$. This coordinate $\lambda_{a_i}$ is also called "a scale factor" or "the depth of point $i$ in image $A$." In any case, we emphasize that the camera model is not "equivalence" but is a strict equality if we write it as we have done here. The projection of the same $n$ points into image $B$ is, similarly,

$$ b_1, \dotsc, b_n \in \mathbb{R}^2. $$

Again, our camera model tells us that

$$ \lambda_{b_i} \begin{bmatrix} b_i \\ 1 \end{bmatrix} = K p^B_i $$

where $\lambda_{b_i}$ is the $z$ coordinate of $p^B_i$.
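
To make the camera model concrete, here is a minimal sketch that projects a single point and checks the strict equality $\lambda_{a_i} [a_i^\top \; 1]^\top = K p^A_i$. The function name `project_point`, the values in `K`, and the sample point are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative camera matrix (focal lengths and principal point in pixels)
K = np.array([[1500., 0., 1000.],
              [0., 1500., 500.],
              [0., 0., 1.]])

def project_point(K, p_inA_i):
    # The depth is the z coordinate of the point in the camera frame
    lam = p_inA_i[2]
    # Homogeneous image coordinates: [a_i; 1] = K p / lambda
    a_h = K @ p_inA_i / lam
    return a_h[0:2], lam

p_inA_i = np.array([0.2, -0.1, 2.0])
a_i, lam_a_i = project_point(K, p_inA_i)

# The camera model holds as an equality, not just up to scale
print(np.allclose(lam_a_i * np.append(a_i, 1.), K @ p_inA_i))
```
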

### Normalized image coordinates

Suppose we are working with a calibrated camera and so we know the camera matrix $K$. Since $K$ is invertible, the camera model for image $A$ can be rewritten as

$$ \lambda_{a_i} \left( K^{-1} \begin{bmatrix} a_i \\ 1 \end{bmatrix} \right) = p^A_i. $$

We define

$$ \alpha_i = K^{-1} \begin{bmatrix} a_i \\ 1 \end{bmatrix} $$

and call $\alpha_i \in \mathbb{R}^3$ the *normalized image coordinates* of the projection $a_i \in \mathbb{R}^2$ of point $\mathrm{p}_i$ into image $A$. The camera model for image $A$ is then

$$ \lambda_{a_i} \alpha_i = p^A_i. $$

If we do the same thing for image $B$, the camera model in that case is

$$ \lambda_{b_i} \beta_i = p^B_i $$

where

$$ \beta_i = K^{-1} \begin{bmatrix} b_i \\ 1 \end{bmatrix}. $$
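
The conversion to normalized image coordinates can be sketched as follows. The function name `normalize_coords` and the sample pixel coordinates are assumptions. Solving a linear system instead of forming $K^{-1}$ explicitly is a design choice of this sketch, not something the derivation requires.

```python
import numpy as np

# Illustrative camera matrix
K = np.array([[1500., 0., 1000.],
              [0., 1500., 500.],
              [0., 0., 1.]])

def normalize_coords(K, a):
    # a has shape (n, 2); the result has shape (n, 3)
    a_h = np.column_stack([a, np.ones(len(a))])
    # Solve K alpha_i = [a_i; 1] for every i at once
    return np.linalg.solve(K, a_h.T).T

a = np.array([[1150., 425.],
              [1000., 500.]])
alpha = normalize_coords(K, a)
print(alpha)
```

The third element of each row is $1$ because the last row of $K$ is $[0 \; 0 \; 1]$.
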

### Constraints

We have the following camera models:

$$
\begin{align*}
\lambda_{a_i} \alpha_i &= p^A_i \\
\lambda_{b_i} \beta_i &= p^B_i.
\end{align*}
$$

What relates these two models is the fact that $p^A_i$ and $p^B_i$ are descriptions of the same point $\mathrm{p}_i$ in different frames. We can make this relationship explicit by rewriting the camera model for image $B$ as

$$
\begin{align*}
\lambda_{b_i} \beta_i
&= p^B_i \\
&= R^B_A p^A_i + p^B_A && \text{by coordinate transformation} \\
&= R^B_A \lambda_{a_i} \alpha_i + p^B_A && \text{by plugging in our camera model for image $A$} \\
&= \lambda_{a_i} R^B_A \alpha_i + p^B_A.
\end{align*}
$$

We now have a set of $3n$ equations

$$
\begin{align*}
\lambda_{b_1} \beta_1 &= \lambda_{a_1} R^B_A \alpha_1 + p^B_A \\
& \vdots \\
\lambda_{b_n} \beta_n &= \lambda_{a_n} R^B_A \alpha_n + p^B_A
\end{align*}
$$

in $2n+6$ unknowns

$$ \lambda_{a_1}, \dotsc, \lambda_{a_n} \qquad\qquad \lambda_{b_1}, \dotsc, \lambda_{b_n} \qquad\qquad \left(R^B_A, p^B_A\right) $$

and so have hope that, if the number $n$ of projected points is big enough, we can estimate these unknowns by solving the equations.

### Scale ambiguity

If
$$
\begin{align*}
\lambda_{b_1} \beta_1 &= \lambda_{a_1} R^B_A \alpha_1 + p^B_A \\
& \vdots \\
\lambda_{b_n} \beta_n &= \lambda_{a_n} R^B_A \alpha_n + p^B_A
\end{align*}
$$
then
$$
\begin{align*}
s\left( \lambda_{b_1} \right) \beta_1 &= s\left( \lambda_{a_1} R^B_A \alpha_1 + p^B_A \right) \\
& \vdots \\
s\left( \lambda_{b_n} \right) \beta_n &= s\left( \lambda_{a_n} R^B_A \alpha_n + p^B_A \right)
\end{align*}
$$
and so
$$
\begin{align*}
(s\lambda_{b_1}) \beta_1 &= (s\lambda_{a_1}) R^B_A \alpha_1 + (s p^B_A) \\
& \vdots \\
(s \lambda_{b_n}) \beta_n &= (s\lambda_{a_n}) R^B_A \alpha_n + (s p^B_A)
\end{align*}
$$
for any $s \neq 0$. That is to say, if
$$ \lambda_{a_1}, \dotsc, \lambda_{a_n} \qquad\qquad \lambda_{b_1}, \dotsc, \lambda_{b_n} \qquad\qquad \left(R^B_A, p^B_A\right) $$
satisfy the constraints, then the "scaled" quantities
$$ s\lambda_{a_1}, \dotsc, s\lambda_{a_n} \qquad\qquad s\lambda_{b_1}, \dotsc, s\lambda_{b_n} \qquad\qquad \left(R^B_A, sp^B_A\right) $$
also satisfy the constraints. In other words, the same (normalized) image coordinates $\alpha_i, \beta_i$ would result from frames $A$ and $B$ that are $s$ times farther apart and from a point $\mathrm{p}_i$ whose $z$ coordinate (i.e., depth) with respect to each frame is $s$ times bigger. This is another example of the usual scale ambiguity we have when working with images. On the plus side, we have one fewer unknown to estimate.
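
The scale ambiguity is easy to see numerically. The sketch below, with made-up values for the relative pose and the points, scales both the points and the translation by $s$ and checks that the normalized image coordinates do not change. The function names `normalized_coords` and `views` are hypothetical.

```python
import numpy as np

def normalized_coords(p):
    # Divide each point (one per row) by its depth
    return p / p[:, 2:3]

def views(p_inA, R_inB_ofA, p_inB_ofA):
    # Normalized image coordinates of the same points in images A and B
    p_inB = p_inA @ R_inB_ofA.T + p_inB_ofA
    return normalized_coords(p_inA), normalized_coords(p_inB)

# Illustrative relative pose: small rotation about y plus a translation
t = 0.1
R_inB_ofA = np.array([[np.cos(t), 0., np.sin(t)],
                      [0., 1., 0.],
                      [-np.sin(t), 0., np.cos(t)]])
p_inB_ofA = np.array([-0.5, 0.0, 0.1])

# Illustrative points in frame A (one per row)
p_inA = np.array([[0.2, -0.1, 2.0],
                  [-0.4, 0.3, 3.0],
                  [0.1, 0.5, 2.5]])

s = 3.0
alpha, beta = views(p_inA, R_inB_ofA, p_inB_ofA)
alpha_s, beta_s = views(s * p_inA, R_inB_ofA, s * p_inB_ofA)

# Same images, even though the scene is s times bigger
print(np.allclose(alpha, alpha_s), np.allclose(beta, beta_s))
```
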

## The essential matrix

### What the essential matrix is

To estimate $R^B_A$ and $p^B_A$, we need to eliminate $\lambda_{a_i}$ and $\lambda_{b_i}$ from
$$ \lambda_{b_i} \beta_i = \lambda_{a_i} R^B_A \alpha_i + p^B_A. $$
We will do this in two steps.

First, take the cross product of $p^B_A$ with both sides:
$$
\begin{align*}
\widehat{p^B_A} \lambda_{b_i} \beta_i
&= \widehat{p^B_A} \left( \lambda_{a_i} R^B_A \alpha_i + p^B_A \right) \\
&= \widehat{p^B_A} \lambda_{a_i} R^B_A \alpha_i + \widehat{p^B_A} p^B_A \\
&= \widehat{p^B_A} \lambda_{a_i} R^B_A \alpha_i + 0 \\
&= \lambda_{a_i} \widehat{p^B_A} R^B_A \alpha_i.
\end{align*}
$$
Here, we made use of the fact that the cross product of a vector with itself is zero.

Second, take the dot product of $\beta_i$ with both sides:
$$ \beta_i^\top \left(\widehat{p^B_A} \lambda_{b_i} \beta_i \right) = \beta_i^\top \left( \lambda_{a_i} \widehat{p^B_A} R^B_A \alpha_i \right). $$
On the left, we have
$$
\begin{align*}
\beta_i^\top \left(\widehat{p^B_A} \lambda_{b_i} \beta_i \right)
&= \lambda_{b_i} \left( \beta_i^\top \widehat{p^B_A} \beta_i \right) \\
&= \lambda_{b_i} ( 0 ) \\
&= 0
\end{align*}
$$
since $\beta_i$ is perpendicular to $\widehat{p^B_A} \beta_i$ and since the dot product of a vector with a perpendicular vector is zero. The equation becomes
$$ 0 = \beta_i^\top \left( \lambda_{a_i} \widehat{p^B_A} R^B_A \alpha_i \right) = \lambda_{a_i} \left( \beta_i^\top \widehat{p^B_A} R^B_A \alpha_i \right). $$
We can divide both sides by $\lambda_{a_i} \neq 0$ and arrive at
$$ 0 = \beta_i^\top \widehat{p^B_A} R^B_A \alpha_i. $$
The quantity
$$ E = \widehat{p^B_A} R^B_A $$
in this expression is called the **essential matrix**.
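
As a sanity check on this derivation, the following sketch builds $E$ from a known relative pose and evaluates $\beta_i^\top E \alpha_i$ for a few synthetic points. The pose, the points, and the variable names are assumptions made for illustration.

```python
import numpy as np

def skew(v):
    # Matrix form of the cross product: skew(v) @ w equals np.cross(v, w)
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

# Illustrative relative pose
t = 0.1
R_inB_ofA = np.array([[np.cos(t), 0., np.sin(t)],
                      [0., 1., 0.],
                      [-np.sin(t), 0., np.cos(t)]])
p_inB_ofA = np.array([-0.5, 0.05, 0.1])

# Synthetic points in front of both cameras
rng = np.random.default_rng(0)
p_inA = rng.uniform(low=[-1., -1., 2.], high=[1., 1., 5.], size=(6, 3))
p_inB = p_inA @ R_inB_ofA.T + p_inB_ofA

# Normalized image coordinates
alpha = p_inA / p_inA[:, 2:3]
beta = p_inB / p_inB[:, 2:3]

# Essential matrix and the value of beta_i^T E alpha_i for each i
E = skew(p_inB_ofA) @ R_inB_ofA
residuals = np.einsum('ij,jk,ik->i', beta, E, alpha)
print(np.max(np.abs(residuals)))
```

The printed value should be zero up to floating-point rounding.
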


### How to estimate the essential matrix

#### Getting an estimate

We have $n$ constraints, each of which is linear in $E$:
$$\begin{align*} 0 &= \beta_1^\top E \alpha_1 \\ &\vdots \\ 0 &= \beta_n^\top E \alpha_n. \end{align*}$$
These constraints can be rewritten in standard form as
$$\begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} (\alpha_1 \otimes \beta_1)^\top \\ \vdots \\ (\alpha_n \otimes \beta_n)^\top \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}$$
where $e_1, e_2, e_3$ are the three **columns** of $E$ and where $\alpha_i \otimes \beta_i$ is the **Kronecker product** of $\alpha_i$ and $\beta_i$ (see [Appendix A.1.3 of 3DV](https://i-share-uiu.primo.exlibrisgroup.com/permalink/01CARLI_UIU/gpjosq/alma99506128312205899), for example). SVD can be applied, as usual, to find a non-trivial solution: the right singular vector associated with the smallest singular value, unstacked column by column into a $3 \times 3$ matrix. Let's call this solution $E^{\prime\prime}$.
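
One possible way to carry out this step with NumPy is sketched below, assuming noise-free synthetic data. The function name `estimate_essential` and the test data are assumptions. The sketch only checks agreement with the true essential matrix up to scale and sign, since that is all the linear constraints can determine.

```python
import numpy as np

def skew(v):
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

def estimate_essential(alpha, beta):
    # Row i of M is (alpha_i kron beta_i)^T
    M = np.array([np.kron(al, be) for al, be in zip(alpha, beta)])
    # Right singular vector associated with the smallest singular value
    _, _, Vt = np.linalg.svd(M)
    e = Vt[-1]
    # e stacks the columns of E, so unstack in column-major order
    return e.reshape((3, 3), order='F')

# Illustrative relative pose and synthetic points
t = 0.1
R_inB_ofA = np.array([[np.cos(t), 0., np.sin(t)],
                      [0., 1., 0.],
                      [-np.sin(t), 0., np.cos(t)]])
p_inB_ofA = np.array([-0.5, 0.05, 0.1])
rng = np.random.default_rng(0)
p_inA = rng.uniform(low=[-1., -1., 2.], high=[1., 1., 5.], size=(12, 3))
p_inB = p_inA @ R_inB_ofA.T + p_inB_ofA
alpha = p_inA / p_inA[:, 2:3]
beta = p_inB / p_inB[:, 2:3]

E_est = estimate_essential(alpha, beta)

# Compare with the true essential matrix, up to scale and sign
E_true = skew(p_inB_ofA) @ R_inB_ofA
E_true_unit = E_true / np.linalg.norm(E_true)
err = min(np.linalg.norm(E_est - E_true_unit),
          np.linalg.norm(E_est + E_true_unit))
print(err)
```
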

#### Normalizing the estimate

Clearly, if $E$ satisfies
$$0 = \beta_i^\top E \alpha_i$$
then so will $sE$ for any $s \neq 0$:
$$0 = s(0) = s \left(\beta_i^\top E \alpha_i\right) = \beta_i^\top (sE) \alpha_i.$$
This scale ambiguity is the same as what we described already. Remember that
$$E = \widehat{p^B_A} R^B_A$$
and so
$$sE = s\left( \widehat{p^B_A} R^B_A \right) = \widehat{(s p^B_A)} R^B_A.$$
That is to say, scaling $E$ is equivalent to scaling $p^B_A$ — the same ambiguity we discussed before.
So, we will have to make a choice. Taking $\| \cdot \|$ of a matrix to mean the Frobenius norm, we *could* choose
$$\| E \| = 1.$$
However, notice that
$$
\begin{align*}
\| E \|
&= \| \widehat{p^B_A} R^B_A \| && \text{by definition} \\
&= \| \widehat{p^B_A} \| && \text{since $R^B_A$ is orthonormal} \\
&= \sqrt{2} \| p^B_A \| && \text{by direct calculation (try it yourself).}
\end{align*}
$$
So,
$$\| E \| = 1$$
corresponds to
$$\| p^B_A \| = 1 / \sqrt{2}.$$
Since the matrix norm of $E$ has no clear meaning, while the vector norm of $p^B_A$ is the distance between frames $A$ and $B$, it probably makes more sense to choose
$$\| E \| = \sqrt{2}, $$
which corresponds to
$$\| p^B_A \| = 1.$$
To summarize, we normalize our estimate as
$$ E^\prime = \left( \sqrt{2} / \| E^{\prime\prime} \| \right) E^{\prime\prime}. $$
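
The relationship between the two norms, and the normalization itself, can be checked with a few lines of NumPy. This is a sketch under the assumption that $\| \cdot \|$ is the Frobenius norm for matrices, which is the default of `np.linalg.norm` for 2-D arrays. The name `normalize_essential` and the sample values are hypothetical.

```python
import numpy as np

def skew(v):
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

def normalize_essential(E_pp):
    # Rescale so that the Frobenius norm is sqrt(2)
    return (np.sqrt(2.) / np.linalg.norm(E_pp)) * E_pp

# The norm of skew(p) is sqrt(2) times the norm of p
p = np.array([0.3, -1.2, 0.7])
print(np.linalg.norm(skew(p)), np.sqrt(2.) * np.linalg.norm(p))

# An arbitrarily scaled essential matrix (rotation about z by 0.2 rad)
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])
E_pp = -0.37 * skew(p) @ R

# After normalization, the implied translation has unit length
E_p = normalize_essential(E_pp)
print(np.linalg.norm(E_p))
```

Note that the negative scale factor survives normalization: `E_p` corresponds to the unit translation `-p / np.linalg.norm(p)`, which is one reason the sign of $E$ still has to be resolved later.
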


#### Correcting the estimate

It is a fact (see [Result 9.17 in Chapter 9.6.1 of Hartley and Zisserman](https://i-share-uiu.primo.exlibrisgroup.com/permalink/01CARLI_UIU/gpjosq/alma99692409012205899) or [Theorem 5.5 in Chapter 5.1.2 of 3DV](https://i-share-uiu.primo.exlibrisgroup.com/permalink/01CARLI_UIU/gpjosq/alma99506128312205899), for example) that any essential matrix $E$ satisfying $\| E \| = \sqrt{2}$ has a singular value decomposition
$$ E = U S V^\top $$
where $U, V \in SO(3)$ are rotation matrices and
$$ S = \text{diag} \left(1, 1, 0\right). $$
Since our method of finding $p^B_A$ and $R^B_A$ will rely on this fact, and since any estimate $E^\prime$ of $E$ will not actually be an essential matrix, it is important to correct $E^\prime$ before proceeding so that it has a singular value decomposition of the form given above. The essential matrix $E$ that minimizes $\|E - E^\prime\|$ and that has the structure we require can be found (see [Theorem 5.9 in Chapter 5.2.1 of 3DV](https://i-share-uiu.primo.exlibrisgroup.com/permalink/01CARLI_UIU/gpjosq/alma99506128312205899), for example) by taking the singular value decomposition
$$ U^\prime S^\prime (V^\prime)^\top = E^\prime, $$
by extracting the columns of $U^\prime$ and $V^\prime$ as
$$
U^\prime = \begin{bmatrix} u_1^\prime & u_2^\prime & u_3^\prime \end{bmatrix}
\qquad\qquad
V^\prime = \begin{bmatrix} v_1^\prime & v_2^\prime & v_3^\prime \end{bmatrix},
$$
and by choosing
$$
\begin{align*}
U &= \begin{bmatrix} u_1^\prime & u_2^\prime & \det(U^\prime) u_3^\prime \end{bmatrix} \\
S &= \text{diag} \left(1, 1, 0\right) \\
V &= \begin{bmatrix} v_1^\prime & v_2^\prime & \det(V^\prime) v_3^\prime \end{bmatrix}.
\end{align*}
$$
[Theorem 5.9 in Chapter 5.2.1 of 3DV](https://i-share-uiu.primo.exlibrisgroup.com/permalink/01CARLI_UIU/gpjosq/alma99506128312205899), for example, gives a proof of this result.

Note that the way I've suggested to make sure $U, V \in SO(3)$ — by changing, if necessary, the sign of $u_3^\prime$ and $v_3^\prime$ — is different than what is suggested by [Remark 5.10 of 3DV](https://i-share-uiu.primo.exlibrisgroup.com/permalink/01CARLI_UIU/gpjosq/alma99506128312205899). My way is a little more direct, I think. It's clear that if $U$ is orthonormal, then flipping the sign of its third column will flip the sign of its determinant but leave it orthonormal — same for $V$. In general, doing so would change the product $USV^\top$, and so would produce an invalid decomposition. However, since the third singular value of the essential matrix $E$ is zero, **any** change to the third column of $U$ and $V$ leaves $USV^\top$ invariant:
$$
\begin{align*}
USV^\top &= (1) u_1v_1^\top + (1) u_2v_2^\top + (0) u_3v_3^\top \\
&= u_1v_1^\top + u_2v_2^\top.
\end{align*}
$$
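
A minimal sketch of this correction step is shown below. The function name `correct_essential` and the noisy input are assumptions. The sketch uses the sign of the determinant rather than the determinant itself, which is the same thing for an orthogonal matrix apart from floating-point rounding.

```python
import numpy as np

def correct_essential(E_p):
    # Singular value decomposition of the normalized estimate
    U, _, Vt = np.linalg.svd(E_p)
    V = Vt.T.copy()
    # Flip the third columns, if necessary, so that U and V are rotations
    U[:, 2] *= np.sign(np.linalg.det(U))
    V[:, 2] *= np.sign(np.linalg.det(V))
    # Replace the singular values with (1, 1, 0)
    S = np.diag([1., 1., 0.])
    return U @ S @ V.T, U, V

# Illustrative input: a true essential matrix plus a small perturbation
rng = np.random.default_rng(0)
E_exact = np.array([[0., 0., 0.],
                    [0., 0., -1.],
                    [0., 1., 0.]])
E_p = E_exact + 0.01 * rng.standard_normal((3, 3))

E, U, V = correct_essential(E_p)
print(np.linalg.svd(E, compute_uv=False))
print(np.linalg.det(U), np.linalg.det(V))
```
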

### How to decompose the essential matrix

#### The four possible decompositions

We have an estimate $E$ of the essential matrix that satisfies
$$ \| E \| = \sqrt{2} $$
and that has the singular value decomposition
$$ U S V^\top = E $$
where
$$ U, V \in SO(3) $$
are rotation matrices and where
$$ S = \text{diag} \left(1, 1, 0\right). $$
Now, we want to find the position $p^B_A$ and orientation $R^B_A$ for which
$$ E = \widehat{p^B_A} R^B_A. $$
Define the matrix
$$ W = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. $$
There are exactly two possibilities:

* $(R^B_A, p^B_A) = (U W^\top V^\top, u_3)$
* $(R^B_A, p^B_A) = (U W V^\top, -u_3)$

[Theorem 5.7 in Chapter 5.1.2 of 3DV](https://i-share-uiu.primo.exlibrisgroup.com/permalink/01CARLI_UIU/gpjosq/alma99506128312205899), for example, gives a proof of this result. Since $E$ and $-E$ are "the same" up to scale, we have exactly two more possibilities:

* $(R^B_A, p^B_A) = (U W^\top V^\top, -u_3)$
* $(R^B_A, p^B_A) = (U W V^\top, u_3)$

We emphasize that these second two possibilities will satisfy
$$ -E = \widehat{p^B_A} R^B_A $$
and not
$$ E = \widehat{p^B_A} R^B_A. $$

[Result 9.18 and Result 9.19 in Chapter 9.6.2 of Hartley and Zisserman](https://i-share-uiu.primo.exlibrisgroup.com/permalink/01CARLI_UIU/gpjosq/alma99692409012205899) derive these same four possibilities. I find their derivation a little more confusing, because they couple the pairs that correspond to $+u_3$ and $-u_3$ rather than the pairs that correspond to $E$ and $-E$.
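
The four candidates can be enumerated and checked numerically. In the sketch below, the function name `decompose_essential` and the "true" pose are assumptions. The essential matrix is built from a known pose with unit translation, so we can confirm that the first two candidates reproduce $E$, the second two reproduce $-E$, and one of the four is the true pose.

```python
import numpy as np

def skew(v):
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

def decompose_essential(U, V):
    # U and V are assumed to be rotation matrices with E = U diag(1, 1, 0) V^T
    W = np.array([[0., -1., 0.],
                  [1., 0., 0.],
                  [0., 0., 1.]])
    u3 = U[:, 2]
    R1 = U @ W.T @ V.T
    R2 = U @ W @ V.T
    # First two candidates correspond to +E, last two to -E
    return [(R1, u3), (R2, -u3), (R1, -u3), (R2, u3)]

# Illustrative true pose with unit translation
t = 0.1
R_true = np.array([[np.cos(t), 0., np.sin(t)],
                   [0., 1., 0.],
                   [-np.sin(t), 0., np.cos(t)]])
p_true = np.array([-0.5, 0.05, 0.1])
p_true = p_true / np.linalg.norm(p_true)
E = skew(p_true) @ R_true

# SVD with U and V forced to be rotations
U, _, Vt = np.linalg.svd(E)
V = Vt.T.copy()
U[:, 2] *= np.sign(np.linalg.det(U))
V[:, 2] *= np.sign(np.linalg.det(V))

candidates = decompose_essential(U, V)
matches = [bool(np.allclose(R, R_true) and np.allclose(p, p_true))
           for R, p in candidates]
print(matches)
```

Which entry of `matches` is true depends on the particular decomposition returned by the SVD routine, so the candidates still have to be tested one at a time.
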

#### Choosing amongst the four

Which of the four possible decompositions of $E$ should we choose? Remember that this whole story began with the assumption that points $\mathrm{p}_1, \dotsc, \mathrm{p}_n$ are visible in both image $A$ and image $B$. "Visible" means "have positive depth" — in other words, it must be the case that
$$
\lambda_{a_i} > 0 \text{ and } \lambda_{b_i} > 0 \text{ for all } i \in \{1, \dotsc, n\}.
$$
This will only be true for *one* of the four possible choices of $R^B_A$ and $p^B_A$, as is explained in [Chapter 9.6.3 of Hartley and Zisserman](https://i-share-uiu.primo.exlibrisgroup.com/permalink/01CARLI_UIU/gpjosq/alma99692409012205899) (for example). So, in order to choose, we will need to estimate $\lambda_{a_i}$ and $\lambda_{b_i}$ (equivalently, to estimate $p^A_i$ and $p^B_i$) for all $i \in \{1, \dotsc, n\}$.

### Triangulation

Let's return to our governing equation:

$$ \lambda_{b_i} \beta_i = \lambda_{a_i} R^B_A \alpha_i + p^B_A. $$

Suppose we know $R^B_A$ and $p^B_A$. The process of estimating $\lambda_{a_i}$ and $\lambda_{b_i}$ is known as *triangulation* or as *structure computation*.

Given $\lambda_{a_i}$, it is easy to find $\lambda_{b_i}$. Remember that
$$ \beta_i = K^{-1} \begin{bmatrix} b_i \\ 1 \end{bmatrix} = \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$
for some $u, v \in \mathbb{R}$. So,
$$
\lambda_{b_i}
= \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}^\top \left( \lambda_{b_i} \beta_i \right)
= \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}^\top \left( \lambda_{a_i} R^B_A \alpha_i + p^B_A \right).
$$

Our strategy, therefore, will be to eliminate $\lambda_{b_i}$ from the governing equation and solve first for $\lambda_{a_i}$. We can do this by taking the cross product of $\beta_i$ with both sides:
$$
\begin{align*}
0
&= \widehat{\beta_i} \lambda_{b_i} \beta_i && \text{since the cross product of a vector with itself is zero} \\
&= \widehat{\beta_i} \left( \lambda_{a_i} R^B_A \alpha_i + p^B_A \right) \\
&= \left( \widehat{\beta_i} R^B_A \alpha_i \right) \lambda_{a_i} + \widehat{\beta_i} p^B_A.
\end{align*}
$$
The least-squares solution is
$$ \lambda_{a_i} = -\left( \widehat{\beta_i} R^B_A \alpha_i \right)^\dagger \widehat{\beta_i} p^B_A $$
where "$\dagger$" denotes the matrix pseudo-inverse (a.k.a. the "generalized" or "Moore Penrose" inverse). Since $\lambda_{a_i} \in \mathbb{R}$, we can write this solution more simply as
$$ \lambda_{a_i} = \frac{u^\top v}{u^\top u} $$
where
$$ u = \widehat{\beta_i} R^B_A \alpha_i $$
and
$$ v = - \widehat{\beta_i} p^B_A.$$

Given $\lambda_{a_i}$ and $\lambda_{b_i}$, we compute
$$
p^A_i = \lambda_{a_i} \alpha_i
\qquad\text{and}\qquad
p^B_i = \lambda_{b_i} \beta_i.
$$
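
For a single point, the triangulation formulas above can be sketched as follows. The function name `triangulate_point` and the test values are assumptions. With noise-free data the least-squares solution recovers the true depths, which the sketch checks.

```python
import numpy as np

def skew(v):
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

def triangulate_point(alpha_i, beta_i, R_inB_ofA, p_inB_ofA):
    # Least-squares depth in image A from 0 = u * lambda_a - v
    u = skew(beta_i) @ R_inB_ofA @ alpha_i
    v = -(skew(beta_i) @ p_inB_ofA)
    lam_a = (u @ v) / (u @ u)
    # Depth in image B is the third element of lambda_a R alpha + p
    lam_b = (lam_a * (R_inB_ofA @ alpha_i) + p_inB_ofA)[2]
    # Points in frame A and in frame B
    return lam_a * alpha_i, lam_b * beta_i

# Illustrative relative pose and one true point
t = 0.1
R_inB_ofA = np.array([[np.cos(t), 0., np.sin(t)],
                      [0., 1., 0.],
                      [-np.sin(t), 0., np.cos(t)]])
p_inB_ofA = np.array([-0.5, 0.05, 0.1])
p_inA_true = np.array([0.2, -0.1, 2.0])
p_inB_true = R_inB_ofA @ p_inA_true + p_inB_ofA

# Normalized image coordinates of that point
alpha_i = p_inA_true / p_inA_true[2]
beta_i = p_inB_true / p_inB_true[2]

p_inA_i, p_inB_i = triangulate_point(alpha_i, beta_i, R_inB_ofA, p_inB_ofA)
print(np.allclose(p_inA_i, p_inA_true), np.allclose(p_inB_i, p_inB_true))
```
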


### Checking that results are correct

We use a synthetic dataset in this notebook, and so have true values to which estimates can be compared.

The estimate of $R^B_A$ was exact (i.e., unscaled) and can be compared directly to its true value.

The estimates of $p^B_A$, $p^A_1, \dotsc, p^A_n$, and $p^B_1, \dotsc, p^B_n$ were all scaled to be consistent with
$$ \| p^B_A \| = 1. $$

If the true distance between frames $A$ and $B$ is $s$ instead of $1$, then we should compare the scaled estimates
$sp^B_A$, $sp^A_1, \dotsc, sp^A_n$, and $sp^B_1, \dotsc, sp^B_n$
to the true values.

# Practice

## Set up notebook

Do all imports.

In [None]:
import numpy as np
from scipy.spatial.transform import Rotation
from scipy.linalg import block_diag
import cv2

Create random number generator with a particular seed so we can reproduce results.

In [None]:
rng = np.random.default_rng(0)

Define a function that constructs the skew-symmetric matrix

$$ \widehat{v} = \begin{bmatrix} 0 & -v_3 & v_2 \\ v_3 & 0 & -v_1 \\ -v_2 & v_1 & 0 \end{bmatrix} \in \mathbb{R}^{3 \times 3} $$

that is associated with a vector

$$ v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \in \mathbb{R}^3. $$

In [None]:
def skew(v):
    assert isinstance(v, np.ndarray)
    assert v.shape == (3,)
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

Define function to perform coordinate transformation.

In [None]:
def apply_transform(R_inB_ofA, p_inB_ofA, p_inA):
    p_inB = np.vstack([
        (R_inB_ofA @ p_inA_i + p_inB_ofA) for p_inA_i in p_inA
    ])
    return p_inB

Define a function to print things nicely.

In [None]:
def myprint(M):
    if M.shape:
        with np.printoptions(linewidth=150, formatter={'float': lambda x: f'{x:10.4f}'}):
            print(M)
    else:
        print(f'{M:10.4f}')

## Create dataset

Choose intrinsic parameters, i.e., the camera matrix

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}.$$

In [None]:
K = np.array([
    [1500., 0., 1000.],
    [0., 1500., 500.],
    [0., 0., 1.],
])

Choose extrinsic parameters, i.e., the poses **of frame A in frame W** and **of frame B in frame W**.

In [None]:
# Frame A
R_inW_ofA = Rotation.from_rotvec((0.05 * np.pi) * np.array([1., 0., 0.])).as_matrix()
p_inW_ofA = np.array([0.0, 0.0, -1.0])

# Frame B
R_inW_ofB = Rotation.from_rotvec((0.05 * np.pi) * np.array([0., 1., 0.])).as_matrix()
p_inW_ofB = np.array([0.5, 0.0, -1.1])

Find pose **of frame $A$ in frame $B$**.

In [None]:
R_inB_ofA_true = R_inW_ofB.T @ R_inW_ofA
p_inB_ofA_true = R_inW_ofB.T @ (p_inW_ofA - p_inW_ofB)

Sample points $p^W_1, \dotsc, p^W_n$ **in frame W**.

In [None]:
n = 10
p_inW = rng.uniform(low=[-1., -1., -0.5], high=[1., 1., 2.5], size=(n, 3))

Find coordinates $p^A_1, \dotsc, p^A_n$ and $p^B_1, \dotsc, p^B_n$ of these same points **in frame $A$** and **in frame $B$**.

In [None]:
p_inA_true = apply_transform(R_inW_ofA.T, -R_inW_ofA.T @ p_inW_ofA, p_inW)
p_inB_true = apply_transform(R_inW_ofB.T, -R_inW_ofB.T @ p_inW_ofB, p_inW)

Project points into the images.

In [None]:
def project(K, R_inW_ofC, p_inW_ofC, p_inW):
    p_inC = apply_transform(R_inW_ofC.T, -R_inW_ofC.T @ p_inW_ofC, p_inW)
    assert(np.all(p_inC[:, 2] > 0))
    q = np.vstack([K @ p_inC_i / p_inC_i[2] for p_inC_i in p_inC])
    return q[:, 0:2]

a = project(K, R_inW_ofA, p_inW_ofA, p_inW)
b = project(K, R_inW_ofB, p_inW_ofB, p_inW)

Knowns:

* `a` and `b` are the image coordinates $a_1, \dotsc, a_n$ and $b_1, \dotsc, b_n$ of projected points
* `K` is the camera matrix $K$

Unknowns:

* `p_inA_true` is the true value of $p^A_1, \dotsc, p^A_n$
* `p_inB_true` is the true value of $p^B_1, \dotsc, p^B_n$
* `R_inB_ofA_true` is the true value of $R^B_A$
* `p_inB_ofA_true` is the true value of $p^B_A$

## Get reference solution with OpenCV

Estimate $R^B_A$ and $p^B_A$.

In [None]:
# Get solution
num_inliers_cv, E_cv, R_inB_ofA_cv, p_inB_ofA_cv, mask_cv = cv2.recoverPose(
    a.copy(),
    b.copy(),
    K, np.zeros(4),
    K, np.zeros(4),
)

# Flatten the position (returned as a 2d array by default)
p_inB_ofA_cv = p_inB_ofA_cv.flatten()

Estimate $p^A_1, \dotsc, p^A_n$.

In [None]:
points = cv2.triangulatePoints(
    K @ np.column_stack([np.eye(3), np.zeros(3)]),
    K @ np.column_stack([R_inB_ofA_cv, p_inB_ofA_cv]),
    a.copy().T,
    b.copy().T,
)

# Normalize points
points /= points[-1, :]

# Extract non-homogeneous coordinates
p_inA_cv = points[0:3, :].T

Check that results are correct.

In [None]:
# Relative orientation is correct
assert(np.allclose(R_inB_ofA_cv, R_inB_ofA_true))

# Make sure estimated distance between frame A and frame B is 1
assert(np.isclose(np.linalg.norm(p_inB_ofA_cv), 1.))

# Find scale
s = np.linalg.norm(p_inB_ofA_true)

# Apply scale to estimates
p_inB_ofA_cv_scaled = s * p_inB_ofA_cv
p_inA_cv_scaled = s * p_inA_cv

# Scaled estimate of relative position is correct
assert(np.allclose(p_inB_ofA_cv_scaled, p_inB_ofA_true))

# Scaled estimate of points in frame A is correct
assert(np.allclose(p_inA_cv_scaled, p_inA_true))

## Get solution with your own code

Define a function to do triangulation (i.e., estimate $p^A_1, \dotsc, p^A_n$ and $p^B_1, \dotsc, p^B_n$ given $R^B_A$, $p^B_A$, $\alpha_1, \dotsc, \alpha_n$, and $\beta_1, \dotsc, \beta_n$).

In [None]:
def triangulate(alpha, beta, R_inB_ofA, p_inB_ofA):
    # Get scales
    # ... FIXME ...

    # Get points
    # ... FIXME ...
    p_inA = None
    p_inB = None

    return p_inA, p_inB

Define a function to do two-view reconstruction (i.e., estimate $E$, $R^B_A$, $p^B_A$, $p^A_1, \dotsc, p^A_n$ and $p^B_1, \dotsc, p^B_n$ given $a_1, \dotsc, a_n$, and $b_1, \dotsc, b_n$). The results should be scaled so that $\| p^B_A \| = 1$.

In [None]:
def twoview(a, b, K):
    # Normalize image coordinates
    # ... FIXME ...

    # Estimate essential matrix
    # ... FIXME ...

    # Normalize essential matrix
    # ... FIXME ...

    # Correct essential matrix
    # ... FIXME ...

    # Decompose essential matrix and do triangulation
    # - Check first solution (if consistent: return E, R_inB_ofA, p_inB_ofA, p_inA, p_inB)
    # ... FIXME ...
    # - Check second solution (if consistent: return E, R_inB_ofA, p_inB_ofA, p_inA, p_inB)
    # ... FIXME ...
    # - Etc.

    # Raise exception if no solution was found
    raise Exception('Failed to find a solution')

Apply function to do two-view reconstruction.

In [None]:
E, R_inB_ofA, p_inB_ofA, p_inA, p_inB = twoview(a, b, K)

Check that results are correct.

In [None]:
# Relative orientation is correct
assert(np.allclose(R_inB_ofA, R_inB_ofA_true))

# Make sure estimated distance between frame A and frame B is 1
assert(np.isclose(np.linalg.norm(p_inB_ofA), 1.))

# Find scale
s = np.linalg.norm(p_inB_ofA_true)

# Apply scale to estimates
p_inB_ofA_scaled = s * p_inB_ofA
p_inA_scaled = s * p_inA
p_inB_scaled = s * p_inB

# Scaled estimate of relative position is correct
assert(np.allclose(p_inB_ofA_scaled, p_inB_ofA_true))

# Scaled estimate of points in frame A is correct
assert(np.allclose(p_inA_scaled, p_inA_true))

# Scaled estimate of points in frame B is correct
assert(np.allclose(p_inB_scaled, p_inB_true))

# Reflection

Answer the following questions:

* How many points are required (in general) by your method? In other words, what is the minimum value of $n$ for which your method would still produce a solution? Form and test a hypothesis.
* What happens if the origin of frame $A$ and the origin of frame $B$ are at the same point? Would your method still work? Form and test a hypothesis.
* Are there arrangements of points (still visible in image $A$ and image $B$) for which your method would fail, even if the number $n$ of points is (in general) big enough? Form and test a hypothesis.
* How robust is your method to noisy data? Read [Chapter 11.2 of Hartley and Zisserman](https://i-share-uiu.primo.exlibrisgroup.com/permalink/01CARLI_UIU/gpjosq/alma99692409012205899). Form and test a hypothesis. Consider changing your method, as suggested by the reference text, to make it more robust to noisy data.

When you are ready, try applying your method to real data (e.g., to feature matches from a pair of real images) rather than synthetic data.