# Activity: Resectioning

"Resectioning" is the standard way to add a new image to an existing reconstruction in structure-from-motion. Given feature matches from a new image to points whose positions have already been estimated, we want to find the camera pose from which the new image was taken. This is also called the "perspective-$n$-point" (PnP) problem.

# Theory

### The problem

Suppose you know the position
$$p^A_1, \dotsc, p^A_n \in \mathbb{R}^3$$
of $n$ points in frame $A$. Suppose these same points are visible in image $C$ and that their projection is
$$c_1, \dotsc, c_n \in \mathbb{R}^2.$$
The problem of **resectioning** is to find the pose $(R^C_A, p^C_A)$ of frame $A$ in frame $C$.

### The solution

Our camera model is
$$ \lambda_{c_i} \begin{bmatrix} c_i \\ 1 \end{bmatrix} = K p^C_i = K \left( R^C_A p^A_i + p^C_A \right) $$
where $\lambda_{c_i}$ is the $z$ coordinate of $p^C_i$ (i.e., the depth of point $i$ in image $C$). Equivalently, in terms of normalized image coordinates
$$ \gamma_i = K^{-1} \begin{bmatrix} c_i \\ 1 \end{bmatrix}, $$
our camera model is
$$ \lambda_{c_i} \gamma_i = R^C_A p^A_i + p^C_A. $$
To estimate $R^C_A$ and $p^C_A$, we need to eliminate $\lambda_{c_i}$ from this equation. We can do this by taking the cross product of $\gamma_i$ with both sides (yet again, we make use of the fact that the cross product of a vector with itself is zero):
$$
\begin{align*}
0
&= \widehat{\gamma_i} \lambda_{c_i} \gamma_i \\
&= \widehat{\gamma_i} \left( R^C_A p^A_i + p^C_A \right) \\
&= \widehat{\gamma_i} R^C_A p^A_i + \widehat{\gamma_i} p^C_A.
\end{align*}
$$
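Two facts are used above: $\lambda_{c_i} \gamma_i = p^C_i$, and multiplying by $\widehat{\gamma_i}$ eliminates $\lambda_{c_i}$. Both can be checked numerically. The following is a minimal sketch. The pose, the point, and the seed are arbitrary values chosen for illustration, and `skew` is the same helper that is defined in the Practice section.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def skew(v):
    # Skew-symmetric matrix such that skew(v) @ w == np.cross(v, w)
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])


rng = np.random.default_rng(1)

# Arbitrary (illustrative) intrinsics and pose of frame A in frame C
K = np.array([[1500., 0., 1000.],
              [0., 1500., 500.],
              [0., 0., 1.]])
R_inC_ofA = Rotation.from_rotvec([0.1, -0.2, 0.05]).as_matrix()
p_inC_ofA = np.array([0.2, -0.1, 3.0])

# One point in frame A, and the same point expressed in frame C
p_inA_i = rng.uniform(-1., 1., size=3)
p_inC_i = R_inC_ofA @ p_inA_i + p_inC_ofA

# Pixel coordinates c_i, then normalized coordinates gamma_i = K^{-1} [c_i; 1]
c_i = (K @ p_inC_i)[:2] / p_inC_i[2]
gamma_i = np.linalg.solve(K, np.append(c_i, 1.))
lambda_i = p_inC_i[2]  # depth of the point in image C

# Residual of the camera model, and residual after the cross product with gamma_i
model_residual = lambda_i * gamma_i - (R_inC_ofA @ p_inA_i + p_inC_ofA)
cross_residual = skew(gamma_i) @ (R_inC_ofA @ p_inA_i + p_inC_ofA)
print(np.abs(model_residual).max(), np.abs(cross_residual).max())
```

Both printed residuals should be at the level of floating-point round-off.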
The term
$$ \widehat{\gamma_i} R^C_A p^A_i $$
is linear in
$$ R^C_A = \begin{bmatrix} x^C_A & y^C_A & z^C_A \end{bmatrix} $$
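and can be rewritten in terms of the Kronecker product as
$$ \left( \left(p^A_i\right)^\top \otimes \widehat{\gamma_i} \right) \begin{bmatrix} x^C_A \\ y^C_A \\ z^C_A \end{bmatrix}, $$
where the transpose makes $\left(p^A_i\right)^\top \otimes \widehat{\gamma_i}$ the $3 \times 9$ matrix $\begin{bmatrix} p^A_{i,1} \widehat{\gamma_i} & p^A_{i,2} \widehat{\gamma_i} & p^A_{i,3} \widehat{\gamma_i} \end{bmatrix}$ (with $p^A_{i,1}, p^A_{i,2}, p^A_{i,3}$ the coordinates of $p^A_i$), since $R^C_A p^A_i = p^A_{i,1} x^C_A + p^A_{i,2} y^C_A + p^A_{i,3} z^C_A$.
So, our constraint can be rewritten as
$$ 0 = \begin{bmatrix} \left(p^A_i\right)^\top \otimes \widehat{\gamma_i}  & \widehat{\gamma_i} \end{bmatrix} \begin{bmatrix} x^C_A \\ y^C_A \\ z^C_A \\ p^C_A \end{bmatrix}. $$
We have one such constraint for each $i \in \{1, \dotsc, n\}$. Stacking them gives a homogeneous linear system in the $12$ unknown entries of $R^C_A$ and $p^C_A$:
$$
\begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}
=
\begin{bmatrix}
\left(p^A_1\right)^\top \otimes \widehat{\gamma_1}  & \widehat{\gamma_1} \\
\vdots & \vdots \\
\left(p^A_n\right)^\top \otimes \widehat{\gamma_n}  & \widehat{\gamma_n}
\end{bmatrix}
\begin{bmatrix} x^C_A \\ y^C_A \\ z^C_A \\ p^C_A \end{bmatrix}
$$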
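and the singular value decomposition (SVD) can be used to find a non-trivial solution: take the right singular vector that corresponds to the smallest singular value. This vector is only determined up to a nonzero (and possibly negative) scale factor, so it needs to be corrected in three ways:

* First, divide it (the whole solution, including $p^C_A$) by $\| x^C_A \|$ so that $\| x^C_A \| = 1$, as it must be for a column of a rotation matrix. This corrects the magnitude of the scale.
* Second, multiply it (the whole solution, including $p^C_A$) by the sign of $\det\left( \begin{bmatrix} x^C_A & y^C_A & z^C_A \end{bmatrix} \right)$ so that the columns form a right-handed frame. This corrects the sign of the scale.
* Third, compute a singular value decomposition $USV^\top = \begin{bmatrix} x^C_A & y^C_A & z^C_A \end{bmatrix}$ and define $R^C_A = \det(UV^\top) UV^\top$, which makes sure that $R^C_A \in SO(3)$ even when the data are noisy. The estimate of $p^C_A$ is the last three entries of the corrected solution.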
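Before applying the corrections listed above, a numerical sketch can confirm two things. First, $\widehat{\gamma_i} R^C_A p^A_i$ equals its Kronecker-product form. In NumPy, $p^A_i$ is treated as a $1 \times 3$ row so that `np.kron` returns a $3 \times 9$ block, and the columns of $R^C_A$ are stacked with `flatten(order='F')`. Second, the right singular vector for the smallest singular value of the stacked matrix is a scaled copy of the true solution. The pose, points, and seed are arbitrary illustrative choices. The normalized coordinates are computed directly as $p^C_i / \lambda_{c_i}$, which skips $K$.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def skew(v):
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])


rng = np.random.default_rng(2)

# Arbitrary (illustrative) pose of frame A in frame C, and points in frame A
R_inC_ofA = Rotation.from_rotvec([0.1, -0.2, 0.05]).as_matrix()
p_inC_ofA = np.array([0.2, -0.1, 3.0])
p_inA = rng.uniform(-1., 1., size=(10, 3))

# Normalized image coordinates gamma_i = p^C_i / lambda_i (one per row)
p_inC = p_inA @ R_inC_ofA.T + p_inC_ofA
gamma = p_inC / p_inC[:, 2:3]

# Column stacking [x; y; z] of R is a Fortran-order flatten
vec_R = R_inC_ofA.flatten(order='F')

# Kronecker identity for the first point
G = skew(gamma[0])
lhs = G @ R_inC_ofA @ p_inA[0]
rhs = np.kron(p_inA[0].reshape(1, 3), G) @ vec_R

# Stack one 3 x 12 block [p_i^T kron hat(gamma_i), hat(gamma_i)] per point
M = np.vstack([
    np.hstack([np.kron(p.reshape(1, 3), skew(g)), skew(g)])
    for p, g in zip(p_inA, gamma)
])

# Right singular vector for the smallest singular value
_, _, Vt = np.linalg.svd(M)
x_svd = Vt[-1]
x_true = np.concatenate([vec_R, p_inC_ofA])

# x_svd should be parallel to x_true (up to an unknown, possibly negative, scale)
scale = (x_svd @ x_true) / (x_true @ x_true)
print(M.shape, np.abs(lhs - rhs).max(), np.abs(x_svd - scale * x_true).max())
```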

There is no scale ambiguity after these three corrections have been made. Because the columns of $R^C_A$ must have unit length, the scale of $p^C_A$ will be consistent with the scale of $p^A_1, \dotsc, p^A_n$.
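Putting the whole derivation together gives the following minimal sketch of the linear method. It is one possible implementation under the assumptions of this activity: noise-free or mildly noisy data, and at least six points in general position. The function name `resection_sketch` is hypothetical, chosen so that it does not clash with the `resection` function you are asked to write below. The test pose and points are arbitrary.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def skew(v):
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])


def resection_sketch(p_inA, c, K):
    # Normalized image coordinates gamma_i = K^{-1} [c_i; 1], one per row
    c_h = np.hstack([c, np.ones((c.shape[0], 1))])
    gamma = np.linalg.solve(K, c_h.T).T

    # Stack one 3 x 12 block [p_i^T kron hat(gamma_i), hat(gamma_i)] per point
    M = np.vstack([
        np.hstack([np.kron(p.reshape(1, 3), skew(g)), skew(g)])
        for p, g in zip(p_inA, gamma)
    ])

    # Non-trivial solution: right singular vector for the smallest singular value
    sol = np.linalg.svd(M)[2][-1]

    # Correction 1: scale so that ||x^C_A|| = 1
    sol = sol / np.linalg.norm(sol[0:3])

    # Correction 2: flip the sign if [x y z] is left-handed
    sol = sol * np.sign(np.linalg.det(sol[0:9].reshape(3, 3, order='F')))

    # Correction 3: project [x y z] onto SO(3)
    U, _, Vt = np.linalg.svd(sol[0:9].reshape(3, 3, order='F'))
    R_inC_ofA = np.linalg.det(U @ Vt) * (U @ Vt)
    p_inC_ofA = sol[9:12]
    return R_inC_ofA, p_inC_ofA


# Illustrative test data: arbitrary intrinsics, pose, and points
rng = np.random.default_rng(0)
K = np.array([[1500., 0., 1000.],
              [0., 1500., 500.],
              [0., 0., 1.]])
R_true = Rotation.from_rotvec([0.1, -0.2, 0.05]).as_matrix()
p_true = np.array([0.2, -0.1, 3.0])
p_inA = rng.uniform(-1., 1., size=(10, 3))

# Project the points into the image (all have positive depth here)
p_inC = p_inA @ R_true.T + p_true
c = (p_inC @ K.T)[:, 0:2] / p_inC[:, 2:3]

R_est, p_est = resection_sketch(p_inA, c, K)
print(np.allclose(R_est, R_true), np.allclose(p_est, p_true))
```

On noise-free data like this, the sketch should recover the pose to within round-off. With noisy data, the estimate will degrade, which is the subject of the last reflection question.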

# Practice

## Set up notebook

Do all imports.

In [None]:
import numpy as np
from scipy.spatial.transform import Rotation
from scipy.linalg import block_diag
import cv2

Create a random number generator with a fixed seed so that results are reproducible.

In [None]:
rng = np.random.default_rng(0)

Define a function that constructs the skew-symmetric matrix

$$ \widehat{v} = \begin{bmatrix} 0 & -v_3 & v_2 \\ v_3 & 0 & -v_1 \\ -v_2 & v_1 & 0 \end{bmatrix} \in \mathbb{R}^{3 \times 3} $$

that is associated with a vector

$$ v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \in \mathbb{R}^3. $$

In [None]:
def skew(v):
    assert(type(v) == np.ndarray)
    assert(v.shape == (3,))
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])
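A quick way to sanity-check `skew` is to confirm three properties for arbitrary vectors: $\widehat{v} w = v \times w$, $\widehat{v}$ is skew-symmetric, and $\widehat{v} v = 0$. The last property is the one used to eliminate depth in the Theory section. This sketch compares against NumPy's `np.cross`. The test vectors are arbitrary.

```python
import numpy as np


def skew(v):
    assert(type(v) == np.ndarray)
    assert(v.shape == (3,))
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])


rng = np.random.default_rng(3)
v = rng.standard_normal(3)
w = rng.standard_normal(3)

# skew(v) @ w should match the cross product v x w
print(np.allclose(skew(v) @ w, np.cross(v, w)))
# skew(v) should be skew-symmetric, and skew(v) @ v should be zero
print(np.allclose(skew(v).T, -skew(v)), np.allclose(skew(v) @ v, 0.))
```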

Define a function to perform a coordinate transformation.

In [None]:
def apply_transform(R_inB_ofA, p_inB_ofA, p_inA):
    # Map each row p_inA_i to p_inB_i = R_inB_ofA @ p_inA_i + p_inB_ofA
    p_inB = np.vstack([
        (R_inB_ofA @ p_inA_i + p_inB_ofA) for p_inA_i in p_inA
    ])
    return p_inB

Define a function to print things nicely.

In [None]:
def myprint(M):
    if M.shape:
        with np.printoptions(linewidth=150, formatter={'float': lambda x: f'{x:10.4f}'}):
            print(M)
    else:
        print(f'{M:10.4f}')

## Create dataset

Choose intrinsic parameters, i.e., the camera matrix

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}.$$

In [None]:
K = np.array([
    [1500., 0., 1000.],
    [0., 1500., 500.],
    [0., 0., 1.],
])

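Choose extrinsic parameters, i.e., the pose **of frame $A$ in frame $C$**. To do this, we choose the pose of frame $A$ and the pose of frame $C$ in a world frame $W$, and then compute the pose of $A$ in $C$ from them.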

In [None]:
# A in W
R_inW_ofA = Rotation.from_rotvec((0.05 * np.pi) * np.array([1., 0., 0.])).as_matrix()
p_inW_ofA = np.array([0.0, 0.0, -1.0])

# C in W
R_inW_ofC = Rotation.from_rotvec((0.05 * np.pi) * np.array([0., 0., 1.])).as_matrix()
p_inW_ofC = np.array([0.2, -0.1, -0.8])

# A in C
R_inC_ofA_true = R_inW_ofC.T @ R_inW_ofA
p_inC_ofA_true = R_inW_ofC.T @ (p_inW_ofA - p_inW_ofC)
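The last two lines follow from expressing the same point in world coordinates in two ways:

$$
\begin{align*}
R^W_C p^C + p^W_C &= p^W = R^W_A p^A + p^W_A \\
\Rightarrow \quad p^C &= \left(R^W_C\right)^\top R^W_A p^A + \left(R^W_C\right)^\top \left( p^W_A - p^W_C \right).
\end{align*}
$$

So $R^C_A = \left(R^W_C\right)^\top R^W_A$ and $p^C_A = \left(R^W_C\right)^\top \left( p^W_A - p^W_C \right)$, which is what `R_inC_ofA_true` and `p_inC_ofA_true` compute.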

Sample points $p^A_1, \dotsc, p^A_n$. We assume (1) that these points are already part of a reconstruction, i.e., that their positions have already been estimated and so are "known," and (2) that these points are visible in image $C$.


In [None]:
n = 10
p_inW = rng.uniform(low=[-1., -1., -0.5], high=[1., 1., 2.5], size=(n, 3))
p_inA = apply_transform(R_inW_ofA.T, -R_inW_ofA.T @ p_inW_ofA, p_inW)
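The call to `apply_transform` above uses the pose of $W$ in $A$, which is the inverse of the pose of $A$ in $W$. Solving $p^W = R^W_A p^A + p^W_A$ for $p^A$ gives

$$ p^A = \left(R^W_A\right)^\top p^W - \left(R^W_A\right)^\top p^W_A, $$

so $R^A_W = \left(R^W_A\right)^\top$ and $p^A_W = -\left(R^W_A\right)^\top p^W_A$, matching the arguments `R_inW_ofA.T` and `-R_inW_ofA.T @ p_inW_ofA`.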

Project points into image $C$.

In [None]:
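def project(K, R_inC_ofA, p_inC_ofA, p_inA):
    # Express the points in frame C; all must be in front of the camera
    p_inC = apply_transform(R_inC_ofA, p_inC_ofA, p_inA)
    assert(np.all(p_inC[:, 2] > 0))
    # Apply the camera matrix and divide by depth to get pixel coordinates
    q = np.vstack([K @ p_inC_i / p_inC_i[2] for p_inC_i in p_inC])
    return q[:, 0:2]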

c = project(K, R_inC_ofA_true, p_inC_ofA_true, p_inA)

Knowns:

* `p_inA` contains the coordinates $p^A_1, \dotsc, p^A_n$ of the points in frame $A$
* `c` contains the coordinates $c_1, \dotsc, c_n$ of their projections in image $C$

Unknowns:

* `R_inC_ofA_true` is the true value of $R^C_A$
* `p_inC_ofA_true` is the true value of $p^C_A$

## Get reference solution with OpenCV

Estimate $R^C_A$ and $p^C_A$.

In [None]:
retval, rvec, tvec = cv2.solvePnP(
    p_inA,          # 3D points (in A)
    c.copy(),       # 2D points (in image C)
    K,              # camera matrix
    np.zeros(4),    # distortion coefficients
)
assert(retval)      # should be True

# Convert from rotation vector to rotation matrix
R_inC_ofA_cv = Rotation.from_rotvec(rvec.flatten()).as_matrix()

# Flatten the position (returned as 2D array by default)
p_inC_ofA_cv = tvec.flatten()

Check that results are correct.

In [None]:
# Orientation is correct
assert(np.allclose(R_inC_ofA_cv, R_inC_ofA_true))

# Position is correct
assert(np.allclose(p_inC_ofA_cv, p_inC_ofA_true))

## Get solution with your own code

Define a function to do resectioning (i.e., estimate $R^C_A$ and $p^C_A$ given $p^A_1, \dotsc, p^A_n$ and $c_1, \dotsc, c_n$).

In [None]:
def resection(p_inA, c, K):
    # Normalize image coordinates
    # ... FIXME ...

    # Find solution
    # ... FIXME ...

    # Correct solution
    # - Should have the right scale
    # ... FIXME ...
    # - Should be right-handed
    # ... FIXME ...
    # - Should be a rotation matrix
    # ... FIXME ...
    
    R_inC_ofA = np.eye(3)
    p_inC_ofA = np.zeros(3)

    return R_inC_ofA, p_inC_ofA

Apply the function to resection image $C$.

In [None]:
R_inC_ofA, p_inC_ofA = resection(p_inA, c, K)

Check that results are correct.

In [None]:
# Orientation is correct
assert(np.allclose(R_inC_ofA, R_inC_ofA_true))

# Position is correct
assert(np.allclose(p_inC_ofA, p_inC_ofA_true))

# Reflection

Answer the following questions:

* How many points are required (in general) by your method? In other words, what is the minimum value of $n$ for which your method would still produce a solution? Form and test a hypothesis.
* What happens if the origin of frame $A$ and the origin of frame $C$ are at the same point? Would your method still work? Form and test a hypothesis.
* Are there arrangements of points (still visible in image $C$) for which your method would fail, even if the number $n$ of points is (in general) big enough? Read [Chapter 12 of Hartley and Zisserman](https://i-share-uiu.primo.exlibrisgroup.com/permalink/01CARLI_UIU/gpjosq/alma99692409012205899). Form and test a hypothesis.
* How robust is your method to noisy data? Read [Section 7.1 of Hartley and Zisserman](https://i-share-uiu.primo.exlibrisgroup.com/permalink/01CARLI_UIU/gpjosq/alma99692409012205899). Form and test a hypothesis. Consider changing your method, as suggested by the reference text, to make it more robust to noisy data.
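To test a hypothesis about noise, it can help to have a consistent way to perturb the image coordinates and to measure error. The following sketch shows one possible choice, not the only one. It adds zero-mean Gaussian noise with standard deviation `sigma` (in pixels) to `c`. It measures orientation error as the angle of the rotation $\left(R^C_{A,\text{est}}\right)^\top R^C_{A,\text{true}}$, computed with SciPy's `Rotation.magnitude`. The helper names `add_pixel_noise` and `pose_errors` are hypothetical, and the example values are illustrative and not taken from the notebook.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def add_pixel_noise(c, sigma, rng):
    # Zero-mean Gaussian noise with standard deviation sigma (pixels) on each coordinate
    return c + rng.normal(scale=sigma, size=c.shape)


def pose_errors(R_est, p_est, R_true, p_true):
    # Orientation error: angle (radians) of the rotation R_est^T R_true
    angle = Rotation.from_matrix(R_est.T @ R_true).magnitude()
    # Position error: Euclidean distance between estimated and true positions
    dist = np.linalg.norm(p_est - p_true)
    return angle, dist


# Example with illustrative values (not from the notebook)
rng = np.random.default_rng(4)
c = np.array([[1000., 500.], [1200., 650.]])
c_noisy = add_pixel_noise(c, sigma=1.0, rng=rng)

R_true = np.eye(3)
R_est = Rotation.from_rotvec([0., 0., 0.01]).as_matrix()
angle, dist = pose_errors(R_est, np.array([0., 0., 1.]), R_true, np.zeros(3))
print(c_noisy.shape, angle, dist)
```

For example, you could call `resection(p_inA, add_pixel_noise(c, sigma, rng), K)` for several values of `sigma`, pass each result to `pose_errors` together with `R_inC_ofA_true` and `p_inC_ofA_true`, and look at how the errors grow.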

You could also try applying your method to real data (e.g., to feature matches from a third image, given an existing two-view reconstruction) rather than to synthetic data.