Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

correct 4x4 projection matrix? #5

Open
iperov opened this issue May 31, 2024 · 7 comments
Open

correct 4x4 projection matrix? #5

iperov opened this issue May 31, 2024 · 7 comments

Comments

@iperov
Copy link

iperov commented May 31, 2024

currently projection matrix is 3x3

np.array([ [1015.0, 0, 0], 
           [0, 1015.0, 0],
           [112.0, 112.0, 1] ], dtype=np.float32)

and when the point is transformed, Z is discarded

pts = pts [..., :2] / pts [..., 2:3]

which is not sutiable for standard graphics transformations like in opengl

can you provide correct 4x4 projection matrix in order to transform homogenous 3D points (x,y,z,1.0) ?

@wang-zidu
Copy link
Owner

Thank you for your support, this issue is valuable.

The parameters of the perspective projection camera we use are: focal=1015, znear=5, zfar=15. The camera is located at (0, 0, 10) and faces the negative direction of the z-axis. The rendered image size is 224×224.

I completely agree with what you said about the 4x4 projection matrix being fundamental to some rendering calculations. I believe you can find the process you mentioned in /util/nv_diffrast.py, which is the calculation method for transforming homogenous 3D points (x, y, z, 1).

However, in model/recon.py, the purpose of self.persc_proj is merely to obtain the x and y coordinates without involving the rendering process, so z is not needed (otherwise the calculation would be redundant). You can verify that the v2d obtained using self.persc_proj are consistent with the first two dimensions of screen coordinates obtained by transforming the vertex_ndc using /util/nv_diffrast.py. In short, self.persc_proj is just a way to slightly reduce unnecessary calculations, a similar approach is common in HRN, Deep3D, etc. Hope this helps.

@iperov
Copy link
Author

iperov commented Jun 3, 2024

I tried various 4x4 matrices from these values (focal=1015, znear=5, zfar=15), but none of them match the same result as your code.

reduce unnecessary calculations

for the processor it's nanoseconds, for the programmer and those who will use the repo it's a headache.

@wang-zidu
Copy link
Owner

I tried various 4x4 matrices from these values (focal=1015, znear=5, zfar=15), but none of them match the same result as your code.

reduce unnecessary calculations

for the processor it's nanoseconds, for the programmer and those who will use the repo it's a headache.

I believe you can directly refer to /util/nv_diffrast.py to get the result you want. The camera parameters have also been verified by using PyTorch3D. Let me explain further:

Starting from line 442 of model/recon.py, we assume the homogenous 3D point v = (x,y,z,1) is one of the points in v3d. First, we calculate the perspective projection matrix:

$$ {\rm{Projection - Matrix}} = \begin{bmatrix} {\frac{1}{{\tan (fov/2)}}} & 0 & 0 & 0 \\ 0 & {\frac{1}{{\tan (fov/2)}}} & 0 & 0 \\ 0 & 0 & {\frac{{znear + zfar}}{{znear - zfar}}} & {\frac{{2 \cdot znear \cdot zfar}}{{znear - zfar}}} \\ 0 & 0 & -1 & 0 \end{bmatrix} = \begin{bmatrix} {\frac{{1015}}{{112}}} & 0 & 0 & 0 \\ 0 & {\frac{{1015}}{{112}}} & 0 & 0 \\ 0 & 0 & -2 & -15 \\ 0 & 0 & -1 & 0 \end{bmatrix} $$

Invert the z direction of v to correspond to the camera's coordinate system. (In /util/nv_diffrast.py, you might also find that the y direction of v is inverted as well, which is done to adapt to the renderer.) So we get (x, y, -z, 1), and then perform the perspective projection to obtain:

$$ v' = \begin{bmatrix} {\frac{{1015}}{{112}}} & 0 & 0 & 0 \\ 0 & {\frac{{1015}}{{112}}} & 0 & 0 \\ 0 & 0 & -2 & -15 \\ 0 & 0 & -1 & 0 \end{bmatrix} \cdot \begin{bmatrix} x\\ y\\ - z\\ 1 \end{bmatrix}=\begin{bmatrix} {\frac{{1015}}{{112}}}x\\ {\frac{{1015}}{{112}}}y\\ 2z -15\\ z \end{bmatrix} $$

Homogenize the coordinates:

$$v'' =\begin{bmatrix} {\frac{{1015x}}{{112z}}}\\ {\frac{{1015y}}{{112z}}}\\ {2 - \frac{{15}}{z}}\\ 1 \end{bmatrix} $$

Finally, the coordinate in ndc space is converted to the image plane:

$${v_{image}} =\begin{bmatrix} {\frac{{v'{'_x} + 1}}{2} \cdot 224}\\ {\frac{{v'{'_y} + 1}}{2} \cdot 224} \end{bmatrix} =\begin{bmatrix} {1015\frac{{x}}{z}} + 112\\ {1015\frac{{y}}{z}} + 112 \end{bmatrix}$$

You will find that this result is consistent with the result obtained using self.persc_proj and homogenizing in model/recon.py.

I believe the above process is detailed enough and hope it helps you. The cause of your incorrect results could be due to some axis inversions (such as the common y-flip in images). This is likely because of different coordinate system definitions used in different rendering methods. Usually, you just need to visualize the results and check the steps where inversion is needed.

@ElliotQi
Copy link

@wang-zidu Hi, thanks for your excellent work. I wonder if this work supports CPU inference. I'm trying to use cpu device but get an error with nvdiffrast (RasterizeCudaContext could not use cpu device)

@wang-zidu
Copy link
Owner

@wang-zidu Hi, thanks for your excellent work. I wonder if this work supports CPU inference. I'm trying to use cpu device but get an error with nvdiffrast (RasterizeCudaContext could not use cpu device)

Thank you for your feedback. nvdiffrast can be replaced with a simpler renderer such as face3d. You can try replacing it, or if you only need the mesh results, you can simply remove the corresponding nvdiffrast content. If I have time, I will address this request as soon as possible and let you know.

@wang-zidu
Copy link
Owner

@wang-zidu Hi, thanks for your excellent work. I wonder if this work supports CPU inference. I'm trying to use cpu device but get an error with nvdiffrast (RasterizeCudaContext could not use cpu device)

Thank you for your support and patience. We have updated a new fast CPU renderer (based on face3d). Now the entire work supports CPU inference (using RetinaFace for face box).

@emlcpfx
Copy link

emlcpfx commented Aug 6, 2024

I tried various 4x4 matrices from these values (focal=1015, znear=5, zfar=15), but none of them match the same result as your code.

reduce unnecessary calculations

for the processor it's nanoseconds, for the programmer and those who will use the repo it's a headache.

Did you figure out how to ‘export’ the camera in a way that can be imported to a 3D DCC?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants