Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

question about data.utils.process_camera_inv #22

Closed
Orig1n opened this issue Sep 1, 2023 · 1 comment
Closed

question about data.utils.process_camera_inv #22

Orig1n opened this issue Sep 1, 2023 · 1 comment

Comments

@Orig1n
Copy link

Orig1n commented Sep 1, 2023

I'm confused when I read this function. Do the operations like trans[2] += -10, c *= 0.27 c[1] += 0.015 c[2] += 0.161, K[0,0] = 2985.29/700 * focal / 1050 K[1,1] = 2985.29/700 * focal / 1050 and pose[:3, 3] = pose[:3, 3]/4.0 * 2.7 have any special meaning?

def process_camera_inv(translation, Rs, focals): #crop_params):

    c_list = []

    N = len(translation)
    # for trans, R, crop_param in zip(translation,Rs, crop_params):
    for idx, (trans, R, focal) in enumerate(zip(translation, Rs, focals)):

        idx_prev = max(idx - 1, 0)
        idx_last = min(idx + 2, N - 1)

        trans = np.mean(translation[idx_prev: idx_last], axis = 0)
        R = np.mean(Rs[idx_prev: idx_last], axis = 0)

        # why
        trans[2] += -10
        c = -np.dot(R, trans)

        # # no why
        # c = trans

        pose = np.eye(4)
        pose[:3, :3] = R
        
        # why
        c *= 0.27
        c[1] += 0.015
        c[2] += 0.161
        # c[2] += 0.050  # 0.160

        pose[0, 3] = c[0]
        pose[1, 3] = c[1]
        pose[2, 3] = c[2]

        # focal = 2985.29
        w = 1024#224
        h = 1024#224


        K =np.eye(3)
        K[0][0] = focal
        K[1][1] = focal
        K[0][2] = w/2.0
        K[1][2] = h/2.0

        Rot = np.eye(3)
        Rot[0, 0] = 1
        Rot[1, 1] = -1
        Rot[2, 2] = -1        
        pose[:3, :3] = np.dot(pose[:3, :3], Rot)

        # fix intrinsics
        K[0,0] = 2985.29/700 * focal / 1050
        K[1,1] = 2985.29/700 * focal / 1050
        K[0,2] = 1/2
        K[1,2] = 1/2     
        assert K[0,1] == 0
        assert K[2,2] == 1
        assert K[1,0] == 0
        assert K[2,0] == 0
        assert K[2,1] == 0  

        # fix_pose_orig
        pose = np.array(pose).copy()

        # why
        pose[:3, 3] = pose[:3, 3]/4.0 * 2.7
        # # no why
        # t_1 = np.array([-1.3651,  4.5466,  6.2646])
        # s_1 = np.array([-2.3178, -2.3715, -1.9653]) + 1
        # t_2 = np.array([-2.0536,  6.4069,  4.2269])
        # pose[:3, 3] = (pose[:3, 3] + t_1) * s_1 + t_2

        c = np.concatenate([pose.reshape(-1), K.reshape(-1)])
        c_list.append(c.astype(np.float32))          

    return c_list
@theEricMa
Copy link
Owner

Great question! This was indeed a challenge during our research. Although EG3D had published their code, they did not provide an explanation for the manual adjustments they made to their camera poses in their code. This was particularly problematic since our face alignment process was completely different. For the talking face task, we cropped the videos based solely on the bounding box calculated in the first frame, meaning that the subsequent frames were not aligned. Given this approach, determining how to model the rotation and translation was complex, especially when the EG3D camera convention was potentially misleading. We had no option but to manually adjust these parameters in the context of a talking face setting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants