Demo on a single image #1

Closed
kevinhartyanyi opened this issue Jan 25, 2022 · 5 comments
@kevinhartyanyi

Hi,

I read the paper and it looks interesting.
I was wondering how you would run this model on a single image. The provided demo code seems to only run on the validation set.

So if I have an image for which I have the bounding box/segmentation data of the hands, then how should I preprocess this for the model?

Given that this code is based on InterHand2.6M, I tried to do something similar to what they have in their demo code:

def demo_model(args):
    pl.seed_everything(args.seed)
    model = fetch_pl_model(args, args.experiment)

    model.cuda()
    model.freeze()
    model.eval()

    # prepare input image
    transform = transforms.ToTensor()
    img_path = '1.jpg'
    original_img = cv2.imread(img_path)
    original_img_height, original_img_width = original_img.shape[:2]

    # prepare bbox
    bbox = [723, 354, 127, 74] # xmin, ymin, width, height
    bbox = process_bbox(bbox, (original_img_height, original_img_width, original_img_height))
    img, trans, inv_trans = generate_patch_image(original_img, bbox, False, 1.0, 0.0, cfg.input_img_shape)
    img = transform(img.astype(np.float32))/255
    img = img.cuda()[None,:,:,:]

    # forward
    inputs = {'img': img}
    targets = {}
    meta_info = {}

    with torch.no_grad():
        vis_dict = model(inputs, targets, meta_info, 'vis')
        print("vis_dict", vis_dict)`

Here process_bbox and generate_patch_image are taken from InterHand2.6M; however, it didn't work out for me, and the model always returned None. My guess is that the input needs to be processed differently. Looking at the code, I think it has something to do with the "segm_256" key in the "targets" dict, but I'm not sure what the easiest way to handle it would be.

I would appreciate some help or directions on what would be the simplest way to write a script that runs the model on a single image.

Best regards,
Kevin

@zc-alexfan
Owner

zc-alexfan commented Jan 26, 2022

Hi Kevin, thanks for your interest in our work. I think there is an easier workaround if you just want to evaluate on an unseen input image: take the model definition code, remove everything related to the targets dict, and then manually load the weights into the modified class.

The DIGIT inference code only relies on the input image, if I recall correctly. You just need to prepare the input image for the network.
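
Roughly something along these lines, just as a sketch (build_pose_model, args, and the checkpoint path are placeholders for whatever you end up with after stripping the target-related code, not the exact DIGIT classes):

import torch

class InferenceWrapper(torch.nn.Module):
    def __init__(self, args):
        super().__init__()
        # hypothetical builder for the pose branch of the stripped-down model
        self.pose_reg = build_pose_model(args)

    def forward(self, img):
        # inference only needs the image; no targets / meta_info
        return self.pose_reg(img, None, None)

model = InferenceWrapper(args)
ckpt = torch.load("path/to/checkpoint.pt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)        # handle Lightning-style checkpoints too
model.load_state_dict(state_dict, strict=False)  # strict=False since the target-only heads are removed
model.cuda().eval()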

@kevinhartyanyi
Author

Thank you for your response. I did some experimenting and managed to run it.

What I did was:

I removed the unnecessary parts from the linked model, so its forward now looks like this:

def forward(self, inputs, targets, meta_info, mode):
    input_img = inputs["img"]

    pose_dict = self.pose_reg(input_img, None, None)

    hm_2d = pose_dict["hm_2d"]
    zval = pose_dict["zval"]
    rel_root_depth_out = pose_dict["rel_root_depth_out"]
    hand_type_sigmoid = pose_dict["hand_type_sigmoid"]

    # decode the 2D heatmaps into (x, y) joint coordinates
    joint_xy = hm_utils.hm2xy(hm_2d, self.args.output_hm_shape, self.args.beta)

    # 2.5D: append the per-joint depth to the 2D coordinates
    joint_xyz = torch.cat((joint_xy, zval[:, :, None]), dim=2)

    out_dict = {
        "joint_coord": joint_xyz,
        "rel_root_depth": rel_root_depth_out,
        "hand_type": hand_type_sigmoid,
    }
    return out_dict
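
For intuition, hm_utils.hm2xy presumably performs something like a spatial soft-argmax over the 2D heatmaps, with beta acting as a softmax temperature. A generic version of that operation, purely for illustration and not the actual hm_utils implementation, would be:

import torch

def soft_argmax_2d(hm, beta=100.0):
    """Generic 2D soft-argmax over (B, J, H, W) heatmaps.

    Returns (B, J, 2) coordinates in (x, y) order, in heatmap pixel units.
    """
    b, j, h, w = hm.shape
    probs = torch.softmax(beta * hm.reshape(b, j, -1), dim=-1).reshape(b, j, h, w)
    xs = torch.arange(w, dtype=hm.dtype, device=hm.device)
    ys = torch.arange(h, dtype=hm.dtype, device=hm.device)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)  # expectation along the x axis
    y = (probs.sum(dim=3) * ys).sum(dim=-1)  # expectation along the y axis
    return torch.stack((x, y), dim=-1)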

self.pose_reg calls the DIGIT model here, which, besides the image, also expects segm_target_256 and segm_valid.

To be honest, I'm not sure what the purpose of providing segm_target_256 and segm_valid is: the forward function of DIGIT only uses them here to call the forward of SegmNet, and that function doesn't use either of them. Therefore, I pass None in place of these two; alternatively, one could change the forward of SegmNet so that it no longer requires the unused arguments, as sketched below.
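
A sketch of that alternative (only the signature change matters; the rest of the class and the forward body stay as they are in the repo):

import torch.nn as nn

class SegmNet(nn.Module):
    # __init__ and everything else stay exactly as in the repo; only the forward
    # signature changes so that the target arguments default to None.
    def forward(self, img, segm_target_256=None, segm_valid=None):
        # segm_target_256 / segm_valid are only used for the training loss,
        # so inference callers can pass just the image.
        ...  # original forward body, unchanged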

For the preprocessing and calling the model, I do the following:

def demo_model(args):
    model = Model(args)
    model.load_pretrained("saved_models/db7cba8c1.pt")
    model.cuda()
    model.eval()

    transform = transforms.ToTensor()
    img_path = 'input.jpg'
    original_img = cv2.imread(img_path)
    img = original_img.copy()

    bbox = [69, 137, 165, 153] # xmin, ymin, width, height

    # no augmentation at inference time: identity translation/scale/rotation, no flip, no color scaling
    trans, scale, rot, do_flip, color_scale = [0,0], 1.0, 0.0, False, np.array([1,1,1])
    bbox[0] = bbox[0] + bbox[2] * trans[0]
    bbox[1] = bbox[1] + bbox[3] * trans[1]
    img, trans, inv_trans = generate_patch_image(img, bbox, do_flip, scale, rot, cfg.input_img_shape, cv2.INTER_LINEAR)
    img = np.clip(img * color_scale[None,None,:], 0, 255)

    img = transform(img.astype(np.float32))/255.0
    img = img[None,:,:,:]

    inputs = {"img": img.to('cuda:0')}
    targets = {}
    meta_info = {}

    with torch.no_grad():
        out = model(inputs, targets, meta_info, None)

The preprocessing steps are based on the augmentation function.
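
For reference, the crop step essentially warps the bbox region to cfg.input_img_shape and keeps the inverse transform so that predicted 2D joints can later be mapped back to the original image space. A simplified stand-in (not the exact InterHand2.6M generate_patch_image; it skips the bbox adjustment that process_bbox applies as well as the scale/rotation/flip options) would be:

import cv2
import numpy as np

def crop_and_resize(img, bbox, out_shape):
    """Crop the (xmin, ymin, w, h) box and resize it to out_shape = (H, W).

    Returns the patch plus the 2x3 affine transform and its inverse.
    """
    xmin, ymin, w, h = bbox
    src = np.float32([[xmin, ymin], [xmin + w, ymin], [xmin, ymin + h]])
    dst = np.float32([[0, 0], [out_shape[1], 0], [0, out_shape[0]]])
    trans = cv2.getAffineTransform(src, dst)
    inv_trans = cv2.getAffineTransform(dst, src)
    patch = cv2.warpAffine(img, trans, (out_shape[1], out_shape[0]), flags=cv2.INTER_LINEAR)
    return patch, trans, inv_trans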

Finally, I visualize the results using the code from InterHand2.6M:

    # joint set information is in annotations/skeleton.txt
    joint_num = 21 # single hand
    joint_type = {'right': np.arange(0,joint_num), 'left': np.arange(joint_num,joint_num*2)}
    skeleton = load_skeleton(osp.join('data/InterHand/annotations/skeleton.txt'), joint_num*2)

    img = img[0].cpu().numpy().transpose(1,2,0) # cfg.input_img_shape[1], cfg.input_img_shape[0], 3
    joint_coord = out['joint_coord'][0].cpu().numpy() # x,y pixel, z root-relative discretized depth
    rel_root_depth = out['rel_root_depth'][0].cpu().numpy() # discretized depth
    hand_type = out['hand_type'][0].cpu().numpy() # handedness probability

    # restore joint coord to original image space and continuous depth space
    joint_coord[:,0] = joint_coord[:,0] / cfg.output_hm_shape[2] * cfg.input_img_shape[1]
    joint_coord[:,1] = joint_coord[:,1] / cfg.output_hm_shape[1] * cfg.input_img_shape[0]
    joint_coord[:,:2] = np.dot(inv_trans, np.concatenate((joint_coord[:,:2], np.ones_like(joint_coord[:,:1])),1).transpose(1,0)).transpose(1,0)
    joint_coord[:,2] = (joint_coord[:,2]/cfg.output_hm_shape[0] * 2 - 1) * (cfg.bbox_3d_size/2)

    # restore right hand-relative left hand depth to continuous depth space
    rel_root_depth = (rel_root_depth/cfg.output_root_hm_shape * 2 - 1) * (cfg.bbox_3d_size_root/2)

    # right hand root depth == 0, left hand root depth == rel_root_depth
    joint_coord[joint_type['left'],2] += rel_root_depth

    # handedness
    joint_valid = np.zeros((joint_num*2), dtype=np.float32)
    right_exist = False
    if hand_type[0] > 0.5: 
        right_exist = True
        joint_valid[joint_type['right']] = 1
    left_exist = False
    if hand_type[1] > 0.5:
        left_exist = True
        joint_valid[joint_type['left']] = 1

    print('Right hand exist: ' + str(right_exist) + ' Left hand exist: ' + str(left_exist))

    # visualize joint coord in 2D space
    filename = f"result_{img_path.split('.')[0]}.jpg"
    vis_img = original_img.copy()[:,:,::-1].transpose(2,0,1)
    vis_img = vis_keypoints(vis_img, joint_coord, joint_valid, skeleton, filename, vis_dir='.')

Running on the test image from InterHand2.6M, I get the result shown in the attached image.

I'm closing this issue now.

@anilesec

@kevinhartyanyi
Thanks for the information. I am also trying to do this. Would you be able to share the modified demo script?
It would be helpful and save me time. Thanks in advance!

@kevinhartyanyi
Author

Hi @anilesec
I forked the repo and uploaded the changes. You can find it here.

@anilesec

Thank you @kevinhartyanyi
