Demo on a single image #1
Hi Kevin, thanks for your interest in our work. I think there is an easier workaround if you just want to evaluate on an unseen input image. You can take the model definition code, remove everything that depends on the target dict, and then manually load the weights into the modified class. If I recall correctly, the DIGIT inference code relies only on the input image, so you just need to prepare the input image for the network.
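One way to do the manual weight loading described above is to keep only the checkpoint entries whose names still exist in the trimmed model. This is a minimal sketch, not the repository's code: `filter_checkpoint` is a hypothetical helper, and the parameter names in the example are made up for illustration.

```python
def filter_checkpoint(checkpoint, model_keys):
    """Split checkpoint entries by whether the trimmed model defines them.

    checkpoint: mapping from parameter name to value (e.g. a loaded state dict).
    model_keys: iterable of parameter names the trimmed model defines.
    Returns (kept, skipped, missing).
    """
    wanted = set(model_keys)
    # Entries the trimmed model can load.
    kept = {name: value for name, value in checkpoint.items() if name in wanted}
    # Entries that only the removed parts of the model used.
    skipped = sorted(name for name in checkpoint if name not in wanted)
    # Parameters the model expects but the checkpoint does not provide.
    missing = sorted(wanted - set(checkpoint))
    return kept, skipped, missing


# Hypothetical parameter names, for illustration only.
checkpoint = {"backbone.weight": 1.0, "head.weight": 2.0, "loss.scale": 3.0}
kept, skipped, missing = filter_checkpoint(
    checkpoint, ["backbone.weight", "head.weight"]
)
print(kept)
print(skipped)
print(missing)
```

With PyTorch, `checkpoint` would typically come from `torch.load(path, map_location="cpu")` (possibly nested under another key, depending on how the file was saved), `model_keys` from `model.state_dict().keys()`, and the filtered dict would go to `model.load_state_dict(kept, strict=False)`. Checking that `missing` is empty is a cheap way to catch a renamed layer.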
Thank you for your response. I did some experimenting and managed to run it. Here is what I did.

I removed the unnecessary parts from the linked model, so the forward looks like this:

    def forward(self, inputs, targets, meta_info, mode):
        input_img = inputs["img"]
        # segm_target_256 and segm_valid are not needed at inference, so pass None
        pose_dict = self.pose_reg(input_img, None, None)
        hm_2d = pose_dict["hm_2d"]
        zval = pose_dict["zval"]
        rel_root_depth_out = pose_dict["rel_root_depth_out"]
        hand_type_sigmoid = pose_dict["hand_type_sigmoid"]
        joint_xy = hm_utils.hm2xy(hm_2d, self.args.output_hm_shape, self.args.beta)
        # 2.5D to 3D: append the depth value to the 2D heatmap coordinates
        joint_xyz = torch.cat((joint_xy, zval[:, :, None]), dim=2)
        out_dict = {
            "joint_coord": joint_xyz,
            "rel_root_depth": rel_root_depth_out,
            "hand_type": hand_type_sigmoid,
        }
        return out_dict

The self.pose_reg call invokes the DIGIT model, which besides the image also expects segm_target_256 and segm_valid. To be honest, I'm not sure what purpose these two arguments serve: the forward function of DIGIT only passes them on to the forward of SegmNet, and that function doesn't use either of them. Therefore I pass None for both, although one could also change the forward of SegmNet so that it no longer requires the unused arguments.

For the preprocessing and calling the model, I do the following:

    # Third-party imports; Model, cfg, and generate_patch_image come from the project code
    import cv2
    import numpy as np
    import torch
    from torchvision import transforms

    def demo_model(args):
        model = Model(args)
        model.load_pretrained("saved_models/db7cba8c1.pt")
        model.cuda()
        model.eval()

        transform = transforms.ToTensor()
        img_path = 'input.jpg'
        original_img = cv2.imread(img_path)
        img = original_img.copy()
        bbox = [69, 137, 165, 153]  # xmin, ymin, width, height

        # No augmentation at inference: identity translation, scale, rotation, flip, and color
        trans, scale, rot, do_flip, color_scale = [0, 0], 1.0, 0.0, False, np.array([1, 1, 1])
        bbox[0] = bbox[0] + bbox[2] * trans[0]
        bbox[1] = bbox[1] + bbox[3] * trans[1]
        img, trans, inv_trans = generate_patch_image(img, bbox, do_flip, scale, rot, cfg.input_img_shape, cv2.INTER_LINEAR)
        img = np.clip(img * color_scale[None, None, :], 0, 255)
        img = transform(img.astype(np.float32)) / 255.0
        img = img[None, :, :, :]  # add the batch dimension

        inputs = {"img": img.to('cuda:0')}
        targets = {}
        meta_info = {}
        with torch.no_grad():
            out = model(inputs, targets, meta_info, None)

The preprocessing steps are based on the augmentation function.

Finally, I visualize the results using the code from InterHand2.6M:

        # Continues inside demo_model, since it uses out, inv_trans, img, and original_img
        # joint set information is in annotations/skeleton.txt
        joint_num = 21  # single hand
        joint_type = {'right': np.arange(0, joint_num), 'left': np.arange(joint_num, joint_num*2)}
        skeleton = load_skeleton(osp.join('data/InterHand/annotations/skeleton.txt'), joint_num*2)

        img = img[0].cpu().numpy().transpose(1, 2, 0)  # cfg.input_img_shape[1], cfg.input_img_shape[0], 3
        joint_coord = out['joint_coord'][0].cpu().numpy()  # x,y pixel, z root-relative discretized depth
        rel_root_depth = out['rel_root_depth'][0].cpu().numpy()  # discretized depth
        hand_type = out['hand_type'][0].cpu().numpy()  # handedness probability

        # restore joint coord to original image space and continuous depth space
        joint_coord[:, 0] = joint_coord[:, 0] / cfg.output_hm_shape[2] * cfg.input_img_shape[1]
        joint_coord[:, 1] = joint_coord[:, 1] / cfg.output_hm_shape[1] * cfg.input_img_shape[0]
        joint_coord[:, :2] = np.dot(inv_trans, np.concatenate((joint_coord[:, :2], np.ones_like(joint_coord[:, :1])), 1).transpose(1, 0)).transpose(1, 0)
        joint_coord[:, 2] = (joint_coord[:, 2] / cfg.output_hm_shape[0] * 2 - 1) * (cfg.bbox_3d_size / 2)

        # restore right hand-relative left hand depth to continuous depth space
        rel_root_depth = (rel_root_depth / cfg.output_root_hm_shape * 2 - 1) * (cfg.bbox_3d_size_root / 2)

        # right hand root depth == 0, left hand root depth == rel_root_depth
        joint_coord[joint_type['left'], 2] += rel_root_depth

        # handedness
        joint_valid = np.zeros((joint_num*2), dtype=np.float32)
        right_exist = False
        if hand_type[0] > 0.5:
            right_exist = True
            joint_valid[joint_type['right']] = 1
        left_exist = False
        if hand_type[1] > 0.5:
            left_exist = True
            joint_valid[joint_type['left']] = 1
        print('Right hand exist: ' + str(right_exist) + ' Left hand exist: ' + str(left_exist))

        # visualize joint coord in 2D space
        filename = f"result_{img_path.split('.')[0]}.jpg"
        vis_img = original_img.copy()[:, :, ::-1].transpose(2, 0, 1)
        vis_img = vis_keypoints(vis_img, joint_coord, joint_valid, skeleton, filename, vis_dir='.')

Running on the test image from InterHand2.6M, I get the following result:

I'm closing this issue now.
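The coordinate restoration in the visualization code can be checked by hand on a single joint. The sketch below repeats the same three steps (rescale from heatmap to patch pixels, apply the inverse affine, de-discretize the depth) in plain Python. The helper names are hypothetical, and the numbers (a 64x64x64 heatmap, a 256x256 input patch, a 3D box size of 400, and the `inv_trans` matrix) are assumed values for illustration, not values read from this repository's config.

```python
def heatmap_to_patch(x_hm, y_hm, hm_shape, img_shape):
    """Rescale heatmap x/y to input-patch pixels.

    hm_shape is (depth, height, width); img_shape is (height, width).
    """
    return x_hm / hm_shape[2] * img_shape[1], y_hm / hm_shape[1] * img_shape[0]


def apply_affine(matrix, x, y):
    """Apply a 2x3 affine matrix to the point (x, y)."""
    (a, b, tx), (c, d, ty) = matrix
    return a * x + b * y + tx, c * x + d * y + ty


def restore_depth(z_hm, depth_bins, box_size):
    """Map a discretized depth in [0, depth_bins] to [-box_size/2, box_size/2]."""
    return (z_hm / depth_bins * 2 - 1) * (box_size / 2)


# Assumed illustrative values (not taken from the repository config).
hm_shape = (64, 64, 64)
img_shape = (256, 256)
inv_trans = [[0.5, 0.0, 100.0], [0.0, 0.5, 50.0]]  # patch -> original image

x_patch, y_patch = heatmap_to_patch(32, 16, hm_shape, img_shape)
x_img, y_img = apply_affine(inv_trans, x_patch, y_patch)
z = restore_depth(48, hm_shape[0], 400)
print(x_patch, y_patch)  # 128.0 64.0
print(x_img, y_img)      # 164.0 82.0
print(z)                 # 100.0
```

A heatmap depth at the middle bin (32 of 64) maps to 0, which matches the comment in the code above that the right hand root sits at depth 0.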
Thank you @kevinhartyanyi
Hi,
I read the paper and it looks interesting.
I was wondering how you would run this model on a single image. The provided demo code seems to only run on the validation set.
So if I have an image for which I have the bounding box/segmentation data of the hands, then how should I preprocess this for the model?
Given that this code is based on InterHand2.6M, I tried to do something similar to what they have in their demo code, using process_bbox and generate_patch_image from InterHand2.6M. However, it didn't work out for me: the model always returned None. My guess is that the bbox needs to be processed differently. Looking at the code, I think it has something to do with the "segm_256" key in the "targets" dict, but I'm not sure what the easiest fix would be.
I would appreciate some help or directions on what would be the simplest way to write a script that runs the model on a single image.
Best regards,
Kevin
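For readers wondering what a bounding-box preprocessing step like process_bbox usually involves, here is a minimal sketch of an aspect-ratio-preserving expansion around the box center. It is not the InterHand2.6M implementation: `expand_bbox` is a hypothetical helper, and the square target aspect ratio and the 1.25 padding factor are assumptions.

```python
def expand_bbox(bbox, target_aspect=1.0, padding=1.25):
    """Grow a box to a target width/height ratio, then pad it.

    bbox: [xmin, ymin, width, height].
    target_aspect: desired width / height of the network input (assumed square).
    padding: multiplicative margin around the hand (assumed value).
    """
    x, y, w, h = bbox
    cx, cy = x + w / 2, y + h / 2  # keep the box centered on the hand
    # Enlarge the shorter side so the box matches the target aspect ratio.
    if w > target_aspect * h:
        h = w / target_aspect
    else:
        w = h * target_aspect
    w, h = w * padding, h * padding
    return [cx - w / 2, cy - h / 2, w, h]


expanded = expand_bbox([69, 137, 165, 153])
print(expanded)  # [48.375, 110.375, 206.25, 206.25]
```

The expanded box can extend past the image border; a patch-cropping function that warps with an affine transform generally handles this by filling the outside region, but that behavior should be checked against the cropping code actually in use.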