benchmark.md
Highlights

  • The training set contains all samples from S1, S5, S6 and S7, comprising 80k {image, 2D, 3D} triplets.
  • The test set contains all samples from S8, including 20k triplets.
  • Methods marked with * output normalized predictions.
  • Results are reported as MPJPE (Mean Per Joint Position Error) in mm. We report the following:
    • MPJPE for the whole body, the body (keypoints 1-23), the face (keypoints 24-91) and the hands (keypoints 92-133) when the whole body is aligned with the pelvis.
    • MPJPE for the face when it is centered on the nose, i.e. aligned with keypoint 1.
    • MPJPE for the hands when each hand is centered on its wrist, i.e. the left hand is aligned with keypoint 92 and the right hand with keypoint 113.
  • Unless stated otherwise, results are pelvis aligned.
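
The metrics listed above can be sketched for a single sample as follows. This is an illustrative sketch, not the benchmark's official evaluation script: the function names (`center`, `pelvis`, `mpjpe`, `evaluate`) are hypothetical, poses are assumed to be lists of 133 `(x, y, z)` tuples in mm, and the pelvis is assumed to be the midpoint of the two hip keypoints (12 and 13 in the COCO-WholeBody layout), since the layout has no dedicated pelvis keypoint.

```python
import math

# 1-indexed keypoint ranges from the list above, written as 0-indexed slices.
BODY = slice(0, 23)          # keypoints 1-23
FACE = slice(23, 91)         # keypoints 24-91
HANDS = slice(91, 133)       # keypoints 92-133
LEFT_HAND = slice(91, 112)   # keypoints 92-112
RIGHT_HAND = slice(112, 133) # keypoints 113-133
NOSE = 0                     # keypoint 1
LEFT_WRIST = 91              # keypoint 92
RIGHT_WRIST = 112            # keypoint 113
# Assumption: pelvis = midpoint of left hip (12) and right hip (13).
LEFT_HIP, RIGHT_HIP = 11, 12


def center(pose, origin):
    """Translate every joint so that `origin` becomes (0, 0, 0)."""
    ox, oy, oz = origin
    return [(x - ox, y - oy, z - oz) for x, y, z in pose]


def pelvis(pose):
    """Midpoint of the two hip joints (assumed pelvis definition)."""
    return tuple((a + b) / 2 for a, b in zip(pose[LEFT_HIP], pose[RIGHT_HIP]))


def mpjpe(pred, gt):
    """Mean Euclidean distance between corresponding joints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def evaluate(pred, gt):
    """Return the six per-sample errors reported in the result tables."""
    # Pelvis-aligned copies of both poses.
    p = center(pred, pelvis(pred))
    g = center(gt, pelvis(gt))

    def aligned(part, root):
        # Re-center one part on its own root joint before measuring.
        return mpjpe(center(pred[part], pred[root]),
                     center(gt[part], gt[root]))

    left = aligned(LEFT_HAND, LEFT_WRIST)
    right = aligned(RIGHT_HAND, RIGHT_WRIST)
    return {
        "whole-body": mpjpe(p, g),
        "body": mpjpe(p[BODY], g[BODY]),
        "face": mpjpe(p[FACE], g[FACE]),
        "nose-aligned face": aligned(FACE, NOSE),
        "hand": mpjpe(p[HANDS], g[HANDS]),
        # Both hands have 21 joints, so the plain average is joint-weighted.
        "wrist-aligned hand": (left + right) / 2,
    }
```

Because every metric is computed after re-centering on a root joint, a prediction that differs from the ground truth only by a global translation scores zero on all six. Averaging `evaluate` over all test samples would give one row of a results table, assuming the predictions are in mm.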

We use the same keypoint layout as COCO-WholeBody: Image source.

1. Results for 2D → 3D task

| Method | whole-body | body | face | nose-aligned face | hand | wrist-aligned hand | ckpt |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CanonPose [2]* | 186.7 | 193.7 | 188.4 | 24.6 | 180.2 | 48.9 | ckpt |
| SimpleBaseline [1]* | 125.4 | 125.7 | 115.9 | 24.6 | 140.7 | 42.5 | ckpt |
| CanonPose [2] with 3D supervision* | 117.7 | 117.5 | 112.0 | 17.9 | 126.9 | 38.3 | ckpt |
| Large SimpleBaseline [1]* | 112.3 | 112.6 | 110.6 | 14.6 | 114.8 | 31.7 | ckpt |
| Jointformer [3] | 88.3 | 84.9 | 66.5 | 17.8 | 125.3 | 43.7 | ckpt |

2. Results for I2D → 3D task

| Method | whole-body | body | face | nose-aligned face | hand | wrist-aligned hand | ckpt |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CanonPose [2]* | 285.0 | 264.4 | 319.7 | 31.9 | 240.0 | 56.2 | ckpt |
| SimpleBaseline [1]* | 268.8 | 252.0 | 227.9 | 34.0 | 344.3 | 83.4 | ckpt |
| CanonPose [2] with 3D supervision* | 163.6 | 155.9 | 161.3 | 22.2 | 171.4 | 47.4 | ckpt |
| Large SimpleBaseline [1]* | 131.4 | 131.6 | 120.6 | 19.8 | 148.8 | 44.8 | ckpt |
| Jointformer [3] | 109.2 | 103.0 | 82.4 | 19.8 | 155.9 | 53.5 | ckpt |

3. Results for RGB → 3D task

| Method | whole-body | body | face | nose-aligned face | hand | wrist-aligned hand | ckpt |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SHN [4] + SimpleBaseline [1]* | 182.5 | 189.6 | 138.7 | 32.5 | 249.4 | 64.3 | ckpt ckpt |
| CPN [5] + Jointformer [3] | 132.6 | 142.8 | 91.9 | 20.7 | 192.7 | 56.9 | ckpt ckpt |
| ResNet50 [6] | 166.7 | 151.6 | 123.6 | 26.3 | 244.9 | 63.1 | ckpt |

References

[1] Julieta Martinez, Rayat Hossain, Javier Romero, and James J. Little. A simple yet effective baseline for 3D human pose estimation. ICCV, 2017.
[2] Bastian Wandt, Marco Rudolph, Petrissa Zell, Helge Rhodin, and Bodo Rosenhahn. Canonpose: Self-supervised monocular 3D human pose estimation in the wild. CVPR, 2021.
[3] Sebastian Lutz, Richard Blythman, Koustav Ghosal, Matthew Moynihan, Ciaran Simms, and Aljosa Smolic. Jointformer: Single-frame lifting transformer with error prediction and refinement for 3D human pose estimation. arXiv, 2022.
[4] Alejandro Newell, Kaiyu Yang, and Jia Deng. Stacked hourglass networks for human pose estimation. ECCV, 2016.
[5] Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, and Jian Sun. Cascaded pyramid network for multi-person pose estimation. CVPR, 2018.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CVPR, 2016.