Highlights

The training sets contains all samples from S1, S5, S6 and S7, including 80k {image,2D,3D} triplets.
The test set contains all samples from S8, including 20k triplets.
Methods with^* output normalized predictions.
Results are presented in MPJPE (Mean Per Joint Position Error) metric in mm. We report following:
- MPJPE for the whole-body, the body (keypoint 1-23), the face (keypoint 24-91) and the hands (keypoint 92-133) when whole-body aligned with the pelvis.
- MPJPE for the face when it is centered on the nose, i.e.aligned with keypoint 1,
- MPJPE for the hands when hands are centered on the wrist, i.e left hand aligned with keypoint 92 and right hand aligned with keypoint 113.
Unless stated otherwise, results are pelvis aligned.

We use the same layout from COCO-WholeBody: Image source.

1. Results for 2D → 3D task

Method	whole-body	body	face	nose-aligned face	hand	wrist-aligned hand	ckpt
CanonPose [2]^*	186.7	193.7	188.4	24.6	180.2	48.9	ckpt
SimpleBaseline [1]^*	125.4	125.7	115.9	24.6	140.7	42.5	ckpt
CanonPose [2] with 3D supervision^*	117.7	117.5	112.0	17.9	126.9	38.3	ckpt
Large SimpleBaseline [1]^*	112.3	112.6	110.6	14.6	114.8	31.7	ckpt
Jointformer [3]	88.3	84.9	66.5	17.8	125.3	43.7	ckpt

2. Results for I2D → 3D task

Method	whole-body	body	face	nose-aligned face	hand	wrist-aligned hand	ckpt
CanonPose [2]^*	285.0	264.4	319.7	31.9	240.0	56.2	ckpt
SimpleBaseline [1]^*	268.8	252.0	227.9	34.0	344.3	83.4	ckpt
CanonPose [2] with 3D supervision^*	163.6	155.9	161.3	22.2	171.4	47.4	ckpt
Large SimpleBaseline [1]^*	131.4	131.6	120.6	19.8	148.8	44.8	ckpt
Jointformer [3]	109.2	103.0	82.4	19.8	155.9	53.5	ckpt

3. Results for RGB → 3D task

Method	whole-body	body	face	nose-aligned face	hand	wrist-aligned hand	ckpt
SHN [4] + SimpleBaseline [1]^*	182.5	189.6	138.7	32.5	249.4	64.3	ckpt ckpt
CPN [5] + Jointformer [3]	132.6	142.8	91.9	20.7	192.7	56.9	ckpt ckpt
Resnet50 [6]	166.7	151.6	123.6	26.3	244.9	63.1	ckpt

References

[1] Julieta Martinez, Rayat Hossain, Javier Romero, and James J. Little. A simple yet effective baseline for 3d human pose estimation. ICCV, 2017.
[2] Bastian Wandt, Marco Rudolph, Petrissa Zell, Helge Rhodin, and Bodo Rosenhahn. Canonpose: Self-supervised monocular 3D human pose estimation in the wild. CVPR, 2021.
[3] Sebastian Lutz, Richard Blythman, Koustav Ghosal, Matthew Moynihan, Ciaran Simms, and Aljosa Smolic. Jointformer: Single-frame lifting transformer with error prediction and refinement for 3d human pose estimation. ArXiv, 2022.
[4] Alejandro Newell, Kaiyu Yang, and Jia Deng. Stacked hourglass networks for human pose estimation. ECCV, 2016.
[5] Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, and Jian Sun. Cascaded pyramid network for multi-person pose estimation. CVPR, 2017.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CVPR, 2015.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

benchmark.md

benchmark.md

Highlights

1. Results for 2D → 3D task

2. Results for I2D → 3D task

3. Results for RGB → 3D task

References

Files

benchmark.md

Latest commit

History

benchmark.md

File metadata and controls

Highlights

1. Results for 2D → 3D task

2. Results for I2D → 3D task

3. Results for RGB → 3D task

References