Timing Yang1, Sicheng He1, Hongyi Jing1, Jiawei Yang1, Zhijian Liu2,3, Chuhang Zou4†, Yue Wang1,3†
1USC Physical Superintelligence (PSI) Lab 2University of California, San Diego 3NVIDIA 4Meta Reality Labs
† Joint corresponding authors
Speed-accuracy overview of Fast SAM 3D Body. Top left: Qualitative results on in-the-wild images show our framework preserves high-fidelity reconstruction. Top right: Our method achieves up to a 10.25x end-to-end speedup over SAM 3D Body and replaces the iterative MHR-to-SMPL bottleneck with a 10,000x faster neural mapping. Bottom: Our system enables real-time humanoid robot control from a single RGB stream at ~65 ms per frame on an NVIDIA RTX 5090.
SAM 3D Body (3DB) achieves state-of-the-art accuracy in monocular 3D human mesh recovery, yet its inference latency of several seconds per image precludes real-time application. We present Fast SAM 3D Body, a training-free acceleration framework that reformulates the 3DB inference pathway to achieve interactive rates. By decoupling serial spatial dependencies and applying architecture-aware pruning, we enable parallelized multi-crop feature extraction and streamlined transformer decoding. Moreover, to extract the joint-level kinematics (SMPL) compatible with existing humanoid control and policy learning frameworks, we replace the iterative mesh fitting with a direct feedforward mapping, accelerating this specific conversion by over 10,000x. Overall, our framework delivers up to a 10.9x end-to-end speedup while maintaining on-par reconstruction fidelity, even surpassing 3DB on benchmarks such as LSPET. We demonstrate its utility by deploying Fast SAM 3D Body in a vision-only teleoperation system that enables real-time humanoid control and the direct collection of manipulation policies from a single RGB stream.
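To illustrate the idea behind replacing iterative mesh fitting with a direct feedforward mapping, here is a minimal NumPy sketch. The hidden width and the MHR input dimension are assumptions for illustration; the output size follows the standard SMPL parameterization (72 axis-angle pose values, 24 joints x 3, plus 10 shape betas). This is not the released model, just the shape of the computation: a single matrix-multiply chain instead of an optimization loop.

```python
import numpy as np

# Hypothetical dimensions: MHR_DIM and HIDDEN are assumptions;
# SMPL_DIM = 72 pose (24 joints x 3 axis-angle) + 10 shape betas.
MHR_DIM, HIDDEN, SMPL_DIM = 512, 256, 72 + 10

rng = np.random.default_rng(0)
W1 = rng.standard_normal((MHR_DIM, HIDDEN)) * 0.01
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, SMPL_DIM)) * 0.01
b2 = np.zeros(SMPL_DIM)

def mhr_to_smpl(x: np.ndarray) -> np.ndarray:
    """One forward pass replaces per-frame iterative fitting."""
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2

out = mhr_to_smpl(rng.standard_normal(MHR_DIM))
print(out.shape)  # (82,) -> 72 pose + 10 shape parameters
```

Because the conversion is a fixed-cost forward pass, its latency is constant per frame, which is what makes the >10,000x speedup over iterative fitting possible.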
Qualitative comparison. The original SAM 3D Body (left) and our Fast variant (right) yield visually comparable mesh reconstructions across diverse poses and multi-person scenes on 3DPW and EMDB.
Please refer to SAM 3D Body for environment setup, or use our setup script:
bash setup_env.sh
conda activate fast_sam_3d_body
checkpoints/
├── sam-3d-body-dinov3/ # Auto-downloaded from HuggingFace on first run
│ ├── model.ckpt
│ └── assets/
│ └── mhr_model.pt
├── yolo/ # Place YOLO-Pose weights here
│ ├── yolo11m-pose.pt
│ └── yolo11m-pose.engine # Generated by convert_yolo_pose_trt.py (optional)
└── moge_trt/ # Generated by build_tensorrt.sh (optional)
└── moge_dinov2_encoder_fp16.engine
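Before running the demo, it can help to verify the checkpoint layout matches the tree above. The following preflight check is a small illustrative script (not part of the released codebase); the paths mirror the tree, and only the `.engine` files are optional since they are generated by the conversion scripts.

```python
from pathlib import Path

# Paths mirror the checkpoint tree; the .engine files are optional
# because they are generated locally by the conversion scripts.
REQUIRED = [
    "sam-3d-body-dinov3/model.ckpt",
    "sam-3d-body-dinov3/assets/mhr_model.pt",
    "yolo/yolo11m-pose.pt",
]

def check_checkpoints(root: str = "checkpoints") -> list[str]:
    """Return the required checkpoint files that are missing."""
    base = Path(root)
    return [p for p in REQUIRED if not (base / p).is_file()]

missing = check_checkpoints()
if missing:
    print("Missing required checkpoints:", missing)
else:
    print("Checkpoint layout OK")
```

Note that `model.ckpt` is auto-downloaded from HuggingFace on first run, so a missing file there only means the first launch will fetch it.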
# Optimized (torch.compile + TensorRT)
bash run_demo.sh
# Convert all models (YOLO-Pose + MoGe encoder + DINOv3 backbone)
bash build_tensorrt.sh
# Or convert individually
python convert_yolo_pose_trt.py --model yolo11m-pose.pt --imgsz 640 --half
python convert_moge_encoder_trt.py --all
python convert_backbone_tensorrt.py --all
All generated engines are stored under ./checkpoints/.
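To confirm which engines the conversion scripts actually produced, a simple recursive listing of `.engine` files under `./checkpoints/` suffices. This is an illustrative snippet, not a repo script; exact engine filenames depend on the models and flags you converted with.

```python
from pathlib import Path

def list_engines(root: str = "checkpoints") -> list[str]:
    """Recursively collect generated TensorRT .engine files."""
    return sorted(str(p) for p in Path(root).rglob("*.engine"))

for engine in list_engines():
    print(engine)
```

An empty listing simply means no TensorRT conversion has been run yet; the demo still works in the torch.compile-only configuration.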
For instructions on running the publisher, see docs/realworld_deployment.md.
We demonstrate a real-time, vision-only teleoperation system for the Unitree G1 humanoid robot using a single RGB camera, operating at ~65 ms end-to-end latency on an NVIDIA RTX 5090.
Humanoid teleoperation. The system tracks diverse whole-body motions including upper-body gestures (a), body rotations (b-e), walking (f), wide stance (g), single-leg standing (h), squatting (i), and kneeling (j).
Humanoid policy rollout. The robot grasps a box on the table with both hands, squats down, and steps to the right. The policy achieves an 80% task success rate using 40 demonstrations collected via our system.
Single-View vs Multi-View. Multi-view fusion resolves depth ambiguities inherent in single-view reconstruction, producing more accurate SMPL body estimates.
@article{yang2026fastsam3dbody,
title={Fast SAM 3D Body: Accelerating SAM 3D Body for Real-Time Full-Body Human Mesh Recovery},
author={Yang, Timing and He, Sicheng and Jing, Hongyi and Yang, Jiawei and Liu, Zhijian and Zou, Chuhang and Wang, Yue},
journal={arXiv preprint arXiv:2603.15603},
year={2026}
}
This project builds upon SAM 3D Body (3DB) and Multi-HMR (MHR). We thank the original authors for releasing their models and codebases, which served as the foundation for our acceleration framework.