*Equal Contribution †Corresponding Author
Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; China University of Mining and Technology; ETH Zürich; Shanghai Jiao Tong University
Fast-SAM3D accelerates SAM3D by up to 2.67× while maintaining geometric fidelity and semantic consistency.
Fast-SAM3D is a training-free acceleration framework for single-view 3D reconstruction that delivers up to a 2.67× speedup with negligible quality loss. Our approach dynamically aligns computation with instantaneous generation complexity through three heterogeneity-aware mechanisms.
- [2026.02.05] 🎉 Paper and code released! Check out our paper.

- 🚀 Training-Free Acceleration: Achieves a 2.67× speedup for single-object generation and 2.01× for scene generation without any model retraining.
- 🎯 Heterogeneity-Aware Design: Addresses multi-level heterogeneity in 3D generation pipelines: kinematic distinctiveness, intrinsic sparsity, and spectral variance.
- 🔧 Plug-and-Play Modules: Three modules that integrate seamlessly:
  - Modality-Aware Step Caching: Decouples shape evolution from sensitive layout updates
  - Joint Spatiotemporal Token Carving: Concentrates refinement on high-entropy regions
  - Spectral-Aware Token Aggregation: Adapts decoding resolution to geometric complexity
- ✨ Quality Preservation: Maintains or even exceeds the original model's geometric fidelity (F-Score: 92.59 vs. 92.34).
Overview of Fast-SAM3D. Our approach integrates three heterogeneity-aware modules: (1) Modality-Aware Step Caching for decoupling structural evolution from layout updates; (2) Joint Spatiotemporal Token Carving for eliminating redundancy; (3) Spectral-Aware Token Aggregation for adaptive decoding resolution.
The Sparse Structure Generator exhibits modality heterogeneity: shape tokens evolve smoothly while layout tokens are volatile. We propose:
- Linear Extrapolation for shape tokens using finite-difference prediction
- Momentum-Anchored Smoothing for layout tokens to suppress high-frequency jitter
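A minimal sketch of these two update rules, assuming shape and layout tokens are plain tensors; the function names and the default `beta` (cf. `--ss_momentum_beta`) are illustrative, not the repo's actual API:

```python
import torch

def extrapolate_shape(shape_prev: torch.Tensor, shape_curr: torch.Tensor) -> torch.Tensor:
    """Hypothetical first-order finite-difference prediction of the next
    cached step: shape tokens drift smoothly, so x_next ~ x_t + (x_t - x_prev)."""
    return shape_curr + (shape_curr - shape_prev)

def smooth_layout(layout_curr: torch.Tensor, anchor: torch.Tensor, beta: float = 0.5) -> torch.Tensor:
    """Hypothetical momentum-anchored update: blend the volatile layout tokens
    with a running anchor to suppress high-frequency jitter."""
    return beta * anchor + (1.0 - beta) * layout_curr
```

In a cached denoising loop, `extrapolate_shape` would stand in for a skipped generator call on the shape stream, while `smooth_layout` keeps layout updates anchored to recent history instead of being extrapolated.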
The SLaT Generator shows intrinsic refinement sparsity: updates concentrate on high-entropy regions. We design:
- Unified Saliency Potential combining temporal dynamics (magnitude & abruptness) and spatial frequency
- Dynamic Adaptive Step Caching with curvature-aware trajectory approximation
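One way such a saliency potential could be scored and used to carve tokens is sketched below; the equal weighting of the three terms and the local-mean proxy for spatial frequency are assumptions, as is whether `--slat_carving_ratio` names the kept or the carved fraction:

```python
import torch

def saliency_potential(x_prev2: torch.Tensor, x_prev: torch.Tensor, x_curr: torch.Tensor) -> torch.Tensor:
    """Per-token saliency from temporal magnitude, temporal abruptness
    (second difference), and a spatial-frequency proxy. Tokens: (N, D)."""
    delta = x_curr - x_prev
    magnitude = delta.norm(dim=-1)                           # how much a token moved
    abruptness = (delta - (x_prev - x_prev2)).norm(dim=-1)   # how suddenly it moved
    frequency = (x_curr - x_curr.mean(dim=0, keepdim=True)).norm(dim=-1)
    return magnitude + abruptness + frequency                # assumed equal weighting

def carve(x_prev2: torch.Tensor, x_prev: torch.Tensor, x_curr: torch.Tensor, keep_ratio: float = 0.9) -> torch.Tensor:
    """Return indices of the high-saliency tokens to keep refining;
    the remaining tokens can reuse cached features."""
    s = saliency_potential(x_prev2, x_prev, x_curr)
    k = max(1, int(keep_ratio * s.numel()))
    return s.topk(k).indices
```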
The Mesh Decoder processes dense token sequences. We introduce:
- Spectral Complexity Analysis using High-Frequency Energy Ratio (HFER)
- Instance-Adaptive Aggregation with aggressive compression for simple shapes and detail preservation for complex geometries
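A hedged sketch of how HFER might be computed with an FFT over the token axis and thresholded into an aggregation mode; the cutoff and the three-way split echo `--mesh_spectral_threshold_low/high` but are otherwise illustrative:

```python
import torch

def hfer(tokens: torch.Tensor, cutoff: float = 0.5) -> float:
    """High-Frequency Energy Ratio: fraction of spectral energy above a
    cutoff frequency, computed along the token axis. Tokens: (N, D)."""
    spectrum = torch.fft.rfft(tokens, dim=0)
    energy = spectrum.abs().pow(2).sum(dim=-1)   # energy per frequency bin
    k = int(cutoff * energy.numel())
    return (energy[k:].sum() / energy.sum()).item()

def aggregation_mode(tokens: torch.Tensor, low: float = 0.5, high: float = 0.7) -> str:
    """Map spectral complexity to a decoding strategy: merge aggressively
    for simple shapes, preserve full resolution for complex geometry."""
    r = hfer(tokens)
    if r < low:
        return "aggressive"   # spectrally simple: compress many tokens
    if r < high:
        return "moderate"
    return "preserve"         # spectrally rich: keep detail
```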
- Python >= 3.9
- PyTorch >= 2.0
- CUDA >= 11.8
- SAM3D dependencies
```bash
# create fastsam3d environment
mamba env create -f environments/default.yml
mamba activate fastsam3d

# for pytorch/cuda dependencies
export PIP_EXTRA_INDEX_URL="https://pypi.ngc.nvidia.com https://download.pytorch.org/whl/cu121"

# install fastsam3d and core dependencies
pip install -e '.[dev]'
pip install -e '.[p3d]'  # pytorch3d's dependency on pytorch is broken; this 2-step approach works around it

# for inference
export PIP_FIND_LINKS="https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.5.1_cu121.html"
pip install -e '.[inference]'

# patch things that aren't yet in official pip packages
./patching/hydra  # https://github.com/facebookresearch/hydra/pull/2863
```
If you encounter difficulties during installation, please refer to the more detailed documentation in /doc/Setup.md.
From HuggingFace (log in first by running `hf auth login` after generating an access token):
```bash
pip install 'huggingface-hub[cli]<1.0'
TAG=hf
hf download \
    --repo-type model \
    --local-dir checkpoints/${TAG}-download \
    --max-workers 1 \
    facebook/sam-3d-objects
mv checkpoints/${TAG}-download/checkpoints checkpoints/${TAG}
rm -rf checkpoints/${TAG}-download
```
```bash
# Generate 3D from single image + mask
cd notebook
python infer.py \
    --image_path examples/input.jpg \
    --mask_index 1 \
    --output_dir outputs/ \
    --enable_acceleration
```
```bash
# Full Fast-SAM3D acceleration (default)
cd notebook
python infer.py \
    --image_path examples/image.png \
    --mask_index 1 \
    --enable_ss_cache \
    --enable_slat_carving \
    --enable_mesh_aggregation
```
```bash
# Customize acceleration strength
cd notebook
python infer.py \
    --image_path examples/image.png \
    --mask_index 1 \
    --output_dir outputs/ \
    --ss_cache_stride 3 \
    --ss_warmup 2 \
    --ss_order 1 \
    --ss_momentum_beta 0.5 \
    --slat_thresh 1.5 \
    --slat_warmup 3 \
    --slat_carving_ratio 0.1 \
    --mesh_spectral_threshold_low 0.5 \
    --mesh_spectral_threshold_high 0.7 \
    --enable_acceleration
```

```bash
# Scene-level generation
cd notebook
python infer_scene.py \
    --image_dir examples \
    --output_dir outputs/ \
    --enable_acceleration
```
├── example/
│   ├── image.png
│   ├── 0.png
│   └── 1.png
| Method | Visual ↑ | CD ↓ | F1@0.05 ↑ | vIoU ↑ | 3D-IoU ↑ | Scene Time ↓ | Speedup ↑ |
|---|---|---|---|---|---|---|---|
| SAM3D | 0.369 | 0.022 | 92.34 | 0.543 | 0.403 | 462.3s | 1.00× |
| Random Drop | 0.264 | 0.030 | 83.52 | 0.327 | 0.094 | 402.2s | 1.15× |
| Uniform Merge | 0.329 | 0.023 | 91.48 | 0.540 | 0.367 | 366.8s | 1.26× |
| Fast3Dcache | 0.348 | 0.022 | 91.31 | 0.505 | 0.051 | 443.3s | 1.04× |
| TaylorSeer | 0.344 | 0.028 | 90.95 | 0.504 | 0.374 | 265.6s | 1.74× |
| EasyCache | 0.342 | 0.028 | 87.06 | 0.432 | 0.186 | 244.9s | 1.89× |
| Fast-SAM3D | 0.350 | 0.022 | 92.59 | 0.552 | 0.375 | 229.7s | 2.01× |
Fast-SAM3D produces results perceptually indistinguishable from SAM3D, while generic strategies suffer from structural collapse (Random Drop) or semantic drift (TaylorSeer).
If you find this work helpful, please consider citing:
```bibtex
@misc{feng2026fastsam3d3dfyimagesfaster,
      title={Fast-SAM3D: 3Dfy Anything in Images but Faster},
      author={Weilun Feng and Mingqiang Wu and Zhiliang Chen and Chuanguang Yang and Haotong Qin and Yuqi Li and Xiaokun Liu and Guoxin Fan and Zhulin An and Libo Huang and Yulun Zhang and Michele Magno and Yongjun Xu},
      year={2026},
      eprint={2602.05293},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.05293},
}
```

This project is built upon the excellent SAM3D framework. We thank the authors for their outstanding work in open-world 3D reconstruction.
This project is released under the MIT License.
For questions or suggestions, please open an issue or contact:
- Weilun Feng: fengweilun24s@ict.ac.cn
- Mingqiang Wu: wumingqiang25e@ict.ac.cn
- Chuanguang Yang: yangchuanguang@ict.ac.cn
- Zhulin An: anzhulin@ict.ac.cn
⭐ Star us on GitHub if you find this project helpful!