FINS: Fast Image-to-Neural Surface reconstructs high-fidelity signed distance fields (SDFs) from as little as a single RGB image in just a few seconds.
Unlike traditional neural surface methods that require dense multi-view supervision and long optimization times, FINS leverages pretrained 3D foundation models to generate geometric priors, combined with multi-resolution hash encoding and lightweight SDF heads for rapid convergence.
The resulting implicit representation enables real-time surface reconstruction and supports downstream robotics tasks such as motion planning, obstacle avoidance, and surface following.
FINS bridges single-image perception and fast neural implicit modeling, making SDF construction practical for real-world robotic systems.
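As a rough sketch of why an SDF is convenient for tasks like obstacle avoidance, consider querying signed distances at arbitrary points. This uses an analytic sphere SDF as a stand-in for the learned network; none of these names come from the FINS codebase:

```python
import numpy as np

def sphere_sdf(points, center=np.zeros(3), radius=1.0):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

# a motion planner can query clearance at arbitrary 3D points
queries = np.array([[0.0, 0.0, 0.0],   # sphere center
                    [2.0, 0.0, 0.0]])  # one unit outside the surface
d = sphere_sdf(queries)                # array([-1., 1.])
in_collision = d < 0.05                # points within a 5 cm safety margin
```

A learned SDF plays the same role: the network maps any 3D query point to a signed distance, so collision checks and surface following reduce to cheap function evaluations.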
```bash
git clone https://github.com/waynechu1109/FINS.git
cd FINS

conda create -n FINS python=3.10
conda activate FINS
pip install -r requirements.txt
```

To run inside Docker instead:

```bash
# pull the Docker image from Docker Hub
sudo docker pull waynechu1109/droplab_research:latest

# run the container
docker run -it --gpus all \
  -p 8000:8000 \
  -e DISPLAY=$DISPLAY \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  -v /home/waynechu/FINS:/FINS \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  --name FINS \
  waynechu1109/droplab_research:latest /bin/bash

pip install -r requirements.txt
```

We used the DTU training dataset for our experiments. Please download the preprocessed DTU dataset provided by MVSNet.
The data should be prepared in the following structure:

```
data/
├── dtu_105_09/
│   └── dtu_105_09.png
├── dtu_108_32/
│   └── dtu_108_32.png
└── ...
```

You can also try custom data.
Clone VGGT first.
```bash
# clone VGGT for data preprocessing
mkdir deps && cd deps
git clone https://github.com/facebookresearch/vggt.git
cd ..
```

The image should be placed at `data/<image_name>/<image_name>.png`. VGGT can generate a point cloud from only a single image.
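For a custom image, the layout above can be created like this (the scene name `my_object` is hypothetical; substitute your own):

```shell
# hypothetical custom scene named "my_object"
scene=my_object
mkdir -p "data/$scene"
# copy your RGB image to data/$scene/$scene.png; a placeholder file is
# created below only so these commands run end-to-end
touch "data/$scene/$scene.png"
```

The folder name then becomes the `--file` argument in the preprocessing step below.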
```bash
cd tools
# VGGT preprocessing
python3 vggt_pointcloud_generate.py --file dtu_118_60 --thres 65 --max_points 90000
```

- To tune the confidence threshold (in percent), set `--thres`.
- When the scene is concave, set `--concave true`. The directions of the point cloud's normals are important for training.
- When computing resources are limited, lower `--max_points`. The default is 200,000; you can also raise it if higher mesh quality is needed.

For more options, see `python3 vggt_pointcloud_generate.py -h`.
After preprocessing, you can find the preprocessed point cloud file in `data/vggt_preprocessed/<file_name>`. It is convenient to view preprocessed point clouds with F3D, which you can install with `sudo apt install f3d`.
The script for the whole pipeline can be found in `scripts/experiment.sh`, which includes the commands for both training and inference. If you want to run a series of trainings (for example, multiple scenes in a single run), see `scripts/run_exp_series.sh`:

```bash
# start series training
./scripts/run_exp_series.sh
```

The results can be found in `output/`.
If you find this repository useful, please cite our arXiv paper:

```bibtex
@misc{chu2025fins,
  title         = {Efficient Construction of Implicit Surface Models From a Single Image for Motion Generation},
  author        = {Wei-Teng Chu and Tianyi Zhang and Matthew Johnson-Roberson and Weiming Zhi},
  year          = {2025},
  eprint        = {2509.20681},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2509.20681},
}
```

