
FINS: Fast Image-to-Neural Surface


Wei-Teng Chu1, Tianyi Zhang2, Matthew Johnson-Roberson5, Weiming Zhi3,4,5

1 Dept. of Electrical Engineering, Stanford University · 2 Aurora · 3 School of Computer Science, The University of Sydney · 4 Australian Centre for Robotics · 5 College of Connected Computing, Vanderbilt University


Table of Contents

  • Overview
  • Prerequisites
  • Quick Start
  • Dataset Preparation
  • Image Preprocess
  • Training and Inferring
  • Acknowledgements
  • Citation
  • License

Overview

FINS: Fast Image-to-Neural Surface reconstructs high-fidelity signed distance fields (SDFs) from as little as a single RGB image in just a few seconds.

Unlike traditional neural surface methods that require dense multi-view supervision and long optimization times, FINS leverages pretrained 3D foundation models to generate geometric priors, combined with multi-resolution hash encoding and lightweight SDF heads for rapid convergence.

The resulting implicit representation enables real-time surface reconstruction and supports downstream robotics tasks such as motion planning, obstacle avoidance, and surface following.

FINS bridges single-image perception and fast neural implicit modeling, making SDF construction practical for real-world robotic systems.
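As a toy illustration of why an SDF is useful for these downstream tasks, the snippet below queries the analytic SDF of a unit sphere the way a planner would query a trained FINS network: the distance gives clearance for obstacle avoidance, and the gradient gives the surface normal for surface following. The sphere is a stand-in for the learned network, not code from this repository:

```python
import numpy as np

# Toy stand-in for a trained FINS network: the analytic signed
# distance field of a unit sphere. A learned SDF is queried the
# same way (autodiff would replace the finite differences below).
def sdf(p):
    return np.linalg.norm(p, axis=-1) - 1.0

def surface_normal(p, eps=1e-4):
    # Central finite differences approximate the SDF gradient,
    # which points along the outward surface normal.
    g = np.array([
        sdf(p + eps * np.eye(3)[i]) - sdf(p - eps * np.eye(3)[i])
        for i in range(3)
    ]) / (2 * eps)
    return g / np.linalg.norm(g)

# Obstacle avoidance: a waypoint is safe if its distance to the
# surface exceeds a clearance margin.
waypoint = np.array([0.0, 0.0, 1.3])
clearance = sdf(waypoint)        # ~0.3 above the sphere
is_safe = clearance > 0.25       # True

# Surface following: project the waypoint onto the surface by
# stepping along the negative gradient.
on_surface = waypoint - clearance * surface_normal(waypoint)
```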

Prerequisites

This repository is designed to run with Docker only. Please make sure the following are ready before installation:

  • Linux environment (validated on Ubuntu) with an NVIDIA GPU.
  • Docker Engine.
  • NVIDIA Driver installed on host.
  • NVIDIA Container Toolkit (--gpus all support).

Optional:

  • f3d for quick point cloud visualization.

Quick Start

git clone https://github.com/waynechu1109/FINS.git
cd FINS

Docker

# pull docker image from docker hub
sudo docker pull waynechu1109/droplab_research:latest

# run docker  
docker run -it --gpus all \
  -p 8000:8000 \
  -e DISPLAY=$DISPLAY \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  -v $HOME/FINS:/FINS \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  --name FINS \
  waynechu1109/droplab_research:latest /bin/bash

Dataset Preparation

  • We use the DTU training dataset for our experiments. Please download the preprocessed DTU dataset provided by MVSNet.
  • The data should be organized as follows:
data/
├── dtu_105_09/
│   └── dtu_105_09.png
├── dtu_108_32/
│   └── dtu_108_32.png
└── ...
  • You can also try custom data.
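If you are preparing custom data, a small helper like the one below (hypothetical, not part of this repository; `my_scene.png` is a made-up filename) can copy an image into the expected layout:

```python
import shutil
from pathlib import Path

def prepare_scene(src_png, data_root="data"):
    # Copy an image into the data/<image_name>/<image_name>.png
    # layout that the preprocessing step expects.
    name = Path(src_png).stem
    dest = Path(data_root) / name / f"{name}.png"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(src_png, dest)
    return dest

# prepare_scene("my_scene.png")  ->  data/my_scene/my_scene.png
```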

Image Preprocess

Clone VGGT first.

# clone VGGT for preprocess data
mkdir deps && cd deps
git clone https://github.com/facebookresearch/vggt.git
cd ..

The image should be placed in data/<image_name>/<image_name>.png. VGGT can generate a point cloud from a single image.

cd tools

# vggt preprocess
python3 vggt_pointcloud_generate.py --file dtu_118_60 --thres 65 --max_points 90000
  • Set --thres to tune the confidence threshold (in percent).
  • Set --concave true when the scene is concave. The direction of the point cloud's normals matters for training.
  • Set --max_points when computing resources are limited. The default is 200,000; raise it if higher mesh quality is needed.

For more options, see python3 vggt_pointcloud_generate.py -h.
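As a rough sketch (not the repository's implementation) of what --thres and --max_points do, assuming --thres acts as a percentile cutoff on per-point confidence:

```python
import numpy as np

def filter_points(points, conf, thres_pct=65, max_points=200_000, seed=0):
    # Keep points whose confidence clears the percentile cutoff
    # (--thres), then randomly subsample to the budget (--max_points).
    keep = conf >= np.percentile(conf, thres_pct)
    pts = points[keep]
    if len(pts) > max_points:
        rng = np.random.default_rng(seed)
        pts = pts[rng.choice(len(pts), size=max_points, replace=False)]
    return pts
```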

After preprocessing, you can find the preprocessed point cloud in data/vggt_preprocessed/<file_name>. Preprocessed point clouds are easy to inspect with F3D, which you can install with sudo apt install f3d.

Training and Inferring

The script for the whole pipeline is scripts/experiment.sh, which includes the commands for both training and inference. To run a series of trainings (for example, multiple scenes in a single run), see scripts/run_exp_series.sh.

# Start series training
./scripts/run_exp_series.sh

The results can be found in output/.

Acknowledgements

This project would not have been possible without prior work such as VGGT and DUSt3R. We thank the authors of these works and the broader research community for making this project possible.

Citation

If you find this repository useful, please cite our arXiv paper:

@misc{chu2025fins,
  title         = {Efficient Construction of Implicit Surface Models From a Single Image for Motion Generation}, 
  author        = {Wei-Teng Chu and Tianyi Zhang and Matthew Johnson-Roberson and Weiming Zhi},
  year          = {2025},
  eprint        = {2509.20681},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2509.20681}, 
}

License

This project is licensed under the MIT License. See LICENSE for details.

About

[ICRA 2026 Accepted] FINS: Fast Image-to-Neural Surface
