Lihan Jiang*, Yucheng Mao*, Linning Xu,
Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, Dahua Lin, Bo Dai†
[2025.07.08] We thank Alex Nasa for providing us with an excellent Hugging Face demo.
[2025.06.30] We release the training & inference code.
Starting from a set of uncalibrated images, a transformer-based geometry encoder is followed by three decoder heads, F_G, F_D, and F_C, which respectively predict the Gaussian parameters (μ, σ, r, s, c), the depth map D, and the camera poses p. These outputs are used to construct a set of pixel-wise 3D Gaussians, which the proposed Differentiable Voxelization module then merges into per-voxel 3D Gaussians. Multi-view images and depth maps are subsequently rendered from the voxelized 3D Gaussians. The rendered images are supervised with an RGB loss against the ground-truth images, while the rendered depth maps, together with the decoded depth D and camera poses p, are used to compute geometry losses against pseudo-geometry priors obtained from the pretrained VGGT.
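For intuition, the snippet below is a minimal, self-contained sketch of the voxel-pooling idea: pixel-wise Gaussians whose centers fall into the same voxel are merged (here simply averaged) into one per-voxel Gaussian. The voxelize_gaussians helper and its tensor layout are illustrative assumptions, not the actual Differentiable Voxelization module.

import torch

def voxelize_gaussians(means, feats, voxel_size=0.05):
    """means: [N, 3] Gaussian centers; feats: [N, C] remaining Gaussian parameters."""
    # Quantize centers to integer voxel coordinates.
    coords = torch.floor(means / voxel_size).long()                    # [N, 3]
    # Group Gaussians that share a voxel.
    uniq, inverse = torch.unique(coords, dim=0, return_inverse=True)   # [M, 3], [N]
    counts = torch.zeros(len(uniq), device=means.device).index_add_(
        0, inverse, torch.ones(len(means), device=means.device))
    # Average the centers and parameters of all Gaussians inside each voxel.
    pooled_means = torch.zeros(len(uniq), 3, device=means.device).index_add_(
        0, inverse, means) / counts[:, None]
    pooled_feats = torch.zeros(len(uniq), feats.shape[1], device=feats.device).index_add_(
        0, inverse, feats) / counts[:, None]
    return pooled_means, pooled_feats  # one Gaussian per occupied voxel

# Example: pool 10k pixel-wise Gaussians into per-voxel Gaussians.
means, feats = torch.rand(10_000, 3), torch.rand(10_000, 8)
pooled_means, pooled_feats = voxelize_gaussians(means, feats)
print(pooled_means.shape, pooled_feats.shape)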
Our code requires Python 3.10+ and is developed with PyTorch 2.2.0 and CUDA 12.1, but it should also work with other PyTorch/CUDA versions.
- Clone AnySplat.
git clone https://github.com/OpenRobotLab/AnySplat.git
cd AnySplat
- Create the environment, here we show an example using conda.
conda create -y -n anysplat python=3.10
conda activate anysplat
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
from pathlib import Path
import torch
import os
import sys
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from src.misc.image_io import save_interpolated_video
from src.model.model.anysplat import AnySplat
from src.utils.image import process_image
# Load the model from Hugging Face
model = AnySplat.from_pretrained("lhjiang/anysplat")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()
for param in model.parameters():
    param.requires_grad = False
# Load and preprocess example images (replace with your own image paths)
image_names = ["path/to/imageA.png", "path/to/imageB.png", "path/to/imageC.png"]
images = [process_image(image_name) for image_name in image_names]
images = torch.stack(images, dim=0).unsqueeze(0).to(device) # [1, K, 3, 448, 448]
b, v, _, h, w = images.shape
# Run Inference
gaussians, pred_context_pose = model.inference((images + 1) * 0.5)  # map from [-1, 1] to [0, 1]
pred_all_extrinsic = pred_context_pose['extrinsic']
pred_all_intrinsic = pred_context_pose['intrinsic']
image_folder = "path/to/output"  # directory where the interpolated RGB and depth videos are saved
save_interpolated_video(pred_all_extrinsic, pred_all_intrinsic, b, h, w, gaussians, image_folder, model.decoder)
# single node:
python src/main.py +experiment=dl3dv trainer.num_nodes=1
# multi nodes:
export GPU_NUM=8
export NUM_NODES=2
torchrun \
--nnodes=$NUM_NODES \
--nproc_per_node=$GPU_NUM \
--rdzv_id=test \
--rdzv_backend=c10d \
--rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT \
-m src.main +experiment=multi-dataset +hydra.job.config.store_config=false
Here, we provide three example datasets (CO3Dv2, DL3DV and ScanNet++), each representing a different training view sampling strategy. You can use them as templates and add any other datasets you prefer.
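As a rough illustration of what a view sampling strategy looks like, the snippet below picks context and target views from a random window over an ordered capture. The sample_views function and its parameters are hypothetical; the actual dataset classes differ per dataset.

import random

def sample_views(num_frames, num_context=4, num_target=2, window=32):
    # Pick a random window of consecutive frames, then draw context and
    # target views from within it.
    start = random.randint(0, max(0, num_frames - window))
    window_ids = list(range(start, min(start + window, num_frames)))
    context = sorted(random.sample(window_ids, num_context))
    remaining = [i for i in window_ids if i not in context]
    target = sorted(random.sample(remaining, num_target))
    return context, target

# e.g. a 200-frame sequence
print(sample_views(200))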
python src/post_opt/simple_trainer.py default --data_dir ...
# Novel View Synthesis
python src/eval_nvs.py --data_dir ...
# Pose Estimation
python src/eval_pose.py --co3d_dir ... --co3d_anno_dir ...
We use the original data from the DL3DV dataset. For other datasets, please follow CUT3R's data preprocessing instructions to prepare the training data.
python demo_gradio.py
This will automatically download the pre-trained model weights and config from Hugging Face.
The demo is a Gradio interface where you can upload images or a video and visualize the reconstructed 3D Gaussian Splat, along with the rendered RGB and depth videos. The trajectory of the rendered video is obtained by interpolating the estimated input image poses.
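As a rough sketch of that interpolation (linear interpolation of camera centers plus spherical interpolation of rotations), the helper below shows one way to densify a set of camera-to-world poses. interpolate_poses is a hypothetical illustration, not the exact implementation used by save_interpolated_video.

import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_poses(extrinsics, n_frames=120):
    """extrinsics: [V, 4, 4] camera-to-world matrices of the input views."""
    keys = np.arange(len(extrinsics))
    times = np.linspace(0, len(extrinsics) - 1, n_frames)
    # Spherically interpolate rotations between consecutive key poses.
    slerp = Slerp(keys, Rotation.from_matrix(extrinsics[:, :3, :3]))
    rotations = slerp(times).as_matrix()                              # [n_frames, 3, 3]
    # Linearly interpolate camera centers component-wise.
    centers = np.stack([np.interp(times, keys, extrinsics[:, i, 3]) for i in range(3)], axis=-1)
    poses = np.tile(np.eye(4), (n_frames, 1, 1))
    poses[:, :3, :3] = rotations
    poses[:, :3, 3] = centers
    return poses  # [n_frames, 4, 4] smooth camera trajectory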
If you find our work helpful, please consider citing:
@article{jiang2025anysplat,
  title={AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views},
  author={Jiang, Lihan and Mao, Yucheng and Xu, Linning and Lu, Tao and Ren, Kerui and Jin, Yichen and Xu, Xudong and Yu, Mulin and Pang, Jiangmiao and Zhao, Feng and others},
  journal={arXiv preprint arXiv:2505.23716},
  year={2025}
}
We thank all authors behind these repositories for their excellent work: VGGT, NoPoSplat, CUT3R and gsplat.