Lihan Jiang*, Yucheng Mao*, Linning Xu,
Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, Dahua Lin, Bo Dai†
[2025.07.08] We thank Alex Nasa for providing us with an excellent Hugging Face demo.
[2025.06.30] We release the training & inference code.
Starting from a set of uncalibrated images, a transformer-based geometry encoder is followed by three decoder heads, F_G, F_D, and F_C, which respectively predict the Gaussian parameters (μ, σ, r, s, c), the depth map D, and the camera poses p. These outputs are used to construct a set of pixel-wise 3D Gaussians, which the proposed Differentiable Voxelization module then merges into per-voxel 3D Gaussians. Multi-view images and depth maps are subsequently rendered from the voxelized 3D Gaussians. The rendered images are supervised with an RGB loss against the ground-truth images, while the rendered depth maps, together with the decoded depth D and camera poses p, are used to compute geometry losses against pseudo-geometry priors obtained from the pretrained VGGT.
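For intuition, the snippet below is a minimal, self-contained sketch of the voxel-pooling idea: pixel-wise Gaussians whose centers fall into the same voxel are merged (here simply averaged) into one per-voxel Gaussian. The voxelize_gaussians helper and its tensor layout are illustrative assumptions, not the actual Differentiable Voxelization module.

import torch

def voxelize_gaussians(means, feats, voxel_size=0.05):
    """means: [N, 3] Gaussian centers; feats: [N, C] remaining Gaussian parameters."""
    # Quantize centers to integer voxel coordinates.
    coords = torch.floor(means / voxel_size).long()                    # [N, 3]
    # Group Gaussians that share a voxel.
    uniq, inverse = torch.unique(coords, dim=0, return_inverse=True)   # [M, 3], [N]
    counts = torch.zeros(len(uniq), device=means.device).index_add_(
        0, inverse, torch.ones(len(means), device=means.device))
    # Average the centers and parameters of all Gaussians inside each voxel.
    pooled_means = torch.zeros(len(uniq), 3, device=means.device).index_add_(
        0, inverse, means) / counts[:, None]
    pooled_feats = torch.zeros(len(uniq), feats.shape[1], device=feats.device).index_add_(
        0, inverse, feats) / counts[:, None]
    return pooled_means, pooled_feats  # one Gaussian per occupied voxel

# Example: pool 10k pixel-wise Gaussians into per-voxel Gaussians.
means, feats = torch.rand(10_000, 3), torch.rand(10_000, 8)
pooled_means, pooled_feats = voxelize_gaussians(means, feats)
print(pooled_means.shape, pooled_feats.shape)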
Our code requires Python 3.10+ and is developed with PyTorch 2.2.0 and CUDA 12.1, but it should also work with other PyTorch/CUDA versions.
- Clone AnySplat.
git clone https://github.com/OpenRobotLab/AnySplat.git
cd AnySplat
- Create the environment, here we show an example using conda.
conda create -y -n anysplat python=3.10
conda activate anysplat
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
from pathlib import Path
import torch
import os
import sys
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from src.misc.image_io import save_interpolated_video
from src.model.model.anysplat import AnySplat
from src.utils.image import process_image
# Load the model from Hugging Face
model = AnySplat.from_pretrained("lhjiang/anysplat")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()
for param in model.parameters():
    param.requires_grad = False
# Load and preprocess example images (replace with your own image paths)
image_names = ["path/to/imageA.png", "path/to/imageB.png", "path/to/imageC.png"]
images = [process_image(image_name) for image_name in image_names]
images = torch.stack(images, dim=0).unsqueeze(0).to(device) # [1, K, 3, 448, 448]
b, v, _, h, w = images.shape
# Run Inference
gaussians, pred_context_pose = model.inference((images + 1) * 0.5)  # map from [-1, 1] to [0, 1]
pred_all_extrinsic = pred_context_pose['extrinsic']
pred_all_intrinsic = pred_context_pose['intrinsic']
image_folder = "path/to/output"  # directory where the interpolated RGB and depth videos are saved
save_interpolated_video(pred_all_extrinsic, pred_all_intrinsic, b, h, w, gaussians, image_folder, model.decoder)
# single node:
python src/main.py +experiment=dl3dv trainer.num_nodes=1
# multi nodes:
export GPU_NUM=8
export NUM_NODES=2
torchrun \
--nnodes=$NUM_NODES \
--nproc_per_node=$GPU_NUM \
--rdzv_id=test \
--rdzv_backend=c10d \
--rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT \
-m src.main +experiment=multi-dataset +hydra.job.config.store_config=false
Here, we provide three example datasets (CO3Dv2, DL3DV and ScanNet++), each representing a different training view sampling strategy. You can use them as templates and add any other datasets you prefer.
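As a rough illustration of what a view sampling strategy looks like, the snippet below picks context and target views from a random window over an ordered capture. The sample_views function and its parameters are hypothetical; the actual dataset classes differ per dataset.

import random

def sample_views(num_frames, num_context=4, num_target=2, window=32):
    # Pick a random window of consecutive frames, then draw context and
    # target views from within it.
    start = random.randint(0, max(0, num_frames - window))
    window_ids = list(range(start, min(start + window, num_frames)))
    context = sorted(random.sample(window_ids, num_context))
    remaining = [i for i in window_ids if i not in context]
    target = sorted(random.sample(remaining, num_target))
    return context, target

# e.g. a 200-frame sequence
print(sample_views(200))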
python src/post_opt/simple_trainer.py default --data_dir ...
# Novel View Synthesis
python src/eval_nvs.py --data_dir ...
# Pose Estimation
python src/eval_pose.py --co3d_dir ... --co3d_anno_dir ...
We use the original data from the DL3DV dataset. For other datasets, please follow CUT3R's data preprocessing instructions to prepare the training data.
python demo_gradio.py
This will automatically download the pre-trained model weights and config from Hugging Face.
The demo is a Gradio interface where you can upload images or a video and visualize the reconstructed 3D Gaussian Splat, along with the rendered RGB and depth videos. The trajectory of the rendered video is obtained by interpolating the estimated input image poses.
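As a rough sketch of that interpolation (linear interpolation of camera centers plus spherical interpolation of rotations), the helper below shows one way to densify a set of camera-to-world poses. interpolate_poses is a hypothetical illustration, not the exact implementation used by save_interpolated_video.

import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_poses(extrinsics, n_frames=120):
    """extrinsics: [V, 4, 4] camera-to-world matrices of the input views."""
    keys = np.arange(len(extrinsics))
    times = np.linspace(0, len(extrinsics) - 1, n_frames)
    # Spherically interpolate rotations between consecutive key poses.
    slerp = Slerp(keys, Rotation.from_matrix(extrinsics[:, :3, :3]))
    rotations = slerp(times).as_matrix()                              # [n_frames, 3, 3]
    # Linearly interpolate camera centers component-wise.
    centers = np.stack([np.interp(times, keys, extrinsics[:, i, 3]) for i in range(3)], axis=-1)
    poses = np.tile(np.eye(4), (n_frames, 1, 1))
    poses[:, :3, :3] = rotations
    poses[:, :3, 3] = centers
    return poses  # [n_frames, 4, 4] smooth camera trajectory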
If you find our work helpful, please consider citing:
@article{jiang2025anysplat,
  title={AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views},
  author={Jiang, Lihan and Mao, Yucheng and Xu, Linning and Lu, Tao and Ren, Kerui and Jin, Yichen and Xu, Xudong and Yu, Mulin and Pang, Jiangmiao and Zhao, Feng and others},
  journal={arXiv preprint arXiv:2505.23716},
  year={2025}
}
We thank all authors behind these repositories for their excellent work: VGGT, NoPoSplat, CUT3R and gsplat.