Direct3D‑S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

✨ News

June 3, 2025: We are preparing the v1.2 release, featuring enhanced character generation. Stay tuned!
May 30, 2025: 🤯 We have released both v1.0 and v1.1. The new model is faster still: compared to FlashAttention-2, its forward pass is 12.2× faster and its backward pass is 19.7× faster, giving nearly 2× faster inference than v1.0.
May 30, 2025: 🔨 Release inference code and model.
May 26, 2025: 🎁 Release live demo on 🤗 Hugging Face.
May 26, 2025: 🚀 Release paper and project page.

📝 Abstract

Generating high-resolution 3D shapes using volumetric representations such as Signed Distance Functions (SDFs) presents substantial computational and memory challenges. We introduce Direct3D‑S2, a scalable 3D generation framework based on sparse volumes that achieves superior output quality with dramatically reduced training costs. Our key innovation is the Spatial Sparse Attention (SSA) mechanism, which greatly enhances the efficiency of Diffusion Transformer (DiT) computations on sparse volumetric data. SSA allows the model to effectively process large token sets within sparse volumes, substantially reducing computational overhead and achieving a 3.9× speedup in the forward pass and a 9.6× speedup in the backward pass. Our framework also includes a variational autoencoder (VAE) that maintains a consistent sparse volumetric format across input, latent, and output stages. Compared to previous methods with heterogeneous representations in 3D VAE, this unified design significantly improves training efficiency and stability. Our model is trained on publicly available datasets, and experiments demonstrate that Direct3D‑S2 not only surpasses state-of-the-art methods in generation quality and efficiency, but also enables training at 1024³ resolution with just 8 GPUs, a task typically requiring at least 32 GPUs for volumetric representations at 256³ resolution, thus making gigascale 3D generation both practical and accessible.

🌟 Highlight

Gigascale 3D Generation: Direct3D-S2 enables training at 1024³ resolution with only 8 GPUs.
Spatial Sparse Attention (SSA): A novel attention mechanism designed for sparse volumetric data, enabling efficient processing of large token sets.
Unified Sparse VAE: A variational autoencoder that maintains a consistent sparse volumetric format across input, latent, and output stages, improving training efficiency and stability.
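
To give a sense of the scale behind "gigascale", the sketch below counts the voxels in a dense grid at several resolutions and estimates the memory a single float32 channel would need. The helper names and the 4-bytes-per-voxel figure are illustrative assumptions, not taken from the Direct3D-S2 code. The point is only to show why a sparse representation matters at 1024³.

```python
def dense_voxel_count(resolution: int) -> int:
    """Number of cells in a dense cubic grid with the given side length."""
    return resolution ** 3


def dense_grid_gib(resolution: int, bytes_per_voxel: int = 4) -> float:
    """Memory in GiB for one dense channel (assumes float32, i.e. 4 bytes per voxel)."""
    return dense_voxel_count(resolution) * bytes_per_voxel / 1024 ** 3


if __name__ == "__main__":
    for resolution in (256, 512, 1024):
        voxels = dense_voxel_count(resolution)
        gib = dense_grid_gib(resolution)
        print(f"{resolution}^3: {voxels:,} voxels, {gib:.4f} GiB per float32 channel")
```

A dense 1024³ grid holds just over one billion voxels, so even one scalar channel is costly before any attention is computed. Sparse volumes avoid storing most of those cells.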

🚀 Getting Started

Installation

✅ Tested Environment

💡 If you're setting up on Windows, check out issue #11 and issue #12. Big thanks to the contributors who helped get Direct3D-S2 working on Windows!

System: Ubuntu 22.04
CUDA Toolkit: CUDA 12.1
PyTorch: Install torch and torchvision first.
Make sure the PyTorch CUDA version matches your installed CUDA Toolkit.
```
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu121
```

Torchsparse: follow the official guide, or install from source:

```
git clone https://github.com/mit-han-lab/torchsparse
cd torchsparse && python -m pip install .
```

Install dependencies:

```
git clone https://github.com/DreamTechAI/Direct3D-S2.git
cd Direct3D-S2
pip install -r requirements.txt
pip install -e .
```
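
Since the PyTorch CUDA version should match the installed CUDA Toolkit, a quick sanity check can catch a mismatch early. The script below is a minimal sketch and not part of the repository. It assumes `nvcc` is on `PATH` and compares the release it reports with `torch.version.cuda`; the function names are made up for illustration.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract 'MAJOR.MINOR' from `nvcc --version` output, e.g. '12.1'."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def toolkit_cuda_version() -> Optional[str]:
    """CUDA Toolkit version reported by nvcc, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


def torch_cuda_version() -> Optional[str]:
    """CUDA version PyTorch was built with, or None (CPU build or not installed)."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    toolkit, built = toolkit_cuda_version(), torch_cuda_version()
    print(f"CUDA Toolkit (nvcc):     {toolkit}")
    print(f"PyTorch built with CUDA: {built}")
    if toolkit and built and toolkit != built:
        print("Warning: versions differ; compiling extensions may fail.")
```

With the tested environment above, both lines would be expected to report 12.1.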

Usage

Note: Generating at 512 resolution requires at least 10GB of VRAM, and 1024 resolution needs around 24GB. We don’t recommend generating models at 512 resolution, as it’s just an intermediate step of the 1024 model and the quality is noticeably lower.

```python
from direct3d_s2.pipeline import Direct3DS2Pipeline

# Load the v1.1 weights from the Hugging Face Hub and move the pipeline to the GPU.
pipeline = Direct3DS2Pipeline.from_pretrained(
  'wushuang98/Direct3D-S2',
  subfolder="direct3d-s2-v-1-1"
)
pipeline.to("cuda:0")

# Generate a mesh from a single input image.
mesh = pipeline(
  'assets/test/13.png',
  sdf_resolution=1024, # 512 or 1024
  remove_interior=True,
  remesh=False, # Switch to True if you need to reduce the number of triangles.
)["mesh"]

mesh.export('output.obj')
```
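
The VRAM guidance in the note above can be turned into a small pre-flight check before loading the pipeline. This is a sketch under stated assumptions: the helper names, the threshold table, and the 0.5 GB tolerance are all hypothetical. The tolerance is there because cards usually report slightly less than their nominal memory. Only `torch.cuda.is_available` and `torch.cuda.get_device_properties` are real PyTorch calls.

```python
from typing import Optional

# Rough guidance copied from the note above: resolution -> approximate VRAM in GB.
VRAM_GUIDANCE_GB = {512: 10.0, 1024: 24.0}


def choose_sdf_resolution(vram_gb: float, tolerance_gb: float = 0.5) -> Optional[int]:
    """Return the highest resolution whose guidance fits, or None if none fits."""
    for resolution in sorted(VRAM_GUIDANCE_GB, reverse=True):
        if vram_gb + tolerance_gb >= VRAM_GUIDANCE_GB[resolution]:
            return resolution
    return None


def detect_vram_gb(device_index: int = 0) -> float:
    """Total memory of a CUDA device in GiB; 0.0 if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(device_index).total_memory / 1024 ** 3


if __name__ == "__main__":
    vram = detect_vram_gb()
    resolution = choose_sdf_resolution(vram)
    if resolution is None:
        print(f"{vram:.1f} GiB detected: below the guidance for 512 resolution.")
    else:
        print(f"{vram:.1f} GiB detected: try sdf_resolution={resolution}.")
```

The returned value could be passed as `sdf_resolution` in the call above. Keep in mind that 512 is an intermediate step of the 1024 model, so its output quality is noticeably lower.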

Web Demo

We provide a Gradio web demo for Direct3D-S2, which allows you to generate 3D meshes from images interactively.

```
python app.py
```

🤗 Acknowledgements

Thanks to the following repos for their great work, which helps us a lot in the development of Direct3D-S2:

📄 License

Direct3D-S2 is released under the MIT License. See LICENSE for details.

📖 Citation

If you find our work useful, please consider citing our paper:

```
@article{wu2025direct3ds2gigascale3dgeneration,
  title={Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention},
  author={Shuang Wu and Youtian Lin and Feihu Zhang and Yifei Zeng and Yikang Yang and Yajie Bao and Jiachen Qian and Siyu Zhu and Philip Torr and Xun Cao and Yao Yao},
  journal={arXiv preprint arXiv:2505.17412},
  year={2025}
}
```
