OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
Lening Wang\*, Wenzhao Zheng\*†, Yilong Ren, Han Jiang, Zhiyong Cui, Haiyang Yu, Jiwen Lu

\* Equal contribution
With trajectory-aware 4D generation, OccSora has the potential to serve as a world simulator for the decision-making of autonomous driving.
## News

- [2024/05/31] Training, evaluation, and visualization code released.
- [2024/05/31] Paper released on arXiv.
Unlike most existing world models, which adopt an autoregressive framework to perform next-token prediction, we propose a diffusion-based 4D occupancy generation model, OccSora, to model long-term temporal evolution more efficiently. We employ a 4D scene tokenizer to obtain compact discrete spatio-temporal representations of the 4D occupancy input and achieve high-quality reconstruction of long-sequence occupancy videos. We then train a diffusion transformer on these spatio-temporal representations to generate 4D occupancy conditioned on a trajectory prompt. OccSora can generate 16-second videos with authentic 3D layouts and temporal consistency, demonstrating its ability to understand the spatial and temporal distributions of driving scenes.
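The first stage described above is a discrete tokenizer (a VQVAE, per the training section below). As a minimal illustration of the vector-quantization idea only — with made-up shapes and a random codebook, not the paper's actual tokenizer — nearest-codebook-entry lookup can be sketched as:

```python
import numpy as np

# Hypothetical sizes: a tiny codebook and a handful of latent vectors.
# The real OccSora tokenizer operates on 4D occupancy features; this
# sketch only illustrates the nearest-neighbor quantization step.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # 8 codes, 4-dim embeddings
latents = rng.normal(size=(5, 4))    # 5 latent vectors to quantize

# Nearest codebook entry per latent (squared Euclidean distance).
dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
tokens = dists.argmin(axis=1)        # discrete token ids, shape (5,)
recon = codebook[tokens]             # de-quantized ("decoded") vectors

print(tokens.shape, recon.shape)
```

The diffusion transformer then operates on discrete ids like `tokens`, and the decoder maps them back to occupancy, analogous to the `recon` lookup here.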
## Installation

- Create a conda environment with Python 3.8.0.
- Install the packages listed in `environment.yaml`.
- Install mmdetection3d by following the official mmdetection3d installation documentation.
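A possible sequence for the environment steps above (the environment name `occsora` is illustrative, and `conda env update` assumes `environment.yaml` is a conda environment file — adjust to your setup):

```shell
# Illustrative setup: create a Python 3.8.0 env, activate it, then
# install the packages pinned in environment.yaml into it.
conda create -n occsora python=3.8.0 -y
conda activate occsora
conda env update -n occsora -f environment.yaml
```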
## Data Preparation

- Create a soft link from `data/nuscenes` to `your_nuscenes_path`.
- Prepare the ground-truth (gts) semantic occupancy introduced in [Occ3d].
- Download the generated train/val pickle files and put them in `data/`:
  - [nuscenes_infos_train_temporal_v3_scene.pkl]
  - [nuscenes_infos_val_temporal_v3_scene.pkl]
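The soft-link step above can be done with `ln -s` from the repository root (`your_nuscenes_path` is a placeholder for your local nuScenes root):

```shell
# Link your local nuScenes root into the repository's data/ directory.
# your_nuscenes_path is a placeholder -- substitute your own path.
mkdir -p data
ln -s your_nuscenes_path data/nuscenes
```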
The dataset should be organized as follows:

```
OccSora/data
├── nuscenes                  # downloaded from www.nuscenes.org
│   ├── lidarseg
│   ├── maps
│   ├── samples
│   ├── sweeps
│   └── v1.0-trainval
├── gts                       # downloaded from Occ3d
├── nuscenes_infos_train_temporal_v3_scene.pkl
└── nuscenes_infos_val_temporal_v3_scene.pkl
```
## Getting Started

### Training

Train the VQVAE (tested on an A100 with 80 GB of GPU memory):

```shell
python train_1.py --py-config config/train_vqvae.py --work-dir out/vqvae
```

Generate the training token data using the trained VQVAE:

```shell
python step02.py --py-config config/train_vqvae.py --work-dir out/vqvae
```

Train OccSora (A100, 80 GB):

```shell
torchrun --nnodes=1 --nproc_per_node=8 train_2.py --model DiT-XL/2 --data-path /path
```
### Evaluation

Evaluate the model (A100, 80 GB). The tokens are obtained by denoising the noise in `samples_array.npy`:

```shell
python sample.py --model DiT-XL/2 --image-size 256 --ckpt "/results/001-DiT-XL-2/checkpoints/1200000.pt"
```
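`samples_array.npy` holds the noise that gets denoised into tokens. A sketch of producing such a file with NumPy — the shape here is made up; the real token-latent shape depends on the trained VQVAE:

```python
import numpy as np

# Hypothetical shape (1, 4, 32, 32): the actual latent shape is set by
# the trained tokenizer. This only shows the .npy save/load round trip.
rng = np.random.default_rng(42)
noise = rng.standard_normal((1, 4, 32, 32)).astype(np.float32)

np.save("samples_array.npy", noise)   # the noise file the README describes
loaded = np.load("samples_array.npy")
print(loaded.shape, loaded.dtype)
```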
### Visualization

```shell
python visualize_demo.py --py-config config/train_vqvae.py --work-dir out/vqvae
```
## Acknowledgements

Our code is based on OccWorld and DiT.

Many thanks also to these excellent open-source repositories: TPVFormer, MagicDrive, and BEVFormer.
## Citation

If you find this project helpful, please consider citing the following paper:

```bibtex
@article{wang2024occsora,
    title={OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving},
    author={Wang, Lening and Zheng, Wenzhao and Ren, Yilong and Jiang, Han and Cui, Zhiyong and Yu, Haiyang and Lu, Jiwen},
    journal={arXiv preprint arXiv:2405.20337},
    year={2024}
}
```