
GDiT: A Graph-Prior-Guided Diffusion Transformer for Semantic-Controllable Remote Sensing Image Synthesis

Abstract

Semantic image synthesis (SIS) is essential for remote sensing, particularly for generating high-quality training data where annotated datasets are scarce. While existing SIS methods have advanced pixel-wise mappings between semantic maps and images, they often overlook spatial priors, such as relationships between geographic objects (e.g., road-building adjacency), leading to structural inconsistencies in synthesized images. To address this, we propose the graph-prior-guided diffusion transformer (GDiT) for semantically controllable remote sensing image synthesis. We first convert semantic maps into semantic graphs, encoding geographic objects as nodes with structured spatial interactions. To capture spatial and semantic relationships, we propose the Geometric-Semantic Aware Module (GSAM), which integrates CLIP-extracted semantics and geometric attributes into a more context-aware representation. Furthermore, we design the Graph Diffusion Transformer (GDiT) Block, which employs graph-to-image cross-attention to refine spatial structures, ensuring topological coherence and semantic fidelity in synthesized images. Experiments on land-cover and land-use datasets show that GDiT achieves competitive performance and enables multilevel control across global, object, and pixel dimensions with text prompts, while using only 38.9% of the parameters of GeoSynth, improving both efficiency and synthesis quality.

🧩 Method Overview

GDiT converts semantic maps into semantic graphs and injects graph priors into a diffusion transformer via graph-to-image cross-attention. GSAM fuses CLIP semantics with geometric attributes to build context-aware node representations. The GDiT Blocks refine spatial structures to ensure topological coherence and semantic fidelity during synthesis.
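The graph-to-image cross-attention idea can be illustrated with a minimal single-head NumPy sketch. This is illustrative only, not the repository's implementation: all names, dimensions, and the random projections are assumptions. Image tokens form the queries; graph node embeddings (e.g., produced by GSAM) form the keys and values, so every image patch can attend to the objects in the semantic graph.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_to_image_cross_attention(img_tokens, node_feats, d_k=64, seed=0):
    """Single-head cross-attention: image tokens attend to graph nodes.

    img_tokens: (N_img, D)  latent-patch tokens from the diffusion backbone
    node_feats: (N_node, D) context-aware node embeddings (e.g., from GSAM)
    """
    rng = np.random.default_rng(seed)
    D = img_tokens.shape[1]
    # Illustrative random projections; in a real model these are learned.
    Wq = rng.standard_normal((D, d_k)) / np.sqrt(D)
    Wk = rng.standard_normal((D, d_k)) / np.sqrt(D)
    Wv = rng.standard_normal((D, d_k)) / np.sqrt(D)

    Q = img_tokens @ Wq                       # (N_img, d_k)
    K = node_feats @ Wk                       # (N_node, d_k)
    V = node_feats @ Wv                       # (N_node, d_k)
    attn = softmax(Q @ K.T / np.sqrt(d_k))    # (N_img, N_node)
    return attn @ V                           # graph-conditioned image features

# Example: 16 image tokens attend to 5 graph nodes.
out = graph_to_image_cross_attention(
    np.random.default_rng(1).standard_normal((16, 32)),
    np.random.default_rng(2).standard_normal((5, 32)),
)
print(out.shape)  # (16, 64)
```

Because the attention matrix is (N_img, N_node), each image token is updated by a convex combination of node features, which is how graph priors can steer spatial structure during denoising.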

Pipeline

🖼️ Visualization Results (OEM & OSM)

OEM

OEM Visualization

OSM

OSM Visualization

Semantic Edit

Semantic Edit Result

Multi-level Control (R + G + S)

Multi-level Control (R+G+S)

Installation

```bash
git clone https://github.com/whudk/GDiT.git
cd GDiT
conda create -n gdit python=3.10 -y
conda activate gdit
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install -e .
```

🏋️ Training

Train on OEM (256, latent + seg + graph, AdaLN):

```bash
python train.py \
  --gpu 0 \
  --config ./configs/train/oem256-latent_seg_graph_with_graph_adaln.yaml \
  --global-batch-size 10 \
  --with-graph 1 \
  --use_clip \
  --version v_prompt \
  --image-size 32 \
  --mode regions_graph_sem
```
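The `adaln` in the config name refers to adaptive layer norm (AdaLN) conditioning in the DiT family of architectures: a conditioning vector (e.g., timestep plus graph embedding) is projected to per-channel shift, scale, and gate terms that modulate normalized activations. A minimal NumPy sketch under that assumption (names and shapes are illustrative, not the repository's code):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def adaln_modulate(x, cond, W, b):
    """AdaLN-style modulation: cond -> (shift, scale, gate).

    x:    (N, D) token activations
    cond: (C,)   conditioning vector (e.g., timestep + graph embedding)
    W, b: projection producing 3*D modulation parameters
    """
    shift, scale, gate = np.split(cond @ W + b, 3)
    h = layer_norm(x) * (1.0 + scale) + shift   # modulated normalization
    return x + gate * h                          # gated residual update

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
cond = rng.standard_normal(4)
W = rng.standard_normal((4, 24)) * 0.01  # 24 = 3 * D modulation params
b = np.zeros(24)
y = adaln_modulate(x, cond, W, b)
print(y.shape)  # (16, 8)
```

In DiT-style blocks the gate usually scales the output of a full attention or MLP sub-block and is zero-initialized so the block starts as an identity map; the sketch above collapses that into one residual step for brevity.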

🧪 Evaluation / Sampling (OEM)

```bash
python sample.py \
  --config ./configs/train/oem256-latent_seg_graph_with_graph_adaln.yaml \
  --gpu 0 \
  --outdir ./GDiT_25_steps_OEMtest \
  --use_clip \
  --with-graph 1 \
  --version GDiT \
  --eval_type regions_graph_sem \
  --num_steps 25 \
  --seed 0-49999 \
  --image_size 32 \
  --cfg_scale 2.5 \
  --ckpt-path "path/to/your_checkpoint.pt"
```
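`--cfg_scale` sets the classifier-free guidance weight. The standard CFG combination (the general formula used across diffusion samplers, stated here as an assumption about this flag, not a reading of the repository's code) extrapolates from the unconditional prediction toward the conditional one:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by `scale`."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

eu = np.array([0.0, 1.0])  # toy unconditional prediction
ec = np.array([1.0, 1.0])  # toy conditional prediction
# scale = 1.0 recovers the conditional prediction exactly;
# scale > 1.0 (e.g., 2.5) strengthens adherence to the conditioning.
guided = cfg_combine(eu, ec, 2.5)
print(guided)
```

Higher scales trade sample diversity for tighter agreement with the semantic map and text prompt.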

📦 Dataset (OEM)

We provide the OEM dataset via Baidu Netdisk.

🔥 Pretrained Weights (OEM)

We provide an OEM pretrained checkpoint via Baidu Netdisk.


🏋️ Train Your Own Dataset

Please organize your dataset in the following structure:

```
your_data_dir/
  train/
    images/
    labels/
  val/
    images/
    labels/
```

Before training, you need two preprocessing steps:

1. Generate VAE latents for `images/` (saved to `vae_feats/`):

```bash
python scripts/generate_vae_feats.py \
  --data_dir "your_data_dir" \
  --splits train val \
  --out_name vae_feats
```

2. Convert semantic maps in `labels/` into semantic graphs (saved to `graphs/`):

```bash
python scripts/generate_graph_from_seg.py \
  --data_dir "your_data_dir" \
  --splits train val \
  --out_name graph
```

After preprocessing, your directory should look like:

```
your_data_dir/
  train/
    images/
    labels/
    vae_feats/
    graphs/
  val/
    images/
    labels/
    vae_feats/
    graphs/
```

Citation

```bibtex
@article{DENG2026105038,
  title   = {GDiT: A graph-prior-guided diffusion transformer for semantic-controllable remote sensing image synthesis},
  journal = {International Journal of Applied Earth Observation and Geoinformation},
  volume  = {146},
  pages   = {105038},
  year    = {2026},
  issn    = {1569-8432},
  doi     = {10.1016/j.jag.2025.105038},
  url     = {https://www.sciencedirect.com/science/article/pii/S1569843225006855},
  author  = {Kai Deng and Xiangyun Hu and Yibing Xiong and Aokun Liang and Jiong Xu}
}
```
