
Consistent Text-to-Image Generation via Scene De-Contextualization (SDeC)

Official codebase for the ICLR 2026 paper “Consistent Text-to-Image Generation via Scene De-Contextualization” (OpenReview: https://openreview.net/forum?id=rRp8yYKRGj).


✨ Highlights

  • We propose a scene-contextualization perspective on ID shift in T2I models.
  • We theoretically characterize and quantify this contextualization, leading to a novel SDeC approach that mitigates ID shift per scene without requiring all target scenes in advance.
  • Extensive experiments show that SDeC enhances identity preservation, maintains scene diversity, and offers plug-and-play flexibility at the per-scene level and across diverse tasks.

📌 Paper

Abstract

Consistent text-to-image (T2I) generation seeks to produce identity-preserving images of the same subject across diverse scenes, yet it often fails due to a phenomenon called identity (ID) shift. Previous methods have tackled this issue, but typically rely on the unrealistic assumption of knowing all target scenes in advance. This paper reveals that a key source of ID shift is the native correlation between subject and scene context, called scene contextualization, which arises naturally as T2I models fit the training distribution of vast natural images. We formally prove the near-universality of this scene-subject correlation and derive theoretical bounds on its strength. On this basis, we propose a novel, efficient, training-free prompt embedding editing approach, called Scene De-Contextualization (SDeC), that imposes an inversion process of T2I’s built-in scene contextualization. Specifically, it identifies and suppresses the latent scene-subject correlation within the ID prompt’s embedding by quantifying SVD directional stability to re-weight the corresponding eigenvalues adaptively. Critically, SDeC allows for per-scene use (one prompt per scene) without requiring prior access to all target scenes. This makes it a highly flexible and general solution well-suited to real-world applications where such prior knowledge is often unavailable or varies over time. Experiments demonstrate that SDeC significantly enhances identity preservation while maintaining scene diversity.
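
At its core, SDeC edits the ID prompt's embedding: it factorizes the embedding with SVD, scores each singular direction's stability, and adaptively re-weights the corresponding eigenvalues to suppress scene-entangled directions. The snippet below is a minimal illustrative sketch, not the repository's implementation: it assumes the stability score can be approximated by comparing the embedding against a perturbed variant, and the soft gate (`tau`, sigmoid sharpness) is an arbitrary choice.

```python
# Illustrative sketch only -- NOT the official SDeC implementation.
# The per-direction stability score is a hypothetical stand-in, obtained
# here by comparing the ID prompt's embedding against a perturbed variant.
import torch

def decontextualize(id_embeds: torch.Tensor,
                    perturbed_embeds: torch.Tensor,
                    tau: float = 0.5) -> torch.Tensor:
    """Re-weight singular values of a (tokens x dim) prompt embedding.

    Directions whose singular vectors stay stable across the two
    embeddings are kept; unstable (scene-entangled) directions are
    attenuated. All thresholds here are illustrative assumptions.
    """
    U, S, Vh = torch.linalg.svd(id_embeds, full_matrices=False)
    U2, _, _ = torch.linalg.svd(perturbed_embeds, full_matrices=False)

    # Hypothetical stability score: |cosine| between matched left
    # singular vectors (abs handles the SVD sign ambiguity).
    k = min(U.shape[1], U2.shape[1])
    stability = (U[:, :k] * U2[:, :k]).sum(dim=0).abs()   # (k,)

    # Adaptive re-weighting: damp directions below the stability
    # threshold tau with a soft gate rather than hard truncation.
    weights = torch.sigmoid((stability - tau) * 10.0)     # (k,)
    S_edit = S[:k] * weights

    return U[:, :k] @ torch.diag(S_edit) @ Vh[:k, :]
```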


🚀 Quick Start

```bash
# Create and activate the environment
$ conda create --name SDeC python=3.10
$ conda activate SDeC

# Install dependencies
$ conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
$ pip install transformers==4.46.3  # or: conda install conda-forge::transformers
$ conda install -c conda-forge diffusers
$ pip install opencv-python scipy gradio==4.44.1 sympy==1.13.1
$ pip install compel

# Run inference
$ python main.py
```
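
For a sense of where such a prompt-embedding edit plugs in, the sketch below shows one way to pass edited embeddings to Stable Diffusion XL through diffusers' `prompt_embeds` interface. It reuses the hypothetical `decontextualize` helper from the sketch above; the model ID, prompt, and perturbation are placeholders and are not tied to `main.py`.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load SDXL; any SDXL checkpoint supported by diffusers works here.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of a corgi reading a book in a cozy library"
(prompt_embeds, neg_embeds,
 pooled_embeds, neg_pooled_embeds) = pipe.encode_prompt(
    prompt=prompt, device="cuda", do_classifier_free_guidance=True
)

# Edit the token-level embedding (SVD wants float32 on CUDA); the
# perturbed copy stands in for the paper's stability measurement.
emb = prompt_embeds[0].float()
edited = decontextualize(emb, emb + 0.01 * torch.randn_like(emb))
prompt_embeds = edited.unsqueeze(0).to(dtype=torch.float16)

image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=neg_embeds,
    pooled_prompt_embeds=pooled_embeds,
    negative_pooled_prompt_embeds=neg_pooled_embeds,
).images[0]
image.save("sdec_sample.png")
```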

🔒 License

MIT/Apache-2.0


📎 Citation

@inproceedings{tang2026SDeC,
  title     = {Consistent Text-to-Image Generation via Scene De-Contextualization},
  author    = {Tang, Song and Gong, Peihao and Li, Kunyu and Guo, Kai and Wang, Boyu and Ye, Mao and Zhang, Jianwei and Zhu, Xiatian},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026},
  url       = {https://openreview.net/forum?id=rRp8yYKRGj}
}

🙏 Acknowledgements

Built upon:

  • Hugging Face diffusers
  • Stable Diffusion XL
  • Stable Diffusion 3
