Official repository for the paper "Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation".
[🌍 Project Page] [📖 Paper] [🤗 TwiG-50K Dataset]
- [2025.11.20] The paper “Thinking-while-Generating” is released on arXiv. 🚀
Existing methods inject textual reasoning either before (pre-planning) or after (post-refinement) visual generation.
TwiG is the first framework to interleave textual reasoning throughout the entire visual synthesis process.
We weave textual thoughts directly into the unfolding canvas, providing on-the-fly semantic guidance and reflection during generation.
Interleaving textual reasoning throughout visual generation.
TwiG decouples generation into Scheduling (When to Think), Reasoning (What to Say), and Reflection (How to Refine).
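The three-part decomposition above can be pictured as a simple control loop. The sketch below is illustrative only and is not the repository's implementation: `Canvas`, `interleaved_generate`, and the `toy_*` callables are hypothetical names, and the "image" is represented by plain strings so the example runs without any model weights.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Canvas:
    """Toy stand-in for a partially generated image (hypothetical type)."""
    regions: List[str] = field(default_factory=list)   # regions rendered so far
    thoughts: List[str] = field(default_factory=list)  # textual thoughts so far


def interleaved_generate(
    prompt: str,
    schedule: Callable[[str], List[str]],
    reason: Callable[[str, str, Canvas], str],
    render: Callable[[str, Canvas], str],
    reflect: Callable[[str, str, Canvas], str],
) -> Canvas:
    """Alternate between thinking and drawing, one scheduled step at a time."""
    canvas = Canvas()
    for step in schedule(prompt):                  # Scheduling: when to think
        thought = reason(prompt, step, canvas)     # Reasoning: what to say
        region = render(thought, canvas)           # draw the next region, guided by the thought
        region = reflect(thought, region, canvas)  # Reflection: how to refine
        canvas.thoughts.append(thought)
        canvas.regions.append(region)
    return canvas


# Toy callables (assumptions for illustration; a real system would call a model here).
def toy_schedule(prompt: str) -> List[str]:
    return ["background", "subject", "details"]


def toy_reason(prompt: str, step: str, canvas: Canvas) -> str:
    return f"{step} for '{prompt}'"


def toy_render(thought: str, canvas: Canvas) -> str:
    return f"[draft: {thought}]"


def toy_reflect(thought: str, region: str, canvas: Canvas) -> str:
    # Pretend every draft is revised once before being accepted.
    return region.replace("draft", "final")


result = interleaved_generate(
    "a red fox in snow", toy_schedule, toy_reason, toy_render, toy_reflect
)
print(result.thoughts)
print(result.regions)
```

Passing the three stages in as separate callables mirrors the decoupling described above: each stage can be swapped out without touching the loop itself.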
Clone the repository:
git clone https://github.com/ZiyuGuo99/Thinking-while-Generating.git
cd Thinking-while-Generating
Create a conda environment:
conda create -n twig python=3.11
conda activate twig
Please follow the instructions here to install both PyTorch and TorchVision dependencies.
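Once PyTorch and TorchVision are installed, a quick check such as the following can confirm that the active environment sees both packages and runs the Python version requested above. This is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, not part of this repository.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names in `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The conda environment above was created with python=3.11.
print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))

# `torch` and `torchvision` are the import names of PyTorch and TorchVision.
missing = missing_packages(["torch", "torchvision"])
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("PyTorch and TorchVision are both available.")
```

Run it inside the activated `twig` environment; if either package is reported as missing, the PyTorch installation step most likely targeted a different environment.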
Install additional dependencies:
pip install -r requirements.txt
- Download the Janus-Pro model weights from this link.
- Configure the model and output paths in twig.sh:
model_path=Janus-Pro-7B # Local path to the pre-trained model weights
output_folder=twig # Subfolder name for this evaluation task
base_output=./results # Root directory for saving generated images and text results
- Run the following command:
bash twig.sh
Please cite us if you find this project helpful:
@article{guo2026thinking,
title={Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation},
author={Guo, Ziyu and Zhang, Renrui and Li, Hongyu and Zhang, Manyuan and Chen, Xinyan and Wang, Sifan and Feng, Yan and Pei, Peng and Heng, Pheng-Ann},
journal={arXiv:2511.16671},
year={2025}
}
- Image Generation with CoT: Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step.
- T2I-R1: T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT.
- DPO vs. GRPO: Delving into RL for Image Generation with CoT: A comprehensive comparison of DPO and GRPO.






