Official code for 'Paragraph-to-Image Generation with Information-Enriched Diffusion Model'

weijiawu/ParaDiffusion

ParaDiffusion

Paragraph-to-Image Generation with Information-Enriched Diffusion Model

🎶 Updates

  • Mar. 24, 2024. The inference code has been released.
  • Nov. 28, 2023. ParaPrompts-400 and ParaImage-3k have been released.
  • Nov. 15, 2023. Repo initialization.

🐱 Abstract

ParaDiffusion is an information-enriched diffusion model for the paragraph-to-image generation task. It explores transferring the extensive semantic comprehension capabilities of large language models to image generation. At its core, it uses a large language model (e.g., Llama V2) to encode long-form text, followed by fine-tuning with LoRA to align the text-image feature spaces in the generation task. A high-quality paragraph-image pair dataset, ParaImage, is proposed to facilitate the training of long-text semantic alignment.
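
For readers unfamiliar with LoRA, the toy sketch below illustrates the general low-rank update it relies on: a frozen weight `W` is adapted as `W + (alpha / r) * B @ A`, and only the small matrices `A` and `B` would be trained. This is a generic illustration in plain Python, not the ParaDiffusion training code; the function names, matrix sizes, and values are all assumptions made for the example.

```python
# Illustrative LoRA update in plain Python (NOT the ParaDiffusion code).
# A frozen weight W (d_out x d_in) is adapted as W + (alpha / r) * B @ A,
# where A is (r x d_in) and B is (d_out x r). All names here are hypothetical.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A) without modifying W."""
    r = len(a)                # rank = number of rows of A
    scale = alpha / r
    delta = matmul(b, a)      # low-rank update, same shape as W
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Toy example: a 2x3 frozen weight with a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]         # r x d_in  = 1 x 3
B = [[0.5], [0.0]]            # d_out x r = 2 x 1

W_adapted = lora_effective_weight(W, A, B, alpha=2.0)
print(W_adapted)
```

When `B` is all zeros (a common starting point for LoRA adapters), the adapted weight equals the frozen weight, so fine-tuning starts from the pretrained behavior.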


🔧 Dependencies and Installation

diffusers

git clone https://github.com/weijiawu/ParaDiffusion
cd ParaDiffusion

conda create -n ParaDiffusion python=3.8
conda activate ParaDiffusion
pip install -r requirements.txt

⏬ Download Models

Download our pretrained ParaDiffusion model:

mkdir -p weight
cd weight

# download the ParaDiffusion weights to ./weight
git lfs install
git clone https://huggingface.co/weijiawu/ParaDiffusion

We provide two sets of UNet weights; choose the one you want to use for testing and inference.
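
Before running inference, it can help to confirm that the clone succeeded and to see which weight folders are available. The sketch below simply lists the subdirectories of the cloned repository. The default path `weight/ParaDiffusion` follows from the commands above, but the internal layout of the downloaded repository is not documented here, so treat the helper (`list_weight_dirs`, a hypothetical name) as a convenience check rather than part of the official tooling.

```python
from pathlib import Path

def list_weight_dirs(root="weight/ParaDiffusion"):
    """Return sorted names of non-hidden subdirectories under root.

    Returns an empty list if root does not exist. The folder layout is an
    assumption: the README does not specify how the weights are organized.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        p.name for p in root_path.iterdir()
        if p.is_dir() and not p.name.startswith(".")
    )

found = list_weight_dirs()
if found:
    print("Available weight folders:", ", ".join(found))
else:
    print("No weight folders found; check that the git clone completed.")
```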

💻 Inference

python demo.py

✏️ Paragraph-Image Dataset: ParaImage-Small


The proposed ParaImage dataset mainly includes two parts:

(a) ParaImage-Big: high-quality images with generative captions, primarily employed for paragraph-image alignment learning in Stage 2.

(b) ParaImage-Small: aesthetic images with manually written long-form descriptions, primarily used for quality-tuning in Stage 3.

ParaImage-Small consists of a few thousand high-quality images, thoughtfully selected from LAION-Aesthetics according to common principles in photography and then professionally annotated by skilled annotators.

ParaImage-Small can be downloaded from Google Drive.

✏️ New Prompts Eval: ParaPrompts-400

Current test prompts focus on short text-to-image generation and ignore the evaluation of paragraph-to-image generation. We therefore introduce a new evaluation set of prompts called ParaPrompts, comprising 400 long-text descriptions.

Previous prompt test sets mostly concentrate on text alignment within the range of 0-25 words, while our prompts extend to long-text alignment of 100 words or more.
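
To make the length ranges above concrete, here is a minimal sketch that counts whitespace-separated words in a prompt and labels it as short (0-25 words) or paragraph-length (100 words or more). The helper names, the "medium" label for the range in between, and the whitespace-based word count are assumptions for illustration; the README does not state how prompt length was measured.

```python
def word_count(prompt):
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(prompt.split())

def length_bucket(prompt, short_max=25, long_min=100):
    """Label a prompt using the word ranges mentioned above.

    'short'     : at most short_max words (the 0-25 word range)
    'paragraph' : at least long_min words (100 words or more)
    'medium'    : anything in between (a label introduced only for this sketch)
    """
    n = word_count(prompt)
    if n <= short_max:
        return "short"
    if n >= long_min:
        return "paragraph"
    return "medium"

short_prompt = "A cat sitting on a red sofa."
long_prompt = " ".join(["word"] * 120)  # placeholder 120-word text

print(word_count(short_prompt), length_bucket(short_prompt))
print(word_count(long_prompt), length_bucket(long_prompt))
```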

📖BibTeX

@misc{wu2023paradiffusion,
      title={Paragraph-to-Image Generation with Information-Enriched Diffusion Model},
      author={Weijia Wu and Zhuang Li and Yefei He and Mike Zheng Shou and Chunhua Shen and Lele Cheng and Yan Li and Tingting Gao and Di Zhang and Zhongyuan Wang},
      year={2023},
      eprint={2311.14284},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

🤗Acknowledgements

  • Thanks to Diffusers for the wonderful work and codebase.
