
FiT: Flexible Vision Transformer for Diffusion Model

📃 Paper • 📦 Checkpoint

This repo contains PyTorch model definitions, pre-trained weights, and sampling code for our Flexible Vision Transformer (FiT). FiT is a diffusion-transformer-based model that can generate images at unrestricted resolutions and aspect ratios.

The core features will include:

  • Pre-trained class-conditional FiT-XL-2-16 (1800K) model weights trained on ImageNet ($H\times W \le 256\times256$).
  • PyTorch sampling code for running pre-trained FiT-XL/2 models to generate images at unrestricted resolutions and aspect ratios.
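To illustrate what "sampling at unrestricted resolutions" means in practice, here is a minimal, self-contained sketch of a DDPM-style ancestral sampling loop that works for any (H, W). This is not the repo's actual sampling code: `toy_denoiser` is a hypothetical stand-in for the FiT transformer (which in reality operates on a flexible-length token sequence), and the noise schedule is a generic linear one chosen for illustration.

```python
import numpy as np

def toy_denoiser(x_t, t):
    # Hypothetical placeholder for the FiT noise-prediction network.
    # The real model is a transformer over flexible-length patch tokens;
    # this toy just returns a scaled copy so the loop runs end to end.
    return 0.1 * x_t

def ddpm_sample(shape, steps=50, seed=0):
    """Minimal DDPM ancestral sampling at an arbitrary resolution.

    `shape` is (C, H, W); nothing in the loop assumes a square grid,
    which is the property FiT's flexible tokenization exploits.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)      # generic linear schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)              # start from pure noise
    for t in reversed(range(steps)):
        eps = toy_denoiser(x, t)                # predicted noise
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:                               # no noise at the last step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Any aspect ratio works with the same loop and the same (toy) model.
img_wide = ddpm_sample((3, 160, 320))
img_tall = ddpm_sample((3, 320, 160))
```

The point of the sketch is the shape-agnostic loop: a fixed-resolution ViT would fail on the non-square inputs above, whereas FiT's design makes the denoiser itself resolution-free.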

Why do we need FiT?

  • 🧐 Nature is infinitely resolution-free. Like Sora, FiT is trained on data with unrestricted resolutions and aspect ratios, and it can generate images at unrestricted resolutions and aspect ratios.
  • 🤗 FiT exhibits remarkable flexibility in resolution extrapolation generation.

Stay tuned for this project! 😆

Acknowledgments

This codebase borrows from DiT.

BibTeX

@article{Lu2024FiT,
  title={FiT: Flexible Vision Transformer for Diffusion Model},
  author={Zeyu Lu and Zidong Wang and Di Huang and Chengyue Wu and Xihui Liu and Wanli Ouyang and Lei Bai},
  year={2024},
  journal={arXiv preprint arXiv:2402.12376},
}
