Diffusers contains pretrained models for popular algorithms and modules for creating the next set of diffusion models.
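The primary function of these models is to denoise an input sample by modeling the distribution $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$. The models are built on the base class ['ModelMixin'], which is a `torch.nn.Module`
with basic functionality for saving and loading models both locally and from the HuggingFace hub.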
Each model should define its own initialization and its `forward` function.
All saving, loading, and utilities should be in the base ['ModelMixin'] class.
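
The split described above, where the model defines only its initialization and `forward` while a shared base class handles saving and loading, can be sketched in plain PyTorch. This is a minimal illustration, not the library's implementation: `SaveLoadMixin`, `TinyDenoiser`, the `config` attribute, and the file names `config.json` and `weights.pt` are all hypothetical names chosen for the example, and the method names `save_pretrained` / `from_pretrained` are used here only as an assumed convention.

```python
import json
import os
import tempfile

import torch
from torch import nn


class SaveLoadMixin:
    """Hypothetical stand-in for a base class that owns all save/load logic."""

    def save_pretrained(self, save_directory):
        # Store the constructor arguments and the weights side by side.
        os.makedirs(save_directory, exist_ok=True)
        with open(os.path.join(save_directory, "config.json"), "w") as f:
            json.dump(self.config, f)
        torch.save(self.state_dict(), os.path.join(save_directory, "weights.pt"))

    @classmethod
    def from_pretrained(cls, save_directory):
        # Rebuild the model from its config, then restore the weights.
        with open(os.path.join(save_directory, "config.json")) as f:
            config = json.load(f)
        model = cls(**config)
        state = torch.load(
            os.path.join(save_directory, "weights.pt"), map_location="cpu"
        )
        model.load_state_dict(state)
        return model


class TinyDenoiser(SaveLoadMixin, nn.Module):
    """Toy model: it only defines its initialization and forward pass."""

    def __init__(self, channels=3, hidden=8):
        super().__init__()
        self.config = {"channels": channels, "hidden": hidden}
        self.time_embed = nn.Linear(1, hidden)
        self.conv_in = nn.Conv2d(channels, hidden, kernel_size=3, padding=1)
        self.conv_out = nn.Conv2d(hidden, channels, kernel_size=3, padding=1)

    def forward(self, sample, timestep):
        # Condition on the timestep by adding a per-channel embedding.
        emb = self.time_embed(timestep.float().view(-1, 1))[:, :, None, None]
        hidden = torch.relu(self.conv_in(sample) + emb)
        # The output has the same shape as the noisy input sample.
        return self.conv_out(hidden)


model = TinyDenoiser()
sample = torch.randn(2, 3, 16, 16)   # batch of noisy inputs
timestep = torch.tensor([10, 500])   # one timestep per batch element
out = model(sample, timestep)

with tempfile.TemporaryDirectory() as tmp:
    model.save_pretrained(tmp)
    reloaded = TinyDenoiser.from_pretrained(tmp)
    reloaded_out = reloaded(sample, timestep)
```

Keeping persistence in the mixin means a new architecture only has to describe its layers and forward pass; anything that subclasses the mixin gets the same save and load behavior.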
- The ['UNetModel'] was proposed in TODO and has been used in paper1, paper2, paper3.
- Extensions of the ['UNetModel'] include the ['UNetGlideModel'], which uses attention and timestep embeddings for the GLIDE paper; the ['UNetGradTTS'] model from this paper for text-to-speech; the ['UNetLDMModel'] for latent-diffusion models in this paper; and the ['TemporalUNet'] used for time-series prediction in this reinforcement learning paper.
- TODO: mention VAE / SDE score estimation