<a href="https://colab.research.google.com/github/vlordier/colabs/blob/main/FuseDream_Single.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# *FuseDream*: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization.

By Xingchao Liu, Chengyue Gong, Lemeng Wu, Shujian Zhang, Hao Su and Qiang Liu from UCSD and UT Austin (https://github.com/gnobitab/FuseDream).

Follow the commands in order to set up the environment and generate images from text queries using *FuseDream*.

This Colab notebook is the single-image version of *FuseDream*. *FuseDream-Composition* will be shared in another Colab notebook.

A baseline method (BigSleep) was provided by https://twitter.com/advadnoun.


In [None]:
# @title Licensed under the MIT License

# Copyright (c) 2021 Katherine Crowson

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.


In [1]:
!nvidia-smi

Sat Apr  9 23:30:00 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P0    24W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
!git clone https://github.com/gnobitab/FuseDream.git
!pip install ftfy regex tqdm numpy scipy h5py lpips==0.1.4
!pip install git+https://github.com/openai/CLIP.git
!pip install gdown
!gdown 'https://drive.google.com/uc?id=17ymX6rhsgHDZw_g5XgAFW4xLSDocARCM'
!gdown 'https://drive.google.com/uc?id=1sOZ9og9kJLsqMNhaDnPJgzVsBZQ1sjZ5'

Cloning into 'FuseDream'...
remote: Enumerating objects: 124, done.
remote: Total 124 (delta 0), reused 0 (delta 0), pack-reused 124
Receiving objects: 100% (124/124), 8.24 MiB | 17.05 MiB/s, done.
Resolving deltas: 100% (40/40), done.
Collecting ftfy
  Downloading ftfy-6.1.1-py3-none-any.whl (53 kB)
     |████████████████████████████████| 53 kB 1.8 MB/s
Collecting lpips==0.1.4
  Downloading lpips-0.1.4-py3-none-any.whl (53 kB)
     |████████████████████████████████| 53 kB 2.5 MB/s
Installing collected packages: lpips, ftfy
Successfully installed ftfy-6.1.1 lpips-0.1.4
Collecting git+https://github.com/openai/CLIP.git
  Cloning https://github.com/openai/CLIP.git to /tmp/pip-req-build-ocsbc4e_
  Running command git clone -q https://github.com/openai/CLIP.git /tmp/pip-req-build-ocsbc4e_
Building wheels for collected packages: clip
  Building wheel for clip (setup.py) ... done
  Created wheel for clip: filename=clip-1.0-py3-none-any.whl size=1369221 sha256=9fe164

In [None]:
!ls
!cp biggan-256.pth FuseDream/BigGAN_utils/weights/
!cp biggan-512.pth FuseDream/BigGAN_utils/weights/
%cd FuseDream

In [None]:
import torch
from tqdm import tqdm
from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize
import torchvision
import BigGAN_utils.utils as utils
import clip
import torch.nn.functional as F
from DiffAugment_pytorch import DiffAugment
import numpy as np
from fusedream_utils import FuseDreamBaseGenerator, get_G, save_image

### Setting up parameters
1. SENTENCE: The query text for generating the image. Note: we find that ending the sentence with a period '.' can improve the quality of the generated images, e.g., 'A photo of a blue dog.' generates better images than 'A photo of a blue dog'.
2. INIT_ITERS: Controls the number of images used for initialization (M in the paper, where M = INIT_ITERS*10). The default of 1000 should work well.
3. OPT_ITERS: Controls the number of iterations for optimizing the latent variables. The default of 1000 should work well.
4. NUM_BASIS: Controls the number of basis images used in optimization (k in the paper). Values of 5, 10, or 15 should work well.
5. MODEL: Currently, choose either 'biggan-256' or 'biggan-512'.
6. SEED: Random seed. Any integer works.
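The relationship between these settings can be sketched in a few lines. `summarize_params` below is a hypothetical helper, not part of FuseDream; the factor of 10 comes from M = INIT_ITERS*10 above and is assumed here to be a per-iteration batch size.

```python
def summarize_params(init_iters, opt_iters, num_basis, model, batch_size=10):
    """Hypothetical helper: validate the settings and report the derived sizes."""
    supported = ("biggan-256", "biggan-512")
    if model not in supported:
        raise ValueError("MODEL must be one of %s, got %r" % (supported, model))
    for name, value in (("INIT_ITERS", init_iters),
                        ("OPT_ITERS", opt_iters),
                        ("NUM_BASIS", num_basis)):
        if int(value) != value or value < 1:
            raise ValueError("%s must be a positive integer, got %r" % (name, value))
    # M in the paper: total number of candidate images drawn for initialization.
    num_init_images = init_iters * batch_size
    if num_basis > num_init_images:
        raise ValueError("NUM_BASIS (k) cannot exceed M = %d" % num_init_images)
    return {
        "num_init_images": num_init_images,          # M
        "num_basis": num_basis,                      # k
        "opt_iters": opt_iters,
        "resolution": int(model.split("-")[1]),      # 256 or 512
    }

print(summarize_params(1000, 1000, 5, "biggan-256"))
```

With the defaults, this reports M = 10000 candidate images, of which k = 5 are kept as basis images.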

In [None]:
#@title Parameters
SENTENCE = "A photo of a blue dog." #@param {type:"string"}
INIT_ITERS = 1000 #@param {type:"number"}
OPT_ITERS = 1000 #@param {type:"number"}
NUM_BASIS = 5 #@param {type:"number"}
MODEL = "biggan-256" #@param ["biggan-256","biggan-512"]
SEED = 0 #@param {type:"number"}

import sys
sys.argv = [''] ### workaround to deal with the argparse in Jupyter

In [None]:
### Generation: Click the 'run' button and the final generated image will be shown after the end of the algorithm
utils.seed_rng(SEED) 

sentence = SENTENCE

print('Generating:', sentence)
if MODEL == "biggan-256":
    G, config = get_G(256) 
elif MODEL == "biggan-512":
    G, config = get_G(512) 
else:
    raise ValueError('Model not supported: %s' % MODEL)
generator = FuseDreamBaseGenerator(G, config, 10) 
z_cllt, y_cllt = generator.generate_basis(sentence, init_iters=INIT_ITERS, num_basis=NUM_BASIS)

z_cllt_save = torch.cat(z_cllt).cpu().numpy()
y_cllt_save = torch.cat(y_cllt).cpu().numpy()
img, z, y = generator.optimize_clip_score(z_cllt, y_cllt, sentence, latent_noise=False, augment=True, opt_iters=OPT_ITERS, optimize_y=True)
### Setting latent_noise=True yields a slightly higher AugCLIP score but slightly lower image quality. We set it to False for dogs.
score = generator.measureAugCLIP(z, y, sentence, augment=True, num_samples=20)
print('AugCLIP score:', score)
import os
os.makedirs('./samples', exist_ok=True)  # create the output folder if it does not exist
out_path = 'samples/fusedream_%s_seed_%d_score_%.4f.png' % (sentence, SEED, score)
save_image(img, out_path)

from IPython import display
display.display(display.Image(out_path))
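The output filename embeds the query text directly, so a sentence containing a path separator or other special characters could produce an invalid or surprising path. One possible workaround is sketched below; `safe_filename` is a hypothetical helper, not part of FuseDream, and the score passed in the example is a placeholder value rather than a real result.

```python
import re

def safe_filename(sentence, seed, score, max_len=80):
    """Hypothetical helper: build a filesystem-friendly PNG name from the query."""
    # Collapse every run of characters other than letters, digits, '-' and '_'
    # into a single underscore, then trim underscores from both ends.
    slug = re.sub(r'[^A-Za-z0-9_-]+', '_', sentence).strip('_')
    # Truncate long queries and fall back to a fixed name if nothing is left.
    slug = slug[:max_len] or 'query'
    return 'fusedream_%s_seed_%d_score_%.4f.png' % (slug, seed, score)

# The score 0.3712 is a placeholder for illustration only.
print(safe_filename('A photo of a blue dog.', 0, 0.3712))
# prints: fusedream_A_photo_of_a_blue_dog_seed_0_score_0.3712.png
```

The result could then be joined with the `samples` directory and passed to `save_image` in place of the raw formatted string.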