# Text-to-image generator using VQGAN & CLIP (z+quantize method with improvements).

Original BigGAN+CLIP method created by https://twitter.com/advadnoun, adapted by Katherine Crowson (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings).

Spanish translation, modifications, and explanations added by Eleiber#8347; user-friendly interface designed by Abulafia#3734.

Improvements and French translation by Épinards & Caramel (https://github.com/xDEADC0DE, https://twitter.com/epinardscaramel)

A detailed tutorial is available (in Spanish) at https://tuscriaturas.miraheze.org/wiki/Ayuda:Crear_imágenes_con_VQGAN+CLIP, written by Jakeukalane#2767 and Avengium (Ángel)#3715.


## Read this first

* what this page is for
  * example image?
* what Google Colab is
  * a Google account is required
  * how to expand a cell and how to run it

## Explanations

TODO

## Detailed explanations

TODO

## Documentation notes, to add before publishing

By default, clicking `Runtime` / `Run All` should produce an interesting image.

----


The seeds used are stored in the images.

`$ pip install stegano`

```
$ stegano-lsb reveal -i wikiart-1024\ 2\ 200\ mushrooms.png
{"title": "teen selfie * mushrooms * topographic Map * cinema * child\u2019s drawing * young redhead reading * ste
ampunk * neo-noir * cyberpunk * Lovecraftian * a bunker covered with graffiti on the beach * brutalist architectur
e * cats * an armchair in the shape of an avocado * can of soda * a chocolate buffet * the trophy doesn't fit into
 the suitcase because it's too small * brain * Tolkien psychedelia * skyline", "notebook": "VQGAN+CLIP", "i": 400,
 "model": "WikiArt 1024", "seed": "2724284967354882491", "input_images": ""}
```
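The same payload can also be read from Python. The sketch below assumes the `stegano` package installed by the command above, whose `lsb.reveal` function returns the hidden string. `parse_hidden_metadata` and `read_image_metadata` are hypothetical helper names, and the sample payload is a shortened, made-up variant of the output shown above.

```python
import json


def parse_hidden_metadata(payload):
    """Decode the JSON string hidden in an image and normalise a few fields."""
    data = json.loads(payload)
    # The seed is stored as a string; convert it back to an int so it can be reused.
    data["seed"] = int(data["seed"])
    # Runs are joined with " * " and the prompts inside a run with " | ".
    title = data.get("title") or ""
    data["prompts"] = [
        [prompt.strip() for prompt in run.split("|")]
        for run in title.split("*")
        if run.strip()
    ]
    return data


def read_image_metadata(path):
    """Reveal and decode the payload of an image file (requires stegano)."""
    from stegano import lsb  # imported lazily: third-party dependency
    return parse_hidden_metadata(lsb.reveal(path))


# Hypothetical payload, shaped like the one revealed above.
sample = (
    '{"title": "morel:0.80 | dragon:1 * cats", "notebook": "VQGAN+CLIP", '
    '"i": 400, "model": "WikiArt 1024", "seed": "2724284967354882491", '
    '"input_images": ""}'
)
meta = parse_hidden_metadata(sample)
print(meta["seed"], meta["prompts"])
```

Passing the recovered seed back into the `seed` parameter is what makes a run repeatable, provided the prompts, model, and size are also the same.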

----

NOTE: Large models like COCO-Stuff can take a long time to download, so if you're losing a lot of time that way and you've mounted your personal Drive above then you may want to copy the .ckpt and .yaml files to somewhere in your drive. If you do this you can easily add your local model to the dropdown (e.g. if your ckpt is at "`drive/MyDrive/colab/coco.ckpt`" the dropdown entry should be "`drive/MyDrive/colab/coco`") and then uncheck the download. If all models you wish to use are local it is fine to uncheck all of these.
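
A minimal sketch of the naming convention in the note above: the dropdown entry is the checkpoint path without its extension, and the `.yaml` and `.ckpt` files are expected side by side. `model_files` and `missing_files` are hypothetical helpers, not part of the notebook.

```python
from pathlib import Path


def model_files(entry):
    """Return the (config, checkpoint) paths implied by a dropdown entry."""
    # The entry is the path without extension, e.g. "drive/MyDrive/colab/coco".
    return Path(f"{entry}.yaml"), Path(f"{entry}.ckpt")


def missing_files(entry):
    """List the files that are not present and still need to be copied."""
    return [path for path in model_files(entry) if not path.is_file()]


config_path, checkpoint_path = model_files("drive/MyDrive/colab/coco")
print(config_path, checkpoint_path)
```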

----

Checklist of changes to make before committing a new version
* expand `Table of contents`
* check the saved `textos`
* uncheck `utiliser_google_drive`


# 📚 Installation

In [None]:
# @title 💽 Use Google Drive (optional)

#@markdown This notebook supports mounting your Google Drive as a folder and storing outputs in it, so that they are kept when the runtime disconnects.

#@markdown If you are running this notebook or a copy of it, this step requires you to authorize mounting your Drive every time you start the notebook. I recommend making a new notebook, going to "Runtime->Change Runtime Type" in the menu bar and selecting "GPU" and "High-RAM", and then copying each cell of this notebook in. This way you can mount your Drive without manual intervention.

#@markdown ----

utiliser_google_drive = False #@param {type:"boolean"}

#@markdown To use your Drive, set `gdrive_path` to the Drive folder path (relative to `MyDrive`) where outputs should be saved, then execute the cell. Leaving the field blank or not running this cell saves outputs to the runtime's temporary storage.
gdrive_path = "VQGANCLIP" #@param {type: "string"}

MODE_GDRIVE = False
models_path = ''

if utiliser_google_drive:
  abs_root_path = "/content"
  if len(gdrive_path) > 0:
      abs_root_path = abs_root_path + "/drive/MyDrive/" + gdrive_path
      if abs_root_path[-1] != "/":
        abs_root_path += '/'
      print(abs_root_path)

  from google.colab import drive
  drive.mount("/content/drive", force_remount=True)

  #TODO: revisit
  models_path = abs_root_path

  MODE_GDRIVE = True
  print("MODE_GDRIVE == True") #TODO: revisit
else:
  print("MODE_GDRIVE == False") #TODO: revisit




In [None]:
# @title 📚 Load libraries and definitions

!nvidia-smi
print()

print("(1/6) Downloading CLIP…")
!git clone https://github.com/openai/CLIP                 &> /dev/null

print("(2/6) Installing Python libraries for AI…")
!git clone https://github.com/CompVis/taming-transformers &> /dev/null
!pip install ftfy regex tqdm omegaconf pytorch-lightning  &> /dev/null
!pip install kornia                                       &> /dev/null
!pip install einops                                       &> /dev/null
!pip install transformers &> /dev/null # personal addition

print("(3/6) Installing Python libraries for metadata handling…")
!pip install stegano                                      &> /dev/null
!apt install exempi                                       &> /dev/null
!pip install python-xmp-toolkit                           &> /dev/null
!pip install imgtag                                       &> /dev/null
!pip install pillow==7.1.2                                &> /dev/null

print("(4/6) Installing Python libraries for video creation…")
!pip install imageio-ffmpeg &> /dev/null

# download the Noto font, used for the captions
print("(5/6) Downloading the Noto Serif font…")
!wget "https://noto-website-2.storage.googleapis.com/pkgs/NotoSerif-hinted.zip" &> /dev/null 
!unzip -p "NotoSerif-hinted.zip" NotoSerif-Regular.ttf > NotoSerif-Regular.ttf
!rm "NotoSerif-hinted.zip"

print("(6/6) Configuring Python…")

import argparse
import math
from pathlib import Path
import sys
# remove before sharing
import datetime 
 
sys.path.append('./taming-transformers')
from IPython import display
from base64 import b64encode
from omegaconf import OmegaConf
from PIL import Image, ImageDraw, ImageFont
from taming.models import cond_transformer, vqgan
import torch
from torch import nn, optim
from torch.nn import functional as F
from torchvision import transforms
from torchvision.transforms import functional as TF
from tqdm.notebook import tqdm
 
from CLIP import clip
import kornia.augmentation as K
import numpy as np
import imageio
from PIL import ImageFile, Image
from imgtag import ImgTag    # metadata
from libxmp import *         # metadata
import libxmp                # metadata
from stegano import lsb
import json
ImageFile.LOAD_TRUNCATED_IMAGES = True
 
def sinc(x):
    return torch.where(x != 0, torch.sin(math.pi * x) / (math.pi * x), x.new_ones([]))
 
 
def lanczos(x, a):
    cond = torch.logical_and(-a < x, x < a)
    out = torch.where(cond, sinc(x) * sinc(x/a), x.new_zeros([]))
    return out / out.sum()
 
 
def ramp(ratio, width):
    n = math.ceil(width / ratio + 1)
    out = torch.empty([n])
    cur = 0
    for i in range(out.shape[0]):
        out[i] = cur
        cur += ratio
    return torch.cat([-out[1:].flip([0]), out])[1:-1]
 
 
def resample(input, size, align_corners=True):
    n, c, h, w = input.shape
    dh, dw = size
 
    input = input.view([n * c, 1, h, w])
 
    if dh < h:
        kernel_h = lanczos(ramp(dh / h, 2), 2).to(input.device, input.dtype)
        pad_h = (kernel_h.shape[0] - 1) // 2
        input = F.pad(input, (0, 0, pad_h, pad_h), 'reflect')
        input = F.conv2d(input, kernel_h[None, None, :, None])
 
    if dw < w:
        kernel_w = lanczos(ramp(dw / w, 2), 2).to(input.device, input.dtype)
        pad_w = (kernel_w.shape[0] - 1) // 2
        input = F.pad(input, (pad_w, pad_w, 0, 0), 'reflect')
        input = F.conv2d(input, kernel_w[None, None, None, :])
 
    input = input.view([n, c, h, w])
    return F.interpolate(input, size, mode='bicubic', align_corners=align_corners)
 
 
class ReplaceGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x_forward, x_backward):
        ctx.shape = x_backward.shape
        return x_forward
 
    @staticmethod
    def backward(ctx, grad_in):
        return None, grad_in.sum_to_size(ctx.shape)
 
 
replace_grad = ReplaceGrad.apply
 
 
class ClampWithGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, min, max):
        ctx.min = min
        ctx.max = max
        ctx.save_for_backward(input)
        return input.clamp(min, max)
 
    @staticmethod
    def backward(ctx, grad_in):
        input, = ctx.saved_tensors
        return grad_in * (grad_in * (input - input.clamp(ctx.min, ctx.max)) >= 0), None, None
 
 
clamp_with_grad = ClampWithGrad.apply
 
 
def vector_quantize(x, codebook):
    d = x.pow(2).sum(dim=-1, keepdim=True) + codebook.pow(2).sum(dim=1) - 2 * x @ codebook.T
    indices = d.argmin(-1)
    x_q = F.one_hot(indices, codebook.shape[0]).to(d.dtype) @ codebook
    return replace_grad(x_q, x)
 
 
class Prompt(nn.Module):
    def __init__(self, embed, weight=1., stop=float('-inf')):
        super().__init__()
        self.register_buffer('embed', embed)
        self.register_buffer('weight', torch.as_tensor(weight))
        self.register_buffer('stop', torch.as_tensor(stop))
 
    def forward(self, input):
        input_normed = F.normalize(input.unsqueeze(1), dim=2)
        embed_normed = F.normalize(self.embed.unsqueeze(0), dim=2)
        dists = input_normed.sub(embed_normed).norm(dim=2).div(2).arcsin().pow(2).mul(2)
        dists = dists * self.weight.sign()
        return self.weight.abs() * replace_grad(dists, torch.maximum(dists, self.stop)).mean()
 
 
def parse_prompt(prompt):
    vals = prompt.rsplit(':', 2)
    vals = vals + ['', '1', '-inf'][len(vals):]
    return vals[0], float(vals[1]), float(vals[2])
 
 
class MakeCutouts(nn.Module):
    def __init__(self, cut_size, cutn, cut_pow=1.):
        super().__init__()
        self.cut_size = cut_size
        self.cutn = cutn
        self.cut_pow = cut_pow
        self.augs = nn.Sequential(
            K.RandomHorizontalFlip(p=0.5),
            # K.RandomSolarize(0.01, 0.01, p=0.7),
            K.RandomSharpness(0.3,p=0.4),
            K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'),
            K.RandomPerspective(0.2,p=0.4),
            K.ColorJitter(hue=0.01, saturation=0.01, p=0.7))
        self.noise_fac = 0.1
 
 
    def forward(self, input):
        sideY, sideX = input.shape[2:4]
        max_size = min(sideX, sideY)
        min_size = min(sideX, sideY, self.cut_size)
        cutouts = []
        for _ in range(self.cutn):
            size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)
            offsetx = torch.randint(0, sideX - size + 1, ())
            offsety = torch.randint(0, sideY - size + 1, ())
            cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]
            cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))
        batch = self.augs(torch.cat(cutouts, dim=0))
        if self.noise_fac:
            facs = batch.new_empty([self.cutn, 1, 1, 1]).uniform_(0, self.noise_fac)
            batch = batch + facs * torch.randn_like(batch)
        return batch
 
 
def load_vqgan_model(config_path, checkpoint_path):
    config = OmegaConf.load(config_path)
    if config.model.target == 'taming.models.vqgan.VQModel':
        model = vqgan.VQModel(**config.model.params)
        model.eval().requires_grad_(False)
        model.init_from_ckpt(checkpoint_path)
    elif config.model.target == 'taming.models.cond_transformer.Net2NetTransformer':
        parent_model = cond_transformer.Net2NetTransformer(**config.model.params)
        parent_model.eval().requires_grad_(False)
        parent_model.init_from_ckpt(checkpoint_path)
        model = parent_model.first_stage_model
    else:
        raise ValueError(f'unknown model type: {config.model.target}')
    del model.loss
    return model
 
 
def resize_image(image, out_size):
    ratio = image.size[0] / image.size[1]
    area = min(image.size[0] * image.size[1], out_size[0] * out_size[1])
    size = round((area * ratio)**0.5), round((area / ratio)**0.5)
    return image.resize(size, Image.LANCZOS)

print("Installation complete.")



In [None]:
#@title 📖 Select the models to download
#@markdown By default, the notebook downloads the ImageNet 16384 model. Other models exist, such as ImageNet 1024, COCO-Stuff, WikiArt 1024, WikiArt 16384, FacesHQ, and S-FLCKR. They are not downloaded by default, since that would be wasted if you do not plan to use them; to use one, simply select it for download.

imagenet_1024 = False #@param {type:"boolean"}
imagenet_16384 = True #@param {type:"boolean"}
coco = False #@param {type:"boolean"}
faceshq = False #@param {type:"boolean"}
wikiart_1024 = False #@param {type:"boolean"}
wikiart_16384 = False #@param {type:"boolean"}
sflckr = False #@param {type:"boolean"}

if imagenet_1024:
  print()
  print("imagenet_1024")
  !curl -L -o {models_path}vqgan_imagenet_f16_1024.yaml -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_1024.yaml' #ImageNet 1024
  !curl -L -o {models_path}vqgan_imagenet_f16_1024.ckpt -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_1024.ckpt'  #ImageNet 1024 # 914 MB
if imagenet_16384:
  print()
  print("imagenet_16384")
  !curl -L -o {models_path}vqgan_imagenet_f16_16384.yaml -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_16384.yaml' #ImageNet 16384
  !curl -L -o {models_path}vqgan_imagenet_f16_16384.ckpt -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_16384.ckpt' #ImageNet 16384 # 935 MB
if coco:
  print()
  print("coco")
  #!curl -L -o {models_path}coco.yaml -C - 'https://dl.nmkd.de/ai/clip/coco/coco.yaml' #COCO
  !curl -L -o {models_path}coco.ckpt -C - 'https://dl.nmkd.de/ai/clip/coco/coco.ckpt' #COCO # 7.9 GB
if faceshq:
  print()
  print("faceshq")
  !curl -L -o {models_path}faceshq.yaml -C - 'https://drive.google.com/uc?export=download&id=1fHwGx_hnBtC8nsq7hesJvs-Klv-P0gzT' #FacesHQ
  !curl -L -o {models_path}faceshq.ckpt -C - 'https://app.koofr.net/content/links/a04deec9-0c59-4673-8b37-3d696fe63a5d/files/get/last.ckpt?path=%2F2020-11-13T21-41-45_faceshq_transformer%2Fcheckpoints%2Flast.ckpt' #FacesHQ # 3.7 GB (?)
if wikiart_1024:
  print()
  print("wikiart_1024")
  !curl -L -o {models_path}wikiart_1024.yaml -C - 'http://mirror.io.community/blob/vqgan/wikiart.yaml' #WikiArt 1024
  !curl -L -o {models_path}wikiart_1024.ckpt -C - 'http://mirror.io.community/blob/vqgan/wikiart.ckpt' #WikiArt 1024 # 914 MB
if wikiart_16384: 
  print()
  print("wikiart_16384")
  !curl -L -o {models_path}wikiart_16384.yaml -C - 'http://mirror.io.community/blob/vqgan/wikiart_16384.yaml' #WikiArt 16384
  !curl -L -o {models_path}wikiart_16384.ckpt -C - 'http://mirror.io.community/blob/vqgan/wikiart_16384.ckpt' #WikiArt 16384 # 959 MB
if sflckr:
  print()
  print("sflckr")
  !curl -L -o {models_path}sflckr.yaml -C - 'https://heibox.uni-heidelberg.de/d/73487ab6e5314cb5adba/files/?p=%2Fconfigs%2F2020-11-09T13-31-51-project.yaml&dl=1' #S-FLCKR
  !curl -L -o {models_path}sflckr.ckpt -C - 'https://heibox.uni-heidelberg.de/d/73487ab6e5314cb5adba/files/?p=%2Fcheckpoints%2Flast.ckpt&dl=1' #S-FLCKR # 4 GB


# 🏃 Execution



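Mostly, what you will need to change is `textos:`, where you put the text or texts you want to generate (separated with `|`). It is a list because you can provide more than one text, in which case the AI tries to 'blend' the images, giving both texts the same priority unless a weight is given (as in `dragon:1`). In this version, a `*` separates successive groups of texts, which are processed one after another.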
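
As a sketch of how the parameter cell interprets `textos`: the string is split on `*` into successive runs, each run is split on `|` into prompts, and each prompt may end with `:weight` and `:stop`. `parse_prompt` mirrors the notebook's function of the same name; `split_textos` is a hypothetical helper added only for illustration.

```python
def parse_prompt(prompt):
    """Split 'text:weight:stop', defaulting to weight 1 and stop -inf."""
    vals = prompt.rsplit(':', 2)
    vals = vals + ['', '1', '-inf'][len(vals):]
    return vals[0], float(vals[1]), float(vals[2])


def split_textos(textos):
    """Return one list of (text, weight, stop) tuples per run."""
    runs = []
    for run in textos.split('*'):          # '*' separates successive runs
        phrases = [phrase.strip() for phrase in run.split('|')]
        if phrases != ['']:                # skip empty runs
            runs.append([parse_prompt(phrase) for phrase in phrases])
    return runs


runs = split_textos("morel:0.80 | dragon:1 |trending on artstation")
for text, weight, stop in runs[0]:
    print(repr(text), weight, stop)
```

Because `rsplit` works from the right, a text that itself contains a colon followed by something non-numeric would fail the `float` conversion, so colons are best kept for weights only.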

To use an initial image, upload a file to the Colab environment (in the panel on the left), then set `imagen_inicial:` to the exact file name. Example: `sample.png`

You can also change the model by editing the `modelo:` line. Currently available are 1024, 16384, WikiArt, S-FLCKR, and COCO-Stuff. To use one, you must first download it; then simply select it.

You can also use `imagenes_objetivo`, which lets you supply one or more images that the AI will treat as a "target", serving the same purpose as a text. To supply more than one, use `|` as the separator.

In [None]:
#@title ⚙️ Parameters
textos = "morel:0.80 | dragon:1 |trending on artstation" #@param {type:"string"}
reuse_last_image = False #@param {type:"boolean"}
ancho =  600#@param {type:"number"}
alto =  600#@param {type:"number"}
modelo = "wikiart_16384" #@param ["vqgan_imagenet_f16_16384", "vqgan_imagenet_f16_1024", "wikiart_1024", "wikiart_16384", "coco", "faceshq", "sflckr"]
intervalo_imagenes =  100#@param {type:"number"}
imagen_inicial = "None"#@param {type:"string"}
imagenes_objetivo = "None"#@param {type:"string"}
seed = -1#@param {type:"number"}
max_iteraciones = 200#@param {type:"number"}
input_images = ""

if not reuse_last_image:
  print("Deleting previously created images…")
  !rm steps/*.png  > /dev/null 2>&1
  !rmdir steps > /dev/null 2>&1
  !mkdir steps
  #print()

nombres_modelos={"vqgan_imagenet_f16_16384": 'ImageNet 16384',"vqgan_imagenet_f16_1024":"ImageNet 1024", 
                 "wikiart_1024":"WikiArt 1024", "wikiart_16384":"WikiArt 16384", "coco":"COCO-Stuff", "faceshq":"FacesHQ", "sflckr":"S-FLCKR"}
nombre_modelo = nombres_modelos[modelo]     

if seed == -1:
    seed = None
if imagen_inicial == "None":
    imagen_inicial = None
if imagenes_objetivo == "None" or not imagenes_objetivo:
    imagenes_objetivo = []
else:
    imagenes_objetivo = imagenes_objetivo.split("|")
    imagenes_objetivo = [image.strip() for image in imagenes_objetivo]

if imagen_inicial or imagenes_objetivo != []:
    input_images = True

#textos = [frase.strip() for frase in textos.split("|")]
prompts_list = []
for text in textos.split('*'):
  tmp = [frase.strip() for frase in text.split("|")]
  if tmp != ['']:
    prompts_list.append(tmp)
if prompts_list == ['']:
    prompts_list = []

args = argparse.Namespace(
    prompts=prompts_list,
    image_prompts=imagenes_objetivo,
    noise_prompt_seeds=[],
    noise_prompt_weights=[],
    size=[ancho, alto],
    init_image=imagen_inicial,
    init_weight=0.,
    clip_model='ViT-B/32',
    vqgan_config=f'{modelo}.yaml',
    vqgan_checkpoint=f'{modelo}.ckpt',
    step_size=0.1,
    cutn=64,
    cut_pow=1.,
    display_freq=intervalo_imagenes,
    seed=seed,
)

if MODE_GDRIVE:
  args.vqgan_config = models_path + args.vqgan_config
  args.vqgan_checkpoint = models_path + args.vqgan_checkpoint
  # debug
  #print(args.vqgan_checkpoint)
  #print()


print(prompts_list)
if (len(prompts_list)>1):
  MODE_MULTIPLE = True 
  print(len(prompts_list), " prompts")
else:
  MODE_MULTIPLE = False 


In [None]:
#@title 🏃 Create the images!

def synth(z):
    z_q = vector_quantize(z.movedim(1, 3), model.quantize.embedding.weight).movedim(3, 1)
    return clamp_with_grad(model.decode(z_q).add(1).div(2), 0, 1)

def add_xmp_data(nombrefichero):
    imagen = ImgTag(filename=nombrefichero)
    imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'creator', 'VQGAN+CLIP', {"prop_array_is_ordered":True, "prop_value_is_array":True})
    if args.prompts:
        imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'title', " * ".join([" | ".join([str(c) for c in lst]) for lst in args.prompts]), {"prop_array_is_ordered":True, "prop_value_is_array":True})
    else:
        imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'title', 'None', {"prop_array_is_ordered":True, "prop_value_is_array":True})
    imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'i', str(i), {"prop_array_is_ordered":True, "prop_value_is_array":True})
    imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'model', nombre_modelo, {"prop_array_is_ordered":True, "prop_value_is_array":True})
    imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'seed',str(seed) , {"prop_array_is_ordered":True, "prop_value_is_array":True})
    imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'input_images',str(input_images) , {"prop_array_is_ordered":True, "prop_value_is_array":True})
    #for frases in args.prompts:
    #    imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'Prompt' ,frases, {"prop_array_is_ordered":True, "prop_value_is_array":True})
    imagen.close()

def add_stegano_data(filename):
    data = {
        "title": " * ".join([" | ".join([str(c) for c in lst]) for lst in args.prompts]) if args.prompts else None,
        "notebook": "VQGAN+CLIP",
        "i": i,
        "model": nombre_modelo,
        "seed": str(seed),
        "input_images": input_images
    }
    lsb.hide(filename, json.dumps(data)).save(filename)

@torch.no_grad()
def checkin(i, losses):
    losses_str = ', '.join(f'{loss.item():g}' for loss in losses)
    tqdm.write(f'i: {i}, loss: {sum(losses).item():g}, losses: {losses_str}')
    out = synth(z)
    TF.to_pil_image(out[0].cpu()).save('progress.png')
    add_stegano_data('progress.png')
    add_xmp_data('progress.png')
    display.display(display.Image('progress.png'))

def ascend_txt():
    global i
    out = synth(z)
    iii = perceptor.encode_image(normalize(make_cutouts(out))).float()

    result = []

    if args.init_weight:
        result.append(F.mse_loss(z, z_orig) * args.init_weight / 2)

    for prompt in pMs:
        result.append(prompt(iii))
    img = np.array(out.mul(255).clamp(0, 255)[0].cpu().detach().numpy().astype(np.uint8))[:,:,:]
    img = np.transpose(img, (1, 2, 0))
    filename = f"steps/{i:04}.png"
    imageio.imwrite(filename, np.array(img))
    add_stegano_data(filename)
    add_xmp_data(filename)
    return result

def train(i):
    opt.zero_grad()
    lossAll = ascend_txt()
    if (i != 0) and ((i % args.display_freq == 0) or (i % max_iteraciones == 0)):
        checkin(i, lossAll)
    loss = sum(lossAll)
    loss.backward()
    opt.step()
    with torch.no_grad():
        z.copy_(z.maximum(z_min).minimum(z_max))



device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)


i = 0
run = 1
for prompts in args.prompts:
  print()
  print('prompt #', run, ': ', prompts)
  if imagenes_objetivo:
      print('Using image prompts:', imagenes_objetivo)
  if args.seed is None:
      seed = torch.seed()
  else:
      seed = args.seed
  torch.manual_seed(seed)
  print('Using seed:', seed)

  model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
  perceptor = clip.load(args.clip_model, jit=False)[0].eval().requires_grad_(False).to(device)

  cut_size = perceptor.visual.input_resolution
  e_dim = model.quantize.e_dim
  f = 2**(model.decoder.num_resolutions - 1)
  make_cutouts = MakeCutouts(cut_size, args.cutn, cut_pow=args.cut_pow)
  n_toks = model.quantize.n_e
  toksX, toksY = args.size[0] // f, args.size[1] // f
  sideX, sideY = toksX * f, toksY * f
  z_min = model.quantize.embedding.weight.min(dim=0).values[None, :, None, None]
  z_max = model.quantize.embedding.weight.max(dim=0).values[None, :, None, None]


  if args.init_image:
      pil_image = Image.open(args.init_image).convert('RGB')
      pil_image = pil_image.resize((sideX, sideY), Image.LANCZOS)
      z, *_ = model.encode(TF.to_tensor(pil_image).to(device).unsqueeze(0) * 2 - 1)
  else:
      one_hot = F.one_hot(torch.randint(n_toks, [toksY * toksX], device=device), n_toks).float()
      z = one_hot @ model.quantize.embedding.weight
      z = z.view([-1, toksY, toksX, e_dim]).permute(0, 3, 1, 2)
  z_orig = z.clone()
  z.requires_grad_(True)
  opt = optim.Adam([z], lr=args.step_size)

  normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],
                                  std=[0.26862954, 0.26130258, 0.27577711])


  pMs = []

  for prompt in prompts:
      txt, weight, stop = parse_prompt(prompt)
      embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()
      pMs.append(Prompt(embed, weight, stop).to(device))

  for prompt in args.image_prompts:
      path, weight, stop = parse_prompt(prompt)
      img = resize_image(Image.open(path).convert('RGB'), (sideX, sideY))
      batch = make_cutouts(TF.to_tensor(img).unsqueeze(0).to(device))
      embed = perceptor.encode_image(normalize(batch)).float()
      pMs.append(Prompt(embed, weight, stop).to(device))

  for seed, weight in zip(args.noise_prompt_seeds, args.noise_prompt_weights):
      gen = torch.Generator().manual_seed(seed)
      embed = torch.empty([1, perceptor.visual.output_dim]).normal_(generator=gen)
      pMs.append(Prompt(embed, weight).to(device))

  try:
      with tqdm() as pbar:
          while True:
              train(i)
              if i == (max_iteraciones*run):
                  # update the initial image
                  if reuse_last_image:
                    args.init_image = f"steps/{i:04}.png"

                  current_prompt = " - ".join(prompts)

                  # hard-coded UTC+2 offset; the modulo keeps the hour below 24
                  now = datetime.datetime.now(datetime.timezone.utc)
                  print("suggested filename: {:02d}h{:02d} {:s} {:d} {:s}".format( \
                      (now.hour+2) % 24, now.minute, \
                      nombre_modelo, \
                      i, \
                      current_prompt[:min(len(current_prompt),100)].replace(':','_')))
                  break
              i += 1
              pbar.update()
  except KeyboardInterrupt:
      pass
  
  run += 1



# 📽️  Make a video


In [None]:
#@title 📝 Add the prompts at the bottom of each image (optional)
captions = "None"#@param {type:"string"}
#@markdown `captions`: list of texts to add at the bottom of the images, separated by the `*` character.<br>Leave as None to use `textos` automatically.
hauteur_bordure =  30#@param {type:"number"}
#@markdown `hauteur_bordure`: height in pixels of the border added at the bottom

font = ImageFont.truetype("NotoSerif-Regular.ttf", 20)

def ajouter_texte_v2(chemin_image, caption=None):
  image_pil = Image.open(chemin_image)

  # add room at the bottom
  img_colour = (0,0,0)
  width, height = image_pil.size
  result = Image.new(image_pil.mode, (width, height+hauteur_bordure), img_colour)
  result.paste(image_pil, (0, 0))

  # add the text
  if caption:
    draw = ImageDraw.Draw(result)
    draw.text((10, height), caption, size=20, font=font)

  result.save(chemin_image)


def ajouter_texte(chemin_image, caption=None):
  image_pil = Image.open(chemin_image)

  # add room at the bottom
  img_colour = (0,0,0)
  width, height = image_pil.size
  #if height < alto+hauteur_bordure:
  result = Image.new(image_pil.mode, (width, height+hauteur_bordure), img_colour)
  result.paste(image_pil, (0, 0))
  #else:
  #  result = image_pil

  # add the text
  if caption:
    draw = ImageDraw.Draw(result)
    draw.text((10, height+5), caption, size=20)

  result.save(chemin_image)

if captions == "None":
  captions = prompts_list
else:
  captions_list = []
  for caption in captions.split('*'):
    tmp = [frase.strip() for frase in caption.split("|")]
    if tmp != ['']:
      captions_list.append(tmp)
  if captions_list == ['']:
      captions_list = []
  captions = captions_list

print("captions: ", captions)
print()

i -= 1 # 100 iterations, last file: 0100.png, not 0099.png

premiere_image = 0
derniere_image = i
for t in range(premiere_image,derniere_image+1): # don't forget the +1 :)
  ajouter_texte_v2(f"steps/{t:04}.png", captions[t // max_iteraciones][0])	# // : floor division

  if t % max_iteraciones == 0:
    print('image {:04d}\t{:s}'.format(t,captions[t // max_iteraciones][0]))

print()
print("last image:", f"steps/{t:04}.png")
display.display(display.Image(f"steps/{t:04}.png"))


In [None]:
#@title 📽️ Create the video
#@markdown `interval` specifies how frequently a frame should be added to the video - 1 is all frames, 2 every other frame, etc.
first_frame = 1 #@param {type: Number} # frame at which the video starts
last_frame = i-1

fps = 20 #@param {type: Number} #
interval = 1 #@param {type: Number} #

total_frames = last_frame-first_frame

frames = []
tqdm.write('Creating the video…')

# ensure the first frame is always appended
if first_frame % interval != 0:
    filename = f"steps/{first_frame:04}.png"
    frames.append(Image.open(filename))

#for i in range(first_frame,min(last_frame,max_iteraciones)+1): 
for i in range(first_frame,last_frame+1):
    if i % interval != 0:
       continue
    filename = f"steps/{i:04}.png"
    frames.append(Image.open(filename))

# ensure the last frame is always appended
if last_frame % interval != 0:
    filename = f"steps/{last_frame:04}.png"
    frames.append(Image.open(filename))

from subprocess import Popen, PIPE
p = Popen(['ffmpeg', '-y', '-f', 'image2pipe', '-vcodec', 'png', '-r', str(fps), '-i', '-', '-vcodec', 'libx264', '-r', str(fps), '-pix_fmt', 'yuv420p', '-crf', '17', '-preset', 'veryslow', 'video.mp4'], stdin=PIPE)
for im in tqdm(frames):
    im.save(p.stdin, 'PNG')
p.stdin.close()

print("The video is being finalized…")
p.wait()
print("The video is ready.")


In [None]:
# @title 📥 Download the video
from google.colab import files
files.download("video.mp4")

## ⚖️ License

**Licensed under the MIT License**

Copyright (c) 2021 Katherine Crowson

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.