<a href="https://colab.research.google.com/github/voroninvisuals/google-colab-notebooks/blob/main/Batch_image_VQGAN%2BCLIP_public.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Generate images from text inputs using VQGAN+CLIP (z+quantize method with augmentations).

Notebook created by Katherine Crowson (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings). The original BigGAN+CLIP method was created by https://twitter.com/advadnoun. Translation \[to Spanish\], modifications, and explanations added by Eleiber#8347; user-friendly UI created by Abulafia#3734; translated \[from Spanish\] and expanded on by An Graves.

Further adapted by Danny Perry of [@Datamosh](https://instagram.com/datamosh) and [artificial_art_](https://twitter.com/artificial_art_)

In [None]:
# @title Licensed under the MIT License

# Copyright (c) 2021 Katherine Crowson

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

In [None]:
#@markdown Execute this cell to view the status of the runtime's GPU, otherwise this is not needed.
!nvidia-smi

#Execution Instructions


**OVERVIEW**

This notebook batch-processes a list of PNG images with VQGAN+CLIP. It works well with video footage, but can be used on any sequentially named PNGs. To use it on footage, upload a video, run the ffmpeg cell to split it into individual PNG frames, set the prompt and iteration parameters, and then run VQGAN on every frame automatically.

**Workflow should be as follows:**

1.   Mount Drive and define working directory. (Does not need to be re-run)
2.   Download Models and Install the remaining libraries. (Does not need to be re-run)
3.   Upload your footage file and split it into frames using ffmpeg. (Only needs to be re-run if you want to change the source video)
4.   Define your parameters 
5.   Load model (run this every time)
6.   Execute the run (run this every time)
7.   Upscale with ESRGAN (optional)
8.   Concatenate images into video

**Parameters**

`prompts:` - here you can put the prompt or prompts you want to generate (separated by `|` ). It's a list because you can use more than one prompt, and the AI will try to 'mix' the images, giving the same weight to each one.

`model_id:` - The model to use. The dropdown offers ImageNet 1024, ImageNet 16384, WikiArt, FacesHQ, S-FLCKR and COCO-Stuff. A model must be downloaded before it can be selected from the dropdown.

`width:` - The width of the output frames in pixels. You may get out-of-memory (OOM) errors above roughly 680x680. Keep in mind that `width` and `height` resize the video.

`height:` - The height of the output frames in pixels. The same OOM and resizing caveats apply as for `width`.

`display_frequency:` - How often to display the output preview. Measured in iterations.

`seed:` - Which seed to be used for every init frame.

`starting_iterations:` - How many iterations to start frame 1 at.

`max_iterations:` - The maximum number of iterations for any frame.

`increase_every_frame:` - How often to add iterations. For example, if set to 5, the iteration count increases by `increase_iterations` every 5 frames.

`increase_iterations:` - When adding iterations, decide how many to add.

`amount_of_frames:` - The total amount of frames to process.

`step_size:` - The learning rate. For more cohesive results it is important to keep this lower. 

`angle:` - How many degrees to rotate every frame. 

`zoom_per_frame:` - The factor by which each init image is scaled. 1.05 = 5% zoom every frame.

`acceleration_per_frame:` - The factor by which the zoom accelerates each frame. 1.05 = 5% acceleration every frame.

`switch_at_frame:` - Once this frame number is reached, VQGAN no longer uses the footage frame as an init image. It uses the previously exported VQGAN image as the init image instead.

`add_x:` - Add this many pixels to the x axis every frame. This will move your video horizontally. 

`add_y:` - Add this many pixels to the y axis every frame. This will move your video vertically.
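
The iteration parameters interact in a way that is easier to see with numbers. The sketch below is a standalone re-implementation of the schedule, assuming it mirrors the counter update in the execute cell. `iteration_schedule` is a hypothetical helper name, and the extra bump applied at `switch_at_frame` is left out for brevity.

```python
def iteration_schedule(starting_iterations, max_iterations,
                       increase_iterations, increase_every_frame,
                       amount_of_frames):
    """Return the iteration count used for each frame, in frame order."""
    counts = []
    iteration_count = starting_iterations
    batch_num = 1  # frame numbering starts at 1, as in the notebook
    for _ in range(amount_of_frames):
        counts.append(iteration_count)
        batch_num += 1
        # The count only grows while it is still below max_iterations.
        if iteration_count < max_iterations and batch_num % increase_every_frame == 0:
            iteration_count += increase_iterations
    return counts


# Default values from the parameters cell, limited to the first 10 frames.
print(iteration_schedule(5, 20, 2, 4, 10))
# [5, 5, 5, 7, 7, 7, 7, 9, 9, 9]
```

Under these assumptions the count steps up by `increase_iterations` on every frame whose number is a multiple of `increase_every_frame`, until it reaches `max_iterations`.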


#Mount Drive


This notebook supports mounting your Google Drive as a folder and storing outputs there, so they are kept when the runtime disconnects.

First, choose a `root_path` in your Drive where you want the project stored. Keep in mind that `root_path` is automatically prefixed with `/content/drive/MyDrive`. Once you choose a directory in your Drive, an "in" folder, an "out" folder, an "esrOut" folder and a "renders" folder are created inside it. The frame PNG sequence is stored in "in", VQGAN exports in "out", ESRGAN exports in "esrOut", and video renders from the notebook in "renders".
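
As a rough illustration of the resulting layout, the sketch below builds the same four sub-folders under an arbitrary root with `pathlib`. `make_project_dirs` is a hypothetical helper, not part of the notebook, and a temporary directory stands in for the Drive path.

```python
import tempfile
from pathlib import Path


def make_project_dirs(abs_root_path):
    """Create the in/out/esrOut/renders folders and return their paths by name."""
    root = Path(abs_root_path)
    dirs = {name: root / name for name in ("in", "out", "esrOut", "renders")}
    for path in dirs.values():
        # parents=True also creates the project root if it is missing.
        path.mkdir(parents=True, exist_ok=True)
    return dirs


# A temporary directory stands in for /content/drive/MyDrive/<root_path>.
with tempfile.TemporaryDirectory() as tmp:
    project = make_project_dirs(Path(tmp) / "AI/VQGAN/fishEye1")
    for name, path in project.items():
        print(name, path.is_dir())
```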

In [None]:
from google.colab import drive
drive.mount('/content/drive')
root_path = "AI/VQGAN/fishEye1" #@param {type: "string"}
abs_root_path = f'/content/drive/MyDrive/{root_path}'

In [None]:
from pathlib import Path
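def checkMakePath(filepath):
    # Create the directory (and any missing parents) if it does not exist yet.
    # Using pathlib instead of a shell call also handles paths containing spaces.
    make_file = Path(filepath)
    if not make_file.exists():
      make_file.mkdir(parents=True, exist_ok=True)
      print(f'Made {filepath}')
    else:
      print(f'filepath {filepath} exists.')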

inDirPath = f'{abs_root_path}/in'
checkMakePath(inDirPath)
outDirPath = f'{abs_root_path}/out'
checkMakePath(outDirPath)
esrOutDirPath = f'{abs_root_path}/esrOut'
checkMakePath(esrOutDirPath)
rendersDirPath = f'{abs_root_path}/renders'
checkMakePath(rendersDirPath)
videoInPath = f'VideoIn'
checkMakePath(videoInPath)


#Select and Download Models + Libraries

In [None]:
#@title Selection of models to download
#@markdown By default, the notebook downloads only the ImageNet 16384 model. The other models (ImageNet 1024, COCO-Stuff, WikiArt 16384, FacesHQ and S-FLCKR) are skipped to save time and disk space; tick the ones you want to use. Note that this cell has no download option for WikiArt 1024.

imagenet_1024 = False #@param {type:"boolean"}
imagenet_16384 = True #@param {type:"boolean"}
coco = False #@param {type:"boolean"}
faceshq = False #@param {type:"boolean"}
wikiart_16384 = False #@param {type:"boolean"}
sflckr = False #@param {type:"boolean"}

if imagenet_1024:
  !curl -L -o vqgan_imagenet_f16_1024.yaml -C - 'https://heibox.uni-heidelberg.de/d/8088892a516d4e3baf92/files/?p=%2Fconfigs%2Fmodel.yaml&dl=1' #ImageNet 1024
  !curl -L -o vqgan_imagenet_f16_1024.ckpt -C - 'https://heibox.uni-heidelberg.de/d/8088892a516d4e3baf92/files/?p=%2Fckpts%2Flast.ckpt&dl=1'  #ImageNet 1024
if imagenet_16384:
  !curl -L -o vqgan_imagenet_f16_16384.yaml -C - 'https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fconfigs%2Fmodel.yaml&dl=1' #ImageNet 16384
  !curl -L -o vqgan_imagenet_f16_16384.ckpt -C - 'https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fckpts%2Flast.ckpt&dl=1' #ImageNet 16384
if coco:
  !curl -L -o coco.yaml -C - 'https://dl.nmkd.de/ai/clip/coco/coco.yaml' #COCO
  !curl -L -o coco.ckpt -C - 'https://dl.nmkd.de/ai/clip/coco/coco.ckpt' #COCO
if faceshq:
  !curl -L -o faceshq.yaml -C - 'https://drive.google.com/uc?export=download&id=1fHwGx_hnBtC8nsq7hesJvs-Klv-P0gzT' #FacesHQ
  !curl -L -o faceshq.ckpt -C - 'https://app.koofr.net/content/links/a04deec9-0c59-4673-8b37-3d696fe63a5d/files/get/last.ckpt?path=%2F2020-11-13T21-41-45_faceshq_transformer%2Fcheckpoints%2Flast.ckpt' #FacesHQ
if wikiart_16384:
  !curl -L -o wikiart_16384.yaml -C - 'http://eaidata.bmk.sh/data/Wikiart_16384/wikiart_f16_16384_8145600.yaml' #WikiArt 16384
  !curl -L -o wikiart_16384.ckpt -C - 'http://eaidata.bmk.sh/data/Wikiart_16384/wikiart_f16_16384_8145600.ckpt' #WikiArt 16384
if sflckr:
  !curl -L -o sflckr.yaml -C - 'https://heibox.uni-heidelberg.de/d/73487ab6e5314cb5adba/files/?p=%2Fconfigs%2F2020-11-09T13-31-51-project.yaml&dl=1' #S-FLCKR
  !curl -L -o sflckr.ckpt -C - 'https://heibox.uni-heidelberg.de/d/73487ab6e5314cb5adba/files/?p=%2Fcheckpoints%2Flast.ckpt&dl=1' #S-FLCKR

In [None]:
# @markdown ###Download Libraries
# @markdown This cell may take some time.
 
print("Downloading CLIP...")
!git clone https://github.com/openai/CLIP                 &> /dev/null
 
print("Installing Python AI libraries...")
!git clone https://github.com/CompVis/taming-transformers
!git -C ./taming-transformers/ reset --hard 9d17ea64b820f7633ea6b8823e1f78729447cb5
!pip install -e ./taming-transformers
!pip install ftfy regex tqdm omegaconf pytorch-lightning  &> /dev/null
!pip install kornia                                       &> /dev/null
!pip install einops                                       &> /dev/null
!pip install transformers                                 &> /dev/null
 
print("Installing libraries for metadata handling...")
!pip install stegano                                      &> /dev/null
!apt install exempi                                       &> /dev/null
!pip install python-xmp-toolkit                           &> /dev/null
!pip install imgtag                                       &> /dev/null
!pip install pillow==7.1.2                                &> /dev/null
 
print("Installing Python video creation libraries...")
!pip install imageio-ffmpeg &> /dev/null
!mkdir steps
print("Installation finished.")

In [None]:
# @title Load Libraries and Definitions

import argparse
import math
import sys
import hashlib
from IPython.display import Audio, clear_output
 
sys.path.append('./taming-transformers')
from IPython import display
from base64 import b64encode
from omegaconf import OmegaConf
from PIL import Image
from taming.models import cond_transformer, vqgan
import torch
from torch import nn, optim
from torch.nn import functional as F
from torchvision import transforms
from torchvision.transforms import functional as TF
from tqdm.notebook import tqdm
 
from CLIP import clip
import kornia.augmentation as K
import numpy as np
import imageio
from PIL import ImageFile, Image
from imgtag import ImgTag    # metadata
from libxmp import *         # metadata
import libxmp                # metadata
from stegano import lsb
import json
import re
ImageFile.LOAD_TRUNCATED_IMAGES = True
 
def sinc(x):
    return torch.where(x != 0, torch.sin(math.pi * x) / (math.pi * x), x.new_ones([]))
 
 
def lanczos(x, a):
    cond = torch.logical_and(-a < x, x < a)
    out = torch.where(cond, sinc(x) * sinc(x/a), x.new_zeros([]))
    return out / out.sum()
 
 
def ramp(ratio, width):
    n = math.ceil(width / ratio + 1)
    out = torch.empty([n])
    cur = 0
    for i in range(out.shape[0]):
        out[i] = cur
        cur += ratio
    return torch.cat([-out[1:].flip([0]), out])[1:-1]
 
 
def resample(input, size, align_corners=True):
    n, c, h, w = input.shape
    dh, dw = size
 
    input = input.view([n * c, 1, h, w])
 
    if dh < h:
        kernel_h = lanczos(ramp(dh / h, 2), 2).to(input.device, input.dtype)
        pad_h = (kernel_h.shape[0] - 1) // 2
        input = F.pad(input, (0, 0, pad_h, pad_h), 'reflect')
        input = F.conv2d(input, kernel_h[None, None, :, None])
 
    if dw < w:
        kernel_w = lanczos(ramp(dw / w, 2), 2).to(input.device, input.dtype)
        pad_w = (kernel_w.shape[0] - 1) // 2
        input = F.pad(input, (pad_w, pad_w, 0, 0), 'reflect')
        input = F.conv2d(input, kernel_w[None, None, None, :])
 
    input = input.view([n, c, h, w])
    return F.interpolate(input, size, mode='bicubic', align_corners=align_corners)
 
 
class ReplaceGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x_forward, x_backward):
        ctx.shape = x_backward.shape
        return x_forward
 
    @staticmethod
    def backward(ctx, grad_in):
        return None, grad_in.sum_to_size(ctx.shape)
 
 
replace_grad = ReplaceGrad.apply
 
 
class ClampWithGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, min, max):
        ctx.min = min
        ctx.max = max
        ctx.save_for_backward(input)
        return input.clamp(min, max)
 
    @staticmethod
    def backward(ctx, grad_in):
        input, = ctx.saved_tensors
        return grad_in * (grad_in * (input - input.clamp(ctx.min, ctx.max)) >= 0), None, None
 
 
clamp_with_grad = ClampWithGrad.apply
 
 
def vector_quantize(x, codebook):
    d = x.pow(2).sum(dim=-1, keepdim=True) + codebook.pow(2).sum(dim=1) - 2 * x @ codebook.T
    indices = d.argmin(-1)
    x_q = F.one_hot(indices, codebook.shape[0]).to(d.dtype) @ codebook
    return replace_grad(x_q, x)
 
 
class Prompt(nn.Module):
    def __init__(self, embed, weight=1., stop=float('-inf')):
        super().__init__()
        self.register_buffer('embed', embed)
        self.register_buffer('weight', torch.as_tensor(weight))
        self.register_buffer('stop', torch.as_tensor(stop))
 
    def forward(self, input):
        input_normed = F.normalize(input.unsqueeze(1), dim=2)
        embed_normed = F.normalize(self.embed.unsqueeze(0), dim=2)
        dists = input_normed.sub(embed_normed).norm(dim=2).div(2).arcsin().pow(2).mul(2)
        dists = dists * self.weight.sign()
        return self.weight.abs() * replace_grad(dists, torch.maximum(dists, self.stop)).mean()
 
 
def parse_prompt(prompt):
    vals = prompt.rsplit(':', 2)
    vals = vals + ['', '1', '-inf'][len(vals):]
    return vals[0], float(vals[1]), float(vals[2])
 
 
class MakeCutouts(nn.Module):
    def __init__(self, cut_size, cutn, cut_pow=1.):
        super().__init__()
        self.cut_size = cut_size
        self.cutn = cutn
        self.cut_pow = cut_pow
        self.augs = nn.Sequential(
            K.RandomHorizontalFlip(p=0.5),
            # K.RandomSolarize(0.01, 0.01, p=0.7),
            K.RandomSharpness(0.3,p=0.4),
            K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'),
            K.RandomPerspective(0.2,p=0.4),
            K.ColorJitter(hue=0.01, saturation=0.01, p=0.7))
        self.noise_fac = 0.1
 
 
    def forward(self, input):
        sideY, sideX = input.shape[2:4]
        max_size = min(sideX, sideY)
        min_size = min(sideX, sideY, self.cut_size)
        cutouts = []
        for _ in range(self.cutn):
            size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)
            offsetx = torch.randint(0, sideX - size + 1, ())
            offsety = torch.randint(0, sideY - size + 1, ())
            cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]
            cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))
        batch = self.augs(torch.cat(cutouts, dim=0))
        if self.noise_fac:
            facs = batch.new_empty([self.cutn, 1, 1, 1]).uniform_(0, self.noise_fac)
            batch = batch + facs * torch.randn_like(batch)
        return batch
 
 
def load_vqgan_model(config_path, checkpoint_path):
    config = OmegaConf.load(config_path)
    if config.model.target == 'taming.models.vqgan.VQModel':
        model = vqgan.VQModel(**config.model.params)
        model.eval().requires_grad_(False)
        model.init_from_ckpt(checkpoint_path)
    elif config.model.target == 'taming.models.cond_transformer.Net2NetTransformer':
        parent_model = cond_transformer.Net2NetTransformer(**config.model.params)
        parent_model.eval().requires_grad_(False)
        parent_model.init_from_ckpt(checkpoint_path)
        model = parent_model.first_stage_model
    else:
        raise ValueError(f'unknown model type: {config.model.target}')
    del model.loss
    return model
 
 
def resize_image(image, out_size):
    ratio = image.size[0] / image.size[1]
    area = min(image.size[0] * image.size[1], out_size[0] * out_size[1])
    size = round((area * ratio)**0.5), round((area / ratio)**0.5)
    return image.resize(size, Image.LANCZOS)
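
`vector_quantize` in the cell above snaps every latent vector to its closest codebook entry. The same lookup can be sketched in plain Python, without tensors or gradient handling. `nearest_code` and the toy codebook are hypothetical and only illustrate the idea.

```python
def nearest_code(vector, codebook):
    """Return (index, entry) of the codebook entry closest to vector."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    index = min(range(len(codebook)),
                key=lambda k: squared_distance(vector, codebook[k]))
    return index, codebook[index]


# Toy 2-D codebook with three entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
print(nearest_code([0.9, 1.2], codebook))  # (1, [1.0, 1.0])
```

The notebook version additionally passes gradients straight through to the unquantized input via `replace_grad`, which this sketch does not model.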

# Split video into images - FFmpeg

Upload your video to the folder `VideoIn` and rename it to `input.mp4` and run the below cell.

 You can skip this step by uploading your PNG images to `root_path/in` (specified above). Name the frames sequentially, starting at `in-0001.png` (`in-0001.png`, `in-0002.png`, and so on).
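
If you upload frames by hand, a quick check for gaps in the numbering can save a failed run. The sketch below assumes the `in-NNNN.png` naming described above; `missing_frames` is a hypothetical helper, not part of the notebook.

```python
import re
from pathlib import Path


def missing_frames(in_dir, amount_of_frames):
    """Return the frame numbers in 1..amount_of_frames that have no in-NNNN.png file."""
    pattern = re.compile(r"in-(\d{4})\.png")
    present = set()
    for path in Path(in_dir).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            present.add(int(match.group(1)))
    return [n for n in range(1, amount_of_frames + 1) if n not in present]
```

Calling `missing_frames(inDirPath, amount_of_frames)` after the upload should return an empty list when every frame is in place.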

In [None]:
from subprocess import Popen, PIPE
p = Popen(['ffmpeg', '-y', '-i', 'VideoIn/input.mp4', '-qscale:v', '2', f'{inDirPath}/in-%04d.png'], stdin=PIPE)
p.stdin.close()
p.wait()  # Block until every frame has been written to the "in" folder.

#Parameters

In [None]:
prompts = "the vivid sunset forest landscape with trees made of instruments | psychedelic | Foliage | rendered in 4k 3d " #@param {type:"string"}
width =  800#@param {type:"number"}
height =  450#@param {type:"number"}
model_id = "vqgan_imagenet_f16_16384" #@param ["vqgan_imagenet_f16_16384", "vqgan_imagenet_f16_1024", "wikiart_1024", "wikiart_16384", "coco", "faceshq", "sflckr"]
display_frequency =  3#@param {type:"number"}
#@markdown
seed = "1"#@param {type:"string"}
#@markdown
starting_iterations = 5#@param {type:"number"}
max_iterations = 20#@param {type:"number"}
increase_iterations = 2#@param {type:"number"}
increase_every_frame = 4#@param {type:"number"}
run_batch = True
amount_of_frames = 163#@param {type:"number"}

step_size = 0.04 #@param {type:"slider", min:0, max:1, step:0.01}

#@title Animation properties
#@markdown
angle = 0
zoom_per_frame = 1.0#@param {type:"number"}
acceleration_per_frame = 1.0#@param {type:"number"}
switch_at_frame = 1000#@param {type:"number"}
add_iteration_switch = 10#@param {type:"number"}
add_zoom_switch = 1.0#@param {type:"number"}
add_x = 0#@param {type:"number"}
add_y = 0#@param {type:"number"}

model_names={"vqgan_imagenet_f16_16384": 'ImageNet 16384',"vqgan_imagenet_f16_1024":"ImageNet 1024", 
                 "wikiart_1024":"WikiArt 1024", "wikiart_16384":"WikiArt 16384", "coco":"COCO-Stuff", 
                 "drive/MyDrive/colab/coco":"COCO-Stuff (Local)","faceshq":"FacesHQ", "sflckr":"S-FLCKR"}
model_name = model_names[model_id]     

if 'abs_root_path' not in globals():
    abs_root_path = "/content"


prompts_str = prompts
prompt_hash = hashlib.md5(prompts.encode())
prompt_md5 = prompt_hash.hexdigest()

prompts = [frase.strip() for frase in prompts.split("|")]
if prompts == ['']:
    prompts = []

args = argparse.Namespace(
    prompts=prompts,
    noise_prompt_seeds=[1],
    noise_prompt_weights=[],
    size=[width, height],
    init_weight=0.,
    clip_model='ViT-B/32',
    vqgan_config=f'{model_id}.yaml',
    vqgan_checkpoint=f'{model_id}.ckpt',
    step_size=step_size,
    cutn=64,
    cut_pow=1.,
    display_freq=display_frequency,
    seed=seed,
    amount_of_frames=amount_of_frames,
    hash = prompt_md5,
    prompt_str = prompts_str,
)

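# Make sure the per-prompt folder exists before writing the prompt text into it.
!mkdir -p "{abs_root_path}/{args.hash}"
!echo "{prompts_str}" > "{abs_root_path}/{args.hash}/prompt.txt"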
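
The `prompts` string is split on `|`, and each part is then passed through `parse_prompt`, which accepts an optional `:weight` and `:stop` suffix. The sketch below reproduces that parsing outside the notebook. `split_prompts` is a hypothetical wrapper, and unlike the parameters cell it drops every empty part rather than only a fully empty string.

```python
def parse_prompt(prompt):
    # Same logic as the notebook's parse_prompt: text[:weight[:stop]].
    vals = prompt.rsplit(':', 2)
    vals = vals + ['', '1', '-inf'][len(vals):]
    return vals[0], float(vals[1]), float(vals[2])


def split_prompts(prompts):
    """Split a '|'-separated prompt string into (text, weight, stop) tuples."""
    return [parse_prompt(part.strip()) for part in prompts.split("|") if part.strip()]


for text, weight, stop in split_prompts("forest landscape:2 | psychedelic | foliage:0.5:-1"):
    print(repr(text), weight, stop)
```

A prompt without a suffix gets weight 1. Because the split happens on the last colons, text that itself ends in a colon followed by a number (for example `ratio 16:9`) is read as a weight.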

#Load Model

In [None]:
#@markdown This cell must be run after the parameters are configured. It only needs to be run again when `model_id`, `width` or `height` change, because it loads the model and derives the frame size from them.
#@markdown Once it has run successfully, changing any other parameter only requires re-running the parameters cell and then the execute cell.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
perceptor = clip.load(args.clip_model, jit=False)[0].eval().requires_grad_(False).to(device)

cut_size = perceptor.visual.input_resolution
e_dim = model.quantize.e_dim
f = 2**(model.decoder.num_resolutions - 1)
make_cutouts = MakeCutouts(cut_size, args.cutn, cut_pow=args.cut_pow)
n_toks = model.quantize.n_e
toksX, toksY = args.size[0] // f, args.size[1] // f
sideX, sideY = toksX * f, toksY * f
z_min = model.quantize.embedding.weight.min(dim=0).values[None, :, None, None]
z_max = model.quantize.embedding.weight.max(dim=0).values[None, :, None, None]

normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],
                                std=[0.26862954, 0.26130258, 0.27577711])
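
The cell above rounds the requested size down to a whole number of VQGAN tokens, so the output frames can be slightly smaller than `width` x `height`. The sketch below repeats that arithmetic. It assumes a downsampling factor `f` of 16, which is what the "f16" in the model names suggests; `effective_size` is a hypothetical helper.

```python
def effective_size(width, height, f=16):
    """Return (toksX, toksY, sideX, sideY) for a requested output size."""
    toks_x, toks_y = width // f, height // f
    return toks_x, toks_y, toks_x * f, toks_y * f


# Default size from the parameters cell.
print(effective_size(800, 450))  # (50, 28, 800, 448)
```

With the default 800 x 450 setting and this assumed factor, the frames come out at 800 x 448.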

#Execute the Program

In [None]:
#@title Execute
amount_of_frames = args.amount_of_frames
batchNum = 1;
lastZoomX = 0
lastZoomY = 0
iterationCount = starting_iterations

def synth(z):
    z_q = vector_quantize(z.movedim(1, 3), model.quantize.embedding.weight).movedim(3, 1)
    return clamp_with_grad(model.decode(z_q).add(1).div(2), 0, 1)

def add_xmp_data(nombrefichero):
    imagen = ImgTag(filename=nombrefichero)
    imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'creator', 'VQGAN+CLIP', {"prop_array_is_ordered":True, "prop_value_is_array":True})
    if args.prompts:
        imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'title', " | ".join(args.prompts), {"prop_array_is_ordered":True, "prop_value_is_array":True})
    else:
        imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'title', 'None', {"prop_array_is_ordered":True, "prop_value_is_array":True})
    imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'i', str(i), {"prop_array_is_ordered":True, "prop_value_is_array":True})
    imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'model', model_name, {"prop_array_is_ordered":True, "prop_value_is_array":True})
    imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'seed',str(seed) , {"prop_array_is_ordered":True, "prop_value_is_array":True})
    #for frases in args.prompts:
    #    imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'Prompt' ,frases, {"prop_array_is_ordered":True, "prop_value_is_array":True})
    imagen.close()

@torch.no_grad()
def checkin(i, losses):
    losses_str = ', '.join(f'{loss.item():g}' for loss in losses)
    tqdm.write(f'i: {i}, loss: {sum(losses).item():g}, losses: {losses_str},batch: {batchNum}')
    out = synth(z)
    TF.to_pil_image(out[0].cpu()).save('progress.png')
    #no one is looking at this info just save some cycles
    #add_stegano_data('progress.png')
    #add_xmp_data('progress.png')
    display.display(display.Image('progress.png'))

def ascend_txt():
    global i
    global z

    out = synth(z)
    iii = perceptor.encode_image(normalize(make_cutouts(out))).float()

    result = []

    if args.init_weight:
        result.append(F.mse_loss(z, z_orig) * args.init_weight / 2)

    for prompt in pMs:
        result.append(prompt(iii))
    img = np.array(out.mul(255).clamp(0, 255)[0].cpu().detach().numpy().astype(np.uint8))[:,:,:]
    img = np.transpose(img, (1, 2, 0))
    filename = f"steps/{i:04}.png"
    imageio.imwrite(filename, np.array(img))

    add_xmp_data(filename)
    return result

def train(i):
    global opt
    global z

    opt.zero_grad()
    lossAll = ascend_txt()
    if i % args.display_freq == 0:
        checkin(i, lossAll)
    loss = sum(lossAll)
    loss.backward()
    opt.step()
    with torch.no_grad():
        z.copy_(z.maximum(z_min).minimum(z_max))


def save_final(s):
    global i
    out = synth(z)
    filename = f"{outDirPath}/out-{batchNum:04}.png"
    TF.to_pil_image(out[0].cpu()).save(filename)


def run_seed():
    torch.cuda.empty_cache()
    global opt
    global z
    global i
    global lastZoomX
    global lastZoomY
    global iterationCount
    global z_orig  # read by ascend_txt() when args.init_weight is non-zero
    i = 0

    # **** Animation ****
    addX = add_x*batchNum
    addY = add_y*batchNum
    accelerate = (acceleration_per_frame-1)*batchNum + 1


    filenameIn = f"{inDirPath}/in-{batchNum:04}.png"
    
    if batchNum < switch_at_frame: # If the current frame number is less than switch_at_frame. Will load in frame.
        img = Image.open(filenameIn);
        inWidth, inHeight = img.size
        currentZoomX = round( (inWidth*zoom_per_frame-inWidth)*accelerate + lastZoomX )
        currentZoomY = round( (inHeight*zoom_per_frame-inHeight)*accelerate + lastZoomY )
        lastZoomX = currentZoomX
        lastZoomY = currentZoomY
        zoom_width = (inWidth + currentZoomX)
        zoom_height = (inHeight + currentZoomY)
        crop_coord_left = (currentZoomX+addX)
        crop_coord_upper = (currentZoomY+addY)
        crop_box = (crop_coord_left, crop_coord_upper, inWidth, inHeight)
    else: # If the current frame number has reached switch_at_frame. Will load previous vqgan export as init.
        currentZoom = round( ((width*zoom_per_frame)-width)*accelerate*add_zoom_switch ) * 2
        initImage = f"{outDirPath}/out-{batchNum-1:04}.png"
        zoom_width = (width + currentZoom)
        zoom_height = (height + currentZoom)
        crop_coord_left = (currentZoom)
        crop_coord_upper = (currentZoom)
        crop_box = (crop_coord_left, crop_coord_upper, width, height)
        img = Image.open(initImage)

    # *** LOAD INIT IMAGE ***
    img_edited = img.rotate(angle).resize((zoom_width, zoom_height), resample=Image.NEAREST).crop(crop_box)
    pil_image = img_edited.convert('RGB')
    pil_image = pil_image.resize((sideX, sideY), Image.LANCZOS)
    z, *_ = model.encode(TF.to_tensor(pil_image).to(device).unsqueeze(0) * 2 - 1)

    z_orig = z.clone()
    z.requires_grad_(True)
    opt = optim.Adam([z], lr=args.step_size)

    # *** LOOP THROUGH ITERATIONS ***
    with tqdm() as pbar:
        while True:
            train(i)
            if i == iterationCount:
                break
            i += 1
            pbar.update()


pMs = []

for prompt in args.prompts:
    txt, weight, stop = parse_prompt(prompt)
    embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()
    pMs.append(Prompt(embed, weight, stop).to(device))
for seed, weight in zip(args.noise_prompt_seeds, args.noise_prompt_weights):
    gen = torch.Generator().manual_seed(seed)
    embed = torch.empty([1, perceptor.visual.output_dim]).normal_(generator=gen)
    pMs.append(Prompt(embed, weight).to(device))

def loadImgPrompts():
    # Loads the last vqgan export as an image prompt at weight 1.0. Can try changing the 1.0 below.
    filename = f"{outDirPath}/out-{batchNum-1:04}.png:1.0"
    path, weight, stop = parse_prompt(filename)
    img = resize_image(Image.open(path).convert('RGB'), (sideX, sideY))
    batch = make_cutouts(TF.to_tensor(img).unsqueeze(0).to(device))
    embed = perceptor.encode_image(normalize(batch)).float()
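    if batchNum > 2:
        # Drop the image prompt added for the previous frame. On frame 2 there is none yet, so the text prompts stay intact.
        pMs.pop(-1)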
    pMs.append(Prompt(embed, weight, stop).to(device))



try:
    while amount_of_frames > 0:
        clear_output(wait=True)

        s = args.seed
        torch.manual_seed(int(s))  # use the seed from the parameters cell (must be an integer string)
        print('Using seed:', s)
        if batchNum != 1:
          loadImgPrompts()
        run_seed()
        save_final(s)
        batchNum += 1;
        if iterationCount < max_iterations:
          if batchNum % increase_every_frame == 0:
            iterationCount += increase_iterations
        if batchNum == switch_at_frame:
            iterationCount += add_iteration_switch
        amount_of_frames -= 1

except KeyboardInterrupt:
    del z
    del z_max
    del z_min
    del opt
    pass
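
The zoom arithmetic in the footage branch of `run_seed()` can be traced on its own. The sketch below follows the horizontal offset only and assumes the offset accumulates from frame to frame, as the `lastZoomX` bookkeeping suggests. `zoom_offsets` is a hypothetical helper.

```python
def zoom_offsets(in_width, zoom_per_frame, acceleration_per_frame, frames):
    """Return the accumulated horizontal zoom, in pixels, for frames 1..frames."""
    offsets = []
    last_zoom = 0
    for batch_num in range(1, frames + 1):
        # Acceleration scales the per-frame zoom step linearly with the frame number.
        accelerate = (acceleration_per_frame - 1) * batch_num + 1
        step = (in_width * zoom_per_frame - in_width) * accelerate
        current_zoom = round(step + last_zoom)
        offsets.append(current_zoom)
        last_zoom = current_zoom
    return offsets


# An 800 px wide frame zoomed by 5% per frame, without acceleration.
print(zoom_offsets(800, 1.05, 1.0, 3))  # [40, 80, 120]
```

With `zoom_per_frame` left at 1.0 every offset is 0 and the frames are not zoomed.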


# Upscale images ESRGAN


In [None]:
def installESRGAN():
  print("Installing libraries for Real-ESRGAN upscaling.")
  !git clone https://github.com/xinntao/Real-ESRGAN.git
  %cd Real-ESRGAN
  !pip install basicsr
  !pip install facexlib
  !pip install gfpgan
  !pip install -r requirements.txt
  !python setup.py develop
  # Download the pre-trained model(s)
  !wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth -P experiments/pretrained_models
  !wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth -P experiments/pretrained_models
  !wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth -P experiments/pretrained_models
  print("Finished Installing libraries for Real-ESRGAN upscaling.")
  %cd ..

installESRGAN()

In [None]:
def runUpscale(filename):
  #upload images
  inFilename = f'{outDirPath}/{filename}' 
  outFilename = f'{esrOutDirPath}' 
  import os
  from google.colab import files
  import shutil

  #run upscaler
  #!python inference_realesrgan.py --model_path experiments/pretrained_models/RealESRGAN_x4plus.pth --input upload --netscale $scale_value --outscale $scale_value --half --face_enhance
  !python /content/Real-ESRGAN/inference_realesrgan.py --model_path $model_value --netscale $scale_value --input $inFilename --output $outFilename --ext jpg



print("Cleaning up from last run...")
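# Braces are needed so the shell receives the folder path, not the literal variable name.
!rm -rf "{esrOutDirPath}"
!mkdir -p "{esrOutDirPath}"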


model_value='/content/Real-ESRGAN/experiments/pretrained_models/RealESRGAN_x2plus.pth' #@param ['/content/Real-ESRGAN/experiments/pretrained_models/RealESRGAN_x4plus_anime_6B.pth','/content/Real-ESRGAN/experiments/pretrained_models/RealESRGAN_x4plus.pth','/content/Real-ESRGAN/experiments/pretrained_models/RealESRGAN_x2plus.pth'] {type:"string"}
scale_value="4" #@param [2, 4] {type:"string"}
init_frame = 1#@param {type:"number"}
last_frame = 110#@param {type:"number"}

torch.cuda.empty_cache()
try:
    for i in range(init_frame, last_frame+1): #
        filename = f"out-{i:04}.png"
        print(f'Upscaling frame {i}')
        runUpscale(filename)
except KeyboardInterrupt:
    torch.cuda.empty_cache()
    pass

#Generate Video from frames - FFmpeg

In [None]:
#@title Generate
#@markdown `make_video_from` specifies which directory you would like to make the video from. `ESRGAN_Out` will only work if you have run the upscaler above.

make_video_from = 'ESRGAN_Out' #@param ['ESRGAN_Out','VQGAN_Out']
fps =  30#@param {type:"number"}
final_video_name = "waktinsVQGANesr.mp4" #@param {type:"string"}
# %04d matches the zero-padded frame numbers written by the earlier cells (out-0001.png, ...).
inDir = f'{outDirPath}/out-%04d.png'
if make_video_from == 'ESRGAN_Out':
    inDir = f'{esrOutDirPath}/out-%04d_out.jpg'


from subprocess import Popen, PIPE
p = Popen(['ffmpeg', '-y', '-i', inDir, '-r', f'{fps}', '-qscale:v', '2', f'{rendersDirPath}/{final_video_name}', ], stdin=PIPE)
p.stdin.close()

print("The video is being compiled, please wait...")
p.wait()
print("The video is ready.")
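
The cell above sets the frame rate with `-r` on the output side. ffmpeg reads numbered image sequences at 25 fps unless told otherwise, so an alternative is to set the rate on the input with `-framerate`. The sketch below only builds such an argument list; `build_ffmpeg_command` is a hypothetical helper and this is a variant, not the command the notebook runs.

```python
def build_ffmpeg_command(frame_pattern, fps, output_path):
    """Return an ffmpeg argument list that reads numbered frames at the given rate."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),  # rate at which the input frames are read
        "-i", frame_pattern,     # e.g. ".../out/out-%04d.png"
        "-qscale:v", "2",        # same quality setting as the cell above
        output_path,
    ]


print(" ".join(build_ffmpeg_command("out/out-%04d.png", 30, "renders/video.mp4")))
```

The returned list can be passed to `Popen` in place of the hand-written one if you want each frame shown exactly once at the chosen rate.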

In [None]:
#@markdown This process may be quite slow. It is quicker to open up [Google Drive](https://www.google.com/drive/) in the browser and navigate to the `root_path` you set when Drive was mounted.
# @title Download Video
from google.colab import files
files.download(f'{rendersDirPath}/{final_video_name}')