# torchvideo Transforms demo

This notebook demonstrates the video transforms present in `torchvideo`.

## Contents

1. [Set up](#Set-up)
  1. [Imports](#Imports)
  2. [Downloading media](#Downloading-media)
2. [PIL Video transforms](#PIL-Video-transforms)
  1. [CenterCropVideo](#CenterCropVideo)
  2. [RandomCropVideo](#RandomCropVideo)
  3. [RandomHorizontalFlipVideo](#RandomHorizontalFlipVideo)
  4. [MultiScaleCropVideo](#MultiScaleCropVideo)
  5. [RandomResizedCropVideo](#RandomResizedCropVideo)
  6. [TimeApply](#TimeApply)
3. [Torch.*Tensor transforms](#Torch.*Tensor-transforms)
  1. [NormalizeVideo](#NormalizeVideo)
  2. [TimeToChannel](#TimeToChannel)

---

## Set up

### Imports

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
# Add the library path to sys.path so we can import torchvideo
import sys
sys.path.append('../src')
print(sys.executable)
print(sys.version)

In [None]:
from torchvideo.transforms import *
from torchvideo.datasets import *
from torchvideo.samplers import *
from torchvideo.tools import show_video
from torchvision.transforms import Compose, Lambda, Grayscale

---

### Downloading media

First we need to download a test video and prepare some toy datasets. We'll reuse the media used to test `torchvideo`. The `gen_test_media` script downloads a short clip of [Big Buck Bunny](https://peach.blender.org/) and creates datasets suitable for use with all `VideoDataset` classes:

- An [`ImageFolderVideoDataset`](https://torchvideo.readthedocs.io/en/latest/datasets.html#imagefoldervideodataset) where each example is comprised of a set of frames stored as images on disk.
- A [`VideoFolderDataset`](https://torchvideo.readthedocs.io/en/latest/datasets.html#videofolderdataset) where each example is stored as a video file.
- A [`GulpVideoDataset`](https://torchvideo.readthedocs.io/en/latest/datasets.html#gulpvideodataset) where frames are stored in a simple binary format of concatenated JPEGs (see the [GulpIO](https://github.com/TwentyBN/GulpIO) README for more info on this format).

In [None]:
%%bash
# Download the test media
if [[ ! -f ../tests/data/media/big_buck_bunny_360p_5mb.mp4 ]]; then
  cd ../tests/data/media 
  ./gen_test_media.sh > /dev/null 2>&1
  cd -
fi

---

Now that we have some data, we can play around with `torchvideo`'s transform classes. But first we'll need to get a video to transform.

In [None]:
dataset = VideoFolderDataset('../tests/data/media/video_folder/')
video = dataset[0]
type(video), video.shape

By default the `VideoDataset` classes process the data into a CTHW (channels, time, height, width) format suitable for training a network. To display a video in this notebook we'll use `show_video`, a small function included in `torchvideo` for playing around with transforms.
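
As a small illustration of the layout, the sketch below tracks only the shape of a video tensor as its axes are reordered from CTHW to THWC, which is what a tensor `permute(1, 2, 3, 0)` call does to the shape. The `permute_shape` helper and the example shape are hypothetical and not part of `torchvideo`.

```python
def permute_shape(shape, order):
    """Return the shape obtained by reordering axes according to `order`."""
    return tuple(shape[axis] for axis in order)


# Hypothetical clip: 3 channels, 50 frames, 360 rows, 640 columns (CTHW).
cthw = (3, 50, 360, 640)

# Move the channel axis last: CTHW -> THWC, the layout most image tools expect.
thwc = permute_shape(cthw, (1, 2, 3, 0))
print(thwc)
```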

Here's our original video:

In [None]:
show_video(dataset[0])

Now let's load the frames as a list of `PIL.Image`s.

In [None]:
# The VideoFolderDataset will pass an iterator of PIL Images to the transform
# so we need to collect the frames from this iterator into a list.
dataset.transform = CollectFrames()
video = dataset[0]
type(video), len(video), type(video[0])

Now let's play around with the transforms. Many of them have a random element, so we'll define a few helper functions that sample the video multiple times and show the results side by side.

In [None]:
from moviepy.editor import *
import torch
import numpy as np


def tensor_video2clip(tensor_video, fps=24):
    # Scale by 255 without mutating the input tensor, then reorder CTHW -> THWC
    vid = tensor_video.mul(255).to(torch.uint8).permute(1, 2, 3, 0).cpu().numpy()
    return ImageSequenceClip(list(vid), fps=fps)


def flat2grid(seq, n_cols):
    n_rows = int(np.ceil(len(seq) / n_cols))
    grid = []
    for irow in range(n_rows):
        row = []
        grid.append(row)
        for icol in range(n_cols):
            i = irow * n_cols + icol
            if i == len(seq):
                break
            row.append(seq[i])
    return grid
    

def demo_transform(transform, n_samples=1, tile_width=3, fps=24):
    if n_samples > tile_width and not (n_samples / tile_width).is_integer():
        raise ValueError("Expected tile_width to divide n_samples perfectly.")
    samples = []
    for i in range(n_samples):
        samples.append(tensor_video2clip(transform(video), fps=fps))
    samples = flat2grid(samples, tile_width)
        
    return clips_array(samples).ipython_display()
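
To make the tiling step concrete, here is a minimal sketch of the row-wise grouping that `flat2grid` performs, using plain lists instead of clips. The `chunk` helper is a hypothetical, slicing-based stand-in written for this illustration.

```python
def chunk(seq, n_cols):
    """Split a flat sequence into rows of at most `n_cols` items."""
    return [list(seq[i:i + n_cols]) for i in range(0, len(seq), n_cols)]


# Nine samples tiled three per row, as with n_samples=9 and tile_width=3.
full_grid = chunk(list(range(9)), 3)

# Four samples with three columns leave a shorter last row, which is why
# demo_transform rejects sample counts that tile_width does not divide.
ragged_grid = chunk(list(range(4)), 3)

print(full_grid)
print(ragged_grid)
```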

... and now we can play with transforms!

---

# [PIL Video transforms](https://torchvideo.readthedocs.io/en/latest/transforms.html#transforms-on-pil-videos)

## [CenterCropVideo](https://torchvideo.readthedocs.io/en/latest/transforms.html#centercropvideo)

In [None]:
demo_transform(Compose([
    CenterCropVideo((200, 400)),
    CollectFrames(),
    PILVideoToTensor()
]))
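
The arithmetic behind a center crop can be sketched without any imaging library. The helper below is hypothetical and assumes sizes are given as `(height, width)`; the library may round odd margins differently.

```python
def center_crop_box(frame_size, crop_size):
    """Return (top, left, bottom, right) of a centered crop window."""
    frame_h, frame_w = frame_size
    crop_h, crop_w = crop_size
    if crop_h > frame_h or crop_w > frame_w:
        raise ValueError("Crop size must not exceed the frame size.")
    top = (frame_h - crop_h) // 2
    left = (frame_w - crop_w) // 2
    return top, left, top + crop_h, left + crop_w


# Assumed 360x640 frame (height x width) cropped to 200x400.
box = center_crop_box((360, 640), (200, 400))
print(box)
```

The same window is used for every frame, so the crop stays spatially consistent across the clip.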

## [RandomCropVideo](https://torchvideo.readthedocs.io/en/latest/transforms.html#randomcropvideo)

In [None]:
demo_transform(Compose([
    RandomCropVideo((150, 300)),
    CollectFrames(),
    PILVideoToTensor()
]), n_samples=9)

In [None]:
demo_transform(Compose([
    ResizeVideo((100, 200)),
    RandomCropVideo((150, 300), pad_if_needed=True),
    CollectFrames(),
    PILVideoToTensor()
]), n_samples=9)

In [None]:
demo_transform(Compose([
    ResizeVideo((130, 280)),
    RandomCropVideo((150, 300), padding=(20, 20, 0, 0)),
    CollectFrames(),
    PILVideoToTensor()
]), n_samples=9)

## [RandomHorizontalFlipVideo](https://torchvideo.readthedocs.io/en/latest/transforms.html#randomhorizontalflipvideo)

In [None]:
demo_transform(Compose([
    ResizeVideo((184, 320)),
    RandomHorizontalFlipVideo(),
    CollectFrames(),
    PILVideoToTensor()
]), n_samples=4, tile_width=2)
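
What distinguishes a video flip from a per-frame flip is that the random decision is made once per clip. The sketch below illustrates that idea on nested lists, where each frame is a list of pixel rows. `random_hflip_clip` is a hypothetical helper, not the library's implementation.

```python
import random


def random_hflip_clip(frames, p=0.5, rng=random):
    """Mirror every frame of the clip with probability `p`, or none of them."""
    flip = rng.random() < p  # a single draw shared by all frames
    if not flip:
        return [[list(row) for row in frame] for frame in frames]
    return [[list(reversed(row)) for row in frame] for frame in frames]


# Two tiny one-row "frames".
clip = [[[1, 2, 3]], [[4, 5, 6]]]

always_flipped = random_hflip_clip(clip, p=1.0)
never_flipped = random_hflip_clip(clip, p=0.0)
print(always_flipped)
print(never_flipped)
```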

## [MultiScaleCropVideo](https://torchvideo.readthedocs.io/en/latest/transforms.html#multiscalecropvideo)

In [None]:
demo_transform(Compose([
    ResizeVideo((184, 320)),
    MultiScaleCropVideo((100, 200), max_distortion=2),
    CollectFrames(),
    PILVideoToTensor(),
]), n_samples=9)

## [RandomResizedCropVideo](https://torchvideo.readthedocs.io/en/latest/transforms.html#randomresizedcropvideo)

In [None]:
demo_transform(Compose([
    RandomResizedCropVideo((100, 200)),
    CollectFrames(),
    PILVideoToTensor(),
]), n_samples=9)

## [TimeApply](https://torchvideo.readthedocs.io/en/latest/transforms.html#timeapply)

In [None]:
demo_transform(Compose([
    TimeApply(Grayscale()),
    CollectFrames(),
    PILVideoToTensor(),
]), n_samples=1)
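
Conceptually, applying an image transform across time amounts to mapping it over the frames. The sketch below shows this with a simple integer luma conversion on RGB tuples; `time_apply`, `to_gray` and `gray_frame` are hypothetical helpers written for this illustration only.

```python
def time_apply(frame_transform, frames):
    """Apply the same per-frame transform to every frame of a clip."""
    return [frame_transform(frame) for frame in frames]


def to_gray(pixel):
    """Integer luma approximation of an (R, G, B) pixel."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def gray_frame(frame):
    return [[to_gray(pixel) for pixel in row] for row in frame]


# Two frames, each with a single row of two RGB pixels.
clip = [
    [[(255, 255, 255), (0, 0, 0)]],
    [[(255, 0, 0), (0, 0, 255)]],
]
gray_clip = time_apply(gray_frame, clip)
print(gray_clip)
```

Because the transform runs independently on each frame, a deterministic transform such as a grayscale conversion is a natural fit; a random one would draw new parameters for every frame.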

---

# [Torch.*Tensor transforms](https://torchvideo.readthedocs.io/en/latest/transforms.html#transforms-on-torch-tensor-videos)

## [NormalizeVideo](https://torchvideo.readthedocs.io/en/latest/transforms.html#normalizevideo)

In [None]:
dataset.transform = Compose([
    CollectFrames(),
    PILVideoToTensor(),
    NormalizeVideo(128, 100)
])
x = dataset[0].cpu().numpy()
x.shape, x.mean(), x.std()
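
Normalization subtracts a mean and divides by a standard deviation, element by element. A minimal sketch on a flat list of values, using a hypothetical `normalize` helper:

```python
def normalize(values, mean, std):
    """Return (v - mean) / std for every value."""
    if std == 0:
        raise ValueError("std must be non-zero.")
    return [(v - mean) / std for v in values]


# Values in [0, 1] mapped to [-1, 1] with mean=0.5 and std=0.5.
normalized = normalize([0.0, 0.5, 1.0], mean=0.5, std=0.5)
print(normalized)
```

If the tensors reaching `NormalizeVideo` hold values in [0, 1], as the multiplication by 255 in `tensor_video2clip` suggests, then a mean of 128 and a standard deviation of 100 would shift every value to roughly -1.28. Statistics on the same scale as the data are the more usual choice.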

## [TimeToChannel](https://torchvideo.readthedocs.io/en/latest/transforms.html#timetochannel)

In [None]:
dataset.transform = Compose([
    CollectFrames(),
    PILVideoToTensor(),
    TimeToChannel()
])
dataset[0].cpu().shape
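
The effect on the tensor shape can be sketched as follows: the time axis is folded into the channel axis, so a CTHW video becomes a (C×T)HW tensor. Both helpers are hypothetical, and the index mapping assumes a plain row-major reshape, which may not match the library's internals.

```python
def time_to_channel_shape(shape):
    """Shape of a CTHW video after merging time into channels."""
    c, t, h, w = shape
    return (c * t, h, w)


def merged_channel_index(c, t, n_frames):
    """Position of (channel c, frame t) under a row-major reshape."""
    return c * n_frames + t


# Hypothetical clip: 3 channels, 16 frames of 112x112 pixels.
merged_shape = time_to_channel_shape((3, 16, 112, 112))
print(merged_shape)
print(merged_channel_index(1, 0, n_frames=16))
```

This layout lets a 2D convolutional network consume a short clip as a single multi-channel image.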