<a href="https://colab.research.google.com/github/xvdp/siren/blob/x_dev/siren_video.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Siren Video stress test
Notes on tests for video

**Loss** is the squared error $(x - y)^2$. It decreases very quickly in the first few iterations; most of the computation for a video is spent resolving high-frequency details and sharp boundaries.

**Sampling** strategy differs from that used for a single image, where the whole image is loaded as a single instance. The current loaders hold the video in RAM but cannot load the entire data to the GPU, so each epoch draws a single uniform random sample of size *batch_size* (pass strategy=-1). Samples may repeat, resulting in a normal distribution of sample visitation over time. This form of sampling results in close to monotonic loss decay.

Ensuring complete non-repeating sampling per epoch (strategy=1) results in worse performance, as does passing complete contiguous blocks of data. Possibly the repetition of some samples regularizes the loss, preventing overfitting.
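
The difference between the two strategies can be sketched on a toy index set. This is a minimal stdlib illustration, not the code in `x_dataio.VideoDataset`; the function names and the toy sizes are hypothetical. With replacement, some points are visited several times and others not at all; the complete strategy visits every point exactly once per epoch.

```python
import random
from collections import Counter

def sample_with_replacement(num_points, sample_size, rng):
    """Like strategy -1 as described above: one uniform draw, indices may repeat."""
    return [rng.randrange(num_points) for _ in range(sample_size)]

def epoch_without_replacement(num_points, sample_size, rng):
    """Like strategy 1 as described above: shuffle once, yield disjoint batches."""
    order = list(range(num_points))
    rng.shuffle(order)
    for start in range(0, num_points, sample_size):
        yield order[start:start + sample_size]

rng = random.Random(0)
num_points, sample_size = 1000, 100      # toy sizes, not the video's
draws = num_points // sample_size        # same number of batches for both

visits_random = Counter()
for _ in range(draws):
    visits_random.update(sample_with_replacement(num_points, sample_size, rng))

visits_full = Counter()
for batch in epoch_without_replacement(num_points, sample_size, rng):
    visits_full.update(batch)

print("random:   distinct =", len(visits_random), "max visits =", max(visits_random.values()))
print("complete: distinct =", len(visits_full), "max visits =", max(visits_full.values()))
```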

**Batch size** will depend on GPU availability.
Supplementary Info pg. 16:

"*The Adam optimizer with a learning rate of $1\times10^{-4}$ was used for all experiments. We set the batch size to fill the memory of the GPUs (roughly 160,000)*"

"*We train the videos for 100,000 iterations, requiring approximately 15 hours.*"

Batch (sample) size on this Colab is ~320,000 and each iteration takes about 2 s, so training even a small video for 100k iterations would take ~57 hours, which is a bit much. On a local machine with a TitanGTX 24G, computation took 23 hours.
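
The time estimate is simple arithmetic; a small sketch, where `training_hours` is a hypothetical helper and the per-iteration times are the rough figures seen in this notebook's logs:

```python
def training_hours(iterations, seconds_per_iteration):
    """Wall-clock estimate in hours, ignoring checkpointing and logging overhead."""
    return iterations * seconds_per_iteration / 3600.0

# full 100k-iteration schedule from the paper, at roughly 2 s per iteration
for sec in (2.0, 2.05):
    print(f"{sec:.2f} s/it -> {training_hours(100_000, sec):.1f} h")

# the shortened 5000-iteration run used in this notebook
print(f"5000 it at 2.01 s/it -> {training_hours(5_000, 2.01):.2f} h")
```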


In [None]:
!pip install git+https://github.com/xvdp/vidi.git

In [33]:
!rm -rf siren/
!git clone -b x_dev https://github.com/xvdp/siren.git

Cloning into 'siren'...
remote: Enumerating objects: 351, done.
remote: Counting objects: 100% (158/158), done.
remote: Compressing objects: 100% (121/121), done.
remote: Total 351 (delta 97), reused 79 (delta 36), pack-reused 193
Receiving objects: 100% (351/351), 15.68 MiB | 15.89 MiB/s, done.
Resolving deltas: 100% (164/164), done.


In [6]:
import os
import os.path as osp
import yaml
import torch
from IPython.display import HTML
from base64 import b64encode

os.chdir("siren")
import x_utils
import x_dataio
import x_modules
import x_infer
from x_infer import SirenRender, render_video
from x_training import train, _continue, load_last_checkpoint, _prevent_overwrite

## check available GPU and CPU

In [7]:
x_utils.GPUse(), x_utils.CPUse()

(GPU: ({'total': 15109, 'used': 0, 'available': 15109, 'percent': 0.0, 'units': 'MB'}),
 CPU: ({'total': 12993, 'used': 627, 'available': 12092, 'percent': 6.9, 'units': 'MB'}))

## connect gdrive to colab and redirect logging root

In [8]:
from google.colab import drive
drive_dir = '/content/gdrive'
drive.mount(drive_dir)

Mounted at /content/gdrive


In [9]:
logging_root = "/content/gdrive/MyDrive/siren"
os.makedirs(logging_root, exist_ok=True)
osp.isdir(logging_root)

True

## download cat video
from https://drive.google.com/drive/u/0/folders/1_iq__37-hw7FJOEUK1tX7mdp8SKB368K

In [11]:
#https://drive.google.com/file/d/1ZCr6HTrNu8f6T-nyIbToYXHMOKU88f7P/view?usp=sharing
!gdown --id 1ZCr6HTrNu8f6T-nyIbToYXHMOKU88f7P 

Downloading...
From: https://drive.google.com/uc?id=1ZCr6HTrNu8f6T-nyIbToYXHMOKU88f7P
To: /content/siren/cat_video.mp4
4.49MB [00:00, 39.4MB/s]


In [12]:
os.rename("cat_video.mp4", "data/cat_video.mp4")

convert video for display

In [13]:
!ffmpeg -i data/cat_video.mp4 -vcodec libx264 -pix_fmt yuv420p -profile:v baseline -level 3 data/cat_video_web.mp4

ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lib

In [14]:
mp4 = open("data/cat_video_web.mp4","rb").read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=512 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

In [18]:
# EasyDict is a dictionary accessible as an object (not the PyPI EasyDict)
config_file = "x_periment_scripts/cat_s-1_100k.yml"
with open(config_file, "r") as _fi:
    # yaml.load without an explicit Loader fails on PyYAML >= 6; safe_load covers plain configs
    opt = x_utils.EasyDict(yaml.safe_load(_fi))

In [19]:
opt.logging_root = logging_root
config_file = "_colab".join(osp.splitext(config_file))
opt.to_yaml(config_file)
print(opt.logging_root, config_file, osp.isfile(config_file))

/content/gdrive/MyDrive/siren x_periment_scripts/cat_s-1_100k_colab.yml True


### training options differ from the main branch; they contain:
* model definition
* strategy: [-1] original, fully random single sample per epoch; [1] complete set of non-repeating samples per epoch; [2] gridded sampling, dense and sparse. The original strategy works best.
* sample_frac: None, estimated from available GPU memory
* sample_size: estimated below


In [20]:
gpus = x_utils.GPUse()
# sample size is multiplied by 2 over the estimate; probably gradient operations over latents are optimized,
# so single operations (sin, +, *) do not each save a full new latent gradient
opt.sample_size = x_utils.estimate_samples(gpus.available, **opt["siren"])*2
opt.epochs_til_checkpoint = 1000 # fewer epochs between checkpoints, just in case Colab boots us out
opt.num_epochs = 5000 # fewer epochs, 100k would be 57 hours
opt.num_steps = 5000

# To continue, skip 

In [13]:
opt

{'batch_size': 1,
 'checkpoint_path': None,
 'data_path': './data/cat_video.mp4',
 'epochs_til_checkpoint': 1000,
 'experiment_name': 'cat_s-1_100k',
 'frame_range': None,
 'logging_root': '/content/gdrive/MyDrive/siren',
 'lr': 0.0001,
 'model_type': 'sine',
 'num_epochs': 5000,
 'num_steps': 5000,
 'sample_frac': None,
 'sample_size': 321482,
 'shuffle': 1,
 'siren': {'first_omega_0': 30,
  'hidden_features': 1024,
  'hidden_layers': 3,
  'hidden_omega_0': 30.0,
  'in_features': 3,
  'out_features': 3,
  'outermost_linear': True},
 'steps_til_summary': 1,
 'strategy': -1,
 'train_type': 'video',
 'verbose': 1}

## in this instance, sample_size is estimated at 321,482 float32 samples
~2x the spec in the paper (160,000)

**Does this mean that, instead of 100k iterations, a similar loss will be reached in 50k iterations?**
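
As a rough check on that question, the total number of samples seen can be compared. This only shows that the two sample budgets match; it does not show that the loss would match. The 50k figure is the hypothesis above, not a measured result.

```python
total_points = 300 * 512 * 512           # video sidelen reported by VideoDataset
paper_budget = 100_000 * 160_000         # paper: iterations * batch size
colab_budget = 50_000 * 321_482          # hypothesis: half the iterations, 2x the samples

print(f"points in video:          {total_points:,}")
print(f"fraction per iteration:   {321_482 / total_points:.4f}")
print(f"paper budget, in passes:  {paper_budget / total_points:.1f}")
print(f"colab budget, in passes:  {colab_budget / total_points:.1f}")
```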

Running the same experiments with the other strategies shows that uniform random sampling (i.e. the original strategy, -1) has nearly monotonic convergence, whereas randomly repeating grids (strategy 2) or non-repeating complete random samples over the data (strategy 1) overfit some data.

These experiments will not be run here, but to test them it suffices to set opt.strategy = 1 or 2.



# VideoDataset is a modification over the main branch.
* Does not create an m_grid and then reference it, but finds the grid position from a random index
* different strategies, see x_dataio.VideoDataset
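
Finding a grid position from a flat index, without materializing the full coordinate grid, can be sketched as below. This is an illustration only: `index_to_coords` is a hypothetical name, and the row-major layout and the normalization to [-1, 1] are assumptions rather than a description of `x_dataio.VideoDataset`.

```python
def index_to_coords(index, sidelen):
    """Map a flat row-major index to normalized coordinates, one per axis."""
    positions = []
    for size in reversed(sidelen):       # peel off the fastest-varying axis first
        index, pos = divmod(index, size)
        positions.append(pos)
    positions.reverse()                  # back to (frame, row, column) order
    # assumed normalization: first element -> -1, last element -> +1
    return [2.0 * p / (s - 1) - 1.0 if s > 1 else 0.0
            for p, s in zip(positions, sidelen)]

sidelen = [300, 512, 512]
print(index_to_coords(0, sidelen))
print(index_to_coords(300 * 512 * 512 - 1, sidelen))
```

Only the sampled indices need coordinates, so memory scales with sample_size rather than with the 78,643,200 grid points.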

In [14]:
dset = x_dataio.VideoDataset(opt.data_path, sample_size=opt.sample_size, frame_range=opt.frame_range, strategy=opt.strategy)

Siren: VideoDataset - INFO - Loaded data, sidelen: [300, 512, 512], channels 3
Siren: VideoDataset - INFO -          => reshaped to: (78643200, 3)
Siren: VideoDataset - INFO -  max sample_size, 321482, fraction, 0.0041
Siren: VideoDataset - INFO -  strategy: -1, single sample per epoch


In [15]:
dataloader = torch.utils.data.DataLoader(dset, shuffle=opt.shuffle, batch_size=1, pin_memory=True, num_workers=0)

## store training options to yaml

In [16]:
opt.sidelen = tuple(dset.data.shape[:-1])
opt.chanels = dset.data.shape[-1]
opt.sample_size = dset.sample_size
opt.dset_len = len(dset)
if "num_steps" not in opt:
    opt.num_steps = None
folder = osp.join(opt.logging_root, opt.experiment_name)
opt.to_yaml(osp.join(folder, "training_options.yml"))

## create model and run
Siren used in this instance is simplified - meta nodes disabled

In [17]:
model = x_modules.Siren(**opt["siren"])
model.cuda()

Siren(
  (net): Sequential(
    (0): SineLayer(
      (linear): Linear(in_features=3, out_features=1024, bias=True)
    )
    (1): SineLayer(
      (linear): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (2): SineLayer(
      (linear): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (3): SineLayer(
      (linear): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (4): Linear(in_features=1024, out_features=3, bias=True)
  )
)

# Only subset training, at 2.09s/iteration, 100k would be ~57 hours

In [18]:
checkpoint = train(model, dataloader, opt.num_epochs, opt.lr, opt.epochs_til_checkpoint, folder, dataset=dset,
                   num_steps=opt.num_steps, steps_til_summary=100, terminator={"end":"\n"})

To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)


logging file created: /content/gdrive/MyDrive/siren/cat_s-1_100k/train.csv
Epoch	Iter	IterAll	tGPU	GPU	CPU	Loss	Time	Total_Time
99	0	99	13878	14962	4083	0.019	1.95	195
199	0	199	13878	14962	4083	0.0159	1.98	395
299	0	299	13878	14962	4079	0.014	1.99	597
399	0	399	13878	14962	4081	0.0128	2.0	798
499	0	499	13878	14962	4080	0.012	2.0	999
599	0	599	13878	14962	4080	0.0109	2.0	1201
699	0	699	13878	14962	4083	0.0101	2.0	1402
799	0	799	13878	14962	4084	0.00947	2.0	1603
899	0	899	13878	14962	4083	0.00894	2.01	1805
999	0	999	13878	14962	4083	0.00866	2.01	2006
1099	0	1099	13878	14962	4085	0.00825	2.01	2207
1199	0	1199	13878	14962	4084	0.00793	2.01	2408
1299	0	1299	13878	14962	4085	0.00741	2.01	2610
1399	0	1399	13878	14962	4085	0.00724	2.01	2811
1499	0	1499	13878	14962	4085	0.007	2.01	3012
1599	0	1599	13878	14962	4085	0.00668	2.01	3214
1699	0	1699	13878	14962	4087	0.00653	2.01	3415
1799	0	1799	13878	14962	4086	0.00631	2.01	3616
1899	0	1899	13878	14962	4086	0.00609	2.01	3817
1999	0	1999	13878	14962

## Render partial result
* delete the training objects and render from checkpoint (could also render directly from the model)



In [19]:
sidelen = dset.sidelen.tolist()
del model
del dset
del dataloader
print(sidelen)

[300, 512, 512]


In [20]:
checkpoint = "/content/gdrive/MyDrive/siren/cat_s-1_100k/model_final.pth"
osp.isfile(checkpoint), osp.isdir(logging_root)

(True, True)

In [22]:
with torch.no_grad():
    # model.eval()
    chunksize = int(x_utils.estimate_frames(sidelen[-2:], grad=0)//0.5)
    render = x_utils.EasyDict(model=checkpoint, sidelen=sidelen, chunksize=chunksize)
    render.name = osp.join(folder, "cat_recon_{:04}.mp4".format(5000))
    render.fps = 25
    S = SirenRender(**render)



Siren: Render - INFO - loading model from checkpoint: /content/gdrive/MyDrive/siren/cat_s-1_100k/model_final.pth


In [23]:
S.render_video()

Siren: Render - INFO - Rendering /content/gdrive/MyDrive/siren/cat_s-1_100k/cat_recon_5000.mp4: 300,(512, 512),3, chunks: 3
Siren: Render - INFO - 
 Finished render /content/gdrive/MyDrive/siren/cat_s-1_100k/cat_recon_5000.mp4, to play result call: >>> self.play()


In [24]:
render = "/content/gdrive/MyDrive/siren/cat_s-1_100k/cat_recon_5000.mp4"
cat = "data/cat_video.mp4"
osp.isfile(render), osp.isfile(cat)

(True, True)

In [44]:
# something not right with the compositing command
# comp = "/content/gdrive/MyDrive/siren/cat_s-1_100k/composite.mp4"
# !ffmpeg -i $render -i $cat -filter_complex "[0:v][1:v]hstack,format=yuv420p[v];[0:a][1:a]amerge[a]" -map "[v]" -map "[a]" -c:v libx264 -crf 18 -ac 2 $comp
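
One possible cause, not verified here: the filter graph above merges `[0:a][1:a]`, which ffmpeg rejects when an input has no audio stream, and the rendered clip probably has none. A video-only variant can be sketched as follows; `hstack_command` is a hypothetical helper that only builds the argument list, and `hstack` additionally requires both inputs to have the same height.

```python
import shlex

def hstack_command(left, right, out, crf=18):
    """Build an ffmpeg command placing two clips side by side, without audio."""
    return [
        "ffmpeg", "-y",
        "-i", left,
        "-i", right,
        "-filter_complex", "[0:v][1:v]hstack=inputs=2,format=yuv420p[v]",
        "-map", "[v]",
        "-an",                      # drop audio entirely
        "-c:v", "libx264",
        "-crf", str(crf),
        out,
    ]

cmd = hstack_command(
    "/content/gdrive/MyDrive/siren/cat_s-1_100k/cat_recon_5000.mp4",
    "data/cat_video.mp4",
    "/content/gdrive/MyDrive/siren/cat_s-1_100k/composite.mp4",
)
print(shlex.join(cmd))
# to run it: import subprocess; subprocess.run(cmd, check=True)
```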


# render of checkpoint at 5000 iterations of sample size 321,482, shows some sine crawling artifacts. Paper calls for 100k iterations of sample size 160,000

In [25]:
mp4 = open(render,"rb").read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=512 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

In [27]:
rng_state = torch.get_rng_state()
print("random_state", rng_state)
# torch.set_rng_state

random_state tensor([144, 241, 175,  ...,   0,   0,   0], dtype=torch.uint8)


# Continue Training for 10k iterations
- even though the paper calls for 100k iterations (or presumably 50k at this sample size), the loss is no longer completely monotonic, so decay lr by 1/2 order of magnitude.
- TODO: measure an EMA of the loss instead of the per-item loss.
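
A minimal sketch of the EMA mentioned in the TODO. The smoothing factor `alpha` is an arbitrary assumption, and the loss values are copied from the training log of this run.

```python
def ema(values, alpha=0.1):
    """Exponential moving average, seeded with the first value."""
    smoothed = []
    avg = None
    for v in values:
        avg = v if avg is None else alpha * v + (1.0 - alpha) * avg
        smoothed.append(avg)
    return smoothed

losses = [0.00262, 0.00262, 0.00258, 0.00252, 0.00249,
          0.00248, 0.00245, 0.00245, 0.00242]
for raw, smooth in zip(losses, ema(losses, alpha=0.3)):
    print(f"{raw:.5f}  {smooth:.5f}")
```

Comparing successive smoothed values, rather than single-batch losses, gives a less noisy signal for deciding when to decay the learning rate.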

In [33]:
opt.lr /= 10**(1/2)

In [None]:
dset = x_dataio.VideoDataset(opt.data_path, sample_size=opt.sample_size, frame_range=opt.frame_range, strategy=opt.strategy)

torch.set_rng_state(rng_state)
dataloader = torch.utils.data.DataLoader(dset, shuffle=opt.shuffle, batch_size=1, pin_memory=True, num_workers=0)

model = x_modules.Siren(**opt["siren"])
model.cuda()
opt.num_epochs = 10000 # fewer epochs, 100k would be 57 hours
opt.num_steps = 10000

train(model, dataloader, opt.num_epochs, opt.lr, opt.epochs_til_checkpoint, folder, dataset=dset,
                   num_steps=opt.num_steps, steps_til_summary=200, terminator={"end":"\n"})



Siren: VideoDataset - INFO - Loaded data, sidelen: [300, 512, 512], channels 3
Siren: VideoDataset - INFO -          => reshaped to: (78643200, 3)
Siren: VideoDataset - INFO -  max sample_size, 321482, fraction, 0.0041
Siren: VideoDataset - INFO -  strategy: -1, single sample per epoch


The model directory /content/gdrive/MyDrive/siren/cat_s-1_100k exists. Overwrite? (y/n)n
/content/gdrive/MyDrive/siren/cat_s-1_100k/train.csv found with len 5000
{'Epoch': 4999.0, 'Iter': 0.0, 'IterAll': 4999.0, 'tGPU': 13878.0, 'GPU': 14962.0, 'CPU': 4092.0, 'Loss': 0.00417, 'Time': 2.01, 'Total_Time': 10047.0}
5198	0	199	13898	14982	5007	0.0029	52.25	10449
5398	0	399	13898	14982	5007	0.00283	27.13	10853
5598	0	599	13898	14982	5009	0.00275	18.76	11257
5798	0	799	13898	14982	5007	0.00268	14.58	11662
5998	0	999	13898	14982	5008	0.00263	12.07	12066
6198	0	1199	13898	14982	5009	0.00262	10.39	12470
6398	0	1399	13898	14982	5009	0.00262	9.2	12875
6598	0	1599	13898	14982	5009	0.00258	8.3	13280
6798	0	1799	13898	14982	5010	0.00252	7.6	13684
6998	0	1999	13898	14982	5010	0.00249	7.04	14088
7198	0	2199	13898	14982	5010	0.00248	6.59	14492
7398	0	2399	13898	14982	5010	0.00245	6.21	14897
7598	0	2599	13898	14982	5011	0.00245	5.89	15301
7798	0	2799	13898	14982	5011	0.00242	5.61	15704
7998	0	2999	13898

In [None]:
del model
del dset
del dataloader

# render video after 13k iterations - colab booted us out!

videos rendered offline, comparison in github README.md

In [None]:
# if we hadn't been booted out, this would be inherited from the training
sidelen = [300,512,512]

In [None]:
checkpoint = "/content/gdrive/MyDrive/siren/cat_s-1_100k/model_epoch_13000.pth"
print(osp.isfile(checkpoint), osp.isdir(logging_root))
folder = osp.split(checkpoint)[0]
rendername = osp.join(folder, "cat_recon_{:04}.mp4".format(13000))

with torch.no_grad():
    # model.eval()
    chunksize = int(x_utils.estimate_frames(sidelen[-2:], grad=0)//0.5)
    render = x_utils.EasyDict(model=checkpoint, sidelen=sidelen,
                              chunksize=chunksize)
    render.name = rendername
    render.fps = 25
    S = SirenRender(**render)
    S.render_video()


In [None]:
render = "/content/gdrive/MyDrive/siren/cat_s-1_100k/cat_recon_13000.mp4"
mp4 = open(render,"rb").read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=512 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

# continue after 13k iterations

In [22]:
checkpoint = "/content/gdrive/MyDrive/siren/cat_s-1_100k/model_epoch_13000.pth"
osp.isfile(checkpoint)


True

In [25]:
# lower learning rate to 1e-5
# train for another 5k epochs
opt.lr = 1e-5

{'batch_size': 1,
 'checkpoint_path': None,
 'data_path': './data/cat_video.mp4',
 'epochs_til_checkpoint': 1000,
 'experiment_name': 'cat_s-1_100k',
 'frame_range': None,
 'logging_root': '/content/gdrive/MyDrive/siren',
 'lr': 1e-05,
 'model_type': 'sine',
 'num_epochs': 5000,
 'num_steps': 5000,
 'sample_frac': None,
 'sample_size': 321482,
 'shuffle': 1,
 'siren': {'first_omega_0': 30,
  'hidden_features': 1024,
  'hidden_layers': 3,
  'hidden_omega_0': 30.0,
  'in_features': 3,
  'out_features': 3,
  'outermost_linear': True},
 'steps_til_summary': 1,
 'strategy': -1,
 'train_type': 'video',
 'verbose': 1}

In [27]:
folder = osp.join(opt.logging_root, opt.experiment_name)

dset = x_dataio.VideoDataset(opt.data_path, sample_size=opt.sample_size,
                             frame_range=opt.frame_range, strategy=opt.strategy)

#torch.set_rng_state(rng_state)
dataloader = torch.utils.data.DataLoader(dset, shuffle=opt.shuffle,
                                         batch_size=1, pin_memory=True,
                                         num_workers=0)

model = x_modules.Siren(**opt["siren"])
model.cuda()
train(model, dataloader, opt.num_epochs, opt.lr, opt.epochs_til_checkpoint,
      folder, dataset=dset, num_steps=opt.num_steps, steps_til_summary=200,
      terminator={"end":"\n"})



Siren: VideoDataset - INFO - Loaded data, sidelen: [300, 512, 512], channels 3
Siren: VideoDataset - INFO -          => reshaped to: (78643200, 3)
Siren: VideoDataset - INFO -  max sample_size, 321482, fraction, 0.0041
Siren: VideoDataset - INFO -  strategy: -1, single sample per epoch


The model directory /content/gdrive/MyDrive/siren/cat_s-1_100k exists. Overwrite? (y/n)n
/content/gdrive/MyDrive/siren/cat_s-1_100k/train.csv found with len 13079
{'Epoch': 13077.0, 'Iter': 0.0, 'IterAll': 8078.0, 'tGPU': 13898.0, 'GPU': 14982.0, 'CPU': 4912.0, 'Loss': 0.00215, 'Time': 3.27, 'Total_Time': 26383.0}


To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)


13276	0	199	13878	14962	5412	0.00259	133.97	26793
13476	0	399	13878	14962	5410	0.00252	68.01	27204
13676	0	599	13878	14962	5410	0.00244	46.03	27616
13876	0	799	13878	14962	5411	0.00239	35.04	28028
14076	0	999	13878	14962	5413	0.00233	28.44	28440
14276	0	1199	13878	14962	5415	0.00229	24.04	28852
14476	0	1399	13878	14962	5415	0.00224	20.9	29262
14676	0	1599	13878	14962	5416	0.00222	18.55	29672
14876	0	1799	13878	14962	5415	0.00219	16.71	30085
15076	0	1999	13878	14962	5416	0.00216	15.25	30500
15276	0	2199	13878	14962	5417	0.00216	14.05	30911
15476	0	2399	13878	14962	5418	0.00209	13.05	31324
15676	0	2599	13878	14962	5418	0.00209	12.21	31737
15876	0	2799	13878	14962	5418	0.00208	11.48	32149
16076	0	2999	13878	14962	5418	0.00206	10.85	32561
16276	0	3199	13878	14962	5419	0.00204	10.3	32973
16476	0	3399	13878	14962	5418	0.00201	9.82	33385
16676	0	3599	13878	14962	5418	0.002	9.39	33797
16876	0	3799	13878	14962	5419	0.00199	9.0	34209
17076	0	3999	13878	14962	5418	0.00197	8.66	34622
17276	0	4199	

'/content/gdrive/MyDrive/siren/cat_s-1_100k/model_final.pth'