lossy image compression with compressive autoencoders

arch

These models are inspired by [1].

The input consists of raw 720p frames from the YouTube-8M dataset (credit goes to gsssrao for the downloader and frame-generator scripts). The dataset contains 121,827 frames. Each 1280x720 frame is padded to 1280x768 (i.e. the height is padded by 24 pixels on each side) so that it can be split into 60 patches of 128x128 (a grid of 6 rows by 10 columns). The model sees only a single patch at a time, and the loss is computed per patch as MSELoss(orig_ij, out_ij), so there are 60 optimization steps per image.
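The padding and patch split described above can be sketched as follows. This is a minimal illustration in NumPy, not the repo's actual data loader: the helper names `pad_and_split` and `patch_mse` are hypothetical, and zero padding of the frame is an assumption (the text only gives the pad amounts).

```python
import numpy as np

PATCH = 128  # patch side length, as stated in the text


def pad_and_split(frame, patch=PATCH):
    """Pad an (H, W, C) frame to multiples of `patch` and cut it into patches.

    Padding is split evenly between the two sides of each axis.
    Zero padding is an assumption made for this sketch.
    Returns an array of shape (rows, cols, patch, patch, C).
    """
    h, w, c = frame.shape
    pad_h = (-h) % patch  # 48 for a 720-pixel-high frame
    pad_w = (-w) % patch  # 0 for a 1280-pixel-wide frame
    top, left = pad_h // 2, pad_w // 2
    padded = np.pad(frame, ((top, pad_h - top), (left, pad_w - left), (0, 0)))
    rows, cols = padded.shape[0] // patch, padded.shape[1] // patch
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    return padded.reshape(rows, patch, cols, patch, c).swapaxes(1, 2)


def patch_mse(orig, out):
    """Mean squared error between one original patch and its reconstruction."""
    diff = orig.astype(np.float64) - out.astype(np.float64)
    return float(np.mean(diff ** 2))


frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a 720p frame
patches = pad_and_split(frame)
print(patches.shape)  # (6, 10, 128, 128, 3), i.e. 60 patches
```

In training, each of the 60 patches would be passed through the model separately and `patch_mse` (or `MSELoss` in the actual code) evaluated once per patch.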

Until the code is better documented, here is a short description of each model:

conv_32x32x32_bin - latent size is 32x32x32 bits/patch (i.e. compressed size: 240KB per image)
conv_bin - latent size is 16x8x8 bits/patch (i.e. compressed size: 7.5KB per image)
conv_refl_pad_bin - same as conv_bin, except that reflection padding is used instead of zero padding
conv_512_bin - latent size is 16x16x16 bits/patch (i.e. compressed size: 30KB per image)
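The compressed sizes listed above follow from the latent size in bits per patch multiplied by the 60 patches per image. The arithmetic can be checked with a short sketch; the helper `compressed_kb` and the `LATENT_SHAPES` table are illustrative names, not part of the repo, and 1KB is taken as 1024 bytes.

```python
PATCHES_PER_IMAGE = 60  # 6 x 10 grid of 128x128 patches

# Latent shapes in bits per patch, taken from the model list above.
LATENT_SHAPES = {
    "conv_32x32x32_bin": (32, 32, 32),
    "conv_bin": (16, 8, 8),
    "conv_refl_pad_bin": (16, 8, 8),  # same latent size as conv_bin
    "conv_512_bin": (16, 16, 16),
}


def compressed_kb(latent_shape, patches=PATCHES_PER_IMAGE):
    """Compressed size of one image in KB (1KB = 1024 bytes)."""
    bits_per_patch = 1
    for dim in latent_shape:
        bits_per_patch *= dim
    return bits_per_patch * patches / 8 / 1024


for name, shape in LATENT_SHAPES.items():
    print(f"{name}: {compressed_kb(shape):g} KB")
# conv_32x32x32_bin: 240 KB
# conv_bin: 7.5 KB
# conv_refl_pad_bin: 7.5 KB
# conv_512_bin: 30 KB
```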

[1] https://arxiv.org/abs/1703.00395

The documentation and further work will be written in the repo's wiki.
