Supported architectures

Below are the network architectures and, in general, the training strategies supported in this repository. Note that ideas from all of the options below can be combined, from the training strategy down to modifying a network with components taken from another. They can also serve as baselines for your own experiments.

Super-Resolution

  1. SRGAN (2017). Originally uses the SRResNet network and introduced the idea of using a generative adversarial network (GAN) for image super-resolution (SR). It was the first framework capable of inferring photo-realistic natural images at 4× upscaling, with a loss function that combines an adversarial (GAN) loss, a feature loss (computed with a pretrained VGG classification network) and a content (pixel) loss.
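
As a rough illustration of that combined objective, here is a minimal PyTorch sketch of the three loss terms; the VGG layer cut, the loss weights and the discriminator interface are illustrative assumptions, not this repository's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class VGGFeatures(nn.Module):
    """Frozen VGG19 feature extractor (cut here at conv5_4; the cut point is illustrative)."""
    def __init__(self, cut=35):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        self.features = nn.Sequential(*list(vgg.features)[:cut]).eval()
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, x):
        return self.features(x)

def srgan_g_loss(sr, hr, d_fake_logits, vgg, w_pix=1e-2, w_feat=1.0, w_gan=5e-3):
    """Weighted sum of the three SRGAN loss terms (weights are placeholders)."""
    pixel = F.l1_loss(sr, hr)                          # content (pixel) loss
    feature = F.l1_loss(vgg(sr), vgg(hr))              # feature loss on VGG activations
    adversarial = F.binary_cross_entropy_with_logits(  # adversarial loss: fool D
        d_fake_logits, torch.ones_like(d_fake_logits))
    return w_pix * pixel + w_feat * feature + w_gan * adversarial
```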

  2. Enhanced SRGAN (ESRGAN, 2018). Achieves consistently better visual quality, with more realistic and natural textures, than SRGAN, and won first place in the PIRM2018-SR Challenge. It originally uses the RRDB network. ESRGAN remains to this day (2021) the base for many projects and research papers that continue building upon it. For more details, please refer to the ESRGAN repo.

  3. ESRGAN+ (2020). A follow-up paper that introduced two main changes to ESRGAN's RRDB network; they can be enabled with the network options plus and gaussian. For more details, please refer to the ESRGAN+ repo.
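
A hypothetical options fragment for orientation; only the plus and gaussian switches come from the description above, and the surrounding keys are placeholders (check the repository's option files for the exact schema).

```python
# Hypothetical options fragment; only `plus` and `gaussian` are the
# documented switches, the other keys are assumed placeholders.
network_G = {
    "which_model_G": "RRDB_net",  # assumed name for the ESRGAN generator
    "plus": True,                 # enable the ESRGAN+ network modifications
    "gaussian": True,             # enable the ESRGAN+ gaussian noise injection
}
```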

  4. SFTGAN (2018). Adopts Spatial Feature Transform (SFT) to incorporate other conditions/priors, such as a semantic prior for image SR represented by segmentation probability maps. For more details, please refer to the SFTGAN repo.
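
A minimal sketch of what an SFT layer does, assuming per-pixel scale and shift maps predicted from the condition; channel sizes and layer depths are illustrative.

```python
import torch.nn as nn

class SFTLayer(nn.Module):
    """Spatial Feature Transform: the condition (e.g. segmentation probability
    maps) predicts a per-pixel scale and shift that modulate the SR features."""
    def __init__(self, feat_ch=64, cond_ch=32):
        super().__init__()
        self.scale = nn.Sequential(
            nn.Conv2d(cond_ch, feat_ch, 1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 1))
        self.shift = nn.Sequential(
            nn.Conv2d(cond_ch, feat_ch, 1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 1))

    def forward(self, features, condition):
        # features: (N, feat_ch, H, W), condition: (N, cond_ch, H, W)
        return features * (self.scale(condition) + 1) + self.shift(condition)
```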

  5. PPON (2019). The model and training strategy from "Progressive Perception-Oriented Network for Single Image Super-Resolution", which the authors compare favorably against ESRGAN. Training is done progressively, by freezing and unfreezing layers in phases: Content Reconstruction, Structure Reconstruction and Perceptual Reconstruction. For more details, please refer to the PPON repo.
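
The phase mechanics can be sketched as selectively freezing parameters; the branch names below are hypothetical, not PPON's actual module names.

```python
import torch.nn as nn

def set_phase(model: nn.Module, phase: str):
    """Freeze everything except the branch trained in the current phase."""
    phases = {
        "content":    ["content_branch"],     # Content Reconstruction
        "structure":  ["structure_branch"],   # Structure Reconstruction
        "perceptual": ["perceptual_branch"],  # Perceptual Reconstruction
    }
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in phases[phase])
```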

  6. PAN (2020). Pixel Attention Network for Efficient Image Super-Resolution. Aims at designing a lightweight image super-resolution (SR) network that could potentially be used in real time. More details in the PAN repo.
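
The pixel attention block itself is tiny, which is part of what keeps the network lightweight; a minimal sketch (channel count illustrative):

```python
import torch.nn as nn

class PixelAttention(nn.Module):
    """A 1x1 convolution produces a per-pixel, per-channel attention map
    that reweights the features elementwise."""
    def __init__(self, channels=40):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return x * self.sigmoid(self.conv(x))
```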

  7. CEM (2020). The Consistency Enforcing Module (CEM) from Explorable Super-Resolution. Can be used to wrap any network (during training or testing) in a module that has no trainable parameters but enforces the results to be consistent with the LR images, instead of only the HR images as is the common case. More information on CEM here. Note that the rest of the explorable SR framework is TBD, but it is available in the ESR repo.
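
The actual CEM performs an analytical projection; purely to illustrate the idea of a parameter-free correction toward LR consistency, here is a simplified sketch (not CEM's math):

```python
import torch.nn.functional as F

def enforce_lr_consistency(sr, lr, scale=4):
    """Push the residual between `lr` and the downscaled `sr` back into `sr`,
    so that downscaling the result better reproduces the LR input."""
    sr_down = F.interpolate(sr, scale_factor=1 / scale, mode="bicubic",
                            align_corners=False)
    residual = lr - sr_down
    return sr + F.interpolate(residual, scale_factor=scale, mode="bicubic",
                              align_corners=False)
```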

  8. SRFlow (2020). Repo. Aims at fixing a common pitfall of other frameworks: their results are deterministic. SRFlow uses a normalizing flow (based on GLOW) that lets the network learn the conditional distribution of the output given the low-resolution input. It doesn't require the GAN formulation and can be trained using only the negative log-likelihood (NLL). In this repo it has also been modified to use any of the regular losses on the deterministic version of the super-resolved image. Check how to train for more details.
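
A sketch of the NLL objective for such a conditional flow, assuming a hypothetical `flow` callable that maps the HR image to a latent and returns the log-determinant of the Jacobian:

```python
import math
import torch

def flow_nll(flow, hr, lr):
    z, logdet = flow(hr, condition=lr)  # assumed interface, not SRFlow's actual API
    # Standard-normal prior on the latent z
    log_pz = -0.5 * (z ** 2 + math.log(2 * math.pi)).flatten(1).sum(dim=1)
    nll = -(log_pz + logdet)            # negative log-likelihood per sample
    return nll.mean()
```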

In addition, Real-SR (2020), BSRGAN (2021) and Real-ESRGAN (2021) are supported: they are based on ESRGAN and modify neither the general training strategy nor the network architecture, only the data used for training. Real-SR does so by means of realistic kernels and noise injected from real image patches, while BSRGAN and Real-ESRGAN rely on the on-the-fly augmentations pipeline. More information in the augmentations document. These strategies can be combined with any of the networks above.
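
A toy version of such an on-the-fly degradation pipeline (blur, downscale, noise, JPEG compression); all parameter ranges are illustrative, not the repository's augmentation settings.

```python
import io
import random
import numpy as np
from PIL import Image, ImageFilter

def degrade(img: Image.Image, scale: int = 4) -> Image.Image:
    """Produce a synthetic LR image from an HR one with random degradations."""
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.2, 3.0)))
    img = img.resize((img.width // scale, img.height // scale), Image.BICUBIC)
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0, random.uniform(1, 15), arr.shape)  # gaussian noise
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    buf = io.BytesIO()                                            # JPEG artifacts
    img.save(buf, format="JPEG", quality=random.randint(30, 95))
    return Image.open(io.BytesIO(buf.getvalue()))
```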

Image-to-image translation

  1. pix2pix (2017). Image-to-Image Translation with Conditional Adversarial Networks. Uses the conditional GAN formulation as a general-purpose solution to image-to-image translation problems when paired images are available, in a way that doesn't require hand-engineered mapping functions or losses. More information in how to train, the pix2pix PyTorch repo and the project page.
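
A minimal sketch of the conditional GAN generator objective, with the discriminator conditioned on the input by channel concatenation; the lambda_l1=100 weighting follows the paper, while the network interfaces are assumptions.

```python
import torch
import torch.nn.functional as F

def pix2pix_g_loss(generator, discriminator, x, y, lambda_l1=100.0):
    """x: input image, y: paired target. D sees the (input, output) pair."""
    fake = generator(x)
    d_fake = discriminator(torch.cat([x, fake], dim=1))  # condition D on x
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    return adv + lambda_l1 * F.l1_loss(fake, y)
```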

  2. CycleGAN (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Unlike previous approaches, CycleGAN was one of the first works to learn to translate an image from a source domain A to a target domain B in the absence of paired examples. More information in how to train, the CycleGAN PyTorch repo and the project page.
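
A sketch of the cycle-consistency term that substitutes for paired supervision; `g_ab` and `g_ba` are hypothetical generator callables for the two translation directions.

```python
import torch.nn.functional as F

def cycle_loss(g_ab, g_ba, real_a, real_b, lambda_cyc=10.0):
    """Translating A -> B -> A (and B -> A -> B) should recover the input."""
    rec_a = g_ba(g_ab(real_a))
    rec_b = g_ab(g_ba(real_b))
    return lambda_cyc * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))
```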

  3. WBC (2020). Learning to Cartoonize Using White-box Cartoon Representations. Unlike the black-box strategies that pix2pix and CycleGAN use, white-box cartoonization (WBC) is designed around domain knowledge of how cartoons (anime) are made: it decomposes the training task into image representations that correspond to the cartoon-drawing workflow, each with its own objective. In general, the representations are: smooth surfaces (surface), sparse color blocks (structure), and contours and fine textures (texture). Like CycleGAN, it uses unpaired images, and different results can be obtained by tuning the scale of each representation, as well as the scale of the guided filter (a sketch of the filter follows this list). More information in how to train. You can build your own datasets, but for reference these are the ones used by WBC:
    • landscape photos: the photos for the style transfer CycleGAN dataset (6227).
    • landscape cartoon: frames extracted and cropped from Miyazaki Hayao (3617), Hosoda Mamoru (5107) and Shinkai Makoto (5891) films.
    • face photos: FFHQ photos (#00000-10000).
    • face cartoon: faces extracted from works by PA Works (5000) and Kyoto Animation (5000).
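
Since the guided filter plays a central role in WBC (it builds the surface representation and controls the output's level of detail), here is a minimal per-channel PyTorch sketch of it; the radius and regularization values are illustrative.

```python
import torch
import torch.nn.functional as F

def box_filter(x, r):
    """Mean filter with window (2r+1) x (2r+1), implemented via avg_pool2d."""
    return F.avg_pool2d(x, 2 * r + 1, stride=1, padding=r,
                        count_include_pad=False)

def guided_filter(guide, src, r=5, eps=1e-2):
    """Edge-preserving smoothing of `src` guided by `guide` (same shape)."""
    mean_i = box_filter(guide, r)
    mean_p = box_filter(src, r)
    cov_ip = box_filter(guide * src, r) - mean_i * mean_p
    var_i = box_filter(guide * guide, r) - mean_i * mean_i
    a = cov_ip / (var_i + eps)   # larger eps -> smoother, more "cartoon" output
    b = mean_p - a * mean_i
    return box_filter(a, r) * guide + box_filter(b, r)
```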

Video

Important: video network training can be considered functional, but experimental; an overhaul of the pipeline is pending (help welcome).

Video Super-Resolution (VSR)

  1. SOFVSR (2020). Deep Video Super-Resolution using HR Optical Flow Estimation. Instead of the usual strategy of estimating optical flow for temporal consistency in the low-resolution domain, SOFVSR does so at the high-resolution level, to prevent inconsistencies between low-resolution flows and high-resolution frames. In this repo the network has been modified to also work with an ESRGAN network in the super-resolution step, as well as to use 3-channel images as input, but it requires more testing. More information in the SOFVSR repo.
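
The basic operation behind flow-based temporal alignment is warping a neighboring frame with the estimated flow; a minimal sketch (uses `torch.meshgrid` with the `indexing` argument, available in PyTorch >= 1.10):

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """frame: (N, C, H, W); flow: (N, 2, H, W) in pixels, (dx, dy) order."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(frame.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow
    # Normalize sampling coordinates to [-1, 1] for grid_sample
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid_n = torch.stack((coords_x, coords_y), dim=-1)  # (N, H, W, 2)
    return F.grid_sample(frame, grid_n, align_corners=True)
```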

  2. EVSRGAN and SR3D. Video ESRGAN and SR3D networks, inspired by the paper "3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks". EVSRGAN uses the regular ESRGAN network as a backbone but modifies it with 3D convolutions to account for the time dimension, while SR3D more closely resembles the network proposed in 3DSRnet. Both require more testing.
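
The core idea of trading 2D for 3D convolutions, so that a block also mixes information across time, can be sketched as follows (channel and clip sizes illustrative):

```python
import torch
import torch.nn as nn

# Features shaped (N, C, T, H, W): padding=1 preserves the time dimension too.
block3d = nn.Sequential(
    nn.Conv3d(64, 64, kernel_size=3, padding=1),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv3d(64, 64, kernel_size=3, padding=1),
)
x = torch.randn(1, 64, 5, 32, 32)  # a 5-frame clip of feature maps
y = block3d(x)                     # same shape: (1, 64, 5, 32, 32)
```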

  3. EDVR (2019). Video Restoration with Enhanced Deformable Convolutional Networks. Uses deformable convolutions to align frames at the feature level, instead of explicitly estimating optical flow. More information on the project page.
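
A sketch of deformable-convolution feature alignment using torchvision's `DeformConv2d`; the offset-prediction scheme and channel sizes are simplified compared to EDVR's actual pyramid, cascading and deformable (PCD) module.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AlignBlock(nn.Module):
    """Align a neighboring frame's features to the reference frame's features."""
    def __init__(self, ch=64, k=3):
        super().__init__()
        # 2 offsets (x, y) per kernel sampling location
        self.offset_conv = nn.Conv2d(ch * 2, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(ch, ch, k, padding=k // 2)

    def forward(self, neighbor_feat, ref_feat):
        offsets = self.offset_conv(torch.cat([neighbor_feat, ref_feat], dim=1))
        return self.deform(neighbor_feat, offsets)
```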

Frame Interpolation (FI)

  1. DVD (2017). Real-time Deep Video Deinterlacing, implemented for the specific case of efficient video deinterlacing.

  2. RIFE (2020). Initial integration, combining all three separate model files into a single structure. RIFE repo. (Training is not yet available, pending the video pipeline overhaul.)

BibTeX

@misc{traiNNer,
    author = {victorca25},
    title = {traiNNer},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/victorca25/traiNNer}}
}
@InProceedings{wang2018esrgan,
    author = {Wang, Xintao and Yu, Ke and Wu, Shixiang and Gu, Jinjin and Liu, Yihao and Dong, Chao and Qiao, Yu and Loy, Chen Change},
    title = {ESRGAN: Enhanced super-resolution generative adversarial networks},
    booktitle = {The European Conference on Computer Vision Workshops (ECCVW)},
    month = {September},
    year = {2018}
}
@InProceedings{wang2018sftgan,
    author = {Wang, Xintao and Yu, Ke and Dong, Chao and Loy, Chen Change},
    title = {Recovering realistic texture in image super-resolution by deep spatial feature transform},
    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2018}
}
@article{Hui-PPON-2019,
    title = {Progressive Perception-Oriented Network for Single Image Super-Resolution},
    author = {Hui, Zheng and Li, Jie and Gao, Xinbo and Wang, Xiumei},
    journal = {arXiv preprint arXiv:1907.10399},
    year = {2019}
}
@InProceedings{Liu2019abpn,
    author = {Liu, Zhi-Song and Wang, Li-Wen and Li, Chu-Tak and Siu, Wan-Chi},
    title = {Image Super-Resolution via Attention based Back Projection Networks},
    booktitle = {IEEE International Conference on Computer Vision Workshop (ICCVW)},
    month = {October},
    year = {2019}
}
@inproceedings{bahat2020explorable,
    title={Explorable Super Resolution},
    author={Bahat, Yuval and Michaeli, Tomer},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    pages={2716--2725},
    year={2020}
}
@inproceedings{lugmayr2020srflow,
    title={SRFlow: Learning the Super-Resolution Space with Normalizing Flow},
    author={Lugmayr, Andreas and Danelljan, Martin and Van Gool, Luc and Timofte, Radu},
    booktitle={ECCV},
    year={2020}
}
@article{zhang2021designing,
    title = {Designing a Practical Degradation Model for Deep Blind Image Super-Resolution},
    author = {Zhang, Kai and Liang, Jingyun and Van Gool, Luc and Timofte, Radu},
    journal = {arXiv preprint arXiv:2103.14006},
    year = {2021}
}
@Article{wang2021realesrgan,
    title={Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data},
    author={Xintao Wang and Liangbin Xie and Chao Dong and Ying Shan},
    journal = {arXiv preprint arXiv:2107.10833},
    year={2021}
}
@inproceedings{CycleGAN2017,
    title={Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
    author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
    booktitle={Computer Vision (ICCV), 2017 IEEE International Conference on},
    year={2017}
}
@inproceedings{isola2017image,
    title={Image-to-Image Translation with Conditional Adversarial Networks},
    author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
    booktitle={Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on},
    year={2017}
}
@InProceedings{Wang_2020_CVPR,
    author = {Wang, Xinrui and Yu, Jinze},
    title = {Learning to Cartoonize Using White-Box Cartoon Representations},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2020}
}
@Article{Wang2020tip,
    author    = {Longguang Wang and Yulan Guo and Li Liu and Zaiping Lin and Xinpu Deng and Wei An},
    title     = {Deep Video Super-Resolution using {HR} Optical Flow Estimation},
    journal   = {{IEEE} Transactions on Image Processing},
    year      = {2020},
}
@InProceedings{wang2019edvr,
    author = {Wang, Xintao and Chan, Kelvin C.K. and Yu, Ke and Dong, Chao and Loy, Chen Change},
    title = {EDVR: Video Restoration with Enhanced Deformable Convolutional Networks},
    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month = {June},
    year = {2019}
}
@article{zhu2017real,
    title={Real-time Deep Video Deinterlacing},
    author={Zhu, Haichao and Liu, Xueting and Mao, Xiangyu and Wong, Tien-Tsin},
    journal={arXiv preprint arXiv:1708.00187},
    year={2017}
}
@article{huang2020rife,
    title={RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation},
    author={Huang, Zhewei and Zhang, Tianyuan and Heng, Wen and Shi, Boxin and Zhou, Shuchang},
    journal={arXiv preprint arXiv:2011.06294},
    year={2020}
}