Reading-List

Reading list on deep learning.


Basic Network and Techniques

  • AlexNet: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012. ⭐⭐⭐⭐⭐
  • Dropout: Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958. ⭐⭐⭐⭐
  • VGG: Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014). ⭐⭐⭐⭐⭐
  • GoogLeNet: Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. ⭐⭐⭐⭐⭐
  • Batch Normalization: Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015). [Inception v2] ⭐⭐⭐⭐⭐
  • PReLU & msra Initilization: He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE international conference on computer vision. 2015. ⭐⭐⭐⭐⭐
  • InceptionV3: Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
  • ResNet: He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐⭐
  • Identity ResNet: He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer International Publishing, 2016. ⭐⭐⭐⭐⭐
  • CReLU: Shang, Wenling, et al. "Understanding and improving convolutional neural networks via concatenated rectified linear units." Proceedings of the International Conference on Machine Learning (ICML). 2016. ⭐⭐⭐
  • InceptionV4 & Inception-ResNet: Szegedy, Christian, et al. "Inception-v4, inception-resnet and the impact of residual connections on learning." arXiv preprint arXiv:1602.07261 (2016). ⭐⭐⭐⭐
  • ResNeXt: Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." arXiv preprint arXiv:1611.05431 (2016). ⭐⭐⭐⭐
  • Batch Renormalization: Ioffe, Sergey. "Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models." arXiv preprint arXiv:1702.03275 (2017). ⭐⭐⭐⭐
  • Xception: Chollet, François. "Xception: Deep Learning with Depthwise Separable Convolutions." arXiv preprint arXiv:1610.02357 (2016). ⭐⭐⭐
  • MobileNets: Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017). ⭐⭐⭐
  • DenseNet: Huang, Gao, et al. "Densely connected convolutional networks." arXiv preprint arXiv:1608.06993 (2016). ⭐⭐⭐⭐⭐
  • PolyNet: Zhang, Xingcheng, et al. "Polynet: A pursuit of structural diversity in very deep networks." arXiv preprint arXiv:1611.05725 (2016). Slides ⭐⭐⭐⭐
  • IRNN: Le, Quoc V., Navdeep Jaitly, and Geoffrey E. Hinton. "A simple way to initialize recurrent networks of rectified linear units." arXiv preprint arXiv:1504.00941 (2015). ⭐⭐⭐
  • ReNet: Visin, Francesco, et al. "ReNet: A recurrent neural network based alternative to convolutional networks." arXiv preprint arXiv:1505.00393 (2015). ⭐⭐⭐⭐
  • Non-local Neural Network: Wang, Xiaolong, Ross Girshick, Abhinav Gupta, and Kaiming He. "Non-local Neural Networks." arXiv preprint arXiv:1711.07971 (2017). ⭐⭐⭐⭐
  • Group Normalization: Wu, Yuxin, and Kaiming He. "Group normalization." In ECCV (2018). ⭐⭐⭐⭐⭐
  • SENet: Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." In CVPR (2018). ⭐⭐⭐⭐⭐
  • Rethinking ImageNet Pre-training: He, Kaiming, Ross Girshick, and Piotr Dollár. "Rethinking ImageNet Pre-training." arXiv preprint arXiv:1811.08883 (2018). ⭐⭐⭐⭐
  • CBAM: Woo, Sanghyun, et al. "CBAM: Convolutional block attention module." Proceedings of the European Conference on Computer Vision (ECCV). 2018. ⭐⭐⭐⭐
  • Network generator: Xie, Saining, Alexander Kirillov, Ross Girshick, and Kaiming He. "Exploring Randomly Wired Neural Networks for Image Recognition." arXiv preprint arXiv:1904.01569 (2019). ⭐⭐⭐⭐⭐
  • GCNet: Cao, Yue, et al. "GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond." arXiv preprint arXiv:1904.11492 (2019). ⭐⭐⭐⭐
  • SqueezeNet: Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size." In ICLR, 2017. ⭐⭐⭐⭐
  • Dynamic Filter: Jia, Xu, et al. "Dynamic filter networks." Advances in neural information processing systems. 2016. ⭐⭐⭐⭐
  • CondConv: Yang, Brandon, et al. "Condconv: Conditionally parameterized convolutions for efficient inference." Advances in Neural Information Processing Systems. 2019. ⭐⭐⭐⭐
  • SimSiam: Chen, X., & He, K. (2020). Exploring Simple Siamese Representation Learning. In CVPR 2021. ⭐⭐⭐⭐
  • CycleMLP: Chen, S., Xie, E., Ge, C., Liang, D., & Luo, P. (2021). CycleMLP: A MLP-like Architecture for Dense Prediction. arXiv preprint arXiv:2107.10224. ⭐⭐⭐⭐
  • EfficientNet: Tan, Mingxing, and Quoc Le. "EfficientNet: Rethinking model scaling for convolutional neural networks." International Conference on Machine Learning. PMLR, 2019. ⭐⭐⭐⭐
  • ConvNeXt: Liu, Z., Mao, H., Wu, C. Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A ConvNet for the 2020s. arXiv preprint arXiv:2201.03545 ⭐⭐⭐⭐⭐
  • CoAtNet: Dai, Z., Liu, H., Le, Q., & Tan, M. (2021). CoAtNet: Marrying convolution and attention for all data sizes. Advances in Neural Information Processing Systems, 34. ⭐⭐⭐⭐⭐
  • Large_Kernel: Ding, X., Zhang, X., Zhou, Y., Han, J., Ding, G., & Sun, J. (2022). Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs. arXiv preprint arXiv:2203.06717. ⭐⭐⭐⭐⭐
  • MPViT: Lee, Y., Kim, J., Willette, J., & Hwang, S. J. MPViT: Multi-Path Vision Transformer for Dense Prediction. In CVPR 2022. ⭐⭐⭐
  • Deformable Attention: Xia, Z., Pan, X., Song, S., Li, L. E., & Huang, G. (2022). Vision Transformer with Deformable Attention. arXiv preprint arXiv:2201.00520. ⭐⭐⭐⭐
  • HaloNets: Vaswani, Ashish, et al. "Scaling local self-attention for parameter efficient visual backbones." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. ⭐⭐⭐⭐
  • SLaK: Liu, S., Chen, T., Chen, X., et al. "More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity." arXiv preprint arXiv:2207.03620 (2022). ⭐⭐⭐⭐
  • MetaFormer: Yu, Weihao, et al. "Metaformer is actually what you need for vision." In CVPR. 2022. ⭐⭐⭐
  • Resnet strikes back: Wightman, R., Touvron, H., & Jégou, H. (2021). Resnet strikes back: An improved training procedure in timm. arXiv preprint arXiv:2110.00476. ⭐⭐⭐
  • VSA: Zhang, Qiming, et al. "VSA: Learning Varied-Size Window Attention in Vision Transformers." arXiv preprint arXiv:2204.08446 (2022). ⭐⭐⭐⭐
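
Several of the entries above (ResNet, Identity ResNet, Batch Normalization) revolve around the residual block. As a quick reading aid, here is a minimal, illustrative PyTorch sketch of a basic residual block with an identity shortcut; it is a simplification, not code from any of the papers:

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Minimal residual block: two 3x3 convs with BatchNorm and an identity skip."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut: the block learns a residual F(x), output is F(x) + x

# Example: y = BasicResidualBlock(64)(torch.randn(1, 64, 56, 56))
```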

Object Detection

  • Overfeat: Sermanet, Pierre, et al. "Overfeat: Integrated recognition, localization and detection using convolutional networks." arXiv preprint arXiv:1312.6229 (2013). ⭐⭐⭐⭐
  • RCNN: Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014. ⭐⭐⭐⭐⭐
  • SPP: He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." European Conference on Computer Vision. Springer International Publishing, 2014. ⭐⭐⭐⭐⭐
  • Fast RCNN: Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE International Conference on Computer Vision. 2015. ⭐⭐⭐⭐⭐
  • Faster RCNN: Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015. ⭐⭐⭐⭐⭐
  • R-CNN minus R: Lenc, Karel, and Andrea Vedaldi. "R-cnn minus r." arXiv preprint arXiv:1506.06981 (2015). ⭐
  • End-to-end people detection in crowded scenes: Stewart, Russell, Mykhaylo Andriluka, and Andrew Y. Ng. "End-to-end people detection in crowded scenes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐
  • YOLO: Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐⭐
  • ION: Bell, Sean, et al. "Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
  • MultiPath: Zagoruyko, Sergey, et al. "A multipath network for object detection." arXiv preprint arXiv:1604.02135 (2016). ⭐⭐⭐
  • SSD: Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer International Publishing, 2016. ⭐⭐⭐⭐⭐
  • OHEM: Shrivastava, Abhinav, Abhinav Gupta, and Ross Girshick. "Training region-based object detectors with online hard example mining." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐⭐
  • HyperNet: Kong, Tao, et al. "HyperNet: towards accurate region proposal generation and joint object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
  • SDP: Yang, Fan, Wongun Choi, and Yuanqing Lin. "Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
  • SubCNN: Xiang, Yu, et al. "Subcategory-aware convolutional neural networks for object proposals and detection." Applications of Computer Vision (WACV), 2017 IEEE Winter Conference on. IEEE, 2017. ⭐⭐⭐
  • MSCNN: Cai, Zhaowei, et al. "A unified multi-scale deep convolutional neural network for fast object detection." European Conference on Computer Vision. Springer International Publishing, 2016. ⭐⭐⭐⭐
  • RFCN: Dai, Jifeng, et al. "R-FCN: Object detection via region-based fully convolutional networks." Advances in Neural Information Processing Systems. 2016. ⭐⭐⭐⭐⭐
  • Shallow Network: Ashraf, Khalid, et al. "Shallow networks for high-accuracy road object-detection." arXiv preprint arXiv:1606.01561 (2016). ⭐⭐
  • Is Faster R-CNN Doing Well for Pedestrian Detection: Zhang, Liliang, et al. "Is Faster R-CNN Doing Well for Pedestrian Detection?." European Conference on Computer Vision. Springer International Publishing, 2016. ⭐⭐
  • GCNN: Najibi, Mahyar, Mohammad Rastegari, and Larry S. Davis. "G-cnn: an iterative grid based object detector." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐
  • LocNet: Gidaris, Spyros, and Nikos Komodakis. "Locnet: Improving localization accuracy for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐
  • PVANet: Kim, Kye-Hyeon, et al. "PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection." arXiv preprint arXiv:1608.08021 (2016). ⭐⭐⭐⭐
  • FPN: Lin, Tsung-Yi, et al. "Feature Pyramid Networks for Object Detection." arXiv preprint arXiv:1612.03144 (2016). ⭐⭐⭐⭐⭐
  • TDM: Shrivastava, Abhinav, et al. "Beyond Skip Connections: Top-Down Modulation for Object Detection." arXiv preprint arXiv:1612.06851 (2016). ⭐⭐⭐⭐
  • YOLO9000: Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, Faster, Stronger." arXiv preprint arXiv:1612.08242 (2016). ⭐⭐⭐⭐
  • Speed/accuracy trade-offs for modern convolutional object detectors: Huang, Jonathan, et al. "Speed/accuracy trade-offs for modern convolutional object detectors." arXiv preprint arXiv:1611.10012 (2016). ⭐⭐
  • GBD-Net: Zeng, Xingyu, et al. "Crafting GBD-Net for Object Detection." arXiv preprint arXiv:1610.02579 (2016). Slides ⭐⭐⭐⭐
  • WRInception: Lee, Youngwan, et al. "Wide-Residual-Inception Networks for Real-time Object Detection." arXiv preprint arXiv:1702.01243 (2017). ⭐
  • DSSD: Fu, Cheng-Yang, et al. "DSSD: Deconvolutional Single Shot Detector." arXiv preprint arXiv:1701.06659 (2017). ⭐⭐⭐⭐
  • A-Fast-RCNN (Hard positive generation): Wang, Xiaolong, Abhinav Shrivastava, and Abhinav Gupta. "A-fast-rcnn: Hard positive generation via adversary for object detection." arXiv preprint arXiv:1704.03414 (2017). ⭐⭐⭐ code
  • RRC: Ren, Jimmy, et al. "Accurate Single Stage Detector Using Recurrent Rolling Convolution." arXiv preprint arXiv:1704.05776 (2017). ⭐⭐⭐
  • Deformable ConvNets: Dai, Jifeng, et al. "Deformable Convolutional Networks." arXiv preprint arXiv:1703.06211 (2017). ⭐⭐⭐⭐
  • RSSD: Jeong, Jisoo, Hyojin Park, and Nojun Kwak. "Enhancement of SSD by concatenating feature maps for object detection." arXiv preprint arXiv:1705.09587 (2017). ⭐⭐
  • Perceptual GAN: Li, Jianan, et al. "Perceptual Generative Adversarial Networks for Small Object Detection." arXiv preprint arXiv:1706.05274 (2017). ⭐⭐⭐
  • RetinaNet (Focal Loss): Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár. "Focal Loss for Dense Object Detection." In ICCV. 2017. ⭐⭐⭐⭐⭐
  • YOLOv3: Redmon, Joseph, and Ali Farhadi. "YOLOv3: An Incremental Improvement." arXiv preprint arXiv:1804.02767 (2018). ⭐⭐⭐
  • Domain Adaptive Faster R-CNN: Chen, Yuhua, et al. "Domain adaptive faster r-cnn for object detection in the wild." In CVPR, 2018. ⭐⭐⭐⭐
  • OMNIA Faster R-CNN: Rame, Alexandre, et al. "OMNIA Faster R-CNN: Detection in the wild through dataset merging and soft distillation." arXiv preprint arXiv:1812.02611 (2018). [Omni-Supervised across different datasets for object detection] ⭐⭐⭐⭐
  • Libra R-CNN: Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., & Lin, D. (2019). Libra R-CNN: Towards Balanced Learning for Object Detection. arXiv preprint arXiv:1904.02701. ⭐⭐⭐⭐
  • FCOS: Tian, Zhi, et al. "FCOS: Fully Convolutional One-Stage Object Detection." arXiv preprint arXiv:1904.01355 (2019). ⭐⭐⭐⭐⭐
  • POTO (Prediction-aware One-to-One label assignment): Wang, J., Song, L., Li, Z., Sun, H., Sun, J., & Zheng, N. (2020). End-to-end object detection with fully convolutional network. arXiv preprint arXiv:2012.03544. ⭐⭐⭐⭐
  • FaPN: Huang, Shihua, et al. "FaPN: Feature-aligned Pyramid Network for Dense Image Prediction." arXiv preprint, 2021. ⭐⭐⭐
  • DETR: Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. End-to-end object detection with transformers. In ECCV, 2020. ⭐⭐⭐⭐⭐
  • Sparse R-CNN: Sun, P., Zhang, R., Jiang, Y., Kong, T., Xu, C., Zhan, W., ... & Luo, P. Sparse r-cnn: End-to-end object detection with learnable proposals. In CVPR (pp. 14454-14463), 2021. ⭐⭐⭐⭐⭐
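
Among the single-stage detectors listed above, the RetinaNet entry introduced the focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), which down-weights easy examples so training can focus on hard ones. A minimal, illustrative sketch of the binary (sigmoid) form in PyTorch follows; it is a reading aid, not a reference implementation:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    logits and targets have the same shape; targets are 0/1 labels.
    """
    targets = targets.float()
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)          # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```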

Semantic Segmentation

  • FCN: Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. ⭐⭐⭐⭐⭐
  • Deconvolution Network for Segmentation: Noh, Hyeonwoo, Seunghoon Hong, and Bohyung Han. "Learning deconvolution network for semantic segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2015. ⭐⭐⭐
  • U-Net: Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015. ⭐⭐⭐⭐⭐
  • CRF as RNN: Zheng, Shuai, et al. "Conditional random fields as recurrent neural networks." In ICCV. 2015. ⭐⭐⭐⭐
  • PSPNet: Zhao, Hengshuang, et al. "Pyramid scene parsing network." arXiv preprint arXiv:1612.01105 (2016). ⭐⭐⭐
  • DeepLab v1/v2: Chen, Liang-Chieh, et al. "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs." IEEE transactions on pattern analysis and machine intelligence 40.4 (2018): 834-848. ⭐⭐⭐⭐⭐
  • DeepLab v3: Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587 (2017). ⭐⭐⭐
  • DeepLab v3+: Chen, Liang-Chieh, et al. "Encoder-decoder with atrous separable convolution for semantic image segmentation." arXiv preprint arXiv:1802.02611 (2018). ⭐⭐⭐
  • PSANet: Zhao, Hengshuang, et al. "PSANet: Point-wise Spatial Attention Network for Scene Parsing." Proceedings of the European Conference on Computer Vision (ECCV). 2018. ⭐⭐⭐⭐ [good summary of context information]
  • OCNet: Yuan, Yuhui, and Jingdong Wang. "OCNet: Object Context Network for Scene Parsing." arXiv preprint arXiv:1809.00916 (2018). ⭐⭐⭐
  • ReSeg: Visin, Francesco, et al. "Reseg: A recurrent neural network-based model for semantic segmentation." In CVPR Workshops. 2016. ⭐⭐
  • CCNet: Huang, Zilong, et al. "CCNet: Criss-Cross Attention for Semantic Segmentation." arXiv preprint arXiv:1811.11721 (2018). ⭐⭐⭐
  • Depth-aware CNN: Wang, Weiyue, and Ulrich Neumann. "Depth-aware CNN for RGB-D Segmentation." In ECCV, 2018. ⭐⭐⭐⭐⭐
  • DFANet: Li, H., Xiong, P., Fan, H., & Sun, J. (2019). DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation. arXiv preprint arXiv:1904.02216. ⭐⭐
  • DADA: Vu, Tuan-Hung, et al. "DADA: Depth-aware Domain Adaptation in Semantic Segmentation." arXiv preprint arXiv:1904.01886 (2019). ⭐⭐⭐⭐
  • CFNet: Zhang, Hang, et al. "Co-Occurrent Features in Semantic Segmentation." In CVPR, 2019. ⭐⭐⭐
  • PointRend: Kirillov, A., Wu, Y., He, K., & Girshick, R. (2019). PointRend: Image Segmentation as Rendering. arXiv preprint arXiv:1912.08193. ⭐⭐⭐⭐
  • Trans2Seg: Xie, E., Wang, W., Wang, W., Sun, P., Xu, H., Liang, D., & Luo, P. (2021). Segmenting transparent object in the wild with transformer. arXiv preprint arXiv:2101.08461. ⭐⭐⭐⭐
  • Swin-Unet: Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., & Wang, M. (2021). Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. arXiv preprint arXiv:2105.05537. ⭐⭐⭐⭐
  • SegFormer: Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., & Luo, P. (2021). SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. arXiv preprint arXiv:2105.15203. ⭐⭐⭐⭐
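
The DeepLab entries above are built around atrous (dilated) convolution, which enlarges the receptive field without reducing resolution, and around pooling features at several dilation rates in parallel. The sketch below is a deliberately simplified, illustrative ASPP-style module in PyTorch, not the exact module from any of the papers:

```python
import torch
import torch.nn as nn

class SimpleASPP(nn.Module):
    """Parallel atrous (dilated) 3x3 convolutions at several rates, fused by a 1x1 conv."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r, bias=False)
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]  # all branches keep the spatial size
        return self.fuse(torch.cat(feats, dim=1))
```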

Instance Segmentation

  • MNC: Dai, Jifeng, Kaiming He, and Jian Sun. "Instance-aware semantic segmentation via multi-task network cascades." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐⭐
  • InstanceFCN: Dai, Jifeng, et al. "Instance-sensitive fully convolutional networks." arXiv preprint arXiv:1603.08678 (2016). ⭐⭐⭐⭐
  • FCIS: Li, Yi, et al. "Fully convolutional instance-aware semantic segmentation." arXiv preprint arXiv:1611.07709 (2016). ⭐⭐⭐⭐⭐
  • Mask R-CNN: He, Kaiming, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. "Mask R-CNN." In ICCV. 2017. ⭐⭐⭐⭐⭐
  • Learning to Segment Every Thing (Mask^X R-CNN): Hu, Ronghang, Piotr Dollár, Kaiming He, Trevor Darrell, and Ross Girshick. "Learning to Segment Every Thing." arXiv preprint arXiv:1711.10370 (2017). ⭐⭐⭐⭐⭐
  • PANet: Liu, Shu, et al. "Path aggregation network for instance segmentation." arXiv preprint arXiv:1803.01534 (2018). ⭐⭐⭐⭐
  • Panoptic Segmentation: Kirillov, A., He, K., Girshick, R., Rother, C., & Dollár, P. (2018). Panoptic Segmentation. arXiv preprint arXiv:1801.00868. ⭐⭐⭐⭐
  • Panoptic FPN: Kirillov, A., Girshick, R., He, K., & Dollár, P. (2019). Panoptic Feature Pyramid Networks. arXiv preprint arXiv:1901.02446. ⭐⭐⭐⭐⭐
  • Mask Scoring R-CNN: Huang, Z., Huang, L., Gong, Y., Huang, C., & Wang, X. (2019). Mask Scoring R-CNN. arXiv preprint arXiv:1903.00241. ⭐⭐⭐⭐
  • TensorMask: Chen, X., Girshick, R., He, K., & Dollár, P. (2019). TensorMask: A Foundation for Dense Object Segmentation. arXiv preprint arXiv:1903.12174. ⭐⭐⭐⭐
  • SSAP: Gao, Naiyu, et al. "SSAP: Single-shot instance segmentation with affinity pyramid." Proceedings of the IEEE International Conference on Computer Vision. 2019. ⭐⭐⭐
  • EmbedMask: Ying, H., Huang, Z., Liu, S., Shao, T., & Zhou, K. (2019). EmbedMask: Embedding Coupling for One-stage Instance Segmentation. arXiv preprint arXiv:1912.01954. ⭐⭐⭐⭐⭐
  • CondInst: Tian, Z., Shen, C., & Chen, H. (2020). Conditional Convolutions for Instance Segmentation. In ECCV 2020. ⭐⭐⭐⭐⭐
  • MaskFormer: Cheng, B., Schwing, A. G., & Kirillov, A. (2021). Per-Pixel Classification is Not All You Need for Semantic Segmentation. arXiv preprint arXiv:2107.06278. ⭐⭐⭐⭐⭐ (Semantic+Instance)
  • SOLQ: Dong, B., Zeng, F., Wang, T., Zhang, X., & Wei, Y. (2021). SOLQ: Segmenting Objects by Learning Queries. arXiv preprint arXiv:2106.02351. ⭐⭐⭐
  • QueryInst: Yang, S., Fang, Y., Wang, X., Li, Y., Shan, Y., Feng, B., & Liu, W. (2021). Tracking Instances as Queries. arXiv preprint arXiv:2106.11963. ⭐⭐⭐⭐⭐
  • ISTR: Hu, J., Cao, L., Lu, Y., Zhang, S., Wang, Y., Li, K., ... & Ji, R. (2021). ISTR: End-to-End Instance Segmentation with Transformers. arXiv preprint arXiv:2105.00637. ⭐⭐⭐

Weakly Supervised

  • Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning: Cinbis, Ramazan Gokberk, Jakob Verbeek, and Cordelia Schmid. "Weakly supervised object localization with multi-fold multiple instance learning." IEEE transactions on pattern analysis and machine intelligence 39.1 (2017): 189-203. ⭐⭐⭐
  • Weakly Supervised Deep Detection Networks: Bilen, Hakan, and Andrea Vedaldi. "Weakly supervised deep detection networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
  • Weakly- and Semi-Supervised Learning: Papandreou, George, et al. "Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2015. ⭐⭐⭐⭐
  • Image-level to pixel-level labeling: Pinheiro, Pedro O., and Ronan Collobert. "From image-level to pixel-level labeling with convolutional networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
  • Weakly Supervised Localization using Deep Feature Maps: Bency, Archith J., et al. "Weakly supervised localization using deep feature maps." arXiv preprint arXiv:1603.00489 (2016).
  • WELDON: Durand, Thibaut, Nicolas Thome, and Matthieu Cord. "Weldon: Weakly supervised learning of deep convolutional neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
  • WILDCAT: Durand, Thibaut, et al. "WILDCAT: Weakly Supervised Learning of Deep ConvNets for Image Classification, Pointwise Localization and Segmentation." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
  • SGDL: Lai, Baisheng, and Xiaojin Gong. "Saliency guided dictionary learning for weakly-supervised image parsing." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Unsupervised/Self-supervised

  • Learning Features by Watching Objects Move: Pathak, Deepak, et al. "Learning Features by Watching Objects Move." arXiv preprint arXiv:1612.06370 (2016). ⭐⭐⭐⭐⭐
  • SimGAN: Shrivastava, Ashish, et al. "Learning from simulated and unsupervised images through adversarial training." arXiv preprint arXiv:1612.07828 (2016). ⭐⭐⭐
  • OPN: Lee, Hsin-Ying, et al. "Unsupervised Representation Learning by Sorting Sequences." arXiv preprint arXiv:1708.01246 (2017). ⭐⭐⭐
  • Transitive Invariance for Self-supervised Visual Representation Learning: Wang, Xiaolong, et al. "Transitive Invariance for Self-supervised Visual Representation Learning" Proceedings of the IEEE International Conference on Computer Vision. 2017. ⭐⭐⭐ code
  • Omni-Supervised Learning: Radosavovic, I., Dollár, P., Girshick, R., Gkioxari, G., & He, K. Data Distillation: Towards Omni-Supervised Learning. In CVPR, 2018. ⭐⭐⭐⭐⭐
  • MAE: He, Kaiming, et al. "Masked autoencoders are scalable vision learners." In CVPR 2022. ⭐⭐⭐⭐⭐
  • SimMIM: Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., ... & Hu, H. (2021). Simmim: A simple framework for masked image modeling. arXiv preprint arXiv:2111.09886 ⭐⭐⭐⭐⭐
  • ConvMAE: Gao, P., Ma, T., Li, H., Dai, J., & Qiao, Y. (2022). ConvMAE: Masked Convolution Meets Masked Autoencoders. arXiv preprint arXiv:2205.03892. ⭐⭐⭐
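
MAE, SimMIM and ConvMAE above all hide a large fraction of image patches and train the network to reconstruct them. As a rough illustration of the random-masking step only (a sketch assuming the image has already been split into patch embeddings; not code from the papers):

```python
import torch

def random_mask_patches(patches, mask_ratio=0.75):
    """MAE-style random masking: keep a random subset of patch tokens per image.

    patches: (B, N, D) sequence of patch embeddings.
    Returns the visible patches (B, N_keep, D) and the kept indices.
    """
    B, N, _ = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=patches.device)   # one random score per patch
    keep = noise.argsort(dim=1)[:, :n_keep]           # lowest-score patches stay visible
    batch_idx = torch.arange(B, device=patches.device).unsqueeze(1)
    return patches[batch_idx, keep], keep
```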

Semi-supervised

  • Adversarial Self-Supervised Learning: Si, C., Nie, X., Wang, W., Wang, L., Tan, T., & Feng, J. (2020). Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition. In ECCV 2020. ⭐⭐⭐
  • Directional Context-Aware Consistency: Lai, Xin, et al. "Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. ⭐⭐⭐⭐
  • Cross Pseudo Supervision: Chen, X., Yuan, Y., Zeng, G., & Wang, J. (2021). Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2613-2622). ⭐⭐⭐⭐
  • CutMix: French, G., Laine, S., Aila, T., & Mackiewicz, M. (2019). Semi-supervised semantic segmentation needs strong, varied perturbations. In BMVC. ⭐⭐⭐
  • CGT: Ke, Z., Qiu, D., Li, K., Yan, Q., & Lau, R. W. (2020). Guided collaborative training for pixel-wise semi-supervised learning. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIII 16 (pp. 429-445). Springer International Publishing. ⭐⭐⭐⭐
  • Robust Mutual Learning: Zhang, P., Zhang, B., Zhang, T., Chen, D., & Wen, F. (2021). Robust Mutual Learning for Semi-supervised Semantic Segmentation. arXiv preprint arXiv:2106.00609. ⭐⭐⭐⭐
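
A baseline idea behind several of the semi-supervised segmentation entries above (Cross Pseudo Supervision, Robust Mutual Learning) is to turn confident predictions on unlabelled images into pseudo labels that supervise another network. The following is a minimal, confidence-thresholded sketch of that idea in PyTorch; the threshold value and the student/teacher naming are illustrative assumptions, not taken from any specific paper:

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(student_logits, teacher_logits, threshold=0.95):
    """Unlabelled-data loss: confident teacher predictions act as pseudo ground truth.

    Both logits tensors have shape (B, C, H, W); pixels below the confidence
    threshold are ignored.
    """
    with torch.no_grad():
        probs = teacher_logits.softmax(dim=1)
        conf, pseudo = probs.max(dim=1)                 # (B, H, W)
        mask = (conf >= threshold).float()
    loss = F.cross_entropy(student_logits, pseudo, reduction="none")  # per-pixel CE
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```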

Domain Adaptation

  • Learning from Synthetic Animals: Mu, J., Qiu, W., Hager, G. D., & Yuille, A. L. (2020). Learning from Synthetic Animals. In CVPR 2020. ⭐⭐⭐⭐
  • CD3A: Kurmi, Vinod Kumar, et al. "Curriculum based dropout discriminator for domain adaptation." arXiv preprint arXiv:1907.10628 (2019). ⭐⭐⭐
  • Open compound domain adaptation: Liu, Z., Miao, Z., Pan, X., Zhan, X., Lin, D., Yu, S. X., & Gong, B. (2020). Open compound domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12406-12415). ⭐⭐⭐⭐⭐

Domain Generalization

  • Extrinsic and Intrinsic: Wang, Shujun, et al. "Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization." In ECCV (2020). ⭐⭐⭐⭐
  • DoFE: Wang, Shujun, et al. "DoFE: Domain-oriented Feature Embedding for Generalizable Fundus Image Segmentation on Unseen Datasets." IEEE Transactions on Medical Imaging (2020). ⭐⭐⭐⭐
  • Self-Challenging: Huang, Zeyi, et al. "Self-Challenging Improves Cross-Domain Generalization." arXiv preprint arXiv:2007.02454 (2020). ⭐⭐⭐⭐
  • Generate Novel Domains: Zhou, Kaiyang, et al. "Learning to Generate Novel Domains for Domain Generalization." arXiv preprint arXiv:2007.03304 (2020). ⭐⭐⭐
  • Jigsaw puzzles: Carlucci, Fabio M., et al. "Domain generalization by solving jigsaw puzzles." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐

Video

  • Semi-supervised, memory network: Oh, S. W., Lee, J. Y., Xu, N., & Kim, S. J. (2019). Video object segmentation using space-time memory networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 9226-9235). ⭐⭐⭐⭐

Saliency

  • DHSNet: Liu, Nian, and Junwei Han. "Dhsnet: Deep hierarchical saliency network for salient object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
  • RFCN: Wang, Linzhao, et al. "Saliency detection with recurrent fully convolutional networks." European Conference on Computer Vision. Springer International Publishing, 2016. ⭐⭐⭐⭐
  • RACDNN: Kuen, Jason, Zhenhua Wang, and Gang Wang. "Recurrent attentional networks for saliency detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
  • NLDF: Luo, Zhiming, et al. "Non-Local Deep Features for Salient Object Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. ⭐⭐⭐
  • DSS: Hou, Qibin, et al. "Deeply supervised salient object detection with short connections." arXiv preprint arXiv:1611.04849 (2016). ⭐⭐⭐⭐
  • MSRNet: Li, Guanbin, et al. "Instance-Level Salient Object Segmentation." arXiv preprint arXiv:1704.03604 (2017). ⭐⭐⭐⭐
  • Amulet: Zhang, Pingping, et al. "Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection." arXiv preprint arXiv:1708.02001 (2017). ⭐⭐⭐⭐
  • UCF: Zhang, Pingping, et al. "Learning Uncertain Convolutional Features for Accurate Saliency Detection." arXiv preprint arXiv:1708.02031 (2017). ⭐⭐⭐⭐
  • SRM: Wang, Tiantian, et al. "A Stagewise Refinement Model for Detecting Salient Objects in Images." In ICCV. 2017. ⭐⭐⭐⭐
  • S4Net: Fan, Ruochen, et al. "S4Net: Single Stage Salient-Instance Segmentation." arXiv preprint arXiv:1711.07618 (2017). ⭐⭐⭐⭐⭐
  • Deep Edge-Aware Saliency Detection: Zhang, Jing, Yuchao Dai, Fatih Porikli, and Mingyi He. "Deep Edge-Aware Saliency Detection." arXiv preprint arXiv:1708.04366 (2017). ⭐⭐⭐
  • Bi-Directional Message Passing Model: Zhang, Lu, et al. "A Bi-Directional Message Passing Model for Salient Object Detection." In CVPR. 2018. ⭐⭐⭐
  • PiCANet: Liu, Nian, Junwei Han, and Ming-Hsuan Yang. "PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection." In CVPR. 2018. ⭐⭐⭐⭐⭐
  • Detect Globally, Refine Locally: A Novel Approach to Saliency Detection: Wang, Tiantian, et al. "Detect Globally, Refine Locally: A Novel Approach to Saliency Detection." In CVPR. 2018. ⭐⭐⭐
  • PAGRN: Zhang, Xiaoning, et al. "Progressive Attention Guided Recurrent Network for Salient Object Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. ⭐⭐⭐
  • Reverse Attention for Salient Object Detection: Chen, Shuhan, et al. "Reverse Attention for Salient Object Detection." In ECCV, 2018. ⭐⭐
  • CA-Fuse: Chen, Hao, and Youfu Li. "Progressively Complementarity-Aware Fusion Network for RGB-D Salient Object Detection." In CVPR. 2018. ⭐⭐⭐
  • SOC dataset: Fan, Deng-Ping, et al. "Salient objects in clutter: Bringing salient object detection to the foreground." In ECCV. 2018. ⭐⭐⭐⭐⭐ [complex dataset + instance level]
  • DNA: Liu, Yun, et al. "DNA: Deeply-supervised Nonlinear Aggregation for Salient Object Detection." arXiv preprint arXiv:1903.12476 (2019). ⭐⭐⭐
  • SE2Net: Zhou, S., Wang, J., Wang, F., & Huang, D. SE2Net: Siamese Edge-Enhancement Network for Salient Object Detection. ⭐⭐⭐⭐⭐
  • PFAN: Zhao, T., & Wu, X. (2019). Pyramid Feature Selective Network for Saliency detection. In CVPR 2019. ⭐⭐
  • PoolNet: Liu, Jiang-Jiang, et al. "A Simple Pooling-Based Design for Real-Time Salient Object Detection." In CVPR 2019. ⭐⭐⭐⭐

Attention

  • SRN: Zhu, Feng, et al. "Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification." arXiv preprint arXiv:1702.05891 (2017). ⭐⭐⭐⭐
  • Zoom-in-Net: Wang, Zhe, et al. "Zoom-in-Net: Deep Mining Lesions for Diabetic Retinopathy Detection." arXiv preprint arXiv:1706.04372 (2017). ⭐⭐⭐⭐
  • Multi-context attention: Chu, Xiao, et al. "Multi-context attention for human pose estimation." arXiv preprint arXiv:1702.07432 (2017). ⭐⭐⭐

Depth Information and Stereo Vision

  • HFM-Net: Zeng, J., Tong, Y., Huang, Y., Yan, Q., Sun, W., Chen, J., & Wang, Y. (2019). Deep Surface Normal Estimation with Hierarchical RGB-D Fusion. arXiv preprint arXiv:1904.03405. ⭐⭐⭐
  • MADNet: Tonioni, Alessio, et al. "Real-time self-adaptive deep stereo." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐ (offline domain adaptation)
  • Geometry-Aware Distillation: Jiao, Jianbo, et al. "Geometry-Aware Distillation for Indoor Semantic Segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐
  • DiverseDepth: Yin, W., Wang, X., Shen, C., Liu, Y., Tian, Z., Xu, S., ... & Renyin, D. (2020). DiverseDepth: Affine-invariant depth prediction using diverse data. arXiv preprint arXiv:2002.00569. ⭐⭐⭐⭐

Shadow Detection/Removal

  • DeshadowNet: Qu, Liangqiong, et al. "DeshadowNet: A Multi-context Embedding Deep Network for Shadow Removal." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. ⭐⭐⭐
  • scGAN: Nguyen, Vu, et al. "Shadow Detection with Conditional Generative Adversarial Networks." In ICCV. 2017. ⭐⭐
  • Patched CNN: Hosseinzadeh, Sepideh, Moein Shakeri, and Hong Zhang. "Fast Shadow Detection from a Single Image Using a Patched Convolutional Neural Network." arXiv preprint arXiv:1709.09283 (2017). ⭐
  • ST-CGAN: Wang, Jifeng, et al. "Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal." arXiv preprint arXiv:1712.02478 (2017). ⭐⭐ (ISTD dataset)
  • A+D Net: Le, Hieu, et al. "A+D Net: Training a shadow detector with adversarial shadow attenuation." Proceedings of the European Conference on Computer Vision (ECCV). 2018. ⭐⭐⭐
  • Lazy annotation for immature SBU: Vicente, Yago, et al. "Noisy label recovery for shadow detection in unfamiliar domains." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐
  • StackedCNN + SBU: Vicente, Tomás F. Yago, et al. "Large-scale training of shadow detectors with noisily-annotated shadow examples." European Conference on Computer Vision. Springer, Cham, 2016. ⭐⭐⭐⭐ (SBU dataset)
  • CPAdv-Net: Mohajerani, Sorour, and Parvaneh Saeedi. "Shadow Detection in Single RGB Images Using a Context Preserver Convolutional Neural Network Trained by Multiple Adversarial Examples." IEEE Transactions on Image Processing (2019). ⭐⭐
  • Color Constancy: Sidorov, Oleksii. "Conditional GANs for Multi-Illuminant Color Constancy: Revolution or Yet Another Approach?." CVPR workshop, 2019. ⭐⭐
  • DSDNet: Zheng, Quanlong, et al. "Distraction-aware Shadow Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐
  • ARGAN: Ding, Bin, et al. "ARGAN: Attentive Recurrent Generative Adversarial Network for Shadow Detection and Removal." In ICCV, (2019). ⭐⭐⭐
  • SP+M-Net: Le, H., & Samaras, D. (2019). Shadow removal via shadow image decomposition. In Proceedings of the IEEE International Conference on Computer Vision (pp. 8578-8587). ⭐⭐⭐⭐
  • Portrait Shadow Manipulation: Zhang, Xuaner Cecilia, et al. "Portrait Shadow Manipulation." In SIGGRAPH (2020). ⭐⭐⭐⭐⭐
  • Weakly-supervised shadow decomposition: Le, Hieu, and Dimitris Samaras. "From Shadow Segmentation to Shadow Removal." arXiv preprint arXiv:2008.00267 (2020). ⭐⭐⭐⭐⭐ (Video Shadow Removal Dataset)
  • AEF: Fu, Lan, et al. "Auto-Exposure Fusion for Single-Image Shadow Removal." CVPR 2021. ⭐⭐⭐
  • G2R-ShadowNet: Liu, Zhihao, et al. "From Shadow Generation to Shadow Removal." arXiv preprint arXiv:2103.12997 (2021). ⭐⭐⭐⭐⭐
  • Video Shadow: Chen, Z., Wan, L., Zhu, L., Shen, J., Fu, H., Liu, W., & Qin, J. (2021). Triple-cooperative Video Shadow Detection. In CVPR 2021. ⭐⭐⭐⭐
  • Removing Objects and their Shadows: Zhang, E., Martin-Brualla, R., Kontkanen, J., & Curless, B. L. (2021). No Shadow Left Behind: Removing Objects and their Shadows using Approximate Lighting and Geometry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 16397-16406). ⭐⭐⭐⭐⭐
  • Shadow Generation + DE-SOBA dataset: Hong, Y., Niu, L., Zhang, J., & Zhang, L. (2021). Shadow Generation for Composite Image in Real-world Scenes. arXiv preprint arXiv:2104.10338. ⭐⭐⭐⭐
  • Temporal Feature Warping: Hu, S., Le, H., & Samaras, D. (2021). Temporal Feature Warping for Video Shadow Detection. arXiv preprint arXiv:2107.14287. ⭐⭐⭐⭐
  • CANet: Chen, Z., Long, C., Zhang, L., & Xiao, C. (2021). CANet: A Context-Aware Network for Shadow Removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 4743-4752). ⭐⭐⭐⭐⭐
  • FDRNet: Zhu, L., Xu, K., Ke, Z., & Lau, R. W. (2021). Mitigating Intensity Bias in Shadow Detection via Feature Decomposition and Reweighting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 4702-4711). ⭐⭐⭐⭐⭐
  • SADC: Xu, Yimin, et al. "Shadow-Aware Dynamic Convolution for Shadow Removal." arXiv preprint arXiv:2205.04908 (2022). ⭐⭐

Image Restoration

  • DRRN: Tai, Ying, Jian Yang, and Xiaoming Liu. "Image super-resolution via deep recursive residual network." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. ⭐⭐⭐⭐
  • DID-MDN: Zhang, He, and Vishal M. Patel. "Density-aware Single Image De-raining using a Multi-stream Dense Network." arXiv preprint arXiv:1802.07412 (2018). ⭐⭐
  • IDN: Hui, Zheng, Xiumei Wang, and Xinbo Gao. "Fast and Accurate Single Image Super-Resolution via Information Distillation Network." In CVPR. 2018. ⭐⭐⭐
  • SFT-GAN: Wang, X., Yu, K., Dong, C., & Loy, C. C. (2018). Recovering realistic texture in image super-resolution by deep spatial feature transform. In CVPR. 2018. ⭐⭐⭐
  • Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring: Nah, Seungjun, Tae Hyun Kim, and Kyoung Mu Lee. "Deep multi-scale convolutional neural network for dynamic scene deblurring." In CVPR, 2017. ⭐⭐⭐
  • Enhanced Deep Residual Networks for Single Image Super-Resolution: Lim, Bee, et al. "Enhanced deep residual networks for single image super-resolution." The CVPR workshops, 2017. ⭐
  • AGAN for Raindrop Removal: Qian, Rui, et al. "Attentive Generative Adversarial Network for Raindrop Removal from A Single Image." In CVPR. 2018. ⭐⭐⭐⭐⭐
  • DCPDN: Zhang, He, and Vishal M. Patel. "Densely connected pyramid dehazing network." In CVPR, 2018. ⭐⭐⭐
  • GFN: Ren, W., Ma, L., Zhang, J., Pan, J., Cao, X., Liu, W., & Yang, M. H. (2018). Gated fusion network for single image dehazing. In CVPR, 2018. ⭐⭐⭐⭐
  • SIDCGAN: Li, Runde, et al. "Single Image Dehazing via Conditional Generative Adversarial Network." In CVPR, 2018. ⭐⭐
  • Dehaze Benchmark: Li, Boyi, et al. "Benchmarking Single Image Dehazing and Beyond." IEEE Transactions on Image Processing (2018). ⭐⭐⭐⭐⭐
  • Cityscapes + Haze: Sakaridis, Christos, Dengxin Dai, and Luc Van Gool. "Semantic foggy scene understanding with synthetic data." International Journal of Computer Vision (2018): 1-20. ⭐⭐⭐⭐⭐
  • RESCAN: Li, Xia, et al. "Recurrent Squeeze-and-Excitation Context Aggregation Net for Single Image Deraining." European Conference on Computer Vision. Springer, Cham, 2018. ⭐⭐⭐
  • UD-GAN: Jin, Xin, et al. "Unsupervised Single Image Deraining with Self-supervised Constraints." arXiv preprint arXiv:1811.08575 (2018). ⭐⭐⭐⭐⭐
  • Deep Tree-Structured Fusion Model: Fu, Xueyang, et al. "A Deep Tree-Structured Fusion Model for Single Image Deraining." arXiv preprint arXiv:1811.08632 (2018). ⭐⭐
  • Dual CNN: Pan, J., Liu, S., Sun, D., Zhang, J., Liu, Y., Ren, J., ... & Yang, M. H. Learning Dual Convolutional Neural Networks for Low-Level Vision. In CVPR, 2018 (pp. 3070-3079). ⭐⭐⭐
  • RAM: Kim, Jun-Hyuk, et al. "RAM: Residual Attention Module for Single Image Super-Resolution." arXiv preprint arXiv:1811.12043 (2018). ⭐⭐⭐
  • DNSR (Bi-cycle GAN): Zhao, Tianyu, et al. "Unsupervised Degradation Learning for Single Image Super-Resolution." arXiv preprint arXiv:1812.04240 (2018). ⭐⭐⭐⭐⭐
  • Cycle-Defog2Refog: Liu, Wei, et al. "End-to-End Single Image Fog Removal using Enhanced Cycle Consistent Adversarial Networks." arXiv preprint arXiv:1902.01374 (2019). ⭐⭐
  • SPANet: Tianyu Wang, Xin Yang, Ke Xu, Shaozhe Chen, Qiang Zhang, Rynson W.H. Lau. "Spatial Attentive Single-Image Deraining with a High Quality Real Rain Dataset." In CVPR 2019. ⭐⭐⭐⭐
  • remove rain streaks and rain accumulation: Ruoteng Li, Loong-Fah Cheong, and Robby T. Tan. "Heavy Rain Image Restoration: Integrating Physics Model and Conditional Adversarial Learning." In CVPR 2019. ⭐⭐⭐⭐⭐
  • Rain O’er Me: Lin, Huangxing, et al. "Rain O’er Me: Synthesizing real rain to derain with data distillation." arXiv preprint arXiv:1904.04605 (2019). ⭐⭐⭐⭐
  • RNAN: Zhang, Y., Li, K., Li, K., Zhong, B., & Fu, Y. (2019). Residual Non-local Attention Networks for Image Restoration. arXiv preprint arXiv:1903.10082. ⭐⭐⭐⭐⭐
  • Perceptual GAN loss + TV loss: Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., ... & Shi, W. (2017). Photo-realistic single image super-resolution using a generative adversarial network. In CVPR (pp. 4681-4690).(code) ⭐⭐⭐⭐⭐
  • PReNet: Ren, Dongwei, et al. "Progressive Image Deraining Networks: A Better and Simpler Baseline." In CVPR, 2019. ⭐⭐⭐
  • Zoom to Learn, Learn to Zoom: Zhang, Xuaner, et al. "Zoom to Learn, Learn to Zoom." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐
  • Derain Benchmark: Li, Siyuan, et al. "Single image deraining: A comprehensive benchmark analysis." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐
  • Dual residual block: Liu, Xing, et al. "Dual Residual Networks Leveraging the Potential of Paired Operations for Image Restoration." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐
  • Semi-supervised Transfer Learning for Image Rain Removal: Wei, Wei, et al. "Semi-Supervised Transfer Learning for Image Rain Removal." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐
  • UMRL: Yasarla, Rajeev, and Vishal M. Patel. "Uncertainty Guided Multi-Scale Residual Learning-using a Cycle Spinning CNN for Single Image De-Raining." CVPR 2019. ⭐⭐⭐⭐
  • NASNet: Qin, Xu, and Zhilin Wang. "NASNet: A Neuron Attention Stage-by-Stage Net for Single Image Deraining." arXiv preprint arXiv:1912.03151 (2019). ⭐⭐⭐⭐
  • DerainCycleGAN: Wei, Yanyan, et al. "DerainCycleGAN: An Attention-guided Unsupervised Benchmark for Single Image Deraining and Rainmaking." arXiv preprint arXiv:1912.07015 (2019). ⭐⭐⭐⭐
  • Physics-Based Rain Rendering: Halder, Shirsendu Sukanta, Jean-François Lalonde, and Raoul de Charette. "Physics-Based Rendering for Improving Robustness to Rain." In ICCV, 2019, pp. 10203-10212. ⭐⭐⭐⭐⭐
  • Partial Convolution (mask-guided): Liu, Guilin, et al. "Image inpainting for irregular holes using partial convolutions." Proceedings of the European Conference on Computer Vision (ECCV). 2018. ⭐⭐⭐⭐⭐
  • Derain Survey: Wang, H., Li, M., Wu, Y., Zhao, Q., & Meng, D. (2019). A Survey on Rain Removal from Video and Single Image. arXiv preprint arXiv:1909.08326. ⭐⭐⭐⭐
  • Deep Adversarial Decomposition: Zou, Zhengxia, et al. "Deep Adversarial Decomposition: A Unified Framework for Separating Superimposed Images." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. ⭐⭐⭐⭐
  • CARN: Ahn, Namhyuk, Byungkon Kang, and Kyung-Ah Sohn. "Fast, accurate, and lightweight super-resolution with cascading residual network." Proceedings of the European Conference on Computer Vision (ECCV). 2018. ⭐⭐⭐
  • Semi-supervised derain with Gaussian processes: Yasarla, Rajeev, Vishwanath A. Sindagi, and Vishal M. Patel. "Syn2Real Transfer Learning for Image Deraining Using Gaussian Processes." In CVPR. 2020. ⭐⭐⭐⭐
  • EPDN: Qu, Yanyun, et al. "Enhanced pix2pix dehazing network." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐
  • PEPSI: Shin, Yong-Goo, et al. "PEPSI++: Fast and lightweight network for image inpainting." IEEE Transactions on Neural Networks and Learning Systems (2020). ⭐⭐⭐
  • holistic attention network: Niu, Ben, et al. "Single image super-resolution via a holistic attention network." European Conference on Computer Vision. Springer, Cham, 2020. ⭐⭐⭐
  • SNet, VNet, and ANet: Wang, Yinglong, et al. "Rethinking image deraining via rain streaks and vapors." European Conference on Computer Vision. Springer, Cham, 2020. ⭐⭐⭐
  • JRGR (Disentangled): Ye, Y., Chang, Y., Zhou, H., & Yan, L. (2021). Closing the Loop: Joint Rain Generation and Removal via Disentangled Image Translation. In CVPR 2021. ⭐⭐⭐⭐⭐
  • AECR-Net: Wu, H., Qu, Y., Lin, S., Zhou, J., Qiao, R., Zhang, Z., ... & Ma, L. (2021). Contrastive Learning for Compact Single Image Dehazing. In CVPR 2021. ⭐⭐⭐
  • MPRNet: Zamir, S. W., Arora, A., Khan, S., Hayat, M., Khan, F. S., Yang, M. H., & Shao, L. (2021). Multi-stage progressive image restoration. arXiv preprint arXiv:2102.02808.
  • AdderSR: Song, D., Wang, Y., Chen, H., Xu, C., Xu, C., & Tao, D. (2020). AdderSR: Towards energy efficient image super-resolution. In CVPR 2021. ⭐⭐⭐⭐
  • RICNet: Ni, Siqi, et al. "Controlling the Rain: From Removal to Rendering." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. ⭐⭐⭐
  • Video rain streaks+fog: Yan, W., Tan, R. T., Yang, W., & Dai, D. (2021). Self-Aligned Video Deraining With Transmission-Depth Consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 11966-11976). ⭐⭐⭐⭐
  • Uformer: Wang, Z., Cun, X., Bao, J., & Liu, J. (2021). Uformer: A General U-Shaped Transformer for Image Restoration. arXiv preprint arXiv:2106.03106. ⭐⭐⭐
  • Real Video Dehaze Data: Zhang, Xinyi, et al. "Learning To Restore Hazy Video: A New Real-World Dataset and a New Method." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. ⭐⭐⭐⭐
  • Hybrid Local-Global Transformer: Zhao, D., Li, J., Li, H., & Xu, L. (2021). Hybrid Local-Global Transformer for Image Dehazing. arXiv preprint arXiv:2109.07100. ⭐⭐⭐⭐
  • Restormer: Zamir, S. W., Arora, A., Khan, S., Hayat, M., Khan, F. S., & Yang, M. H. (2021). Restormer: Efficient Transformer for High-Resolution Image Restoration. arXiv preprint arXiv:2111.09881. ⭐⭐⭐⭐
  • MAXIM: Tu, Z., Talebi, H., Zhang, H., Yang, F., Milanfar, P., Bovik, A., & Li, Y. (2022). MAXIM: Multi-Axis MLP for Image Processing. arXiv preprint arXiv:2201.02973. ⭐⭐⭐
  • NAFNet: Chen, L., Chu, X., Zhang, X., & Sun, J. (2022). Simple Baselines for Image Restoration. arXiv preprint arXiv:2204.04676. ⭐⭐⭐⭐⭐
  • KCKE: Chen, Wei-Ting, et al. "Learning Multiple Adverse Weather Removal via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward a Unified Model." CVPR. 2022. ⭐⭐⭐⭐

Nighttime & Low-light

  • dehaze + nighttime: Yan, Wending, Robby T. Tan, and Dengxin Dai. "Nighttime defogging using high-low frequency decomposition and grayscale-color networks." In ECCV, 2020. ⭐⭐⭐⭐⭐
  • Nighttime Visibility Enhancement: Sharma, Aashish, and Robby T. Tan. "Nighttime Visibility Enhancement by Increasing the Dynamic Range and Suppression of Light Effects." In CVPR. 2021. ⭐⭐⭐⭐

Image Synthesis

  • Let there be Color!: Iizuka, Satoshi, Edgar Simo-Serra, and Hiroshi Ishikawa. "Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification." ACM Transactions on Graphics (TOG) 35.4 (2016): 110. ⭐⭐⭐⭐⭐
  • Colorful Image Colorization: Zhang, Richard, Phillip Isola, and Alexei A. Efros. "Colorful image colorization." European Conference on Computer Vision. Springer, Cham, 2016. ⭐⭐⭐⭐
  • Neural Style: Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "A neural algorithm of artistic style." arXiv preprint arXiv:1508.06576 (2015). ⭐⭐⭐⭐⭐
  • Texture Synthesis: Gatys, Leon, Alexander S. Ecker, and Matthias Bethge. "Texture synthesis using convolutional neural networks." Advances in Neural Information Processing Systems. 2015. ⭐⭐⭐⭐
  • Semantic Annotation Artwork: Champandard, Alex J. "Semantic style transfer and turning two-bit doodles into fine artworks." arXiv preprint arXiv:1603.01768 (2016). ⭐⭐⭐
  • MRF+CNN Image Synthesis: Li, Chuan, and Michael Wand. "Combining markov random fields and convolutional neural networks for image synthesis." In CVPR. 2016. ⭐⭐⭐⭐
  • More Experiments on Neural Style: Novak, Roman, and Yaroslav Nikulin. "Improving the neural algorithm of artistic style." arXiv preprint arXiv:1605.04603 (2016). ⭐⭐
  • Deep Photo Style Transfer: Luan, Fujun, et al. "Deep photo style transfer." In CVPR. 2017. ⭐⭐⭐⭐⭐
  • Pretraining is All You Need + Diffusion: Wang, Tengfei, et al. "Pretraining is All You Need for Image-to-Image Translation." arXiv preprint arXiv:2205.12952 (2022). ⭐⭐⭐

Computational Photography

  • Multi-Illumination Dataset: Murmann, Lukas, et al. "A Dataset of Multi-Illumination Images in the Wild." Proceedings of the IEEE International Conference on Computer Vision. 2019. ⭐⭐⭐⭐⭐
  • WESPE: Ignatov, Andrey, et al. "WESPE: weakly supervised photo enhancer for digital cameras." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2018. ⭐⭐⭐
  • Zurich RAW to RGB dataset + PyNet: Ignatov, Andrey, Luc Van Gool, and Radu Timofte. "Replacing Mobile Camera ISP with a Single Deep Learning Model." arXiv preprint arXiv:2002.05509 (2020). ⭐⭐⭐⭐

GAN

  • GAN: Goodfellow, Ian, et al. "Generative adversarial nets." In NIPS. 2014. ⭐⭐⭐⭐⭐
  • cGAN: Mirza, Mehdi, and Simon Osindero. "Conditional generative adversarial nets." arXiv preprint arXiv:1411.1784 (2014). ⭐⭐⭐⭐⭐
  • Image-to-Image Translation with Conditional Adversarial Networks: Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." arXiv preprint (2017). ⭐⭐⭐⭐⭐
  • CycleGAN: Zhu, Jun-Yan, et al. "Unpaired image-to-image translation using cycle-consistent adversarial networks." arXiv preprint (2017). ⭐⭐⭐⭐⭐
  • StarGAN: Choi, Yunjey, et al. "Stargan: Unified generative adversarial networks for multi-domain image-to-image translation." In CVPR 2018. ⭐⭐⭐⭐
  • E-GAN: Wang, C., Xu, C., Yao, X., & Tao, D. (2018). Evolutionary Generative Adversarial Networks. arXiv preprint arXiv:1803.00657. ⭐⭐⭐⭐
  • DCGAN: Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015). ⭐⭐⭐⭐
  • GANtruth: Bujwid, Sebastian, et al. "GANtruth-an unpaired image-to-image translation method for driving scenarios." arXiv preprint arXiv:1812.01710 (2018). ⭐⭐⭐
  • AttentionGAN: Tang, H., Liu, H., Xu, D., Torr, P. H.S., & Sebe, N. (2019). AttentionGAN: Unpaired Image-to-Image Translation using Attention-Guided Generative Adversarial Networks. arXiv preprint arXiv:1911.11897. ⭐⭐⭐⭐
  • Multiclass Sketch-to-Image Translation: Ghosh, A., Zhang, R., Dokania, P. K., Wang, O., Efros, A. A., Torr, P. H.S., & Shechtman, E. (2019). Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1171-1180). ⭐⭐⭐
  • RealnessGAN: Xiangli, Yuanbo, et al. "Real or not real, that is a question." In ICLR 2020. ⭐⭐⭐⭐
  • Domain-bridged GAN: Pizzati, Fabio, et al. "Domain bridge for unpaired image-to-image translation and unsupervised domain adaptation." The IEEE Winter Conference on Applications of Computer Vision. 2020. ⭐⭐⭐⭐
  • SinGAN: Shaham, T. R., Dekel, T., & Michaeli, T. (2019). SinGAN: Learning a generative model from a single natural image. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4570-4580). ⭐⭐⭐⭐⭐
  • CUT: Park, T., Efros, A. A., Zhang, R., & Zhu, J. Y. (2020, August). Contrastive learning for unpaired image-to-image translation. In European Conference on Computer Vision (pp. 319-345). Springer, Cham. ⭐⭐⭐⭐⭐
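
For orientation, the original GAN entry above trains a discriminator D against a generator G; in the commonly used non-saturating variant, G maximizes log D(G(z)) instead of minimizing log(1 - D(G(z))). Below is a minimal, illustrative PyTorch sketch of those two objectives, not tied to any specific paper's code:

```python
import torch
import torch.nn.functional as F

def gan_losses(d_real_logits, d_fake_logits):
    """Non-saturating GAN objectives.

    The discriminator pushes real logits toward 1 and fake logits toward 0;
    the generator labels its own fakes as real to maximize log D(G(z)).
    """
    ones = torch.ones_like(d_real_logits)
    zeros = torch.zeros_like(d_fake_logits)
    d_loss = (F.binary_cross_entropy_with_logits(d_real_logits, ones) +
              F.binary_cross_entropy_with_logits(d_fake_logits, zeros))
    g_loss = F.binary_cross_entropy_with_logits(d_fake_logits, ones)
    return d_loss, g_loss
```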

Disentangled

  • Deblur+Disentangled: Lu, Boyu, Jun-Cheng Chen, and Rama Chellappa. "Unsupervised domain-specific deblurring via disentangled representations." In CVPR. 2019. ⭐⭐⭐⭐⭐
  • One-Shot Unsupervised Image Translation: Cohen, Tomer, and Lior Wolf. "Bidirectional One-Shot Unsupervised Domain Mapping." Proceedings of the IEEE International Conference on Computer Vision. 2019. ⭐⭐⭐⭐

AR/VR

  • Indoor Lighting Estimation: Garon, Mathieu, et al. "Fast Spatially-Varying Indoor Lighting Estimation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐

Person Re-ID

  • IANet: Hou, Ruibing, et al. "Interaction-And-Aggregation Network for Person Re-Identification." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐⭐
  • AlignedReID: Zhang, Xuan, et al. "AlignedReID: Surpassing human-level performance in person re-identification." arXiv preprint arXiv:1711.08184 (2017). ⭐⭐⭐⭐⭐

Distillation

  • Knowledge Distillation: Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. ⭐⭐⭐⭐⭐
  • Deep Mutual Learning: Zhang, Ying, et al. "Deep mutual learning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. ⭐⭐⭐⭐⭐
  • Cooperative learning: Batra, Tanmay, and Devi Parikh. "Cooperative learning with visual attributes." arXiv preprint arXiv:1705.05512 (2017). ⭐⭐⭐
  • Deeply-supervised Knowledge Synergy: Sun, D., Yao, A., Zhou, A., & Zhao, H. (2019). Deeply-supervised Knowledge Synergy. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6997-7006). ⭐⭐⭐⭐⭐
  • ONE: Lan, Xu, Xiatian Zhu, and Shaogang Gong. "Knowledge distillation by On-the-fly Native Ensemble." Proceedings of the 32nd International Conference on Neural Information Processing Systems. Curran Associates Inc., 2018. ⭐⭐⭐⭐⭐
  • Segmentation Distillation: Liu, Yifan, et al. "Structured Knowledge Distillation for Semantic Segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐
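
The knowledge-distillation entry at the top of this list trains a student on a temperature-softened teacher distribution in addition to the hard labels. A minimal, illustrative PyTorch sketch of that combined loss follows; the temperature and weighting values are arbitrary defaults, not taken from the paper:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft-target KL term plus hard-label cross-entropy.

    The KL term is scaled by T^2 so its gradient magnitude stays comparable
    across temperatures.
    """
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```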

Uncertainty

  • aleatoric uncertainty and epistemic uncertainty: Kendall, Alex, and Yarin Gal. "What uncertainties do we need in bayesian deep learning for computer vision?." Advances in neural information processing systems. 2017. ⭐⭐⭐⭐⭐
  • Learning Model Confidence: Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, Patrick Pérez. "Addressing Failure Prediction by Learning Model Confidence" NeurIPS, 2019. ⭐⭐⭐⭐
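
One practical way to probe the epistemic uncertainty discussed in the Kendall & Gal entry is Monte Carlo dropout: keep dropout active at test time and average several stochastic forward passes. The sketch below is only illustrative; calling train() on the whole model is a simplification (it also affects BatchNorm), and a careful implementation would enable only the dropout layers:

```python
import torch

def mc_dropout_predict(model, x, n_samples=20):
    """Average several stochastic forward passes with dropout left on.

    The per-class variance across samples is a rough proxy for epistemic uncertainty.
    """
    model.train()  # keeps nn.Dropout stochastic (simplification: also affects BatchNorm)
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=1) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.var(dim=0)
```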

Transformer

  • Transformer: Vaswani, Ashish, et al. "Attention is all you need." arXiv preprint arXiv:1706.03762 (2017). ⭐⭐⭐⭐⭐
  • Pre-trained image processing transformer: Chen, Hanting, et al. "Pre-trained image processing transformer." arXiv preprint arXiv:2012.00364 (2020). ⭐⭐⭐⭐
  • texture transformer for Super-resolution: Yang, Fuzhi, et al. "Learning texture transformer network for image super-resolution." In CVPR, 2020. ⭐⭐⭐⭐
  • TransUNet: Chen, Jieneng, et al. "TransUNet: Transformers make strong encoders for medical image segmentation." arXiv preprint arXiv:2102.04306 (2021). ⭐⭐⭐⭐
  • Swin transformer: Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., ... & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030. ⭐⭐⭐⭐⭐
  • VOLO: Yuan, Li, et al. "VOLO: Vision Outlooker for Visual Recognition." arXiv preprint arXiv:2106.13112 (2021). ⭐⭐⭐⭐⭐
  • Video Swin Transformer: Liu, Ze, et al. "Video Swin Transformer." arXiv preprint arXiv:2106.13230 (2021). ⭐⭐⭐
  • Focal Transformer: Yang, J., Li, C., Zhang, P., Dai, X., Xiao, B., Yuan, L., & Gao, J. (2021). Focal Self-attention for Local-Global Interactions in Vision Transformers. arXiv preprint arXiv:2107.00641. ⭐⭐⭐⭐⭐
  • Pyramid vision transformer: Wang, W., Xie, E., Li, X., Fan, D. P., Song, K., Liang, D., ... & Shao, L. (2021). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122. ⭐⭐⭐⭐
  • Pyramid vision transformer V2: Wang, W., Xie, E., Li, X., Fan, D. P., Song, K., Liang, D., ... & Shao, L. (2021). PVTv2: Improved Baselines with Pyramid Vision Transformer. arXiv preprint arXiv:2106.13797. ⭐⭐⭐⭐
  • Swin Transformer V2: Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., ... & Guo, B. (2021). Swin Transformer V2: Scaling Up Capacity and Resolution. arXiv preprint arXiv:2111.09883. ⭐⭐⭐⭐
  • DeiT: Touvron, Hugo, et al. "Training data-efficient image transformers & distillation through attention." International Conference on Machine Learning. 2021. ⭐⭐⭐⭐
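
All of the vision transformers above inherit scaled dot-product attention from "Attention is all you need": Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal PyTorch sketch, for reference only:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V.

    q, k, v: (..., seq_len, d_k) tensors; positions where mask is True are excluded.
    """
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    weights = scores.softmax(dim=-1)
    return weights @ v, weights
```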

General Perception

  • Perceiver: Jaegle, Andrew, et al. "Perceiver: General perception with iterative attention." International Conference on Machine Learning. PMLR, 2021. ⭐⭐⭐⭐⭐
  • Perceiver IO: Jaegle, Andrew, et al. "Perceiver IO: A general architecture for structured inputs & outputs." arXiv preprint arXiv:2107.14795 (2021). ⭐⭐⭐⭐
  • Florence: Yuan, Lu, et al. "Florence: A New Foundation Model for Computer Vision." arXiv preprint arXiv:2111.11432 (2021). ⭐⭐⭐⭐⭐
  • Unified-IO: "Unified-IO: A Unified Model for Vision Language and Multi-modal tasks." arXiv preprint arXiv:2206.08916 (2022). ⭐⭐⭐⭐
  • CoCa: Yu, Jiahui, et al. "CoCa: Contrastive captioners are image-text foundation models." arXiv preprint arXiv:2205.01917 (2022). ⭐⭐⭐⭐⭐

Traditional Method

  • Rolling Guidance Filter: Zhang, Q., Shen, X., Xu, L., & Jia, J. Rolling guidance filter. In ECCV, 2014. ⭐⭐⭐⭐⭐

Talks
