awesome-computer-vision-resources

A curated list of awesome resources on computer vision and deep learning for easy study and reference.

Table of Contents

Books
Lessons
Tutorials
Toolbox
Repos
FunnyDemo/APP
Dataset


Books

Machine Learning

Watermelon Book: Machine Learning (Chinese)

A guidebook to machine learning, written by Zhi-Hua Zhou [PDF] [Notes] [HandNotes]

Pumpkin Book: detailed formula derivations for the Watermelon Book [github] [ReadOnline]

Flower Book: Deep Learning
A classic deep learning book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, published in 2017 [PDF]
Pattern Recognition and Machine Learning (PRML)
A classic guide book. [Homepage] [PDFv2006] [PDF-Chinese] [Algorithms_Python] [Algorithms_Matlab]
Methods of Statistical Learning
A classic book by Li Hang for machine learning [PDF] [Codes]
Neural Network and Deep Learning (in Chinese)
The book was compiled by Xi-Peng Qiu, based on his lectures in CS, Fudan University. It includes books, slides, exercises and sample source code. [github] [Homepage]
Dive into Deep Learning
Written by Mu Li, et al. [ReadOnline-Chinese] [Video-Bili] [Video-Youtube]

Codes-MXNet [github]

Codes-PyTorch-Eng [github]

Codes-PyTorch-Chinese [github] [Online] [PDF]

Codes-TensorFlow2.0 [Online] [github]

PDF is from OUCMachineLearning/OUCML. [Others]

Machine Learning Yearning
Written by Andrew Ng. [Homepage] [github-Chinese] [ReadOnline] [PDF]
Mathematics for Machine Learning [Homepage] [PDF]
GANs in Action [ReadOnline] [Codes]
A definitive guide to GANs, written by Jakub Langr and Vladimir Bok.
Neural Networks and Deep Learning, Michael Nielsen [ReadOnline] [ReadOnline-Chinese]
Machine Learning for OpenCV: Intelligent image processing with Python

M. Beyeler (2017). [github]
Parsing Convolutional Neural Networks: A Practical Handbook of Deep Learning (in Chinese) [Homepage&PDF]
Limitations of Interpretable Machine Learning Methods, Tomas Altmann [Online]

This book explains limitations of current methods in interpretable machine learning. The methods include partial dependence plots (PDP), Accumulated Local Effects (ALE), permutation feature importance, leave-one-covariate out (LOCO) and local interpretable model-agnostic explanations (LIME). All of those methods can be used to explain the behavior and predictions of trained machine learning models.
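
Of the methods listed above, permutation feature importance is the simplest to sketch in code: shuffle one feature column at a time and measure how much the model's error grows. The following is a minimal pure-Python illustration, not code from the book; `toy_model`, the data, and the choice of mean squared error as the score are assumptions made for the example.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Return, per feature, the mean increase in MSE after shuffling that column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving all other features untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical "trained model" that depends only on feature 0.
def toy_model(row):
    return 3.0 * row[0]


X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
print(scores)
```

A feature the model ignores (feature 1 here) receives an importance of zero, while a feature it relies on receives a positive value.
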
Interpretable Machine Learning, Christoph Molnar. [Online] [VChinese]
Grokking Deep Learning, Andrew W. Trask, 2019 [Homepage] [BookCode]

Grokking Deep Learning teaches you to build deep learning neural networks from scratch
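AI Algorithm Engineer Handbook (in Chinese), by Hua Xiaozhuan (Alibaba) [Online]

Covers mathematical foundations, statistical learning, deep learning, and common tools; compiled from the author's study notes accumulated over many years.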
The Elements of Statistical Learning, 2008 [PDF](https://esl.hohoweiya.xyz/book/The%20Elements%20of%20Statistical%20Learning.pdf) [CN-Online] [Code]

Computer Vision

14 lectures on visual SLAM
Written by Xiang Gao; the second edition was published in 2019. [Codes]
An Invitation to 3-D Vision From Images to Models
The book is written by Yi Ma, 2001. [PDF]

Mathematics

Introduction to Linear Algebra, Gilbert Strang, 2016 [Homepage]
Immersive Linear Algebra. A linear algebra book with fully interactive figures. [Homepage]
Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Engineering, by Jean Gallier, 2019 (about 1,900 pages). [PDF]

Lessons

Hung-yi Lee: Machine Learning/Deep learning
[Slides] [Video-Youtube] [Video-Bili] [Notes] Lessons' homework analysis [github]
Stanford CS231n: Convolutional Neural Networks for Visual Recognition
A Stanford University course led by Fei-Fei Li. [CoursePage] [Video] [Slides] [Notes] [Notes-Chinese] [One Class Project-VideoObSeg]
Andrew Ng: Deep Learning Five Lessons, Machine Learning Courses
Deep Learning course: [CoursePage] [Notes-Chinese] [Notes2] [Homepage1] [Homepage2]

Deep learning specialization [CoursePage] [Notes]

Machine learning course [Homepage] [Note]

MIT 6.S191: Introduction to Deep Learning [CoursePage] [Video] [CourseCode]
Google Machine Learning Crash Course [CoursePage]
Artificial Intelligence/Machine Learning/Deep Learning (Stanford's CS 221/229/230) [Cheatsheet] [Cheatsheet-git]
MIT Deep Learning course [CoursePage]

It includes deep learning, deep reinforcement learning, autonomous vehicles, and artificial intelligence taught by Lex Fridman
Stanford: Analyses of Deep Learning (STATS 385): Experimental and theoretical analyses of deep learning [CoursePage]
Theories of Deep Learning [CoursePage]

Courses by the Neural Networks and Deep Learning Lab, Moscow Institute of Physics and Technology (MIPT). Videos are in Russian and slides are in English.
CMU Optimization (10-725, Fall 2012), Geoff Gordon and Ryan Tibshirani [CoursePage] [Video]
Numerical Methods, New York University, Fall 2010, Aleksandar Donev [Homepage&Slides]
UIUC: IE598-ODL Optimization Theory for Deep Learning, Ruoyu Sun [Homepage]
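Ping Tan: From Camera Calibration to Visual SLAM (in Chinese) [Video] [Slides]
Deep Learning in Practice (in Chinese), jointly produced by Megvii Research and the Machine Learning Lab of the School of Mathematical Sciences, Peking University [Video]
Training course: Learning and Applying the TensorFlow Deep Learning Framework (in Chinese), by 炼数成金 (Dataguru) [Video] [Note&CourseCode]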

Tutorials

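VALSE Webinar: Computer Vision Talks [Video] [Slides]
AI: The Current State, Tasks, Architecture, and Unification (in Chinese), Song-Chun Zhu, October 2017
40 Years of Artificial Intelligence in China (in Chinese), Zixing Cai, 2016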
Deep learning Blogs:

colah's blog, BAIR blog, Distill, OpenAI blog, Adit Deshpande blog, NeuralDesigner blog

Computer Vision

ICCV2019 Tutorial [Homepage]

Everything You Need to Know to Reproduce SOTA Deep Learning Models, Hang Zhang (Amazon)

From Image Restoration to Enhancement and Beyond, Radu Timofte (ETHZ)

Global Optimization for Geometric Understanding with Provable Guarantees, Luca Carlone (MIT)

Interpretable Machine Learning for Computer Vision, Bolei Zhou (CUHK)

Understanding Color and the In-Camera Image Processing Pipeline for Computer Vision, Michael Brown (York University)

Holistic 3D Reconstruction: Learning to Reconstruct Holistic 3D Structures from Sensorial Data, Zihan Zhou (Penn State University)

Visual Recognition for Images, Video, and 3D, Alexander Kirillov (FAIR)

Large-Scale Visual Place Recognition and Image-Based Localization, Torsten Sattler (Chalmers University of Technology)

Accelerating Computer Vision with Mixed Precision, Ming-Yu Liu (NVIDIA)

3D Deep Learning and Applications in Autonomous Driving, Li Erran Li (Scale AI/Columbia U)

Second- and Higher-order Representations in Computer Vision, Piotr Koniusz (ANU)

Visual Learning with Limited Labeled Data, Rogerio S. Feris (IBM)
Deep Learning for Objects and Scenes, CVPR2017 Tutorial

Bolei Zhou, Kaiming He, Ross Girshick, Xiaogang Wang [HomePage]
Large-scale 3D Reconstruction Tutorial
- ICIG2019: SLAM and 3D Reconstruction, Shu-Han Shen [Page]
- 3D Modeling from unstructured imagery. Thesis of Schönberger, Johannes, 2018 (Colmap) [Page] [PDF]
- Annual Progress in 3D Vision (in Chinese), VALSE2018 [Page]
- Image-based Large-scale 3D Scene Reconstruction (in Chinese), CCCV2017 Tutorial [Slides]
- Large-scale 3D Modeling from Crowd-sourced Data, CVPR 2017 tutorial [Homepage&Slides]
Organizers: Johannes Schönberger, Jared Heinly, Enrique Dunn, Jan-Michael Frahm, Marc Pollefeys.
- Large-scale 3D Reconstruction from Images, ACCV 2016 tutorial [Homepage&Slides]
  
  Organizers: Tianwei Shen, Jinglu Wang, Tian Fang, Long Quan.
Learning to see, Antonio Torralba, 2016 [Slides]

Appendix

[Report] The Past and Present of Image Search (in Chinese), Xian-Sheng Hua, 2016 [Page]
[Blog] A Simple Guide to Semantic Segmentation [Page]
[Blog] FCN and CRF for Semantic Image Segmentation (in Chinese) [Notes]
[Blog] An overview of semantic image segmentation, 2018 [Page] [Chinese]
[Blog] Weakly supervised image segmentation with deep learning [Page]
[Blog] Computer Vision Tutorial: A Step-by-Step Introduction to Image Segmentation Techniques [Page]
[Blog] Camera calibration guidelines [Page]
[Blog] Face detection in Megvii [Note]
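[Blog] Using AI to Detect Global Crude Oil Reserves (in Chinese)
[Blog] Why Did Dual-Camera Phones Appear? (in Chinese)
[Blog] How Smartphone Dual Cameras Work: RGB + RGB/Mono (in Chinese)
[Blog] How Smartphone Dual Cameras Work: Wide-angle + Telephoto (in Chinese)
[Blog] How Smartphone Dual Cameras Work: RGB + Depth (in Chinese)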

Machine Learning

Traditional ML

The Complete Hands-On Machine Learning Crash Course [Page]
Tutorial: 100 Days of ML Coding [Homepage] [Chinese-Version]. 100 Days of Machine Learning Coding as proposed by Siraj Raval.
PCA Tutorial: PCA Principle and Implementation by sklearn [Page]
Tutorial on Basic Probability Distributions for Deep Learning Researchers, with Code [github]
Super Machine Learning Revision Notes, CreateMoMo, 2019 [Homepage]
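A Summary of Common Metrics for Machine Learning Algorithms (in Chinese) [Page]
A Detailed Guide to Evaluation Metrics for Machine Learning Models (in Chinese) [Page]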
43 Rules of Machine Learning: Best Practices for ML Engineering, Martin Zinkevich, Google [PDF] [Note]
Tutorial: Choosing the Right Metric for Evaluating Machine Learning Models, Alvira Swalin, 2018 [Page]
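Fundamentals of Machine Learning

Some basic concepts, Bayes Method Notes, Receptive field

Linear regression, Feature map size and receptive field calculation (in Chinese)
[Blog] An Illustrated Guide to Support Vector Machines (in Chinese)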

Deep Learning

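Repo: DeepLearning-500-questions: a comprehensive collection of deep learning fundamentals (in Chinese)

Repo: CNN-Visualization: understanding and visualizing CNNs (in Chinese)

Understanding Deep Learning in One Day (in Chinese), Hung-yi Lee [Slides]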
Everything You Need to Know to Reproduce SOTA Deep Learning Models, ICCV 2019 Tutorial, by MXNet Team [Homepage]
Advancements in Graph Neural Networks, Jure Leskovec [Slides]
A review of CNN: What Do We Understand About Convolutional Networks? [PDF]
A Tutorial on Deep Learning, Quoc V. Le, 2015 [Part1] [Part2]
26 Things I Learned in the Deep Learning Summer School (organized by Yoshua Bengio), 2015 [Page] [Note]
Deep learning models tutorial [github]

A collection of deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks, developed by Sebastian Raschka. It mainly covers traditional ML, multilayer perceptrons, CNNs, metric learning, autoencoders, GANs, RNNs, training, PyTorch, and TensorFlow.
The Neural Network Zoo, Fjodor Van Veen [Homepage] [Page-Chinese]

An overview of neural network architectures, from fundamentals to derivations.
Stacked Capsule Autoencoders, Geoffrey Hinton, 2019AAAI report. [Page]
An introduction to GCN (in Chinese) [Page]

Blog

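A Survival Guide for Deep Learning Engineers (in Chinese) [Page]: setting up DL workstations, operating systems, DL library environments, datasets, classic models, etc.
The Past and Present of Deep Learning in One Picture (in Chinese)
An overview of gradient descent optimization algorithms
Building Deep Neural Networks: 20 "Immature" Tips (in Chinese)
What Are the Tricks for Tuning Deep Learning Hyperparameters? (in Chinese)
CNN Architecture Design Tips: Balancing Speed, Accuracy, and Engineering Implementation (in Chinese)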
[Modules]
A Glimpse of CNN Backbone [Page]
A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN [Homepage]
An Overview of Siamese Network Methods: From SiamFC to the SiamRPN Series (in Chinese) [Page]
Siamese Networks (in Chinese) [Page]
UNet and medical segmentation [Page]
Advances of Softmax-based Loss for Face Recognition [Page]
60+ SOTA ImageNet Models [Page]
CNN Tips for Custom Modes [Slides]
Complexity Analysis of CNN [Page]
Analysis of Dropout [Page] [Chinese]
Convolutional Autoencoders [Note1] [Note2] [Note3]
A Step by Step Backpropagation Example, 2015 [Page]
Neural Networks and the Backpropagation Algorithm, 2012 [Page] [Code-related]
Transposed Convolution, Fractionally Strided Convolution or Deconvolution [Homepage]
A guide to receptive field arithmetic for Convolutional Neural Networks [Page] [Note]
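
The receptive field arithmetic covered by the last entry can be sketched in a few lines. This is a minimal sketch, assuming a plain chain of convolution/pooling layers without dilation, each described by a hypothetical `(kernel_size, stride)` pair. It uses the standard recurrences `rf += (kernel - 1) * jump` and `jump *= stride`, where `jump` is the distance in input pixels between adjacent output units.

```python
def receptive_field(layers):
    """Return (receptive_field, jump) of the last layer's units.

    layers: list of (kernel_size, stride) pairs, ordered from input to output.
    Padding does not change the receptive field size, so it is not modeled.
    """
    rf, jump = 1, 1  # a single input pixel sees itself; adjacent pixels are 1 apart
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra kernel tap widens the field by one jump
        jump *= stride             # striding spreads output units further apart
    return rf, jump


# Two 3x3 convolutions, a 2x2 stride-2 pooling layer, then another 3x3 convolution.
print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)]))  # (10, 2)
```
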
[Transfer learning]
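Transfer learning tutorial: the "Xiao Wang Loves Transfer" series (小王爱迁移, in Chinese), Jindong Wang [Page]
A Concise Handbook of Transfer Learning (in Chinese), Jindong Wang, 2019 [PDF] [github] [Errata]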
Transfer learning and the art of using Pre-trained Models in Deep Learning [Page] [Chinese-version]
[GAN]
CVPR 2018 Tutorial on GANs [Homepage]
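A GAN Study Guide: From Principles to Building a Generation Demo (in Chinese) [Page]
Optimization Algorithms from a Dynamics Perspective: The Third Stage of GANs (in Chinese) [Page]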

Tools Guidance

Code Practice

OpenCV-Python Tutorial in Chinese (OpenCV3.0, 2014) [PDF] [ReadOnline] [Download] The original official OpenCV tutorial for Python, translated by Lihui Duan.
CV and DL Tutorial based on OpenCV+Python [Homepage] An online tutorial for CV and DL based on OpenCV-Python
Face swapping with Python, dlib, and OpenCV [Homepage] [Codes]
PCA usage in scikit learn library [Page] [github]
An Introduction to Machine Learning Algorithms [Page]
Deep Learning Using C++ [Page]
Snooper: debug Python code without print statements [PySnooper-Note] [PySnooper-github] [TorchSnooper-Note] [TorchSnooper-github]
homemade-machine-learning [github]

Python examples of popular machine learning algorithms with interactive Jupyter demos and explanations of the underlying math, including linear/logistic regression, K-means clustering, anomaly detection using a Gaussian distribution, and the multilayer perceptron (MLP). Each algorithm has an interactive Jupyter Notebook demo that lets you play with the training data and algorithm configurations and immediately see the results, charts, and predictions in your browser. In most cases the explanations are based on Andrew Ng's machine learning course.
All mainstream machine learning models hand-written in NumPy (deep learning modules) (in Chinese) [github]
Remote model training with VS Code (in Chinese) [Page]
Setting up your own model-training environment with Docker + VS Code (in Chinese) [Page]
[CodeExercise]
30 seconds of code: a collection of short code snippets in various programming languages, including Python, CSS, and PHP.
[Blog] How to analyse 100 GB of data on your laptop with Python [English] [Chinese]
15 Tips for More Elegant Python Code (in Chinese)

DL Scratch

ICCV2019 Tutorial: Everything You Need to Know to Reproduce SOTA Deep Learning Models, Hang Zhang, Mu Li, etc. Amazon [Homepage]
The keys of Deep Learning in 100 lines of code: Predict malignancy in cancer tumors with a neural network. Build it from scratch in Python [Homepage]
Build your own powerful deep learning environment quickly hand by hand [Homepage]
A numpy implementation of a Convolutional Neural Network [example1] [example2] [example3] [Blog]
An Introduction to Dropout for Regularizing Deep Neural Networks [Page]
Faster RCNN Source Code Practice [Page]
CNN introduction and practice [Page]
An Introduction to BP [Page]
Activation functions [ReluNote] [26Functions] [Overview]
Evaluation of mAP and Code [Page]
Convolution structure [DeformConv Note] [12+ Convolution Method]
Transformer model and code example [Page]
Variant CNN models and codes [Page] [github]
Gradient Deviation and Code [Page]
Run FCN Network Efficiently for large Images [Page]
Image Augmentation for Deep Learning using PyTorch – Feature Engineering for Images [github]
Computer Vision Tutorial: Implementing Mask R-CNN for Image Segmentation (with Python Code) [github]
Setting Up a Deep Learning Development Environment (in Chinese) [Page]
Visualizing Convolution Neural Networks using PyTorch [Page]
[Blog] Building Convolutional Neural Network using NumPy from Scratch, 2018 [Page] [Chinese] [Code]

Training Tricks

The complete beginner’s guide to data cleaning and preprocessing [Homepage] [Chinese]
Practical Advice for Building Deep Neural Networks [Page-Eng] [Page-Chinese]
Tricks for Training Networks [Page]
A Recipe for Training Neural Networks [Page1] [Page2]
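Data Augmentation Tricks for Semantic Segmentation (in Chinese) [Page]
Experience and Advice for Deep Learning Projects (To do & Not to do) (in Chinese) [Page]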
How to Train Your ResNet (8 parts) [Page] [Chinese]
Loss issues in validation and training set [Page]
9 Tips for Training Fast Neural Networks with PyTorch (in Chinese) [Page]
How Batch Size Affects the Training Process (in Chinese) [Page]
Keep Calm and train a GAN. Pitfalls and Tips on training Generative Adversarial Networks [Page] [Note]
Advice on Building Deep Neural Networks (in Chinese) [Page]
Deep Learning Rules of Thumb [Blog] [Chinese]
4 Proven Tricks to Improve your Deep Learning Model’s Performance [Page]

Toolbox

TawbaWare: Collection of Digital Camera Software and Photography

PyTorch

Awesome-PyTorch-Chinese: Resource warehouse
Awesome pytorch list [Page]. A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
PyTorch Tutorial Chinese [Homepage]
Hands-on tour to deep learning with PyTorch [Homepage]
Experiences in Using PyTorch [Page]
PyTorch-OpCounter: count model parameters and FLOPs [Intro] [github]
Common PyTorch Pitfalls (in Chinese) [Page1] [Page2]
Mixed Precision Training Accelerator for PyTorch: Apex [Page]
Accelerate PyTorch Dataloader [Page]
PyTorch Tricks (11 tricks) [github]
An Introduction to PyTorch DataLoader [Page]
PyTorch parallel training based on multiple GPU [Page]
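PyTorch Cookbook: commonly used code [Page]
Understanding a Simple PyTorch Implementation from Scratch (in Chinese) [Page]
PyTorch Handbook (in Chinese) [github]
Getting Started: A Practical Tutorial on PyTorch Model Training (in Chinese), Tingsong Yu, 2018.12 [PDF] [ExampleCode]

Aimed at practical application and engineering development, it explains the operations, practical problems, and methods involved in model training, and is well suited to getting started with PyTorch. The book covers in detail 22 data augmentation methods, 10 weight initialization methods, 17 loss functions, 6 optimizers, and 13 TensorboardX methods.
Commonly Used Code Snippets for the PyTorch Deep Learning Framework (in Chinese) [Page]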

TensorFlow

Deep Learning with TensorFlow book [github] [PDF]

An open-source deep learning book in Chinese, based on the TensorFlow 2.0 framework (Chinese title: TensorFlow 2.0深度学习). It contains the book and supporting source code. It was written by Longlong from the AI training institute of AI101Edu.
Torch2TRT [github]

torch2trt is a PyTorch to TensorRT converter which utilizes the TensorRT Python API.

Label tools

Summary of label tools [Page1] [Page2]

labelme
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation). It is suitable for image segmentation tasks. Reference: LabelMe: a Database and Web-based Tool for Image Annotation, IJCV 2008
labelImg
LabelImg is a graphical image annotation tool for labeling object bounding boxes in images.
Annotations are saved as XML files in PASCAL VOC format, the format used by ImageNet.
It is written in Python and uses Qt for its graphical interface.
It also supports the YOLO format.
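
The two annotation formats mentioned above describe the same box differently: PASCAL VOC stores pixel corners (`xmin`, `ymin`, `xmax`, `ymax`), while YOLO stores the box center and size normalized by the image dimensions. The following is a minimal conversion sketch, not code from LabelImg itself; the function names and the six-decimal text formatting are assumptions for illustration.

```python
def voc_to_yolo(box, image_width, image_height):
    """Convert a VOC pixel box (xmin, ymin, xmax, ymax) to normalized
    YOLO coordinates (x_center, y_center, width, height)."""
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2.0 / image_width
    y_center = (ymin + ymax) / 2.0 / image_height
    width = (xmax - xmin) / image_width
    height = (ymax - ymin) / image_height
    return x_center, y_center, width, height


def to_yolo_line(class_id, box, image_width, image_height):
    """Format one annotation as a YOLO label line: 'class xc yc w h'."""
    coords = voc_to_yolo(box, image_width, image_height)
    return f"{class_id} " + " ".join(f"{c:.6f}" for c in coords)


# A 200x200 pixel box inside a 400x800 image.
print(to_yolo_line(0, (100, 200, 300, 400), 400, 800))
# 0 0.500000 0.375000 0.500000 0.250000
```
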
Curve-GCN

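An efficient interactive image annotation method from the University of Toronto, offered as a unified web-based AI annotation tool that outperforms Polygon-RNN++. It runs in 29.3 ms in automatic mode and 2.6 ms in interactive mode, which is 10x and 100x faster than Polygon-RNN++ respectively. It can be used for instance segmentation labeling, video annotation, and interactive mesh labeling of LiDAR or point cloud data. Reference paper: Fast Interactive Object Annotation with Curve-GCN, CVPR2019.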
Polygon-RNN++

Efficient Annotation of Segmentation Datasets, released in CVPR2018
coco-annotator [github]

COCO Annotator is a web-based image annotation tool designed for versatility and for efficiently labeling images to create training data for image localization and object detection. It provides many distinct features, including the ability to label an image segment (or part of a segment), track object instances, label objects with disconnected visible parts, and efficiently store and export annotations in the well-known COCO format.
FIAT: Fast Image Data Annotation Tool [github]

Fast Image Data Annotation Tool (FIAT) enables image data annotation, data augmentation, data extraction, and result visualization/validation.
LOST: Label Objects and Save Time [github]

LOST (Label Objects and Save Time) is a flexible web-based framework for semi-automatic image annotation. It provides multiple annotation interfaces for fast image annotation. From the paper LOST: A flexible framework for semi-automatic image annotation.
Vatic: Video Annotation Tool from Irvine, California [Homepage]

It is designed for video object annotation. You only need to annotate the object location every 100 frames; object locations in the other frames are detected automatically by OpenCV tracking.
yolo_mark [github]

An annotation tool for bounding boxes used to train YOLO v2 detection models; it depends on OpenCV.
Imglab [github] [WebPage]

A web-based tool to label images for objects that can be used to train dlib or other object detectors. It can speed up and simplify the image labeling/annotation process, with multiple supported formats.
CVAT: Computer Vision Annotation Tool [github]

CVAT is a free, online, interactive video and image annotation tool for computer vision. According to its developers, their team uses it to annotate millions of objects with different properties, e.g. for detection and segmentation.
MRLabeler: an annotation tool for object detection, used for VOC/YOLO-style datasets; it depends on OpenCV.
LabelImg2 [github]

It is a graphical image annotation tool, written in Python, that uses Qt for its graphical interface. Annotations are saved as XML files in PASCAL VOC format, the format used by ImageNet. It supports rotated boxes.
roLabelImg [github]

It is a graphical image annotation tool that can label rotated rectangle regions, rewritten from 'labelImg'.
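
To make the idea of a rotated rectangle annotation concrete, the sketch below converts a rotated box into its four corner points. It assumes a hypothetical parameterization `(cx, cy, width, height, angle)` with the angle in radians; the exact fields stored by roLabelImg or LabelImg2 are not stated here, so treat the names as illustrative.

```python
import math


def rotated_box_corners(cx, cy, width, height, angle):
    """Return the four corners of a rectangle centered at (cx, cy) and
    rotated by `angle` radians about its center."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    half_w, half_h = width / 2, height / 2
    corners = []
    # Corner offsets in the box's own (unrotated) frame.
    for dx, dy in ((-half_w, -half_h), (half_w, -half_h),
                   (half_w, half_h), (-half_w, half_h)):
        # Standard 2D rotation of the offset, then translation to the center.
        x = cx + dx * cos_a - dy * sin_a
        y = cy + dx * sin_a + dy * cos_a
        corners.append((x, y))
    return corners


# With angle 0 the result is an ordinary axis-aligned box.
print(rotated_box_corners(10, 20, 4, 2, 0.0))
```

With `angle = 0.0` the corners reduce to the usual axis-aligned bounding box, which is a quick sanity check for any rotated-box pipeline.
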

Libraries

Collection

[Blog] Top 20 Python Libraries for Data Science in 2018. Covers commonly used core libraries, visualization, ML, DL, distributed DL, NLP, and data scraping.

Machine Learning

All Algorithms implemented in Python

Including data structure and machine learning [Homepage] [github]
Dimensionality reduction Codes [github]

Includes 11 dimensionality reduction (DR) algorithms implemented in Python, such as 1) linear DR: PCA, ICA, LDA, LFA, LPP (linear version of LE); 2) non-linear DR: kernel-based (KPCA, KICA, KDA) and eigenvalue-based/manifold learning (ISOMAP, LLE, LE, LPP, LTSA, MVU, AutoEncoder). These are demos for learning only, with limited performance.
EnsembleSVM [Homepage] [github]

A Library for Ensemble Learning Using Support Vector Machines. The EnsembleSVM library offers functionality to perform ensemble learning using Support Vector Machine (SVM) base models. In particular, we offer routines for binary ensemble models using SVM base classifiers.
Dlib [Homepage] [github]

It is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems. It is often used to detect vehicles, persons, etc.

Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research, 2009
Graph Cut for Image Segmentation [Matlab]

CVIP Libs

Peter's MATLAB Functions for Computer Vision and Image Processing [Homepage]
BoofCV

BoofCV is an open source library written from scratch for real-time computer vision. Its functionality covers a range of subjects, low-level image processing, camera calibration, feature detection/tracking, structure-from-motion, fiducial detection, and recognition.
TeleSculptor/MAP-Tk: Aerial Photogrammetry Application powered by KWIVER [Code] [Intro]

An open source C++ collection of libraries and tools for making measurements from aerial video. It focuses on estimating the camera flight trajectory and a sparse 3D point cloud of the scene. Unlike Bundler, VisualSFM and OpenMVG, MAP-Tk exploits temporal order and continuity in video from GPS or GCPs.
VIGRA: Vision with Generic Algorithms [Homepage] [github]

Generic Programming for Computer Vision. It's an image processing and analysis library that puts its main emphasis on customizable algorithms and data structures. VIGRA is especially strong for multi-dimensional images, because many algorithms (e.g. filters, feature computation, superpixels) are implemented for arbitrary high dimensions.
GSL: GNU Scientific Library [Homepage]

The GSL is a numerical library for C and C++ programmers. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total with an extensive test suite.
AutoFlip: an automatic video cropping pipeline built on top of MediaPipe by Google AI [Page] [Blog] [Intro]

NN Framework

All-in-One, ComputerVision, ImageProc, NLP

- NN Models Zoo

ONNX Model Zoo [github]

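A general collection of pretrained deep learning models. The project gathers state-of-the-art pretrained deep learning models, all provided by Facebook and Microsoft in the ONNX (Open Neural Network Exchange) format, which allows models to be moved between frameworks. Each model comes with a Jupyter Notebook covering model training, inference, datasets, and references.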
Model Zoo: Discover open source deep learning code and pretrained models
Deep Learning Algorithms with Tensorflow [github]

Deep learning algorithms carefully implemented in TensorFlow.

Algorithms: logistic regression, MLP, CNN, AE, SDA, RBM, DBN;

CNN models: MobileNet, SqueezeNet, ResNet, ShuffleNet, DenseNet, YOLO, SSD.
Pretrained-models.pytorch [github]

Pretrained ConvNets for pytorch: NASNet, ResNeXt, ResNet, InceptionV4, InceptionResnetV2, Xception, DPN, etc.
PytorchInsight [github]

This is a pytorch lib with state-of-the-art architectures, pretrained models and real-time updated results. It includes: Attention Models: SENet, SKNet, CBAM, GCNet, BAM, SGENet, SRMNet; Non-Attention Models: OctNet, imagenet_tricks.py, e-shifted L2 Regularizer, Generalization Bound Regularizer, mixup, CutMix.
Pytorch-image-models [github]

PyTorch image models, scripts, pretrained weights -- (SE)ResNet/ResNeXT, DPN, EfficientNet, MixNet, MobileNet-V3/V2/V1, MNASNet, Single-Path NAS, FBNet, and more.

- All-in-One

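MagNet: a high-level deep learning API based on PyTorch that aims to reduce boilerplate code and improve the development efficiency of deep learning projects [github]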
Darwin: A Framework for Machine Learning Research and Development [github]
NAS-Projects [github]

11 neural architecture search (NAS) algorithms implemented in PyTorch.
Augmentor [Homepage] [github]

Image augmentation library in Python for machine learning.
//
TFMA: TensorFlow Model Analysis [github]

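An open-source project released by Google for analyzing TensorFlow models. It aims to help TensorFlow users analyze their trained models; users can evaluate models on large amounts of data in a distributed manner, using the metrics defined in the Trainer.
Flashtorch: neural network visualization [github] [Note]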
//
PyTorch Elastic: Pytorch-based framework for distributed training [github]

PyTorch Elastic (torchelastic) is a framework that enables distributed training jobs to be executed in a fault tolerant and elastic manner. It provides the primitives and interfaces for you to write your distributed PyTorch job in such a way that it can be run on multiple machines with elasticity; that is, your distributed job is able to start as soon as min number of workers are present and allowed to grow up to max number of workers without being stopped or restarted.
GPipe: Training Giant Neural Nets using Pipeline Parallelism [PDF] [Note]

GPipe is written in TensorFlow and will be open sourced by Google.
//
Dopamine: a TensorFlow-based reinforcement learning framework with a small, easily accessible codebase [github]
TensorLayer [Docs] [github]

It is a novel TensorFlow-based deep learning and reinforcement learning library designed for researchers and engineers. It provides an extensive collection of customizable neural layers to build complex AI models. TensorLayer was awarded the 2017 Best Open Source Software by the ACM Multimedia Society. Reference paper: TensorLayer: A Versatile Library for Efficient Deep Learning Development, ACMMM2017
//
TransmogrifAI: an end-to-end AutoML library from Salesforce, written in Scala and running on Spark [github]
NNI: Neural Network Intelligence [github]

An open source AutoML toolkit for neural architecture search and hyper-parameter tuning. NNI is a toolkit to help users run automated machine learning (AutoML) experiments.
AutoGluon: AutoML Toolkit for Deep Learning, developed by MXNet/Gluon. [Page] [github]
//
GraphPipe [github]

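A general deep learning model deployment framework open-sourced by Oracle. It aims to simplify the deployment of machine learning models and free users from framework-specific model implementations. GraphPipe also provides a common API across deep learning frameworks, out-of-the-box deployment solutions, and strong performance.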
AI Lab Container [github]

All-in-one AI development container for rapid prototyping, compatible with the nvidia-docker GPU-accelerated container runtime as well as JupyterHub. This is designed as a lighter and more portable alternative to various cloud provider "Deep Learning Virtual Machines". Get up and running with a wide range of machine learning and deep learning tasks by pulling and running the container on your workstation, on the cloud or within JupyterHub.
//
Tiny-dnn [github]

A header-only, dependency-free deep learning framework in C++14. It is suitable for deep learning on limited computational resources, embedded systems, and IoT devices.
QNNPACK [github]

QNNPACK (Quantized Neural Networks PACKage) is a mobile-optimized library for low-precision high-performance neural network inference. QNNPACK provides implementation of common neural network operators on quantized 8-bit tensors. QNNPACK is not intended to be directly used by machine learning researchers; instead it provides low-level performance primitives (e.g. convolution, pooling, sigmoid, ReLU, etc.) for high-level deep learning frameworks.
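
To illustrate what "quantized 8-bit tensors" means, here is a minimal sketch of the common affine (scale and zero-point) quantization scheme for unsigned 8-bit values. It is a generic illustration, not QNNPACK's API; the function names, `scale`, and `zero_point` values are assumptions chosen for the example.

```python
def quantize(values, scale, zero_point):
    """Map real values to uint8 codes: q = clamp(round(x / scale) + zero_point, 0, 255)."""
    return [min(255, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map uint8 codes back to approximate real values: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


scale, zero_point = 0.5, 128
codes = quantize([-1.0, 0.0, 0.74, 100.0], scale, zero_point)
print(codes)                                 # [126, 128, 129, 255]
print(dequantize(codes, scale, zero_point))  # [-1.0, 0.0, 0.5, 63.5]
```

The round trip shows the two sources of error in low-precision inference: rounding (0.74 becomes 0.5) and saturation (100.0 is clamped to the largest representable value, 63.5).
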
//
Pytorch Project Template [github]

A scalable template for PyTorch projects, with examples in image segmentation, object classification, GANs and reinforcement learning.
Pytorch-template [github]

A best-practice template architecture for PyTorch projects that helps you get into your main project faster and focus on your core model. The corresponding TensorFlow template: Tensorflow Project Template.
//
tfpyth [github]

Putting TensorFlow back in PyTorch, back in TensorFlow (with differentiable TensorFlow PyTorch adapters). It allows you to wrap a TensorFlow graph to make it callable (and differentiable) through PyTorch, and vice-versa, using simple functions.
Neural Tangents: Fast and Easy Infinite Neural Networks in Python [github]

Neural Tangents is a high-level neural network API for specifying complex, hierarchical, neural networks of both finite and infinite width. Neural Tangents allows researchers to define, train, and evaluate infinite networks as easily as finite ones. Infinite (in width or channel count) neural networks are Gaussian Processes (GPs) with a kernel function determined by their architecture.
Talos: Hyperparameter Optimization for Keras [github]

Talos radically changes the ordinary Keras workflow by fully automating hyperparameter tuning and model evaluation. Talos exposes Keras functionality entirely and there is no new syntax or templates to learn.
FAIRScale [github]

A PyTorch extension library for high-performance, large-scale training, released by Facebook. It supports pipeline parallelism, optimizer state sharding, etc. Refer to Intro.

- ComputerVision

PySlowFast: Video recognition and detection codebase in PyTorch, released by FAIR at ICCV 2019 [github] [Tutorial]
TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision [github] [Intro]

Implements many computer vision papers in PyTorch, including image classification ...
PyRobot: An Open Source Robotics Research Platform [Homepage] [github]

PyRobot is a lightweight, high-level interface that provides hardware-independent APIs for robotic manipulation and navigation. Reference: PyRobot: An Open-source Robotics Framework for Research and Benchmarking, arXiv2019.6
SLM Lab [Homepage] [github]

SLM Lab is a software framework for reproducible reinforcement learning (RL) research. It enables easy development of RL algorithms using modular components and file-based configuration. It also enables flexible experimentation completed with hyperparameter search, result analysis and benchmark results.
Classy Vision: A PyTorch framework for image and video classification [Homepage] [github]

Classy Vision is a new end-to-end, PyTorch-based framework for large-scale training of state-of-the-art image and video classification models, released by Facebook in Dec, 2019.

According to its developers, previous computer vision libraries focused on providing components for users to build their own research frameworks. While this approach offers flexibility for researchers, in production settings it leads to duplicated effort and requires users to migrate research between frameworks and to relearn the minutiae of efficient distributed training and data loading. Classy Vision aims to offer a better solution for training at scale and for deploying to production.
TorchSat: Pytorch-based satellite imagery analysis framework [Homepage] [github]

TorchSat is an open-source deep learning framework for satellite imagery analysis based on PyTorch. The project was started in 2019 and is still a work in progress. Highlights:
- Support for multi-channel (> 3 channels, e.g. 8 channels) images and TIFF files as input;
- Data augmentation methods for classification, semantic segmentation, and object detection;
- Models for satellite vision tasks, e.g. ResNet, DenseNet, UNet, PSPNet, SSD, Faster RCNN...
- Loaders for many common satellite datasets;
- Training scripts for common satellite vision tasks.
TorchGAN [Docs] [github]

TorchGAN is a Pytorch based framework for designing and developing Generative Adversarial Networks. This framework has been designed to provide building blocks for popular GANs and also to allow customization for cutting edge research.

Reference: TorchGAN: A Flexible Framework for GAN Training and Evaluation, arXiv2019
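Evolute: an easy-to-use evolutionary algorithm framework. It defines basic structures such as individuals and populations, and implements common evolutionary operations such as selection, reproduction, mutation, and update [github]
Tensorcom: fast loading of training data into deep learning frameworks [github]

Note: NVIDIA Tensorcom is a way of loading training data into deep learning frameworks quickly and portably. You can write a single data loading/augmentation pipeline and train one or more jobs in the same or different frameworks with it. Keras and PyTorch can use it.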
KAIR: Image Restoration Toolbox based on PyTorch. [github]

Training and testing codes for USRNet, DnCNN, FFDNet, SRMD, DPSR, MSRResNet, ESRGAN, IMDN.

- ImageProc

Image Quality Assessment [github] [Usage]

Convolutional Neural Networks to predict the aesthetic and technical quality of images.

- NLP

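NLP.js: a Node.js-based natural language processing toolkit that supports a range of NLP tasks, including word segmentation, stemming, sentence analysis, named entity recognition, text classification, and text generation [github]
Texar: a TensorFlow-based text generation toolkit that supports tasks such as machine translation, dialogue systems, text summarization, and language modeling, and lets researchers and developers quickly set up experiments [github]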

Task-specific Libs

Face, detection, segmentation, 3D vision, crowd counting, optimization

Face Related

libfacedetection, MTCNN, ZQCNN4
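InsightFace: an open-source library for 2D and 3D face analysis (including detection, recognition, alignment, and attribute recognition), with algorithms such as RetinaFace.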
face.evoLVe: High-Performance Face Recognition Library based on PyTorch
LFFD: A Light and Fast Face Detector

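A lightweight, fast face detector. It can be used not only for face detection but also as a strong single-class object detector. Its main feature is that it is very fast while achieving accuracy close to the state of the art.

From LFFD: A Light and Fast Face Detector for Edge Devices, 2019 [github]

@SyGoing has implemented deployment-friendly versions of LFFD in C++, optimized with NCNN, MNN and OpenVINO [NCNN Version] [MNN Version] [OpenVINO]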
Mini-caffe [github]

Minimal runtime core of Caffe, Forward only, GPU support and Memory efficiency.
ZQCNN [github]

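A forward-only library that is faster than mini-caffe and was written with reference to it. It includes a fast face detection model, a 106-point landmark model, a head detection model, and a more accurate 106-point model.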
HyperLandmark [github]

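A deep-learning-based face landmark algorithm (106 facial keypoints). This is a powerful open-source face landmark project, including preprocessing steps such as facial beautification, makeup, Crycocelle vivo detection and face landmarking. The project is based on the traditional SDM algorithm, runs on Windows, and modifies the open-source code to simplify some of the test code and improve the code structure.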
Pytorch Face Landmark Detection [github]

Implementation of face landmark detection with PyTorch. The models were trained using coordinate-based or heatmap-based regression methods. It supports 68-point/39-point landmark inference, different backbone networks and face detectors, ONNX inference, and heatmap-based inference.

Detection

[Text Detection] Tesseract: open-source OCR engine, originally developed at HP, open-sourced in 2005 and sponsored by Google from 2006.

Note: OCR and text recognition with OpenCV and Tesseract

Segmentation

Semseg: PyTorch Semantic Segmentation [github]

This repository is a PyTorch implementation for semantic segmentation / scene parsing. The code is easy to use for training and testing on various datasets. The codebase mainly uses ResNet50/101/152 as backbone and can be easily adapted to other basic classification structures. Implemented networks include PSPNet and PSANet, which ranked first in the ImageNet Scene Parsing Challenge 2016 @ECCV16, the LSUN Semantic Segmentation Challenge 2017 @CVPR17 and the WAD Drivable Area Segmentation Challenge 2018 @CVPR18. Sample experimented datasets are ADE20K, PASCAL VOC 2012 and Cityscapes.
BodyPix 2.0: Person Segmentation in the Browser, 2019 [github]

This package contains a standalone model called BodyPix, as well as some demos, for running real-time (multiple) person and body part segmentation in the browser using TensorFlow.js. It can segment an image into pixels that are and are not part of a person, and into pixels that belong to each of twenty-four body parts.
Lightweight-Segmentation Libs [github]

Lightweight models for real-time semantic segmentation (including mobilenetv1-v3, shufflenetv1-v2, igcv3, efficientnet).
PixelLib: a simple segmentation library [github]

Pixellib is a library for performing segmentation of objects in images and videos. It supports semantic segmentation and instance segmentation. Intro

3D Vision

3D/Point Cloud Processing
- Geometry++ [Homepage] [Docs]
  
  Geometry++ is a geometry library for processing 3D data (point clouds, meshes). It contains the most fundamental algorithms for 3D data processing and can serve as the geometry engine of 3D data processing software.
- Cilantro: C++ based point cloud processing library. A Chinese introduction is available at [page]
  
  Other point cloud processing libraries include PCL (Point Cloud Library), Open3D and SLAM6D
- Point cloud registration: Libicp, libpointmatcher, g-icp, n-icp
- TEASER++: a fast and certifiably-robust point cloud registration library written in C++, with Python and MATLAB bindings. [github] Intro
3D Reconstruction
- MonocularSfM [github]
- OpenVSLAM [github] [Tutorial]
  
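  Developed by Japan's National Institute of Advanced Industrial Science and Technology. OpenVSLAM is a monocular, stereo and RGB-D visual SLAM system. Its main features: it is compatible with various camera types and can be easily customized for others; created maps can be stored and loaded, and OpenVSLAM can then localize new images against the prebuilt maps; the system is fully modular; and code snippets are provided to help understand its core functionality.
  
  OpenVSLAM is built on indirect SLAM algorithms with sparse features, such as ORB-SLAM, ProSLAM and UcoSLAM. It can handle images captured with various camera models, such as perspective, fisheye and equirectangular cameras (surround multi-camera systems). Users can also easily implement support for other camera models (e.g. dual fisheye, catadioptric).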
- Exiv2: Image metadata library and tools
3D DeepLearning
- Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research [github] [PDF] Released by NVIDIA in 2019, it includes differentiable 3D modules, e.g. DGCNN, DIB-R, GEOMetrics, Image2Mesh, Occupancy Network, Pixel2Mesh, PointNet, PointNet++, MeshEncoder, GraphResNet, SoftRas, Neural 3D Mesh Renderer. Main supported 3D data types: Triangle Meshes, Quad Meshes, Voxel Grids, Point Clouds and Signed Distance Functions (SDF), along with transformations between them.
- kornia [github] [Doc] [Note]
  
  It is an open-source differentiable computer vision library built on PyTorch and inspired by OpenCV. OpenCV is not differentiable, so it is mostly used for preprocessing and cannot be embedded in the training process. Kornia is designed to solve this problem.
- PyTorch3D [Homepage] [github]
  
  PyTorch3D is FAIR's library of reusable components for deep learning with 3D data. It provides efficient, reusable components for 3D Computer Vision research with PyTorch.
- OpenPCDet: a PyTorch-based codebase for 3D object detection from point clouds, developed by MMLab; see Intro for details
SLAM
- pySLAM [github]
  
  pySLAM contains a python implementation of a monocular Visual Odometry (VO) pipeline. It supports many classical and modern local features, and it offers a convenient interface for them. Moreover, it collects other common and useful VO and SLAM tools.

Crowd Counting

C^3 Framework [github]

An open-source PyTorch code for supervised crowd counting. It provides the performances of some basic networks and classic algorithms on the mainstream datasets.

Reference: C^3 Framework: An Open-source PyTorch Code for Crowd Counting, arXiv2019. Chinese blog: C^3 Framework series, part 1: an open-source PyTorch-based crowd counting framework [Link]

Optimization

PyMaxflow [Code] [Docs]

PyMaxflow is a Python library for graph construction and maxflow computation (commonly known as graph cuts). The core of this library is the C++ implementation by Vladimir Kolmogorov, which can be downloaded from his homepage.
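
To make concrete what "maxflow computation" means, here is a minimal, dependency-free sketch of maximum flow using the Edmonds-Karp algorithm (breadth-first augmenting paths). It does not use PyMaxflow's API and is not the algorithm in Kolmogorov's C++ code; the function name `max_flow` and the toy graph are assumptions made for illustration.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Return the maximum flow value from source to sink (Edmonds-Karp).

    capacity: dict mapping (u, v) -> non-negative edge capacity.
    """
    # Residual capacities, with a zero-capacity reverse edge for every edge.
    residual = {}
    neighbors = {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbors.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left: the flow is maximal

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[(u, v)])
            v = u
        # Push the bottleneck amount of flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        total += bottleneck


# Toy graph: two "pixel" nodes a and b between the terminals s and t.
toy = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
flow_value = max_flow(toy, "s", "t")
print(flow_value)
```

By the max-flow/min-cut theorem, the returned value equals the cost of the cheapest cut separating `s` from `t`, which is the quantity graph-cut segmentation methods minimize.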

Code Tools

KnockKnock [github]

A small library to get a notification when your training is complete or when it crashes during the process with two additional lines of code.

When training deep learning models, it is common to use early stopping. Apart from a rough estimate, it is difficult to predict when the training will finish. Thus, it can be interesting to set up automatic notifications for your training. It is also interesting to be notified when your training crashes in the middle of the process for unexpected reasons.
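
The idea of being notified on completion or on a crash can be sketched with a plain Python decorator. This is not KnockKnock's API: the name `notify_when_done` and the `send` callback are hypothetical, and a list stands in for an email or chat channel.

```python
import functools
import traceback


def notify_when_done(send):
    """Decorator factory: call send(message) when the wrapped function
    finishes or crashes. `send` is any callable that takes a string."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Report the crash, then re-raise so the failure is not hidden.
                send(f"{func.__name__} crashed:\n{traceback.format_exc()}")
                raise
            send(f"{func.__name__} finished with result {result!r}")
            return result
        return wrapper
    return decorator


messages = []  # stand-in for email/chat delivery


@notify_when_done(messages.append)
def train(epochs):
    # Placeholder for a real training loop.
    return {"epochs": epochs, "loss": 0.1}


train(3)
print(messages[-1])
```

Wrapping the training function rather than editing its body keeps the notification logic separate from the training code.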
Streamlit: The fastest way to build custom ML tools [Homepage] [github]

Streamlit is the first app framework specifically for Machine Learning and Data Science teams. So you can stop spending time on frontend development and get back to what you do best. Streamlit lets you create apps for your machine learning projects with deceptively simple Python scripts. It supports hot-reloading, so your app updates live as you edit and save your file. No need to mess with HTTP requests, HTML, JavaScript, etc. All you need is your favorite editor and a browser.

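The developers have built a toolkit for machine learning engineers that interactively and efficiently turns Python scripts into practical apps. Beyond interactive debugging, it also includes tools for GPUs, network interfaces, clients and threads; beyond simple machine learning algorithms, it also covers tasks such as large-scale image segmentation and autonomous driving.
fitlog: fitlog = fast + git + log. A tool that helps users record logs and manage code. It mainly supports Linux and macOS, and can also be used in Git Bash on Windows.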

Research Tools

Tools for Networks Drawing Used in Paper [Page]

NN-SVG

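The developer is from the Fraenkel bioengineering lab at MIT, which typically builds visualization and machine learning tools for analyzing biological data. The tool can draw network models laid out as nodes, as tiled blocks, or as 3D blocks (currently only convolutional and fully connected layers are supported), and can export very high-resolution SVG figures.

GitHub: https://github.com/zfrenchee

Online demo: http://alexlenail.me/NN-SVG/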

PlotNeuralNet

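This tool was developed by a computer science student at Saarland University. The barrier to entry is somewhat higher, since figures are written in LaTeX.

GitHub: https://github.com/HarisIqbal88/PlotNeuralNet

A similar tool: https://github.com/jettan/tikz_cnn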

ConvNetDraw

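ConvNetDraw is a CNN drawing tool driven by configuration commands, developed by Cédric (cbovar), a programmer based in Hong Kong. You only need to enter the parameters of each layer in the model and adjust the x, y and z dimensions. However, the output resolution is too low: figures are blurry when enlarged and do not meet print requirements. Page: https://cbovar.github.io/ConvNetDraw/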

Draw_Convnet

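Provided by Gavin Weiguang Ding, an employee of Borealis. Simple and direct, it draws figures purely in Python code, with matplotlib as the core tool. The figures are not flashy, but they are regular and strictly controllable, which suits papers well.

GitHub: https://github.com/gwding/draw_convnet

A similar tool: https://github.com/yu4u/convnet-drawer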

Netscope

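Netscope is the well-known online visualization and editing tool for Caffe network structures, developed by Saumitro Dasgupta of the Stanford AI Lab. The configuration file goes on the left and the figure appears on the right, which makes adjusting and visualizing network parameters very convenient. A strength of this approach is that connections between layers are easy to express. It supports visualizing Caffe prototxt files as well as custom network structures, as long as they are written in prototxt format. GitHub: https://github.com/ethereon/netscope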

VisualDL

VisualDL is developed by Baidu and supports mainstream frameworks such as PaddlePaddle, PyTorch and MXNet. GitHub: https://github.com/PaddlePaddle/VisualDL

CNN Explainer

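From the paper "CNN Explainer: learning convolutional neural network with interactive visualization"; it can visualize the training process. CNN Explainer uses TensorFlow.js to load a pretrained model for visualization, with Svelte as the framework for interaction and D3.js for rendering. [github] [Proj]

One more fun one: it does not draw conventional architecture figures, but it visualizes all the weights.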

http://scs.ryerson.ca/~aharley/vis/conv/

Repos

awesome-machine-learning repo

A curated list of awesome Machine Learning frameworks, libraries and software.
Mikoto10032/DeepLearning
A large collection of deep learning resources: books, blogs, tutorials, courses, etc.
3D Machine Learning

A resource repository for 3D machine learning, including 3D dataset, 3D pose estimation, 3D object detection, object-level 3D reconstruction, semantic understanding.
Deep vision
Awesome computer vision
Collection of Papers for Image/Video Super-resolution
AI Learning repo
awesome-local-global-descriptor
3D-Machine-Learning
awesome-scene-strcuture-understanding
DeepLearningAnimePapers

FunnyDemo/APP

Nightmare: a horror image generation website launched by MIT just before Halloween 2016, where researchers show horror-style images generated by AI algorithms, including landmarks such as the Eiffel Tower and human faces.
Deep Dream Generator: a set of tools for exploring different AI algorithms. It focuses on creative tools for visual content generation, such as merging image styles and content, and Deep Dream, which offers insight into a deep neural network.
License Plate Detection Pytorch:

Chinese license plate detection based on MTCNN and LPRNet.
Face2Face: Real-time Face Capture and Reenactment of RGB Videos, 2016CVPROral [Proj]
Real-Time 3D Object Detection on Mobile Devices with MediaPipe by Google AI, 2020 [Proj] [Blog]
2D photo to 3D on mobile devices by Facebook [Blog] [Intro]
3D Photography from RGB-D using Context-aware Layered Depth Inpainting [Proj]

Dataset

How to build a good dataset [Page]

Google Dataset Search Engine

CVonline: Image Databases

From image processing to speech recognition: 25 open deep learning datasets every data scientist should know [Page]

Paper: How Good Is My Test Data? Introducing Safety Analysis for Computer Vision, IJCV2017

Comprehensive Dataset

Open Images Dataset [Homepage]

OpenImages is a public dataset for large-scale multi-label and multi-class image classification, released by Google in 2017.

Open Images is a dataset of ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, and visual relationships.

It contains a total of 16M bounding boxes for 600 object classes on 1.9M images, making it the largest existing dataset with object location annotations.

The images are very diverse and often contain complex scenes with several objects (8.3 per image on average) and the dataset is annotated with 36.5M image-level labels spanning 19,959 classes. Specifically,
- 15,851,536 boxes on 600 categories;
- 2,785,498 instance segmentations on 350 categories;
- 36,464,560 image-level labels on 19,959 categories;
- 391,073 relationship annotations of 329 relationships;
- Extension - 478,000 crowdsourced images with 6,000+ categories.
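
The "8.3 per image on average" figure can be checked against the box and image counts quoted above. This is a quick sanity check that assumes the rounded "1.9M images" figure, so the result is approximate; the variable names are illustrative.

```python
boxes = 15_851_536             # bounding boxes on 600 categories
images_with_boxes = 1_900_000  # rounded "1.9M images" figure

# Average number of annotated boxes per image.
boxes_per_image = boxes / images_with_boxes
print(round(boxes_per_image, 1))
```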
Reference:

The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale, submitted to IJCV, 2018.11

Large-scale interactive object segmentation with human annotators. CVPR, 2019.
Face Dataset [Page]

Semantic seg for 3D

Semantic3D.net: Large-Scale Point Cloud Classification Benchmark [Homepage]

Released by ETHZ in 2017. It provides a large labelled 3D point cloud data set of natural scenes with over 4 billion points in total. It also covers a range of diverse urban scenes: churches, streets, railroad tracks, squares, villages, soccer fields, castles to name just a few. The point clouds we provide are scanned statically with state-of-the-art equipment and contain very fine details.

Reference: SEMANTIC3D.NET: A new large-scale point cloud classification benchmark, ISPRS Congress, 2017
ScanNet RGB-D Video dataset [Homepage] [Code]

ScanNet is an RGB-D video dataset containing 2.5 million views in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation. It is related to several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval.

Reference: ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes, CVPR2017
Stanford 2D-3D-Semantics Dataset (2D-3D-S) [Homepage]

Released by CVGL, Stanford. The 2D-3D-S dataset provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations.

The dataset is collected in 6 large-scale indoor areas that originate from 3 different buildings of mainly educational and office use.

It covers over 6,000 m2 and contains over 70,000 RGB images, along with the corresponding depths, surface normals, semantic annotations, global XYZ images (all in forms of both regular and 360° equirectangular images) as well as camera information. It also includes registered raw and semantically annotated 3D meshes and point clouds. In addition, the dataset contains the raw RGB and Depth imagery along with the corresponding camera information per scan location. The dataset enables development of joint and cross-modal learning models and potentially unsupervised approaches utilizing the regularities present in large-scale indoor spaces.

Reference: Joint 2D-3D-Semantic Data for Indoor Scene Understanding, arXiv2017

3D point clouds with ground truth annotations are constructed as S3DIS Dataset (Stanford Large-Scale 3D Indoor Spaces Dataset). These 3D point clouds are included in the 2D-3D-S dataset. It covers several buildings with a covered area of over 6,000 m2 and over 215 million points.

Reference: 3D Semantic Parsing of Large-Scale Indoor Spaces, CVPR2016
NYU V2, S3DIS, KITTI

3D Dataset

SLAM Dataset [github]

KITTI Odometry dataset, EuRoC MAV dataset, TUM RGBD dataset

Aerial Action/Event Dataset

Okutama Action dataset
An Aerial View Video Dataset for Concurrent Human Action Detection.
Reference paper: Okutama-Action: An Aerial View Video Dataset for Concurrent Human Action Detection, CVPRW2017. [Dataset] [Codes]
Stanford Drone Dataset
It is used for target tracking or trajectory forecasting.
Reference paper: Learning Social Etiquette: Human Trajectory Prediction In Crowded Scenes, ECCV2016. [Dataset]
UCLA Aerial Event Dataset
Reference paper: Joint Inference of Groups, Events and Human Roles in Aerial Videos, CVPR2015. [Dataset]
MMSPG mini drone video dataset
The dataset contents can be clustered in three categories: normal, suspicious, and illicit behaviors.
Reference paper: Privacy in Mini-drone Based Video Surveillance, 2015
[Dataset]
VIRAT: VIRAT Video Dataset
It is used for video surveillance (action recognition).
Reference paper: A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video, CVPR2011 [Dataset]

Object Tracking Dataset

Visual Tracking Paper List [github]

VOT2013-2019
It includes (1) the VOT short-term challenge, (2) VOT short-term realtime, (3) the VOT long-term challenge, (4) the VOT-RGBT challenge (RGB + thermal/infrared) and (5) the VOT-RGBD challenge (RGB + depth). Papers: A Novel Performance Evaluation Methodology for Single-Target Trackers, PAMI2016; The Seventh Visual Object Tracking VOT2019 Challenge Results, ICCVW2019. [Dataset]
LaSOT
Large-scale Single Object Tracking (LaSOT) aims to provide a dedicated platform for training data-hungry deep trackers as well as assessing long-term tracking performance. Paper: LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking, CVPR2019. [Dataset]
OxUvA: Long-term Tracking
Long-term single-object Tracking in the Wild. Paper: Long-term Tracking in the Wild: A Benchmark, ECCV2018 [Dataset]
TrackingNet
A Large-Scale Dataset and Benchmark for Object Tracking in the Wild. Paper: TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild, ECCV2018. [Dataset]
TLP: Long-Term Visual Object Tracking Benchmark
Paper: Long-Term Visual Object Tracking Benchmark, ACCV2018. [Dataset]
DTB70
A Drone Tracking Benchmark. Paper: Visual object tracking for unmanned aerial vehicles: A benchmark and new motion models, AAAI2017. [Dataset] [Codes]
UAV 123
A Benchmark and Simulator for UAV Tracking. Paper: A Benchmark and Simulator for UAV Tracking, ECCV2016. [Dataset]
OTB-50/100: Visual Tracker Benchmark
Papers: Online Object Tracking: A Benchmark, CVPR2013; Object Tracking Benchmark, PAMI2015. [Dataset] [Temple-Color-128]
VIVID Tracking
Paper: An Open Source Tracking Testbed and Evaluation Web Site, PETS 2005 [Dataset]

Scene Understanding

GCC Crowd Counting [Homepage] [PDF] [Dataset&Tools]

Learning from Synthetic Data for Crowd Counting in the Wild, CVPR2019 [Homepage] [Dataset]
CrowdHuman: Human detection

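Megvii Research built CrowdHuman, a larger and more crowded human detection dataset. It contains 15,000 training images, 4,370 validation images and 5,000 test images, with an average of 22.64 persons per image. On average, 2.4 pairs of person boxes per image overlap by more than 0.5 (IoU). Both figures are far higher than in existing datasets and show how crowded this dataset is.

Others: Caltech, KITTI, CityPersons, COCOPersons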
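
The pairwise-overlap statistic in the CrowdHuman description can be illustrated with a short sketch that counts box pairs whose intersection over union (IoU) exceeds 0.5. The `(x1, y1, x2, y2)` box format, the function names and the sample boxes are assumptions for the example, not part of the dataset tooling.

```python
from itertools import combinations


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height of the intersection are clamped at zero.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_crowded_pairs(boxes, threshold=0.5):
    """Number of unordered box pairs whose IoU exceeds the threshold."""
    return sum(1 for a, b in combinations(boxes, 2) if iou(a, b) > threshold)


# Two heavily overlapping boxes and one far away.
boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (50, 50, 60, 60)]
crowded = count_crowded_pairs(boxes)
print(crowded)
```

Averaging this count over all images of a dataset gives a per-image crowdedness measure of the kind quoted above.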
The iMaterialist Fashion Attribute Dataset, 2019.6 [Homepage] [Page]

iMaterialist (Fashion): iMat-fashion

AutoDriving/Transport

KITTI
Mapillary dataset: Mapillary Vistas Dataset (street-view) and Mapillary Traffic Sign Dataset [Page]
HDD dataset: HRI Driving Dataset [Homepage]

Honda Research Institute (HRI) Driving Dataset. It is a challenging dataset to enable research on learning driver behavior in real-life environments. The dataset includes 104 hours of real human driving in the San Francisco Bay Area collected using an instrumented vehicle equipped with different sensors. Reference paper: Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning, CVPR2018

Medical Dataset

MedicalNet: A 3D medical dataset with diverse modalities, target organs, and pathologies. It is proposed by Tencent for semantic analysis. Ref: Med3D: Transfer Learning for 3D Medical Image Analysis, 2019
