<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction-to-Computer-Vision" data-toc-modified-id="Introduction-to-Computer-Vision-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction to Computer Vision</a></span><ul class="toc-item"><li><span><a href="#Computer-Vision-Overview" data-toc-modified-id="Computer-Vision-Overview-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Computer Vision Overview</a></span></li><li><span><a href="#Computer-Vision-Tasks" data-toc-modified-id="Computer-Vision-Tasks-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Computer Vision Tasks</a></span></li></ul></li><li><span><a href="#GluonCV-Toolkit" data-toc-modified-id="GluonCV-Toolkit-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>GluonCV Toolkit</a></span><ul class="toc-item"><li><span><a href="#GluonCV" data-toc-modified-id="GluonCV-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>GluonCV</a></span></li><li><span><a href="#GluonCV-Model-Zoo" data-toc-modified-id="GluonCV-Model-Zoo-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>GluonCV Model Zoo</a></span></li></ul></li><li><span><a href="#Apache-MXNet-Framework" data-toc-modified-id="Apache-MXNet-Framework-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Apache MXNet Framework</a></span><ul class="toc-item"><li><span><a href="#Apache-MXNet" data-toc-modified-id="Apache-MXNet-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Apache MXNet</a></span></li><li><span><a href="#Imperative-vs-Symbolic" data-toc-modified-id="Imperative-vs-Symbolic-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Imperative vs Symbolic</a></span><ul class="toc-item"><li><span><a href="#Imperative-Programming" data-toc-modified-id="Imperative-Programming-3.2.1"><span class="toc-item-num">3.2.1&nbsp;&nbsp;</span>Imperative Programming</a></span></li><li><span><a href="#Symbolic-Programming" data-toc-modified-id="Symbolic-Programming-3.2.2"><span class="toc-item-num">3.2.2&nbsp;&nbsp;</span>Symbolic 
Programming</a></span></li></ul></li></ul></li><li><span><a href="#Summary" data-toc-modified-id="Summary-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Summary</a></span></li><li><span><a href="#Quiz" data-toc-modified-id="Quiz-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Quiz</a></span></li></ul></div>

# Key Concepts
- Explain common computer vision tasks
- Understand what tasks can be solved with GluonCV
- Understand the benefits of Apache MXNet

## Introduction to Computer Vision

### Computer Vision Overview

Computer vision is a field that extracts high-level understanding from digital images and videos for various applications. It is considered a subfield of Artificial Intelligence (AI) because its goal is to automate tasks that the human visual system performs, which involve the ability to recognize, describe, and interpret visual information.

Example: object detection from an image.

<img src="assets/module1/object-detection-intro.png" width="200">

### Computer Vision Tasks

History:
- David Hubel and Torsten Wiesel (1959): visual processing starts with neurons that respond to simple structures such as oriented edges
- Kunihiko Fukushima: Neocognitron, a precursor of what is now called the Convolutional Neural Network (CNN)
- Yann LeCun: LeNet-5, which applied machine learning techniques (training by backpropagation) to CNNs
- Progress then stalled for a while because the approach needed more powerful hardware
- 2012: resurgence of computer vision, driven by the availability of more powerful accelerators (GPUs)

Computer vision tasks, from simple to complex:
1. Image classification: **what** is the main object in the image? Limitation: it assigns one label to the whole image, so it struggles when the image contains multiple objects.
<img src="assets/module1/image-classification.png" width="200">

2. Object detection: localize each object with a bounding box and classify it. Limitation: not every object is rectangular.
<img src="assets/module1/object-detection.png" width="200">

3. Semantic segmentation: separate objects from their background with a pixel-level mask. Limitation: all objects of the same class share the same mask.
<img src="assets/module1/semantic-segmentation.png" width="200">

4. Instance segmentation: generate a separate mask for each object. Limitation: it needs a lot of manually labeled instance images.
<img src="assets/module1/instance-segmentation.png" width="200">
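
The four tasks differ mainly in what the model outputs. The following is a minimal sketch in plain Python, assuming a hypothetical 4x6 image that contains two cats. The masks, the `bounding_box` helper, and the `merge_masks` helper are made up for illustration and are not part of GluonCV. It shows how the richest output (instance masks) can be reduced to the simpler ones.

```python
# Hypothetical instance masks for two cats in a 4x6 image (1 = object pixel).
cat_1 = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
cat_2 = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
]
instance_masks = [cat_1, cat_2]  # instance segmentation: one mask per object


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the object pixels in a binary mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), min(rows), max(cols), max(rows))


def merge_masks(masks):
    """Collapse the instance masks of one class into a single semantic mask."""
    return [[int(any(pixels)) for pixels in zip(*rows)] for rows in zip(*masks)]


label = "cat"                                      # image classification: one label
boxes = [bounding_box(m) for m in instance_masks]  # object detection: one box per object
semantic_mask = merge_masks(instance_masks)        # semantic segmentation: one mask per class

print(label)
print(boxes)  # [(0, 0, 1, 1), (3, 1, 5, 3)]
for row in semantic_mask:
    print(row)
```

Note that the semantic mask no longer tells the two cats apart, which is the limitation listed for semantic segmentation.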

## GluonCV Toolkit

### GluonCV

Problems with open-source tools:
- Limited choice of models
- Incorrect implementations
- Discontinued maintenance
- Gap between research and deployment

**Knowledge Check**

What common issues in training and deploying computer vision models are addressed by the introduction of the GluonCV toolkit?
- [ ] Limited choice
- [ ] Incorrect implementations
- [ ] Discontinued support
- [ ] All of the above

GluonCV was created by AWS scientists to address the issues above.
- Models for multiple tasks: image classification, object detection, semantic segmentation, instance segmentation, and pose estimation.
- Implementation with high accuracy.
- Official maintenance and development.
- Easy-to-use APIs for experimentation.

Additional: https://gluon-cv.mxnet.io/

### GluonCV Model Zoo

<img src="assets/module1/model-illustration.png" width="400">

In computer vision, a model is typically a Convolutional Neural Network (CNN).

**Knowledge Check**

Why does GluonCV have various models for the same computer vision task?
- [ ] To prevent overfitting
- [ ] To allow for tradeoffs between model accuracy and model complexity
- [ ] To reduce model complexity
- [ ] To increase computational cost and model accuracy for simple tasks

GluonCV model zoo: a chart that helps us choose the most suitable pretrained model.

<img src="assets/module1/model-zoo.png" width="400">

Reference: https://gluon-cv.mxnet.io/model_zoo/classification.html

- X-axis: speed
- Y-axis: accuracy
- Each dot: one model
- Dot size: memory consumption
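
Reading the chart amounts to picking the most accurate model that still fits a speed and memory budget. The sketch below assumes a hypothetical list of models. The names and numbers are invented for illustration and are not measurements from the GluonCV model zoo.

```python
# Hypothetical entries: names and numbers are made up for illustration,
# not measurements from the GluonCV model zoo.
models = [
    {"name": "small_net", "accuracy": 0.70, "images_per_sec": 3000, "memory_mb": 300},
    {"name": "medium_net", "accuracy": 0.77, "images_per_sec": 1200, "memory_mb": 900},
    {"name": "large_net", "accuracy": 0.81, "images_per_sec": 400, "memory_mb": 2500},
]


def pick_model(models, min_speed, max_memory_mb):
    """Return the most accurate model that meets the speed and memory budget, or None."""
    candidates = [
        m for m in models
        if m["images_per_sec"] >= min_speed and m["memory_mb"] <= max_memory_mb
    ]
    return max(candidates, key=lambda m: m["accuracy"], default=None)


best = pick_model(models, min_speed=1000, max_memory_mb=1000)
print(best["name"])  # medium_net
```

With a looser budget the larger model would win, and with a tighter one only the small model would remain. This is the accuracy versus complexity tradeoff that the variety of models allows.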

## Apache MXNet Framework

### Apache MXNet

MXNet is a deep learning framework that provides software tools for building and training neural network models.

It originated in the Distributed Machine Learning Community (DMLC). Projects from DMLC:
- XGBoost
- TVM (deep learning compiler)
- MXNet

Features:
- Multi-language support: Python, R, Perl, Java, Julia, Scala, Clojure, C++. 
- Rich ecosystem: model zoo, GluonCV, GluonNLP, Keras, MXBoard (visualize training process), model server (AWS), TensorRT (NVIDIA), TVM (deployment), ONNX (for interchangeable AI models).

**Knowledge Check**

What programming language is the MXNet backend engine implemented in?
- [ ] Python
- [ ] C++
- [ ] Java
- [ ] Scala

### Imperative vs Symbolic
Consider the following computational graph:

<img src="assets/module1/comp-graph.png" width="200">

We will compare two programming paradigms offered by deep learning frameworks:

#### Imperative Programming

```python
from mxnet import nd

a = nd.ones(10)
b = nd.ones(10) * 2
c = b * a  # executed immediately, so the result can be inspected right away
print(c)
d = c + 1
```

Output:

```
[2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
<NDArray 10 @cpu(0)>
```


Advantages:
- Straightforward and flexible (easy to debug by printing)
- Uses the host language natively (no new syntax)

Disadvantage:
- Hard to optimize (computation time, memory usage)

#### Symbolic Programming

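```python
import mxnet as mx
from mxnet import nd
from mxnet.symbol import Variable  # placeholders for the graph inputs

A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
# Nothing has been computed yet: the lines above only define the graph.
# Conceptually: f = compile(D), then d = f(A=..., B=...).
# With the MXNet 1.x symbol API, bind concrete arrays and run the graph:
executor = D.bind(ctx=mx.cpu(), args={'A': nd.ones(10), 'B': nd.ones(10) * 2})
d = executor.forward()[0]
```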

Advantages:
- Easier to optimize, because the whole graph is known before execution
- Easy to serialize (save) models so that they can be deserialized (loaded) from other languages

Disadvantages:
- Hard to debug
- Unsuitable for dynamic models
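
To make the symbolic idea concrete without any framework, here is a minimal plain-Python sketch of a deferred computational graph. The `Node` class and the `variable` and `evaluate` helpers are hypothetical names chosen for this illustration and are not MXNet APIs. The sketch shows that building the graph computes nothing, and that the graph can be serialized to plain data before it is ever run.

```python
import json


class Node:
    """One node of a tiny symbolic graph: a placeholder, a constant, or an operation."""

    def __init__(self, op, inputs=(), name=None, value=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.name = name
        self.value = value

    def __mul__(self, other):
        return Node("mul", (self, _wrap(other)))

    def __add__(self, other):
        return Node("add", (self, _wrap(other)))

    def to_dict(self):
        """Serialize the graph to plain data that another language could rebuild."""
        if self.op == "var":
            return {"op": "var", "name": self.name}
        if self.op == "const":
            return {"op": "const", "value": self.value}
        return {"op": self.op, "inputs": [n.to_dict() for n in self.inputs]}


def _wrap(x):
    """Turn plain numbers into constant nodes."""
    return x if isinstance(x, Node) else Node("const", value=x)


def variable(name):
    """Create a named placeholder."""
    return Node("var", name=name)


def evaluate(node, **bindings):
    """Run the graph only now, once values are bound to the placeholders."""
    if node.op == "var":
        return bindings[node.name]
    if node.op == "const":
        return node.value
    left, right = (evaluate(n, **bindings) for n in node.inputs)
    return left * right if node.op == "mul" else left + right


A, B = variable("A"), variable("B")
D = B * A + 1                         # builds the graph, computes nothing
graph_json = json.dumps(D.to_dict())  # the graph can be saved before it is run
result = evaluate(D, A=1.0, B=2.0)
print(graph_json)
print(result)  # 3.0
```

The debugging drawback is also visible here: printing `D` before calling `evaluate` shows only a graph node, not a number.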

**Discussion**

Why is debugging a computational graph easier in imperative programs than in symbolic programs?

> Because we can print the results right after each computation step, to check whether the calculation is correct or wrong.

Gluon API: a hybrid programming interface that combines imperative and symbolic programming. There are two stages:
1. Develop the model using imperative programming
2. Optimize it by converting it into a symbolic program

Convert from imperative to symbolic by calling `net.hybridize()`.
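
The two stages can be sketched in plain Python as an analogy. This is not MXNet's implementation: the `Sym` and `HybridSketch` classes are hypothetical, and the "graph" is only an expression string. The sketch shows the same `forward` code running imperatively first, and then being recorded once with placeholders so that later calls run the recorded graph.

```python
class Sym:
    """Placeholder that records operations as an expression string instead of computing."""

    def __init__(self, expr):
        self.expr = expr

    def _record(self, op, other):
        other_expr = other.expr if isinstance(other, Sym) else repr(other)
        return Sym(f"({self.expr} {op} {other_expr})")

    def __mul__(self, other):
        return self._record("*", other)

    def __add__(self, other):
        return self._record("+", other)


def forward(a, b):
    """Model logic written once, in ordinary imperative style."""
    c = b * a
    return c + 1


class HybridSketch:
    """Hypothetical wrapper: imperative by default, symbolic after hybridize()."""

    def __init__(self, fn, arg_names):
        self.fn = fn
        self.arg_names = arg_names
        self.graph = None      # recorded expression, filled in by hybridize()
        self._compiled = None

    def hybridize(self):
        # Run the imperative code once on placeholders to record the graph,
        # then compile the recorded expression into a reusable function.
        self.graph = self.fn(*(Sym(name) for name in self.arg_names)).expr
        self._compiled = eval(f"lambda {', '.join(self.arg_names)}: {self.graph}")

    def __call__(self, *args):
        return (self._compiled or self.fn)(*args)


net = HybridSketch(forward, ["a", "b"])
eager_result = net(1.0, 2.0)  # stage 1: runs forward() line by line
net.hybridize()
graph_result = net(1.0, 2.0)  # stage 2: runs the recorded graph
print(net.graph)              # ((b * a) + 1)
print(eager_result, graph_result)  # 3.0 3.0
```

Both calls return the same value. Only the way the computation is executed changes, which is why the model can be developed and debugged imperatively before it is hybridized.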

## Summary

1. Computer vision:
    - Extracts high-level understanding from digital images.
    - Example tasks: classification, detection, segmentation.
    - Uses Convolutional Neural Networks.

2. GluonCV:
    - Open-source computer vision toolkit from AWS.
    - Has a model zoo with pre-trained models.

3. Apache MXNet:
    - Deep learning framework, backend engine for GluonCV.
    - Key strengths: portable, efficient, scalable.
    - Uses hybrid programming (imperative + symbolic).

## Quiz

1. Which of the following is not a computer vision task?
    - [ ] Volumetric analysis
    - [ ] Pose estimation
    - [ ] Object detection
    - [ ] Semantic segmentation

2. Which deep learning framework is the GluonCV toolkit based on?
    - [ ] PyTorch
    - [ ] Apache MXNet
    - [ ] Caffe
    - [ ] Chainer

3. Which of the following is untrue about the symbolic paradigm in deep learning frameworks?
    - [ ] Symbolic programs do not need to be compiled before they can be executed
    - [ ] Symbolic programs provide opportunities to optimize computational graphs
    - [ ] Symbolic programs can be hard to debug when they throw an error
    - [ ] Symbolic programs are often constructed with variable placeholders

4. What command in the Gluon API of MXNet converts an imperative computational graph to a symbolic graph?
    - [ ] .convert()
    - [ ] .to_symbol()
    - [ ] .hybridize()
    - [ ] .optimize()

5. What area of machine learning currently achieves state-of-the-art performance in computer vision tasks?
    - [ ] Reinforcement Learning
    - [ ] Metric Learning
    - [ ] Similarity Learning
    - [ ] Deep Learning

6. What do image classification models predict?
    - [ ] A cluster centroid for the class of objects in the image
    - [ ] A hierarchy for objects in the image
    - [ ] Another image that is similar to the input image
    - [ ] A predefined label for the image

7. Which computer vision task predicts pixel-level masks for each distinct class of objects in the image?
    - [ ] Object extraction
    - [ ] Semantic Segmentation
    - [ ] Instance Segmentation
    - [ ] Super-resolution imaging

8. What discovery by Hubel and Wiesel, implemented by Fukushima in the Neocognitron, is crucial to the success of modern deep-learning-based computer vision systems?
    - [ ] Vision is intimately tied to recognition and understanding
    - [ ] Vision is achieved by convolution in the human brain
    - [ ] Vision is hierarchical and local at each level
    - [ ] Vision involves extensive feature engineering

9. What exactly led to the resurgence of neural network models and deep learning for computer vision tasks in 2012?
    - [ ] Availability of large datasets thanks to the internet
    - [ ] More powerful computational software and resources
    - [ ] Hardware accelerators like GPUs
    - [ ] All of the above

10. Which computer vision task is most appropriate for localizing appearances of barcodes in an image?
    - [ ] Image classification
    - [ ] Object Detection
    - [ ] Semantic Segmentation
    - [ ] Instance Segmentation