# Deep Learning & Artificial Intelligence
## Deep Learning with Python, Chapter 1
### Dr. Jie Tao, Fairfield University

# Chapter 1: What is Deep Learning

- Machine Learning and Artificial Intelligence have become ubiquitous in our lives
- As data scientists, we should be __keen__ at distinguishing __signal__ from __noise__
  - Will AI replace human workers?
- With the knowledge gained in this course
  - can we develop intelligent agents as a job?

# Background of Deep Learning

## AI, ML, & Deep Learning



- Artificial Intelligence is:
  - the effort to __automate intellectual tasks__ normally performed by humans.
- Early AI relied on explicit, hand-written rules
  - but it is intractable to figure out explicit rules for solving more complex, fuzzy problems

![image.png](https://drive.google.com/uc?export=view&id=1jLttTfRI-nbIQgd2PUCXAtbpF5zxRG6g)

__Figure 1-1: AI, ML & DL__

## Review of Machine Learning

- Could a computer go beyond “what we know how to order it to perform” and learn on its own how to perform a specified task?

  - could a computer automatically learn data-processing rules by looking at data?
  - A machine-learning system is trained rather than explicitly programmed.
  - Compared to classical statistics, machine learning (and especially deep learning) is less math-oriented and more engineering-oriented

<!--![ML vs. Programming](https://drek4537l1klr.cloudfront.net/chollet/Figures/01fig02.jpg)-->

![image.png](https://drive.google.com/uc?export=view&id=1QQoVTOmXOtgW4-t3NQEYL8K-MUfftfmS)

**Figure 1-2. Machine Learning vs. Classical Programming**
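
The contrast in Figure 1-2 can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the temperature-conversion task, the function names, and the use of ordinary least squares as the "learner" are not from the slides.

```python
# Classical programming: a human writes the rule, the program produces answers.
def celsius_to_fahrenheit(celsius):
    return 1.8 * celsius + 32.0


# Machine learning: we supply data and expected answers,
# and the rule (slope and intercept) is recovered from them.
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


data = [-40.0, 0.0, 20.0, 37.0, 100.0]              # inputs
answers = [celsius_to_fahrenheit(c) for c in data]  # expected outputs (labels)

slope, intercept = fit_line(data, answers)
print(f"learned rule: F = {slope:.2f} * C + {intercept:.2f}")
```

The learner never sees the formula, only input/answer pairs, yet it recovers the same rule.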

## Learning Representations from Data

- What are the fundamentals of machine learning?
  - Input data points: e.g. _text_ in text analytics, _images_ in image processing;
  - Examples of the expected output: what we already know as __labels__;
  - A way to measure whether the algorithm is doing a good job: what we already know as __accuracy, f1-score, AUC__ ...
- A machine-learning model transforms its _input data_ into _meaningful outputs_

## What is data representation?

- Machine learning/deep learning aims to __meaningfully transform data__
- Representation is the intermediate state between inputs and outputs
  - transformations of the data that make it more amenable to the task at hand

<!--![coordinate change](https://drek4537l1klr.cloudfront.net/chollet/Figures/01fig04.jpg)-->

![image.png](https://drive.google.com/uc?export=view&id=1hNPkgjiE58r-oUd3pqoE_LNxYyquXXw5)

- This is a __good__ representation - but we did it __manually__
  - this process cannot be __scaled__
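
A manual coordinate change of this kind might look like the sketch below. The six points, their labels, and the 45-degree angle are hypothetical values chosen for illustration; they are not the data behind the figure.

```python
import math

# Hypothetical toy data: class 0 lies above the line y = x, class 1 below it.
# No threshold on x alone or on y alone separates the two classes.
points = [(1.0, 3.0), (2.0, 2.5), (-1.0, 0.5), (3.0, 1.0), (0.5, -2.0), (2.0, 0.0)]
labels = [0, 0, 0, 1, 1, 1]


def change_coordinates(point, angle=math.pi / 4):
    """Express a point in axes rotated counter-clockwise by `angle`."""
    x, y = point
    new_x = x * math.cos(angle) + y * math.sin(angle)
    new_y = -x * math.sin(angle) + y * math.cos(angle)
    return new_x, new_y


# In the new representation a one-line rule is enough: new_y < 0 means class 1.
predictions = [1 if change_coordinates(p)[1] < 0 else 0 for p in points]
print(predictions == labels)
```

The angle was picked by a human looking at the data, which is exactly the step that does not scale.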

## So what does that mean for machine learning?

- All machine-learning algorithms consist of automatically finding such transformations
  - that turn data into more-useful representations for a given task.
    - coordinate changes
    - linear projections
    - translations
    - nonlinear operations

## So what is machine learning after all?



> searching for __useful representations__ of some input data, within a __predefined space of possibilities__, using guidance from a __feedback signal__

In other words:
- Transform training data
- based on predefined labels
- use evaluation metrics to adjust
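
The three ingredients in the definition can be made concrete with a small sketch. Everything here is an assumption for illustration: the toy points, the choice of rotations in 15-degree steps as the "predefined space of possibilities", and accuracy as the "feedback signal".

```python
import math

# Hypothetical toy data: class 0 lies above the line y = x, class 1 below it.
points = [(1.0, 3.0), (2.0, 2.5), (-1.0, 0.5), (3.0, 1.0), (0.5, -2.0), (2.0, 0.0)]
labels = [0, 0, 0, 1, 1, 1]


def rotate_axes(point, angle):
    """Candidate representation: the point expressed in rotated axes."""
    x, y = point
    return (x * math.cos(angle) + y * math.sin(angle),
            -x * math.sin(angle) + y * math.cos(angle))


def accuracy(angle):
    """Feedback signal: share of points the rule 'new_y < 0 means class 1' gets right."""
    predictions = [1 if rotate_axes(p, angle)[1] < 0 else 0 for p in points]
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


# Predefined space of possibilities: rotations in 15-degree steps.
candidate_angles = [math.radians(d) for d in range(0, 180, 15)]

# The "learning" is a search guided by the feedback signal.
best_angle = max(candidate_angles, key=accuracy)
print(f"best angle: {math.degrees(best_angle):.0f} degrees, accuracy: {accuracy(best_angle):.2f}")
```

Real algorithms search far larger spaces and use gradients rather than brute force, but the roles of the three ingredients are the same.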



# Definition of Deep Learning

## 'Deep' in Deep Learning

- One of the important goals of _deep learning_ is to learn useful __representations__ of the data
  - Successive layers are used for this purpose
- The number of layers is called the __depth__ of the model
  - Loosely comparable to the depth of a decision tree
- Tens or hundreds of layers are used to transform the training data
  - Traditional machine learning uses only 1 or 2 layers to learn data representations
- Deep learning models have their roots in __neural networks__
  - Different layers stacked on top of each other
  - The original idea was to mimic the human brain


## Deep Learning Defined

- Now we define deep learning as:

> deep learning is a mathematical framework for learning representations from data

- But how is that possible?

 <!--![CNN example1](https://drek4537l1klr.cloudfront.net/chollet/Figures/01fig05.jpg)-->

 ![image.png](https://drive.google.com/uc?export=view&id=15yX7ILYp3FQXE6P8VcHZ-exjefyP-Pvv)

## Deep Learning Defined

- A deep network transforms the digit image into
  - representations that are increasingly different from the original image and
  - increasingly informative about the final result

<!--![CNN Example2](https://drek4537l1klr.cloudfront.net/chollet/Figures/01fig06_alt.jpg)-->

![image.png](https://drive.google.com/uc?export=view&id=1r8P1dCJqMuxOOGMlBpdizHsXE10a9gNI)

## How Deep Learning Works

- How (_supervised_) machine learning works:
  - mapping inputs (e.g. _images_) to labels (e.g. `4`) by observing many examples of input-label pairs
- Deep learning completes this process through a __sequence of layers__
  - what a layer does to its input is determined by its __weights__
  - which are a set of numbers
  - the number of weights is determined by the number of __neurons__ in each layer
  - These weights are the __parameters__ of deep learning models
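
To make "a layer is parameterized by its weights" concrete, here is a minimal pure-Python sketch of two stacked fully connected layers. The layer sizes, the ReLU activation, and the bias terms are assumptions added for illustration; in the course this would be done with `keras` rather than by hand.

```python
import random


def dense_layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, followed by ReLU."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(max(0.0, z))  # ReLU activation
    return outputs


def init_layer(n_inputs, n_neurons, rng):
    """Random weights (one row per neuron) and zero biases."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases


rng = random.Random(0)
layer_sizes = [4, 8, 3]  # hypothetical: 4 input features -> 8 neurons -> 3 outputs
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

x = [0.5, -1.2, 3.0, 0.1]
for weights, biases in layers:
    x = dense_layer(x, weights, biases)  # each layer transforms the previous output

# Parameters per layer: (inputs x neurons) weights plus one bias per neuron.
n_parameters = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print(len(x), n_parameters)
```

Even this tiny network has 67 parameters: (4 x 8 + 8) in the first layer and (8 x 3 + 3) in the second.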

## How Deep Learning Works

- _Learning_ in deep learning refers to the process of:
  - finding a set of values for the __weights__
  - that __best__ maps the _inputs_ to their respective _labels_
- However, a deep learning model may contain millions, or even tens of millions, of parameters
  - Searching for the best set of values is a difficult and time-consuming task
  - particularly because changing one parameter may affect the behavior of the others

## How Deep Learning Works

- The figure below shows how deep learning models work
  - but the question is: "how do we know that the predicted value of the target ($\hat{y}$) is close enough to the true value?"

<!--![ch1-5](https://drek4537l1klr.cloudfront.net/chollet/Figures/01fig07.jpg)-->

![image.png](https://drive.google.com/uc?export=view&id=1AMa4bKF81LYZ7GYApIpBj7N_x3S9Mirq)

## How Deep Learning Works

- To control something, you first have to observe it
  - we need to measure how close the predicted value ($\hat{y}$) is to the actual value ($y$)
  - This measurement is done by the __loss function__
  - which measures the __performance__ of your deep learning model on the given examples

![ch1-6](https://drive.google.com/uc?export=view&id=1MCL5DcaVPG1T8QxdeB4KWVBct3WsvE37)
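
Two commonly used loss functions, sketched in plain Python. The choice of mean squared error and cross-entropy, and all numbers below, are illustrative assumptions; the slides do not name a specific loss.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Regression loss: average squared distance between y and y_hat."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(true_index, predicted_probs):
    """Classification loss for one example: -log of the probability given to the true class."""
    return -math.log(predicted_probs[true_index])


# Regression: predictions close to y give a low loss, far-off ones a high loss.
y = [3.0, -0.5, 2.0]
loss_close = mean_squared_error(y, [2.9, -0.4, 2.1])
loss_far = mean_squared_error(y, [0.0, 2.0, -1.0])

# Classification: the true digit is 4 (index 4 of ten class probabilities).
confident = [0.01] * 10
confident[4] = 0.91
unsure = [0.1] * 10
loss_confident = cross_entropy(4, confident)
loss_unsure = cross_entropy(4, unsure)

print(f"MSE: {loss_close:.4f} vs {loss_far:.4f}")
print(f"cross-entropy: {loss_confident:.4f} vs {loss_unsure:.4f}")
```

In both cases a lower score means the predictions are closer to the targets.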

## How Deep Learning Works

- The most important trick in deep learning is to use the score from the __loss function__ as the feedback signal to _adjust the weights_
  - in order to lower the __loss score__
- This adjustment is the job of the __optimizer__
  - we will explain how this works later
- This is the _training process_ for any NN
  - Objective: __minimize__ loss score

<!--![ch1-7](https://drek4537l1klr.cloudfront.net/chollet/Figures/01fig09.jpg)-->


![image.png](https://drive.google.com/uc?export=view&id=1ISxOOzTbS6OKtHAL1hTAe9vIYBW-xRM-)
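
A single optimizer step can be sketched as follows. This is a simplified stand-in, not how `keras` does it: the model has one weight, the gradient is estimated by finite differences, and the data and learning rate are made up for the example.

```python
def loss(w, xs, ys):
    """Mean squared error of the one-weight model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def numerical_gradient(w, xs, ys, eps=1e-6):
    """Slope of the loss around w, estimated with a central difference."""
    return (loss(w + eps, xs, ys) - loss(w - eps, xs, ys)) / (2 * eps)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # hypothetical data following y = 2x
w = 0.5                # current (poor) weight
learning_rate = 0.05

loss_before = loss(w, xs, ys)
# The optimizer moves the weight against the gradient, i.e. downhill on the loss.
w = w - learning_rate * numerical_gradient(w, xs, ys)
loss_after = loss(w, xs, ys)

print(f"w = {w:.3f}, loss {loss_before:.3f} -> {loss_after:.3f}")
```

The loss score is the only information the update uses, which is what "feedback signal" means here.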

## How Deep Learning Works

- Initially, random values are assigned as the weights in the model
  - Since this is a _random_ transformation, the loss score is likely to be _high_
  - As we move through the training data, the weights are adjusted a little at a time
  - meaning the loss score should decrease
  - This repeated process is called the __training loop__; each iteration processes one _batch_ of examples
    - with enough iterations, the loss score should be __minimized__
  - That's why we need a fairly large training set
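
Putting the pieces together, a complete training loop might look like this sketch. The synthetic data set, the two-parameter linear model, the batch size, and the learning rate are all assumptions chosen so the example runs without any libraries.

```python
import random

rng = random.Random(42)

# Hypothetical training set generated from y = 3x + 1 plus a little noise.
xs = [rng.uniform(-1, 1) for _ in range(200)]
ys = [3.0 * x + 1.0 + rng.gauss(0, 0.05) for x in xs]

# Start from random weights.
w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)


def mse(w, b):
    """Loss score over the whole training set."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


initial_loss = mse(w, b)
learning_rate, batch_size = 0.1, 20

for epoch in range(50):                       # one epoch = one pass over the data
    order = list(range(len(xs)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        # Gradients of the batch loss with respect to w and b.
        grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
        grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
        # Small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

final_loss = mse(w, b)
print(f"w = {w:.2f}, b = {b:.2f}, loss {initial_loss:.3f} -> {final_loss:.4f}")
```

Each batch nudges the weights only slightly, so many batches (and therefore enough data) are needed before the loss settles.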

## Deep Learning Applications

- Deep learning has achieved remarkable results on perceptual problems such as seeing and hearing: problems involving skills that seem natural and intuitive to humans but have long been elusive for machines.
  - image classification
  - speech recognition
  - machine translation
  - digital assistants
  - autonomous driving
  - ...

## Deep Learning is not a Short Term Hype

- In the 1970s, the holy grail of AI was to build __human-level general intelligence__
  - which means intelligent agents that are as functional as humans
  - the failure to deliver on that promise led us into the first _AI winter_
- Now we believe that modern AIs should be like

> smart animals, which are specialized in vision, reading, ...

## The Promise of AI

- We’re only getting started in applying deep learning to many important problems for which it could prove transformative
- AI is coming, whether you embrace it or not
- AI will end up being applied to nearly every process that makes up our society and our daily lives, much like the internet is today.
- It may take a while for AI to be deployed to its true potential, a potential the full extent of which no one has yet dared to dream
- AI will transform our lives in a holistic way

# Review of Machine Learning

## Review of Machine Learning (again)

- Most of the machine-learning algorithms used in the industry today are __NOT__ deep-learning algorithms
- Deep learning isn’t always the right tool for the job
  - sometimes there isn’t enough data for deep learning to be applicable
  - sometimes the problem is better solved by a different algorithm
- In other words, deep learning should NOT be the only tool in your tool belt

## Machine Learning Knowledge #1: Probabilistic Modeling

- Probabilistic modeling is the application of the principles of statistics to data analysis
  - one of the earliest forms of machine learning, and still widely applicable today
  - one good example is Naive Bayes
  - since we are inferring from the data, there is always uncertainty
  - take classification as an example
    - the classification result is always a probability, e.g. "there is an `85%` probability that this customer is going to purchase from us"
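
A tiny Naive Bayes classifier shows where such a probability comes from. The eight customer records, the two yes/no features, and the Laplace smoothing constant are hypothetical; a real project would more likely use a library implementation.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (visited_before, clicked_ad) -> outcome.
records = [
    (("yes", "yes"), "buy"),
    (("yes", "no"), "buy"),
    (("yes", "yes"), "buy"),
    (("no", "yes"), "buy"),
    (("no", "no"), "skip"),
    (("no", "no"), "skip"),
    (("yes", "no"), "skip"),
    (("no", "yes"), "skip"),
]

class_counts = Counter(label for _, label in records)
feature_counts = defaultdict(Counter)  # (feature position, label) -> value counts
for features, label in records:
    for position, value in enumerate(features):
        feature_counts[(position, label)][value] += 1


def predict_proba(features, n_values=2, alpha=1.0):
    """Posterior P(label | features), treating features as independent given the label."""
    scores = {}
    for label, count in class_counts.items():
        score = count / len(records)  # prior P(label)
        for position, value in enumerate(features):
            seen = feature_counts[(position, label)][value]
            score *= (seen + alpha) / (count + alpha * n_values)  # smoothed likelihood
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


probabilities = predict_proba(("yes", "yes"))
print({label: round(p, 2) for label, p in probabilities.items()})
```

For a returning customer who clicked the ad, this toy model reports an 80% probability of a purchase rather than a hard yes/no answer.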


## Machine Learning Knowledge #2: Artificial Neural Networks

- Early Artificial Neural Networks (ANNs), a very primitive form of neural networks, were quite different from the variants discussed in this course
- Back in the day we did not have an efficient way to train very large networks
- until the backpropagation algorithm, which trains networks with gradient-descent optimization, became widely known in the 1980s (it will be discussed later in this course)

## Machine Learning Knowledge #3: Kernel Methods

- Best known through Support Vector Machines (SVMs)
- in real life, a decision boundary __may not be linear__
  - a decision boundary separates your data points into different classes
  - a good decision boundary maximizes the __margin__
- To solve non-linear problems, we need to map the data to a high-dimensional representation
  - Again, this sounds good on paper but is often computationally impractical
- This is where kernel methods come in: a __kernel function__ computes how similar two points are in the new space without explicitly transforming them
  - this is also why SVMs became popular (although they remain hard to scale to large datasets)
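
The saving a kernel function provides can be checked on a small worked example. The degree-2 polynomial kernel and the two sample points are chosen for illustration; SVMs in practice often use other kernels such as the RBF kernel.

```python
import math


def feature_map(point):
    """Explicit mapping from 2-D to 3-D that corresponds to the degree-2 polynomial kernel."""
    x1, x2 = point
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def polynomial_kernel(a, b):
    """The same similarity, computed directly in the original 2-D space."""
    return dot(a, b) ** 2


a, b = (1.0, 2.0), (3.0, -1.0)
explicit = dot(feature_map(a), feature_map(b))  # transform first, then compare
via_kernel = polynomial_kernel(a, b)            # compare without transforming

print(explicit, via_kernel)
```

Both routes give the same number, but the kernel never builds the higher-dimensional vectors, which matters when the new space has thousands of dimensions or more.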

## Machine Learning Knowledge #4: Other Machine Learning Techniques

- Decision trees are flowchart-like structures that let you classify input data points or predict output values given inputs
- The Random Forest algorithm introduced a robust, practical take on decision-tree learning that
  - involves building a large number of specialized decision trees and then ensembling their outputs.
- A gradient boosting machine is a machine-learning technique based on ensembling weak prediction models, generally decision trees.
  - the most popular implementation is eXtreme Gradient Boosting (XGBoost)
- Aside from deep learning, Random Forest and XGBoost perform very well on many problems
  - We often use them as __baseline models__ when building our deep learning models
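
The "ensembling" step itself is simple, as this sketch shows. It only illustrates the voting idea: the three one-rule "stumps", their thresholds, and the customer fields are hand-written and hypothetical, whereas Random Forest and gradient boosting learn their trees from data.

```python
from collections import Counter


# Three hypothetical weak models, each a one-question flowchart.
def stump_income(customer):
    return "buy" if customer["income"] > 50_000 else "skip"


def stump_visits(customer):
    return "buy" if customer["visits"] >= 3 else "skip"


def stump_age(customer):
    return "buy" if customer["age"] < 40 else "skip"


def ensemble_predict(customer, models):
    """Majority vote over the individual model outputs."""
    votes = Counter(model(customer) for model in models)
    return votes.most_common(1)[0][0]


models = [stump_income, stump_visits, stump_age]
customer = {"income": 62_000, "visits": 1, "age": 35}
prediction = ensemble_predict(customer, models)
print(prediction)
```

One stump votes "skip" here, but the other two outvote it, so individual mistakes tend to be averaged away.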

## What makes Deep Learning so Different?

- Superior performance is not the only reason deep learning is so popular right now
  - it also automates the most crucial step in machine learning: __feature engineering__
- Traditional machine learning usually lets the input data go through at most _two_ transformations
  - which is too crude for modeling complicated problems
- So humans have to spend time tweaking the data so that they are:
  - suitable for the modeling technique
  - representative of the analytical problem

## What makes Deep Learning so Different?

- So can we stack multiple traditional machine learning models so that they can be more successful?
  - This is called a __stacking-based ensemble__
  - but each model is trained separately, so finding the best representation for every layer of the stack is very delicate
- Deep learning provides _joint feature representation_
  - if the representation of one feature changes, the other features that depend on it adapt accordingly
  - Also, the features are learned in successive layers

## Modern Machine Learning Landscape

- Top performing models are either XGBoost or some type of deep learning, per Kaggle
- In terms of Python packages, you need to be familiar with `keras` and `xgboost`
  - most commonly, we use `tensorflow` as the backend of `keras` when building our deep learning models
  - another package called `pytorch` is on the rise now

## Deep Learning Hardware

- In 2007, Nvidia launched [CUDA](https://developer.nvidia.com/about-cuda), a programming interface that uses Nvidia GPUs for general-purpose computing
  - An Nvidia Titan X is about 350 times faster than the CPU in a typical laptop
  - Companies are investing in clusters of GPUs for training deep learning models
- In 2016, Google launched its TPU chips, which are reportedly 10 times faster than GPUs
  - since we are using Colab, the TPU option is free-of-charge for us

## Deep Learning Algorithms

- As said before, due to the lack of an efficient training algorithm, ANNs remained fairly shallow
  - with 1 or 2 hidden layers
- The main issue is __vanishing/exploding gradients__
  - The gradients (the feedback signal) become either too small or too big after passing through a few layers
- We need a better training algorithm that includes:
  - better __activation functions__
  - better __weight initialization schemes__
  - better __optimization methods__
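
Why depth makes the feedback signal shrink or blow up can be seen with simple arithmetic. The sketch below is a deliberate simplification: one neuron per layer, and the same pre-activation value `z` and weight in every layer, all of which are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # never larger than 0.25


def relu_derivative(z):
    return 1.0 if z > 0 else 0.0


def gradient_scale(n_layers, derivative, z=0.5, weight=1.0):
    """Factor applied to a gradient after it is passed back through n_layers layers."""
    scale = 1.0
    for _ in range(n_layers):
        scale *= weight * derivative(z)  # chain rule: one factor per layer
    return scale


for depth in (2, 10, 30):
    print(depth,
          f"sigmoid: {gradient_scale(depth, sigmoid_derivative):.2e}",
          f"relu: {gradient_scale(depth, relu_derivative):.2e}",
          f"relu, weight=1.5: {gradient_scale(depth, relu_derivative, weight=1.5):.2e}")
```

With sigmoid activations the signal fades to almost nothing after 30 layers, and with oversized weights it grows out of control, which is why better activation functions and weight initialization schemes matter.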

## Deep Learning Software

- `Theano` and `Tensorflow` are two symbolic tensor-manipulation frameworks for Python
- `Keras` is an API built on top of `Theano` or `Tensorflow`
  - which is very user friendly, so that building deep learning models is like assembling LEGO bricks
  - it quickly became the go-to solution for deep learning
- `pytorch` is a newer API/framework for deep learning
  - a lot of researchers are using it now
  - some companies are adopting it as well
- We will mainly be using `keras` and `tensorflow` in this course
  - since they make it easy to build your own NNs from scratch
  - it might not be a bad idea to try `torch` as well, since it fits complex networks well

## Will this Hype Last?

- For deep learning to last, it has to keep the following three properties:
  - _Simplicity_: a good implementation should need only __five or six different tensor operations__.
  - _Scalability_: deep learning models can be trained on __multiple__ GPUs/TPUs at the same time and in small batches, so that they can be trained on datasets of __arbitrary size__.
  - _Versatility and reusability_: deep-learning models can be trained on additional data without restarting from scratch, making them viable for __continuous online learning__.

## Hands-on: Python Review & Starts with Keras

Let's move on to the hands-on part of this class, in which we are going to do a bit of review of Python and start writing our first `keras` code.
