<table align="center">
   <td align="center"><a target="_blank" href="https://colab.research.google.com/github/umbcdata602/spring2021/blob/master/introduction.ipynb">
<img src="http://introtodeeplearning.com/images/colab/colab.png?v2.0"  style="padding-bottom:5px;" />Run in Google Colab</a></td>
</table>

# Introduction

* Artificial Intelligence (AI)
    * Subfield of computer science
    * Computer systems that perform tasks normally involving human intelligence
    * Includes chess-playing algorithms, which started beating human masters about 40 years ago
    * Includes theory of artificial neurons dating back more than 60 years
* Machine learning (ML)
    * Subfield of AI
    * Self-learning algorithms use data to make predictions
        * ML models have adjustable parameters that are "trained" with data.
        * Trained models use new data to make predictions
* Deep Learning
    * Subfield of ML often involving big data
    * Neural networks can have many (millions, or more) trainable parameters
    * Massively parallel calculations require modern GPUs
    * [MIT 6.S191](http://introtodeeplearning.com/) lectures

<img src="https://github.com/rasbt/python-machine-learning-book-3rd-edition/raw/master/ch01/images/01_01.png" width="600" />

# Statistical Learning

* Statistical and mathematical foundations are centuries old
    * Bayes' theorem (1700s)
    * Calculus, linear algebra, optimization (1600s)
    * [Hastie & Tibshirani lectures](https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/) and applications in R
* Why now?
    * Data (Big Data)
    * Hardware (GPUs)
    * Software (Open Source)

# Some applications

* Handwritten text recognition (e.g., ZIP codes)
* Detecting tumors in images (brain scans, mammograms, skin photos, etc.)
* Spam detection
* Classifying news articles, social networks
* Voice and image recognition
* Chatbots and personal assistants (e.g., Siri)
* Text translation
* Genomic data analysis
* Robotics
* Self-driving cars
* Jeopardy (IBM Watson in 2011) & Go (AlphaGo beat Lee Sedol in 2016)
* And the list goes on...


# Machine Learning

* Three types
    * Supervised
    * Unsupervised
    * Reinforcement
* Image credits: [*Python Machine Learning, 3rd Edition* (2019) by Raschka & Mirjalili](https://github.com/rasbt/python-machine-learning-book-3rd-edition/raw/master/ch01/images/01_01.png)
    * In general, \<img\> tags point to original sources.



# Supervised learning -- classification

<img src="https://github.com/rasbt/python-machine-learning-book-3rd-edition/raw/master/ch01/images/01_02.png" width="600"/>

* Qualitative predictions (labels & features)
* Binary classes, e.g., Yes or No, Malignant or Benign
* Multiple classes, e.g., Iris species: Setosa, Versicolor or Virginica


<img src="https://github.com/rasbt/python-machine-learning-book-3rd-edition/raw/master/ch01/images/01_03.png" width="400"/>

* Mathematically, this binary classification model can be written
\begin{align*}
  y = \begin{cases}
    \mathrm{yes}, & z \geq 0 \\
    \mathrm{no}, & z < 0
  \end{cases}
\end{align*}
where
$$
z = w_0 + w_1 x_1 + w_2 x_2
$$
* The class $y$ takes one of a finite set of qualitative values.
* The intermediate variable $z$ is a linear function of $x_1$ and $x_2$.
* The independent variables $x_1$ and $x_2$ are called "features".
* $w_0$, $w_1$, and $w_2$ are "trainable" parameters
    * $w_1$ and $w_2$ are called weights. 
    * $w_0$ is called a bias term.
* The dashed line in the figure is the decision boundary, where $z = 0$.
* Trainable parameters are "learned" from data or observations.
    * Training data have known values of $y$ and $x_i$
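
As a minimal sketch of the decision rule above, the snippet below computes $z = w_0 + w_1 x_1 + w_2 x_2$ with NumPy and thresholds it at zero. The function name `predict` and the parameter values are hypothetical, picked only for illustration; in practice the weights would be learned from training data.

```python
import numpy as np

def predict(X, w0, w1, w2):
    """Return 'yes' where z >= 0 and 'no' where z < 0."""
    # z = w0 + w1*x1 + w2*x2, evaluated for every row of X
    z = w0 + w1 * X[:, 0] + w2 * X[:, 1]
    return np.where(z >= 0, "yes", "no")

# Hypothetical parameter values (not learned from data)
w0, w1, w2 = -1.0, 1.0, 1.0

# Each row is one example with features (x1, x2)
X = np.array([
    [0.0, 0.0],  # z = -1.0
    [1.0, 0.0],  # z =  0.0, exactly on the decision boundary
    [2.0, 1.0],  # z =  2.0
])

labels = predict(X, w0, w1, w2)
print(labels)  # ['no' 'yes' 'yes']
```

Points on the line $z = 0$ are assigned to the "yes" class because the rule uses $z \geq 0$.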



# Fisher's Iris dataset

* This is a classic dataset for ML classification.
* [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set) --  wikipedia.org

<img src="https://github.com/rasbt/python-machine-learning-book-3rd-edition/raw/master/ch01/images/01_08.png" width="600"/>
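
One convenient way to look at this dataset, assuming scikit-learn is installed, is its bundled copy. The sketch below loads it and prints the shape of the feature table along with the feature and class names.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# X holds the features (one row per flower), y holds integer class labels
X, y = iris.data, iris.target

print(X.shape)             # (150, 4)
print(iris.feature_names)  # sepal and petal measurements in cm
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
```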


# Supervised Learning -- regression

* Quantitative dependent variable(s) can take on any value in a continuous range
* Mathematically, the regression problem in the figure is
$$
y = f(x) + \epsilon
$$
* $f(x)$, the model of interest, is a function of $x$
* $\epsilon$ represents random noise (e.g., measurement error)
* If the model is a linear function of $x$, then
$$
f(x) = w_0 + w_1 x
$$
* Linear regression also covers models that are nonlinear in $x$, as long as they are linear in the weights, such as
$$
f(x) = w_0 + w_1 x + w_2 x^2
$$
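
The quadratic model is linear in the weights $w_0, w_1, w_2$, so ordinary least squares can fit it. The sketch below illustrates this with NumPy on synthetic data; the "true" weights, noise level, and random seed are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = f(x) + epsilon with a quadratic f (hypothetical weights)
true_w = np.array([1.0, -2.0, 0.5])
x = np.linspace(-2.0, 2.0, 50)
y = true_w[0] + true_w[1] * x + true_w[2] * x**2 + rng.normal(scale=0.1, size=x.size)

# Design matrix with columns 1, x, x^2: the model is linear in the weights
A = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares estimate of (w0, w1, w2)
w_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w_hat)  # should be close to true_w
```

Dropping the `x**2` column from `A` gives the straight-line model $f(x) = w_0 + w_1 x$ with the same code.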



<img src="https://github.com/rasbt/python-machine-learning-book-3rd-edition/raw/master/ch01/images/01_04.png" width="400"/>



# Logistic Regression & Linear Regression
* You may have started seeing notational similarities between classification and regression.
* The "Statistical Learning" perspective clarifies the relationship...
    * Weights are not known *a priori* and must be estimated (learned) from data.
    * Learning requires an objective function that measures "goodness of fit" between model and data
    * Least-squares (linear) regression involves minimizing the sum of squared errors between $y$ and $f(x)$
    * Classification (logistic regression) involves minimizing a quantity known as "cross entropy"
    * In fact, both techniques maximize a statistical measure of "likelihood" given a set of observations (for least squares, under the assumption of Gaussian noise).
* Reference: [Chapter 8 of "Elements of Statistical Learning, 2nd Edition"](https://web.stanford.edu/~hastie/ElemStatLearn/) (2009) by Hastie, Tibshirani & Friedman
    * 12th printing (2017) is freely available on the web
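
The two objective functions named above can be sketched in a few lines of NumPy. The function names are hypothetical, and the cross entropy shown is the binary form, which assumes labels coded as 0 or 1 and predicted probabilities obtained by passing $z$ through the logistic (sigmoid) function.

```python
import numpy as np

def sum_squared_errors(y, f):
    """Least-squares objective: sum of squared differences between y and f(x)."""
    return np.sum((y - f) ** 2)

def sigmoid(z):
    """Logistic function, mapping z to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, p):
    """Binary cross entropy for labels y in {0, 1} and predicted probabilities p."""
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Regression example with made-up numbers: errors are 0, -0.5, and 1
sse = sum_squared_errors(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.5, 2.0]))
print(sse)  # 1.25

# Classification example: z = 0 gives p = 0.5 for both examples
ce = cross_entropy(np.array([1.0, 0.0]), sigmoid(np.array([0.0, 0.0])))
print(ce)  # 2 * log(2), about 1.386
```

Both quantities shrink as the predictions move closer to the observed targets, which is what makes them usable as training objectives.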


# Synonyms

* **Training example**: row in a table representing the dataset; an observation, record, instance, or sample.
* **Feature**: $x$, column in a data table; predictor, variable, input, attribute, or covariate.
* **Training**: Model fitting, parameter estimation.
* **Target**: $y$, outcome, output, response variable, dependent variable, (class) label, or ground truth.
* **Loss function**: cost function, objective function, error function used to train a model.

# Unsupervised learning

* Clustering -- discovering structure/patterns

<img src="https://github.com/rasbt/python-machine-learning-book-3rd-edition/raw/master/ch01/images/01_06.png" width="400"/>
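
As a rough illustration of clustering, the sketch below runs k-means (one common clustering algorithm, chosen here as an assumption since no specific method is named) from scikit-learn on three synthetic blobs. The blob centers, spread, and seed are made up for the example, and no labels are given to the algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Three well-separated synthetic blobs of 50 points each (hypothetical data)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit k-means with k=3 and assign each point to a cluster
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Number of points assigned to each cluster
print(np.bincount(cluster_ids))
```

The cluster numbering is arbitrary: the algorithm discovers groups, not their names.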

* Dimensionality reduction

<img src="https://github.com/rasbt/python-machine-learning-book-3rd-edition/raw/master/ch01/images/01_07.png" width="600"/>
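
A minimal sketch of dimensionality reduction, assuming scikit-learn and using principal component analysis (PCA) as one common technique: the four Iris features are projected onto two components. The figure above may illustrate a different method.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 examples, 4 features

# Project the 4-dimensional data onto its first 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)  # (150, 2)
# Fraction of the total variance captured by each component
print(pca.explained_variance_ratio_)
```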

# Reinforcement learning

<img src="https://github.com/rasbt/python-machine-learning-book-3rd-edition/raw/master/ch01/images/01_05.png" width="600"/>

# Typical workflow


<img src="https://github.com/rasbt/python-machine-learning-book-3rd-edition/raw/master/ch01/images/01_09.png" width="600"/>
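
One possible way to sketch such a workflow with scikit-learn is shown below: split the data, preprocess, train, and evaluate on held-out examples. The choice of the Iris data, standardization, logistic regression, and a 70/30 split are assumptions for illustration, not steps prescribed by the figure.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out 30% of the examples for evaluation, keeping class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Preprocessing (feature scaling) followed by the learning algorithm
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluate on data the model has not seen during training
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and classifier in one pipeline means the scaling parameters are estimated from the training split only.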


# Our tools

* Python
    * Alternatives include R, JavaScript, C, C++, Julia, MATLAB, SAS, and Fortran
    * [TIOBE index](https://www.tiobe.com/tiobe-index/)
* [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb)
    * Check out their [introductory video](https://www.youtube.com/watch?v=inN8seMm7UI) featuring Jake VanderPlas
* Python science libraries
    * numpy, pandas, matplotlib
    * scikit-learn
    * [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) (2016) by VanderPlas
    * TensorFlow & Keras -- [tensorflow.org](https://www.tensorflow.org)

# Our texts

* [Python Machine Learning, 3rd Edition (2019) by Raschka & Mirjalili](https://github.com/rasbt/python-machine-learning-book-3rd-edition) -- github
* [An Introduction to Statistical Learning](https://www.statlearning.com/) (2013) by James, Witten, Hastie and Tibshirani -- statlearning.com
* and others listed in the syllabus