# Introduction to Machine Learning from an App Dev Perspective


"If I have seen further it is by standing on ye shoulders of giants" - Sir Isaac Newton, 1675

And so it is with this material.

I have brought together material from multiple sources:

### DataSchool.io 
*From the video series: [Introduction to machine learning with scikit-learn](https://github.com/justmarkham/scikit-learn-videos)*

The owner, Kevin Markham, has a great video series with accompanying Jupyter notebooks. I have taken his [Machine Learning with Text in Python](http://www.dataschool.io/learn/) course, which I thought was well worth it.

- Email: <kevin@dataschool.io>
- Website: http://dataschool.io
- Twitter: [@justmarkham](https://twitter.com/justmarkham)

### Alyssa Batula, PyLadies Remote presentation
- Alyssa gave a great overview of machine learning.  You can see that on [YouTube - Introduction to Machine Learning in Python with Alyssa Batula](https://www.youtube.com/watch?v=-BIGzKmxMGY).  
- You can find her material on [GitHub](https://github.com/abatula/MachineLearningIntro)

### Introduction to Machine Learning with Python: A Guide for Data Scientists
- This book by Andreas Müller and Sarah Guido is a very approachable guide to machine learning, with great insight into the field.

- You can find that on [Amazon](https://www.amazon.com/Introduction-Machine-Learning-Python-Scientists/dp/1449369413/ref=sr_1_2?ie=UTF8&qid=1511371961&sr=8-2&keywords=machine+learning+oreilly).

- Andreas's material for the book is on [GitHub](https://github.com/amueller/introduction_to_ml_with_python).



# Overall Agenda
- Introduction to Machine Learning
- How to set up a Python 3 environment with scikit-learn
- Getting started with scikit-learn data sets
- How to train a supervised classification model
- How to evaluate a supervised classification model
- How to use a linear regression model
- Using Grid Search to find the optimal model parameters
- Classification metrics
- Machine learning with Text
- Unsupervised KMeans clustering


# What is machine learning, and how does it work?

From Andreas Müller:

"Machine learning is the art and science of giving computers the ability to learn to make decisions from data... *without* being explicitly programmed."


![DataScience Venn Diagram](images/data-science-venn-diagram-computer-science-machine.png)

## Agenda

- What is machine learning?
- What are the two main categories of machine learning?
- What are some examples of machine learning?
- How does machine learning "work"?

## What is machine learning?

**Definition**: "Machine learning is the semi-automated extraction of knowledge from data"

**Definition**: "Field of study that gives computers the ability to learn without being explicitly programmed" - Arthur Samuel

**Definition**: Predicting the future based on past data


Machines learn from observations:

- **Knowledge from data**: Starts with a question that might be answerable using data
- **Automated extraction**: A computer provides the insight
- **Semi-automated**: Machine learning still requires many smart decisions by a human

![Machine learning](images/machineLearning-what.png)

`https://pythonprogramming.net/machine-learning-python-sklearn-intro`

# What kinds of problems can Machine Learning solve?

- Classification
    - Predict a category
        - Ham or Spam email
        - At risk for Diabetes (true/false)
- Regression
    - Predict a quantity
        - Price of a house
        - Amount of sales
- Clustering
    - Group similar things together when you don't necessarily know what the groups are
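
The difference between classification and regression comes down to the kind of target being predicted. The sketch below uses scikit-learn with made-up house data; the sizes, prices, and labels are hypothetical and chosen only to illustrate the idea, and the two estimators are just one possible choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one feature, house size in square meters.
sizes = np.array([[50], [60], [80], [100], [120], [150]])

# Regression: the target is a quantity (made-up prices, exactly 3 x size).
prices = np.array([150, 180, 240, 300, 360, 450])
regressor = LinearRegression().fit(sizes, prices)
predicted_price = regressor.predict([[90]])[0]

# Classification: the target is a category (made-up labels).
labels = np.array(["small", "small", "small", "large", "large", "large"])
classifier = KNeighborsClassifier(n_neighbors=3).fit(sizes, labels)
predicted_label = classifier.predict([[140]])[0]

print(f"Predicted price: {predicted_price:.1f}")
print(f"Predicted category: {predicted_label}")
```

Both estimators share the same `fit` and `predict` calls; only the type of the target changes.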

## What are the two main categories of machine learning?

**Supervised learning**: Making predictions using data

Used for Classification and Regression
    
- Example: Is a given email "spam" or "ham"?
- There is an outcome we are trying to predict

![Spam filter](images/01_spam_filter.png)

**Unsupervised learning**: Extracting structure from data

Used for Clustering

- Example: Segment grocery store shoppers into clusters that exhibit similar behaviors
- There is no "right answer"

![Clustering](images/01_clustering.png)
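
As a minimal sketch of the shopper example, the code below clusters a handful of hypothetical shoppers with scikit-learn's `KMeans`. The two features (visits per month, average spend per visit), the data values, and the choice of two clusters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical shoppers: [visits per month, average spend per visit]
shoppers = np.array([
    [2, 150], [3, 140], [2, 160],   # infrequent visits, large baskets
    [12, 20], [14, 25], [13, 18],   # frequent visits, small baskets
])

# No labels are passed in: the algorithm only sees the features.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(shoppers)

print(segments)  # one cluster id per shopper; the ids themselves are arbitrary
```

The cluster ids carry no meaning on their own. It is up to a human to look at each group and decide what it represents.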

## How does machine learning "work"?

High-level steps of **supervised learning**:

1. Gather and Clean Data

2. Feature Selection
    - A feature is typically a column in a data set
    - Which data is relevant to the question you are trying to answer?
    - Some features only add noise and reduce accuracy

3. Select Machine Learning Model / Algorithm
    - Different algorithms have very different prediction characteristics
    - You will likely try several kinds of machine learning algorithms to find the one that performs best

4. Train a **machine learning model** using **labeled data**

    - "Labeled data" has been labeled with the outcome
    - "Machine learning model" learns the relationship between the attributes of the data and its outcome

5. Make **predictions** on **new data** for which the label is unknown
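
The steps above can be sketched in a few lines of scikit-learn. This is only an illustration under some assumptions: it uses the bundled iris data set (already clean, so step 1 is trivial), keeps two of the four features, and picks a k-nearest-neighbors classifier. The `new_flowers` measurements are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# 1. Gather data: X holds the features, y holds the labeled outcomes.
iris = load_iris()
X, y = iris.data, iris.target

# 2. Feature selection: keep petal length and petal width (columns 2 and 3).
X_selected = X[:, [2, 3]]

# 3. Select a model.
model = KNeighborsClassifier(n_neighbors=5)

# 4. Train the model on labeled data.
model.fit(X_selected, y)

# 5. Predict on new data whose label is unknown (hypothetical measurements).
new_flowers = [[1.4, 0.2], [5.5, 2.0]]
predictions = model.predict(new_flowers)
predicted_names = iris.target_names[predictions]

print(predicted_names)
```

Swapping in a different algorithm at step 3 usually leaves the other steps unchanged, which makes it cheap to try several models.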


## Keep in Mind

### <font color=green>Garbage In -> Garbage Out</font>


### <font color=green>Change Anything -> Change Everything</font>


### Data is responsible for how well the model performs.

![Supervised learning](images/01_supervised_learning.png)

The primary goal of supervised learning is to build a model that "generalizes": It accurately predicts the **future** rather than the **past**!
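
One common way to check whether a model generalizes is to hold back part of the labeled data and score the model only on that held-back part. The sketch below assumes the iris data set, a 75/25 split, and a 1-nearest-neighbor classifier; all three are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold back 25% of the rows; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# With n_neighbors=1 the model effectively memorizes the training set.
model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)  # how well it recalls the "past"
test_accuracy = model.score(X_test, y_test)     # how well it predicts unseen data

print(f"Training accuracy: {train_accuracy:.3f}")
print(f"Testing accuracy:  {test_accuracy:.3f}")
```

Training accuracy rewards memorization, so the testing accuracy is the number that says something about future data.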

## Questions about machine learning

- Does your data set **accurately** represent what you are trying to predict?
- Does a feature contribute to the **signal** or the **noise** in the data?
- How do I choose **which attributes** of my data to include in the model?
- How do I choose **which model** to use?
- How do I **optimize** this model for best performance?
- How do I ensure that I'm building a model that will **generalize** to unseen data?
- Can I **estimate** how well my model is likely to perform on unseen data?
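
For the last question, cross-validation is one widely used way to estimate performance on unseen data. The sketch below assumes the iris data set, a k-nearest-neighbors model with `n_neighbors=5`, and 5 folds; none of these choices come from the material above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5)

# Split the data into 5 folds; each fold is used once as the held-out test set.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()

print(scores)
print(f"Mean accuracy: {mean_accuracy:.3f}")
```

Averaging over several folds gives a steadier estimate than a single train/test split, at the cost of training the model several times.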

## Resources

- Book: [An Introduction to Statistical Learning](http://www-bcf.usc.edu/~gareth/ISL/) (section 2.1, 14 pages)
- Video: [Learning Paradigms](http://work.caltech.edu/library/014.html) (13 minutes)