# CA ASTRO - Data Science and ML Workshop

**Author** : Victor Calderon

**Date** : 13th and 14th of June, 2020

## Intro to Machine Learning (ML)

In this notebook, we will:
- discuss some of the main topics about machine learning
- create a simple ML model using out-of-the-box datasets
- evaluate the performance of the model

### Types of Machine Learning

The area of `machine learning` is vast and comprises **many** different
subcategories of ML models / algorithms, as nicely represented in the following
figure:

![ML_usages](./images/Data_Science_Diagram.png)
<small>Image source: ([http://www.cognub.com/index.php/cognitive-platform/](http://www.cognub.com/index.php/cognitive-platform/))</small> and
<small>article [link](https://towardsdatascience.com/coding-deep-learning-for-beginners-types-of-machine-learning-b9e651e1ed9d)</small>

There are typically three main types of ML algorithms:

- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning

#### Supervised Learning

In this type of algorithm, there is a `target` variable to be predicted from a set of *predictors* (or features), which are
often called the independent variables. Using these variables, one can create a **mapping** between the
**features** and the **target variable**.

This can be written as

$$y = f(\textrm{features})$$

The algorithm / model undergoes a **training** process, in which the model *learns* the functional form of *f* as a function
of the features, and it tries to predict $y$ based on this mapping.
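
As a minimal sketch of this training process, the cell below fits a linear model to synthetic data. The data, the true relation $y = 3x + 2$, and the choice of `LinearRegression` are assumptions made only for illustration; any supervised model would follow the same `fit` / `predict` pattern.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 3 * x + 2 plus small noise
rng = np.random.default_rng(42)
features = rng.uniform(0, 10, size=(200, 1))   # one feature per sample
target = 3.0 * features[:, 0] + 2.0 + rng.normal(0, 0.5, size=200)

# Training: the model learns an approximation of f from (features, target) pairs
model = LinearRegression()
model.fit(features, target)

# The learned mapping should be close to the true slope (3) and intercept (2)
slope, intercept = model.coef_[0], model.intercept_
print(f"learned f(x) = {slope:.2f} * x + {intercept:.2f}")

# Prediction: apply the learned mapping to a new, unseen feature value
y_new = model.predict(np.array([[4.0]]))[0]
print(f"prediction for x = 4: {y_new:.2f}")
```

The learned slope and intercept will not match the true values exactly, because the model only sees noisy samples of $f$.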

Supervised learning can be broken into 2 main branches:

1. **Classification**: A model is trained to *classify* something into certain classes
    - Examples:
        - Classifying whether a patient has a certain disease or not
        - Classifying galaxies as elliptical or spiral
        - Determining the types of stars (*OBAFGKM*)

2. **Regression**: A model is trained to *predict* some numerical value like prices, weight, number of surgeries, etc.
    - Examples:
        - Predicting stock prices based on historical data
        - Predicting how much a car will cost in 2, 3, or 4 years, based on its model and specifications
        - Estimating the mass of the dark matter halo based on galaxy- and group-related properties ($M_{\odot}$, color, velocity, etc.)
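
The difference between the two branches can be sketched with two datasets bundled with Scikit-Learn. The choice of `load_iris`, `load_diabetes`, and shallow decision trees is an assumption made for illustration; the point is only that a classifier returns discrete labels while a regressor returns continuous values.

```python
from sklearn.datasets import load_iris, load_diabetes
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: the target is a discrete class label (iris species 0, 1, or 2)
X_cls, y_cls = load_iris(return_X_y=True)
classifier = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_cls, y_cls)
predicted_classes = classifier.predict(X_cls[:5])
print("predicted classes:", predicted_classes)

# Regression: the target is a continuous number (a disease-progression score)
X_reg, y_reg = load_diabetes(return_X_y=True)
regressor = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_reg, y_reg)
predicted_values = regressor.predict(X_reg[:5])
print("predicted values:", predicted_values)
```

Both models are evaluated here on data they were trained on, which is fine for showing the output types but is not a fair measure of performance.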

#### Unsupervised Learning

This type of algorithm uses the *features* alone to discover structure in the data, such as the **group** to which an object
belongs. With this type of algorithm, there is no target or outcome variable to predict.

Unsupervised learning can be split into two categories:

- **Clustering**: A clustering scenario is one where you want to determine the groupings in a given dataset
    - Examples:
        - Grouping customers into those who buy a lot and those who do not spend much
        - Recommender systems (Netflix, Amazon, etc.)
        - Clustering different books on the basis of topics and information

- **Association**: An association rule learning scenario (not that common) is where one wants to discover the underlying rules governing large portions of the dataset.
    - Examples:
        - If `Person A` buys item `X`, they also tend to buy item `Y`.
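
Both categories can be sketched in a few lines. The cell below is an illustration under stated assumptions: the points come from `make_blobs` with three hand-picked centers, `KMeans` stands in for "a clustering algorithm", and the shopping baskets are made-up data used to compute the confidence of the rule "X implies Y" by simple counting.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Clustering: synthetic 2-D points around 3 chosen centers (true labels are discarded)
centers = [(-5, -5), (0, 0), (5, 5)]
points, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)
cluster_ids = kmeans.labels_          # group assigned to each point
cluster_sizes = sorted(int((cluster_ids == k).sum()) for k in range(3))
print("points per cluster:", cluster_sizes)

# Association: how often does a basket that contains X also contain Y?
baskets = [{"X", "Y"}, {"X", "Y", "Z"}, {"X"}, {"Y", "Z"}, {"X", "Y"}]  # made-up data
with_x = [b for b in baskets if "X" in b]
confidence = sum("Y" in b for b in with_x) / len(with_x)
print(f"confidence(X -> Y) = {confidence:.2f}")
```

Note that `KMeans` needs the number of clusters up front; with real data that number is usually unknown and has to be chosen or estimated.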
      

#### Reinforcement Learning

Using this type of algorithm, the model is trained to make specific decisions after *learning* from its *environment*.
The model is first exposed to a *training environment*, in which it learns to make the correct decision by trial and error.

An example taken from [Analytics Vidhya](https://www.analyticsvidhya.com/blog/2017/01/introduction-to-reinforcement-learning-implementation/)
considers Reinforcement Learning as when a child is learning how to walk:

1. The first thing the child will do is **notice** how you are walking. You use two legs, taking a step at a time in order to walk. Grasping this concept, the child tries to replicate you.
2. But soon he/she will understand that before walking, the child has to stand up! This is a challenge that comes along while trying to walk. So now the child **attempts to get up**, staggering and slipping but still determined to get up.
3. Then there’s another challenge to cope with. Standing up was easy, but to **remain still** is another task altogether! Clutching thin air to find support, the child manages to stay standing.
4. Now the real task for the child is to start walking. But it’s easier said than done. There are so **many things to keep in mind**, like balancing the body weight, deciding which foot to put next and where to put it.
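
Learning by trial and error can be sketched with a very reduced setting, a two-armed bandit with an epsilon-greedy agent. This is an assumption chosen for brevity, not the example from the article above: there are only two possible actions, no states, and the reward probabilities are invented. The agent does not know which action is better and has to find out from the rewards it receives.

```python
import random

rng = random.Random(0)                 # fixed seed so the run is repeatable
reward_probability = [0.2, 0.8]        # hidden from the agent; action 1 is better
value_estimate = [0.0, 0.0]            # the agent's current guess for each action
times_tried = [0, 0]
epsilon = 0.1                          # fraction of steps spent exploring

for step in range(2000):
    # Explore a random action occasionally; otherwise exploit the best guess so far
    if rng.random() < epsilon:
        action = rng.randrange(2)
    else:
        action = max(range(2), key=lambda a: value_estimate[a])
    # The environment answers with a reward of 1 (success) or 0 (failure)
    reward = 1.0 if rng.random() < reward_probability[action] else 0.0
    # Update the running average reward for the chosen action
    times_tried[action] += 1
    value_estimate[action] += (reward - value_estimate[action]) / times_tried[action]

best_action = max(range(2), key=lambda a: value_estimate[a])
print("estimated values:", [round(v, 2) for v in value_estimate])
print("preferred action:", best_action)
```

The occasional random action is what lets the agent discover the better option; without it, the agent could settle on the first action that happened to pay off.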

---

## Examples of ML using Scikit-Learn

There are several useful libraries in Python that provide solid implementations of machine learning algorithms.
Possibly the most famous one is [Scikit-Learn](https://scikit-learn.org/stable/), a package that provides
efficient and user-friendly versions of the most common ML algorithms, including but not limited to

- Random Forest
- Logistic Regression
- Principal Component Analysis (PCA)
- t-SNE (a dimensionality-reduction technique, mostly used for visualization)
- and many more.

This library contains very useful tools for examining datasets, creating and evaluating ML models, and more.
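
A minimal sketch of that workflow is shown below: examine a dataset, create a model, and evaluate it. The dataset (`load_iris`), the model (`RandomForestClassifier`), the 25% test split, and accuracy as the metric are all choices made here for illustration; other datasets, models, and metrics follow the same pattern.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Examine the dataset: 150 flowers, 4 features each, 3 species
X, y = load_iris(return_X_y=True)
print("feature matrix shape:", X.shape)

# Hold out 25% of the data to evaluate the model on samples it has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Create and train the model
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Evaluate: fraction of test samples classified correctly
accuracy = accuracy_score(y_test, forest.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Evaluating on held-out data matters because a model can score very well on the samples it was trained on and still generalize poorly.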