# Machine Learning

Overview of Machine Learning

Machine learning is a method of modeling data, typically with predictive functions. It includes many techniques, but here we focus only on those needed to transition into deep learning. For example, random forests, support vector machines, and nearest neighbors are widely used and effective machine learning techniques, but they are not covered here.

What do we want from the model?

We want a model capable of handling our `inputs` and producing something in the shape of our `outputs`.

## Big Data
Additional Dimensions
- Complexity: multiple sources and data streams
- Variability
    - Unpredictable data flows
    - Social media trending

Why Big Data is important
- Data contains information
- Information leads to insights
- Insights help in making better decisions

How to derive insights from data?

--> Machine Learning

Conclusions:
- Data is nothing without insights
- Machine Learning is the key to deriving insights from data
- Big Data and Machine Learning have huge potential

## Algorithm in ML

The picture below shows an overview of machine learning.

<img src="image/1_1_machine-learning.png" style="width:600; align:center" />

### Supervised Learning
Given `features`, we want our model to predict a `label`. [See more](https://thangckt.github.io/pytorch_deep_learning/02_pytorch_classification/#2-building-a-model)

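- Classification
    - Decision Trees
    - Naive Bayes Classification
- Regression
    - Ordinary Least Squares Regression
    - Logistic Regression (despite its name, mainly used for classification)
    - Support Vector Machines (usable for both classification and regression)
    - Ensemble Methods (usable for both classification and regression)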

### Unsupervised Learning
There is no `label` in this type of learning; the model works from the `features` alone.
- Clustering
    - Centroid-based algorithm
    - Connectivity-based algorithm
    - Density-based algorithm
    - Probabilistic
- Dimensionality Reduction
    - Principal Component Analysis
    - Independent Component Analysis
    - Singular Value Decomposition
- Neural network / Deep Learning

### Reinforcement Learning

## The Ingredients

Machine learning is the fitting of models $\hat{f}(\vec{x})$ to data $\vec{x}, y$ that we know came from some "data generation" process $f(\vec{x})$. First, some definitions:

**Features** 

&nbsp;&nbsp;&nbsp;&nbsp;set of $N$ vectors $\{\vec{x}_i\}$ of dimension $D$. Can be reals, integers, etc.

**Labels** 

&nbsp;&nbsp;&nbsp;&nbsp;set of $N$ integers or reals $\{y_i\}$. $y_i$ is usually a scalar
  
**Labeled Data** 

&nbsp;&nbsp;&nbsp;&nbsp;set of $N$ tuples $\{\left(\vec{x}_i, y_i\right)\}$ 

**Unlabeled Data** 

&nbsp;&nbsp;&nbsp;&nbsp;set of $N$ features  $\{\vec{x}_i\}$  that may have unknown $y$ labels

**Data generation process**

&nbsp;&nbsp;&nbsp;&nbsp;The unseen process $f(\vec{x})$ that takes a given feature vector in and returns a real label $y$ (what we're trying to model)

**Model**

&nbsp;&nbsp;&nbsp;&nbsp;A function $\hat{f}(\vec{x})$ that takes a given feature vector in and returns a predicted $\hat{y}$

**Predictions**

&nbsp;&nbsp;&nbsp;&nbsp; $\hat{y}$, our predicted output for a given input $\vec{x}$.
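
To make these definitions concrete, here is a minimal sketch in plain Python. Everything specific in it is an assumption made for illustration: the data generation process `f`, the hand-picked coefficients in the model `f_hat`, and the sizes `N` and `D`. In practice `f` is unseen and the model's parameters are learned rather than chosen by hand.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

N, D = 5, 2  # N feature vectors, each of dimension D (illustrative sizes)


def f(x):
    """Data generation process: normally unseen; invented here for illustration."""
    return 3.0 * x[0] - 2.0 * x[1] + 1.0


def f_hat(x):
    """Model: an imperfect guess at f, with hand-picked (not learned) parameters."""
    return 2.5 * x[0] - 1.5 * x[1] + 0.8


# Features: a set of N vectors of dimension D
features = [[random.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(N)]

# Labels: one scalar y_i per feature vector, produced by the data generation process
labels = [f(x) for x in features]

# Labeled data: N tuples (x_i, y_i)
labeled_data = list(zip(features, labels))

# Unlabeled data: feature vectors whose labels are unknown to us
unlabeled_data = [[random.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(3)]

# Predictions: y_hat for each unlabeled feature vector
predictions = [f_hat(x) for x in unlabeled_data]

for x, y_hat in zip(unlabeled_data, predictions):
    print(f"x = {x}, y_hat = {y_hat:.3f}")
```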

```{note}
The content in this part is primarily from:
- [Deep Learning for molecules & materials](https://dmol.pub/ml)
```

```{seealso}
1. [<ins>Introductory Machine Learning</ins>](https://ai.stanford.edu/~nilsson/mlbook.html)
2. Two reviews of machine learning in materials{cite}`fung2021benchmarking,balachandran2019machine`
3. A review of machine learning in computational chemistry{cite}`gomez2020machine`
4. A review of machine learning in metals{cite}`nandy2018strategies`
```

## Terminologies in ML

- Patterns: the learned parameters of a model, i.e., the parameters that define the relationship between inputs and outputs. For example, in the linear model $y = ax + b$, the learned patterns (the parameters to be found) are the weight `a` and the bias `b`.
- Hidden units: the neurons in hidden layers.
- Hyperparameters: all user-chosen settings of a model (e.g., learning rate, number of layers, number of neurons per layer).
- Epoch: one full pass of the model through the training data (in the full-batch workflow below, each epoch is a single training step).
- Loss function: measures how wrong the model's predictions are. The higher the loss, the worse the model. It is sometimes called a "loss criterion", "criterion", or "cost function".
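
As a small illustration of parameters and loss, the sketch below evaluates the linear model $y = ax + b$ with two candidate parameter pairs and compares their losses. The toy data, the helper names `linear_model` and `mse_loss`, and the choice of mean squared error as the loss are assumptions for illustration; many other loss functions exist.

```python
def linear_model(x, a, b):
    """Linear model y = a*x + b; a (weight) and b (bias) are the parameters."""
    return a * x + b


def mse_loss(y_pred, y_true):
    """Mean squared error: the average of the squared prediction errors."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)


# Toy data generated from y = 2x + 1 (values chosen for illustration)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Predictions from the true parameters and from a poor guess
good_pred = [linear_model(x, a=2.0, b=1.0) for x in xs]
bad_pred = [linear_model(x, a=0.5, b=0.0) for x in xs]

loss_good = mse_loss(good_pred, ys)
loss_bad = mse_loss(bad_pred, ys)

# The poorer the parameters, the higher the loss
print(f"loss with a=2.0, b=1.0: {loss_good}")
print(f"loss with a=0.5, b=0.0: {loss_bad}")
```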

## Workflow in ML

This workflow is written for PyTorch. See [this lesson](https://thangckt.github.io/pytorch_deep_learning/01_pytorch_workflow/).

### 1. Prepare data
1. Prepare the inputs and outputs in a format suitable for the ML framework being used (e.g., PyTorch only works with data in the form of `torch.Tensor`).
2. Split the data into train and test sets (sometimes into train, validation, and test sets).
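
The two steps above might look like the following sketch. The toy data (a line with weight 0.7 and bias 0.3), the variable names, and the 80/20 split ratio are assumptions for illustration, not values prescribed by this workflow.

```python
import torch

torch.manual_seed(0)  # fixed seed so the shuffle is reproducible

# 1. Prepare inputs and outputs as tensors.
#    Toy data from a known line y = 0.7*x + 0.3 (values chosen for illustration).
X = torch.arange(50, dtype=torch.float32).unsqueeze(dim=1) / 50  # shape (50, 1)
y = 0.7 * X + 0.3                                                # shape (50, 1)

# 2. Split into train and test sets (80/20 here), shuffling the indices first.
perm = torch.randperm(len(X))
split = int(0.8 * len(X))
train_idx, test_idx = perm[:split], perm[split:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

A validation set, when used, can be carved out the same way by slicing `perm` into three parts instead of two.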

### 2. Build model
1. Construct a model by subclassing `nn.Module`.
2. Define a loss function and an optimizer.

An optional extra step is to set up device-agnostic code, so that the model runs on a GPU if one is available and on the CPU otherwise.
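
A minimal sketch of this step is shown below. The class name `LinearRegressionModel`, the single `nn.Linear` layer, the L1 loss, and the SGD learning rate are assumptions chosen for illustration; the right architecture, loss, and optimizer depend on the problem.

```python
import torch
from torch import nn

# Device-agnostic setup: use a GPU if one is available, otherwise the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"


class LinearRegressionModel(nn.Module):  # hypothetical name, for illustration
    def __init__(self):
        super().__init__()
        # One linear layer: a single input feature mapped to a single output
        self.linear = nn.Linear(in_features=1, out_features=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Defines the computation performed on every call to model(x)
        return self.linear(x)


# 1. Construct the model and move it to the chosen device
model = LinearRegressionModel().to(device)

# 2. Define a loss function and an optimizer
loss_fn = nn.L1Loss()  # mean absolute error; one choice among many
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

print(model)
```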

### 3. Train model

PyTorch steps in training:
1. **Forward pass** - The model goes through all of the training data once, performing its `forward()` function calculations (compute `model(x_train)`).
2. **Calculate the loss** - The model's outputs (predictions) are compared to the ground truth and evaluated to see how wrong they are (`loss = loss_fn(y_pred, y_train)`).
3. **Zero gradients** - The optimizer's gradients are set to zero (they are accumulated by default) so they can be recalculated for the specific training step (`optimizer.zero_grad()`).
4. **Perform backpropagation on the loss** - Computes the gradient of the loss with respect to every model parameter to be updated (each parameter with `requires_grad=True`). This is known as backpropagation, hence "backwards" (`loss.backward()`).
5. **Step the optimizer (gradient descent)** - Update the parameters with `requires_grad=True` with respect to the loss gradients in order to improve them (`optimizer.step()`).
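
The five steps can be sketched as a single loop. The toy data, the bare `nn.Linear` model, the MSE loss, the learning rate, and the number of epochs are all assumptions for illustration; the loop runs on the CPU to keep the sketch short.

```python
import torch
from torch import nn

torch.manual_seed(42)  # fixed seed so the initial parameters are reproducible

# Toy training data from y = 0.7*x + 0.3 (values chosen for illustration)
X_train = torch.arange(40, dtype=torch.float32).unsqueeze(dim=1) / 50
y_train = 0.7 * X_train + 0.3

model = nn.Linear(in_features=1, out_features=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(200):
    model.train()                     # put the model in training mode

    y_pred = model(X_train)           # 1. forward pass
    loss = loss_fn(y_pred, y_train)   # 2. calculate the loss
    optimizer.zero_grad()             # 3. zero the accumulated gradients
    loss.backward()                   # 4. backpropagation
    optimizer.step()                  # 5. update the parameters

    losses.append(loss.item())

print(f"first loss: {losses[0]:.5f}, last loss: {losses[-1]:.5f}")
```

Recording the loss at each epoch, as `losses` does here, is a simple way to check that training is actually reducing the error.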

## Improving a model (Hyperparameter tuning)

When the model gives bad predictions, there are a few ways to try to improve it. See the [example here](pytorch_deep_learning/02_pytorch_classification.ipynb).

| Model improvement technique* | What does it do? |
| ----- | ----- |
| **Add more layers** | Each layer *potentially* increases the learning capabilities of the model, since each layer can learn some new kind of pattern in the data. Adding layers is often referred to as making your neural network *deeper*. |
| **Add more hidden units** | Similarly, more hidden units per layer means a *potential* increase in the learning capabilities of the model. Adding hidden units is often referred to as making your neural network *wider*. |
| **Fitting for longer (more epochs)** | Your model might learn more if it had more opportunities to look at the data. |
| **Changing the activation functions** | Some data can't be fit with only straight lines; non-linear activation functions can help with this. |
| **Change the learning rate** | Less model-specific, but still related: the learning rate of the optimizer decides how much a model should change its parameters each step. Too much and the model overcorrects; too little and it doesn't learn enough. |
| **Change the loss function** | Again less model-specific but still important: different problems require different loss functions. For example, a binary cross entropy loss function won't work with a multi-class classification problem. |
| **Use transfer learning** | Take a pretrained model from a problem domain similar to yours and adjust it to your own problem. We cover transfer learning in [notebook 06](pytorch_deep_learning/06_pytorch_transfer_learning/). |

```{note}
- Because you can adjust all of these by hand, they're referred to as **hyperparameters**.
- This is also where machine learning's **half art, half science** nature comes in: there is no reliable way to know in advance what the best combination of values is for your project, so it is best to follow the data scientist's motto of *"experiment, experiment, experiment"*.
```
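
To illustrate the learning-rate row of the table, here is a small sketch in plain Python. It runs gradient descent on the one-parameter function $(w - 3)^2$, whose minimum is at $w = 3$. The function, the starting point, the step count, and the three learning rates are assumptions picked to show the three regimes: too small, reasonable, and too large.

```python
def gradient_descent(lr, steps=20, w=0.0):
    """Minimize (w - 3)^2 with plain gradient descent and return the final w."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # derivative of (w - 3)^2 with respect to w
        w = w - lr * grad       # parameter update scaled by the learning rate
    return w


# Try a learning rate that is too small, one that is reasonable, and one that is too large
for lr in (0.001, 0.1, 1.1):
    w_final = gradient_descent(lr)
    print(f"lr={lr}: final w={w_final:.3f}, distance from optimum={abs(w_final - 3.0):.3f}")
```

With the smallest rate the parameter barely moves in 20 steps, and with the largest each update overshoots the minimum by more than the previous error, so the distance grows instead of shrinking.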