# Week 2

The following notebook summarizes [Week 3](https://www.coursera.org/learn/machine-learning-data-lifecycle-in-production/home/week/3) of [Machine Learning](https://www.coursera.org/learn/machine-learning-modeling-pipelines-in-production/home/week/1).

This week's content extends the previous module, *Data Pipeline*. It addresses methods for optimizing a model through hyperparameter tuning and resource management.

**Index**

- [Dimensionality](#dimensionality)
- [Manual Dimensional Reduction](#manual-dimensional-reduction)
  - [Principal Component Analysis](#principal-component-analysis)

### Dimensionality

**Problems caused by High Dimensional data**
- More dimensions -> more features
- Risk of overfitting our models
- Distances between points become more and more alike, because adding features pushes all data points farther apart
- No clear distinction between clustered objects
- Concentration phenomenon for Euclidean distance
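
The concentration of Euclidean distances listed above can be observed with a small experiment. This is a minimal sketch, assuming uniformly random points in a unit hypercube; the function name `relative_contrast` and the sample sizes are illustrative choices, not part of the course material. It measures how much farther the farthest point is than the nearest one, relative to the nearest.

```python
import math
import random

def relative_contrast(n_dims, n_points=200, seed=0):
    """Return (max_dist - min_dist) / min_dist from one query point to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as dimensions are added: nearest and farthest
# neighbours become almost equally far away.
for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```

As the contrast approaches zero, "nearest neighbour" carries less and less information, which is why distance-based clustering struggles in high dimensions.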

**Technical issues of High Dimensionality**
- Runtime and System memory requirements
- Solutions take longer to reach global optima
- More dimensions raise the likelihood of correlated features


**Important Concepts**
- Curse of Dimensionality
- Hughes phenomenon


### Manual Dimensional Reduction

There are many dimensionality reduction methods, and the appropriate one depends on the type of problem.

| Problem | Method | Example |
| --- | --- | --- |
| Classification | Maximize Separation among Classes | Linear Discriminant Analysis (LDA) |
| Regression | Maximize Correlation between projected data and response variable | Partial Least Squares (PLS) |
| Unsupervised | Retain as much data variance as possible | [Principal Component Analysis (PCA)](#principal-component-analysis) |

### Principal Component Analysis

PCA performs dimensionality reduction in two stages. The first stage, decorrelation, doesn't change the dimensionality of the data:

- PCA shifts the samples so that they have a mean of zero.
- PCA rotates the samples so that they are aligned with the coordinate axes.

The second stage reduces dimensionality by keeping only the leading principal components (PCs). The first PC is the direction that maximizes the variance of the projected data; the second PC is orthogonal to the first and maximizes the remaining variance.
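
The idea of a first principal component can be sketched in plain Python. This is an illustrative sketch, not the course's implementation: it centers the data, builds the covariance matrix, and uses power iteration (an assumption chosen for brevity) to find the direction of maximum variance. The toy data and helper names are hypothetical; in practice a library such as scikit-learn would normally be used.

```python
import math

def center(data):
    """Shift samples so every feature has zero mean."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]

def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iters=200):
    """Power iteration: converges to the direction of maximum variance."""
    v = [1.0] * len(cov)  # assumes the start vector is not orthogonal to PC1
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy 2-D data lying roughly along the line y = x (hypothetical values).
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]
centered = center(data)
pc1 = first_component(covariance(centered))

# Project each 2-D sample onto PC1 to get a 1-D representation.
projected = [sum(x * p for x, p in zip(row, pc1)) for row in centered]
```

Here the two features are almost perfectly correlated, so the single projected coordinate retains nearly all of the variance of the original two.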

**Dimensionality Reduction Resources**

- [Principal Component Analysis](https://arxiv.org/pdf/1404.1100.pdf)
- [Independent Component Analysis](https://arxiv.org/pdf/1404.2986.pdf)
- [PCA Extensions](http://alexhwilliams.info/itsneuronalblog/2016/03/27/pca/) (**Must Read Content**)

### Mobile, IoT, and similar use cases

Edge devices such as cellphones offer many opportunities to use ML, from auto-completion to facial recognition. Real-time predictions can be generated either by hosting the model on a server or by embedding the model in the device. Each approach has pros and cons that may be crucial in deciding how the service is delivered.

**Inference on the cloud/server**

*Pros*

- Lots of Compute Capacity
- Scalable Hardware
- Model complexity handled by the server
- Easy to add new features and update the model
- Low-latency and batch prediction

*Cons*

- Timely inference is needed, but every prediction incurs a network round trip
- Constant connection to the device is required

**On-Device Inference**

*Pros*

- Improved Speed
- Performance
- No dependence on network connectivity
- No to-and-fro communication needed

*Cons*

- Less Capacity
- Tight resource constraints

Several frameworks provide faster and more scalable model deployment:

![img](./pics/model-deployment.png)

### Quantization

Quantization is the process of reducing a model's computational resource requirements by converting its weights from floating-point values to integers. This usually results in faster computation and a smaller model, at the cost of some accuracy.

There are two different methods to quantize a model: *post-training quantization* and *quantization aware training*.
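
The float-to-integer mapping can be illustrated with a small sketch. It assumes an affine (scale plus zero-point) 8-bit scheme applied to a list of weights after training; the function names and example weights are hypothetical, and this is not the TensorFlow implementation.

```python
def quantize(weights, num_bits=8):
    """Map floats to signed integers using an affine (scale + zero-point) scheme."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # Include 0.0 in the range so that zero stays representable.
    w_min, w_max = min(min(weights), 0.0), max(max(weights), 0.0)
    scale = (w_max - w_min) / (qmax - qmin) or 1.0  # guard against all-zero weights
    zero_point = round(qmin - w_min / scale)
    # Round to the nearest integer step and clamp into the integer range.
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return [(v - zero_point) * scale for v in q]

weights = [-1.2, -0.4, 0.0, 0.3, 0.75, 2.1]  # hypothetical layer weights
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)

# The rounding error per weight is what costs accuracy.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now fits in one byte instead of four, and the reconstruction error stays on the order of one quantization step (`scale`).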

**Resources**
- [Quantization](https://arxiv.org/abs/1712.05877)
- [Post-Training Quantization](https://medium.com/tensorflow/introducing-the-model-optimization-toolkit-for-tensorflow-254aca1ba0a3)
- [Quantization Aware Training](https://blog.tensorflow.org/2020/04/quantization-aware-training-with-tensorflow-model-optimization-toolkit.html)

### Pruning

Pruning reduces the number of parameters and operations in a network.
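
One common criterion is magnitude-based pruning, where the weights with the smallest absolute values are set to zero. The sketch below assumes that criterion and a flat list of weights; the notes do not specify which criterion the course uses, and `magnitude_prune` is a hypothetical helper.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute value."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by magnitude, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

weights = [0.8, -0.05, 0.3, 0.01, -0.9, 0.12, -0.02, 0.5]  # hypothetical values
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -0.9, 0.0, 0.0, 0.5]
```

The zeroed weights can then be stored sparsely or skipped during computation, which is where the savings come from.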



**Resources**

- [Pruning](http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf)
- [Lottery Ticket Hypothesis](https://arxiv.org/abs/1803.03635)


### Distributed Training

There are two different types of distributed training, *Data Parallelism* and *Model Parallelism*.

**Data Parallelism**

In data parallelism, the model is replicated across different accelerators, and each replica is given a subset of the data for training.

**Model Parallelism**

Model parallelism is used when a model is too large to fit on a single device: the model is divided into partitions, and each partition is assigned to a different accelerator.
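
Data parallelism can be sketched with a toy example. It assumes synchronous training, where each replica computes a gradient on its own shard and the gradients are averaged before the shared weight is updated. The replicas run sequentially here, and the one-parameter model, learning rate, and function names are illustrative assumptions.

```python
def gradient(w, batch):
    """Gradient of mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_step(w, batch, n_workers, lr=0.01):
    """One synchronous step: shard the batch, compute per-replica gradients, average."""
    shard_size = len(batch) // n_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(n_workers)]
    # Every replica holds the same w and sees only its own shard.
    grads = [gradient(w, shard) for shard in shards]
    avg_grad = sum(grads) / n_workers  # stands in for an all-reduce across devices
    return w - lr * avg_grad

batch = [(float(x), 3.0 * x) for x in range(1, 9)]  # toy data from y = 3x
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, batch, n_workers=4)
print(round(w, 4))
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so the replicas stay in sync while each processes only a fraction of the data.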



**Resources**

### High Performance Modeling

**High Performance Ingestion**

Accelerators are expensive, so it is important to keep them busy: *High Performance Ingestion* is a prerequisite for *High Performance Modeling*.

A typical, and very inefficient, ETL pipeline looks like this:

<img src='./pics/typical_pipeline.png' width="700" height="340">

It is inefficient because the *Computing Resources* (CPU, GPU) sit idle for part of the time while a model is being trained.

A more efficient pipeline makes better use of the *Computing Resources*. Although small idle gaps remain between training steps, they are significantly shorter than in the typical pipeline above.

The following image shows operations being parallelized by overlapping different parts of ETL, a technique known as **[Software Pipelining](https://en.wikipedia.org/wiki/Software_pipelining)**:

<img src='./pics/efficient_pipeline.png' width="700" height="340">

**How to optimize pipeline performance?**

- Prefetching (begin loading data for the next step before the current step finishes)
- Parallelize data extraction and transformation
- Caching
- Reduce memory usage

`tf.data` is one framework that supports building a more efficient and scalable ETL (Extract, Transform, Load) pipeline.
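
The prefetching idea can be sketched without any framework. The example below uses a background thread and a bounded queue from the standard library so that the next batch is prepared while the current one is consumed. The `prefetch` and `load_batches` helpers are hypothetical and only illustrate the technique; `tf.data` offers it through `Dataset.prefetch`.

```python
import queue
import threading

def prefetch(generator, buffer_size=2):
    """Yield items from `generator` while a background thread prepares the next ones."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        for item in generator:
            buffer.put(item)   # blocks when the buffer is full
        buffer.put(sentinel)   # signal end of data

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is sentinel:
            return
        yield item

def load_batches(n):
    """Hypothetical extract/transform stage producing batches."""
    for i in range(n):
        yield [i * 10 + j for j in range(3)]

# The "training" loop consumes batch k while the producer prepares batch k+1.
seen = [sum(batch) for batch in prefetch(load_batches(4))]
print(seen)  # [3, 33, 63, 93]
```

The bounded queue keeps memory use in check: the producer can run at most `buffer_size` batches ahead of the consumer.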

**Resources**
- [Distributed Training](https://www.tensorflow.org/guide/distributed_training)
- [Data parallelism](https://arxiv.org/abs/1806.03377)
- [Pipeline Parallelism](https://ai.googleblog.com/2019/03/introducing-gpipe-open-source-library.html)
- [GPipe](https://arxiv.org/abs/1811.06965)