# ML System Design Overview

## Key steps in ML system design

Typical questions asked in ML system design interviews include:
- Build a system that shows relevant ads for search engines.
- Recommend movies to a user on Netflix.
- Extract all persons, locations, and organizations from a given corpus of documents.

The key points of discussion in ML system design are:

1. Problem Setting
2. Understanding the scale and latency requirements
    1. Latency
    2. Scale
3. Defining metrics
    1. Offline metrics
    2. Online metrics
4. Architecture discussion
    1. Architecting for scale
5. Offline model building and evaluation
6. Online model execution and evaluation
7. Iterative model improvements


## Problem Setting

The interviewer's problem statement is usually broad, so it is important to ask clarifying questions to understand the problem better. Keep asking until all aspects of the problem are clear to you, then restate your understanding to the interviewer so that both of you are on the same page.

Some questions that you might ask are:

- What is the objective of the system?
- Does it work like XXX? (where XXX is a similar system that you know of)
- What are the inputs to the system, and how can they be obtained?
- How will the output be consumed?

By the end of this step, you should have a clear idea of the problem statement and the objective of the system.

## Understanding the scale and latency requirements

These requirements are also useful for identifying where and how caching might need to be implemented.

### Latency
Understanding latency requirements helps in identifying the right ML solution or model to use. For example, a strict latency budget may rule out large models that are too slow to serve.

### Scale
Understanding scale tells us how many requests (for example, queries per second) the system should expect to handle.
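A back-of-the-envelope estimate ties latency and scale together. The sketch below applies Little's law (average requests in flight = arrival rate × average time in the system) to turn a daily request volume and a per-request latency into an approximate number of model servers. The function name, the traffic figures, the peak-to-average factor and the per-server concurrency are all illustrative assumptions, not numbers from a real system.

```python
import math


def estimate_capacity(daily_requests: int, peak_to_avg: float,
                      latency_s: float, concurrency_per_server: int) -> dict:
    """Back-of-the-envelope serving capacity (all inputs are assumptions)."""
    avg_qps = daily_requests / 86_400           # 86,400 seconds in a day
    peak_qps = avg_qps * peak_to_avg            # traffic is rarely uniform
    # Little's law: requests in flight = arrival rate x time spent in system
    in_flight = peak_qps * latency_s
    servers = math.ceil(in_flight / concurrency_per_server)
    return {"avg_qps": avg_qps, "peak_qps": peak_qps,
            "in_flight": in_flight, "servers": servers}


# Hypothetical inputs: 864M requests/day, 3x peak factor, 50 ms per
# request, 8 concurrent requests per model server.
estimate = estimate_capacity(864_000_000, 3.0, 0.05, 8)
print(estimate)
```

With these inputs the estimate is 10,000 average QPS, 30,000 peak QPS, about 1,500 requests in flight and 188 servers. Caching popular requests lowers the effective arrival rate and a faster model lowers latency; either one shrinks the fleet, which is why the two requirements are considered together.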

## Defining metrics
See [here](./ml-metrics.ipynb) for more details on metrics.

### Offline metrics

These metrics are used while the model is being built and evaluated. In a supervised setting, they are calculated on the validation set, where ground truth is available. Some example metrics are:
- Accuracy
- Precision
- Recall
- F1 score
- AUC-ROC
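To make the definitions concrete, here is a minimal pure-Python sketch of these metrics for binary classification. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions made for illustration. AUC-ROC is computed through its probabilistic interpretation (the chance that a randomly chosen positive is scored above a randomly chosen negative), which is quadratic in the number of examples and only suitable for small sets.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC-ROC as P(random positive outscores random negative), ties count half.
    O(P * N), so only suitable for small validation sets."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels and model scores
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.65, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold (assumption)

print(classification_metrics(y_true, y_pred))  # all four metrics are 0.75 here
print(roc_auc(y_true, scores))                 # 0.875
```

Accuracy, precision, recall and F1 depend on the chosen threshold, while AUC-ROC uses the raw scores. In practice, `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score`, `f1_score` and `roc_auc_score` for the same purpose.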

### Online metrics

These metrics are used when the model is running in the production environment. They monitor the model's performance in real time, mainly to ensure that it behaves as expected on live traffic. Some example metrics are:
- Click through rate
- Conversion rate
- Bounce rate
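As a sketch of how these metrics might be computed from raw event logs, the snippet below assumes a hypothetical log format (one dict per event with a `session` and a `type`). The exact definitions differ between teams; the ones used here are stated in the docstring and are assumptions, for example conversion rate is measured per click rather than per visit.

```python
from collections import Counter


def online_metrics(events):
    """Compute CTR, conversion rate and bounce rate from raw event logs.

    Assumed definitions (they vary between teams):
      CTR             = clicks / impressions
      conversion rate = conversions / clicks
      bounce rate     = sessions with exactly one page view / all sessions
    """
    counts = Counter(e["type"] for e in events)
    views_per_session = Counter(e["session"] for e in events
                                if e["type"] == "page_view")
    sessions = len(views_per_session)
    bounces = sum(1 for n in views_per_session.values() if n == 1)
    return {
        "ctr": counts["click"] / counts["impression"] if counts["impression"] else 0.0,
        "conversion_rate": counts["conversion"] / counts["click"] if counts["click"] else 0.0,
        "bounce_rate": bounces / sessions if sessions else 0.0,
    }


# Hypothetical event log: two sessions, five ad impressions
events = [
    {"session": "s1", "type": "page_view"},
    {"session": "s1", "type": "impression"},
    {"session": "s1", "type": "impression"},
    {"session": "s1", "type": "click"},
    {"session": "s1", "type": "page_view"},
    {"session": "s1", "type": "conversion"},
    {"session": "s2", "type": "page_view"},
    {"session": "s2", "type": "impression"},
    {"session": "s2", "type": "impression"},
    {"session": "s2", "type": "impression"},
    {"session": "s2", "type": "click"},
]
print(online_metrics(events))
# {'ctr': 0.4, 'conversion_rate': 0.5, 'bounce_rate': 0.5}
```

In production these numbers would typically be computed over time windows and compared between model versions, for example between the control and treatment arms of an A/B test.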

## Architecture discussion

Here we discuss how the system will be built given the requirements and scale: the various components of the system, and which aspects of the requirements or performance each component addresses.

### Architecting for scale

To handle large scale, we might need a funneled approach in our ML system, where each layer filters out irrelevant samples and is more complex than the one before it. This way, the most complex model (which is usually the most computationally expensive) runs only on a small subset of the data.

Another way to handle scale, for a different use case, is to perform batch predictions. This ensures that our resources are used efficiently and that we can handle multiple requests at the same time. When making batch predictions, we need to decide on an acceptable linger time for a request, that is, how long a request may wait while we collect other requests to predict on in the same batch.
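The funneled approach can be sketched as a list of scoring stages, each keeping only its top-k items for the next, more expensive stage. The names `funnel_rank`, `cheap_score` and `expensive_score` are hypothetical, and the two scoring functions are toy stand-ins for a lightweight candidate-generation model and a heavy ranking model; a counter shows how often the expensive one actually runs.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

# A stage is (score_fn, keep_k): score every surviving item, keep the top keep_k.
Stage = Tuple[Callable[[int], float], int]


def funnel_rank(items: Sequence[int], stages: List[Stage]) -> List[int]:
    """Pass items through progressively more expensive scoring stages."""
    survivors = list(items)
    for score_fn, keep_k in stages:
        # nlargest calls score_fn once per survivor and returns them best-first
        survivors = heapq.nlargest(keep_k, survivors, key=score_fn)
    return survivors


expensive_calls = 0


def cheap_score(item_id: int) -> float:
    # Hypothetical stand-in for a lightweight candidate-generation model
    return float(item_id % 100)


def expensive_score(item_id: int) -> float:
    # Hypothetical stand-in for a heavy ranking model; counts its invocations
    global expensive_calls
    expensive_calls += 1
    return float(item_id)


corpus = range(10_000)
top10 = funnel_rank(corpus, [(cheap_score, 200), (expensive_score, 10)])
print(top10[:3], expensive_calls)  # [9999, 9998, 9899] 200
```

Only 200 of the 10,000 items reach the expensive model. In a real system the first stage is commonly an approximate nearest-neighbour lookup or a simple heuristic rather than a full scan, and each stage's `keep_k` is a tuning knob that trades recall against latency.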
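The linger-time trade-off in batch prediction can be illustrated offline. A live server would implement this with a queue and a timer; the sketch below instead replays request arrival timestamps so the effect of the two knobs is easy to see. The function `form_batches` and its parameters `max_batch_size` and `linger_ms` are hypothetical names chosen for this example.

```python
from typing import List, Tuple


def form_batches(arrivals_ms: List[float], max_batch_size: int,
                 linger_ms: float) -> List[Tuple[float, List[float]]]:
    """Group request arrival times into (dispatch_time, batch) pairs.

    A batch opens with its first request and is dispatched either when it
    reaches max_batch_size or when linger_ms has passed since it opened,
    whichever comes first.
    """
    batches = []
    current: List[float] = []
    deadline = 0.0
    for t in sorted(arrivals_ms):
        if current and t > deadline:
            batches.append((deadline, current))  # linger time expired
            current = []
        if not current:
            deadline = t + linger_ms             # a new batch opens
        current.append(t)
        if len(current) == max_batch_size:
            batches.append((t, current))         # batch is full, send now
            current = []
    if current:
        batches.append((deadline, current))      # flush the last partial batch
    return batches


# Hypothetical arrivals (ms), batches of up to 4, 5 ms linger time
for dispatch, batch in form_batches([0, 2, 3, 10, 11, 12, 13, 40], 4, 5):
    print(dispatch, batch)
# 5 [0, 2, 3]
# 13 [10, 11, 12, 13]
# 45 [40]
```

No request waits more than `linger_ms` for its batch to be dispatched (model inference time excluded), so the linger time is the extra latency paid in exchange for larger batches and better hardware utilisation.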

## Offline model building and evaluation

This is the step where we build the best-performing model. The first thing required for model building is training data.

### Training data

1. **Human annotated data**: We can use crowdsourced data or data annotated by experts. If the task is generic, we can also use pre-existing datasets.
2. **User interaction data**: User interactions (such as clicks, purchases or watch history) can be used to build personalisation models, which learn user preferences and recommend items accordingly.
3. **Synthetic data**: Synthetic data can be used when we do not have enough real data. It can be generated using techniques like data augmentation. We can also use an existing, more complex pretrained model to generate synthetic data, and then use that data to train a smaller model that is cheaper to serve.
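The teacher-model idea in point 3 can be sketched as pseudo-labeling: a large model labels unlabeled inputs, and only its confident predictions are kept as training data for the smaller model. This is a simple hard-label variant of knowledge distillation, not the only way to do it. `pseudo_label` and `toy_teacher` are hypothetical names, the keyword rules inside `toy_teacher` merely stand in for a real pretrained model, and the 0.9 confidence threshold is an arbitrary assumption.

```python
from typing import Callable, Iterable, List, Tuple

# A teacher returns (label, confidence) for one raw input.
Teacher = Callable[[str], Tuple[str, float]]


def pseudo_label(unlabeled: Iterable[str], teacher: Teacher,
                 min_confidence: float = 0.9) -> List[Tuple[str, str]]:
    """Label raw inputs with a larger teacher model, keeping confident labels only."""
    dataset = []
    for x in unlabeled:
        label, confidence = teacher(x)
        if confidence >= min_confidence:  # drop labels the teacher is unsure of
            dataset.append((x, label))
    return dataset


def toy_teacher(text: str) -> Tuple[str, float]:
    # Hypothetical stand-in for a large pretrained model's prediction call
    if "great" in text:
        return "positive", 0.97
    if "awful" in text:
        return "negative", 0.95
    return "neutral", 0.55


reviews = ["great battery life", "awful screen", "arrived on tuesday"]
train_set = pseudo_label(reviews, toy_teacher)
print(train_set)
# [('great battery life', 'positive'), ('awful screen', 'negative')]
```

The resulting `train_set` would then be used to train the smaller student model. Raising the threshold trades dataset size for label quality.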

### Feature Engineering

Feature engineering is the process of transforming raw data into features that can be used in the model. This is a crucial step as the model's performance is highly dependent on the features used. Some common feature engineering techniques are:
- One hot encoding
- Normalization
- Standardization
- Binning
- Missing value imputation
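Each technique in the list above fits in a few lines of pure Python. The helper names and the toy `ages` data are hypothetical. For brevity, every helper computes its statistics (mean, min/max, vocabulary) from the list it is given; in a real pipeline those statistics should be fit on the training data only and reused unchanged at serving time, which avoids leakage and training-serving skew. scikit-learn offers `SimpleImputer`, `MinMaxScaler`, `StandardScaler`, `KBinsDiscretizer` and `OneHotEncoder` for the same jobs.

```python
import bisect
import statistics


def impute_mean(values):
    """Missing value imputation: replace None with the mean of observed values."""
    mean = statistics.fmean(v for v in values if v is not None)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Normalization: rescale values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def standardize(values):
    """Standardization: zero mean, unit (population) standard deviation."""
    mu, sigma = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def bin_values(values, edges):
    """Binning: index of the bucket each value falls into; edges must be sorted."""
    return [bisect.bisect_right(edges, v) for v in values]


def one_hot(categories):
    """One-hot encoding over the sorted set of observed categories."""
    vocab = sorted(set(categories))
    return vocab, [[int(c == v) for v in vocab] for c in categories]


ages = impute_mean([22, None, 35, 58, 45])  # [22, 40.0, 35, 58, 45]
print(min_max_normalize(ages))
print(standardize(ages))
print(bin_values(ages, [30, 50]))      # [0, 1, 1, 2, 1] for bins <30, [30, 50), >=50
print(one_hot(["red", "blue", "red"]))  # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
```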

### Model building

Based on the offline evaluation metrics, we can train multiple models and compare their performance. The main choice here is the model architecture. Usually, we would use a random forest or a GBM (gradient boosting machine) for tabular data, and a CNN or a Transformer for image and text data. Random forests work well on tabular data because they are robust to outliers and, in many implementations, handle missing values well. GBMs work well on tabular data because they capture non-linear relationships and feature interactions. CNNs suit image data because they capture spatial relationships between neighbouring pixels. Transformers suit text data because self-attention captures relationships between tokens across the whole sequence, including long-range dependencies.
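A minimal sketch of comparing two tabular candidates on the same offline metric, assuming scikit-learn is installed. A synthetic dataset from `make_classification` stands in for a real feature table, F1 is an arbitrary choice of metric, and the hyperparameters are defaults or untuned guesses rather than recommended values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic tabular data stands in for a real feature table (assumption).
X, y = make_classification(n_samples=1_000, n_features=20,
                           n_informative=8, random_state=0)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gbm": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in candidates.items():
    # 5-fold cross-validated F1: the same offline metric for every candidate
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    results[name] = scores.mean()
    print(f"{name}: mean F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")

best = max(results, key=results.get)
print("selected:", best)
```

Scoring every candidate with the same folds and metric keeps the comparison fair. For larger tabular datasets, `HistGradientBoostingClassifier` is a faster GBM variant in scikit-learn that also handles missing values natively.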

## Online model execution and evaluation

After the model is trained and deployed, we need to monitor its performance using online metrics. It is also a good idea to store the model's predictions, as they help create organic training data that can be used to retrain the model and improve its performance. If possible, we can also incorporate feedback from users to improve the model.
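Turning stored predictions into organic training data usually means joining the prediction log with feedback that arrives later. The sketch below assumes a hypothetical log record (`PredictionLog`) and a feedback mapping from request ID to observed label; both are illustrative, not a prescribed schema. Requests without feedback are skipped here rather than treated as negatives, which is a design choice: click-based systems often label an impression negative only after a waiting window has passed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PredictionLog:
    request_id: str
    features: Dict[str, float]  # features exactly as seen at serving time
    prediction: float
    model_version: str


def build_training_examples(logs: List[PredictionLog],
                            feedback: Dict[str, int]) -> List[Tuple[Dict[str, float], int]]:
    """Join logged predictions with later user feedback to form labeled rows.

    feedback maps request_id -> observed label (e.g. 1 = clicked, 0 = skipped).
    Requests with no feedback yet are left out rather than assumed negative.
    """
    return [(log.features, feedback[log.request_id])
            for log in logs if log.request_id in feedback]


logs = [
    PredictionLog("r1", {"query_len": 3.0}, 0.82, "v1"),
    PredictionLog("r2", {"query_len": 7.0}, 0.10, "v1"),
    PredictionLog("r3", {"query_len": 5.0}, 0.55, "v1"),
]
feedback = {"r1": 1, "r2": 0}  # hypothetical: r3 has no feedback yet
print(build_training_examples(logs, feedback))
# [({'query_len': 3.0}, 1), ({'query_len': 7.0}, 0)]
```

Logging features as they were at prediction time, rather than recomputing them later, helps avoid training-serving skew, and keeping `model_version` and `prediction` makes it possible to compare online performance across model versions.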