# Chapter 1: Overview of Machine Learning Systems

- Components of an ML System

![ml components](./artifacts/1_image.png)

## When to use ML?

- Machine learning is an approach to (1) learn (2) complex patterns from (3) existing data and use these patterns to make (4) predictions on (5) unseen data.

- All of these components 1-5 must be available
    1. Your system must be able to learn (open-source libraries largely take care of this)
    2. Patterns must be complex, or you'll be better served by rules
    3. There must be existing data, or you can't learn
    4. The problem must need predictions. There is no point if the answer is not needed ahead of time
    5. Unseen data should also share the previous pattern

- ML **systems** are useful when
    - Problems are repetitive
    - The cost of wrong predictions is not too high (because ML systems make mistakes)
    - You need to solve this problem at scale
    - The patterns are constantly changing (so training and retraining humans is not practical)

- ML should be avoided when
    - Using it would be unethical
    - A simpler solution does the job
    - It is not cost-effective

## Understanding ML Systems

| | Research | Production | 
| - | - | - |
| Requirements | SOTA performance based on benchmark datasets | Stakeholder dependent | 
| Computational priority | Fast training, high throughput | Fast inference, low latency | 
| Data | Static | Constant shifts | 
| Fairness | Not important | Important | 
| Interpretability | Not important | Important | 

### Balancing stakeholder requirements

- Conflicting Objectives:
    - ML Engineers: Recommend restaurants that users are more likely to order from
    - Sales: Recommend more expensive restaurants for more fees
    - Product: Latency leads to dropped orders, so return in less than 100ms
    - ML Platform: As traffic grows, the existing system does not scale, so they want to shelve model updates to prioritise improving the platform
    - Manager: Wants to maximise margin, possibly by cutting the ML team

### Compute priorities

- Model development needs to be weighed against deployment and maintenance
- In production, priority is on fast inference (low latency), but in research, priority is on fast training
- In production, you need to decide between batch vs real-time inference
    - Batching requests can increase latency, BUT can also increase average throughput if you process your requests in large enough batches
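
The latency/throughput trade-off above can be made concrete with a little arithmetic. The sketch below assumes a toy cost model in which each batch pays a fixed overhead plus a per-request cost. The function name `batch_stats` and the numbers (10 ms overhead, 1 ms per request) are illustrative assumptions, not measurements.

```python
def batch_stats(batch_size, overhead_ms=10.0, per_request_ms=1.0):
    """Return (latency_ms, throughput_rps) for one batch under a toy cost model.

    Assumption: processing a batch costs a fixed overhead plus a linear
    per-request cost. Time spent waiting for the batch to fill is ignored.
    """
    latency_ms = overhead_ms + per_request_ms * batch_size
    throughput_rps = batch_size / (latency_ms / 1000.0)
    return latency_ms, throughput_rps


for size in (1, 8, 64):
    latency, throughput = batch_stats(size)
    print(f"batch={size:>2}  latency={latency:.0f} ms  throughput={throughput:.0f} req/s")
```

Under these assumed numbers, latency rises from 11 ms at batch size 1 to 74 ms at batch size 64, while throughput rises from about 91 to about 865 requests per second.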

### Data

- In production, data is noisy, messy, unstructured, and biased
- Labels may be sparse, imbalanced, and incorrect
- Privacy and regulatory concerns are also an issue
- Most of your time in production should be spent on data
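
A quick first check for the label problems listed above is to count how many examples are unlabeled and how the labeled ones are distributed. The sketch below uses only the standard library. The `label_report` helper, the made-up `labels` list, and the convention that `None` means "missing" are all assumptions for illustration.

```python
from collections import Counter


def label_report(labels):
    """Summarise label sparsity and class balance.

    Assumption: a missing label is represented as None.
    """
    labeled = [y for y in labels if y is not None]
    counts = Counter(labeled)
    total = len(labels)
    return {
        "missing_fraction": (total - len(labeled)) / total if total else 0.0,
        "class_fractions": {
            cls: n / len(labeled) for cls, n in counts.items()
        },
    }


# Hypothetical labels: mostly "ok", a few "fraud", some unlabeled.
labels = ["ok"] * 90 + ["fraud"] * 5 + [None] * 5
report = label_report(labels)
print(report)
```

A report like this only surfaces sparsity and imbalance. It cannot tell you whether the labels that do exist are correct.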

### Fairness

- During research, models are not yet used on real people
- However, if you don't worry about it during development, you will have a problem in deployment
    - e.g. credit-scoring models that are biased against minorities
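
One simple way to surface the kind of bias described above is to compare a model's approval rate across groups. The sketch below computes that gap (often called the demographic parity difference) on made-up data. The `approval_rates` helper, the group names, the decisions, and the choice of this particular metric are illustrative assumptions, and a small gap on one metric does not by itself show that a model is fair.

```python
from collections import defaultdict


def approval_rates(groups, approved):
    """Return the fraction of approved decisions per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, decision in zip(groups, approved):
        totals[group] += 1
        positives[group] += int(decision)
    return {g: positives[g] / totals[g] for g in totals}


# Hypothetical credit decisions (1 = approved) for two made-up groups.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
approved = [1, 1, 1, 0, 1, 0, 0, 0]

rates = approval_rates(groups, approved)
parity_gap = max(rates.values()) - min(rates.values())
print(rates, parity_gap)
```

On this toy data group A is approved 75% of the time and group B 25% of the time, a gap of 0.5.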

### Interpretability

- In production, you need to interpret model outputs more than in research (see `Balancing stakeholder requirements`)

## ML vs Software

- ML is software, so you need to know the best practices of software engineering
- BUT software engineering practices alone are not enough, because ML is the interaction between code and data
- ML artifacts also have huge sizes, so getting them into production on devices is a huge engineering challenge
- Monitoring and debugging are also not trivial
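
Because a trained model depends on both code and data, one habit that follows from the points above is to record a fingerprint of each alongside the model. The sketch below is a minimal illustration using the standard library. The helper names, the `code_version` string, the in-memory rows, and the record layout are all hypothetical.

```python
import hashlib
import json


def data_fingerprint(rows):
    """Hash a dataset deterministically (rows must be JSON-serialisable)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def training_record(code_version, rows):
    """Bundle the code version and data hash that produced a model."""
    return {"code_version": code_version, "data_sha256": data_fingerprint(rows)}


rows_v1 = [{"x": 1.0, "y": 0}, {"x": 2.0, "y": 1}]
rows_v2 = [{"x": 1.0, "y": 0}, {"x": 2.0, "y": 0}]  # one label changed

# Same code, different data: the two records must differ.
same_code_new_data = (
    training_record("abc123", rows_v1) != training_record("abc123", rows_v2)
)
print(same_code_new_data)
```

Versioning code alone would treat these two training runs as identical, even though a changed label can produce a different model.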