# Models

Models are a central concept in machine learning. To teach a machine to make good decisions, whether that means predicting if an e-mail is spam or driving a car, we teach it to build a model of the world. This model is a representation of how the world, or a specific part of it, works. Once the program has an understanding of how the world works, it can make decisions accordingly. Humans work in much the same way: throughout our lives, we build up an understanding of how the world works, and we base our decisions on that understanding.

## What defines a model?

A machine learning model for a spam filter clearly needs to be different from a model for a self-driving car. When deciding to build a model, there are typically four things to consider.

### What state the model should learn

The information stored in a model is often referred to as the model's state. This is typically what gets saved when you persist a model. Models learn their state by being exposed to experience, be it data or feedback. The state is a summary of what the model knows about the world. Depending on the task, a model's state might need to be more or less complex: some models have a state made up of a single value, while others have a state made up of millions of values.
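As a minimal sketch of the idea, the hypothetical `MeanModel` below (not taken from any library) has a state made up of a single value, the mean of the targets it has seen. Persisting the model amounts to saving that one number, here as JSON.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MeanModel:
    """Toy model whose entire state is one number."""
    mean: float = 0.0

    def fit(self, targets):
        # Learning: summarize the experience (the targets) into the state.
        self.mean = sum(targets) / len(targets)
        return self

    def predict(self):
        # Using the state: always predict the learned mean.
        return self.mean


model = MeanModel().fit([2.0, 4.0, 6.0])

# Persisting the model means saving its state and nothing else.
saved = json.dumps(asdict(model))

# Restoring the state gives back a model that behaves identically.
restored = MeanModel(**json.loads(saved))
print(saved, restored.predict())
```

A model with millions of values in its state would be persisted the same way in principle, only with a much larger payload.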

### How the model should learn its state

Once we have decided what a model's state should look like, the model has to learn it. In theory, a model could try out all possible states and pick the one that performs best. In many cases this is not feasible, because the number of possible states is too large or even infinite. Much of the research in the field is dedicated to finding smart ways to reach a good state without needing to try them all.
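The contrast can be sketched with a one-parameter model `y = w * x`. The data, the candidate grid, the learning rate, and the number of steps below are all assumptions chosen for illustration. Gradient descent is used here as one example of a "smart" search; many other methods exist.

```python
def loss(w, data):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def gradient(w, data):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


# Toy data generated by y = 2x, so the best state is w = 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Exhaustive search: only possible because we restrict w to a finite grid.
# The true space of states (all real numbers) is infinite.
candidates = [i / 10 for i in range(-50, 51)]
best_grid = min(candidates, key=lambda w: loss(w, data))

# Gradient descent: start anywhere and repeatedly step downhill,
# visiting only a small number of states along the way.
w = 0.0
for _ in range(100):
    w -= 0.05 * gradient(w, data)

print(best_grid, round(w, 6))
```

The grid search evaluates every candidate and only finds the answer because the right value happens to be on the grid, while gradient descent uses the shape of the loss to move toward it directly.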

### How the model should use its state

As mentioned above, a model learns a state that then allows it to make decisions. Models make decisions in wildly different ways: some use decision trees, some use analogies based on previous observations, some use Bayesian methods, and so on. Depending on the approach, some models are faster at making predictions, some are better at explaining how they reached a decision, and some need less training data to generalize.
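To make this concrete, here is a small sketch of two models that use very different states to make the same kind of decision. The feature (a count of suspicious words per e-mail), the threshold, and the example data are hypothetical. The first function behaves like a decision tree with a single split; the second decides by analogy with the closest previous observation.

```python
def stump_predict(threshold, x):
    """State is one threshold; the decision is a single comparison."""
    return "spam" if x > threshold else "ham"


def nearest_neighbor_predict(examples, x):
    """State is the stored examples; the decision copies the closest label."""
    closest = min(examples, key=lambda example: abs(example[0] - x))
    return closest[1]


# Hypothetical observations: (number of suspicious words, label).
examples = [(1, "ham"), (2, "ham"), (8, "spam"), (9, "spam")]

print(stump_predict(5, 7))                    # compares 7 against the threshold
print(nearest_neighbor_predict(examples, 7))  # looks up the closest example, (8, "spam")
```

The single-split model is fast and easy to explain ("more than 5 suspicious words"), while the nearest-neighbor model must keep and scan all of its examples at prediction time.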

### What kind of data is available to train the model

All three aspects discussed above depend heavily on the amount and quality of the data available to train the model. Models need to be trained with good information, and enough of it, to perform well at their task. If we don't have access to a lot of data, we might want to pick a simple model, which tends to generalize better from few examples, whereas if we have lots of data, we might prefer a more complex model.

## What makes a model good or bad?

On top of performance measures such as accuracy or speed, there are some core beliefs and concepts in machine learning that help define what properties a good model should possess.

### No Free Lunch Theorem

The No Free Lunch theorem states that all models, when their performance is averaged over all possible tasks, are equivalent. This suggests that there isn't a "master model" that will always perform well on every task. In practice, one must find a healthy balance between how well a model performs on a specific task and how well it could perform on more general ones.
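A toy illustration of the idea, under strong simplifying assumptions: the world has only four inputs, every possible way of labeling them with 0 or 1 counts as one "task", and models are scored only on the two inputs they never saw during training. The three learners below are hypothetical. This is a sketch of the intuition, not a proof of the theorem.

```python
from itertools import product

train_points = [0, 1]  # inputs whose labels the learner gets to see
test_points = [2, 3]   # unseen inputs used for scoring


def always_zero(train_labels, x):
    return 0


def copy_majority(train_labels, x):
    # Predict 1 only if strictly more than half of the training labels are 1.
    return 1 if sum(train_labels) * 2 > len(train_labels) else 0


def oppose_majority(train_labels, x):
    return 1 - copy_majority(train_labels, x)


def average_accuracy(learner):
    """Accuracy on unseen inputs, averaged over all 16 possible tasks."""
    tasks = list(product([0, 1], repeat=4))
    total = 0.0
    for task in tasks:
        train_labels = [task[p] for p in train_points]
        correct = sum(learner(train_labels, p) == task[p] for p in test_points)
        total += correct / len(test_points)
    return total / len(tasks)


for learner in (always_zero, copy_majority, oppose_majority):
    print(learner.__name__, average_accuracy(learner))
```

Every learner ends up with the same average, because for each prediction there are exactly as many tasks where it is right as tasks where it is wrong. A model only gains an edge when the tasks it will actually face are not spread evenly over all possibilities.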

### Occam's Razor

Occam's Razor, also called the law of parsimony, is a core principle or belief in machine learning. It suggests that, among models that perform comparably well, the simpler one is the better one, so we should strive to create models that are as simple as possible. This is based on the assumption that the more moving parts there are within a model, the more likely some are to be wrong.

### Bias-variance dilemma

A model needs to be general enough to be flexible and robust to noise, but it also needs to be precise and specific enough to be accurate. This is a dilemma, as improving one tends to hinder the other, so one needs to find the right balance. The bias-variance dilemma is a key challenge in machine learning.
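The trade-off can be observed in a small simulation. Everything in the sketch below is an assumption made for illustration: the true function `x ** 2`, the noise level, the fixed inputs, and the two toy models. The rigid model ignores its input and predicts the mean target; the flexible model copies the target of the nearest training point. Each is trained on many noisy training sets, and we look at its predictions for one input, `x0`.

```python
import random
from statistics import mean, pvariance

rng = random.Random(0)             # fixed seed so the run is repeatable
xs = [i / 19 for i in range(20)]   # fixed inputs in [0, 1]
NOISE = 0.3                        # assumed standard deviation of the noise


def true_f(x):
    return x ** 2


def sample_training_set():
    """Same inputs every time, but freshly drawn noise on the targets."""
    return [(x, true_f(x) + rng.gauss(0, NOISE)) for x in xs]


def rigid_predict(train, x):
    """Ignores x and always predicts the mean target."""
    return mean(y for _, y in train)


def flexible_predict(train, x):
    """Copies the target of the nearest training input."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


x0 = 1.0
rigid, flexible = [], []
for _ in range(200):
    train = sample_training_set()
    rigid.append(rigid_predict(train, x0))
    flexible.append(flexible_predict(train, x0))

# Bias: how far the average prediction is from the truth.
rigid_bias = mean(rigid) - true_f(x0)
flexible_bias = mean(flexible) - true_f(x0)

# Variance: how much the prediction changes from one training set to the next.
rigid_var = pvariance(rigid)
flexible_var = pvariance(flexible)

print(f"rigid:    bias={rigid_bias:+.3f}  variance={rigid_var:.4f}")
print(f"flexible: bias={flexible_bias:+.3f}  variance={flexible_var:.4f}")
```

The rigid model is stable across training sets but systematically wrong at `x0`, while the flexible model is right on average but swings with the noise in each training set. A good model sits somewhere between these two extremes.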