## Statistical Models
In statistics we work with a very specific type of model: likelihood models. A likelihood model takes in a particular dataset (and possibly other parameters, like the mean and sd of height in the overall population, or the bias of a coin) and tells us the probability of seeing that particular dataset. For our purposes we don't really care how that probability is calculated (e.g. it may be found via simulation); we just demand a function from (dataset, parameter settings) -> probability of the dataset, assuming those parameter settings are true.

We're going to be vague about what constitutes a dataset, because that varies with context. But the key point is that the model tells you the probability of seeing your dataset, or any other dataset that you might have gotten as a result of randomness in the data collection process. Example datasets include the results of N coin flips, one per row, or the height, weight and age of N individuals, one individual per row.

Very, very often, likelihood models don't give a straight answer: the probability of seeing a particular dataset depends on the settings of the parameters mentioned above.
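As a rough illustration of "a function from (dataset, parameter settings) to a probability", here is a minimal Python sketch that estimates the probability by simulation. Everything in it is an assumption made for the example: the name `simulated_likelihood`, the idea that flips are independent with a single heads probability `theta`, and the choice to treat two datasets as "the same" when they contain the same number of heads.

```python
import random

def simulated_likelihood(dataset, theta, n_sims=20_000, seed=0):
    """Estimate P(dataset | theta) for a list of coin flips by simulation.

    Assumptions (illustrative only): flips are independent, each lands heads
    with probability theta, and a simulated dataset "matches" the observed
    one when it has the same number of heads.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    n_flips = len(dataset)
    observed_heads = dataset.count("H")
    matches = 0
    for _ in range(n_sims):
        # Simulate one whole dataset and count its heads.
        simulated_heads = sum(rng.random() < theta for _ in range(n_flips))
        if simulated_heads == observed_heads:
            matches += 1
    return matches / n_sims

flips = ["H", "H", "T"]  # made-up dataset
for theta in (0.2, 0.5, 0.8):
    print(theta, simulated_likelihood(flips, theta))
```

The same dataset gets a different probability under each value of `theta`, which is the sense in which the model does not give one straight answer.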

### As a Table
Schematically, you can picture a likelihood model as a spreadsheet: plug in a dataset and the values of the parameters, and get back the probability of the dataset if those parameters are correct.

| |parameter setting 1|parameter setting 2|...| |
|-|-|-|-|-|
|**dataset 1**|$P($dataset1$\vert$parameter setting1$)$|$P($dataset1$\vert$parameter setting2$)$|...|Row Sum=?|
|**dataset 2**|$P($dataset2$\vert$parameter setting1$)$|$P($dataset2$\vert$parameter setting2$)$|...|Row Sum=?|
|**dataset 3**|$P($dataset3$\vert$parameter setting1$)$|$P($dataset3$\vert$parameter setting2$)$|...|Row Sum=?|
|...|...|...|...|...|
| |Column Sum=1|Column Sum=1|...| |

To be clear, any given dataset above is made up of multiple observations, and thus each dataset, if written out, has multiple rows. BUT each possible dataset is condensed and takes just one row in this table. We look at the probability of the dataset as a whole; the model might not specify the probability of individual rows within a dataset.

Note that there may be an infinite number of possible datasets or parameter settings, and even when the list of possibilities is finite it is often monumentally long. This format is great for intuition and is the most general form, but it may be impossible to calculate from without additional knowledge or tricks.
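For a case small enough to write out in full, the table can be built directly. The sketch below is only an illustration under stated assumptions: two independent coin flips, each dataset summarised by its number of heads, three hypothetical parameter settings (`thetas`), and a helper named `count_likelihood` that is not part of any library.

```python
from math import comb

def count_likelihood(n_heads, n_flips, theta):
    """P(n_heads heads in n_flips flips | theta), assuming independent flips."""
    return comb(n_flips, n_heads) * theta**n_heads * (1 - theta)**(n_flips - n_heads)

n_flips = 2
thetas = [0.2, 0.5, 0.8]       # hypothetical parameter settings (columns)
datasets = range(n_flips + 1)  # possible datasets: 0, 1 or 2 heads (rows)

# table[i][j] = P(dataset i | parameter setting j)
table = [[count_likelihood(h, n_flips, t) for t in thetas] for h in datasets]

column_sums = [sum(row[j] for row in table) for j in range(len(thetas))]
row_sums = [sum(row) for row in table]

for h, row in zip(datasets, table):
    print(f"{h} heads:", [round(p, 3) for p in row])
print("column sums:", [round(s, 3) for s in column_sums])
print("row sums:   ", [round(s, 3) for s in row_sums])
```

Each column sums to 1 because, for a fixed parameter setting, the column is a probability distribution over all possible datasets. The rows carry no such guarantee: here they come to 0.93, 1.14 and 0.93.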

### As Math
Statistical models are often written down via particular probability distributions. For instance, if the dataset of interest is the result of a series of N coin flips, we could save a LOT of space by writing the likelihood model as
$$P(Flip1=H,Flip2=T,...,FlipN=T|\,\theta)=  \binom {N} {nHeads,nTails}\theta^{nHeads}(1-\theta)^{nTails}$$
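where $nHeads$, $nTails$ and $N$ are the number of heads observed, the number of tails observed, and the total number of flips, and $\theta$ is the probability of heads on each flip. The expression in parentheses is a multinomial coefficient; with only two outcomes it equals $\frac{N!}{nHeads!\,nTails!}$, the number of distinct orderings of the flips that produce those counts. The right-hand side is therefore the probability of seeing that many heads and tails in any order.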

$\theta$ and $N$ are the parameters of this model. $N$ is a known parameter, since it's obvious what $N$ is from the dataset, and $\theta$ is an unknown parameter representing the bias of the coin in question.

Using the equation above, we can plug in a particular set of flip results and find the probability of those results under various parameter settings.
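A minimal sketch of that plug-in step, assuming the flips are stored as a list of `"H"` and `"T"` strings. The function name `coin_likelihood` and the example flips are made up for illustration.

```python
from math import comb

def coin_likelihood(flips, theta):
    """Evaluate the coin-flip likelihood for a list of 'H'/'T' results.

    With two outcomes the multinomial coefficient reduces to the
    binomial coefficient comb(N, nHeads).
    """
    n_heads = flips.count("H")
    n_tails = flips.count("T")
    n = n_heads + n_tails
    return comb(n, n_heads) * theta**n_heads * (1 - theta)**n_tails

flips = list("HTTHTTTH")  # made-up results: 3 heads, 5 tails
for theta in (0.3, 0.5, 0.7):
    print(f"theta={theta}: {coin_likelihood(flips, theta):.4f}")
```

With three heads in eight flips, the setting $\theta=0.3$ makes the data noticeably more probable than $\theta=0.7$ does.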

#### Sidebar: assumptions
This model happens to care only about the total number of heads and the total number of tails (which of course add up to N). A different model might care about the particular order the flips landed in, e.g. because the modeler thinks a flip depends on the outcome of the preceding flips. Likewise, this model happens to assume that each flip has the same chance of landing heads; an alternative model may decide that the even-numbered flips have one probability of landing heads and the odd-numbered flips have a different probability of landing heads.
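To make the odd/even alternative concrete, here is one way such a model could be sketched. The function `alternating_likelihood` and its parameters `theta_odd` and `theta_even` are hypothetical names; the sketch assumes flips are independent and numbered from 1, and it returns the probability of the exact ordered sequence, so there is no counting coefficient.

```python
def alternating_likelihood(flips, theta_odd, theta_even):
    """P(exact ordered sequence | theta_odd, theta_even).

    Assumption (illustrative): odd-numbered flips land heads with
    probability theta_odd, even-numbered flips with theta_even,
    and flips are independent of one another.
    """
    prob = 1.0
    for position, flip in enumerate(flips, start=1):
        theta = theta_odd if position % 2 == 1 else theta_even
        prob *= theta if flip == "H" else 1 - theta
    return prob

# Same counts (one head, one tail), different order, different probability.
print(alternating_likelihood(list("HT"), theta_odd=0.9, theta_even=0.1))
print(alternating_likelihood(list("TH"), theta_odd=0.9, theta_even=0.1))
```

Under this model the order of the flips matters, so two datasets with identical head and tail counts can have very different probabilities. It is still a function from (dataset, parameter settings) to a probability; only the assumptions have changed.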

We will discuss selecting *among multiple models* later on. Even then, it's mostly up to the modeler and their audience to decide what assumptions are justified.

## Model Fitting
**Fitting** a model refers to making a semi-reasonable guess about which parameter setting is the best one. Should we model the world by setting the coin's bias to 0.5, or is 0.7 more appropriate? The key questions in any parameter-setting strategy are 1) what does it mean to be "best", 2) how difficult is it to compute which setting is best, and 3) why should we prefer this guessing strategy over any other.

Model fitting is an enormous topic. Take a course in Statistical Inference if you want to get into the details. We're going to discuss two particular fitting methods: the method of moments and maximum likelihood estimation.