## The Method of Moments
The Method of Moments (MOM) is a way of fitting a model to a dataset. That is, it's a mechanism for selecting values for the model parameters out of all possible parameter values. Like all good fitting strategies, it suggests different parameter values for different datasets. MOM can be particularly computationally efficient, and is often used to supply a first guess for iterative fitting strategies.

The MOM works by reasoning that each row of the dataset is an independent draw from the likelihood function, and thus with many observations the mean, variance, and kurtosis (aka moments) in the collected data should be close to the mean, variance, and kurtosis of the actual distribution. MOM then works out equations relating the parameters we'd like to find to the theoretical moments and inverts the system of equations, producing plug-in estimators for the parameters: plug in the observed moments to get parameter estimates.
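As a rough illustration of why the sample moments are a reasonable stand-in for the true ones, the sketch below draws increasingly large samples and reports how far the sample mean and variance land from the true values. The choice of Uniform(0, 1) and the helper name `sample_moments` are assumptions made only for this illustration.

```python
import random

def sample_moments(n, seed=0):
    """Return (sample mean, sample variance) of n draws from Uniform(0, 1)."""
    rng = random.Random(seed)  # local generator so runs are repeatable
    xs = [rng.random() for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

# True moments of Uniform(0, 1): mean 1/2 and variance 1/12.
for n in (10, 1_000, 100_000):
    mean, var = sample_moments(n)
    print(f"N={n:>6}  mean error={abs(mean - 0.5):.4f}  variance error={abs(var - 1/12):.4f}")
```

The errors should generally shrink as N grows, which is the property MOM relies on.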

The method of moments finds values for the parameters in the following way:
1) Find equations for the mean, variance, and so on of the likelihood function. These equations will involve the parameters we're trying to estimate.
2) Solve the above equations for the parameters. These expressions will depend on the likelihood function's true mean, variance, and so on.
3) Since the mean and variance of the data are probably/hopefully pretty close to the mean and variance of the actual likelihood distribution, plug those in to get the parameter estimates.
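
In symbols, the three steps above can be sketched as follows for a model with $k$ parameters. The names $\theta$ (the parameter vector), $m_j$ (the $j$-th theoretical moment), and $g$ (the inverse map) are notation assumed here for illustration:

$$m_j(\theta)=E[X^j\,|\,\theta], \ \ \ \ j=1,\dots,k$$
$$\theta=g(m_1,\dots,m_k)$$
$$\hat\theta=g(\hat m_1,\dots,\hat m_k), \ \ \ \ \hat m_j=\frac{1}{N}\sum_{i=1}^{N}x_i^j$$

Working with the mean and variance instead of the first two raw moments is equivalent, since $\sigma^2=m_2-m_1^2$.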


#### Warning
The MOM only works on datasets where each row can be viewed as an independent draw from some (possibly multidimensional) probability distribution. This holds, e.g., for data that are a random sample of a population. It fails for data where knowing the values in row 1, in addition to the parameters, affects the probability of particular values appearing in row 2, e.g. time series data. [In terms of the likelihood function, these conditions specify that the overall probability of the dataset can be found by multiplying the probability of each row, rather than via the general-case chain of conditional probabilities.]
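
Written out, with $\theta$ standing for the parameters and $x_i$ for row $i$ (notation assumed here), the independence condition is that the likelihood factorizes as

$$P(x_1,\dots,x_N\,|\,\theta)=\prod_{i=1}^{N}P(x_i\,|\,\theta)$$

whereas the general-case chain of conditional probabilities is

$$P(x_1,\dots,x_N\,|\,\theta)=\prod_{i=1}^{N}P(x_i\,|\,x_1,\dots,x_{i-1},\theta)$$

The two agree only when earlier rows carry no extra information about later ones once $\theta$ is known.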

### Example: Uniform distribution on (a,b)
Suppose that we have N rows of data with one value per row. Further, suppose that we're fitting a model where
$$P(X=x\,|\,a,b)=Uniform(a,b)=\frac{1}{b-a} \ \ \ \ \text{for } a\le x\le b$$

First, what are the first two moments of a uniform distribution? [We choose two moments because there are two unknown parameters: $a$ and $b$.] Via integration, Wolfram, or Wikipedia, the mean and variance are:

$$\mu=\frac{a+b}{2} \ \ \ \ \ \ \ \  \sigma^2=\frac{(b-a)^2}{12}$$
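
For readers who want the integration route, a short sketch using the density $\frac{1}{b-a}$ on $(a,b)$:

$$\mu=\int_a^b\frac{x}{b-a}\,dx=\frac{b^2-a^2}{2(b-a)}=\frac{a+b}{2}$$
$$\sigma^2=\int_a^b\frac{(x-\mu)^2}{b-a}\,dx=\frac{(b-\mu)^3-(a-\mu)^3}{3(b-a)}=\frac{(b-a)^2}{12}$$

The last step uses $b-\mu=\frac{b-a}{2}$ and $a-\mu=-\frac{b-a}{2}$.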

Solving these,
$$b=2\mu-a \ \ \ \ \ \ \ \ \sigma^2=\frac{(2\mu-2a)^2}{12}$$
$$b=2\mu-a \ \ \ \ \ \ \ \ a=\mu\pm\sqrt{3\sigma^2}$$

So if the sample mean and variance in our data are 2 and 3, respectively, we'd get:

In [11]:
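import math

mu = 2      # sample mean
sigma2 = 3  # sample variance

rad = math.sqrt(3 * sigma2)  # the sqrt(3*sigma^2) term
a1 = mu + rad  # the "+" root for a
a2 = mu - rad  # the "-" root for a
b = 2 * mu - a1  # b paired with a1, from b = 2*mu - a
print("a:{},{} b:{}".format(a1, a2, b))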

a:5.0,-1.0 b:-1.0


However, the $a=5$ solution is degenerate (it pairs with $b=-1$, which violates $a<b$), and so Uniform(-1,5) is the MOM estimate for this model given these data.
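
Keeping only the root with $a<b$ gives $a=\mu-\sqrt{3\sigma^2}$ and $b=\mu+\sqrt{3\sigma^2}$. A minimal sketch of this as a reusable estimator follows. The function name `uniform_mom`, the use of the divide-by-N sample variance, and the simulated data are all assumptions for illustration, not part of the original notebook.

```python
import math
import random

def uniform_mom(data):
    """Method-of-moments estimates (a, b) for a Uniform(a, b) model."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n  # divide-by-N sample variance
    half_width = math.sqrt(3 * var)
    # Keep the non-degenerate root, i.e. the one with a < b.
    return mean - half_width, mean + half_width

# Simulated check: draw from Uniform(-1, 5) and see whether MOM recovers it.
rng = random.Random(1)
sample = [rng.uniform(-1, 5) for _ in range(50_000)]
a_hat, b_hat = uniform_mom(sample)
print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}")
```

With a large simulated sample the estimates should land near -1 and 5. Note that MOM estimates for this model can exclude some observed data points, since nothing forces the estimated interval to cover the sample minimum and maximum.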