# Chapter 3: Fundamentals of Estimation Theory

### Definitions and Properties

#### Problem Formulation

Estimation problem: estimate a parameter vector $\vec{\boldsymbol{\alpha}}$ from an observation vector $\vec{\mathbf{y}}$.


Definitions:

- **Parameters**: vector of random or deterministic values that need to be estimated.
- **Measurements or observations**: vector of random data $\vec{\mathbf{y}}$ that depends on the parameters $\vec{\boldsymbol{\alpha}}$. The dependence is described by $p(\vec{y}|\vec{\alpha})$ when the parameters are random, or by $p(\vec{y};\vec{\alpha})$ when they are deterministic.
- **Estimator**: signal processing algorithm $\vec{\hat{\boldsymbol{\alpha}}}(\vec{\mathbf{y}})$ used to calculate an estimate of $\vec{\boldsymbol{\alpha}}$ from the observation vector $\vec{\mathbf{y}}$.
- **Estimate**: the estimate $\vec{\hat{\boldsymbol{\alpha}}}$ of $\vec{\boldsymbol{\alpha}}$ is the result of the estimator $\vec{\hat{\boldsymbol{\alpha}}}(\vec{\mathbf{y}})$ for a realization of $\vec{\mathbf{y}}$.

### Formulation of the Single Parameter Estimation Problem

In this chapter, we consider the estimation of a single parameter (multi-parameter estimation will be considered later).

Definitions:

- **Parameter to estimate**: $\boldsymbol{\alpha}$
- **Observation vector**: $\vec{\mathbf{y}}$
- **Estimator**: $\hat{\boldsymbol{\alpha}}(\vec{\mathbf{y}})$
- **Estimate**: $\hat{\boldsymbol{\alpha}}$

### Unbiased Estimator

- An estimator $\hat{\boldsymbol{\alpha}}(\vec{\mathbf{y}})$ for a *deterministic* parameter $\alpha$ is unbiased if:
  $$
  E[\hat{\boldsymbol{\alpha}}]=\alpha, \forall \alpha
  $$

- An estimator $\hat{\boldsymbol{\alpha}}(\vec{\mathbf{y}})$ for a *random* parameter $\boldsymbol{\alpha}$ is conditionally unbiased if:
  $$
  E[\hat{\boldsymbol{\alpha}}|\boldsymbol{\alpha}=\alpha]=\alpha, \forall \alpha
  $$
  and unbiased if:
  $$
  E[\hat{\boldsymbol{\alpha}}]=E[\boldsymbol{\alpha}]
  $$
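
The unbiasedness condition for a deterministic parameter can be checked empirically by averaging an estimator over many realizations of $\vec{\mathbf{y}}$. The following is a minimal sketch, assuming observations of the form $y_i = \alpha + n_i$ with unit-variance Gaussian noise. The model, the helper `monte_carlo_mean`, and the deliberately biased estimator `half_sample_mean` are illustrative assumptions, not part of the chapter.

```python
import random
import statistics

def monte_carlo_mean(estimator, alpha, n_obs=10, n_trials=20000, seed=0):
    """Approximate E[estimator(y)] by averaging over many realizations of y."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Assumed observation model: y_i = alpha + unit-variance Gaussian noise
        y = [alpha + rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        total += estimator(y)
    return total / n_trials

def half_sample_mean(y):
    """Hypothetical biased estimator: its expectation is alpha / 2."""
    return 0.5 * statistics.fmean(y)

alpha_true = 2.0
bias_sample_mean = monte_carlo_mean(statistics.fmean, alpha_true) - alpha_true
bias_half_mean = monte_carlo_mean(half_sample_mean, alpha_true) - alpha_true

print(f"bias of sample mean:      {bias_sample_mean:+.4f}")
print(f"bias of half sample mean: {bias_half_mean:+.4f}")
```

The sample mean should show a bias close to zero, while the scaled estimator should show a bias close to $-\alpha/2$, which depends on $\alpha$ and therefore violates the condition for all $\alpha \neq 0$.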

### Estimator with Sufficient Statistic

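- **Definition**: A statistic $T(\vec{\mathbf{y}})$ is a **sufficient statistic** for $\boldsymbol{\alpha}$ if it contains all the information about $\boldsymbol{\alpha}$ that is present in $\vec{\mathbf{y}}$ (i.e., once $T(\vec{\mathbf{y}})$ is known, no further information about $\boldsymbol{\alpha}$ can be extracted from $\vec{\mathbf{y}}$).
- Formally, $T(\vec{\mathbf{y}})$ is a sufficient statistic for $\boldsymbol{\alpha}$ if the distribution of $\vec{\mathbf{y}}$ conditioned on $T(\vec{\mathbf{y}})$ does not depend on $\boldsymbol{\alpha}$: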
  $$
  P(\vec{\mathbf{y}}|T(\vec{\mathbf{y}}),\boldsymbol{\alpha})=P(\vec{\mathbf{y}}|T(\vec{\mathbf{y}}))
  $$
- One method to demonstrate that $T(\vec{\mathbf{y}})$ is a sufficient statistic for $\boldsymbol{\alpha}$ is to verify if the following **factorization** is valid:
  $$
  P(\vec{\mathbf{y}}|\boldsymbol{\alpha})=g(T(\vec{\mathbf{y}}),\boldsymbol{\alpha})h(\vec{\mathbf{y}})
  $$
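
As a worked illustration of the factorization test, assume $N$ independent Gaussian observations $y_i$ with unknown mean $\alpha$ and known variance $\sigma^2$ (this model is an assumption chosen for the example). Expanding the square in the exponent gives:

$$
\begin{aligned}
P(\vec{\mathbf{y}}|\alpha)
&=(2\pi\sigma^2)^{-N/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i-\alpha)^2\right) \\
&=\underbrace{\exp\left(\frac{2\alpha\sum_{i=1}^{N}y_i-N\alpha^2}{2\sigma^2}\right)}_{g(T(\vec{\mathbf{y}}),\alpha)}
\;\underbrace{(2\pi\sigma^2)^{-N/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{N}y_i^2\right)}_{h(\vec{\mathbf{y}})}
\end{aligned}
$$

The first factor depends on the data only through $T(\vec{\mathbf{y}})=\sum_{i=1}^{N}y_i$, and the second factor does not depend on $\alpha$, so under these assumptions the sum of the observations is a sufficient statistic for $\alpha$.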

### Minimum Variance Estimates

- Chebyshev's inequality tells us that, for an unbiased estimator and any $\epsilon > 0$,
  $$
  P(|\hat{\boldsymbol{\alpha}} - \alpha| \geq \epsilon) \leq \frac{V\{\hat{\boldsymbol{\alpha}}\}}{\epsilon^2}
  $$
  where $V\{\hat{\boldsymbol{\alpha}}\} = E\{(\hat{\boldsymbol{\alpha}} - \alpha)^2\}$ is the variance of the estimator.

- A small variance therefore makes large estimation errors unlikely, so we seek an unbiased estimator with minimal variance.
- The Cramér-Rao inequality provides the following *lower bound* for the variance of any unbiased estimator of $\alpha$ (under the usual regularity conditions on the likelihood):
  $$
  V\{\hat{\boldsymbol{\alpha}}\} \geq \frac{1}{-E\left\{\frac{\partial^2}{\partial\alpha^2}\ln P(\vec{\mathbf{y}}|\alpha) \right\}}
  $$
- An unbiased estimator $\hat{\boldsymbol{\alpha}}$ that achieves this bound is called an *efficient* estimator.
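
The bound can be compared numerically with the variance of a concrete estimator. The sketch below assumes $N$ independent Gaussian observations with unknown mean $\alpha$ and known standard deviation $\sigma$. For that assumed model the second derivative of the log-likelihood is $-N/\sigma^2$, so the bound equals $\sigma^2/N$. The function name `mse_of_sample_mean` and all numerical values are illustrative choices.

```python
import random

def mse_of_sample_mean(alpha, sigma, n_obs, n_trials=20000, seed=1):
    """Monte Carlo estimate of E{(alpha_hat - alpha)^2} for the sample mean."""
    rng = random.Random(seed)
    squared_errors = 0.0
    for _ in range(n_trials):
        estimate = sum(rng.gauss(alpha, sigma) for _ in range(n_obs)) / n_obs
        squared_errors += (estimate - alpha) ** 2
    return squared_errors / n_trials

alpha_true, sigma, n_obs = 1.0, 2.0, 25

# Cramér-Rao bound for the assumed Gaussian model: sigma^2 / N
crlb = sigma ** 2 / n_obs
empirical_variance = mse_of_sample_mean(alpha_true, sigma, n_obs)

print(f"Cramér-Rao bound:   {crlb:.4f}")
print(f"empirical variance: {empirical_variance:.4f}")
```

Under these assumptions the empirical variance of the sample mean should sit close to the bound, which is what one expects from an efficient estimator.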

### Consistent Estimator

- The estimator $\hat{\boldsymbol{\alpha}}$ is *consistent* if it converges to $\alpha$ in probability as the number of observations $m$ grows:
  $$
  \underset{m \rightarrow \infty}{\lim}
  P\{|\hat{\boldsymbol{\alpha}}(\vec{y}_m)-\alpha|< \epsilon\}=1,
  \quad \forall \epsilon>0
  $$
  or equivalently:
  $$
  \underset{m \rightarrow \infty}{\lim}
  P\{|\hat{\boldsymbol{\alpha}}(\vec{y}_m)-\alpha|\geq \epsilon\}=0,
  \quad \forall \epsilon>0
  $$
- Chebyshev's inequality tells us that if
  $$
  \underset{m \rightarrow \infty}{\lim} V\{\hat{\boldsymbol{\alpha}}(\vec{y}_m)\}=0
  $$
  then $\hat{\boldsymbol{\alpha}}$ is consistent.
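
The link between vanishing variance and consistency can be observed in simulation. The following sketch assumes Gaussian observations with mean $\alpha$ and standard deviation $\sigma$, uses the sample mean as the estimator, and compares the empirical probability of an error of at least $\epsilon$ with the Chebyshev bound $\sigma^2/(m\epsilon^2)$. The helper `exceed_probability` and the chosen values of $m$ and $\epsilon$ are assumptions made for illustration.

```python
import random

def exceed_probability(m, alpha=1.0, sigma=1.0, eps=0.2, n_trials=2000, seed=2):
    """Empirical P{|alpha_hat(y_m) - alpha| >= eps} for the sample mean of m observations."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        estimate = sum(rng.gauss(alpha, sigma) for _ in range(m)) / m
        if abs(estimate - alpha) >= eps:
            hits += 1
    return hits / n_trials

sizes = [5, 50, 500]
probs = [exceed_probability(m) for m in sizes]

# Chebyshev bound with V = sigma^2 / m, sigma = 1 and eps = 0.2 (capped at 1)
bounds = [min(1.0, 1.0 / (m * 0.2 ** 2)) for m in sizes]

for m, p, b in zip(sizes, probs, bounds):
    print(f"m={m:4d}  empirical={p:.4f}  Chebyshev bound={b:.4f}")
```

The empirical probabilities should decrease toward zero as $m$ grows and stay below the Chebyshev bound, which is typically quite loose.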

### Bayes Estimator


- Similar to the detection problem, we define the **average cost** or **risk**:
  $$
  \mathcal{R}=E\{C(\boldsymbol{\alpha},\hat{\boldsymbol{\alpha}}(\vec{\mathbf{y}}))\}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}C(\alpha,\hat{\alpha}(\vec{y}))p(\alpha,\vec{y})d\alpha d\vec{y}
  $$
  where $C(\boldsymbol{\alpha},\hat{\boldsymbol{\alpha}}(\vec{\mathbf{y}}))$ is a **cost function** that penalizes the estimation error $\alpha_e=\alpha-\hat{\alpha}(\vec{y})$.

- The Bayes estimator is the estimator $\hat{\boldsymbol{\alpha}}_{\text{B}}(\vec{\mathbf{y}})$ that *minimizes* $\mathcal{R}$.

- It can be shown that minimizing $\mathcal{R}$ is equivalent to minimizing the conditional cost for each observation $\vec{y}$, so the Bayes estimator is given by:
  $$
  \begin{aligned}
  \hat{\boldsymbol{\alpha}}_{\text{B}}(\vec{\mathbf{y}})
  &=\arg\underset{\hat{\boldsymbol{\alpha}}(\vec{\mathbf{y}})}{\min}E\{C(\boldsymbol{\alpha},\hat{\boldsymbol{\alpha}}(\vec{\mathbf{y}}))\} \\
  &=\arg\underset{\hat{\boldsymbol{\alpha}}(\vec{\mathbf{y}})}{\min}E\{C(\boldsymbol{\alpha},\hat{\boldsymbol{\alpha}}(\vec{\mathbf{y}}))|\vec{y}\} \\
  &=\arg\underset{\hat{\boldsymbol{\alpha}}(\vec{\mathbf{y}})}{\min}\int_{-\infty}^{\infty}C(\alpha,\hat{\alpha}(\vec{y}))p(\alpha|\vec{y})d\alpha
  \end{aligned}
  $$


Typical examples of cost functions:

1. **Quadratic Error**:
   $$
   C_{\text{S}} (\alpha,\hat{\alpha}(\vec{y}))=\alpha_e^2=(\alpha-\hat{\alpha}(\vec{y}))^2
   $$
2. **Uniform Error** (or "hit or miss"):
   $$
   C_{\text{U}} (\alpha,\hat{\alpha}(\vec{y}))=\begin{cases}
   0, & |\alpha_e|\leq\Delta/2 \\
   1, & |\alpha_e|>\Delta/2
   \end{cases}
   $$
3. **Absolute Error**:
   $$
   C_{\text{A}} (\alpha,\hat{\alpha}(\vec{y}))=|\alpha_e|
   $$
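
To see how the choice of cost function changes the Bayes estimate, the conditional risk can be minimized numerically on a grid. The sketch below assumes a hypothetical skewed posterior $p(\alpha|\vec{y}) \propto \alpha e^{-\alpha}$ on $(0, 12)$, discretized with step 0.02, and a window width $\Delta = 0.3$ for the uniform cost. None of these choices come from the chapter. They are picked so that the posterior mean, median, and mode differ visibly.

```python
import math

STEP = 0.02
GRID = [(i + 0.5) * STEP for i in range(600)]  # alpha values covering (0, 12)

# Hypothetical skewed posterior p(alpha | y), proportional to alpha * exp(-alpha)
weights = [a * math.exp(-a) for a in GRID]
total = sum(weights)
POSTERIOR = [w / total for w in weights]  # discrete probabilities summing to 1

def quadratic_cost(err):
    return err * err

def absolute_cost(err):
    return abs(err)

def uniform_cost(err, delta=0.3):
    # "Hit or miss": no cost inside a window of width delta, unit cost outside
    return 0.0 if abs(err) < delta / 2 else 1.0

def bayes_estimate(cost):
    """Grid point minimizing the conditional risk E{C(alpha, alpha_hat) | y}."""
    def conditional_risk(candidate):
        return sum(cost(a - candidate) * p for a, p in zip(GRID, POSTERIOR))
    return min(GRID, key=conditional_risk)

est_quadratic = bayes_estimate(quadratic_cost)
est_absolute = bayes_estimate(absolute_cost)
est_uniform = bayes_estimate(uniform_cost)
posterior_mean = sum(a * p for a, p in zip(GRID, POSTERIOR))

print(f"quadratic cost: {est_quadratic:.2f} (posterior mean {posterior_mean:.2f})")
print(f"absolute cost:  {est_absolute:.2f}")
print(f"uniform cost:   {est_uniform:.2f}")
```

For this assumed posterior, the quadratic cost should select a value near the posterior mean, the absolute cost a value near the posterior median, and the uniform cost a value near the posterior mode.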



### MSE Estimator

- The Mean Squared Error (MSE) estimator is the Bayes estimator with the *quadratic* error cost function and is given by:
  $$
  \hat{\boldsymbol{\alpha}}_{\text{MSE}}(\vec{\mathbf{y}})=\arg\underset{\hat{\boldsymbol{\alpha}}(\vec{\mathbf{y}})}{\min}\int_{-\infty}^{\infty}(\alpha-\hat{\alpha}(\vec{y}))^2p(\alpha|\vec{y})d\alpha
  $$

- Differentiating the integral with respect to $\hat{\alpha}(\vec{y})$ and setting the result to zero, we obtain:
  $$
  \hat{\boldsymbol{\alpha}}_{\text{MSE}}(\vec{\mathbf{y}})=\int_{-\infty}^{\infty}\alpha p(\alpha|\vec{y})d\alpha=E\{\boldsymbol{\alpha}|\vec{y}\}
  $$

- Note: In the case where *no* observation is available, we obtain $\hat{\boldsymbol{\alpha}}_{\text{MSE}}=E\{\boldsymbol{\alpha}\}$.
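
The differentiation step can be written out explicitly. Treating $\hat{\alpha}=\hat{\alpha}(\vec{y})$ as a free variable for a fixed observation $\vec{y}$:

$$
\begin{aligned}
\frac{\partial}{\partial\hat{\alpha}}\int_{-\infty}^{\infty}(\alpha-\hat{\alpha})^2p(\alpha|\vec{y})d\alpha
&=-2\int_{-\infty}^{\infty}(\alpha-\hat{\alpha})p(\alpha|\vec{y})d\alpha=0 \\
\Rightarrow\quad \hat{\alpha}\int_{-\infty}^{\infty}p(\alpha|\vec{y})d\alpha
&=\int_{-\infty}^{\infty}\alpha\,p(\alpha|\vec{y})d\alpha
\end{aligned}
$$

Since the posterior density integrates to 1, the left-hand side reduces to $\hat{\alpha}$, which gives the conditional mean. The second derivative with respect to $\hat{\alpha}$ equals $2>0$, so this stationary point is a minimum.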

### MAP Estimator

- The Maximum A Posteriori (MAP) estimator is obtained by maximizing the posterior (a posteriori) density $p(\alpha|\vec{y})$ of the parameter to be estimated:

  $$
  \begin{aligned}
  \hat{\boldsymbol{\alpha}}_{\text{MAP}}(\vec{\mathbf{y}})
  &=\arg\underset{\alpha}{\max}\{p(\alpha|\vec{y})\} \\
  &=\arg\underset{\alpha}{\max}\{p(\vec{y}|\alpha)p(\alpha)\}
  \end{aligned}
  $$
  The second line follows from Bayes' rule, since the denominator $p(\vec{y})$ does not depend on $\alpha$.

- It can be shown that the MAP estimator is the Bayes estimator for the *uniform* error cost function in the limit $\Delta \rightarrow 0$.

- In practice, if $p(\alpha|\vec{y})$ is differentiable and its maximum lies in the interior of the parameter range, then $\hat{\boldsymbol{\alpha}}_{\text{MAP}}(\vec{\mathbf{y}})$ satisfies the necessary condition
  $$
  \frac{\partial}{\partial\alpha}p(\alpha|\vec{y})= 0 \text{ or }
  \frac{\partial}{\partial\alpha}\left[p(\vec{y}|\alpha)p(\alpha)\right]=0
  $$
  or
  $$
  \frac{\partial}{\partial\alpha}\ln p(\alpha|\vec{y})=0  \text{ or } \frac{\partial}{\partial\alpha}\ln p(\vec{y}|\alpha)+\frac{\partial}{\partial\alpha}\ln p(\alpha)=0
  $$
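
As an illustration of the log form of this condition, assume independent Gaussian observations with unknown mean $\alpha$ and known standard deviation $\sigma$, together with a Gaussian prior on $\alpha$ with mean $\mu_0$ and standard deviation $\sigma_0$ (this model is an assumption for the example). The condition becomes $\sum_i (y_i-\alpha)/\sigma^2-(\alpha-\mu_0)/\sigma_0^2=0$, which is linear in $\alpha$ and has a closed-form solution. The function names and the data values below are hypothetical.

```python
def map_estimate_gaussian(y, sigma, prior_mean, prior_std):
    """Closed-form root of the log-posterior derivative for the assumed Gaussian model."""
    data_precision = len(y) / sigma ** 2
    prior_precision = 1.0 / prior_std ** 2
    weighted_sum = sum(y) / sigma ** 2 + prior_mean * prior_precision
    return weighted_sum / (data_precision + prior_precision)

def log_posterior_slope(alpha, y, sigma, prior_mean, prior_std):
    """d/d(alpha) of ln p(y|alpha) + ln p(alpha) for the assumed Gaussian model."""
    likelihood_term = sum(yi - alpha for yi in y) / sigma ** 2
    prior_term = -(alpha - prior_mean) / prior_std ** 2
    return likelihood_term + prior_term

y_obs = [1.2, 0.8, 1.5, 1.1]  # hypothetical observations
alpha_map = map_estimate_gaussian(y_obs, sigma=1.0, prior_mean=0.0, prior_std=1.0)
slope_at_map = log_posterior_slope(alpha_map, y_obs, 1.0, 0.0, 1.0)

print(f"MAP estimate: {alpha_map:.4f}")
print(f"slope of log-posterior at the estimate: {slope_at_map:.2e}")
```

With these values the estimate lies between the prior mean (0) and the sample mean (1.15), and the slope of the log-posterior at the estimate is zero up to rounding.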

### ML Estimator

- The Maximum Likelihood (ML) estimator is obtained by maximizing the likelihood function:
  $$
  \hat{\boldsymbol{\alpha}}_{\text{ML}}(\vec{\mathbf{y}})=\arg\underset{\alpha}{\max}\{p(\vec{y}|\alpha)\}
  $$

- In practice, if $p(\vec{y}|\alpha)$ is differentiable, then $\hat{\boldsymbol{\alpha}}_{\text{ML}}(\vec{\mathbf{y}})$ is the solution of
  $$
  \frac{\partial}{\partial\alpha}p(\vec{y}|\alpha)=0
  $$
  or
  $$
  \frac{\partial}{\partial\alpha}\ln p(\vec{y}|\alpha)=0
  $$

- Note: The ML estimator coincides with the MAP estimator when the prior $p(\alpha)$ is uniform over the range of interest, i.e., when $\boldsymbol{\alpha}$ is very *dispersed* and the prior carries no information.
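
The relation between the ML and MAP estimators can be checked on a simple case. The sketch below assumes independent Gaussian observations with unknown mean $\alpha$ and known standard deviation $\sigma$, for which setting the derivative of the log-likelihood to zero yields the sample mean. A Gaussian prior with a very large standard deviation stands in for a nearly uniform prior. The function names and data are hypothetical.

```python
def ml_estimate_gaussian(y):
    """Root of d/d(alpha) ln p(y|alpha) for the assumed Gaussian model: the sample mean."""
    return sum(y) / len(y)

def map_estimate_gaussian(y, sigma, prior_mean, prior_std):
    """MAP estimate for the same model with an assumed Gaussian prior on alpha."""
    data_precision = len(y) / sigma ** 2
    prior_precision = 1.0 / prior_std ** 2
    weighted_sum = sum(y) / sigma ** 2 + prior_mean * prior_precision
    return weighted_sum / (data_precision + prior_precision)

y_obs = [1.2, 0.8, 1.5, 1.1]  # hypothetical observations

alpha_ml = ml_estimate_gaussian(y_obs)
alpha_map_narrow = map_estimate_gaussian(y_obs, sigma=1.0, prior_mean=0.0, prior_std=1.0)
alpha_map_wide = map_estimate_gaussian(y_obs, sigma=1.0, prior_mean=0.0, prior_std=1e3)

print(f"ML estimate:              {alpha_ml:.6f}")
print(f"MAP, informative prior:   {alpha_map_narrow:.6f}")
print(f"MAP, very dispersed prior: {alpha_map_wide:.6f}")
```

As the prior becomes more dispersed, its contribution to the log-posterior derivative vanishes and the MAP estimate approaches the ML estimate.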