## Adaptive filters

### [Interpolation based on stationary and adaptive AR(1) modeling](http://publications.lib.chalmers.se/records/fulltext/146866/local_146866.pdf)

In this paper, we describe a minimum mean square error (MMSE)
optimal interpolation filter for discrete random signals. We explicitly
derive the interpolation filter for a first-order autoregressive process
(AR(1)), and show that the filter depends only on the two adjacent
points. The result is extended by developing an algorithm called
local AR approximation (LARA), where a random signal is locally
estimated as an AR(1) process. Experimental evaluation illustrates
that LARA interpolation yields a lower mean square error than other
common interpolation techniques, including linear, spline and local
polynomial approximation (LPA).

For a discrete random signal with known spectrum, we derive
the optimal interpolation filter in a minimum mean square error
(MMSE) sense. We derive an
explicit form of this filter for a stationary first-order autoregressive
process (AR(1)). The resulting filter is then extended to a general
interpolation algorithm that can be used on a larger set of signals. This is done by approximating the signal locally as an AR(1) process, where the AR(1) parameter is estimated from the data without
prior information about the original signal. We name this algorithm
local AR approximation (LARA).

To evaluate the performance of LARA interpolation, we compare it with other interpolation techniques: linear, spline and local
polynomial approximation (LPA) interpolation.

Let $x_d[n] = x[nL]$ denote the observed signal, i.e. $x[n]$ downsampled by a factor $L$. To recover the missing samples, introduce the filters $H_k$ ($k = 0, \dots, L-1$),
where $H_k$ reconstructs samples of the form
$$x_{dk}[n] = x[nL + k]$$
For each of these interpolation filters, the optimal filter in the MMSE sense is a Wiener filter:
$$H_k(e^{j\Omega_d}) = \frac{P_{x_d x_{dk}}(e^{j\Omega_d})}{P_{x_d x_d}(e^{j\Omega_d})}$$
where $\Omega_d$ is the angular frequency in radians/sample, $P_{x_d x_d}$ is the
spectrum of $x_d$ and $P_{x_d x_{dk}}$ is the cross spectrum of $x_d$ and $x_{dk}$.

#### Application to a first-order autoregressive process

Assume that the process $x[n]$ is a stationary AR(1) process,
$$x[n] = a\,x[n-1] + v[n]$$
where $v[n]$ is white noise and $|a| < 1$ (required for stationarity).

Since the AR(1) process is Markovian, it is reasonable that the
optimal interpolation filter only needs to consider information from
the two adjacent points. While downsampled AR(1) processes are
still AR(1), this is unfortunately not the case for higher-order AR
processes, which limits how far the proposition on the MMSE-optimal interpolation filter can be generalised.
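The claim that a downsampled AR(1) process is still AR(1) can be checked numerically. Substituting the recursion into itself $L$ times gives $x[nL] = a^L x[(n-1)L] + (\text{noise})$, so the downsampled process should have parameter $a^L$. The sketch below is only an illustration under assumed values ($a = 0.9$, $L = 3$, unit-variance Gaussian noise); the helper names `simulate_ar1` and `lag1_autocorr` are hypothetical and are not taken from the paper. It also shows one simple way to recover $a$ from the downsampled data alone, which is not necessarily the estimator used by LARA.

```python
import random

def simulate_ar1(a, n, seed=0):
    """Simulate n samples of x[t] = a*x[t-1] + v[t] with unit-variance Gaussian v."""
    rng = random.Random(seed)
    # Draw the first sample from the stationary distribution, variance 1/(1 - a^2).
    x = rng.gauss(0.0, (1.0 / (1.0 - a * a)) ** 0.5)
    out = []
    for _ in range(n):
        out.append(x)
        x = a * x + rng.gauss(0.0, 1.0)
    return out

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a zero-mean sequence."""
    num = sum(x[i] * x[i + 1] for i in range(len(x) - 1))
    den = sum(v * v for v in x)
    return num / den

a, L = 0.9, 3                     # assumed illustration values
x = simulate_ar1(a, 300_000)
x_d = x[::L]                      # keep every L-th sample
rho_d = lag1_autocorr(x_d)        # estimates the AR(1) parameter of x_d
a_hat = rho_d ** (1.0 / L)        # implied parameter at the original rate
print(f"a**L = {a**L:.3f}, estimated from x_d = {rho_d:.3f}, a_hat = {a_hat:.3f}")
```

The lag-1 autocorrelation of the downsampled sequence should land close to $a^L$, and its $L$-th root close to $a$, up to sampling error.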

#### Comparison to linear interpolation

The AR(1) filter produces interpolation points as a
weighted mean of the previous and the following sample point.
Note the close relationship between the AR(1) filter and ordinary
linear interpolation, defined as
$$x[Ln + k] = \frac{L-k}{L}\, x_d[n] + \frac{k}{L}\, x_d[n + 1]$$
The AR(1) interpolation filter uses the same two samples, but with weights that depend on $a$.

Fig. 1 in the paper shows that the coefficients of the AR(1) filter are damped versions of the linear interpolation coefficients. The AR(1) filter considers the stochastic nature of the signal and introduces a bias towards the mean, which in this case is zero. The closer $a$ is to 1 (i.e., the less
stochasticity), the more the filter resembles linear interpolation.
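To make the damping concrete, the two-tap weights can be worked out from the AR(1) autocorrelation $r[m] \propto a^{|m|}$. Solving the $2 \times 2$ normal equations for the estimate of $x[Ln+k]$ from $x_d[n]$ and $x_d[n+1]$ gives

$$w_{\text{prev}} = \frac{a^{k}\,(1 - a^{2(L-k)})}{1 - a^{2L}}, \qquad w_{\text{next}} = \frac{a^{L-k}\,(1 - a^{2k})}{1 - a^{2L}}$$

This closed form is a derivation sketched here under the assumption $0 < |a| < 1$, not a quotation from the paper, and the function names below are hypothetical.

```python
def ar1_weights(a, L, k):
    """Two-tap MMSE weights on x_d[n] and x_d[n+1] for estimating x[L*n + k].

    Assumes a zero-mean stationary AR(1) process with 0 < |a| < 1.
    """
    denom = 1.0 - a ** (2 * L)
    w_prev = a ** k * (1.0 - a ** (2 * (L - k))) / denom
    w_next = a ** (L - k) * (1.0 - a ** (2 * k)) / denom
    return w_prev, w_next

def linear_weights(L, k):
    """Weights of ordinary linear interpolation between x_d[n] and x_d[n+1]."""
    return (L - k) / L, k / L

L, k = 4, 2  # assumed illustration values: midpoint between two samples
for a in (0.5, 0.9, 0.999):
    w_prev, w_next = ar1_weights(a, L, k)
    print(f"a={a}: AR(1) weights = ({w_prev:.4f}, {w_next:.4f}), "
          f"linear = {linear_weights(L, k)}")
```

For small $a$ the weights sum to well below 1, which is the bias towards the zero mean; as $a$ approaches 1 they approach the linear interpolation weights.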

Local AR approximation (LARA) can be seen
as a stochastic version of the LPA method, the latter being better
suited for deterministic signals.

## [Wiener filter](https://en.wikipedia.org/wiki/Wiener_filter)

In signal processing, the Wiener filter is a filter used to produce an estimate of a desired or target random process by linear time-invariant (LTI) filtering of an observed noisy process, assuming known stationary signal and noise spectra, and additive noise. The Wiener filter minimizes the mean square error between the estimated random process and the desired process.

The goal of the Wiener filter is to compute a statistical estimate of an unknown signal using a related signal as an input and filtering that known signal to produce the estimate as an output. For example, the known signal might consist of an unknown signal of interest that has been corrupted by additive noise. The Wiener filter can be used to filter out the noise from the corrupted signal to provide an estimate of the underlying signal of interest. The Wiener filter is based on a statistical approach, and a more statistical account of the theory is given in the minimum mean square error (MMSE) estimator article.

Wiener filters are characterized by the following:

- Assumption: signal and (additive) noise are stationary linear stochastic processes with known spectral characteristics or known autocorrelation and cross-correlation
- Requirement: the filter must be physically realizable/causal (this requirement can be dropped, resulting in a non-causal solution)
- Performance criterion: minimum mean-square error (MMSE)

This filter is frequently used in the process of deconvolution; for this application, see Wiener deconvolution:

Given a system
$$y(t) = (h * x)(t) + n(t)$$
where $x(t)$ is some unknown original signal at time $t$, $h(t)$ is the known impulse response of a linear time-invariant system, $n(t)$ is some unknown additive noise, and $y(t)$ is the observed signal.

Our goal is to find some $g(t)$ so that we can estimate $x(t)$ as follows:
$$\hat{x}(t) = (g * y)(t)$$
where $\hat{x}(t)$ is an estimate of $x(t)$ that minimizes the mean square error.

The Wiener deconvolution filter provides such a $g(t)$. The filter is most easily described in the frequency domain:

$$G(f) = \frac{H^{*}(f)\,S(f)}{|H(f)|^{2}\,S(f) + N(f)}$$
where $G(f)$ and $H(f)$ are the Fourier transforms of $g$ and $h$, respectively, at frequency $f$; $S(f)$ is the mean power spectral density of the original signal $x(t)$; and $N(f)$ is the mean power spectral density of the noise $n(t)$.

The filtering operation may be carried out either in the time domain, as above, or in the frequency domain:

$$\hat{X}(f) = G(f)\,Y(f)$$

where $\hat{X}(f)$ and $Y(f)$ are the Fourier transforms of $\hat{x}(t)$ and $y(t)$; an inverse Fourier transform is then applied to $\hat{X}(f)$ to obtain $\hat{x}(t)$.
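The behaviour of $G(f)$ is easiest to see by evaluating it at a single frequency. The sketch below is a direct transcription of the formula above for scalar inputs; the function name `wiener_deconvolution_gain` and the example values are illustrative assumptions, and a full deconvolution would apply the same expression at every frequency bin of a discrete Fourier transform.

```python
def wiener_deconvolution_gain(H, S, N):
    """Evaluate G(f) = conj(H) * S / (|H|^2 * S + N) at one frequency.

    H: complex frequency response of the system at f
    S: signal power spectral density at f (non-negative real)
    N: noise power spectral density at f (non-negative real)
    """
    H = complex(H)
    return H.conjugate() * S / (abs(H) ** 2 * S + N)

H = 2 + 1j  # assumed example response

# Without noise the filter reduces to the plain inverse filter 1/H.
print(wiener_deconvolution_gain(H, S=1.0, N=0.0), 1 / H)

# As the noise power grows, the gain shrinks towards zero instead of
# amplifying noise the way 1/H would.
for N in (0.1, 1.0, 10.0):
    print(N, abs(wiener_deconvolution_gain(H, S=1.0, N=N)))
```

With $H = 1$ the expression becomes $S/(S+N)$, the familiar Wiener gain for pure denoising.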







## [Kalman filter](https://jyx.jyu.fi/bitstream/handle/123456789/49043/ThesisJouniHelske.pdf?sequence=1)

The linear Gaussian state space model can be written as
$$\begin{aligned}
y_t &= Z_t \alpha_t + e_t && \text{(observation equation)} \\
\alpha_{t+1} &= T_t \alpha_t + R_t \eta_t && \text{(state equation)}
\end{aligned}$$
where $e_t \sim N(0, H_t)$, $\eta_t \sim N(0, Q_t)$ and $\alpha_1 \sim N(a_1, P_1)$, independently of each other. Here
the vector $y_t$ contains the observations at time $t$, whereas $\alpha_t$
is the vector of the latent state
process at time $t$. The system matrices $Z_t$, $T_t$ and $R_t$, together with the covariance matrices $H_t$ and $Q_t$, depend on the particular model definition, and often some of these matrices contain
unknown (hyper)parameters $\psi$ which need to be estimated. If a particular matrix such as $Z_t$ does not depend on $t$, it is said to be time-invariant.

The main algorithms for the inference of Gaussian state space models are the Kalman filtering
and smoothing recursions. From the Kalman filtering algorithm we obtain the one-step-ahead predictions and the prediction errors
$$\begin{aligned}
a_{t+1} &= E(\alpha_{t+1} \mid y_t, \dots, y_1) \\
v_t &= y_t - E(y_t \mid y_{t-1}, \dots, y_1)
\end{aligned}$$
and their covariance matrices:
$$\begin{aligned}
P_{t+1} &= \operatorname{Var}(\alpha_{t+1} \mid y_t, \dots, y_1) \\
F_t &= \operatorname{Var}(v_t)
\end{aligned}$$

$a_{t+1}$ is also the minimum variance linear posterior mean
estimate. Therefore, given the hyperparameters $ψ$, the resulting predictive distributions are Bayesian posterior distributions given the prior distribution $N(a_1, P_1)$.
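The quantities $a_{t+1}$, $v_t$, $P_{t+1}$ and $F_t$ can be computed with the standard Kalman filter recursions, which are not spelled out in the text above. The sketch below assumes the simplest setting: a univariate, time-invariant model in which $Z$, $T$, $R$, $H$ and $Q$ are scalars. The function name `kalman_filter` is a hypothetical helper and this is not the KFAS implementation.

```python
def kalman_filter(y, Z, T, R, H, Q, a1, P1):
    """Scalar Kalman filter for y_t = Z*alpha_t + e_t, alpha_{t+1} = T*alpha_t + R*eta_t.

    Returns four lists: predicted states a_{t+1}, prediction errors v_t,
    predicted state variances P_{t+1}, and prediction error variances F_t.
    """
    a, P = a1, P1                      # prior mean and variance of alpha_1
    a_pred, v_list, P_pred, F_list = [], [], [], []
    for y_t in y:
        v = y_t - Z * a                # one-step-ahead prediction error
        F = Z * P * Z + H              # its variance
        K = T * P * Z / F              # Kalman gain
        a = T * a + K * v              # a_{t+1}
        P = T * P * (T - K * Z) + R * Q * R  # P_{t+1}
        a_pred.append(a)
        v_list.append(v)
        P_pred.append(P)
        F_list.append(F)
    return a_pred, v_list, P_pred, F_list

# Local level model (Z = T = R = 1) with assumed variances and toy data.
y = [1.2, 0.8, 1.1, 1.4, 0.9]
a_pred, v, P_pred, F = kalman_filter(y, Z=1.0, T=1.0, R=1.0, H=1.0, Q=0.1,
                                     a1=0.0, P1=10.0)
for t, (a_next, P_next) in enumerate(zip(a_pred, P_pred), start=1):
    print(f"t={t}: a_(t+1)={a_next:.3f}, P_(t+1)={P_next:.3f}")
```

With a large prior variance $P_1$, the first prediction moves almost all the way to the first observation, and the predicted variance then settles as more data arrive.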

State space models can also be extended to non-Gaussian cases. An important special case
is the class of exponential family state space models, where the state equation retains its linear Gaussian
form, but the observation equation has the general form
$$p(y_t \mid \theta_t) = p(y_t \mid Z_t \alpha_t)$$
where $\theta_t = Z_t \alpha_t$ is the signal and $p(y_t \mid \theta_t)$ is the observational density. The signal $\theta_t$ is the linear
predictor, which is connected to the expected value $E[y_t] = \mu_t$ via a link function $l(\mu_t) = \theta_t$. In
the R package KFAS, presented in Article IV, possible choices for observational distributions
are Gaussian, Poisson, binomial, negative binomial and gamma distributions. Note that it is
possible to define a multivariate model where each series has a different distribution.
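As a concrete instance of an observational density, consider a Poisson observation with a log link, so that $l(\mu_t) = \log \mu_t = \theta_t$ and $\mu_t = e^{\theta_t}$. The sketch below evaluates $\log p(y_t \mid \theta_t)$ for a scalar state; the log link, the function name `poisson_log_density` and the numbers are assumptions chosen for illustration.

```python
import math

def poisson_log_density(y, theta):
    """log p(y | theta) for a Poisson count y with log link, mu = exp(theta)."""
    # log Poisson pmf: y*log(mu) - mu - log(y!), with log(mu) = theta
    return y * theta - math.exp(theta) - math.lgamma(y + 1)

Z, alpha = 1.0, math.log(3.0)   # assumed scalar system "matrix" and state
theta = Z * alpha               # signal / linear predictor
mu = math.exp(theta)            # expected value E[y] under the log link
for y in range(6):
    print(y, round(math.exp(poisson_log_density(y, theta)), 4))
```

The printed probabilities are those of a Poisson distribution with mean $\mu_t = 3$, the value implied by the signal through the link function.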

Final estimates $\hat{\theta}_t$ correspond to the posterior mode of $p(\theta \mid y)$. In the Gaussian case the
mode is also the mean. In the other cases supported by KFAS the difference between the mode
and the mean is often negligible, and even the conditional variance estimate obtained from the
Kalman smoother using the approximating model provides a relatively good approximation
of the true conditional variance. This method is closely related to the iterative reweighted
least squares (IRLS) method used in a generalized linear model framework (McCullagh and
Nelder, 1989). Consequently, we can write a generalized linear model in a state space form,
and the Kalman filter algorithm for the corresponding approximating Gaussian model gives
results which are identical to the IRLS based analysis.
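For readers unfamiliar with IRLS, the following is a minimal sketch of the iteration in the simplest possible case: an intercept-only Poisson model with log link, where the single parameter $\theta$ plays the role of the signal. Each pass linearises the model around the current $\theta$, forms Gaussian pseudo-observations (working responses) with their weights, and solves a weighted least squares problem. The function name `irls_poisson_intercept` and the data are hypothetical, and this illustrates the general idea only, not the KFAS algorithm.

```python
import math

def irls_poisson_intercept(y, n_iter=25):
    """IRLS for an intercept-only Poisson GLM with log link; returns theta."""
    theta = 0.0
    for _ in range(n_iter):
        mu = math.exp(theta)
        # Working responses: linearisation of the link around the current theta.
        z = [theta + (y_i - mu) / mu for y_i in y]
        # Working weights: inverse variances of the pseudo-observations.
        w = [mu] * len(y)
        # Weighted least squares estimate of a constant.
        theta = sum(w_i * z_i for w_i, z_i in zip(w, z)) / sum(w)
    return theta

y = [2, 3, 4]                      # toy counts
theta_hat = irls_poisson_intercept(y)
print(theta_hat, math.log(sum(y) / len(y)))
```

The iteration converges to the log of the sample mean, which is the maximum likelihood estimate for this model.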