# Differential (Predictive) Audio Coding

## Motivation

* It usually happens that the differences between consecutive (quantized or not) Pulse-Code Modulation (PCM) [[A.V. Oppenheim, 1999](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=oppenheim+discrete+time+signal+processing&btnG=&oq=Oppenheim)] samples

  $$
    e[n] = s[n] - s[n-1]
  $$

  tend to have lower entropy than the original samples $s$:

  $$
    H(e) \leq H(s)
  $$

  which potentially provides better lossless compression ratios for $e$ than for $s$.

* Finally, notice that, by definition, if no "noise" (such as quantization noise) is introduced during the encoding process, differential encoding is fully reversible (given an initial value, e.g., $s[-1]=0$):

  $$
    s[n] = e[n] + s[n-1].
  $$
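Both properties can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a synthetic integer signal instead of JFK.wav (real samples could be loaded, e.g., with `scipy.io.wavfile.read`), takes $s[-1]=0$, and estimates the zeroth-order entropy from the histogram of each signal. The helper `empirical_entropy` is hypothetical, not part of any library.

```python
import numpy as np

def empirical_entropy(x):
    """Zeroth-order entropy, in bits/sample, estimated from the histogram of x."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Hypothetical stand-in for a PCM recording (not JFK.wav): a 5-cycle sinusoid
# plus Gaussian noise, rounded to integer samples.
rng = np.random.default_rng(0)
n = np.arange(8000)
s = np.round(8000 * np.sin(2 * np.pi * 5 * n / 8000)
             + rng.normal(0, 20, n.size)).astype(np.int64)

# e[n] = s[n] - s[n-1], assuming s[-1] = 0 (so e[0] = s[0]).
e = np.diff(s, prepend=0)
print(f"H(s) = {empirical_entropy(s):.2f} bits, H(e) = {empirical_entropy(e):.2f} bits")

# Decoder: s[n] = e[n] + s[n-1], i.e., a running sum of e.
s_rec = np.cumsum(e)
print("lossless:", np.array_equal(s_rec, s))
```

Note that $H(e) < H(s)$ here only because consecutive samples of this test signal are strongly correlated; for white noise the difference signal would not help.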

```python
# Compute the e signal for the JFK.wav s signal and plot the probability density function (or the histogram) of e (TO-DO)
```

## DPCM (Differential PCM) [Gibson, 1998]

* Pure differential encoding strategy:

  <img src="local/DPCM.png" width="400">

  \begin{equation}
    e[n] = s[n] - P(s,n)
  \end{equation}
  where for the simplest case:
  \begin{equation}
    P(s,n) = s[n-1]
  \end{equation}

* Lossless.
* Modest compression ratio (depending on the predictor).
* Its lossy, adaptive variant (ADPCM) is used, for example, in the [G.726 standard](https://en.wikipedia.org/wiki/G.726).
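A minimal sketch of a lossless DPCM encoder/decoder pair with a pluggable predictor. `p_prev` is the simplest predictor $P(s,n)=s[n-1]$ above; `p_linear` ($2s[n-1]-s[n-2]$) is an extra predictor added here only as an assumption, to illustrate that the compression ratio depends on the predictor. All function names are hypothetical.

```python
import numpy as np

def p_prev(s, n):
    """Simplest predictor P(s, n) = s[n-1], taking s[-1] = 0."""
    return s[n - 1] if n >= 1 else 0

def p_linear(s, n):
    """Hypothetical alternative: linear extrapolation P(s, n) = 2 s[n-1] - s[n-2]."""
    return 2 * s[n - 1] - s[n - 2] if n >= 2 else p_prev(s, n)

def dpcm_encode(s, predict):
    """Lossless DPCM encoder: e[n] = s[n] - P(s, n)."""
    e = np.empty_like(s)
    for n in range(len(s)):
        e[n] = s[n] - predict(s, n)
    return e

def dpcm_decode(e, predict):
    """Decoder: s[n] = e[n] + P(s, n); P only reads samples already decoded."""
    s = np.zeros_like(e)
    for n in range(len(e)):
        s[n] = e[n] + predict(s, n)
    return s

# Smooth integer test signal (an assumption, not real audio).
s = np.round(1000 * np.sin(2 * np.pi * np.arange(400) / 50)).astype(np.int64)
for predict in (p_prev, p_linear):
    e = dpcm_encode(s, predict)
    assert np.array_equal(dpcm_decode(e, predict), s)   # fully reversible
    print(predict.__name__, "mean |e| =", np.abs(e).mean())
```

For this smooth signal the second predictor leaves a much smaller residual; for noisier signals the ranking can change, which is why practical coders choose the predictor per block.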

## Lossy DPCM [C.C. Cutler, 1952]

C.C. Cutler. Differential Quantization for Television Signals. U.S. Patent 2,605,361, July 29, 1952.

<img src="local/QDPCM.png" width="600">
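The sketch below assumes (this is not confirmed by the figure) the usual closed-loop arrangement of quantized DPCM, in which the predictor works on the reconstructed samples that the decoder will also have. Under that assumption the reconstruction error equals the quantization error of the current prediction error, so it stays within half a quantization step; quantizing differences of the original samples (open loop) instead lets the errors accumulate at the decoder. The uniform quantizer, the step size and the random-walk test signal are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def quantize(x, step):
    """Uniform mid-tread quantizer: returns the reconstruction level step * round(x / step)."""
    return step * np.round(x / step)

def lossy_dpcm_closed_loop(s, step):
    """Predict from the *reconstructed* previous sample, so errors do not accumulate."""
    s_hat = np.zeros(len(s))
    prev = 0.0                                  # assumed initial reconstruction
    for n in range(len(s)):
        e_q = quantize(s[n] - prev, step)       # quantized prediction error (transmitted)
        s_hat[n] = prev + e_q                   # decoder output, mirrored in the encoder
        prev = s_hat[n]
    return s_hat

def lossy_dpcm_open_loop(s, step):
    """For contrast: quantize e[n] = s[n] - s[n-1] computed from the *original* samples."""
    e = np.diff(s, prepend=0.0)
    return np.cumsum(quantize(e, step))         # decoder accumulates the quantized differences

rng = np.random.default_rng(0)
s = np.cumsum(rng.normal(0, 30, 2000))          # random-walk test signal
step = 10.0
print("closed loop, max |error|:", np.max(np.abs(s - lossy_dpcm_closed_loop(s, step))))
print("open loop,   max |error|:", np.max(np.abs(s - lossy_dpcm_open_loop(s, step))))
```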

## ADPCM (Adaptive DPCM) [Gibson, 1998]

* An adaptive quantizer $Q$, whose step size adapts to the signal, quantizes the prediction error and removes the detail below its resolution ("the noise signal"):

  $$
    o[n] = Q\big(i[n] - P(i,n)\big),
  $$

  where $i[n]$ is the input signal and $o[n]$ is the output (code) signal.

* Notice that

  $$
    Q^{-1}\big(Q(x)\big) \approx x,
  $$

  i.e., the codec is irreversible (lossy): $i[n]$ cannot be recovered exactly from $o[n]$.

* Used, for example, by the [G.726 standard](https://en.wikipedia.org/wiki/G.726).
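A toy ADPCM codec can make $o[n] = Q(i[n]-P(i,n))$ and $Q^{-1}(Q(x)) \approx x$ concrete. Assumptions (this is not the G.726 algorithm): a 2-bit mid-rise quantizer, a predictor equal to the previously reconstructed sample, and a step size that is multiplied after each sample by a factor chosen from the transmitted code, in the spirit of Jayant's adaptive quantizer. Because the adaptation only looks at transmitted codes, the decoder can follow it without side information. All names and parameter values are hypothetical.

```python
import numpy as np

# Hypothetical parameters (not taken from G.726).
MULTIPLIER = {-2: 1.6, -1: 0.9, 0: 0.9, 1: 1.6}   # grow on outer codes, shrink on inner ones
STEP0, STEP_MIN, STEP_MAX = 4.0, 1.0, 1000.0

def quantize(x, step):
    """Q: 2-bit mid-rise quantizer, returns an integer code in {-2, -1, 0, 1}."""
    return int(np.clip(np.floor(x / step), -2, 1))

def dequantize(code, step):
    """Q^{-1}: centre of the quantization cell, (code + 1/2) * step."""
    return (code + 0.5) * step

def adapt(step, code):
    """Backward step adaptation, driven only by the transmitted code."""
    return float(np.clip(step * MULTIPLIER[code], STEP_MIN, STEP_MAX))

def adpcm_encode(i):
    codes, i_hat, step = [], 0.0, STEP0
    for x in i:
        code = quantize(x - i_hat, step)   # o[n] = Q(i[n] - P(i, n)), P = previous reconstruction
        codes.append(code)
        i_hat += dequantize(code, step)    # the encoder tracks the decoder's output
        step = adapt(step, code)
    return codes

def adpcm_decode(codes):
    out, i_hat, step = [], 0.0, STEP0
    for code in codes:
        i_hat += dequantize(code, step)
        out.append(i_hat)
        step = adapt(step, code)
    return np.array(out)

i = 100 * np.sin(2 * np.pi * np.arange(1000) / 100)
o = adpcm_encode(i)
i_rec = adpcm_decode(o)
print("2 bits/sample, RMS error:", np.sqrt(np.mean((i - i_rec) ** 2)))
```

The reconstruction is close to, but not equal to, the input: the information below the current step size is lost, as stated above.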


http://www.atc-labs.com/acepages/crcchapter.pdf




## Analysis-by-synthesis coding (CELP)

## Delta Modulation
* Employs a first-order predictor operating on the reconstructed signal $\hat{s}$:
  $$
  p(s,k) = a\hat{s}(k-1)
  $$
  so $P(z) = az^{-1}$, where $z^{-1}$ represents a unit delay.
* The quantizer has only 2 levels, and the quantizer step size $\Delta(k)$ is usually adaptive.
* Prediction error signal:
  $$
  e(k) = s(k) - a\hat{s}(k-1)
  $$
* Quantizer output:
  $$
  e_q(k) = \Delta(k)\text{sgn}(e(k)) = \left\{
      \begin{array}{ll}
        +\Delta(k), & \mbox{$e(k) \geq 0$};\\
        -\Delta(k), & \mbox{$e(k) < 0$}.
      \end{array} \right.
  $$
* Reconstructed signal:
  $$
  \hat{s}(k) = a\hat{s}(k-1)+e_q(k)
  $$
* Quantization error (noise):
  $$
  n_q(k) = e_q(k) - e(k) = \hat{s}(k) - s(k)
  $$
* Typically, step size evolves according to the rule:
  $$
  \Delta(k) = \beta\Delta(k-1)+(1-\beta)\Delta_\text{min}+f(k)
  $$
  where $\Delta_\text{min}$ is the minimum quantizer step size, $0 < \beta < 1$ is a parameter to be selected, and
  $$
  f(k) = \left\{
      \begin{array}{ll}
        (1-\beta)(\Delta_\text{max} - \Delta_\text{min}), & \mbox{if $b(k)=b(k-1)=b(k-2)$}\\
        0 & \mbox{otherwise},
      \end{array} \right.
  $$
  where
  $$
  b(k) = \left\{
      \begin{array}{ll}
        +1, & \mbox{if $e_q(k) = +\Delta(k)$}\\
        -1, & \mbox{if $e_q(k) = -\Delta(k)$}.
      \end{array} \right.
  $$
  is the transmitted bit sequence, and $\Delta_\text{max}$ is the maximum step size.
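The delta-modulation equations translate almost line by line into the sketch below. Assumptions not stated in the text: $a=1$, $\beta=0.9$, $\Delta_\text{min}=1$, $\Delta_\text{max}=64$, and the initial state $\hat{s}(-1)=0$, $\Delta(-1)=\Delta_\text{min}$. Since $b(k)$ is the sign of $e(k)$, it is known before $\Delta(k)$ is computed, and the decoder can rebuild $\Delta(k)$ and $\hat{s}(k)$ from the bits alone. Function and constant names are hypothetical.

```python
import numpy as np

# Illustrative parameter values (assumptions, not from any standard).
A, BETA, D_MIN, D_MAX = 1.0, 0.9, 1.0, 64.0

def next_step(delta_prev, b, k):
    """Delta(k) = beta Delta(k-1) + (1 - beta) Delta_min + f(k)."""
    run = k >= 2 and b[k] == b[k - 1] == b[k - 2]     # three equal bits: slope overload
    f = (1 - BETA) * (D_MAX - D_MIN) if run else 0.0
    return BETA * delta_prev + (1 - BETA) * D_MIN + f

def dm_encode(s):
    """Returns the transmitted bits b(k) in {+1, -1}."""
    b = np.zeros(len(s), dtype=int)
    s_hat, delta = 0.0, D_MIN                  # assumed initial state
    for k in range(len(s)):
        e = s[k] - A * s_hat                   # e(k) = s(k) - a s_hat(k-1)
        b[k] = 1 if e >= 0 else -1             # sgn(e(k))
        delta = next_step(delta, b, k)         # Delta(k)
        s_hat = A * s_hat + delta * b[k]       # s_hat(k) = a s_hat(k-1) + e_q(k)
    return b

def dm_decode(b):
    """Rebuilds s_hat(k) from the bits only, mirroring the encoder."""
    out = np.zeros(len(b))
    s_hat, delta = 0.0, D_MIN
    for k in range(len(b)):
        delta = next_step(delta, b, k)
        s_hat = A * s_hat + delta * b[k]
        out[k] = s_hat
    return out

s = 500 * np.sin(2 * np.pi * np.arange(2000) / 400)
b = dm_encode(s)
s_hat = dm_decode(b)
print("1 bit/sample, RMS error:", np.sqrt(np.mean((s - s_hat) ** 2)))
```

A larger $\Delta_\text{max}$ lets the coder follow steep slopes (less slope overload), while a smaller $\Delta_\text{min}$ reduces granular noise in quiet passages.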


\section{CELP (Code-Excited Linear Prediction)~\cite{CELP}}
%{{{

\begin{itemize}
\item Proposed by M.R. Schroeder and B.S. Atal in 1985.
\item Specifically designed for the human voice.
\item Used mainly in VoIP applications, Speex and MPEG-4.
\item Low bit rates (a few kbps; e.g., 4.8 kbps in the FS-1016 CELP standard).
\item Encoding algorithm:
  \begin{enumerate}
  \item Split the PCM signal into short PCM chunks (frames).
  \item For each chunk:
    \begin{enumerate}
    \item Search a codebook for the code-words (and their corresponding
      gains) that minimize the distortion between the input chunk and the
      output of a decoder that uses the code-word indexes (and the gains)
      to synthesize a voice signal by means of a model of the human vocal
      tract.
    \end{enumerate}
    \item Transmit the selected code-word indexes and gains, together with
      the parameters of the vocal-tract model.
\end{enumerate}
\end{itemize}
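The codebook search above is an analysis-by-synthesis loop: every candidate is passed through the decoder's synthesis filter and compared with the input. The toy sketch below assumes a random Gaussian codebook, a fixed hypothetical first-order synthesis filter $1/A(z)$ standing in for the vocal-tract model, and a plain squared-error distortion; real CELP coders also use an adaptive (pitch) codebook and a perceptual weighting filter, both omitted here. For each code-word, the gain is the least-squares value $g = \langle x, y\rangle / \langle y, y\rangle$, where $x$ is the target frame and $y$ the synthesized code-word. Function names are hypothetical.

```python
import numpy as np

def lpc_synthesis(excitation, a):
    """All-pole filter 1/A(z), A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p, zero initial state."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc -= ak * y[n - k]
        y[n] = acc
    return y

def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the code-word whose synthesized, optimally
    scaled output is closest (least squares) to the target frame."""
    best = (-1, 0.0, np.inf)
    for idx, c in enumerate(codebook):
        y = lpc_synthesis(c, a)                    # what the decoder would produce
        energy = y @ y
        if energy == 0.0:
            continue
        gain = (target @ y) / energy               # optimal gain for this code-word
        err = float(np.sum((target - gain * y) ** 2))
        if err < best[2]:
            best = (idx, gain, err)
    return best

rng = np.random.default_rng(1)
codebook = rng.standard_normal((64, 40))    # hypothetical 64-entry codebook, 40-sample frames
a = [-0.9]                                  # hypothetical "vocal tract": 1 / (1 - 0.9 z^-1)
target = 0.7 * lpc_synthesis(codebook[17], a)
idx, gain, err = search_codebook(target, codebook, a)
print(idx, round(gain, 3))
```

Because the target was built from entry 17 with gain 0.7, the search should recover exactly that index and gain; only the index and the (quantized) gain would need to be transmitted.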

### Lossless/noiseless (source coding) and lossy

* Perceptual coders are based on a
psychoacoustic model and take advantage of the
masking properties of the human auditory system.

* Most audio compression algorithms segment
the input signal into blocks of 2ms up to 50ms
duration. A time-frequency analysis then decomposes
each analysis block in the encoder. This
transformation or subband filtering scheme compacts
the energy into a few transform coefficients and
therefore de-correlates successive samples. These
coefficients, subband samples or parameters are
quantized and encoded according to perceptual
criteria. Depending on the system objectives, the
time-frequency analysis section might contain:
  + Unitary transform (MDCT, FFT)
  + Polyphase filterbank with uniform bandpass filters
  + Time-varying, critically sampled bank of nonuniform bandpass filters
  + Hybrid transform/filterbank scheme
  + Harmonic/sinusoidal signal analyzer
  + Source system analysis (LPC)
 
* The time-frequency analysis approach always
involves a fundamental tradeoff between time and
frequency resolution requirements. The choice of the
time-frequency analysis method additionally
determines the amount of coding delay introduced, a 
parameter which may become important in duplex
broadcast and live-events applications. 

* Lossless audio compression is mostly realized by
linear prediction (or a transformation) followed
by entropy coding. The linear predictor attempts to
minimize the variance of the difference signal
between the predicted sample value and its actual
value. The entropy coder (e.g., Huffman coding)
allocates short codewords to samples with a high
probability of occurrence and longer codewords to
samples with a lower probability, and in this way
reduces the average number of bits per sample.

* In contrast to lossless audio coding, which only removes (statistical) redundancy, lossy coding techniques remove both redundancy and perceptual irrelevancy, i.e., the components of the signal that, given the properties of the human auditory system, cannot be perceived (http://staff.fh-hagenberg.at/schaffer/avt2/Docs/Perceptual_Audio_Coding_Tutorial.pdf).
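To illustrate the MDCT option of the time-frequency analysis and the energy-compaction claim, here is a minimal MDCT/IMDCT sketch with 50% block overlap and a sine window. The sine window satisfies the Princen-Bradley condition $w[n]^2+w[n+N]^2=1$, so the time-domain aliasing of neighbouring blocks cancels on overlap-add. The block size $N=64$, the zero padding at both ends, the $2/N$ IMDCT scaling (chosen so that reconstruction is exact with this window) and the test sinusoid are assumptions of this example; function names are hypothetical.

```python
import numpy as np

def mdct_basis(N):
    """(N, 2N) matrix with entries cos[pi/N (n + 1/2 + N/2)(k + 1/2)]."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))

def mdct_analysis(x, N):
    """50%-overlapping blocks of 2N windowed samples -> N coefficients per block."""
    assert len(x) % N == 0, "this sketch assumes len(x) is a multiple of N"
    C, w = mdct_basis(N), sine_window(N)
    xp = np.concatenate([np.zeros(N), x, np.zeros(N)])  # every sample lies in two blocks
    return np.array([C @ (w * xp[b * N : b * N + 2 * N]) for b in range(len(x) // N + 1)])

def mdct_synthesis(X, N):
    """IMDCT (scaled by 2/N), synthesis window, overlap-add: the aliasing cancels."""
    C, w = mdct_basis(N), sine_window(N)
    out = np.zeros((len(X) + 1) * N)
    for b, coeffs in enumerate(X):
        out[b * N : b * N + 2 * N] += w * (2.0 / N) * (C.T @ coeffs)
    return out[N:-N]                                    # drop the padding

N = 64
x = np.sin(2 * np.pi * 0.05 * np.arange(32 * N))
X = mdct_analysis(x, N)
print("perfect reconstruction:", np.allclose(mdct_synthesis(X, N), x))
energy = np.sort((X ** 2).ravel())[::-1]
print("energy in top 10% of coefficients:", energy[: energy.size // 10].sum() / energy.sum())
```

For a tonal input, almost all of the energy lands in a few coefficients per block, which is what makes coarse quantization of the remaining ones cheap.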

## Text compression = Entropy coding
In the context of signal encoding, entropy coding refers to assigning short code-words to highly probable (quantization) levels and longer code-words to less probable levels, yielding a short average code-word length.
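A minimal sketch of one such entropy coder, Huffman coding (chosen here as an example; the text does not prescribe a particular coder), built with Python's `heapq`. The input is a hypothetical list of quantization levels of a prediction-error signal, where small magnitudes are the most probable; `huffman_code` is a hypothetical helper name.

```python
import heapq
import math
from collections import Counter

def huffman_code(symbols):
    """Return a dict symbol -> bit string built with Huffman's algorithm."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: a single symbol
        return {next(iter(freq)): "0"}
    # Heap items: (weight, tie_breaker, {symbol: code_so_far}); the tie breaker
    # keeps heapq from ever comparing the dictionaries.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)      # the two least probable subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]

# Hypothetical quantization levels of a prediction-error signal.
levels = [0] * 50 + [1] * 20 + [-1] * 20 + [2] * 5 + [-2] * 5
code = huffman_code(levels)
counts = Counter(levels)
total = len(levels)
avg_len = sum(counts[s] * len(code[s]) for s in counts) / total
entropy = -sum(c / total * math.log2(c / total) for c in counts.values())
print(code)
print(f"average length = {avg_len:.2f} bits, entropy = {entropy:.2f} bits")
```

Here the most probable level gets a 1-bit code-word, and the average length (1.9 bits/symbol) lies between the entropy (about 1.86 bits) and the entropy plus one bit, as guaranteed for Huffman codes.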