# Audio Coding

```
PCM   +---------+        +---------+ PCM
----->| Encoder |------->| Decoder |----->
audio +---------+ stream +---------+ audio'

              audio != audio'
                (usually)
```

## Typical encoder steps

1. **Overlapped subband analysis**, usually with the [MDCT](http://en.wikipedia.org/wiki/Modified_discrete_cosine_transform)
   (Modified Discrete Cosine Transform). Transforms the signal from the
   time domain to the frequency domain.
  
2. **Quantization**. Basically, removes low-amplitude signal components,
   taking into account the SAM (pSycho Acoustic Model) of the
   HAS (Human Auditory System). Noise usually has low power, so it
   tends to be removed in this step.
   
3. [**Entropy coding**](https://github.com/vicente-gonzalez-ruiz/teaching/blob/master/coding/text/text_coding.ipynb).

## Lossy coding

* The limitations of human perception are incorporated into the compression process through the use of psychoacoustic models. Some of these limitations are physiological, based on the machinery of hearing. Others are psychological, based on how our brain processes auditory stimuli.


### Overlapped processing

```
0              N-1            2N-1            3N-1
+---------------+---------------+---------------+ s[n]
<--------Transform Step--------->
                <---------Transform Step-------->
```

* Each transform step takes $2N$ samples as input and outputs $N$ MDCT
  coefficients.
  
* $N$ can vary depending on the characteristics of the sound. For
  *complex* sounds without clear harmonics (such as a plosive sound),
  shorter windows improve performance. For *simple* sounds
  (such as a musical instrument), longer windows are better.
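
The block layout drawn above can be sketched in a few lines of Python. This is only an illustration with a fixed $N$; the helper name `overlapped_blocks` and the toy input are assumptions made for the example, and a real encoder would also switch $N$ depending on the signal.

```python
def overlapped_blocks(s, N):
    """Split s into blocks of 2N samples, advancing N samples per step."""
    last_start = len(s) - 2 * N
    return [s[start:start + 2 * N] for start in range(0, last_start + 1, N)]

# Toy stand-in for the PCM samples s[n] (hypothetical data).
samples = list(range(12))
blocks = overlapped_blocks(samples, N=4)
for block in blocks:
    print(block)
```

Each sample in the interior of the signal belongs to two consecutive blocks, which is what makes the 50% overlap.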
  

### Frequency resolution and simultaneous masking

* The HAS has a limited frequency resolution. Psychoacoustic
  experiments have demonstrated that the audible frequencies can be
  grouped into [barks](../../../Perception/Auditive_perception/index.html#x1-50004).

* Each bark defines the group of frequencies that excite the same
  cochlear area, i.e., those frequencies that can be masked by the
  tone with the highest energy (in that bark).

* In the cochlea, a frequency-to-place transformation takes place,
  which leads to the notion of critical bands. The critical bandwidth
  can be considered the bandwidth at which subjective responses change
  abruptly. For example, the perceived loudness of a narrow-band noise
  at constant sound pressure level remains constant within a critical
  band and begins to increase once the bandwidth of the stimulus
  exceeds one critical band (Zwicker E., Fastl H., "Psychoacoustics:
  Facts and Models", Springer-Verlag, 1990).

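| Bark band | Lower edge (Hz) | Upper edge (Hz) | Bandwidth (Hz) | Center frequency (Hz) |
|-----------|-----------------|-----------------|----------------|-----------------------|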
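
The columns listed above can be computed from a list of critical-band edges. The sketch below uses the band edges commonly quoted for the Bark scale (after Zwicker); they are not given in this text, so treat them as an assumption. The center frequency is computed here as the arithmetic midpoint of each band, which is a simplification: published centers differ by a few Hz in some bands.

```python
# Commonly quoted critical-band edges in Hz (assumption, not from this text).
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]

def bark_table(edges=BARK_EDGES_HZ):
    """Return rows (band, lower, upper, bandwidth, approximate center)."""
    rows = []
    for band, (lo, hi) in enumerate(zip(edges[:-1], edges[1:]), start=1):
        rows.append((band, lo, hi, hi - lo, (lo + hi) / 2))
    return rows

table = bark_table()
for band, lo, hi, width, center in table:
    print(f"{band:2d} | {lo:5d} | {hi:5d} | {width:4d} | {center:7.1f}")
```

The bandwidth column shows that the bands are about 100 Hz wide at low frequencies and get progressively wider as the frequency increases.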

### Temporal masking

* The human auditory system has inertia:
  [sounds are not perceived instantly, and their perception persists after they have disappeared](../../../Perception/Auditive_perception/index.html#x1-70006).

### Channel coupling

* Most of the time, the channels of a multichannel (non-mono) audio
  signal carry similar sounds. Channel coupling reduces this
  inter-channel redundancy, usually by means of prediction techniques.
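
As a rough illustration of inter-channel prediction, the sketch below predicts the right channel from the left one with a single least-squares gain and keeps only the prediction residual. The function names and the synthetic stereo signal are assumptions made for this example; actual codecs use more elaborate schemes and typically work per frequency band.

```python
import math

def predict_channel(left, right):
    """Least-squares gain g so that g*left approximates right, plus the residual."""
    num = sum(l * r for l, r in zip(left, right))
    den = sum(l * l for l in left)
    gain = num / den if den else 0.0
    residual = [r - gain * l for l, r in zip(left, right)]
    return gain, residual

def energy(x):
    return sum(v * v for v in x)

# Hypothetical stereo pair: the right channel is mostly a scaled copy of the left.
left = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
right = [0.8 * l + 0.05 * math.sin(2 * math.pi * 11 * n / 64)
         for n, l in enumerate(left)]

gain, residual = predict_channel(left, right)
print(f"gain = {gain:.3f}")
print(f"residual energy / right energy = {energy(residual) / energy(right):.4f}")
```

The residual carries far less energy than the original right channel, so it can be coded with fewer bits. The decoder rebuilds the right channel as `gain * left + residual`.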

### Quantization

* Depending on the desired output bit-rate and on the frequency (see
  the ATH model), the SAM applies a different quantization step to
  each bark. Roughly speaking, the higher the compression ratio, the
  larger the quantization step and, therefore, the quantization noise;
  and the higher the frequency, the wider the bark. Note also that the
  perception of a tone in a bark depends on temporal masking as well.

* At decoding time, the barks that suffered the largest losses
  are usually filled with [white noise](http://en.wikipedia.org/wiki/White_noise) in
  order to [increase the perceived quality](http://simplynoise.com).
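
The relation between the quantization step and the quantization noise can be checked with a plain uniform quantizer. This is a minimal sketch under simplifying assumptions: a single step for all coefficients (no psychoacoustic model, no per-bark steps) and made-up coefficient values.

```python
def quantize(coeffs, step):
    """Map each coefficient to the index of the nearest multiple of step."""
    return [round(c / step) for c in coeffs]

def dequantize(indices, step):
    return [i * step for i in indices]

def noise_power(coeffs, step):
    """Mean squared error introduced by quantizing with the given step."""
    rec = dequantize(quantize(coeffs, step), step)
    return sum((c - r) ** 2 for c, r in zip(coeffs, rec)) / len(coeffs)

# Hypothetical MDCT coefficients of one bark.
coeffs = [12.3, -7.9, 0.4, 3.2, -0.2, 25.0]

for step in (1, 8):
    print(step, quantize(coeffs, step), round(noise_power(coeffs, step), 3))
```

With the larger step, the low-amplitude coefficients are mapped to zero and the noise power grows, which is the trade-off that the bit-rate target controls.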

### Entropy Coding

* Usually, a variable bit-rate (VBR) lossless coding algorithm
  assigns code-words with fewer bits to the code-vectors (one or more
  quantized MDCT coefficients) with a high probability, and vice versa,
  which effectively reduces the bit-rate.
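
One way to see this effect is to build a Huffman code for a sequence of quantized coefficients and compare its cost with a fixed-length code. Huffman coding is used here only as a familiar example of a variable-length code; the text does not say which algorithm a given codec uses, and the input sequence is made up.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code of the input."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (count, tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

# Hypothetical quantized coefficients: small values are the most frequent.
quantized = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, -1, -1, 2, 3]
lengths = huffman_code_lengths(quantized)
total_bits = sum(lengths[s] for s in quantized)
fixed_bits = 3 * len(quantized)  # 5 distinct symbols need 3 bits each
print(lengths, total_bits, fixed_bits)
```

The most probable symbol gets the shortest code-word, so the variable-length code spends fewer bits overall than the fixed-length one.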

### MDCT

* Equivalent to applying a [bank of $N$ filters](http://en.wikipedia.org/wiki/Filter_bank).

* Determines the correlation between a set of $2N$ numbers
  (samples) and $N$
  [orthogonal](http://en.wikipedia.org/wiki/Orthogonality) (two
  functions/signals are orthogonal if their inner product is zero, so
  neither contains any component of the other)
  [cosine functions](http://guru.multimedia.cx/mdct/).
  Therefore, the MDCT takes $2N$ samples as input and outputs
  $N$ coefficients.
  
* The MDCT coefficients $S[w]$ of the PCM samples $s[n]$ are
  defined as:
  \begin{equation}
    S[w] = \sum_{n=0}^{2N-1} s[n] \cos\Big[\frac{\pi}{N}\Big(n+\frac{1}{2}+\frac{N}{2}\Big)\Big(w+\frac{1}{2}\Big)\Big], \quad w = 0, 1, \dots, N-1.
    \label{eq:MDCT}
  \end{equation}
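
The definition of $S[w]$ can be evaluated directly, as in the sketch below. The inverse transform `imdct`, its $1/N$ scaling and the overlap-add loop are not described in this text; they are included as assumptions, only to show that $N$ coefficients per step are enough once consecutive blocks overlap by $N$ samples. No window is applied, and the direct $O(N^2)$ evaluation is for illustration only.

```python
import math

def mdct(block):
    """MDCT of a block of 2N samples; returns N coefficients S[w]."""
    N = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (w + 0.5))
                for n in range(2 * N))
            for w in range(N)]

def imdct(coeffs):
    """Inverse MDCT (assumed 1/N scaling); returns 2N time-aliased samples."""
    N = len(coeffs)
    return [sum(coeffs[w] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (w + 0.5))
                for w in range(N)) / N
            for n in range(2 * N)]

N = 4
s = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0, 5.0, 3.0, -5.0, 8.0]  # toy PCM
out = [0.0] * len(s)
for start in range(0, len(s) - 2 * N + 1, N):
    y = imdct(mdct(s[start:start + 2 * N]))
    for n, v in enumerate(y):
        out[start + n] += v  # overlap-add of consecutive blocks

print([round(v, 6) for v in out[N:2 * N]])  # samples covered by both blocks
print(s[N:2 * N])
```

Only the samples covered by two blocks are recovered exactly; the first and last $N$ samples of the signal remain aliased because they are covered by a single block.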

## Contents

1. [Delta Modulation](delta_modulation/DM.ipynb).
2. [The sound](The_Sound/the_sound.ipynb).
3. [The human auditory system](The_Human_Auditory_System/the_human_auditory_system.ipynb).
4. [Human sound perception](Human_Sound_Perception/human_sound_perception.ipynb).
5. [FLAC](FLAC/FLAC.ipynb).
6. [MPEG Audio Layer 3 (MP3)](MP3/MP3.ipynb).
7. [MPEG Advanced Audio Coding (AAC)](AAC/AAC.ipynb).
8. [Vorbis](Vorbis/Vorbis.ipynb).
9. [Dolby Digital (AC3)](Dolby_Digital_AC3/AC3.ipynb).
10. [DTS](DTS/DTS.ipynb).