# MPEG Audio

* Input: a sequence of 16-bit PCM samples.

* Output: a sequence of MPEG Audio frames (frame = header + code-stream) which can be streamed.

```
audio   +-----------+       +--------------+         +----------+
channel | Time      |       | Quantization |         |          | MPEG audio
---+--->| frequency |---+-->| and          |-------> | Framing  |------->
   |    | mapping   |   |   | coding       |         |          |
   |    +-----------+   |   +--------------+         +----------+
   |                    |           ^
   |                    |           |
   |                    |   +--------------+
   |                    +-->| Psycho-      |
   |                        | acoustic     |
   +----------------------->| model        |
                            +--------------+
```

* The MPEG audio bitstream definition is normative. Most guidance about encoding
is informative. Thus, two MPEG-compliant bitstreams that encode the same audio material at
the same rate but on different encoders may sound very different. On the other hand, a given
MPEG bitstream decoded on different decoders will result in essentially the same output.

## [Layer I](https://en.wikipedia.org/wiki/MPEG-1_Audio_Layer_I)

* 4:1 compression (384 kbps).
* CBR (Constant Bit-Rate).
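
As a rough check of the 4:1 figure, the arithmetic below assumes CD-quality stereo input (2 channels, 16 bits/sample, 44.1 kHz sampling). These input parameters are an assumption made for illustration; they are not stated above.

$$
R_\text{PCM} = 2 \times 16 \times 44100 = 1\,411\,200~\text{bps} \approx 1411~\text{kbps},
\qquad
\frac{1411~\text{kbps}}{384~\text{kbps}} \approx 3.7 \approx 4.
$$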

### Encoder

1. Split $s[n]$ into blocks of $12\times 32=384$ samples. For each block:
   1. Analyze the block using a 32-band equally-spaced (analysis) filter bank, producing $12$ coeffs/subband (the coeffs are downsampled, i.e. decimated, by a factor of $32$). That is, $384$ time-domain samples are transformed into $32$ subbands with $12$ coeffs each. Notice that, within a subband, each coeff can be considered a sample of that subband signal.
   2. Scale each block of $12$ coeffs to ensure that the entire range of the selected quantizer will be used. Output the *scalefactor*.
   3. Using the [FFT](https://en.wikipedia.org/wiki/Fast_Fourier_transform), compute the ATH for the block (considering the masking effects).
   4. Let $R^*$ be the bit-rate selected by the user. While the generated bit-rate $R\leq R^*$:
     1. Decrement the quantization step $\Delta_b$ for each subband $b$, proportionally to the ATH in $b$, and recompute $R$. The bit-rate is controlled by switching between quantizers with different numbers of bits.
   5. Output $\{\Delta_b\}_{b=1}^{32}$ and the quantization indexes.
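
The rate loop above can be sketched as a greedy bit allocation. This is a minimal illustration under stated assumptions, not the normative procedure: the function name `allocate_bits`, the use of per-subband signal-to-mask ratios (in dB) as input, the rule of thumb that each extra bit lowers the quantization noise by about 6.02 dB, and the 2-to-15-bit quantizer range are all choices made for this example.

```python
def allocate_bits(smr_db, budget_bits, n_coeffs=12, max_bits=15):
    """Greedy bit allocation (hypothetical sketch).

    smr_db      -- assumed signal-to-mask ratio of each subband, in dB
    budget_bits -- bits available for the coeffs of one block (plays the role of R*)
    Returns the number of bits per coeff assigned to each subband.
    """
    bits = [0] * len(smr_db)
    used = 0
    while True:
        # Estimated noise-to-mask ratio: every bit buys about 6.02 dB.
        nmr = [s - 6.02 * b for s, b in zip(smr_db, bits)]
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        # Refine the subband whose noise is most audible.
        worst = max(candidates, key=lambda i: nmr[i])
        # Assumption: the coarsest quantizer already needs 2 bits.
        step = 2 if bits[worst] == 0 else 1
        cost = step * n_coeffs
        if used + cost > budget_bits:
            break  # one more refinement would exceed the target rate
        bits[worst] += step
        used += cost
    return bits


allocation = allocate_bits([20.0, 0.0, -10.0], budget_bits=72)
print(allocation)  # [4, 2, 0]
```

Giving more bits to a subband is equivalent to decrementing its quantization step $\Delta_b$; the loop stops as soon as the next refinement would push $R$ above $R^*$.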

### Decoder

1. For each input frame:
   1. "Dequantize" the coeffs of each subband.
   2. Descale the coeffs to their original dynamic range.
   3. Apply the 32-band synthesis filter bank.
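
The scaling and quantization steps, together with their inverses in the decoder, can be illustrated with a small round trip on one block of $12$ coeffs. This is a sketch under assumptions: the scalefactor is simply the peak magnitude of the block (the standard picks it from a table), the quantizer is a plain uniform one with $2^n-1$ levels, and all function names are hypothetical.

```python
import math


def scale(coeffs):
    """Return (scalefactor, coeffs normalized to [-1, 1])."""
    sf = max(abs(c) for c in coeffs) or 1.0  # avoid dividing by zero on silence
    return sf, [c / sf for c in coeffs]


def quantize(normalized, bits):
    """Uniform quantizer with 2**bits - 1 levels; requires bits >= 2."""
    half = (2 ** bits - 1) // 2
    return [math.floor(x * half + 0.5) for x in normalized]


def dequantize(indexes, bits):
    """Map quantization indexes back to values in [-1, 1]."""
    half = (2 ** bits - 1) // 2
    return [q / half for q in indexes]


def descale(values, sf):
    """Restore the original dynamic range."""
    return [sf * v for v in values]


block = [0.5, -2.0, 1.0, 0.25, 0.0, -0.75, 1.5, -1.25, 0.1, -0.1, 2.0, -0.5]
sf, normalized = scale(block)
indexes = quantize(normalized, bits=4)
decoded = descale(dequantize(indexes, bits=4), sf)
max_error = max(abs(a - b) for a, b in zip(block, decoded))
print(sf, indexes, max_error)
```

With 4 bits the indexes lie in $[-7, 7]$ and the reconstruction error is bounded by half a quantization step, that is, by the scalefactor divided by $14$.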

## Information loss analysis

1. Aliasing in the 32-band analysis filter bank.

   ```
   ^ Amplitude
   |     _______     _______     ____....__     _______
   |    /       \   /       \   /          \   /       \
   |   /         \ /         \ /            \ /         \
   |  / subband 0 X subband 1 X              X  sub. 31  \
   +-/-----------/-\---------/-\-----....---/-\-----------\-> frequency
   ```

2. Quantization.
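
To make the aliasing item concrete: after decimation by $32$, any energy that leaks from a neighbouring subband folds back into the decimated band. The sketch below computes where a tone lands after sampling at a given rate. The function name and the 44.1 kHz input rate are assumptions made for this example.

```python
import math


def aliased_frequency(f, fs):
    """Frequency (Hz) at which a tone of frequency f appears when sampled at rate fs."""
    f = f % fs
    return fs - f if f > fs / 2 else f


fs_subband = 44100 / 32          # sampling rate of each subband: 1378.125 Hz
leak = 700.0                     # tone just above the subband Nyquist (about 689 Hz)
folded = aliased_frequency(leak, fs_subband)
print(folded)                    # 678.125

# Both tones yield the same samples at the subband rate.
same = all(
    math.isclose(
        math.cos(2 * math.pi * leak * n / fs_subband),
        math.cos(2 * math.pi * folded * n / fs_subband),
        abs_tol=1e-9,
    )
    for n in range(50)
)
```

The overlapping transition bands drawn in the figure are what allow such out-of-band components to survive the analysis filters.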

## [Layer II](https://en.wikipedia.org/wiki/MPEG-1_Audio_Layer_II)

* Backward compatible with MP1.
* 8:1 compression (174 kbps).
* CBR (Constant Bit-Rate).
* Increases block-size to $3\times 12\times 32=1152$ samples.
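
For a sense of scale, the duration of one block can be computed under the assumption of 44.1 kHz sampling (a value chosen for illustration, not stated above):

$$
\frac{384}{44100} \approx 8.7~\text{ms} \quad \text{(Layer I)},
\qquad
\frac{1152}{44100} \approx 26.1~\text{ms} \quad \text{(Layer II)}.
$$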

## [Layer III](https://en.wikipedia.org/wiki/MP3)

* ["Rescued" by Napster](https://www.xataka.com/historia-tecnologica/la-historia-del-mp3-el-formato-que-tras-casi-morir-dos-veces-revoluciono-el-mundo-de-la-musica).
* Backward compatible with MP1 and MP2.
* CBR and VBR (Variable Bit-Rate). In this last case, users usually select the average bit-rate.
* Typically perceptually transparent (virtually lossless) at 128 kbps for most listeners.
* Improved subband analysis by means of the MDCT (with only 32 subbands, the low-frequency ones span more than one Bark, which gives poor frequency resolution in the ATH computation).

### [Encoder](http://home.deib.polimi.it/dossi/fond_tlc/mpeg1layer3.pdf)

1. Split $s[n]$ into blocks of $36\times 32=1152$ samples. For each block:
   1. Perform an FFT of the block to compute the ATH and to select the window sequence.
   1. Analyze the block using a 32-band equally-spaced (analysis) filter bank, producing $36$ coeffs/subband.
   2. For each subband:
      1. Analyze transients. If one is detected, use a sequence of start/short*3/stop windows. Otherwise, use a long window.
      <img src="data/windows.png" width=800>
      2. Compute the [MDCT](https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform) of each windowed block. The windows span $36$ (long), $30$ (start/stop) or $12$ (short) samples per subband and, because the MDCT outputs half as many coeffs as input samples, this step produces $18$ (long), $15$ (start/stop) or $6$ (short) coeffs/subband.
      3. Apply scalefactors to optimize quantization.
   3. **Distortion control loop**: keep (as much as possible) the quantization error below the ATH.
      1. **Rate control loop**: Let $R^*$ be the bit-rate selected by the user. While the generated bit-rate $R\leq R^*$:
         1. Decrement the quantization step $\Delta_b$ for each subband $b$, proportionally to the ATH in $b$. Compute $R$ after encoding the quantizer indexes with (static) Huffman coding. As in the previous layers, a quantizer is selected from a list of predefined logarithmic quantizers.
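
The MDCT step above can be sketched with a direct (slow, $O(N^2)$) implementation of the transform. This is an illustration only: the function names are hypothetical, the sine window stands in for the long and short windows, and start/stop windows are not modelled.

```python
import math


def sine_window(length):
    """Sine window of the given length."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Direct MDCT: len(block) input samples -> len(block) // 2 coeffs."""
    n_in = len(block)
    n_out = n_in // 2
    return [
        sum(
            block[n] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
            for n in range(n_in)
        )
        for k in range(n_out)
    ]


def windowed_mdct(samples):
    """Apply a sine window of matching length, then the MDCT."""
    window = sine_window(len(samples))
    return mdct([s * w for s, w in zip(samples, window)])


subband = [math.sin(0.2 * n) for n in range(36)]   # 36 samples of one subband
long_coeffs = windowed_mdct(subband)               # long window
short_coeffs = windowed_mdct(subband[:12])         # one short window
print(len(long_coeffs), len(short_coeffs))         # 18 6
```

Consecutive windows overlap by half their length, so the number of coeffs per block equals the number of new samples per block even though each transform halves its input.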

### Decoder

1. For each input frame:
   1. Decode the Huffman codes.
   2. "Dequantize" the coeffs of each subband.
   3. Descale the coeffs to their original dynamic range.
   4. Apply inverse MDCT.
   5. Apply the 32-band synthesis filter bank.
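
The inverse MDCT step relies on overlap-add: each inverse transform on its own contains time-domain aliasing, which cancels when half-overlapping blocks are windowed and summed. The sketch below checks this numerically on one subband signal using long blocks only. The function names, the sine window and the $2/N$ normalization are assumptions of this example, and quantization is left out so that the reconstruction is exact.

```python
import math


def kernel(n, k, n_half):
    """Cosine kernel shared by the forward and inverse transforms."""
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))


def forward(block):
    """MDCT: 2N samples -> N coeffs (needed here only to produce test input)."""
    n_half = len(block) // 2
    return [sum(block[n] * kernel(n, k, n_half) for n in range(len(block)))
            for k in range(n_half)]


def inverse(coeffs):
    """IMDCT: N coeffs -> 2N time-aliased samples."""
    n_half = len(coeffs)
    return [2.0 / n_half * sum(coeffs[k] * kernel(n, k, n_half) for k in range(n_half))
            for n in range(2 * n_half)]


def analysis_synthesis(signal, block=36):
    """Window, transform, invert, window again and overlap-add."""
    hop = block // 2
    window = [math.sin(math.pi * (n + 0.5) / block) for n in range(block)]
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - block + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(block)]
        rec = inverse(forward(frame))
        for n in range(block):
            out[start + n] += rec[n] * window[n]
    return out


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(108)]
rebuilt = analysis_synthesis(signal)
# The first and last half-blocks are covered by a single window, so skip them.
max_error = max(abs(signal[n] - rebuilt[n]) for n in range(18, 90))
print(max_error)
```

In a real decoder the coeffs come from the dequantizer rather than from `forward`, so the output differs from the original by the quantization error.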