## Some technical notes about audio parameters

- The signal is sampled using Linear Pulse Code Modulation (LPCM).
- The signal is stereo (`nchannels=2`), but only the left channel is used.
- Each sample is encoded with 16 bits (2 bytes). The native data type of this data is `int16`, which can store a [range from](https://www.mathworks.com/help/matlab/ref/audioread.html) `-32768` up to `+32767`.
- The data is converted to `float` for numeric precision and because a floating-point number in `Python` [is usually implemented as](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex) a `double` in `C`, which is convenient.
- The raw signal is then scaled to unit l2 norm, that is, each sample is divided by the l2 norm of the whole signal.
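
The preprocessing described in the list above can be sketched on a tiny synthetic buffer. The array `stereo` below is made-up data, not one of the recordings; it only mimics the `(n_samples, 2)` `int16` layout that `scipy.io.wavfile.read` returns for a 16-bit stereo file.

```python
import numpy as np

# Hypothetical stereo LPCM buffer: shape (n_samples, 2), dtype int16.
stereo = np.array([[1000, -1000],
                   [-32768, 32767],
                   [2000, 0],
                   [0, 500]], dtype=np.int16)

# Keep only the left channel (column 0) and convert it to float64.
left = stereo[:, 0].astype(float)

# Scale to unit l2 norm, so the sum of the squared samples becomes 1.
left_unit = left / np.linalg.norm(left)

print(left.dtype)
print(np.linalg.norm(left_unit))
```

Converting before any arithmetic matters: products of `int16` samples can exceed the `int16` range, whereas `float64` holds them without overflow.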

---

> 1. Load the various audio files and subsample the signals of each channel in order to
generate the training and test datasets.

Each command has 10 recordings. They are split 80%/20% into training and test sets, i.e. 8 training and 2 test recordings per command.
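
As a rough sketch of how this split maps onto files, the snippet below builds the two lists of paths per command. The helper `recording_path` is a hypothetical name introduced here, and assigning recordings 09 and 10 to the test set is an assumption; the notebook only shows that recordings 01 to 08 are used for training.

```python
n_train, n_test = 8, 2
commands = ('avancar', 'direita', 'esquerda', 'parar', 'recuar')

def recording_path(command, file_number):
    # Same naming pattern as the training loop: comando_<command>_<NN>.wav
    return f'./Audio_files_TCC_Jefferson/comando_{command}_{file_number:0>2d}.wav'

# Recordings 01..08 of each command go to the training set.
train_files = {c: [recording_path(c, i) for i in range(1, n_train + 1)]
               for c in commands}
# Assumption: the remaining recordings (09 and 10) form the test set.
test_files = {c: [recording_path(c, i)
                  for i in range(n_train + 1, n_train + n_test + 1)]
              for c in commands}

print(train_files['parar'][0])
print(test_files['parar'])
```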

In [1]:
from numpy import empty, roll, multiply, sum, matmul
from scipy.io import wavfile
from scipy.linalg import toeplitz
from numpy.linalg import norm, cond, matrix_rank as rank, inv
from warnings import warn

n_train, n_test = 8, 2
# order of the AR(p) model
p = 10
# all coefficients of the AR(p) model. For each command, we have 8 sets of coefficients (one per training recording)
all_a = {f'a_{command}': empty((n_train,p)) for command in ('avancar', 'esquerda', 'direita', 'parar', 'recuar')}

for command in ('avancar', 'direita', 'esquerda', 'parar', 'recuar'):
    # training set
    for file_number in range(1,n_train+1):
        file_name = f'./Audio_files_TCC_Jefferson/comando_{command}_{file_number:0>2d}.wav'
        # input audio vector, s_n -> [s[0], s[1], ..., s[N-1]]
        _, s_n = wavfile.read(file_name)
        # get the left-side signal (as float type)
        s_n = s_n[:,0].astype(float)
        # normalized signal, l2 norm
        s_n /= norm(s_n)
        # compute the autocorrelation vector, r_k -> r[k] -> [r[0], r[1], ..., r[p]]
        r_k = empty(p+1)
        for k in range(p+1):
            # s_n_minus_k -> s[n-k] -> [0, 0, ..., 0(k times), s[0], s[1], ..., s[N-k-1]]
            s_n_minus_k = roll(s_n, k)
            s_n_minus_k[0:k] = 0
            r_k[k] = sum(multiply(s_n, s_n_minus_k))
        # autocorrelation matrix (p x p), built from r[0], ..., r[p-1]
        R = toeplitz(r_k[:p])
        if rank(R) == R.shape[0]:
            if cond(R) > 1e3:
                warn(f'The autocorrelation matrix of the audio {file_name} is ill-conditioned! The results are suspect!')
            # Yule-Walker equations: R a = [r[1], ..., r[p]].
            # Using [r[0], ..., r[p-1]] (the first column of R) as the right-hand side
            # would always return the trivial solution a = [1, 0, ..., 0].
            all_a[f'a_{command}'][file_number-1] = matmul(inv(R), r_k[1:])
        else:
            warn(f'The autocorrelation matrix of the audio {file_name} is rank-deficient, skip over to the next audio recording.')
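
To sanity-check an AR coefficient estimate of this kind, one option is to run it on a synthetic signal whose coefficients are known. The sketch below is not the notebook's code: `yule_walker` is a hypothetical helper, the AR(2) test signal is made up, and `scipy.linalg.solve_toeplitz` is swapped in for the explicit matrix inverse. It uses the same lagged-product definition of `r[k]` as the loop above and solves the Yule-Walker system, in which the Toeplitz matrix holds lags 0 to p-1 and the right-hand side holds lags 1 to p.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker(s, p):
    """Estimate a[1..p] such that s[n] is approximately sum_j a[j] * s[n-j]."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    # r[k] = sum_n s[n] * s[n-k] for k = 0, ..., p
    r = np.array([np.dot(s[k:], s[:N - k]) for k in range(p + 1)])
    # Solve toeplitz(r[0..p-1]) a = r[1..p] without forming or inverting the matrix.
    return solve_toeplitz(r[:p], r[1:])

# Synthetic, stable AR(2) process with known coefficients (hypothetical test data).
rng = np.random.default_rng(0)
true_a = np.array([1.5, -0.75])
N = 20000
e = rng.standard_normal(N)
s = np.zeros(N)
for n in range(2, N):
    s[n] = true_a[0] * s[n - 1] + true_a[1] * s[n - 2] + e[n]
s /= np.linalg.norm(s)  # unit l2 norm, as in the notebook

a_hat = yule_walker(s, p=2)
print(a_hat)  # expected to be close to true_a
```

The l2 normalization scales every `r[k]` by the same factor, so it does not change the estimated coefficients; it only keeps the entries of the autocorrelation matrix in a comparable range across recordings.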
