## Some technical notes about audio parameters

- The sampled signal is stored in Linear Pulse Code Modulation (LPCM) format.
- The signal is stereo (`nchannels=2`), but only the left-channel signal is used.
- Each sample is encoded with 16 bits (2 bytes). The native data type of this data is `int16`, which can store a [range from](https://www.mathworks.com/help/matlab/ref/audioread.html) `-32768` up to `+32767`.
- The data is converted to `float` for numeric precision and because a floating-point number in Python [is usually implemented as](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex) a `double` in C, which is convenient.
- Each frame of the raw data is normalized by its $\ell_2$ norm.
- The original sampling rate is $44.1\;\mathrm{kHz}$. Each recording is downsampled into two different signals, each with a sampling rate of $F_s = 22.05\;\mathrm{kHz}$.
- The audio dataset comprises five classes (the spoken commands "avançar", "recuar", "parar", "direita", and "esquerda"), each with 10 recordings, totaling 50 files. Downsampling yields 20 signals per class. Because each recording is stereo (`nchannels=2`), the number of signals per class doubles to 40.
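
The steps listed above can be sketched on synthetic data. This is only an illustration under stated assumptions: the random array stands in for a real stereo `int16` recording, the frame length of 100 samples is arbitrary, and the downsampling is a plain even/odd sample split with no anti-aliasing filter.

```python
import numpy as np

# Synthetic stand-in for a stereo int16 recording (hypothetical data, not from the dataset)
rng = np.random.default_rng(0)
s_n = rng.integers(-1000, 1000, size=(1000, 2)).astype(np.int16)

# int16 -> float (NumPy maps Python's float to a 64-bit double)
s_n = s_n.astype(float)

# Downsample by 2: even-indexed and odd-indexed samples give two signals at F_s / 2
s1_n, s2_n = s_n[0::2], s_n[1::2]

# Each downsampled signal still has two channels, so one recording yields 4 mono signals
signals = [s1_n[:, 0], s1_n[:, 1], s2_n[:, 0], s2_n[:, 1]]

# l2 normalization of one frame (arbitrary length: the first 100 samples of the first signal)
frame = signals[0][:100]
frame = frame / np.linalg.norm(frame)

# 10 recordings per class, 4 mono signals per recording
n_recordings_per_class = 10
n_signals_per_class = n_recordings_per_class * len(signals)
```

After normalization the frame has unit energy, so its zero-lag autocorrelation equals 1 regardless of the recording level.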

## Some notes about the LPC (linear predictive coding) and the Yule-Walker algorithms

- The AR(p) model is implemented for `p=10`, `p=15`, and `p=20`.
- A single recording is divided into 31 non-overlapping frames. The number of samples per frame, $N_f$, and the number of samples between consecutive frames, $N_{esp}$, are given by
$$ N_f = \frac{T_{sig}T_{f_{min}} F_s}{T_{min}} $$
and
$$ N_{esp} = \frac{T_{sig} F_s-31N_f}{30},$$
where $T_{sig}$ is the signal duration, $T_{min}$ is the minimum signal duration of the dataset, and $T_{f_{min}} \triangleq 15\;\mathrm{ms}$ is the minimum frame duration. All durations are given in seconds.
- The Yule-Walker equations are solved for each of the 31 frames produced.
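
As a rough illustration of the two steps above, the sketch below computes the frame layout and solves the Yule-Walker equations for one signal. The helper names `frame_layout` and `yule_walker` are hypothetical, rounding down to whole samples is an assumption, and the sign convention $s[n] \approx \sum_{k=1}^{p} a_k\, s[n-k]$ is assumed for the AR(p) model. The test signal is a synthetic AR(2) process, not a recording from the dataset.

```python
import numpy as np
from scipy.linalg import toeplitz

def frame_layout(T_sig, T_min, F_s, T_f_min=15e-3, n_frames=31):
    """Return (N_f, N_esp), rounded down to whole samples (rounding is an assumption)."""
    N_f = int(T_sig * T_f_min * F_s / T_min)
    N_esp = int((T_sig * F_s - n_frames * N_f) / (n_frames - 1))
    return N_f, N_esp

def yule_walker(x, p):
    """Estimate AR(p) coefficients a, assuming x[n] ~ sum_k a[k-1] * x[n-k]."""
    x = np.asarray(x, dtype=float)
    # autocorrelation sums r[0], ..., r[p]
    r = np.array([np.dot(x[k:], x[:x.size - k]) for k in range(p + 1)])
    # Toeplitz matrix built from r[0..p-1]; the right-hand side is r[1..p]
    R = toeplitz(r[:p])
    return np.linalg.solve(R, r[1:])

# Frame layout for a hypothetical 1.2 s recording when the shortest one lasts 1.0 s
N_f, N_esp = frame_layout(T_sig=1.2, T_min=1.0, F_s=22050)

# Synthetic AR(2) process with known coefficients (0.75, -0.5)
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(2, x.size):
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + e[n]

a_hat = yule_walker(x, p=2)
```

Note that the right-hand side uses lags $1$ to $p$ while the matrix uses lags $0$ to $p-1$; using the same lags on both sides would return the trivial solution $[1, 0, \dots, 0]$.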

---

> 1. Load the audio files and downsample the signals of each channel to generate the training and test datasets.

### Initializing

In [7]:
from numpy import empty, roll, multiply, sum, matmul, inf
from scipy.io import wavfile
from scipy.linalg import toeplitz
from numpy.linalg import norm, cond, matrix_rank as rank, inv
from warnings import warn
from os import listdir

# train/test set split
n_train, n_test = 8, 2
# orders of the AR(p) model
all_p = (10, 15, 20)
# coefficients of the AR(p) model: for each command, one set of coefficients per training recording
all_a = {f'a_{command}': empty((n_train,all_p[0])) for command in ('avancar', 'esquerda', 'direita', 'parar', 'recuar')}

def get_T_min(root_dir):
    T_min = inf
    for file_name in listdir(root_dir):
        # input audio vector, s_n -> [s[0], s[1], ..., s[N-1]]
        F_s, s_n = wavfile.read(root_dir+file_name)
        # signal duration
        T_sig = s_n[:,0].size * (1/F_s)
        if T_sig < T_min:
            T_min = T_sig
    return T_min

# minimum audio duration of the dataset
T_min = get_T_min('./Audio_files_TCC_Jefferson/')
# minimum frame duration
T_f_min = 15e-3

### LPC and Yule-Walker algorithm

In [12]:
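for p in range(10,21,5):
    # re-allocate the coefficient buffers for the current order p (only the last order is kept)
    all_a = {f'a_{command}': empty((n_train,p)) for command in ('avancar', 'esquerda', 'direita', 'parar', 'recuar')}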
    for command in ('avancar', 'direita', 'esquerda', 'parar', 'recuar'):
        # training set
        for file_number in range(1,n_train+1):
            file_name = f'./Audio_files_TCC_Jefferson/comando_{command}_{file_number:0>2d}.wav'
            # input audio vector, s_n -> [s[0], s[1], ..., s[N_s-1]]
            F_s, s_n = wavfile.read(file_name)
            # Number of samples
            N_s = s_n[:,0].size
            # signal duration
            T_sig = N_s/F_s
            # convert to float type
            s_n = s_n.astype(float)
            # downsampling: generate s1_n (even samples) and s2_n (odd samples) from s_n
            s1_n, s2_n = s_n[0::2], s_n[1::2]
            N_s //= 2
            F_s /= 2
            # number of samples per frame (rounded down to an integer)
            N_f = int(T_sig*T_f_min*F_s/T_min)
            # number of samples between each frame (rounded down to an integer)
            N_esp = (N_s - 31*N_f)//30

            # for each of the 31 frames
            for i in range(31):
                # first sample of the i-th frame
                n = i*(N_f + N_esp)
                # get the frame
                s1f_n, s2f_n = s1_n[n:n+N_f], s2_n[n:n+N_f]
                # get channel a and channel b
                s1fa_n, s1fb_n, s2fa_n, s2fb_n = s1f_n[:,0], s1f_n[:,1], s2f_n[:,0], s2f_n[:,1]
                # for each of the 4 signals from a single recording: channel a and b, samples even and odd
                for s_n0 in (s1fa_n, s1fb_n, s2fa_n, s2fb_n):
                    # s_n0 -> [s[n0], s[n0+1], ..., s[n0+N_f-1]], being n0\in\mathbb{N}
                    # normalized signal by its l2 norm
                    s_n0 /= norm(s_n0)
                    # compute the autocorrelation vector, r_k -> r[k] -> [r[0], r[1], ..., r[p]]
                    r_k = empty(p+1)
                    for k in range(p+1):
                        # s_n0_minus_k -> [0, 0, ..., 0(k times), s[n0], s[n0+1], ..., s[n0+N_f-1-k]]
                        s_n0_minus_k = roll(s_n0, k)
                        s_n0_minus_k[0:k] = 0
                        r_k[k] = sum(multiply(s_n0, s_n0_minus_k))
                    # autocorrelation matrix, built from [r[0], r[1], ..., r[p-1]]
                    R = toeplitz(r_k[:p])
                    if rank(R) == R.shape[0]:
                        if cond(R) > 1e3:
                            warn(f'The autocorrelation matrix of the audio {file_name} is ill-conditioned! The results are suspect!')
                        # Yule-Walker equations: R a = [r[1], r[2], ..., r[p]]
                        all_a[f'a_{command}'][file_number-1] = matmul(inv(R), r_k[1:])
                    else:
                        warn(f'The autocorrelation matrix of the audio {file_name} is rank-deficient; no coefficients were stored for this frame.')

  warn(f'The autocorrelation matrix of the audio {file_name} is ill-conditioned! The results are suspect!')
  warn(f'The autocorrelation