# Welcome to ExKaldi

In this section, we will extract and process acoustic features.

Please make sure you have downloaded the complete librispeech_dummy corpus from our GitHub repository:
https://github.com/wangyu09/exkaldi

First, update the WAV path information in the wav.scp file.

In [None]:
! cd librispeech_dummy && python3 reset_wav_path.py

From now on, we will build an ASR system from scratch.

In [None]:
import exkaldi

import os
dataDir = "librispeech_dummy"

The train dataset contains 100 utterances from 10 speakers, with 10 utterances per speaker.

You can compute features from a __WAV file__, a __Kaldi script-file table__, or an exkaldi __ListTable__ object.
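A Kaldi script-file table such as wav.scp is plain text with one `ID value` pair per line, where the value is either a file path or a command ending in `|`. The sketch below parses such lines into a plain Python dict, which is roughly the kind of mapping a __ListTable__ holds. The function name `read_scp` and the sample paths are hypothetical, and ExKaldi's own reader may handle more cases.

```python
def read_scp(lines):
    """Parse Kaldi script-file lines into an {ID: value} dict."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"malformed script-file line: {line!r}")
        # The first token is the ID; the rest (a path, or a piped command
        # ending in "|") is kept verbatim.
        uttID, value = parts
        table[uttID] = value
    return table


# Hypothetical example lines; real entries come from your own wav.scp.
example = [
    "103-1240-0000 /path/to/103-1240-0000.wav",
    "103-1240-0001 flac -c -d -s /path/to/103-1240-0001.flac |",
]
table = read_scp(example)
print(table["103-1240-0001"])
```

To read a real file, pass the open file object directly, e.g. `with open(scpFile) as f: table = read_scp(f)`.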

In [None]:
scpFile = os.path.join(dataDir, "train", "wav.scp")

feat = exkaldi.compute_mfcc(scpFile, name="mfcc")

feat

Use the function __compute_mfcc__ to compute MFCC features. The current version of ExKaldi provides 4 functions to compute acoustic features:

__compute_mfcc__: compute the MFCC feature.  
__compute_fbank__: compute the fBank feature.  
__compute_plp__: compute the PLP feature.  
__compute_spectrogram__: compute the power spectrogram feature.  
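To give a feel for what these functions compute, here is a minimal NumPy sketch of a power spectrogram: split the waveform into overlapping frames, window each frame, and take the squared magnitude of its FFT. It assumes 16 kHz audio and the common 25 ms frame length and 10 ms shift. Kaldi's real pipeline adds further steps (for example dithering and pre-emphasis) and uses a different default window, so the values will not match ExKaldi's output; the function name and parameters are hypothetical. MFCC, fbank, and PLP features are all derived from a similar framed power spectrum.

```python
import numpy as np

def power_spectrogram(signal, sampleRate=16000, frameMs=25, shiftMs=10, nFFT=512):
    """Return a (nFrames, nFFT // 2 + 1) power spectrogram of a 1-D signal."""
    frameLen = sampleRate * frameMs // 1000   # 400 samples at 16 kHz
    shift = sampleRate * shiftMs // 1000      # 160 samples at 16 kHz
    if len(signal) < frameLen:
        raise ValueError("signal is shorter than one frame")
    # Keep only frames that fit entirely inside the signal.
    nFrames = 1 + (len(signal) - frameLen) // shift
    # Row i holds the sample indices of frame i.
    idx = np.arange(frameLen)[None, :] + shift * np.arange(nFrames)[:, None]
    frames = signal[idx] * np.hamming(frameLen)   # Hamming window as a stand-in
    spectrum = np.fft.rfft(frames, n=nFFT)        # zero-pads each frame to nFFT
    return np.abs(spectrum) ** 2

rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)   # one second of noise stands in for speech
spec = power_spectrogram(wave)
print(spec.shape)   # (98, 257)
```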

The returned object ___feat___ is an exkaldi feature archive of class __BytesFeat__. ExKaldi uses 3 approaches to describe Kaldi archives: __Bytes Object__, __Index Table__, and __Numpy Array__, with a group of classes designed to hold each of them. We will introduce them in later steps.

Here, __BytesFeat__ is a __Bytes Object__: it holds the acoustic feature data in bytes format. You can access the raw data through the attribute __.data__, but we do not recommend doing so just to inspect it, because it is not a human-readable format.
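To illustrate why bytes data is not human-readable, the sketch below encodes a small float32 matrix the way a Kaldi binary archive entry is commonly laid out: the utterance ID and a space, the binary marker `\0B`, the token `FM ` (float matrix), the row and column counts each preceded by a 1-byte size marker, then the raw little-endian floats. The helper names are hypothetical, and the layout is an assumption to verify against Kaldi's I/O documentation rather than a description of ExKaldi internals.

```python
import struct
import numpy as np

def encode_kaldi_matrix(uttID, mat):
    """Encode one float32 matrix as a Kaldi-style binary archive entry (assumed layout)."""
    mat = np.ascontiguousarray(mat, dtype="<f4")
    rows, cols = mat.shape
    # "<ID> " + binary marker "\0B" + token "FM " (float matrix)
    header = uttID.encode() + b" \0BFM "
    # Each int32 is preceded by a 1-byte size marker (4).
    dims = struct.pack("<bi", 4, rows) + struct.pack("<bi", 4, cols)
    return header + dims + mat.tobytes()

def decode_kaldi_matrix(entry):
    """Decode a single entry produced by encode_kaldi_matrix."""
    uttID, rest = entry.split(b" ", 1)
    if rest[:5] != b"\0BFM ":
        raise ValueError("not a binary float matrix entry")
    _, rows = struct.unpack_from("<bi", rest, 5)
    _, cols = struct.unpack_from("<bi", rest, 10)
    data = np.frombuffer(rest, dtype="<f4", count=rows * cols, offset=15)
    return uttID.decode(), data.reshape(rows, cols)

toyMat = np.arange(6, dtype=np.float32).reshape(2, 3)
entry = encode_kaldi_matrix("103-1240-0000", toyMat)
print(entry)   # mostly unreadable bytes
uttID, restored = decode_kaldi_matrix(entry)
```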

The ___feat___ object has some useful attributes and methods. For example, use __.dim__ to check the feature dimension.

In [None]:
feat.dim

Use __.utts__ to get its utterance IDs.

In [None]:
feat.utts[0:5]

Randomly sample 10 utterances.

In [None]:
samplingFeat = feat.subset(nRandom=10)

samplingFeat

Here, ___samplingFeat___ is also a __BytesFeat__ object.

In ExKaldi, the name of an object records the operations applied to it. For example, the ___samplingFeat___ generated above now has a new name.

In [None]:
samplingFeat.name

In [None]:
del samplingFeat
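Conceptually, random subsetting picks utterance IDs without replacement. The plain-dict sketch below is not ExKaldi's implementation; the function name and its `seed` parameter are hypothetical, added so the example is reproducible.

```python
import random

def random_subset(archive, nRandom, seed=None):
    """Return a new dict holding nRandom utterances sampled without replacement."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(archive), nRandom)
    # Return the sample in sorted order so iteration is reproducible.
    return {uttID: archive[uttID] for uttID in sorted(chosen)}

# Toy archive: 10 hypothetical speakers x 10 utterances; values stand in for features.
toyArchive = {f"spk{i // 10}-utt{i % 10}": i for i in range(100)}
toySubset = random_subset(toyArchive, 10, seed=1)
print(list(toySubset))
```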

Besides the __BytesFeat__ class, the following classes hold other Kaldi archive tables in bytes format:

__BytesCMVN__: to hold CMVN statistics.  
__BytesProb__: to hold neural network output.  
__BytesAliTrans__: to hold transition-ID alignment.  
__BytesFmllr__: to hold fMLLR transform matrices.  

All these classes share similar properties. For more information, please check the [ExKaldi Documents](https://wangyu09.github.io/exkaldi/#/). Here we only focus on feature processing.

In addition, ExKaldi sorts these archives rigorously in order to reduce buffer cost and accelerate processing. For example, an archive can be sorted by utterance ID in reverse order:

In [None]:
featTemp = feat.sort(by="utt", reverse=True)

featTemp.utts[0:5]

In [None]:
del featTemp
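Sorting an archive amounts to reordering its `(uttID, data)` pairs by some key. The sketch below does this on a plain dict; the function name is hypothetical, the utterance IDs and frame counts are made up, and the `by="frame"` option (sorting by utterance length) is included only as an illustration, not as a statement about ExKaldi's options.

```python
import numpy as np

def sort_archive(archive, by="utt", reverse=False):
    """Sort a {uttID: (frames, dim) array} dict by utterance ID or by frame count."""
    keyFuncs = {
        "utt": lambda item: item[0],                 # utterance ID
        "frame": lambda item: item[1].shape[0],      # number of frames
    }
    if by not in keyFuncs:
        raise ValueError(f"unknown sort key: {by!r}")
    return dict(sorted(archive.items(), key=keyFuncs[by], reverse=reverse))

toyArchive = {
    "103-1240-0001": np.zeros((300, 13)),
    "103-1240-0000": np.zeros((120, 13)),
    "103-1240-0002": np.zeros((80, 13)),
}
print(list(sort_archive(toyArchive, by="utt", reverse=True)))
# ['103-1240-0002', '103-1240-0001', '103-1240-0000']
print(list(sort_archive(toyArchive, by="frame")))
# ['103-1240-0002', '103-1240-0000', '103-1240-0001']
```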

Raw features are typically refined further by applying CMVN (cepstral mean and variance normalization). First, we compute the CMVN statistics per speaker, using the spk2utt file.

In [None]:
spk2uttFile = os.path.join(dataDir, "train", "spk2utt")

cmvn = exkaldi.compute_cmvn_stats(feat, spk2utt=spk2uttFile, name="cmvn")

cmvn

___cmvn___ is an exkaldi __BytesCMVN__ object. It holds the CMVN statistics in binary format. Then we use it to normalize the feature.

In [None]:
utt2spkFile = os.path.join(dataDir, "train", "utt2spk")

feat = exkaldi.use_cmvn(feat, cmvn, utt2spk=utt2spkFile)

feat.name
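The sketch below shows what per-speaker CMVN does with plain NumPy. It uses the usual Kaldi stats layout, a 2 x (dim + 1) matrix whose first row holds the feature sums plus the frame count and whose second row holds the sums of squares. Because these stats are additive, a speaker's stats are simply the sum over their utterances. The function names and toy data are hypothetical. Note that Kaldi's `apply-cmvn` normalizes only the mean unless variance normalization is requested; the `normVars` flag here mimics that choice.

```python
import numpy as np

def cmvn_stats(frames):
    """Accumulate Kaldi-style CMVN stats: a 2 x (dim + 1) matrix."""
    dim = frames.shape[1]
    stats = np.zeros((2, dim + 1))
    stats[0, :dim] = frames.sum(axis=0)          # sum of features
    stats[0, dim] = frames.shape[0]              # frame count
    stats[1, :dim] = (frames ** 2).sum(axis=0)   # sum of squared features
    return stats

def apply_cmvn(frames, stats, normVars=True):
    """Subtract the mean and optionally divide by the standard deviation."""
    dim = frames.shape[1]
    count = stats[0, dim]
    mean = stats[0, :dim] / count
    out = frames - mean
    if normVars:
        var = stats[1, :dim] / count - mean ** 2
        out = out / np.sqrt(np.maximum(var, 1e-10))  # floor avoids division by zero
    return out

# Hypothetical utt2spk mapping and random features.
rng = np.random.default_rng(0)
toyUtt2spk = {"uttA": "spk1", "uttB": "spk1", "uttC": "spk2"}
toyFeats = {utt: rng.normal(5.0, 2.0, size=(50, 13)) for utt in toyUtt2spk}

# Pool stats over every utterance of each speaker, then normalize per utterance.
spkStats = {}
for utt, spk in toyUtt2spk.items():
    spkStats[spk] = spkStats.get(spk, 0) + cmvn_stats(toyFeats[utt])
normed = {utt: apply_cmvn(toyFeats[utt], spkStats[spk]) for utt, spk in toyUtt2spk.items()}
```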

We save this feature to a file so that it can be reloaded in later steps. ExKaldi bytes archives are saved in the same format as Kaldi archive files.

In [None]:
featFile = os.path.join(dataDir, "exp", "train_mfcc_cmvn.ark")

exkaldi.utils.make_dependent_dirs(path=featFile, pathIsFile=True)

featIndex = feat.save(featFile, returnIndexTable=True)

#del feat

If you set the option __returnIndexTable__ to True, an __IndexTable__ object is returned. As introduced above, the __index table__ is our second approach to describing archives. It plays almost the same role as the original feature object. __IndexTable__ is a subclass of the Python dict class, so you can view its data directly.

When training on a large corpus or using multiple processes, __IndexTable__ becomes the main currency for passing archives around.

In [None]:
featIndex
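An index table can be pictured as a mapping from each utterance ID to a byte position in an archive file, so a single record can be read by seeking rather than loading the whole file. The sketch below keeps only an offset and a length per utterance; ExKaldi's real __IndexTable__ stores its own set of fields, and the function names and toy records here are hypothetical.

```python
import os
import tempfile

def save_with_index(records, path):
    """Write byte records back to back and return {uttID: (offset, length)}."""
    index = {}
    with open(path, "wb") as fw:
        for uttID, payload in records.items():
            index[uttID] = (fw.tell(), len(payload))
            fw.write(payload)
    return index

def fetch_record(path, index, uttID):
    """Read one record by seeking directly to its byte offset."""
    offset, length = index[uttID]
    with open(path, "rb") as fr:
        fr.seek(offset)
        return fr.read(length)

records = {"utt1": b"first-record", "utt2": b"second"}
arkPath = os.path.join(tempfile.mkdtemp(), "demo.ark")
index = save_with_index(records, arkPath)
print(index)                                  # {'utt1': (0, 12), 'utt2': (12, 6)}
print(fetch_record(arkPath, index, "utt2"))   # b'second'
```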

Of course, the original archives can also be loaded into memory again. For example, features can be loaded from a Kaldi binary archive file (__.ark__) or a script table file (__.scp__).

In particular, we can fetch the data directly via the index table.

In [None]:
feat = featIndex.fetch(arkType="feat")
del featIndex

feat

All Bytes archives can be transformed to __Numpy__ format. So if you want to train an NN acoustic model with TensorFlow or other frameworks, you can use the data in Numpy format.

In [None]:
feat = feat.to_numpy()

feat
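When feeding frame-level data to an NN toolkit, one common pattern is to stack every utterance's frames into a single matrix and keep the utterance boundaries on the side. The sketch below does this with plain NumPy on a dict of `(frames, dim)` arrays; the function name and toy data are hypothetical and not part of ExKaldi.

```python
import numpy as np

def stack_utterances(archive):
    """Concatenate per-utterance (frames, dim) arrays into one matrix."""
    uttIDs = sorted(archive)
    arrays = [archive[u] for u in uttIDs]
    stacked = np.concatenate(arrays, axis=0)
    # Rows boundaries[i]:boundaries[i + 1] belong to uttIDs[i].
    boundaries = np.cumsum([0] + [a.shape[0] for a in arrays])
    return uttIDs, stacked, boundaries

toyArchive = {"uttA": np.ones((3, 13)), "uttB": np.zeros((2, 13))}
uttIDs, toyFrames, boundaries = stack_utterances(toyArchive)
print(toyFrames.shape, boundaries)   # (5, 13) [0 3 5]
```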

By calling the __.to_numpy()__ function, ___feat___ becomes an exkaldi __NumpyFeat__ object. It shares some attributes and methods with __BytesFeat__, but also has its own properties. Let's skip the details here.

This is the third way to describe archives: __Numpy Array__. __NumpyFeat__ is one of the Numpy archive classes.

Here are some ways to access its data.

In [None]:
sampleFeat = feat.subset(nHead=2)

1. Use __.data__ to get a dict whose keys are utterance IDs and whose values are data arrays.

In [None]:
sampleFeat.data

2. Use __.array__ to get only the arrays.

In [None]:
sampleFeat.array

3. Use indexing (__getitem__) to get a specified utterance.

In [None]:
sampleFeat['103-1240-0000']

4. Like a dict object, __.keys()__, __.values()__, and __.items()__ are available for iteration.

In [None]:
for key in sampleFeat.keys():
    print(sampleFeat[key].shape)

5. Item assignment (__setitem__) is also available, but only if the new array has the right format.

In [None]:
sampleFeat['103-1240-0000'] *= 2

In [None]:
sampleFeat['103-1240-0000']

In [None]:
del sampleFeat
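The dict-like behavior listed above can be reproduced with a small `MutableMapping` subclass. The class below is a hypothetical toy, not ExKaldi's __NumpyFeat__: it exposes `.data` and `.array` (modeled here as a copy of the dict and a list of arrays), supports indexing, iteration through `.keys()`/`.values()`/`.items()`, and validates assigned arrays, which is one way "the right format" could be enforced.

```python
import numpy as np
from collections.abc import MutableMapping

class ToyNumpyArchive(MutableMapping):
    """Hypothetical minimal archive mapping utterance IDs to (frames, dim) arrays."""

    def __init__(self, data=None):
        self._data = {}
        for uttID, value in (data or {}).items():
            self[uttID] = value  # route through __setitem__ so values are validated

    @property
    def data(self):
        """A shallow copy of the underlying {uttID: array} dict."""
        return dict(self._data)

    @property
    def array(self):
        """The arrays only, in insertion order (modeled here as a list)."""
        return list(self._data.values())

    def __getitem__(self, uttID):
        return self._data[uttID]

    def __setitem__(self, uttID, value):
        value = np.asarray(value)
        if value.ndim != 2:
            raise ValueError("expected a 2-D (frames, dim) array")
        # Every other utterance must share the same feature dimension.
        dims = {a.shape[1] for k, a in self._data.items() if k != uttID}
        if dims and value.shape[1] not in dims:
            raise ValueError("feature dimension mismatch")
        self._data[uttID] = value

    def __delitem__(self, uttID):
        del self._data[uttID]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


toy = ToyNumpyArchive({"uttA": np.ones((3, 13)), "uttB": np.ones((2, 13))})
toy["uttA"] *= 2   # in-place update, then re-validated by __setitem__
for key in toy.keys():
    print(key, toy[key].shape)
```

Keeping validation in `__setitem__` means the constructor, plain assignment, and augmented assignment like `*=` all go through the same check.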

Similarly, ExKaldi Numpy archives can be transformed back to bytes archives easily. 

In [None]:
tempFeat = feat.to_bytes()

tempFeat

In [None]:
del tempFeat

Numpy archives can also be saved to a .npy file in a specific format.

In [None]:
tempFile = os.path.join(dataDir, "exp", "temp_mfcc.npy")

feat.save(tempFile)

In [None]:
del feat

It can then be loaded back into memory.

In [None]:
feat = exkaldi.load_feat(tempFile, name="mfcc")

feat
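ExKaldi's own .npy layout is not described in this section. As a generic alternative for exchanging features with other tools, NumPy's `np.savez` can store one named array per utterance ID in a single `.npz` file, and `np.load` reads them back by name. The toy archive below is hypothetical.

```python
import os
import tempfile
import numpy as np

npArchive = {
    "103-1240-0000": np.ones((3, 13), dtype=np.float32),
    "103-1240-0001": np.zeros((2, 13), dtype=np.float32),
}

npzPath = os.path.join(tempfile.mkdtemp(), "feats.npz")
# Each utterance ID becomes a named array inside the .npz (zip) file.
np.savez(npzPath, **npArchive)

with np.load(npzPath) as npz:
    restored = {uttID: npz[uttID] for uttID in npz.files}

print(sorted(restored))
```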


Besides the __NumpyFeat__ class, the following classes hold Kaldi archives in Numpy format:

__NumpyCMVN__: to hold CMVN statistics.  
__NumpyProb__: to hold NN output.  
__NumpyAli__: to hold user-defined alignment data.  
__NumpyAliTrans__: to hold transition-ID alignment.  
__NumpyAliPhone__: to hold phone-ID alignment.  
__NumpyAliPdf__: to hold pdf-ID alignment.  
__NumpyFmllr__: to hold fMLLR transform matrices.  

They have properties similar to those of __NumpyFeat__. We will introduce them in later steps.