# Welcome to Exkaldi

In this section, we will extract and process acoustic features.

Please make sure you have downloaded the complete librispeech_dummy corpus from our GitHub repository:
https://github.com/wangyu09/exkaldi

First, update the WAV path information in the wav.scp file.

In [None]:
! cd librispeech_dummy && python reset_wav_path.py

From here on, we will build an ASR system from scratch.

In [None]:
import exkaldi

import os
dataDir = "librispeech_dummy"

The training set contains 100 utterances from 10 speakers, with 10 utterances per speaker.

You can compute features from a __WAV file__, a __Kaldi script-file table__, or an Exkaldi __ListTable__ object.

In [None]:
scpFile = os.path.join(dataDir, "train", "wav.scp")

feat = exkaldi.compute_mfcc(scpFile, name="mfcc")

feat
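For reference, a Kaldi script-file table such as wav.scp is a plain-text file in which each line holds an utterance ID followed by a file path (or a command). The sketch below is not part of Exkaldi: `parse_scp` is a hypothetical helper, and the sample paths are illustrative placeholders rather than the real corpus layout.

```python
def parse_scp(text):
    """Parse script-file table text into a dict {utterance ID: path or command}."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The key is everything before the first whitespace; the rest is the value.
        utt_id, value = line.split(maxsplit=1)
        table[utt_id] = value
    return table

# Illustrative content only; the real paths depend on where the corpus is stored.
sample = """103-1240-0000 librispeech_dummy/train/wav/103-1240-0000.wav
103-1240-0001 librispeech_dummy/train/wav/103-1240-0001.wav
"""

wav_table = parse_scp(sample)
print(len(wav_table))
print(wav_table["103-1240-0000"])
```

Splitting only on the first run of whitespace keeps values that themselves contain spaces (for example, piped commands) intact.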

Use the __compute_mfcc__ function to compute MFCC features. The current version of Exkaldi provides four functions for computing acoustic features:

__compute_mfcc__: compute the MFCC feature.  
__compute_fbank__: compute the fBank feature.  
__compute_plp__: compute the PLP feature.  
__compute_spectrogram__: compute the power spectrogram feature.

The returned object, ___feat___, is an Exkaldi feature archive of class __BytesFeature__. Exkaldi provides a group of classes that hold Kaldi archive tables (in both binary and text format) and script-file tables. We will introduce them in later steps.

Here, the __BytesFeature__ object holds the feature data in bytes format. You can access it through the __.data__ attribute, but we do not recommend this if you only want to inspect the data, because the bytes format is not human-readable.

The ___feat___ object has some useful attributes and methods. For example, use __.dim__ to check the feature dimension.

In [None]:
feat.dim

Use __.utts__ to get its utterance IDs.

In [None]:
feat.utts[0:5]

Get a specific utterance with the __.\_\_call\_\___ method, that is, by calling the object with the utterance ID.

In [None]:
oneFeat = feat("103-1240-0000")

oneFeat

Here, ___oneFeat___ is also a __BytesFeature__ object, but it holds only one utterance.

In Exkaldi, an object's name records the operations applied to it. For example, the ___oneFeat___ generated above now has a new name.

In [None]:
oneFeat.name

In [None]:
del oneFeat

Besides the __BytesFeature__ class, the following classes hold other Kaldi archive tables in bytes format.

__BytesCMVNStatistics__: to hold the CMVN statistics.  
__BytesProbability__: to hold the Neural Network output.  
__BytesAlignmentTrans__: to hold the Transition-ID Alignment.   
__BytesFmllrMatrix__: to hold the fmllr transform matrices. 

All these classes have similar properties. For more information, please check the source code. Here we focus only on feature processing.

By the way, Exkaldi sorts these archives strictly in order to reduce buffering cost and speed up processing.

In [None]:
featTemp = feat.sort(by="utt", reverse=True)

featTemp.utts[0:5]
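Conceptually, sorting by utterance ID in reverse order amounts to the following. This is a plain-Python sketch on a hypothetical dict with placeholder values, not Exkaldi's implementation.

```python
# Hypothetical archive: utterance ID -> placeholder for feature data.
archive = {
    "103-1240-0001": 1,
    "103-1240-0000": 0,
    "103-1240-0002": 2,
}

# Rebuild the dict with its keys in descending order
# (Python dicts preserve insertion order).
sorted_archive = dict(
    sorted(archive.items(), key=lambda item: item[0], reverse=True)
)

print(list(sorted_archive))
```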

In [None]:
del featTemp

Raw features can be further improved, typically by applying CMVN (cepstral mean and variance normalization). Here we compute the CMVN statistics.

In [None]:
spk2uttFile = os.path.join(dataDir, "train", "spk2utt")

cmvn = exkaldi.compute_cmvn_stats(feat, spk2utt=spk2uttFile, name="cmvn")

cmvn

___cmvn___ is an exkaldi __BytesCMVNStatistics__ object. It holds the CMVN statistics in binary format. Then we normalize the feature.

In [None]:
utt2spkFile = os.path.join(dataDir, "train", "utt2spk")

feat = exkaldi.use_cmvn(feat, cmvn, utt2spk=utt2spkFile)

feat.name
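To make the idea of per-speaker normalization concrete, here is a minimal pure-Python sketch. It pools all frames of a speaker, computes the mean and standard deviation of each dimension, and normalizes every frame with them. The helper names `cmvn_stats` and `apply_cmvn` and the toy data are assumptions for illustration; this sketch normalizes both mean and variance, which may differ from the options Exkaldi actually applies.

```python
import math

def cmvn_stats(frames):
    """Return per-dimension (means, standard deviations) over a list of frames."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means, stds

def apply_cmvn(frames, means, stds, eps=1e-10):
    """Subtract the mean and divide by the standard deviation in each dimension."""
    return [
        [(x - m) / max(s, eps) for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]

# Toy data: two utterances of one speaker, two-dimensional features.
feats = {
    "spk1-utt1": [[1.0, 10.0], [3.0, 30.0]],
    "spk1-utt2": [[5.0, 50.0], [7.0, 70.0]],
}
spk2utt = {"spk1": ["spk1-utt1", "spk1-utt2"]}

normalized = {}
for spk, utts in spk2utt.items():
    # Pool all frames of the speaker to compute the statistics.
    pooled = [frame for utt in utts for frame in feats[utt]]
    means, stds = cmvn_stats(pooled)
    for utt in utts:
        normalized[utt] = apply_cmvn(feats[utt], means, stds)

print(normalized["spk1-utt1"][0])
```

After normalization, each dimension has zero mean and unit variance over all frames of the speaker.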

We save this feature to a file so that it can be reloaded in later steps. Exkaldi bytes archives are saved in the same format as Kaldi archive files, with the __.ark__ suffix.

In [None]:
featFile = os.path.join(dataDir, "exp", "train_mfcc_cmvn.ark")

exkaldi.utils.make_dependent_dirs(path=featFile, pathIsFile=True)

featIndex = feat.save(featFile, returnIndexTable=True)
#del feat

In [None]:
len(feat.data)

If you set the parameter __returnIndexTable__ to True, an __ArkIndexTable__ object is returned. It plays almost the same role as the original feature object. __ArkIndexTable__ is a subclass of the Python dict class, so you can view its data directly.

In [None]:
featIndex
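The idea behind an index table can be sketched as follows: instead of keeping the data in memory, keep only the location of each utterance's bytes in the file. The record layout, the `(offset, length)` tuple, and the helper names are assumptions for illustration; they do not reproduce the actual fields of __ArkIndexTable__ or the Kaldi archive format.

```python
import os
import tempfile

def save_with_index(records, path):
    """Write each record's bytes back to back and return {key: (offset, length)}."""
    index = {}
    with open(path, "wb") as f:
        for key, payload in records.items():
            index[key] = (f.tell(), len(payload))
            f.write(payload)
    return index

def fetch(index, path, key):
    """Read one record back by seeking to its recorded offset."""
    offset, length = index[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Hypothetical payloads standing in for binary feature data.
records = {"utt-a": b"\x01\x02\x03", "utt-b": b"\x04\x05"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.bin")
    index = save_with_index(records, path)
    restored = fetch(index, path, "utt-b")

print(index)
print(restored)
```

Because the index stores only small tuples, it stays cheap to hold in memory even when the file on disk is large.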

Of course, the original archives can also be loaded back into memory. For example, features can be loaded from a Kaldi binary archive file (__.ark__) or a script table file (__.scp__).

In particular, we can fetch the data directly through the index table.

In [None]:
feat = featIndex.fetch(arkType="feat")
del featIndex

feat

All bytes archives can be converted to __Numpy__ format, so if you want to train a neural network acoustic model with TensorFlow or another framework, you can use the Numpy-format data.

In [None]:
feat = feat.to_numpy()

feat

By calling the __.to_numpy()__ method, ___feat___ became an Exkaldi __NumpyFeature__ object. It shares some attributes and methods with __BytesFeature__ but also has properties of its own. Let's skip the details here.

You can now use __.data__ to view its contents.

In [None]:
oneFeat = feat.subset(nHead=1)

oneFeat.data

In [None]:
del oneFeat

Similarly, Exkaldi Numpy archives can easily be converted back to bytes archives.

In [None]:
feat.to_bytes()
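As a rough illustration of what converting between a numeric and a bytes representation involves, the sketch below packs a small matrix of floats into raw bytes and unpacks it again with the standard `struct` module. The little-endian float32 layout and the helper names are assumptions chosen for the example, not a description of Exkaldi's internal format.

```python
import struct

def matrix_to_bytes(matrix):
    """Pack a list of equal-length rows into little-endian float32 bytes."""
    flat = [x for row in matrix for x in row]
    return struct.pack(f"<{len(flat)}f", *flat)

def bytes_to_matrix(data, dim):
    """Unpack float32 bytes into rows of `dim` values each."""
    count = len(data) // 4  # 4 bytes per float32
    flat = struct.unpack(f"<{count}f", data)
    return [list(flat[i:i + dim]) for i in range(0, count, dim)]

# Values chosen to be exactly representable in float32.
matrix = [[1.5, -2.25, 0.0], [4.0, 0.5, -8.0]]
raw = matrix_to_bytes(matrix)
restored = bytes_to_matrix(raw, dim=3)

print(len(raw))
print(restored == matrix)
```

Note that the raw bytes alone do not carry the row length, which is why the feature dimension has to be supplied (or stored in a header) when decoding.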

Exkaldi Numpy archives can also be saved to a .npy file in a specific format.

In [None]:
tempFile = os.path.join(dataDir, "exp", "temp_mfcc.npy")

feat.save(tempFile)

In [None]:
del feat

Then load it back into memory.

In [None]:
feat = exkaldi.load_feat(tempFile, name="mfcc")

feat

In [None]:
os.remove(tempFile)
del feat

Besides the __NumpyFeature__ class, the following classes hold Kaldi archives in Numpy format.

__NumpyCMVNStatistics__: to hold CMVN statistics data.  
__NumpyProbability__:  to hold NN output data.  
__NumpyAlignment__:  to hold user-defined alignment data.  
__NumpyAlignmentTrans__:  to hold Transition-ID alignment.  
__NumpyAlignmentPhone__:  to hold Phone-ID alignment.  
__NumpyAlignmentPdf__:  to hold Pdf-ID alignment.  
__NumpyFmllrMatrix__:  to hold fmllr transform matrices.  

Their properties are similar to those of __NumpyFeature__. We will introduce them in the next steps.