# Welcome to Exkaldi

In this section, we will extract and process acoustic features.

Please make sure you have downloaded the complete librispeech_dummy corpus from our GitHub repository:
https://github.com/wangyu09/exkaldi

First, update the WAV paths in the wav.scp file.

In [1]:
! cd ../examplesdata/librispeech_dummy && python reset_wav_path.py

In [2]:
import exkaldi

import os
dataDir = os.path.join("..", "examplesdata", "librispeech_dummy")

We use the train data set, which contains 100 utterances from 10 speakers. Each speaker has 10 utterances.

You can compute features from a WAV file, a Kaldi script-file table, or an Exkaldi __ListTable__ object.
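For readers unfamiliar with script-file tables: a Kaldi wav.scp is plain text in which each line holds an utterance ID followed by the location of its audio (a path, or a command ending in `|`). The sketch below parses such text into a dictionary. It is only an illustration of the file format, not how Exkaldi reads the file; the helper name `parse_scp` and the example paths are made up.

```python
# Illustrative only: these paths are invented, not taken from the corpus.
example_scp = """\
103-1240-0000 /path/to/103-1240-0000.wav
103-1240-0001 /path/to/103-1240-0001.wav
103-1240-0002 flac -c -d -s /path/to/103-1240-0002.flac |
"""

def parse_scp(text):
    """Parse script-file table text into {utt_id: location}."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only once: the location itself may contain spaces,
        # for example when it is a piped command.
        utt_id, location = line.split(maxsplit=1)
        table[utt_id] = location
    return table

wav_table = parse_scp(example_scp)
print(list(wav_table))
```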

In [3]:
scpFile = os.path.join(dataDir, "train", "wav.scp")

feat = exkaldi.compute_mfcc(scpFile, name="mfcc")

feat

<exkaldi.core.achivements.BytesFeature at 0x7f25c0f98f28>

___feat___ is an Exkaldi feature archive: a __BytesFeature__ object.

In Exkaldi, we designed a group of classes to hold Kaldi archive tables (in both binary and text format) and script-file tables.
We will introduce them in later steps.

Here, the __BytesFeature__ object holds the feature data in binary format. You can access it with __.data__, but we do not recommend doing so if you only want to inspect the feature.

___feat___ has some useful attributes and methods. For example, use __.dim__ to check the feature dimensionality.

In [4]:
feat.dim

13

Use __.utts__ to list the utterance IDs.

In [5]:
feat.utts[0:5]

['103-1240-0000',
 '103-1240-0001',
 '103-1240-0002',
 '103-1240-0003',
 '103-1240-0004']

Retrieve a specific utterance by calling the object, which invokes its __.\_\_call\_\___ method.

In [6]:
oneFeat = feat("103-1240-0000")

Here, ___oneFeat___ is also an Exkaldi __BytesFeature__ object, but it holds only one utterance.

In Exkaldi, the name of an object records the operations applied to it. For example, the ___oneFeat___ generated above now has a new name.

In [7]:
oneFeat.name

'pick(mfcc,103-1240-0000)'

In [8]:
del oneFeat

Besides __BytesFeature__, the following classes hold other Kaldi archive tables in binary format.

__BytesCMVNStatistics__: to hold the CMVN statistics data. 

__BytesProbability__: to hold the Neural Network output data. 

__BytesAlignmentTrans__: to hold the Transition-ID Alignment data. 

All these classes have similar properties. Here we focus only on feature processing.

By the way, in Exkaldi these archives are sorted strictly in order to reduce buffering cost and speed up processing.

In [9]:
feat = feat.sort(by="utt", reverse=True)

feat.utts[0:5]

['1088-134315-0009',
 '1088-134315-0008',
 '1088-134315-0007',
 '1088-134315-0006',
 '1088-134315-0005']

Raw features can be processed further, typically by applying cepstral mean and variance normalization (CMVN).

In [10]:
spk2uttFile = os.path.join(dataDir, "train", "spk2utt")

cmvn = exkaldi.compute_cmvn_stats(feat, spk2utt=spk2uttFile, name="cmvn")

cmvn

<exkaldi.core.achivements.BytesCMVNStatistics at 0x7f261d32b2e8>

___cmvn___ is an Exkaldi __BytesCMVNStatistics__ object. It holds the CMVN statistics in binary format.

In [11]:
utt2spkFile = os.path.join(dataDir, "train", "utt2spk")

feat = exkaldi.use_cmvn(feat, cmvn, utt2spk=utt2spkFile)

feat.name

'cmvn(mfcc,cmvn)'
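To make the two CMVN steps more concrete, here is a minimal NumPy sketch of per-speaker normalization: statistics (frame count, sum, sum of squares) are pooled over all utterances of a speaker, then each utterance is normalized with its speaker's statistics. This is not Exkaldi's or Kaldi's implementation. The function name `apply_cmvn_per_speaker`, the `norm_vars` flag, the utterance IDs, and the random data are assumptions made for illustration; in Kaldi, whether the variance is also normalized is an option.

```python
import numpy as np

def apply_cmvn_per_speaker(feats, utt2spk, norm_vars=True, eps=1e-10):
    """feats: {utt_id: (frames, dim) array}; utt2spk: {utt_id: spk_id}."""
    # Step 1: pool frame count, sum, and sum of squares per speaker.
    stats = {}
    for utt, mat in feats.items():
        spk = utt2spk[utt]
        count, total, total_sq = stats.get(spk, (0, 0.0, 0.0))
        stats[spk] = (count + len(mat),
                      total + mat.sum(axis=0),
                      total_sq + (mat ** 2).sum(axis=0))

    # Step 2: normalize every utterance with its speaker's statistics.
    normalized = {}
    for utt, mat in feats.items():
        count, total, total_sq = stats[utt2spk[utt]]
        mean = total / count
        out = mat - mean
        if norm_vars:
            var = np.maximum(total_sq / count - mean ** 2, eps)
            out = out / np.sqrt(var)
        normalized[utt] = out
    return normalized

# Hypothetical data: two speakers with different offsets and scales.
rng = np.random.default_rng(0)
feats = {
    "spkA-utt1": rng.normal(5.0, 2.0, size=(40, 13)),
    "spkA-utt2": rng.normal(5.0, 2.0, size=(60, 13)),
    "spkB-utt1": rng.normal(-3.0, 0.5, size=(50, 13)),
}
utt2spk = {"spkA-utt1": "spkA", "spkA-utt2": "spkA", "spkB-utt1": "spkB"}
normalized = apply_cmvn_per_speaker(feats, utt2spk)
```

After normalization, the frames of each speaker taken together have approximately zero mean and unit variance in every dimension, while the feature dimensionality is unchanged.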

Next, append deltas up to the second order, then splice each frame with one frame of left context and one frame of right context (3 frames in total).

In [12]:
feat = feat.add_delta(order=2)

feat.dim

39
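The dimensionality grows from 13 to 39 because the static coefficients are kept and the first-order and second-order deltas are appended (13 × 3). The NumPy sketch below illustrates this with the common regression formula $d_t = \sum_{n=1}^{W} n\,(c_{t+n} - c_{t-n}) \,/\, (2\sum_{n=1}^{W} n^2)$. It is an illustration under assumptions, not Exkaldi's code: the window size `W = 2`, the edge replication, and the helper names `delta` and `add_delta_sketch` are all choices made here, so values near utterance boundaries may differ from Kaldi's.

```python
import numpy as np

def delta(mat, window=2):
    """Regression deltas over +/- `window` frames, with edges replicated."""
    num_frames = len(mat)
    padded = np.pad(mat, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = np.zeros(mat.shape, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + num_frames]   # c[t + n]
        behind = padded[window - n : window - n + num_frames]  # c[t - n]
        out += n * (ahead - behind)
    return out / denom

def add_delta_sketch(mat, order=2):
    """Stack the static features with deltas of order 1..order."""
    blocks = [mat]
    for _ in range(order):
        blocks.append(delta(blocks[-1]))
    return np.hstack(blocks)

# Hypothetical 13-dimensional features for one utterance of 50 frames.
mfcc = np.random.default_rng(0).standard_normal((50, 13))
with_deltas = add_delta_sketch(mfcc, order=2)
print(with_deltas.shape)  # (50, 39)
```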

In [13]:
feat = feat.splice(left=1, right=1)

feat.dim

117
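Splicing concatenates each frame with its neighbors, so with one left and one right context frame the dimensionality becomes 39 × 3 = 117. A minimal NumPy sketch of the idea follows; the helper name `splice_sketch` and the edge handling (repeating the first and last frame) are assumptions for illustration rather than a description of Exkaldi's internals.

```python
import numpy as np

def splice_sketch(mat, left=1, right=1):
    """Concatenate each frame with `left` previous and `right` following frames."""
    num_frames = len(mat)
    # Repeat the first/last frame so boundary frames still get full context.
    padded = np.pad(mat, ((left, right), (0, 0)), mode="edge")
    # Slice k holds, for every frame t, the frame at offset (k - left).
    shifted = [padded[k : k + num_frames] for k in range(left + right + 1)]
    return np.hstack(shifted)

# Hypothetical 39-dimensional input with 5 frames.
frames = np.arange(5 * 39, dtype=float).reshape(5, 39)
spliced = splice_sketch(frames, left=1, right=1)
print(spliced.shape)  # (5, 117)
```

Edge replication appears consistent with the matrix printed later in this section, where the first two rows begin with the same values and the last two rows end with the same values.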

Exkaldi archives can be saved to files as Kaldi archive files, whose suffix is .ark (optionally with an accompanying .scp file).

In [14]:
featFile = os.path.join(dataDir, "exp", "mfcc.ark")

exkaldi.utils.make_dependent_dirs(path=featFile, pathIsFile=True)

feat.save(featFile, outScpFile=False)

'../examplesdata/librispeech_dummy/exp/mfcc.ark'

In [15]:
del feat

Of course, features can also be loaded from a Kaldi binary archive file (.ark) or a script-file table (.scp).

In [16]:
feat = exkaldi.load_feat(featFile, name="mfcc")

feat

<exkaldi.core.achivements.BytesFeature at 0x7f261d366860>

All Bytes archives can be converted to the human-readable NumPy format. So if you want to train a neural network acoustic model with TensorFlow or another framework, you can use the NumPy-format data.

In [17]:
feat = feat.to_numpy()

feat

<exkaldi.core.achivements.NumpyFeature at 0x7f261d34eb70>

___feat___ is now an Exkaldi __NumpyFeature__ object. It shares some attributes and methods with __BytesFeature__, but also has properties of its own. Let's skip the details for now.

Unlike the binary format, the NumPy format is human-readable, so you can use __.data__ to inspect it.

In [18]:
oneFeat = feat.subset(nHead=1)

oneFeat.data

{'103-1240-0000': array([[ -2.254528  ,  -3.3443842 ,   8.89428   , ...,   0.22093892,
           0.57776177,  -0.5845985 ],
        [ -2.254528  ,  -3.3443842 ,   8.89428   , ...,   0.7834983 ,
           1.0710119 ,   0.29615283],
        [ -2.2711601 ,  -3.6887026 ,   8.395482  , ...,   1.1539035 ,
           1.3796794 ,   1.3895724 ],
        ...,
        [ -1.5286026 , -17.361238  , -10.944935  , ...,   0.2552031 ,
          -0.29630542,   0.5276331 ],
        [ -1.5548878 , -16.208216  , -15.402992  , ...,  -0.12567882,
           0.7363866 ,   0.20010377],
        [ -1.6056385 , -18.53891   , -13.54101   , ...,  -0.12567882,
           0.7363866 ,   0.20010377]], dtype=float32)}

In [19]:
del oneFeat

Exkaldi NumPy archives can easily be converted back to Bytes archives.

In [20]:
feat.to_bytes()

<exkaldi.core.achivements.BytesFeature at 0x7f261db74f98>

Exkaldi NumPy archives can also be saved to a .npy file in a specific format.

In [21]:
tempFile = os.path.join(dataDir, "exp", "mfcc.npy")

feat.save(tempFile)

'../examplesdata/librispeech_dummy/exp/mfcc.npy'

In [22]:
del feat

Then load it back into memory.

In [23]:
feat = exkaldi.load_feat(tempFile, name="mfcc")

In [24]:
os.remove(tempFile)

Besides the __NumpyFeature__ class, the following classes hold Kaldi archives in NumPy format.

__NumpyCMVNStatistics__: to hold CMVN statistics data.

__NumpyProbability__:  to hold NN output data.

__NumpyAlignment__:  to hold the user's own alignment data.

__NumpyAlignmentTrans__:  to hold Transition-ID alignment.

__NumpyAlignmentPhone__:  to hold Phone-ID alignment.

__NumpyAlignmentPdf__:  to hold Pdf-ID alignment.

They have properties similar to those of __NumpyFeature__. We will introduce them in later steps.