# Welcome to Exkaldi

In this section, we will train a DNN acoustic model with TensorFlow 2.x.

To run this step, please install TensorFlow first.

In [1]:
import exkaldi

import os
dataDir = "librispeech_dummy"

We use Keras to build and train the model.

In [2]:
import tensorflow as tf
from tensorflow import keras
import random
import datetime
import numpy as np

Fix the random seed.

In [3]:
seed = 1
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)
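
To see what fixing a seed buys us, here is a minimal sketch that uses only Python's built-in `random` module. The helper name `draw` is made up for this illustration, and it uses a private `random.Random` instance so that the global seed set above is left untouched.

```python
import random

def draw(seed, n=5):
    """Return n pseudo-random integers from a generator seeded with `seed`."""
    rng = random.Random(seed)   # private generator, global state is not modified
    return [rng.randint(0, 99) for _ in range(n)]

first = draw(1)
second = draw(1)   # same seed, so the same sequence is reproduced

print(first == second)   # True
```

The same idea applies to NumPy and TensorFlow: each library keeps its own generator, which is why all three are seeded in the cell above.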

### Prepare Dataset

Load the training features.

In [4]:
featFile = os.path.join(dataDir, "exp", "mfcc.ark")

feat = exkaldi.load_feat(featFile)

feat = feat.to_numpy()

feat.dim

117

This feature is made following these steps:

compute MFCC (13) >> apply CMVN (13) >> add deltas up to 2nd order (39) >> splice 1-1 frames (117)
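
The dimension arithmetic of the last two steps can be checked with a small NumPy sketch. The function `splice` below is a hypothetical stand-in, not exkaldi's implementation, and it assumes that edge frames are padded by repeating the first and last frame.

```python
import numpy as np

def splice(feat, left=1, right=1):
    """Concatenate each frame with its `left` previous and `right` next frames.

    Edge frames are padded by repeating the first/last frame
    (an assumption of this sketch).
    """
    frames = feat.shape[0]
    padded = np.concatenate(
        [np.repeat(feat[:1], left, axis=0), feat, np.repeat(feat[-1:], right, axis=0)],
        axis=0,
    )
    # Window k holds the frames shifted by k positions; placing the windows
    # side by side yields (left + 1 + right) * dim columns.
    windows = [padded[k:k + frames] for k in range(left + 1 + right)]
    return np.concatenate(windows, axis=1)

mfcc_dim = 13
delta_dim = mfcc_dim * 3   # static + 1st-order + 2nd-order deltas = 39
toy = np.arange(5 * delta_dim, dtype=np.float32).reshape(5, delta_dim)
spliced = splice(toy, left=1, right=1)

print(spliced.shape)   # (5, 117)
```

The middle 39 columns of each spliced row are the original frame, and the outer blocks are its left and right neighbours.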

We additionally apply global standard normalization to it.

In [5]:
feat = feat.normalize(std=True)
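
As a rough illustration of what global standard normalization means, the sketch below pools the frames of all utterances, computes one mean and one standard deviation per dimension, and applies them everywhere. The function `global_normalize` and the toy data are hypothetical; whether exkaldi computes its statistics in exactly this way is an assumption.

```python
import numpy as np

def global_normalize(utts, eps=1e-8):
    """Normalize every utterance with the mean/std pooled over all frames."""
    stacked = np.concatenate(list(utts.values()), axis=0)
    mean = stacked.mean(axis=0)
    std = stacked.std(axis=0)
    # eps guards against division by zero for constant dimensions
    return {utt: (mat - mean) / (std + eps) for utt, mat in utts.items()}

rng = np.random.default_rng(0)
toy_utts = {
    "utt1": rng.normal(3.0, 2.0, size=(40, 4)),
    "utt2": rng.normal(-1.0, 5.0, size=(60, 4)),
}
normed = global_normalize(toy_utts)
pooled = np.concatenate(list(normed.values()), axis=0)

# After normalization the pooled frames have (almost exactly) zero mean
# and unit standard deviation in every dimension.
print(np.allclose(pooled.mean(axis=0), 0.0, atol=1e-6))
print(np.allclose(pooled.std(axis=0), 1.0, atol=1e-6))
```

Unlike per-utterance CMVN, individual utterances are not zero-mean after this step; only the pooled data is.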

Then we load the alignment data, which was generated in an earlier step (07_train_triphone_HMM-GMM_delta).

We will use pdf-IDs as target labels. In exkaldi, transition-IDs and phone-IDs can also be extracted for multiple tasks.
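
Kaldi-style alignments are stored as transition-IDs, and the HMM model is what maps each transition-ID to a pdf-ID or a phone-ID, which is why the model file is needed for the conversion. The toy lookup below only illustrates that many-to-one mapping; the table `TOY_TRANSITIONS` and all of its values are invented for this sketch and do not come from any real model.

```python
# Hypothetical table: transition-ID -> (phone-ID, pdf-ID).
# A real table is derived from the HMM model; these numbers are made up.
TOY_TRANSITIONS = {
    1: (1, 0), 2: (1, 0),
    3: (1, 1), 4: (1, 1),
    5: (2, 2), 6: (2, 2),
}

transition_ali = [1, 2, 2, 3, 4, 5, 6]   # one transition-ID per frame

phone_ali = [TOY_TRANSITIONS[t][0] for t in transition_ali]
pdf_ali = [TOY_TRANSITIONS[t][1] for t in transition_ali]

print(pdf_ali)     # [0, 0, 0, 1, 1, 2, 2]
print(phone_ali)   # [1, 1, 1, 1, 1, 2, 2]
```

All three label sequences have one entry per frame, so any of them can serve as a frame-level training target.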

In [6]:
aliFile = os.path.join(dataDir, "exp", "train_delta", "final.ali")
hmmFile = os.path.join(dataDir, "exp", "train_delta", "final.mdl")

ali = exkaldi.load_ali(aliFile)

ali = ali.to_numpy(aliType="pdfID", hmm=hmmFile)

ali

<exkaldi.core.achivements.NumpyAlignmentPdf at 0x7f48487043c8>

Note the number of classes in the alignment: it is the number of output units of the neural network.

Then we tuple the features and the alignment to build a dataset for the neural network framework. We use the __tuple_data(...)__ function to group them.

Note that this function groups the achievements by name, so make sure their names are valid Python identifiers (that is, they may contain only lowercase and uppercase letters, digits, and underscores).
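
A quick way to check a candidate name is sketched below with the standard library. The helper `is_valid_field_name` is hypothetical; it assumes the names end up as namedtuple fields, which additionally must not be Python keywords or start with an underscore.

```python
import keyword

def is_valid_field_name(name):
    """True if `name` can be used as a namedtuple field name."""
    return (
        name.isidentifier()              # letters, digits, underscores; no leading digit
        and not keyword.iskeyword(name)  # e.g. "class" is rejected
        and not name.startswith("_")     # namedtuple forbids leading underscores
    )

candidates = ["mfcc", "pdfID", "phone-ID", "2nd_feat", "class"]
valid = [name for name in candidates if is_valid_field_name(name)]

print(valid)   # ['mfcc', 'pdfID']
```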

In [7]:
feat.rename("mfcc")
ali.rename("pdfID")

dataset = exkaldi.tuple_data([feat,ali], frameLevel=True)

datasetSize = len(dataset)
datasetSize

126345

___dataset___ is a list whose members are namedtuples. For example:

In [8]:
oneRecord = dataset[0]

oneRecord

TupledData(uttID='103-1240-0000', frameID=0, mfcc=array([[-8.19768488e-01, -1.98504940e-01,  6.30387783e-01,
         9.09560174e-02,  8.71950507e-01,  1.24062955e+00,
         1.16157448e+00,  3.28367263e-01,  4.53087598e-01,
         6.50599599e-02,  2.01714620e-01,  6.11037135e-01,
         3.36491138e-01,  6.70793874e-04,  2.54764557e-02,
         1.77971616e-01,  1.78947791e-01,  4.53171022e-02,
        -5.15161633e-01, -4.16369766e-01,  2.91420579e-01,
         6.82660222e-01,  5.25227606e-01, -3.15611064e-01,
        -2.79526472e-01,  1.53618842e-01,  2.50820909e-02,
         3.96822095e-02,  2.34689027e-01, -2.32635170e-01,
         3.47851105e-02, -6.70426860e-02, -1.50040478e-01,
        -1.28142163e-01,  4.15323287e-01,  6.23874485e-01,
        -1.92458287e-01, -1.82956874e-01, -5.52129328e-01,
        -8.19626927e-01, -1.98620737e-01,  6.30370140e-01,
         9.10242200e-02,  8.71933877e-01,  1.24058509e+00,
         1.16157663e+00,  3.28481555e-01,  4.53085810e-01,
      

Use the field name to get the corresponding data.

In [9]:
oneRecord.pdfID

array([0], dtype=int32)
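
To make the record layout concrete, here is a minimal pure-Python sketch of frame-level tupling. The `Record` type and the toy data are hypothetical stand-ins; the real records are built by exkaldi.

```python
from collections import namedtuple

# Hypothetical stand-in for the record type created by exkaldi.
Record = namedtuple("Record", ["uttID", "frameID", "mfcc", "pdfID"])

# Toy data: two utterances with per-frame features and labels.
feats = {"uttA": [[0.1, 0.2], [0.3, 0.4]], "uttB": [[0.5, 0.6]]}
labels = {"uttA": [7, 7], "uttB": [3]}

# Frame level: one record per frame, across all utterances.
records = [
    Record(utt, i, frame, labels[utt][i])
    for utt in feats
    for i, frame in enumerate(feats[utt])
]

print(len(records))        # 3
print(records[0].pdfID)    # 7
print(records[2].uttID)    # uttB
```

Because each field is addressed by name, adding another label stream only adds a field and does not change how existing fields are read.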

If you train a sequential NN model, you may want to tuple the achievements at the utterance level rather than at the frame level. In that case, change the tupling mode.
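
The difference between the two modes can be sketched in plain Python. The `UttRecord` type and the toy data are hypothetical; in exkaldi the switch is presumably made through the `frameLevel` argument, which this sketch does not verify.

```python
from collections import namedtuple

# Hypothetical utterance-level record: whole sequences instead of single frames.
UttRecord = namedtuple("UttRecord", ["uttID", "mfcc", "pdfID"])

feats = {"uttA": [[0.1, 0.2], [0.3, 0.4]], "uttB": [[0.5, 0.6]]}
labels = {"uttA": [7, 7], "uttB": [3]}

# Utterance level: one record per utterance, keeping frames in order.
utt_records = [UttRecord(utt, feats[utt], labels[utt]) for utt in feats]

print(len(utt_records))               # 2
print(len(utt_records[0].mfcc))       # 2
print(utt_records[0].pdfID)           # [7, 7]
```

Keeping the frames of an utterance together preserves their order, which is what recurrent or other sequence models rely on.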

You can tuple all kinds of exkaldi achievements, such as features, CMVN statistics, alignments, and probabilities. You can even combine different features (MFCC, fBank, and so on) and different alignments (pdf-ID, phone-ID). For example, suppose we now want to train on multiple tasks.

In [10]:
ali2 = exkaldi.load_ali(aliFile)

ali2 = ali2.to_numpy(aliType="phoneID", hmm=hmmFile)

ali2.rename("phoneID")

dataset2 = exkaldi.tuple_data([feat,ali,ali2], frameLevel=True)

In [11]:
dataset2[0]

TupledData(uttID='103-1240-0000', frameID=0, mfcc=array([[-8.19768488e-01, -1.98504940e-01,  6.30387783e-01,
         9.09560174e-02,  8.71950507e-01,  1.24062955e+00,
         1.16157448e+00,  3.28367263e-01,  4.53087598e-01,
         6.50599599e-02,  2.01714620e-01,  6.11037135e-01,
         3.36491138e-01,  6.70793874e-04,  2.54764557e-02,
         1.77971616e-01,  1.78947791e-01,  4.53171022e-02,
        -5.15161633e-01, -4.16369766e-01,  2.91420579e-01,
         6.82660222e-01,  5.25227606e-01, -3.15611064e-01,
        -2.79526472e-01,  1.53618842e-01,  2.50820909e-02,
         3.96822095e-02,  2.34689027e-01, -2.32635170e-01,
         3.47851105e-02, -6.70426860e-02, -1.50040478e-01,
        -1.28142163e-01,  4.15323287e-01,  6.23874485e-01,
        -1.92458287e-01, -1.82956874e-01, -5.52129328e-01,
        -8.19626927e-01, -1.98620737e-01,  6.30370140e-01,
         9.10242200e-02,  8.71933877e-01,  1.24058509e+00,
         1.16157663e+00,  3.28481555e-01,  4.53085810e-01,
      

In [12]:
featureDim = feat.dim
pdfClasses = exkaldi.hmm.load_hmm(hmmFile,"triphone").info.pdfs

In [13]:
del ali2
del dataset2

del ali
del feat

### Training

Now we start to train the DNN acoustic model. First, we design a data iterator over the dataset prepared above.

In [14]:
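def data_generater(dataset):
    # Endless generator: reshuffle the records in place before every pass,
    # then yield one (feature vector, pdf-ID label) pair per frame.
    # Batching is left to tf.data, so no batch size is needed here.
    length = len(dataset)
    while True:
        index = 0
        random.shuffle(dataset)
        while index < length:
            one = dataset[index]
            index += 1
            yield (one.mfcc[0], one.pdfID)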
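
The pattern used above, an endless generator that reshuffles before every pass, can be tried in isolation with the standard library. The name `epoch_shuffled` is made up for this sketch, and unlike the cell above it shuffles a private copy rather than the caller's list.

```python
import itertools
import random

def epoch_shuffled(items, rng):
    """Yield items forever, reshuffling a private copy before every pass."""
    items = list(items)   # copy, so the caller's data is left untouched
    while True:
        rng.shuffle(items)
        for item in items:
            yield item

rng = random.Random(1)
stream = epoch_shuffled(range(5), rng)

# The generator never stops, so the consumer decides where an epoch ends.
first_pass = list(itertools.islice(stream, 5))
second_pass = list(itertools.islice(stream, 5))

print(sorted(first_pass) == sorted(second_pass) == [0, 1, 2, 3, 4])   # True
```

Every pass visits each item exactly once, only the order changes between passes.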

In [15]:
batchSize = 64
tf_datasets = tf.data.Dataset.from_generator(
                                 lambda : data_generater(dataset),
                                 (tf.float32, tf.int32)
                            ).batch(batchSize).prefetch(batchSize)

Then define a simple Dense model.

In [16]:
def make_DNN_model(inputsShape, classes):
    
    inputs = keras.Input(inputsShape)
    h1 = keras.layers.Dense(256, activation="relu", kernel_initializer="he_normal")(inputs)
    h1_bn = keras.layers.BatchNormalization()(h1)
    
    h2 = keras.layers.Dense(512, activation="relu", kernel_initializer="he_normal")(h1_bn)
    h2_bn = keras.layers.BatchNormalization()(h2)
    
    h3 = keras.layers.Dense(512, activation="relu", kernel_initializer="he_normal")(h2_bn)
    h3_bn = keras.layers.BatchNormalization()(h3)
    
    outputs = keras.layers.Dense(classes, use_bias=False)(h3_bn)
    
    return keras.Model(inputs, outputs)

In [18]:
model = make_DNN_model((featureDim,), pdfClasses)

model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 117)]             0         
_________________________________________________________________
dense (Dense)                (None, 256)               30208     
_________________________________________________________________
batch_normalization (BatchNo (None, 256)               1024      
_________________________________________________________________
dense_1 (Dense)              (None, 512)               131584    
_________________________________________________________________
batch_normalization_1 (Batch (None, 512)               2048      
_________________________________________________________________
dense_2 (Dense)              (None, 512)               262656    
_________________________________________________________________
batch_normalization_2 (Batch (None, 512)               2048  
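
The parameter counts in the summary can be verified by hand. The helper names below are made up for this sketch; a dense layer has one weight per input-output pair plus one bias per output, and a batch normalization layer keeps four values per feature (scale, offset, moving mean, moving variance).

```python
def dense_params(n_in, n_out, use_bias=True):
    """Weights plus optional biases of a fully connected layer."""
    return n_in * n_out + (n_out if use_bias else 0)

def batchnorm_params(n_features):
    """gamma, beta, moving mean and moving variance per feature."""
    return 4 * n_features

counts = [
    dense_params(117, 256), batchnorm_params(256),
    dense_params(256, 512), batchnorm_params(512),
    dense_params(512, 512), batchnorm_params(512),
]

print(counts)   # [30208, 1024, 131584, 2048, 262656, 2048]
```

The output layer is built with `use_bias=False`, so it would contribute 512 times the number of pdf classes, a figure that depends on the HMM model.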

Here we define the optimizer and the metrics.

In [19]:
optimizer = keras.optimizers.Adam(0.001)

losses = keras.metrics.Mean(name="train/loss", dtype=tf.float32)
accs = keras.metrics.Mean(name="train/accuracy", dtype=tf.float32)
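
A `Mean` metric keeps a running total and a count until it is reset. The small class below is a hypothetical pure-Python analogue, written only to show that behaviour; it is not the Keras implementation.

```python
class RunningMean:
    """Accumulate values and report their mean since the last reset."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, values):
        values = list(values)
        self.total += sum(values)
        self.count += len(values)

    def result(self):
        return self.total / self.count if self.count else 0.0

    def reset(self):
        self.total = 0.0
        self.count = 0

metric = RunningMean()
metric.update([2.0, 4.0])   # first batch
metric.update([6.0])        # second batch
print(metric.result())      # 4.0

metric.reset()
metric.update([1.0])
print(metric.result())      # 1.0
```

In the training loop of this tutorial the metrics are not reset between epochs, so the reported loss and accuracy are averages over everything seen since training started.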

Specify the output directory. You can use TensorBoard to check the training results.

In [20]:
outDir = os.path.join(dataDir, "exp", "train_DNN")

stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
logDir = os.path.join(outDir, "log", stamp)
file_writer = tf.summary.create_file_writer(logDir)

In [21]:
epochs = 20

epoch_iterations = datasetSize//batchSize

epoch_iterations

1974
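
The integer division drops the final partial batch. The arithmetic can be spelled out as follows, using the dataset size and batch size from this tutorial.

```python
import math

dataset_size = 126345
batch_size = 64

# Full batches per epoch and the frames left over after them.
full_batches, leftover = divmod(dataset_size, batch_size)
# Number of steps if the final partial batch were also used.
steps_with_partial = math.ceil(dataset_size / batch_size)

print(full_batches)         # 1974
print(leftover)             # 9
print(steps_with_partial)   # 1975
```

With 1974 iterations per epoch, 9 frames are skipped in each epoch. Since the data is reshuffled every time, they are different frames each epoch.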

To print a progress bar and to control where each epoch ends, we will use the __tqdm__ package.

In [22]:
! pip install tqdm 2>/dev/null



In [23]:
from tqdm import tqdm

Start training the model. While the training loop runs, you can use TensorBoard to visualize the training results.

```
tensorboard --logdir=./librispeech_dummy/exp/train_DNN/log --bind_all
```

Since this is only a demonstration, we do not validate the model during training, but in a real case you should.

In [24]:
with file_writer.as_default():
    
    for epoch in range(epochs):
        
        # The generator never ends, so zip with a tqdm counter to stop
        # after epoch_iterations batches.
        for batch,i in zip(tf_datasets, tqdm(range(epoch_iterations))):
            data, label = batch
            
            # Only the forward pass needs to be recorded on the tape.
            with tf.GradientTape() as tape:
                logits = model(data, training=True)
                loss = keras.losses.sparse_categorical_crossentropy(label, logits, from_logits=True)
            losses(loss)
            gradients = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(gradients, model.trainable_variables))

            pred = keras.backend.argmax(logits, axis=1)

            acc = exkaldi.nn.accuracy(label.numpy(), pred.numpy())
            accs(acc.accuracy)
        
            #if int(optimizer.iterations.numpy()) % epoch_iterations == 0:     #<<<< if you don't use tqdm
            #    break
        
        # The metrics are never reset, so these are running averages
        # over all epochs so far.
        current_loss = losses.result()
        current_acc = accs.result()
        tf.print( f"Epoch {epoch}", f" Loss {current_loss:.6f}", f" Acc {current_acc:.6f}")

        tf.summary.scalar("train/loss", data=current_loss, step=epoch)
        tf.summary.scalar("train/accuracy", data=current_acc, step=epoch)

    tf.print( "Training Done" )

100%|██████████| 1974/1974 [00:44<00:00, 44.43it/s]

Epoch 0  Loss 2.581684  Acc 0.393665



100%|██████████| 1974/1974 [00:44<00:00, 44.46it/s]

Epoch 1  Loss 2.141212  Acc 0.467456



100%|██████████| 1974/1974 [00:44<00:00, 44.35it/s]

Epoch 2  Loss 1.881780  Acc 0.516480



100%|██████████| 1974/1974 [00:44<00:00, 44.20it/s]

Epoch 3  Loss 1.693251  Acc 0.555180



100%|██████████| 1974/1974 [00:44<00:00, 44.47it/s]

Epoch 4  Loss 1.545394  Acc 0.586849



100%|██████████| 1974/1974 [00:44<00:00, 44.31it/s]

Epoch 5  Loss 1.424021  Acc 0.614029



100%|██████████| 1974/1974 [00:44<00:00, 44.30it/s]

Epoch 6  Loss 1.322298  Acc 0.637538



100%|██████████| 1974/1974 [00:43<00:00, 45.56it/s]

Epoch 7  Loss 1.235971  Acc 0.657878



100%|██████████| 1974/1974 [00:44<00:00, 44.51it/s]

Epoch 8  Loss 1.161078  Acc 0.675820



100%|██████████| 1974/1974 [00:43<00:00, 45.68it/s]

Epoch 9  Loss 1.096026  Acc 0.691521



100%|██████████| 1974/1974 [00:44<00:00, 44.31it/s]

Epoch 10  Loss 1.038808  Acc 0.705696



100%|██████████| 1974/1974 [00:44<00:00, 44.29it/s]

Epoch 11  Loss 0.988151  Acc 0.718406



100%|██████████| 1974/1974 [00:45<00:00, 43.77it/s]

Epoch 12  Loss 0.942649  Acc 0.729912



100%|██████████| 1974/1974 [00:44<00:00, 44.27it/s]

Epoch 13  Loss 0.902020  Acc 0.740264



100%|██████████| 1974/1974 [00:43<00:00, 44.91it/s]

Epoch 14  Loss 0.864805  Acc 0.749970



100%|██████████| 1974/1974 [00:43<00:00, 45.18it/s]

Epoch 15  Loss 0.831406  Acc 0.758669



100%|██████████| 1974/1974 [00:44<00:00, 44.42it/s]

Epoch 16  Loss 0.800684  Acc 0.766786



100%|██████████| 1974/1974 [00:44<00:00, 44.72it/s]

Epoch 17  Loss 0.772631  Acc 0.774187



100%|██████████| 1974/1974 [00:44<00:00, 43.96it/s]

Epoch 18  Loss 0.746672  Acc 0.781133



100%|██████████| 1974/1974 [00:44<00:00, 44.20it/s]

Epoch 19  Loss 0.723037  Acc 0.787453
Training Done
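
For readers who want to see what the loss and accuracy in the log measure, here is a minimal pure-Python sketch of sparse categorical cross-entropy computed from logits, together with frame accuracy. The function names and toy numbers are invented for illustration; the training loop itself relies on the Keras and exkaldi functions.

```python
import math

def sparse_ce_from_logits(logits, label):
    """Cross-entropy of one frame: -log softmax(logits)[label]."""
    peak = max(logits)   # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]

def argmax(values):
    """Index of the largest value (the first one in case of ties)."""
    return max(range(len(values)), key=values.__getitem__)

batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.0, 1.0, 1.0]]
batch_labels = [0, 2, 1]

frame_losses = [sparse_ce_from_logits(l, y) for l, y in zip(batch_logits, batch_labels)]
mean_loss = sum(frame_losses) / len(frame_losses)
accuracy = sum(argmax(l) == y for l, y in zip(batch_logits, batch_labels)) / len(batch_labels)

print(f"loss {mean_loss:.4f}  acc {accuracy:.4f}")
```

The third frame has uniform logits, so its loss equals log(3), the loss of a blind guess among three classes.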





Save the model to a file.

In [25]:
tfModelFile = os.path.join(outDir, "dnn.h5")

model.save(tfModelFile, include_optimizer=False)

Now we compute the network output for the test data, which will be used for decoding. We process the test features in the same way as the training features.

In [26]:
testFeatFile = os.path.join(dataDir, "exp", "test_mfcc.ark")

testFeat = exkaldi.load_feat(testFeatFile)

testFeat = testFeat.to_numpy()

testFeat = testFeat.normalize(std=True)

In [27]:
prob = {}
for utt, mat in testFeat.items:
    logits = model(mat, training=False)
    prob[utt] = logits.numpy()

prob = exkaldi.load_prob(prob)

prob

<exkaldi.core.achivements.NumpyProbability at 0x7f481fb70f98>

___prob___ is an exkaldi __NumpyProbability__ object. Save it to a file; we will decode it in the next step.

In [28]:
probFile = os.path.join(outDir, "amp.npy")

prob.save(probFile)

'librispeech_dummy/exp/train_DNN/amp.npy'
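
Note that the saved matrices hold raw logits, because the last layer of the model has no softmax. Should a later step need log-probabilities instead, a numerically stable log-softmax can be sketched in NumPy as follows. The function below is a generic illustration, not an exkaldi API.

```python
import numpy as np

def log_softmax(logits, axis=1):
    """Log-probabilities from logits, stabilized by subtracting the row maximum."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

toy_logits = np.array([[2.0, 0.5, -1.0],
                       [1000.0, 1000.0, 1000.0]])   # large values would overflow a naive exp
log_probs = log_softmax(toy_logits)

# Each row exponentiates to a proper probability distribution.
print(np.allclose(np.exp(log_probs).sum(axis=1), 1.0))   # True
```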