# Welcome to Exkaldi

In this section, we will train a monophone HMM-GMM model.

In [1]:
import exkaldi

import os
dataDir = os.path.join("..","examplesdata","librispeech_dummy")

First, prepare the lexicons. We already generated a LexiconBank object and saved it to a file (3_prepare_lexicons), so we restore it directly.

In [2]:
lexFile = os.path.join(dataDir, "exp", "lexicons.lex")

lexicons = exkaldi.decode.graph.load_lex(lexFile)

lexicons

<exkaldi.decode.graph.LexiconBank at 0x7f0685966a58>

Then we need to make the HMM-GMM topology file and load the acoustic feature data in order to initialize a monophone HMM-GMM.

In [3]:
topoFile = os.path.join(dataDir, "exp", "topo")

exkaldi.hmm.make_toponology(lexicons, outFile=topoFile)

'/misc/Work19/wangyu/exkaldi-1.0/examplesdata/librispeech_dummy/exp/topo'

In an earlier step (2_feature_processing), we computed MFCC features. Now we load and use them.

In [4]:
featFile = os.path.join(dataDir, "exp", "mfcc.ark")

feat = exkaldi.load_feat(featFile, name="mfcc")

feat

<exkaldi.core.achivements.BytesFeature at 0x7f0685853f60>

Now, make an HMM-GMM model.

In [5]:
model0 = exkaldi.hmm.MonophoneHMM(lexicons=lexicons, name="mono")

model0

<exkaldi.hmm.hmm.MonophoneHMM at 0x7f0685853c88>

___model0___ is an Exkaldi __MonophoneHMM__ object. Exkaldi provides both monophone and triphone HMM-GMM objects. We will introduce the latter in later tutorial steps.

At this point the model is empty and cannot be used yet. We must initialize its architecture.

In [6]:
model0.initialize(feat=feat, topoFile=topoFile)

model0.info

ModelInfo(phones=69, pdfs=211, transitionIds=438, transitionStates=211, dimension=117, gaussians=211)

Next we train this model. Exkaldi provides a high-level API, __model0.train(...)__, that trains the model in a single call, but we first walk through the basic training procedure step by step.

### Train HMM-GMM in detail

#### 1. Prepare the int-ID format transcription.

Training actually uses the transcription in int-ID format, so we first convert the text format to int-ID format.

In [7]:
transFile = os.path.join(dataDir, "train", "text")
oov = lexicons("oov")

trans = exkaldi.hmm.transcription_to_int(transFile, lexicons, oov)

type(trans)

exkaldi.core.achivements.Transcription

___trans___ is an Exkaldi __Transcription__ object, which is designed to hold transcriptions. We save the int-format transcription for later use.

In [8]:
intTransFile = os.path.join(dataDir, "exp", "text.int")

trans.save(intTransFile)

'/misc/Work19/wangyu/exkaldi-1.0/examplesdata/librispeech_dummy/exp/text.int'

Take a look at the transcription.

In [9]:
trans.subset(nHead=1)

{'103-1240-0000': '201 875 800 1004 744 653 1239 800 1004 744 725 671 1395 1268 96 751 1064 328 348 648 4 724 588 501 1416 36 53 687 367 53 1314 177 4 168'}
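To make the conversion concrete, here is a minimal sketch of the idea in plain Python. It is not Exkaldi's implementation: the function name, the toy word table, and the `<UNK>` symbol are all hypothetical, and the real IDs come from the word table held by the lexicons.

```python
def transcription_to_int_sketch(transcription, word_table, oov_word):
    """Map each word of each utterance to its integer ID.

    Words missing from word_table fall back to the ID of oov_word.
    """
    oov_id = word_table[oov_word]
    result = {}
    for utt_id, text in transcription.items():
        ids = [word_table.get(word, oov_id) for word in text.split()]
        result[utt_id] = " ".join(str(i) for i in ids)
    return result


# Hypothetical toy vocabulary, only for illustration.
toy_words = {"<UNK>": 1, "HELLO": 2, "WORLD": 3}
toy_trans = {"utt-001": "HELLO WORLD", "utt-002": "HELLO THERE"}

toy_int_trans = transcription_to_int_sketch(toy_trans, toy_words, "<UNK>")
print(toy_int_trans)
```

In this toy example the word `THERE` is not in the vocabulary, so it is replaced by the ID of the out-of-vocabulary symbol.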

#### 2. Compile the training graph.

Compiling the training graph requires a lexicon FST file. In an earlier step (3_prepare_lexicons), we generated one, so we use it now.

In [10]:
Lfile = os.path.join(dataDir, "exp", "L_disambig.fst")

Even though a decision tree is not really needed when training a monophone HMM-GMM, Kaldi still requires one.

When the monophone HMM is initialized, a temporary tree is generated automatically. We use it directly.

In [11]:
tree = model0.tree

tree

<exkaldi.hmm.hmm.DecisionTree at 0x7f06f4d18c88>

___tree___ is an Exkaldi DecisionTree object. In the next step, we will introduce how to build a regular decision tree. For now, we skip it.

In [12]:
outDir = os.path.join(dataDir, "exp", "train_mono")

exkaldi.utils.make_dependent_dirs(outDir, False)

In [13]:
trainGraphFile = os.path.join(outDir, "train_graph")

model0.compile_train_graph(tree=tree, transcription=trans, LFile=Lfile, outFile=trainGraphFile)

'/misc/Work19/wangyu/exkaldi-1.0/examplesdata/librispeech_dummy/exp/train_mono/train_graph'

#### 3. Align the acoustic features equally in order to start the first training step.

Kaldi aligns the features equally in the first step, because no trained model is available yet to produce a better alignment.

In [14]:
ali = model0.align_equally(feat, trainGraphFile)

ali

<exkaldi.core.achivements.BytesAlignmentTrans at 0x7f06f4c8bcf8>

___ali___ is an exkaldi BytesAlignmentTrans object.

You can convert it to NumPy format to inspect it.

In [15]:
ali.subset(nHead=1).to_numpy(aliType="transitionID", hmm=model0).data

{'103-1240-0000': array([  2,   1,   1, ..., 279, 282, 281], dtype=int32)}
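The idea of equal alignment can be sketched in a few lines of plain Python. This is a simplified illustration, not what Exkaldi or Kaldi actually runs: the real tool works on the compiled training graph and produces transition IDs, while the hypothetical `align_equally_sketch` below just splits the frames of one utterance into near-equal contiguous segments, one per state.

```python
def align_equally_sketch(num_frames, states):
    """Split num_frames into len(states) contiguous, near-equal segments.

    Returns one state label per frame. The first (num_frames % len(states))
    states receive one extra frame each.
    """
    if num_frames < len(states):
        raise ValueError("need at least one frame per state")
    base, extra = divmod(num_frames, len(states))
    alignment = []
    for index, state in enumerate(states):
        length = base + (1 if index < extra else 0)
        alignment.extend([state] * length)
    return alignment


# Ten frames shared among three hypothetical states.
toy_equal_ali = align_equally_sketch(10, ["a", "b", "c"])
print(toy_equal_ali)
```

Such an alignment is crude, but it gives every state some frames, which is enough to estimate initial parameters.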

#### 4. Use the alignment to accumulate the statistics needed to update the model parameters.

In [16]:
statsFile = os.path.join(outDir, "stats.acc")

model0.accumulate_stats(feat=feat, alignment=ali, outFile=statsFile)

'/misc/Work19/wangyu/exkaldi-1.0/examplesdata/librispeech_dummy/exp/train_mono/stats.acc'

#### 5. Use these statistics to update model parameters.

This step can also increase the number of Gaussians. Here we ask for 10 more Gaussians.

In [17]:
targetGaussians = model0.info.gaussians + 10

model0.update(statsFile, numgauss=targetGaussians)

model0.info

ModelInfo(phones=69, pdfs=211, transitionIds=438, transitionStates=211, dimension=117, gaussians=221)
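Steps 4 and 5 can be illustrated with a toy example. The sketch below assumes one-dimensional features and a single Gaussian per state, and it leaves out transition statistics and Gaussian splitting, so it is far simpler than what Kaldi does. The function names and the variance floor are hypothetical.

```python
from collections import defaultdict


def accumulate_stats_sketch(frames, alignment):
    """Collect count, sum and sum of squares of the frames assigned to each state."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for value, state in zip(frames, alignment):
        entry = stats[state]
        entry[0] += 1
        entry[1] += value
        entry[2] += value * value
    return dict(stats)


def update_sketch(stats, var_floor=1e-3):
    """Re-estimate (mean, variance) per state from the accumulated statistics."""
    params = {}
    for state, (count, total, total_sq) in stats.items():
        mean = total / count
        # Floor the variance so that it never collapses to zero.
        variance = max(total_sq / count - mean * mean, var_floor)
        params[state] = (mean, variance)
    return params


toy_frames = [1.0, 3.0, 10.0, 12.0, 14.0]
toy_ali = ["s0", "s0", "s1", "s1", "s1"]

toy_stats = accumulate_stats_sketch(toy_frames, toy_ali)
toy_params = update_sketch(toy_stats)
print(toy_params)
```

Keeping accumulation and update separate means the statistics can be gathered in one pass over the data and saved to a file, as the `stats.acc` file is here.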

In the next training step, we use Viterbi alignment instead of equal alignment.

#### 6. Align the acoustic features with the Viterbi algorithm.

In [18]:
del ali

In [19]:
ali = model0.align(feat=feat, trainGraphFile=trainGraphFile)

ali

<exkaldi.core.achivements.BytesAlignmentTrans at 0x7f06f4d9be10>

In [20]:
ali.subset(nHead=1).to_numpy(aliType="transitionID", hmm=model0).data

{'103-1240-0000': array([146, 148, 150, ..., 277, 280, 282], dtype=int32)}
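To show what Viterbi alignment does differently from equal alignment, here is a minimal sketch in plain Python. It assumes one-dimensional features, one Gaussian per state, and a strict left-to-right state sequence. It also ignores transition probabilities and beam pruning, so it is an illustration of the technique and not Kaldi's aligner. All names are hypothetical.

```python
import math


def log_gauss(x, mean, variance):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * variance) + (x - mean) ** 2 / variance)


def viterbi_align_sketch(frames, state_params):
    """Best left-to-right state path: start in state 0, end in the last state,
    and at each frame either stay in the same state or advance by one."""
    num_frames, num_states = len(frames), len(state_params)
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    neg_inf = float("-inf")
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = log_gauss(frames[0], *state_params[0])
    for t in range(1, num_frames):
        for s in range(num_states):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else neg_inf
            best_prev, best_score = (s, stay) if stay >= advance else (s - 1, advance)
            if best_score == neg_inf:
                continue  # state s cannot be reached at frame t
            score[t][s] = best_score + log_gauss(frames[t], *state_params[s])
            back[t][s] = best_prev
    # Trace back from the final state at the last frame.
    path = [num_states - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Two hypothetical states given as (mean, variance).
toy_state_params = [(2.0, 1.0), (12.0, 2.0)]
toy_path = viterbi_align_sketch([1.0, 2.5, 11.0, 12.0, 12.5, 13.0], toy_state_params)
print(toy_path)
```

An equal split would give three frames to each state. Here the path moves to the second state after two frames, because the third frame fits the second Gaussian much better.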

Instead of repeating these steps by hand, we can use the high-level API to train the model.

### Train HMM-GMM with high-level API

In [21]:
os.remove(trainGraphFile)
os.remove(statsFile)
del ali
del trans

We train for 10 iterations.

In [22]:
model0.train(feat, transFile, Lfile, tempDir=outDir, num_iters=10, max_iter_inc=8, totgauss=500)

Start to train mono model.
Start Time: 2020/06/02-11:50:59
Convert transcription to int value format.
Compiling training graph.

Iter 0
Aligning data equally >> Accumulate GMM statistics >> Update GMM parameters

Iter 1
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 2
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 3
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 4
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 5
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 6
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 7
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 8
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 9
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 10
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Align last time with final model.

D

In [23]:
model0.info

ModelInfo(phones=69, pdfs=211, transitionIds=438, transitionStates=211, dimension=117, gaussians=527)
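The arguments `max_iter_inc` and `totgauss` control how the number of Gaussians grows during training. The sketch below shows one possible linear schedule, similar in spirit to the one used in Kaldi's monophone training script. Whether Exkaldi's `train` follows exactly this rule is an assumption (note that the run above ends with 527 Gaussians although `totgauss=500`), and the function name and toy numbers are hypothetical.

```python
def gaussian_schedule_sketch(start_gauss, tot_gauss, num_iters, max_iter_inc):
    """Target Gaussian count used at each iteration under a linear schedule.

    The target grows by a fixed increment after each of the first
    max_iter_inc iterations and then stays constant.
    """
    increment = (tot_gauss - start_gauss) // max_iter_inc
    targets = []
    current = start_gauss
    for iteration in range(1, num_iters + 1):
        targets.append(current)
        if iteration <= max_iter_inc:
            current += increment
    return targets


toy_schedule = gaussian_schedule_sketch(
    start_gauss=100, tot_gauss=500, num_iters=10, max_iter_inc=8
)
print(toy_schedule)
```

Growing the model gradually lets each new Gaussian be trained on a reasonable alignment before more are added.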

The final model and alignment are saved to files automatically. You can also save them manually.

In [24]:
#modelFile = os.path.join(outDir, "0.mdl")
#model0.save(modelFile)