# Welcome to Exkaldi

In this section, we will train a triphone HMM-GMM model, up to the train_delta step.

In [1]:
import exkaldi

import os
dataDir = os.path.join("..","examplesdata","librispeech_dummy")

First, create a triphone model object.

In [2]:
model1 = exkaldi.hmm.TriphoneHMM()

model1

<exkaldi.hmm.hmm.TriphoneHMM at 0x7f778c5ec0f0>

___model1___ is not usable yet. We have to initialize its data first.

We will use these files, which were generated in earlier steps.

___tree___ and ___treeStats___: generated in 6_train_decision_tree

___topo___: generated in 5_train_mono_HMM-GMM

In [3]:
treeFile = os.path.join(dataDir, "exp", "train_tree", "tree")
treeStatsFile = os.path.join(dataDir, "exp", "train_tree", "treeStats.acc")
topoFile = os.path.join(dataDir, "exp", "topo")
numgauss = 1000
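
All three inputs come from earlier notebooks, so a missing file usually means a previous step was skipped. The sketch below shows one way to check for that before initializing the model. `find_missing` is a hypothetical helper, not part of exkaldi, and the demo uses temporary files rather than the real paths.

```python
import os
import tempfile

def find_missing(paths):
    """Return the subset of `paths` that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]

# Demo with throwaway files; in the notebook you would pass
# [treeFile, treeStatsFile, topoFile] instead.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "tree")
    absent = os.path.join(tmp, "topo")
    open(present, "w").close()          # create an empty placeholder file
    missing = find_missing([present, absent])
    missing_names = [os.path.basename(p) for p in missing]

print(missing_names)  # ['topo']
```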

In [4]:
model1.initialize(tree=treeFile, treeStatsFile=treeStatsFile, topoFile=topoFile, numgauss=numgauss)

model1.info

ModelInfo(phones=69, pdfs=784, transitionIds=1854, transitionStates=919, dimension=117, gaussians=1000)

The training steps for the triphone HMM are almost the same as for the monophone HMM, except that we do not start from an equal alignment.
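
To make the difference concrete: an equal alignment splits the frames of an utterance evenly across its states, which is the fallback when no trained model exists yet. The sketch below is a simplified illustration on plain state indices (the real procedure works on the training graph), and `equal_align` is a hypothetical helper, not an exkaldi function. A triphone model can skip this because an alignment from the monophone model is already available.

```python
def equal_align(num_frames, num_states):
    """Assign each frame to a state so that states get near-equal spans."""
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    # Frame i goes to the state whose span covers position i / num_frames.
    return [i * num_states // num_frames for i in range(num_frames)]

alignment = equal_align(num_frames=10, num_states=3)
print(alignment)  # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```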

We will introduce the training steps in a nutshell.

### Training in detail

As the first step, we must generate new alignment data. One way is to convert the latest alignment produced by the monophone model into an alignment that corresponds to the triphone model. This uses the existing alignment file and the final monophone model.

In [5]:
aliFile = os.path.join(dataDir, "exp", "train_mono", "final.ali")
monoFile = os.path.join(dataDir, "exp", "train_mono", "final.mdl")

In [6]:
newAli = exkaldi.hmm.convert_alignment(aliFile, monoFile, model1, treeFile)

newAli

<exkaldi.core.achivements.BytesAlignmentTrans at 0x7f777d97a1d0>

Alternatively, you can align the features again directly with the new triphone model. In the following steps, we review how to train an HMM-GMM model and use it to align acoustic features.

In [7]:
del newAli

Prepare the lexicons (generated in 3_prepare_lexicons).

In [8]:
lexFile = os.path.join(dataDir, "exp", "lexicons.lex")

lexicons = exkaldi.decode.graph.load_lex(lexFile)

lexicons

<exkaldi.decode.graph.LexiconBank at 0x7f777d97a128>

Prepare the int-format transcription (generated in 5_train_mono_HMM-GMM).

In [9]:
intTransFile = os.path.join(dataDir, "exp", "text.int")

trans = exkaldi.load_trans(intTransFile)

trans.subset(nHead=1)

{'103-1240-0000': '201 875 800 1004 744 653 1239 800 1004 744 725 671 1395 1268 96 751 1064 328 348 648 4 724 588 501 1416 36 53 687 367 53 1314 177 4 168'}
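
Each value is a space-separated string of word IDs. If you need the IDs as integers, for example to look them up in a word table, a plain-Python conversion is enough. This sketch copies the entry shown above into a literal dict instead of using the exkaldi object, and `to_int_ids` is a hypothetical helper.

```python
def to_int_ids(transcription):
    """Map each utterance ID to a list of integer word IDs."""
    return {utt: [int(tok) for tok in text.split()]
            for utt, text in transcription.items()}

# The single entry printed above, copied as a plain dict.
sample = {
    "103-1240-0000": "201 875 800 1004 744 653 1239 800 1004 744 725 671 "
                     "1395 1268 96 751 1064 328 348 648 4 724 588 501 1416 "
                     "36 53 687 367 53 1314 177 4 168"
}

ids = to_int_ids(sample)["103-1240-0000"]
print(ids[:5], len(ids))
```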

Prepare the L.fst file, here L_disambig.fst (generated in 3_prepare_lexicons).

In [10]:
Lfile = os.path.join(dataDir, "exp", "L_disambig.fst")

Prepare the acoustic features (generated in 2_feature_processing).

In [11]:
featFile = os.path.join(dataDir, "exp", "mfcc.ark")

feat = exkaldi.load_feat(featFile)

feat

<exkaldi.core.achivements.BytesFeature at 0x7f777d773f98>

#### 1. Compile a new training graph.

In [12]:
outDir = os.path.join(dataDir, "exp", "train_delta")

exkaldi.utils.make_dependent_dirs(outDir, False)

In [13]:
trainGraphFile = os.path.join(outDir, "train_graph")

model1.compile_train_graph(tree=treeFile, transcription=trans, LFile=Lfile, outFile=trainGraphFile, lexicons=lexicons)

'/misc/Work19/wangyu/exkaldi-1.0/examplesdata/librispeech_dummy/exp/train_delta/train_graph'

#### 2. Align acoustic features.

In [14]:
ali = model1.align(feat, trainGraphFile, lexicons=lexicons)

ali

<exkaldi.core.achivements.BytesAlignmentTrans at 0x7f777d4b3208>

#### 3. Accumulate statistics.

In [16]:
statsFile = os.path.join(outDir, "stats.acc")

model1.accumulate_stats(feat=feat, alignment=ali, outFile=statsFile)

'/misc/Work19/wangyu/exkaldi-1.0/examplesdata/librispeech_dummy/exp/train_delta/stats.acc'

#### 4. Update HMM-GMM parameters.

In [17]:
targetGaussians = 1100

model1.update(statsFile, targetGaussians)

model1.info

ModelInfo(phones=69, pdfs=784, transitionIds=1854, transitionStates=919, dimension=117, gaussians=1100)
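
Steps 2 to 4 together form one training iteration, and repeating them is the core of GMM training. The sketch below wraps them in a loop using only the method calls shown above (`align`, `accumulate_stats`, `update`). `run_iterations` is a hypothetical helper, and `_StubModel` is a stand-in so that the block runs without exkaldi or data. Whether this matches what the library's own training routine does internally is an assumption.

```python
def run_iterations(model, feat, graph_file, stats_file, lexicons, targets):
    """Run one align -> accumulate -> update pass per entry in `targets`."""
    for target_gaussians in targets:
        ali = model.align(feat, graph_file, lexicons=lexicons)
        model.accumulate_stats(feat=feat, alignment=ali, outFile=stats_file)
        model.update(stats_file, target_gaussians)
    return model

class _StubModel:
    """Stand-in that only records calls; NOT an exkaldi class."""
    def __init__(self):
        self.calls = []
        self.gaussians = None

    def align(self, feat, graph_file, lexicons=None):
        self.calls.append("align")
        return "ali"

    def accumulate_stats(self, feat, alignment, outFile):
        self.calls.append("accumulate_stats")
        return outFile

    def update(self, stats_file, target_gaussians):
        self.calls.append("update")
        self.gaussians = target_gaussians

# Dry run with placeholder arguments and illustrative targets.
stub = run_iterations(_StubModel(), "feat", "train_graph", "stats.acc", None, [1100, 1200])
print(stub.calls, stub.gaussians)
```

With the real objects from this notebook, the call would look like `run_iterations(model1, feat, trainGraphFile, statsFile, lexicons, [1100, 1200])`, where the target values are only illustrative.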

### High-level training

In this part, we show how to train the triphone model directly with a single call.

In [18]:
del model1
del ali
del trans

os.remove(trainGraphFile)
os.remove(statsFile)

Some file paths and objects defined above will be reused here.

First, initialize the model. The lexicons can be given as an optional parameter.

In [20]:
model1 = exkaldi.hmm.TriphoneHMM(lexicons=lexicons)

model1.initialize(tree=treeFile, treeStatsFile=treeStatsFile, topoFile=topoFile, numgauss=numgauss)

model1.info

ModelInfo(phones=69, pdfs=784, transitionIds=1854, transitionStates=919, dimension=117, gaussians=1000)

Then train it.

In [22]:
outDir = os.path.join(dataDir, "exp", "train_delta")
transFile = os.path.join(dataDir, "train", "text")

model1.train(feat=feat, transcription=transFile, LFile=Lfile, tree=treeFile, tempDir=outDir, 
             num_iters=10, max_iter_inc=8, totgauss=1500)

Start to train triphone model.
Start Time: 2020/06/02-12:17:46
Convert transcription to int value format.
Compiling training graph.

Iter 1
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 2
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 3
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 4
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 5
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 6
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 7
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 8
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 9
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Iter 10
Aligning data >> Accumulate GMM statistics >> Update GMM parameter

Align last time with final model.

Done to train the triphone model.
Saved Final Model: ../examplesdata/librispeech_

The final model and alignment have been saved to files automatically. Let's look at the final model information.

In [23]:
model1.info

ModelInfo(phones=69, pdfs=784, transitionIds=1854, transitionStates=919, dimension=117, gaussians=1565)
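
The `num_iters`, `max_iter_inc`, and `totgauss` arguments mirror variable names in Kaldi's `train_deltas.sh` recipe, where the Gaussian target grows by a fixed integer step after each of the first `max_iter_inc` iterations. The sketch below reproduces only that recipe's arithmetic. That exkaldi follows the same schedule is an assumption, and the count it reports (1565 above) need not equal the values computed here. `gaussian_schedule` is a hypothetical helper.

```python
def gaussian_schedule(numgauss, totgauss, max_iter_inc, num_iters):
    """Target Gaussian count used at each update, in the style of the Kaldi recipe."""
    # Integer step, matching the shell script's integer arithmetic.
    incgauss = (totgauss - numgauss) // max_iter_inc
    targets = []
    for x in range(1, num_iters):       # updates on iterations 1 .. num_iters - 1
        targets.append(numgauss)
        if x <= max_iter_inc:           # stop growing after max_iter_inc iterations
            numgauss += incgauss
    return targets

schedule = gaussian_schedule(numgauss=1000, totgauss=1500, max_iter_inc=8, num_iters=10)
print(schedule)
# [1000, 1062, 1124, 1186, 1248, 1310, 1372, 1434, 1496]
```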