# Welcome to Exkaldi

In this section, we will train a triphone HMM-GMM on delta features.

In [1]:
import exkaldi

import os
dataDir = "librispeech_dummy"

First, instantiate a triphone model object.

In [2]:
model = exkaldi.hmm.TriphoneHMM()

model

<exkaldi.hmm.hmm.TriphoneHMM at 0x7f94a49bd2b0>

The ___model___ is not usable yet: we have to initialize its data first, using the following files generated in earlier steps.

- ___tree___ and ___treeStats___: generated in 6_train_decision_tree
- ___topo___: generated in 5_train_mono_HMM-GMM

In [5]:
treeFile = os.path.join(dataDir, "exp", "train_delta", "tree")
treeStatsFile = os.path.join(dataDir, "exp", "train_delta", "treeStats.acc")
topoFile = os.path.join(dataDir, "exp", "topo")

In [7]:
tree = exkaldi.hmm.load_tree(treeFile)

In [8]:
model.initialize(tree=tree, treeStatsFile=treeStatsFile, topoFile=topoFile)

model.info

ModelInfo(phones=69, pdfs=272, transitionIds=698, transitionStates=349, dimension=39, gaussians=272)

Training a triphone HMM follows almost the same steps as training a monophone HMM, except that we do not use equal alignment for the first iteration.
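For comparison, equal alignment, as used for the monophone model's first iteration, splits each utterance's frames evenly over the HMM states it must pass through, which gives a crude starting point. The triphone model can skip this because it starts from the monophone alignment. The sketch below is a simplified illustration for one utterance (states given as indices, leftover frames given to the last states); `equal_align` is a hypothetical helper, not an exkaldi function.

```python
from typing import List


def equal_align(num_frames: int, num_states: int) -> List[int]:
    """Split num_frames as evenly as possible over num_states left-to-right states.

    Simplified illustration of equal alignment: every state gets
    num_frames // num_states frames, and the remaining frames are given
    one each to the last states.
    """
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    base, extra = divmod(num_frames, num_states)
    alignment: List[int] = []
    for state in range(num_states):
        # the last `extra` states receive one additional frame
        length = base + (1 if state >= num_states - extra else 0)
        alignment.extend([state] * length)
    return alignment


print(equal_align(10, 3))  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
```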

Below, we briefly walk through each training step.

### Training in detail

First, we need alignment data that matches the triphone model. One option is to convert the latest alignment produced by the monophone model into an alignment for the triphone model. This requires the final monophone alignment, the final monophone model and the decision tree.

In [9]:
aliFile = os.path.join(dataDir, "exp", "train_mono", "final.ali")
monoFile = os.path.join(dataDir, "exp", "train_mono", "final.mdl")

In [11]:
newAli = exkaldi.hmm.convert_alignment(
    alignment=aliFile,
    originHmm=monoFile,
    targetHmm=model,
    tree=treeFile,
)
newAli

<exkaldi.core.achivements.BytesAlignmentTrans at 0x7f948d8d8588>
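Conceptually, the conversion keeps the monophone segmentation of each utterance and relabels every frame with a context-dependent unit chosen by the decision tree. The sketch below illustrates only that idea on (phone, state) labels; the real conversion works on Kaldi transition-ids. `to_triphone_labels`, `toy_tree` and the `#` boundary symbol are hypothetical, and `toy_tree` performs no state tying at all.

```python
from typing import Callable, List, Tuple

Frame = Tuple[str, int]  # (monophone, HMM state index) for one frame
BOUNDARY = "#"           # hypothetical symbol for "no neighbouring phone"


def toy_tree(left: str, center: str, right: str, state: int) -> str:
    """Hypothetical stand-in for a decision tree: one label per full context."""
    return f"{left}-{center}+{right}.{state}"


def to_triphone_labels(ali: List[Frame],
                       tree: Callable[[str, str, str, int], str]) -> List[str]:
    """Relabel a frame-level monophone alignment with triphone units."""
    # Pass 1: recover the phone sequence and the segment each frame belongs to.
    # A new segment starts when the phone changes or the state index goes back.
    phones: List[str] = []
    seg_ids: List[int] = []
    prev = None
    for phone, state in ali:
        if prev is None or phone != prev[0] or state < prev[1]:
            phones.append(phone)
        seg_ids.append(len(phones) - 1)
        prev = (phone, state)
    # Pass 2: keep the segmentation, but ask the "tree" for a context-dependent label.
    labels = []
    for (phone, state), i in zip(ali, seg_ids):
        left = phones[i - 1] if i > 0 else BOUNDARY
        right = phones[i + 1] if i + 1 < len(phones) else BOUNDARY
        labels.append(tree(left, phone, right, state))
    return labels


mono_ali = [("a", 0), ("a", 1), ("b", 0), ("b", 1), ("b", 1), ("a", 0), ("a", 1)]
print(to_triphone_labels(mono_ali, toy_tree))
```

The two `a` segments receive different labels (`#-a+b` and `b-a+#`) because their neighbours differ, even though the monophone alignment treats them identically.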

Alternatively, we can align the features directly with the new triphone model.

In the following steps, we take this approach and review how to train an HMM-GMM model and use it to align acoustic features.

In [12]:
del newAli

Prepare the lexicons (generated in 3_prepare_lexicons).

In [13]:
lexFile = os.path.join(dataDir, "exp", "lexicons.lex")

lexicons = exkaldi.decode.graph.load_lex(lexFile)

lexicons

<exkaldi.decode.graph.LexiconBank at 0x7f948d8d8a90>

Prepare the int-format transcription (generated in 5_train_mono_HMM-GMM).

In [14]:
intTransFile = os.path.join(dataDir, "exp", "text.int")

trans = exkaldi.load_trans(intTransFile)

trans.subset(nHead=1)

{'103-1240-0000': '201 875 800 1004 744 653 1239 800 1004 744 725 671 1395 1268 96 751 1064 328 348 648 4 724 588 501 1416 36 53 687 367 53 1314 177 4 168'}
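Each integer stands for a word ID from the lexicon's word table. The sketch below shows the mapping with a hypothetical three-entry table and an assumed `<UNK>` entry for out-of-vocabulary words; in this tutorial the real IDs come with the lexicons, and `words_to_int` is an illustrative helper, not an exkaldi function.

```python
from typing import Dict


def words_to_int(text: str, word2id: Dict[str, int], oov: str = "<UNK>") -> str:
    """Map a space-separated word transcription to space-separated word IDs."""
    # words missing from the table fall back to the ID of the OOV symbol
    return " ".join(str(word2id.get(word, word2id[oov])) for word in text.split())


# hypothetical word table; the real one belongs to the lexicons
word2id = {"<UNK>": 1, "HELLO": 5, "WORLD": 9}
print(words_to_int("HELLO BIG WORLD", word2id))  # 5 1 9
```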

Prepare the L.fst file (generated in 3_prepare_lexicons).

In [15]:
Lfile = os.path.join(dataDir, "exp", "L.fst")

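Prepare the features (generated in 2_feature_processing). Adding deltas of order 2 appends first- and second-order delta coefficients to every frame, which is why the feature dimension becomes 39.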

In [21]:
featFile = os.path.join(dataDir, "exp", "train_mfcc_cmvn.ark")
feat = exkaldi.load_feat(featFile)
feat = feat.add_delta(order=2)

feat.dim

39
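With order 2, each 13-dimensional frame gains 13 first-order and 13 second-order coefficients. The pure-Python sketch below mirrors the windowed regression used by Kaldi's add-deltas tool (window 2, frames beyond the utterance edges replaced by the first or last frame), assuming exkaldi's `add_delta` uses the same formula and defaults. The function names here are illustrative, not exkaldi API.

```python
from typing import List

Matrix = List[List[float]]


def delta_scales(order: int, window: int = 2) -> List[List[float]]:
    """Filter taps for orders 0..order; order i is order i-1 convolved with the delta window."""
    scales = [[1.0]]  # order 0: the static features themselves
    normalizer = float(sum(j * j for j in range(-window, window + 1)))
    for _ in range(order):
        prev = scales[-1]
        prev_off = (len(prev) - 1) // 2
        cur_off = prev_off + window
        cur = [0.0] * (len(prev) + 2 * window)
        for j in range(-window, window + 1):
            for k in range(-prev_off, prev_off + 1):
                cur[j + k + cur_off] += j * prev[k + prev_off]
        scales.append([c / normalizer for c in cur])
    return scales


def add_delta(feats: Matrix, order: int = 2, window: int = 2) -> Matrix:
    """Append delta features of orders 1..order to each frame."""
    scales = delta_scales(order, window)
    num_frames, dim = len(feats), len(feats[0])
    out: Matrix = []
    for t in range(num_frames):
        row: List[float] = []
        for taps in scales:
            max_off = (len(taps) - 1) // 2
            acc = [0.0] * dim
            for j in range(-max_off, max_off + 1):
                # frames outside the utterance are replaced by the first/last frame
                src = feats[min(max(t + j, 0), num_frames - 1)]
                for d in range(dim):
                    acc[d] += taps[j + max_off] * src[d]
            row.extend(acc)
        out.append(row)
    return out


# 9 frames of 13-dim "MFCCs" that rise linearly over time
mfcc = [[float(t)] * 13 for t in range(9)]
feat39 = add_delta(mfcc, order=2)
print(len(feat39[0]))           # 39
print(round(feat39[4][13], 6))  # 1.0: slope of the ramp at the middle frame
```

For a linear ramp, the first-order delta in the interior equals the slope and the second-order delta is zero, which is a quick sanity check of the taps.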

#### 1. Compile a new training graph.

In [17]:
outDir = os.path.join(dataDir, "exp", "train_delta")

exkaldi.utils.make_dependent_dirs(outDir, pathIsFile=False)

In [18]:
trainGraphFile = os.path.join(outDir, "train_graph")

model.compile_train_graph(
    tree=treeFile,
    transcription=trans,
    LFile=Lfile,
    outFile=trainGraphFile,
    lexicons=lexicons,
)

'/misc/Work19/wangyu/exkaldi-1.2/tutorials/librispeech_dummy/exp/train_delta/train_graph'
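A training graph encodes, for each utterance, which HMM state sequences are consistent with its transcription. Its backbone is the word-to-phone-to-state expansion sketched below. The real graph written by `compile_train_graph` is an FST built from the tree, the model, L.fst and the transcription, and it can also allow self-loops, alternative pronunciations and optional silence, all of which this sketch ignores. The lexicon, the three states per phone and `linear_state_sequence` are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Lexicon = Dict[str, List[List[str]]]  # word -> list of pronunciations


def linear_state_sequence(words: List[str], lexicon: Lexicon,
                          states_per_phone: int = 3) -> List[Tuple[str, int]]:
    """Expand a word transcription into the left-to-right (phone, state) path."""
    path: List[Tuple[str, int]] = []
    for word in words:
        pron = lexicon[word][0]  # simplification: first pronunciation only
        for phone in pron:
            for state in range(states_per_phone):
                path.append((phone, state))
    return path


# hypothetical two-word lexicon
lexicon = {"HELLO": [["HH", "AH", "L", "OW"]], "WORLD": [["W", "ER", "L", "D"]]}
path = linear_state_sequence(["HELLO", "WORLD"], lexicon)
print(len(path), path[:3])  # 24 [('HH', 0), ('HH', 1), ('HH', 2)]
```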

#### 2. Align the acoustic features.

In [22]:
ali = model.align(feat, trainGraphFile, lexicons=lexicons)

ali

<exkaldi.core.achivements.BytesAlignmentTrans at 0x7f948cd32a20>
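Alignment picks, for every frame, the state on the best path through the training graph. The sketch below shows the core Viterbi idea on a single left-to-right path with one 1-D Gaussian per state. It assumes uniform transition probabilities (so only emission log-likelihoods are scored) and must start in the first state and end in the last; `viterbi_align` and the toy data are hypothetical.

```python
import math
from typing import List, Tuple

Gaussian = Tuple[float, float]  # (mean, variance) of one state's 1-D Gaussian


def gaussian_loglik(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_align(feats: List[float], states: List[Gaussian]) -> List[int]:
    """Best state index per frame for a left-to-right HMM (stay or advance by one)."""
    T, S = len(feats), len(states)
    if T < S:
        raise ValueError("need at least one frame per state")
    NEG = float("-inf")
    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = gaussian_loglik(feats[0], *states[0])  # must start in state 0
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else NEG
            prev_s, best = (s, stay) if stay >= move else (s - 1, move)
            if best == NEG:
                continue  # state s is unreachable at frame t
            score[t][s] = best + gaussian_loglik(feats[t], *states[s])
            back[t][s] = prev_s
    # must end in the last state; follow back-pointers to frame 0
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


feats = [0.1, -0.2, 0.0, 5.1, 4.9, 5.0, 9.8, 10.2]
states = [(0.0, 1.0), (5.0, 1.0), (10.0, 1.0)]
print(viterbi_align(feats, states))  # [0, 0, 0, 1, 1, 1, 2, 2]
```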

#### 3. Accumulate statistics.

In [23]:
statsFile = os.path.join(outDir, "stats.acc")

model.accumulate_stats(feat=feat, alignment=ali, outFile=statsFile)

'/misc/Work19/wangyu/exkaldi-1.2/tutorials/librispeech_dummy/exp/train_delta/stats.acc'
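A GMM accumulator stores zeroth-, first- and second-order statistics for each Gaussian. With one Gaussian per state and a hard alignment (one state per frame), these reduce to counts, sums and sums of squares, as the sketch below shows for hypothetical 1-D features. `accumulate_stats` is illustrative only; the file written by `model.accumulate_stats` uses Kaldi's own format.

```python
from collections import defaultdict
from typing import Dict, List


def accumulate_stats(feats: List[float], alignment: List[int]) -> Dict[int, List[float]]:
    """Per-state [count, sum of x, sum of x^2] from a hard (one state per frame) alignment."""
    stats: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for x, state in zip(feats, alignment):
        acc = stats[state]
        acc[0] += 1.0      # zeroth-order statistic: occupancy count
        acc[1] += x        # first-order statistic
        acc[2] += x * x    # second-order statistic
    return dict(stats)


print(accumulate_stats([1.0, 2.0, 3.0, 10.0], [0, 0, 0, 1]))
# {0: [3.0, 6.0, 14.0], 1: [1.0, 10.0, 100.0]}
```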

#### 4. Update HMM-GMM parameters.

In [25]:
targetGaussians = 300

model.update(statsFile, targetGaussians)

model.info

ModelInfo(phones=69, pdfs=272, transitionIds=698, transitionStates=349, dimension=39, gaussians=300)
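From the accumulated statistics, the maximum-likelihood update sets each mean to sum/count and each variance to the mean of squares minus the squared mean. The sketch below does this for one 1-D Gaussian per state with a hypothetical fixed variance floor. It omits the other half of `model.update`: splitting Gaussians toward `targetGaussians`, which is why the count above grew from 272 to 300.

```python
from typing import Dict, List, Tuple


def update_gaussians(stats: Dict[int, List[float]],
                     var_floor: float = 1e-3) -> Dict[int, Tuple[float, float]]:
    """Maximum-likelihood (mean, variance) per state from [count, sum x, sum x^2]."""
    params = {}
    for state, (count, sum_x, sum_x2) in stats.items():
        mean = sum_x / count
        # floor the variance so states seen with (nearly) constant data stay usable
        var = max(sum_x2 / count - mean * mean, var_floor)
        params[state] = (mean, var)
    return params


params = update_gaussians({0: [3.0, 6.0, 14.0], 1: [1.0, 10.0, 100.0]})
print(params)  # state 0: mean 2.0, variance about 0.667; state 1: variance floored to 0.001
```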

### Training at a high level

In this part, we show how to train the triphone model directly with a single high-level call.

In [26]:
del model
del ali
del trans

os.remove(trainGraphFile)
os.remove(statsFile)

Some file paths and objects defined above are reused here.

First, initialize the model. The lexicons can be passed as an optional parameter.

In [27]:
model = exkaldi.hmm.TriphoneHMM(lexicons=lexicons)

model.initialize(tree=tree, treeStatsFile=treeStatsFile, topoFile=topoFile)

model.info

ModelInfo(phones=69, pdfs=272, transitionIds=698, transitionStates=349, dimension=39, gaussians=272)

Then train it.

In [29]:
outDir = os.path.join(dataDir, "exp", "train_delta")
transFile = os.path.join(dataDir, "train", "text")

model.train(
    feat=feat,
    transcription=transFile,
    LFile=Lfile,
    tree=tree,
    tempDir=outDir,
    num_iters=10,
    max_iter_inc=8,
    totgauss=1500,
)

Start to train triphone model.
Start Time: 2020/06/21-20:27:34
Convert transcription to int value format.
Compiling training graph.
Iter >> 1
Aligning data
Accumulate GMM statistics
Update GMM parameter
Used time: 1.3642 seconds
Iter >> 2
Aligning data
Accumulate GMM statistics
Update GMM parameter
Used time: 1.3717 seconds
Iter >> 3
Aligning data
Accumulate GMM statistics
Update GMM parameter
Used time: 1.4882 seconds
Iter >> 4
Aligning data
Accumulate GMM statistics
Update GMM parameter
Used time: 1.5767 seconds
Iter >> 5
Aligning data
Accumulate GMM statistics
Update GMM parameter
Used time: 1.6665 seconds
Iter >> 6
Aligning data
Accumulate GMM statistics
Update GMM parameter
Used time: 1.7272 seconds
Iter >> 7
Aligning data
Accumulate GMM statistics
Update GMM parameter
Used time: 1.6726 seconds
Iter >> 8
Aligning data
Accumulate GMM statistics
Update GMM parameter
Used time: 1.7892 seconds
Iter >> 9
Aligning data
Accumulate GMM statistics
Update GMM parameter
Used time: 1.8318 sec

The final model and alignment are saved to files automatically. Let's look at the final model information.

In [30]:
model.info

ModelInfo(phones=69, pdfs=272, transitionIds=698, transitionStates=349, dimension=39, gaussians=1650)
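The `num_iters`, `max_iter_inc` and `totgauss` arguments resemble the options of Kaldi's steps/train_deltas.sh recipe, where the Gaussian count starts at one per pdf and grows by a fixed increment after each pass up to pass `max_iter_inc`, aiming at roughly `totgauss`. The sketch below reproduces that schedule, assuming exkaldi's arguments mean the same thing; `gaussian_schedule` is a hypothetical helper. Its nine passes match the nine iterations in the log above, but the 1650 Gaussians reported by `model.info` are somewhat above the sketch's final 1496, so exkaldi's exact schedule evidently differs in detail.

```python
from typing import List


def gaussian_schedule(num_leaves: int, totgauss: int,
                      num_iters: int, max_iter_inc: int) -> List[int]:
    """Target Gaussian count for each update pass, modelled on Kaldi's steps/train_deltas.sh."""
    incgauss = (totgauss - num_leaves) // max_iter_inc  # per-pass increment
    numgauss = num_leaves                               # start with one Gaussian per pdf
    schedule = []
    for it in range(1, num_iters):                      # passes 1 .. num_iters-1
        schedule.append(numgauss)                       # target used at this pass
        if it <= max_iter_inc:
            numgauss += incgauss
    return schedule


print(gaussian_schedule(num_leaves=272, totgauss=1500, num_iters=10, max_iter_inc=8))
# [272, 425, 578, 731, 884, 1037, 1190, 1343, 1496]
```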