# Welcome to Exkaldi

In this section, we will train a monophone HMM-GMM model.

In [None]:
import exkaldi

import os
dataDir = "librispeech_dummy"

First, prepare the lexicons. We have already generated a LexiconBank object and saved it to a file (3_prepare_lexicons), so restore it directly.

In [None]:
lexFile = os.path.join(dataDir, "exp", "lexicons.lex")

lexicons = exkaldi.decode.graph.load_lex(lexFile)

lexicons

Then we need to prepare the HMM-GMM topology file and the acoustic feature data to initialize a monophone HMM-GMM.

In [None]:
topoFile = os.path.join(dataDir, "exp", "topo")

exkaldi.hmm.make_toponology(lexicons, outFile=topoFile)
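To give a feeling for what a topology describes, here is a minimal sketch in plain Python that does not use Exkaldi. It builds the transition matrix of a left-to-right HMM in which every emitting state can either loop on itself or move to the next state. The function name, the three emitting states, and the self-loop probability of 0.75 are assumptions chosen for illustration; the file written by Exkaldi may use different values.

```python
def left_to_right_transitions(num_states=3, self_loop=0.75):
    """Return the transition matrix of a left-to-right HMM (illustrative only).

    num_states and self_loop are assumed values, not read from a topo file.
    """
    size = num_states + 1  # one extra non-emitting final state
    matrix = [[0.0] * size for _ in range(size)]
    for s in range(num_states):
        matrix[s][s] = self_loop            # stay in the same state
        matrix[s][s + 1] = 1.0 - self_loop  # advance to the next state
    return matrix


for row in left_to_right_transitions():
    print(row)
```

The last row stays all zeros because the final state has no outgoing transitions.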

In an earlier step (2_feature_processing), we made MFCC features. Now use them.

In [None]:
featFile = os.path.join(dataDir, "exp", "mfcc.ark")

feat = exkaldi.load_feat(featFile, name="mfcc")

feat

Now, make an HMM-GMM model.

In [None]:
model0 = exkaldi.hmm.MonophoneHMM(lexicons=lexicons, name="mono")

model0

___model0___ is an Exkaldi __MonophoneHMM__ object. Exkaldi has monophone HMM-GMM and triphone HMM-GMM implementations. We will introduce the latter in later tutorial steps.

At this point, the model is empty and cannot be used. We must initialize its architecture.

In [None]:
model0.initialize(feat=feat, topoFile=topoFile)

model0.info

Next, we train this model. Exkaldi provides a high-level API, __model0.train(...)__, that runs the whole training in one call, but we first walk through the basic training steps one by one.

### Train HMM-GMM in detail

#### 1. Prepare the int-ID format transcription.

Training actually uses the transcription in int-ID format, so we first convert the text format to int-ID format.

In [None]:
transFile = os.path.join(dataDir, "train", "text")
oov = lexicons("oov")

trans = exkaldi.hmm.transcription_to_int(transFile, lexicons, oov)

type(trans)

___trans___ is an exkaldi __Transcription__ object, which is designed to hold the transcription. We save the int-ID transcription for later use.

In [None]:
intTransFile = os.path.join(dataDir, "exp", "text.int")

trans.save(intTransFile)

Take a look at the transcription.

In [None]:
trans.subset(nHead=1)
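To make the conversion concrete, here is a minimal sketch in plain Python that does not use Exkaldi. It maps each word of a Kaldi-style transcription line (utterance ID followed by words) to an integer and falls back to the ID of the OOV word for unknown words. The function name and the tiny word table are hypothetical and only serve as an illustration.

```python
def words_to_int(line, word_to_id, oov_word):
    """Convert one 'utt-id word word ...' line to 'utt-id id id ...'."""
    utt_id, *words = line.split()
    oov_id = word_to_id[oov_word]
    # Words missing from the table are replaced by the OOV word's ID.
    ids = [str(word_to_id.get(w, oov_id)) for w in words]
    return " ".join([utt_id] + ids)


# Hypothetical word table, not taken from the tutorial's lexicons.
word_to_id = {"<eps>": 0, "<UNK>": 1, "hello": 2, "world": 3}
print(words_to_int("utt-001 hello brave world", word_to_id, "<UNK>"))
# prints: utt-001 2 1 3
```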

#### 2. Compile the train graph.

Compile the training graph. This requires the lexicon FST (L.fst) file. We generated one in an earlier step (3_prepare_lexicons), so use it now.

In [None]:
Lfile = os.path.join(dataDir, "exp", "L_disambig.fst")

Even though a decision tree is not really needed when training a monophone HMM-GMM, Kaldi still requires one.

When the monophone HMM is initialized, a temporary tree is generated automatically. Use it directly.

In [None]:
tree = model0.tree

tree

___tree___ is an exkaldi DecisionTree object. In the next step, we will introduce how to build a regular decision tree. For now, skip it.

In [None]:
outDir = os.path.join(dataDir, "exp", "train_mono")

exkaldi.utils.make_dependent_dirs(outDir, False)

In [None]:
trainGraphFile = os.path.join(outDir, "train_graph")

model0.compile_train_graph(tree=tree, transcription=trans, LFile=Lfile, outFile=trainGraphFile)

#### 3. Align acoustic features equally in order to start the first training step.

Kaldi aligns the features equally in the first step.

In [None]:
ali = model0.align_equally(feat, trainGraphFile)

ali

___ali___ is an exkaldi BytesAlignmentTrans object.

You can convert it to NumPy format to check it.

In [None]:
ali.subset(nHead=1).to_numpy(aliType="transitionID", hmm=model0).data
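As an illustration of the idea behind equal alignment, here is a minimal sketch in plain Python that does not use Exkaldi. It spreads the frames of one utterance as evenly as possible over a linear sequence of states. This is a simplification: the real alignment works on the compiled training graph, and the function name and state IDs are hypothetical.

```python
def align_equally(num_frames, state_sequence):
    """Assign every frame to a state so that states get near-equal frame counts."""
    num_states = len(state_sequence)
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    # Frame t is mapped to state index floor(t * num_states / num_frames).
    return [state_sequence[t * num_states // num_frames] for t in range(num_frames)]


print(align_equally(7, [10, 20, 30]))
# prints: [10, 10, 10, 20, 20, 30, 30]
```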

#### 4. Use the alignment to accumulate the statistics needed to update the model parameters.

In [None]:
statsFile = os.path.join(outDir, "stats.acc")

model0.accumulate_stats(feat=feat, alignment=ali, outFile=statsFile)

#### 5. Use these statistics to update model parameters.

This step can also increase the number of Gaussians. Here we ask for 10 more Gaussians.

In [None]:
targetGaussians = model0.info.gaussians + 10

model0.update(statsFile, numgauss=targetGaussians)

model0.info
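Steps 4 and 5 can be illustrated with a minimal sketch in plain Python that does not use Exkaldi. It assumes a single diagonal Gaussian per state and an alignment that gives one state index per frame. It accumulates the count, the sum, and the squared sum per state, then derives means and variances from them. The function names and the variance floor are hypothetical, and increasing the number of Gaussians is not covered here.

```python
def accumulate_stats(features, alignment, num_states):
    """Accumulate per-state counts, sums, and squared sums of the frames."""
    dim = len(features[0])
    counts = [0] * num_states
    sums = [[0.0] * dim for _ in range(num_states)]
    sq_sums = [[0.0] * dim for _ in range(num_states)]
    for frame, state in zip(features, alignment):
        counts[state] += 1
        for d, value in enumerate(frame):
            sums[state][d] += value
            sq_sums[state][d] += value * value
    return counts, sums, sq_sums


def update_gaussians(counts, sums, sq_sums, var_floor=1e-3):
    """Re-estimate the mean and diagonal variance of every state."""
    means, variances = [], []
    for n, s, q in zip(counts, sums, sq_sums):
        n = max(n, 1)  # avoid division by zero for states without frames
        mean = [x / n for x in s]
        var = [max(x2 / n - m * m, var_floor) for x2, m in zip(q, mean)]
        means.append(mean)
        variances.append(var)
    return means, variances


# Four 2-dimensional frames; the first two belong to state 0, the rest to state 1.
feats = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 14.0]]
stats = accumulate_stats(feats, [0, 0, 1, 1], num_states=2)
print(update_gaussians(*stats))
```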

In the next training step, use Viterbi alignment instead of equal alignment.

#### 6. Align acoustic features with the Viterbi algorithm.

In [None]:
del ali

In [None]:
ali = model0.align(feat=feat, trainGraphFile=trainGraphFile)

ali

In [None]:
ali.subset(nHead=1).to_numpy(aliType="transitionID", hmm=model0).data
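The idea of Viterbi alignment can be sketched in plain Python without Exkaldi. The sketch assumes a left-to-right chain of states where each frame either stays in the current state or advances to the next one, and it ignores transition probabilities. The function name and the toy log-likelihoods are hypothetical; the real alignment runs on the compiled training graph.

```python
def viterbi_align(log_likes):
    """Best state path, where log_likes[t][s] scores frame t under state s."""
    num_frames, num_states = len(log_likes), len(log_likes[0])
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    neg_inf = float("-inf")
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = log_likes[0][0]  # the path must start in the first state
    for t in range(1, num_frames):
        for s in range(num_states):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else neg_inf
            if advance > stay:
                best, back[t][s] = advance, s - 1
            else:
                best, back[t][s] = stay, s
            if best > neg_inf:
                score[t][s] = best + log_likes[t][s]
    # Trace back from the last state, where the path must end.
    path = [num_states - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


print(viterbi_align([[0, -5], [-1, -4], [-6, -1], [-7, 0]]))
# prints: [0, 0, 1, 1]
```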

Instead of running these steps by hand, we can use the high-level API to train the model.

### Train HMM-GMM with high-level API

In [None]:
os.remove(trainGraphFile)
os.remove(statsFile)
del ali
del trans

We train for 10 iterations.

In [None]:
model0.train(feat, transFile, Lfile, tempDir=outDir, num_iters=10, max_iter_inc=8, totgauss=500)
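The parameter names above resemble those of Kaldi's train_mono.sh recipe, where the number of Gaussians grows by a fixed increment during the first max_iter_inc iterations until it reaches totgauss. The following plain-Python sketch shows such a schedule under that assumption. It is not taken from Exkaldi's source, and the function name and the starting value of 100 Gaussians are hypothetical.

```python
def gaussian_schedule(start, totgauss, num_iters, max_iter_inc):
    """Target number of Gaussians for each iteration (assumed convention)."""
    increment = (totgauss - start) // max_iter_inc
    targets, current = [], start
    for iteration in range(1, num_iters + 1):
        targets.append(current)
        if iteration <= max_iter_inc:
            current += increment  # grow only during the first iterations
    return targets


print(gaussian_schedule(start=100, totgauss=500, num_iters=10, max_iter_inc=8))
# prints: [100, 150, 200, 250, 300, 350, 400, 450, 500, 500]
```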

In [None]:
model0.info

The final model and alignment are saved to files automatically. You can also save them manually.

In [None]:
#modelFile = os.path.join(outDir, "0.mdl")
#model0.save(modelFile)