# Welcome to Exkaldi

In this section, we build a decision tree, which is required to train a triphone model.
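A rough count shows why the tree is needed. The sketch below assumes a hypothetical inventory of 40 phones with 3 HMM states each (neither number is taken from this dataset) and compares the number of possible context-dependent states with the 1000-leaf target used later in this section.

```python
# Hypothetical numbers for illustration only; not taken from the dataset.
num_phones = 40
states_per_phone = 3

# A triphone is a (left, center, right) combination of phones.
num_triphones = num_phones ** 3
num_triphone_states = num_triphones * states_per_phone

target_leaves = 1000  # the target used later in this notebook

print(num_triphones)                        # possible triphones
print(num_triphone_states)                  # possible context-dependent states
print(num_triphone_states / target_leaves)  # average states tied per leaf
```

Under these assumptions, most triphones would have little or no training data of their own, so the tree groups similar contexts and lets them share parameters.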

In [2]:
import exkaldi

import os
dataDir = os.path.join("..","examplesdata","librispeech_dummy")

Restore the lexicons generated in an earlier step (3_prepare_lexicons).

In [3]:
lexFile = os.path.join(dataDir, "exp", "lexicons.lex")

lexicons = exkaldi.decode.graph.load_lex(lexFile)

lexicons

<exkaldi.decode.graph.LexiconBank at 0x7fca6a21f208>

Then initialize a DecisionTree object. ___lexicons___ can be provided as a parameter.

In [4]:
tree = exkaldi.hmm.DecisionTree(lexicons=lexicons)

tree

<exkaldi.hmm.hmm.DecisionTree at 0x7fca6a200d30>

___tree___ is an exkaldi __DecisionTree__ object.

Then prepare the acoustic features, the HMM model, and the alignment.

In [6]:
featFile = os.path.join(dataDir, "exp", "mfcc.ark")

feat = exkaldi.load_feat(featFile)

feat

<exkaldi.core.achivements.BytesFeature at 0x7fcad15387f0>

The monophone HMM model and alignment were generated in the previous step (5_train_mono_HMM-GMM), so we use them directly.

In [7]:
hmmFile = os.path.join(dataDir, "exp", "train_mono", "final.mdl")

aliFile = os.path.join(dataDir, "exp", "train_mono", "final.ali")
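Because these files come from earlier steps, it can help to confirm they exist before training. The following is a minimal sketch using only the standard library; `find_missing` is a hypothetical helper, not part of exkaldi.

```python
import os

def find_missing(paths):
    """Return the paths from `paths` that are not existing files."""
    return [p for p in paths if not os.path.isfile(p)]

# Hypothetical usage with the variables defined above:
# missing = find_missing([featFile, hmmFile, aliFile])
# if missing:
#     raise FileNotFoundError(f"Run the earlier steps first; missing: {missing}")
```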

As with HMM training, exkaldi provides a high-level API for training the tree, but we first walk through the training steps in detail.

### Train Decision Tree in detail

#### 1. Accumulate statistics.

In [9]:
outDir = os.path.join(dataDir, "exp", "train_tree")

exkaldi.utils.make_dependent_dirs(outDir, False)

In [10]:
treeStatsFile = os.path.join(outDir, "treeStats.acc")

tree.accumulate_stats(feat, hmmFile, aliFile, outFile=treeStatsFile)

'/misc/Work19/wangyu/exkaldi-1.0/examplesdata/librispeech_dummy/exp/train_tree/treeStats.acc'

#### 2. Cluster phones and compile questions.

In [11]:
topoFile = os.path.join(dataDir, "exp", "topo")

questionsFile = os.path.join(outDir, "questions.qst")

tree.compile_questions(treeStatsFile, topoFile, outFile=questionsFile)

'/misc/Work19/wangyu/exkaldi-1.0/examplesdata/librispeech_dummy/exp/train_tree/questions.qst'

#### 3. Build tree.

We set the target number of tree leaves to 1000, which is more than the number of PDFs in the monophone model.

In [12]:
model0 = exkaldi.hmm.load_hmm(hmmFile, hmmType="monophone", name="mono")

model0.info.gaussians

527

In [13]:
targetLeaves = 1000

tree.build(treeStatsFile, questionsFile, targetLeaves, topoFile)

<exkaldi.hmm.hmm.DecisionTree at 0x7fca6a200d30>

The decision tree has been built. Inspect its info.

In [14]:
tree.info

TreeInfo(numPdfs=784, contextWidth=3, centralPosition=1)
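Here `contextWidth=3` with `centralPosition=1` describes a triphone window: one phone of left context, the central phone at index 1, and one phone of right context. The sketch below illustrates how such windows could be cut from a phone sequence. The phone IDs are hypothetical, and padding missing context with 0 is an assumption made for this illustration.

```python
def context_windows(phones, context_width=3, central_position=1, pad=0):
    """Return one context window (a tuple of phone IDs) per phone in `phones`."""
    left = central_position                       # phones of left context
    right = context_width - central_position - 1  # phones of right context
    padded = [pad] * left + list(phones) + [pad] * right
    return [tuple(padded[i:i + context_width]) for i in range(len(phones))]

# Hypothetical phone IDs for a three-phone word.
print(context_windows([5, 12, 7]))
# [(0, 5, 12), (5, 12, 7), (12, 7, 0)]
```

Each window is the kind of context the tree's questions are asked about when deciding which leaf a state belongs to.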

Save the tree to a file.

In [15]:
treeFile = os.path.join(outDir, "tree")

tree.save(treeFile)

'/misc/Work19/wangyu/exkaldi-1.0/examplesdata/librispeech_dummy/exp/train_tree/tree'
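The steps above can be collected into a single helper. This is only a sketch that repeats the calls shown in this section in order; `build_tree_stepwise` is a hypothetical name and not necessarily how exkaldi implements its own high-level training.

```python
import os

def build_tree_stepwise(tree, feat, hmmFile, aliFile, topoFile, outDir, targetLeaves=1000):
    """Run the three steps shown above, save the tree, and return its file path."""
    os.makedirs(outDir, exist_ok=True)
    treeStatsFile = os.path.join(outDir, "treeStats.acc")
    questionsFile = os.path.join(outDir, "questions.qst")
    treeFile = os.path.join(outDir, "tree")

    # 1. Accumulate statistics from the features, model, and alignment.
    tree.accumulate_stats(feat, hmmFile, aliFile, outFile=treeStatsFile)
    # 2. Cluster phones and compile questions.
    tree.compile_questions(treeStatsFile, topoFile, outFile=questionsFile)
    # 3. Build the tree, then save it.
    tree.build(treeStatsFile, questionsFile, targetLeaves, topoFile)
    tree.save(treeFile)
    return treeFile
```

Passing the `tree` object in as an argument keeps the helper independent of how the tree was constructed.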

As mentioned above, exkaldi also provides a high-level API to build the tree directly.

### Train Decision Tree with the high-level API

In [20]:
del tree
del model0

os.remove(treeStatsFile)
os.remove(questionsFile)
os.remove(treeFile)
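If the cleanup cell is re-run after the files are already gone, `os.remove` raises `FileNotFoundError`. A tolerant variant might look like the sketch below; `remove_if_exists` is a hypothetical helper using only the standard library.

```python
import os

def remove_if_exists(*paths):
    """Delete each path that exists; return the list of paths actually removed."""
    removed = []
    for path in paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            continue  # already gone, nothing to do
        removed.append(path)
    return removed

# Hypothetical usage with the variables defined above:
# remove_if_exists(treeStatsFile, questionsFile, treeFile)
```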

In [21]:
tree = exkaldi.hmm.DecisionTree(lexicons=lexicons)

tree.train(feat=feat, hmm=hmmFile, alignment=aliFile, topoFile=topoFile, numleaves=1000, tempDir=outDir)

Start to build decision tree.
Start Time: 2020/06/02-12:07:55
>> Accumulate tree statistics
>> Cluster phones and compile questions
>> Build tree
Done to build the decision tree.
Saved Final Tree: ../examplesdata/librispeech_dummy/exp/train_tree/tree
End Time: 20200602-120803


The tree has been saved to the output directory automatically.