# Welcome to Exkaldi

In this section, we build a decision tree, which is required before a triphone model can be trained.

In [1]:
import exkaldi

import os
dataDir = "librispeech_dummy"

Restore the lexicons generated in an earlier step (3_prepare_lexicons).

In [2]:
lexFile = os.path.join(dataDir, "exp", "lexicons.lex")

lexicons = exkaldi.decode.graph.load_lex(lexFile)

lexicons

<exkaldi.decode.graph.LexiconBank at 0x7f8f100842b0>

Then instantiate a __DecisionTree__ object, passing ___lexicons___ as a parameter.

In [3]:
tree = exkaldi.hmm.DecisionTree(lexicons=lexicons,contextWidth=3,centralPosition=1)

tree

<exkaldi.hmm.hmm.DecisionTree at 0x7f8f10084208>
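The `contextWidth=3, centralPosition=1` arguments describe a triphone window: three phones, with the phone being modelled at zero-based index 1. The following is a minimal sketch of that idea in plain Python, not Exkaldi code. The helper name `context_windows` and the `<eps>` padding symbol are hypothetical and used only for illustration.

```python
def context_windows(phones, context_width=3, central_position=1, pad="<eps>"):
    """Return one context window per phone in `phones`.

    `central_position` is the zero-based index of the modelled phone
    inside the window; `pad` is a placeholder for missing context at
    the utterance boundaries (an illustrative choice).
    """
    left = central_position
    right = context_width - central_position - 1
    padded = [pad] * left + list(phones) + [pad] * right
    return [tuple(padded[i:i + context_width]) for i in range(len(phones))]


windows = context_windows(["k", "ae", "t"])
for window in windows:
    print(window)
```

With `context_width=1, central_position=0` the same helper yields single-phone windows, which corresponds to the monophone case.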

Then prepare the acoustic features, the HMM model and the alignment.

In [4]:
featFile = os.path.join(dataDir, "exp", "train_mfcc_cmvn.ark")
feat = exkaldi.load_feat(featFile)
feat = feat.add_delta(order=2)

feat.dim

39
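A dimension of 39 after `add_delta(order=2)` implies a 13-dimensional base feature with first- and second-order deltas appended (13 × 3). The sketch below illustrates this with a simplified regression-based delta in plain Python. It is an assumption-laden illustration, not Exkaldi's implementation: the window size of 2 and the edge handling (repeating the first and last frame) are choices made here, and the function names are hypothetical.

```python
def delta(frames, window=2):
    """Regression-based delta of a list of frames (each a list of floats)."""
    num_frames = len(frames)
    norm = 2 * sum(n * n for n in range(1, window + 1))
    result = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                # Clamp indices so edge frames reuse the first/last frame.
                nxt = frames[min(t + n, num_frames - 1)][d]
                prev = frames[max(t - n, 0)][d]
                acc += n * (nxt - prev)
            row.append(acc / norm)
        result.append(row)
    return result


def append_deltas(frames, order=2):
    """Concatenate the base features with deltas up to `order`."""
    blocks = [frames]
    for _ in range(order):
        blocks.append(delta(blocks[-1]))
    return [sum((block[t] for block in blocks), []) for t in range(len(frames))]


# Five dummy frames of 13 coefficients, each rising linearly over time.
base = [[float(t)] * 13 for t in range(5)]
expanded = append_deltas(base, order=2)
print(len(expanded[0]))
```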

The monophone HMM model and alignment were generated in the previous step (5_train_mono_HMM-GMM), so we use them directly.

In [5]:
hmmFile = os.path.join(dataDir, "exp", "train_mono", "final.mdl")

aliFile = os.path.join(dataDir, "exp", "train_mono", "final.ali")

As with HMM training, Exkaldi provides a high-level API for training the tree, but we first walk through the individual training steps.

### Train Decision Tree in detail

#### 1. Accumulate statistics

In [6]:
outDir = os.path.join(dataDir, "exp", "train_delta")

exkaldi.utils.make_dependent_dirs(outDir, False)

In [7]:
treeStatsFile = os.path.join(outDir, "treeStats.acc")

tree.accumulate_stats(feat, hmmFile, aliFile, outFile=treeStatsFile)

'/misc/Work19/wangyu/exkaldi-1.2/tutorials/librispeech_dummy/exp/train_delta/treeStats.acc'
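The contents of the statistics file are not shown here. As a rough mental model, and as an assumption rather than a description of Exkaldi's internals, tree statistics in Kaldi-style systems are Gaussian sufficient statistics (count, sum, sum of squares) collected per phonetic context and HMM state from the aligned frames. The sketch below shows that accumulation in plain Python. The function name and the `(context, state)` key layout are hypothetical.

```python
def accumulate_tree_stats(frames, contexts):
    """Collect count, sum and sum of squares for each (context, state) key.

    `frames[t]` is a feature vector and `contexts[t]` is the key that the
    alignment assigns to frame t (an illustrative layout).
    """
    stats = {}
    for feat, key in zip(frames, contexts):
        if key not in stats:
            stats[key] = {
                "count": 0,
                "sum": [0.0] * len(feat),
                "sumsq": [0.0] * len(feat),
            }
        entry = stats[key]
        entry["count"] += 1
        for d, x in enumerate(feat):
            entry["sum"][d] += x
            entry["sumsq"][d] += x * x
    return stats


frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
contexts = [
    (("k", "ae", "t"), 0),
    (("k", "ae", "t"), 0),
    (("k", "ae", "t"), 1),
]
tree_stats = accumulate_tree_stats(frames, contexts)
print(tree_stats[(("k", "ae", "t"), 0)])
```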

#### 2. Cluster phones and compile questions.

In [8]:
topoFile = os.path.join(dataDir, "exp", "topo")

questionsFile = os.path.join(outDir, "questions.qst")

tree.compile_questions(treeStatsFile, topoFile, outFile=questionsFile)

'/misc/Work19/wangyu/exkaldi-1.2/tutorials/librispeech_dummy/exp/train_delta/questions.qst'

#### 3. Build tree.

In [9]:
targetLeaves = 300

tree.build(treeStatsFile, questionsFile, targetLeaves, topoFile)

<exkaldi.hmm.hmm.DecisionTree at 0x7f8f10084208>

The decision tree has now been built. Inspect it.

In [10]:
tree.info

TreeInfo(numPdfs=272, contextWidth=3, centralPosition=1)
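The reported `numPdfs` (272) is lower than the requested 300, which is consistent with the target acting as an upper limit rather than an exact count. To give an intuition for how a question is chosen at a tree node, the sketch below scores a split by the gain in log-likelihood when each side is modelled by its own diagonal Gaussian. This is a generic illustration of likelihood-based splitting under stated assumptions, not Exkaldi's or Kaldi's code. The function names, the variance floor and the toy statistics are all hypothetical.

```python
import math


def gaussian_loglike(count, total, total_sq, var_floor=1e-4):
    """Log-likelihood of data under its own ML diagonal Gaussian."""
    if count == 0:
        return 0.0
    loglike = 0.0
    for s, sq in zip(total, total_sq):
        mean = s / count
        raw_var = max(sq / count - mean * mean, 0.0)
        var = max(raw_var, var_floor)  # floor avoids log(0)
        loglike += -0.5 * count * (math.log(2 * math.pi * var) + raw_var / var)
    return loglike


def merge(stats_list):
    """Sum (count, sum, sumsq) triples; an empty list gives empty stats."""
    if not stats_list:
        return 0, [], []
    dim = len(stats_list[0][1])
    count = sum(s[0] for s in stats_list)
    total = [sum(s[1][d] for s in stats_list) for d in range(dim)]
    total_sq = [sum(s[2][d] for s in stats_list) for d in range(dim)]
    return count, total, total_sq


def split_gain(stats, question):
    """Likelihood gain from splitting `stats` by membership in `question`."""
    yes = merge([s for phone, s in stats.items() if phone in question])
    no = merge([s for phone, s in stats.items() if phone not in question])
    parent = merge(list(stats.values()))
    return (gaussian_loglike(*yes) + gaussian_loglike(*no)
            - gaussian_loglike(*parent))


# Toy 1-D statistics keyed by left-context phone: (count, sum, sumsq).
toy_stats = {
    "m": (10, [10.0], [11.0]),
    "n": (10, [12.0], [15.4]),
    "s": (10, [50.0], [251.0]),
    "z": (10, [52.0], [271.4]),
}
gain_nasal = split_gain(toy_stats, {"m", "n"})
gain_mixed = split_gain(toy_stats, {"m", "s"})
print(gain_nasal > gain_mixed)
```

In the toy data, the nasal question separates the two clusters of means, so it yields a much larger gain than a question that mixes them.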

Save the tree to file.

In [11]:
treeFile = os.path.join(outDir, "tree")

tree.save(treeFile)

'/misc/Work19/wangyu/exkaldi-1.2/tutorials/librispeech_dummy/exp/train_delta/tree'

As mentioned above, Exkaldi also provides a high-level API that builds the tree directly.

### Train Decision Tree with the high-level API

In [12]:
del tree
os.remove(treeStatsFile)
os.remove(questionsFile)
os.remove(treeFile)

In [13]:
tree = exkaldi.hmm.DecisionTree(lexicons=lexicons,contextWidth=3,centralPosition=1)

tree.train(feat=feat, hmm=hmmFile, alignment=aliFile, topoFile=topoFile, numleaves=300, tempDir=outDir)

Start to build decision tree.
Start Time: 2020/06/21-20:15:17
Accumulate tree statistics
Cluster phones and compile questions
Build tree
Done to build the decision tree.
Saved Final Tree: librispeech_dummy/exp/train_delta/tree
End Time: 20200621-201518


The tree has been saved automatically in the directory given as ___tempDir___.

In [14]:
tree.info

TreeInfo(numPdfs=272, contextWidth=3, centralPosition=1)