# Welcome to ExKaldi

In this section, we will prepare various lexicons.

In [None]:
import exkaldi

import os
dataDir = "librispeech_dummy"

In ExKaldi, most lexicons are generated automatically once a pronunciation file is provided.  
You can specify the silence words; if you provide them as a list, each word is given a pronunciation symbol identical to the word itself.  
You can likewise specify a symbol for OOV words; if you provide it as a list, it is given a pronunciation symbol identical to the word itself.  
In this tutorial, we only make position-independent lexicons.
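
The list-versus-dict rule for special words can be pictured with a small sketch. The helper `normalize_special_words` is hypothetical (it is not part of ExKaldi) and only illustrates the behaviour described above: a list entry is pronounced as itself, while a dict gives each word an explicit pronunciation.

```python
def normalize_special_words(spec):
    """Return a {word: pronunciation} dict from a list/tuple or a dict.

    Hypothetical helper for illustration only; not ExKaldi's implementation.
    """
    if isinstance(spec, dict):
        # Explicit pronunciations are kept as given.
        return dict(spec)
    # A plain list means "pronounce each word as itself".
    return {word: word for word in spec}


print(normalize_special_words(["<SIL>", "<SPN>"]))
print(normalize_special_words({"<UNK>": "<SPN>"}))
```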

In [None]:
lexiconFile = os.path.join(dataDir, "pronunciation.txt")

silWords={"<SIL>":"<SIL>",  # silence and its pronunciation
          "<SPN>":"<SPN>"}  # spoken noise and its pronunciation
unkSymbol={"<UNK>":"<SPN>"}  # unknown symbol and its pronunciation
optionalSilPhone = "<SIL>"  # optional silence

lexicons = exkaldi.decode.graph.lexicon_bank(lexiconFile,
                                             silWords,
                                             unkSymbol, 
                                             optionalSilPhone, 
                                             positionDependent = False,
                                             shareSilPdf = False )

lexicons

___lexicons___ is an exkaldi __LexiconBank__ object. It is designed to manage all lexicons.  
Use __.view__ to show the names of all generated lexicons.

In [None]:
lexicons.view

You can retrieve a specific lexicon by calling the object with its name. In particular, if you call "words" or "phones", it will return an exkaldi __ListTable__ object (a subclass of Python dict).

In [None]:
lexicons("silence_phones")

In [None]:
type(lexicons("words"))

All lexicons can be dumped to a file in Kaldi text format. Some lexicons can also be saved in their int-value format.

In [None]:
outFile = os.path.join(dataDir, "exp", "words.txt")

exkaldi.utils.make_dependent_dirs(path=outFile, pathIsFile=True)

lexicons.dump_dict(name="words", fileName=outFile, dumpInt=False)
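
For orientation, a Kaldi-style symbol table in text format holds one `symbol id` pair per line. The sketch below writes such a table for a toy word list. The helper name, the toy words, and the ID ordering (`<eps>` first, then the remaining symbols sorted) are assumptions for illustration and may differ from what `dump_dict` actually writes.

```python
import io


def write_symbol_table(symbols, stream):
    """Write 'symbol id' lines to stream, reserving ID 0 for <eps>.

    Hypothetical helper; the ordering is an assumption, not ExKaldi's.
    """
    table = {"<eps>": 0}
    for sym in sorted(set(symbols) - {"<eps>"}):
        table[sym] = len(table)
    for sym, idx in table.items():
        stream.write(f"{sym} {idx}\n")
    return table


buffer = io.StringIO()  # stands in for a real file on disk
word_ids = write_symbol_table(["hello", "world", "<UNK>"], buffer)
print(buffer.getvalue())
```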

As mentioned above, the word-ID and phone-ID lexicons are made by default. You can reset them with your own file.

In [None]:
lexicons("phones")

In [None]:
# newPhonesFile = "myPhones.txt"

# lexicons.reset_phones(target=newPhonesFile)

After new lexicon probabilities have been generated, you can update the probabilities of all related lexicons.

In [None]:
# newProbFile = "newLexiconp.txt"

# lexicons.update_prob(newProbFile)
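
In Kaldi's `lexiconp.txt` convention, a probability lexicon puts a pronunciation probability between the word and its phones. A minimal parsing sketch follows; the helper name and the toy entries are made up for illustration, and this is not how `update_prob` is implemented.

```python
def parse_lexiconp(lines):
    """Parse 'word prob phone1 phone2 ...' lines into (word, prob, phones) tuples.

    Hypothetical helper for illustration only.
    """
    entries = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, prob, phones = fields[0], float(fields[1]), tuple(fields[2:])
        entries.append((word, prob, phones))
    return entries


toy_lines = ["read 0.6 R IY D", "read 0.4 R EH D", ""]
entries = parse_lexiconp(toy_lines)
for word, prob, phones in entries:
    print(word, prob, " ".join(phones))
```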

The __LexiconBank__ object is very useful in ExKaldi. It will be used in almost all training steps.

Now we make two lexicon FSTs: one without and one with disambiguation symbols.

In [None]:
Lfile = os.path.join(dataDir,"exp","L.fst")

exkaldi.decode.graph.make_L(lexicons, outFile=Lfile, useDisambigLexicon=False)

In [None]:
Lfile = os.path.join(dataDir,"exp","L_disambig.fst")

exkaldi.decode.graph.make_L(lexicons, outFile=Lfile, useDisambigLexicon=True)
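
To give a feel for what a disambiguated lexicon contains, here is a simplified sketch of the idea commonly used in Kaldi recipes: a pronunciation that is shared by several words, or that is a prefix of another pronunciation, gets an extra `#N` symbol appended so each word stays distinguishable. The function name and toy entries are hypothetical, silence probabilities are ignored, and the internals of `make_L` are not shown in this tutorial, so treat the details as assumptions.

```python
from collections import Counter


def add_disambig(lexicon):
    """Append #N symbols to ambiguous pronunciations.

    lexicon: list of (word, phones) pairs. Simplified illustration only.
    """
    prons = [tuple(phones) for _, phones in lexicon]
    counts = Counter(prons)
    # Every proper prefix of any pronunciation.
    prefixes = {p[:i] for p in prons for i in range(1, len(p))}
    last_used = {}
    result = []
    for word, phones in lexicon:
        pron = tuple(phones)
        if counts[pron] == 1 and pron not in prefixes:
            result.append((word, list(pron)))  # already unambiguous
        else:
            k = last_used.get(pron, 0) + 1
            last_used[pron] = k
            result.append((word, list(pron) + [f"#{k}"]))
    return result


toy = [("red", ["R", "EH", "D"]),
       ("read", ["R", "EH", "D"]),
       ("ready", ["R", "EH", "D", "IY"]),
       ("a", ["AH"])]
disambig = add_disambig(toy)
for word, phones in disambig:
    print(word, " ".join(phones))
```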

We can save this __LexiconBank__ object to a file.

In [None]:
lexFile = os.path.join(dataDir, "exp", "lexicons.lex")

lexicons.save(lexFile)

Besides the pronunciation __lexicon__ file, a __lexiconp__, __lexiconp_disambig__, __lexiconp_silprob__ or __lexiconp_silprob_disambig__ file can also be used to initialize a __LexiconBank__ object.