# Welcome to Exkaldi

In this section, we show how to build lexicons.

In [3]:
import exkaldi

import os
dataDir = os.path.join("..","examplesdata","librispeech_dummy")

In exkaldi, most lexicons are generated automatically once a pronunciation file is provided.
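For orientation, here is a minimal sketch of what reading a pronunciation file involves, assuming the file follows the usual Kaldi lexicon layout (one entry per line: a word followed by its phones, separated by whitespace). It is plain Python and does not use exkaldi; the helper name `parse_pronunciations` and the sample entries are made up for illustration.

```python
from collections import defaultdict

def parse_pronunciations(text):
    """Parse 'word phone1 phone2 ...' lines into {word: [phone list, ...]}."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and entries without phones
        word, phones = fields[0], fields[1:]
        if phones not in lexicon[word]:  # ignore exact duplicate entries
            lexicon[word].append(phones)
    return dict(lexicon)

# Hypothetical sample content, not taken from the tutorial data
sample = """\
HELLO HH AH0 L OW1
HELLO HH EH0 L OW1
WORLD W ER1 L D
"""

parsed = parse_pronunciations(sample)
for word, prons in parsed.items():
    print(word, prons)
```

A word may appear on several lines, which is how alternative pronunciations are usually expressed.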

You can specify the silence words. If you provide them as a list instead of a dict, each word is assigned a pronunciation symbol identical to the word itself.

You can also specify a symbol for OOV (out-of-vocabulary) words. If you provide it as a list, it is likewise assigned a pronunciation symbol identical to the word itself.

In this tutorial, we only make position-independent lexicons.
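To make the distinction concrete: in the Kaldi convention, position-dependent phones carry a suffix marking where the phone sits in the word (`_B` begin, `_I` internal, `_E` end, `_S` singleton), while position-independent phones are left as they are. The sketch below shows that tagging in plain Python. The function name `add_position_tags` is hypothetical, and whether exkaldi applies exactly these suffixes is an assumption.

```python
def add_position_tags(phones):
    """Return a copy of `phones` with Kaldi-style word-position suffixes."""
    if not phones:
        return []
    if len(phones) == 1:
        return [phones[0] + "_S"]          # single-phone word
    tagged = [phones[0] + "_B"]            # word-initial phone
    tagged += [p + "_I" for p in phones[1:-1]]  # word-internal phones
    tagged.append(phones[-1] + "_E")       # word-final phone
    return tagged

print(add_position_tags(["HH", "AH0", "L", "OW1"]))
# ['HH_B', 'AH0_I', 'L_I', 'OW1_E']
print(add_position_tags(["AH0"]))
# ['AH0_S']
```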

In [4]:
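# Pronunciation file that the lexicons are generated from
lexiconFile = os.path.join(dataDir, "pronunciation.txt")

# Silence words mapped to their pronunciation symbols
silWords = {"<SIL>": "<SIL>",
            "<SPN>": "<SPN>"}
# OOV symbol mapped to its pronunciation symbol
unkSymbol = {"<UNK>": "<SPN>"}
optionalSilPhone = "<SIL>"

lexicons = exkaldi.decode.graph.lexicon_bank(lexiconFile,
                                             silWords,
                                             unkSymbol,
                                             optionalSilPhone,
                                             positionDependent=False,  # position-independent lexicons only
                                             shareSilPdf=True)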

lexicons

<exkaldi.decode.graph.LexiconBank at 0x7f1d9d48f940>

___lexicons___ is an exkaldi __LexiconBank__ object. It is designed to manage all lexicons.  

Use __.view__ to show the names of all generated lexicons.

In [5]:
lexicons.view

['lexiconp',
 'disambig',
 'lexiconp_disambig',
 'silence_phones',
 'optional_silence',
 'nonsilence_phones',
 'phone_map',
 'silence_phone_map',
 'nonsilence_phone_map',
 'extra_questions',
 'silence',
 'nonsilence',
 'context_indep',
 'wdisambig',
 'wdisambig_phones',
 'wdisambig_words',
 'align_lexicon',
 'oov',
 'sets',
 'roots',
 'phones',
 'words']

You can retrieve a specific lexicon by calling the object with its name. If you request "words" or "phones", it returns an exkaldi __ListTable__ object (a subclass of Python dict).

In [6]:
lexicons("silence_phones")

['<SIL>', '<SPN>']
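As a rough illustration of how lists such as `silence_phones` and `nonsilence_phones` relate to the pronunciations, the sketch below collects every phone used in a small lexicon and splits the inventory by a given set of silence phones. It is plain Python under assumed data; `split_phones` and the toy lexicon are hypothetical and not part of exkaldi.

```python
def split_phones(lexicon, silence_phones):
    """Return (silence, nonsilence) phone lists, each sorted."""
    all_phones = {p for prons in lexicon.values() for pron in prons for p in pron}
    sil_set = set(silence_phones)
    silence = sorted(all_phones & sil_set)
    nonsilence = sorted(all_phones - sil_set)
    return silence, nonsilence

# Toy lexicon: word -> list of pronunciations (each a list of phones)
toy_lexicon = {
    "<SIL>": [["<SIL>"]],
    "<SPN>": [["<SPN>"]],
    "<UNK>": [["<SPN>"]],
    "HELLO": [["HH", "AH0", "L", "OW1"]],
}

silence, nonsilence = split_phones(toy_lexicon, ["<SIL>", "<SPN>"])
print(silence)     # ['<SIL>', '<SPN>']
print(nonsilence)  # ['AH0', 'HH', 'L', 'OW1']
```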

In [8]:
type(lexicons("words"))

exkaldi.core.achivements.ListTable

All lexicons can be saved to a file. Some lexicons also support being saved in their integer-ID format.

In [9]:
outFile = os.path.join(dataDir, "exp", "words.txt")

exkaldi.utils.make_dependent_dirs(path=outFile, pathIsFile=True)

lexicons.dump_dict(name="words", outFile=outFile, dumpInt=False)

'/misc/Work19/wangyu/exkaldi-1.0/examplesdata/librispeech_dummy/exp/words.txt'
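A word-ID lexicon of this kind maps each symbol to an integer. The sketch below builds such a table in plain Python, following the common Kaldi convention that `<eps>` takes ID 0 and the remaining symbols follow in sorted order. The helper `make_symbol_table` is hypothetical, and the exact ordering and extra symbols that exkaldi writes are not confirmed here.

```python
def make_symbol_table(words):
    """Assign consecutive integer IDs, reserving 0 for the epsilon symbol."""
    table = {"<eps>": 0}
    for word in sorted(set(words)):
        if word not in table:
            table[word] = len(table)
    return table

symbol_table = make_symbol_table(["WORLD", "HELLO", "<SIL>", "HELLO"])
for word, idx in symbol_table.items():
    print(word, idx)
```

Each printed line has the `symbol id` shape that Kaldi-style `words.txt` and `phones.txt` files use.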

As mentioned above, the word-ID and phone-ID lexicons are generated by default. You can replace them with your own file.

In [None]:
# newWordsFile = "myWords.txt"

# lexicons.reset_words(target=newWordsFile)

After new pronunciation probabilities have been generated, you can update the probabilities of all related lexicons.

In [None]:
# newProbFile = "newLexiconp.txt"

# lexicons.update_prob(newProbFile)
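For readers unfamiliar with the lexiconp layout, each entry places a pronunciation probability between the word and its phones. The sketch below turns pronunciation counts into such lines using max-normalization (the most frequent pronunciation of each word gets 1.0), which is a common choice in Kaldi recipes. It is plain Python; `counts_to_lexiconp` and the counts are invented for illustration, and whether `update_prob` expects exactly this normalization is an assumption.

```python
def counts_to_lexiconp(counts):
    """counts: {(word, phone tuple): positive count} -> 'word prob phones' lines."""
    # Largest count observed for each word
    best = {}
    for (word, _), n in counts.items():
        best[word] = max(best.get(word, 0), n)
    lines = []
    for (word, pron), n in sorted(counts.items()):
        prob = n / best[word]  # counts are assumed to be positive
        lines.append(f"{word} {prob:.2f} {' '.join(pron)}")
    return lines

# Hypothetical pronunciation counts
pron_counts = {
    ("HELLO", ("HH", "AH0", "L", "OW1")): 8,
    ("HELLO", ("HH", "EH0", "L", "OW1")): 2,
    ("WORLD", ("W", "ER1", "L", "D")): 5,
}

lexiconp_lines = counts_to_lexiconp(pron_counts)
for line in lexiconp_lines:
    print(line)
```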

The __LexiconBank__ object will be useful in other training steps.

Now we will make a lexicon FST with disambiguation symbols.

In [10]:
Lfile = os.path.join(dataDir,"exp","L_disambig.fst")

exkaldi.decode.graph.make_L(lexicons, outFile=Lfile, useDisambigLexicon=True)

'/misc/Work19/wangyu/exkaldi-1.0/examplesdata/librispeech_dummy/exp/L_disambig.fst'
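Disambiguation symbols exist because some pronunciations are shared by several words, or are a prefix of a longer pronunciation, which would make the lexicon FST non-determinizable. The sketch below shows the general idea used in Kaldi-style lexicon preparation: append `#1`, `#2`, ... to such pronunciations. It is plain Python with invented entries; `add_disambig` is a hypothetical helper and not necessarily what `make_L` or `LexiconBank` do internally.

```python
from collections import Counter

def add_disambig(entries):
    """entries: list of (word, phone list). Append '#k' where a pronunciation is ambiguous."""
    # How many entries share each exact phone sequence
    seq_count = Counter(tuple(phones) for _, phones in entries)
    # Every proper prefix of every pronunciation
    prefixes = set()
    for _, phones in entries:
        for i in range(1, len(phones)):
            prefixes.add(tuple(phones[:i]))
    last_used = Counter()  # per-sequence counter of symbols handed out so far
    result = []
    for word, phones in entries:
        key = tuple(phones)
        if seq_count[key] > 1 or key in prefixes:
            last_used[key] += 1
            phones = phones + [f"#{last_used[key]}"]  # new list, input is not mutated
        result.append((word, phones))
    return result

toy_entries = [
    ("RED",  ["R", "EH1", "D"]),
    ("READ", ["R", "EH1", "D"]),
    ("CAT",  ["K", "AE1", "T"]),
    ("CATS", ["K", "AE1", "T", "S"]),
]

disambiguated = add_disambig(toy_entries)
for word, phones in disambiguated:
    print(word, " ".join(phones))
```

Here the two homophones receive `#1` and `#2`, the prefix pronunciation of CAT receives `#1`, and CATS is left unchanged.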

We can save this __LexiconBank__ object to a file.

In [11]:
lexFile = os.path.join(dataDir, "exp", "lexicons.lex")

lexicons.save(lexFile)

'/misc/Work19/wangyu/exkaldi-1.0/examplesdata/librispeech_dummy/exp/lexicons.lex'

In addition to a __lexicon__ file, __lexiconp__, __lexiconp_disambig__, __lexiconp_silprob__ and __lexiconp_silprob_disambig__ files can also be used to initialize a __LexiconBank__ object.