# Welcome to Exkaldi

In this section, we will further process the Kaldi decoding lattice and score the results.

In [1]:
import exkaldi

import os
dataDir = os.path.join("..","examplesdata","librispeech_dummy")

Load the lattice file (generated in 09_decode_back_HMM-GMM_and_WFST).

In [3]:
latFile = os.path.join(dataDir, "exp", "decode_test", "test.lat")

lat = exkaldi.decode.wfst.load_lat(latFile)

lat

<exkaldi.decode.wfst.Lattice at 0x7ffa8cd7e748>

To keep things simple, we extract the 1-best result from the lattice. This requires a word-ID table and the HMM model.

The word-ID table can be a words.txt file (if decoding was done at the word level), a phones.txt file (if decoding was done at the phone level), or an Exkaldi ListTable object.
A LexiconBank object also works, and is often the most convenient choice because it provides both "words" and "phones".

In [4]:
wordsFile = os.path.join(dataDir, "exp", "words.txt")

hmmFile = os.path.join(dataDir, "exp", "train_delta", "final.mdl")

In [5]:
result = lat.get_1best(wordSymbolTable=wordsFile, hmm=hmmFile, lmwt=1, acwt=0.5)

result.subset(nHead=1)

{'103-1240-0000': '182 790 718 908 670 595 1120 718 908 670 1155 780 605 1259 1146 89 676 965 132 314 584 4 653 529 449 1279 35 50 619 329 50 1187 161 4 153'}

___result___ is an Exkaldi __Transcription__ object (a subclass of Python dict).

The decoding result is in integer-ID format. To convert it to text, try this:

In [6]:
textResult = exkaldi.hmm.transcription_from_int(result, wordsFile)

textResult.subset(nHead=1)

{'103-1240-0000': 'CHAPTER ONE MISSUS RACHEL LYNDE ITS SURPRISED MISSUS RACHEL LYNDE THEY OF JUST WHERE THE AVONLEA MAIN ROAD BIT DOWN INTO A LITTLE HOLLOW FRINGED WITH ALDERS AN LADIES EARDROPS AN TRAVERSED BY A BROOK'}
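Conceptually, this conversion is a table lookup. The sketch below is not Exkaldi's implementation. It assumes a Kaldi-style symbol table with one `symbol id` pair per line. The helper names `load_symbol_table` and `ids_to_text`, the `<UNK>` fallback, and the toy data are hypothetical and exist only for illustration.

```python
def load_symbol_table(lines):
    """Parse Kaldi-style 'symbol id' lines into an id -> symbol dict."""
    id2word = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        symbol, idx = line.split()
        id2word[int(idx)] = symbol
    return id2word


def ids_to_text(transcription, id2word, unk="<UNK>"):
    """Map each space-separated integer ID to its symbol; unknown IDs become `unk`."""
    return {
        utt: " ".join(id2word.get(int(tok), unk) for tok in ids.split())
        for utt, ids in transcription.items()
    }


# Toy symbol table and hypothesis (made up for illustration).
toy_table = ["<eps> 0", "A 1", "BROOK 2", "BY 3"]
toy_hyp = {"utt-1": "3 1 2", "utt-2": "2 9"}

id2word = load_symbol_table(toy_table)
toy_text = ids_to_text(toy_hyp, id2word)
print(toy_text)
```

In this toy case, ID 9 is missing from the table, so it is replaced by the fallback symbol rather than raising an error.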

For convenience, we reload the saved lexicons.

In [None]:
lexFile = os.path.join(dataDir, "exp", "lexicons.lex")

lexicons = exkaldi.decode.graph.load_lex(lexFile)

lexicons

In [7]:
del textResult

The transcription can also be converted with the __Transcription__ object's own method, like this:

In [8]:
word2id = lexicons("words")
oovID = word2id[lexicons("oov")]
id2word = word2id.reverse()

textResult = result.convert(id2word, oovID)

textResult.subset(nHead=1)

{'103-1240-0000': 'CHAPTER ONE MISSUS RACHEL LYNDE ITS SURPRISED MISSUS RACHEL LYNDE THEY OF JUST WHERE THE AVONLEA MAIN ROAD BIT DOWN INTO A LITTLE HOLLOW FRINGED WITH ALDERS AN LADIES EARDROPS AN TRAVERSED BY A BROOK'}

In [9]:
del result

Now we can score the decoding result. Typically, you compute the WER (word error rate).

In [10]:
refFile = os.path.join(dataDir, "test", "text")

score = exkaldi.decode.score.wer(ref=refFile, hyp=textResult, mode="present")

score

Score(WER=35.34, words=3707, insErr=572, delErr=16, subErr=722, SER=99.0, sentences=99, wrongSentences=100, missedSentences=0)
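The reported WER follows from the error counts: (insErr + delErr + subErr) / words. The sketch below shows how such counts can be obtained with a standard edit-distance alignment. It is a minimal illustration, not Exkaldi's or Kaldi's scoring code; `count_errors`, `toy_wer`, and the toy sentences are hypothetical.

```python
def count_errors(ref, hyp):
    """Return (insertions, deletions, substitutions) of one minimum-edit alignment."""
    # dp[i][j] = (total, ins, del, sub) for aligning ref[:i] with hyp[:j].
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, j, 0, 0)          # only insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            t, a, b, c = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (t, a, b, c)              # match, no cost
            else:
                diagonal = (t + 1, a, b, c + 1)      # substitution
            t, a, b, c = dp[i][j - 1]
            insertion = (t + 1, a + 1, b, c)
            t, a, b, c = dp[i - 1][j]
            deletion = (t + 1, a, b + 1, c)
            dp[i][j] = min(diagonal, insertion, deletion, key=lambda x: x[0])
    _, ins, dele, sub = dp[-1][-1]
    return ins, dele, sub


def toy_wer(ref, hyp):
    """WER in percent: all errors divided by the number of reference words."""
    ins, dele, sub = count_errors(ref, hyp)
    return 100.0 * (ins + dele + sub) / len(ref)


# Toy sentences (made up): one substitution (A -> THE) and one insertion (FRINGED).
ref = "INTO A LITTLE HOLLOW".split()
hyp = "INTO THE LITTLE HOLLOW FRINGED".split()
errors = count_errors(ref, hyp)
wer_value = toy_wer(ref, hyp)

# Arithmetic check against the Score output shown above.
reported = round(100.0 * (572 + 16 + 722) / 3707, 2)
print(errors, wer_value, reported)
```

When several alignments share the same minimum cost, the split between insertions, deletions, and substitutions can depend on tie-breaking, but the total error count, and therefore the WER, does not.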

Alternatively, compute the edit distance score.

In [11]:
score = exkaldi.decode.score.edit_distance(ref=refFile, hyp=textResult, mode="present")

score

Score(editDistance=2537, words=16504, SER=0.99, sentences=100, wrongSentences=99, missedSentences=0)

Then compute the word accuracy from the edit distance:

In [12]:
1 - score.editDistance/score.words

0.8462796897721765

Exkaldi also supports further processing of the lattice, for example adding a penalty or scaling it.

Here is an example that tries different language model weights (LMWT) and penalties. (Rather than converting each result to text, we use an int-format reference file.)

In [None]:
refInt = exkaldi.hmm.transcription_to_int(refFile, lexicons("words"), unkSymbol=lexicons("oov"))
refIntFile = os.path.join(dataDir, "exp", "decode_test", "text.int")
refInt.save(refIntFile)

refInt.subset(nHead=1)

In [13]:
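for penalty in [0., 0.5, 1.0]:
    # The penalized lattice depends only on the penalty, so build it once per penalty.
    newLat = lat.add_penalty(penalty)
    for LMWT in range(10, 15):
        # Decode the 1-best path in int-ID format and score it against the int-format reference.
        result = newLat.get_1best(lexicons("words"), hmmFile, lmwt=LMWT, acwt=0.5)
        score = exkaldi.decode.score.wer(ref=refInt, hyp=result, mode="present")
        print(f"Penalty {penalty}, LMWT {LMWT}: WER {score.WER}")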

Penalty 0.0, LMWT 10: WER 35.01
Penalty 0.0, LMWT 11: WER 35.01
Penalty 0.0, LMWT 12: WER 35.01
Penalty 0.0, LMWT 13: WER 35.01
Penalty 0.0, LMWT 14: WER 35.01
Penalty 0.5, LMWT 10: WER 35.01
Penalty 0.5, LMWT 11: WER 35.01
Penalty 0.5, LMWT 12: WER 35.01
Penalty 0.5, LMWT 13: WER 35.01
Penalty 0.5, LMWT 14: WER 35.01
Penalty 1.0, LMWT 10: WER 34.99
Penalty 1.0, LMWT 11: WER 34.99
Penalty 1.0, LMWT 12: WER 34.99
Penalty 1.0, LMWT 13: WER 34.99
Penalty 1.0, LMWT 14: WER 34.99
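A sweep like this is usually followed by picking the setting with the lowest WER. The sketch below does that with plain Python on the WER values printed above. The dictionary layout and the names `wer_by_penalty`, `sweep`, `best_setting`, and `best_wer` are hypothetical; in the loop itself you would store `score.WER` under the key `(penalty, LMWT)` instead.

```python
# WER values copied from the sweep output above. In this run the WER
# did not change with LMWT, so one value per penalty is enough.
wer_by_penalty = {0.0: 35.01, 0.5: 35.01, 1.0: 34.99}

# Rebuild the full grid, keyed by (penalty, LMWT), in the order it was printed.
sweep = {
    (penalty, lmwt): wer
    for penalty, wer in wer_by_penalty.items()
    for lmwt in range(10, 15)
}

# min() returns the first entry with the smallest WER, so ties resolve
# to the earliest setting in the sweep order.
best_setting, best_wer = min(sweep.items(), key=lambda item: item[1])
print(f"Best (penalty, LMWT): {best_setting}, WER {best_wer}")
```

Because all LMWT values tie here, the choice of LMWT is arbitrary on this dummy dataset; only the penalty made a difference.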


From the lattice, you can get the phone-level result.

In [14]:
phoneResult = lat.get_1best(lexicons("phones"), hmmFile, lmwt=1, acwt=0.5, phoneLevel=True)

phoneResult = exkaldi.hmm.transcription_from_int(phoneResult, lexicons("phones"))

phoneResult.subset(nHead=1)

{'103-1240-0000': '<SIL> CH AE1 P T ER0 W AH1 N <SIL> M IH1 S IH0 Z R EY1 CH AH0 L L IH1 N D <SIL> IH0 T S S AH0 P R AY1 Z D <SIL> M IH1 S IH0 Z R EY1 CH AH0 L L IH1 N D <SIL> DH EY1 AH0 V <SIL> JH AH1 S T W EH1 R DH IY0 <SIL> AE1 V AH0 N L IY2 M EY1 N R OW1 D <SIL> B IH1 T <SIL> D AW1 N IH1 N T UW0 AH0 L IH1 T AH0 L HH AA1 L OW0 <SIL> F R IH1 N JH D W IH1 DH AO1 L D ER0 Z AH0 N L EY1 D IY0 Z IH1 R D R AA2 P S AH0 N T R AH0 V ER1 S T B AY1 AH0 B R UH1 K <SIL>'}

N-best results can also be extracted from the lattice.

In [15]:
result = lat.get_nbest(
                        n=3,
                        wordSymbolTable=lexicons("words"),
                        hmm=hmmFile, 
                        acwt=0.5, 
                        phoneLevel=False,
                        requireCost=False,
                )

for re in result:
    print(re.name, type(re))

1-best <class 'exkaldi.core.achivements.Transcription'>
2-best <class 'exkaldi.core.achivements.Transcription'>
3-best <class 'exkaldi.core.achivements.Transcription'>


___result___ is a list of N-best __Transcription__ objects. If ___requireCost___ is True, the AM and LM scores are returned along with them.

In [16]:
result = lat.get_nbest(
                        n=3,
                        wordSymbolTable=lexicons("words"),
                        hmm=hmmFile, 
                        acwt=0.5, 
                        phoneLevel=False,
                        requireCost=True,
                )

for re in result[0]:
    print(re.name, type(re))
    
for re in result[1]:
    print(re.name, type(re))

for re in result[2]:
    print(re.name, type(re))

1-best <class 'exkaldi.core.achivements.Transcription'>
2-best <class 'exkaldi.core.achivements.Transcription'>
3-best <class 'exkaldi.core.achivements.Transcription'>
AM-1-best <class 'exkaldi.core.achivements.Cost'>
AM-2-best <class 'exkaldi.core.achivements.Cost'>
AM-3-best <class 'exkaldi.core.achivements.Cost'>
LM-1-best <class 'exkaldi.core.achivements.Cost'>
LM-2-best <class 'exkaldi.core.achivements.Cost'>
LM-3-best <class 'exkaldi.core.achivements.Cost'>
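One common use of these costs is to re-rank the N-best hypotheses under different weights. The sketch below illustrates the idea with plain dicts. It assumes that each cost object behaves like a mapping from utterance ID to a float and that lower cost is better (the usual Kaldi convention); neither assumption is verified against Exkaldi's `Cost` class here. The function `rerank` and all data are made up.

```python
def rerank(hyps, am_costs, lm_costs, acwt=0.5, lmwt=1.0):
    """Pick, per utterance, the hypothesis with the lowest weighted total cost."""
    best = {}
    for utt in hyps[0]:
        totals = [
            acwt * am[utt] + lmwt * lm[utt]
            for am, lm in zip(am_costs, lm_costs)
        ]
        # Index of the smallest total cost among the N candidates.
        k = min(range(len(totals)), key=totals.__getitem__)
        best[utt] = hyps[k][utt]
    return best


# Toy 3-best lists for one utterance (made up for illustration).
toy_hyps = [{"utt-1": "A BROOK"}, {"utt-1": "A BOOK"}, {"utt-1": "THE BROOK"}]
toy_am = [{"utt-1": 100.0}, {"utt-1": 96.0}, {"utt-1": 104.0}]
toy_lm = [{"utt-1": 10.0}, {"utt-1": 14.0}, {"utt-1": 9.0}]

default_choice = rerank(toy_hyps, toy_am, toy_lm, acwt=0.5, lmwt=1.0)
low_lm_choice = rerank(toy_hyps, toy_am, toy_lm, acwt=0.5, lmwt=0.25)
print(default_choice, low_lm_choice)
```

With the default weights the first hypothesis wins; lowering the LM weight lets the acoustically cheaper second hypothesis win instead.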


Importantly, alignments can also be returned to support discriminative training.

In [17]:
result = lat.get_nbest(
                        n=3,
                        wordSymbolTable=lexicons("words"),
                        hmm=hmmFile, 
                        acwt=0.5, 
                        phoneLevel=False,
                        requireCost=False,
                        requireAli=True,
                )

for re in result[1]:
    print(re.name, type(re))

1-best <class 'exkaldi.core.achivements.NumpyAlignmentTrans'>
2-best <class 'exkaldi.core.achivements.NumpyAlignmentTrans'>
3-best <class 'exkaldi.core.achivements.NumpyAlignmentTrans'>
