## Tips
- To avoid unpleasant surprises, I suggest you _run all cells in their order of appearance_ (__Cell__ $\rightarrow$ __Run All__).


- If the changes you've made to your solution don't seem to be showing up, try running __Kernel__ $\rightarrow$ __Restart & Run All__ from the menu.


- Before submitting your assignment, make sure everything runs as expected. First, restart the kernel (from the menu, select __Kernel__ $\rightarrow$ __Restart__) and then **run all cells** (from the menu, select __Cell__ $\rightarrow$ __Run All__).

## Reminder

- Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name, UA email, and collaborators below:



Several of the cells in this notebook are **read only** to ensure instructions aren't unintentionally altered.  

If you can't edit the cell, it is probably intentional.

In [None]:
NAME = "Wenmo Sun"
# University of Arizona email address
EMAIL = "wmsun@email.arizona.edu"
# Names of any collaborators.  Write N/A if none.
COLLABORATORS = "N/A"

## Scratchpad

You are welcome to create new cells (see the __Cell__ menu) to experiment and debug your solution.

In [None]:
%load_ext autoreload
%autoreload 2

# Mini Python tutorial

This course uses Python 3.8.

Below is a very basic (and incomplete) overview of the Python language... 

For those completely new to Python, [this section of the official documentation may be useful](https://docs.python.org/3.8/library/stdtypes.html#common-sequence-operations).

In [1]:
# This is a comment.  
# Any line starting with # will be interpreted as a comment

# this is a string assigned to a variable
greeting = "hello"

# If enclosed in triple quotes, strings can also be multiline:

"""
I'm a multiline
string.
"""

# let's use a for loop to print it letter by letter
for letter in greeting:
    print(letter)
    
# Did you notice the indentation there?  Whitespace matters in Python!

# here's a list of integers

numbers = [1, 2, 3, 4]

# let's add one to each number using a list comprehension
# and assign the result to a variable called res
# list comprehensions are used widely in Python (they're very Pythonic!)

res = [num + 1 for num in numbers]

# let's confirm that it worked
print(res)

# now let's try spicing things up using a conditional to filter out all values greater than or equal to 3...
print([num for num in res if not num >= 3])

# Python 3.7 introduced "f-strings" as a convenient way of formatting strings using templates
# For example ...
name = "Josuke"

print(f"{greeting}, {name}!")

# f-strings are f-ing convenient!


# let's look at defining functions in Python..

def greet(name):
    print(f"Howdy, {name}!")

# here's how we call it...

greet("partner")

# let's add a description of the function...

def greet(name):
    """
    Prints a greeting given some name.
    
    :param name: the name to be addressed in the greeting
    :type name: str
    
    """
    print(f"Howdy, {name}!")
    
# I encourage you to use docstrings!

# Python introduced support for optional type hints in v3.5.
# You can read more aobut this feature here: https://docs.python.org/3.8/library/typing.html
# let's give it a try...
def add_six(num: int) -> int:
    return num + 6

# this should print 13
print(add_six(7))

# Python also has "anonymous functions" (also known as "lambda" functions)
# take a look at the following code:

greet_alt = lambda name: print(f"Hi, {name}!")

greet_alt("Fred")

# lambda functions are often passed to other functions
# For example, they can be used to specify how a sequence should be sorted
# let's sort a list of pairs by their second element
pairs = [("bounce", 32), ("bighorn", 12), ("radical", 4), ("analysis", 7)]
# -1 is last thing in some sequence, -2 is the second to last thing in some seq, etc.
print(sorted(pairs, key=lambda pair: pair[-1]))

# we can sort it by the first element instead
# NOTE: python indexing is zero-based
print(sorted(pairs, key=lambda pair: pair[0]))

# You can learn more about other core data types and their methods here: 
# https://docs.python.org/3.8/library/stdtypes.html

# Because of its extensive standard library, Python is often described as coming with "batteries included".  
# Take a look at these "batteries": https://docs.python.org/3.8/library/

# You now know enough to complete this homework assignment (or at least where to look)

h
e
l
l
o
[2, 3, 4, 5]
[2]
hello, Josuke!
Howdy, partner!
13
Hi, Fred!
[('radical', 4), ('analysis', 7), ('bighorn', 12), ('bounce', 32)]
[('analysis', 7), ('bighorn', 12), ('bounce', 32), ('radical', 4)]


In [1]:
from typing import Iterator, Sequence, Text, Tuple, Union

import numpy as np
from scipy.sparse import spmatrix, vstack
from sklearn.linear_model import LogisticRegression

import itertools
import pytest

# an NDArray is either a numpy array (ndarray) or a scipy sparse matrix (spmatrix)
NDArray  = Union[np.ndarray, spmatrix]
# type aliases for sequences of strings
# we'll use this type alias for our tokens
TokenSeq = Sequence[Text]
# ...and this one for our POS tags
TagSeq   = Sequence[Text]

In [2]:
np.random.seed(42)

In [3]:
# Add your imports here (ex. classes from scikit-learn)
# YOUR CODE HERE
import re
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import LabelEncoder
# raise NotImplementedError()

## `read_ptbtagged`


In [4]:
def read_ptbtagged(ptbtagged_path: str) -> Iterator[Tuple[TokenSeq, TagSeq]]:
    """
    Reads sentences from a Penn TreeBank .tagged file.
    Each sentence is a sequence of tokens and part-of-speech tags.

    Penn TreeBank .tagged files contain one token per line, with an empty line
    marking the end of each sentence. Each line is composed of a token, a tab
    character, and a part-of-speech tag. Here is an example:

        What	WP
        's	VBZ
        next	JJ
        ?	.

        Slides	NNS
        to	TO
        illustrate	VB
        Shostakovich	NNP
        quartets	NNS
        ?	.

    :param ptbtagged_path: The path of a Penn TreeBank .tagged file, formatted
    as above.
    :return: An iterator over sentences, where each sentence is a tuple of
    a sequence of tokens and a corresponding sequence of part-of-speech tags.
    """
    # YOUR CODE HERE
    readfile = open(ptbtagged_path).read().strip()
#     print(readfile)
    lines = re.sub("\n\t\n", "\n\n", readfile).split("\n\n")
#     print(lines)
    return iter([([token_tag.split('\t')[0] for token_tag in sent.split('\n')], [token_tag.split('\t')[1] for token_tag in sent.split('\n')]) for sent in lines])
            

# read_ptbtagged('data/PTBSmall/train.tagged')      
#     raise NotImplementedError()

## `Classifier`

In [16]:
# Our MEMM
class Classifier:
    def __init__(self):
        """
        Initializes the classifier.
        """
        self.label_encoder = LabelEncoder()
        # Use `DictVectorizer` to record your features.
        # See https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html
        #
        # Minimally, you must include the following features:
        # `token` (the current word) 
        # `pos-1` (the prior tag) 
        self.feature_encoder = DictVectorizer()
        # multinomial logistic regression
        self.model = LogisticRegression(solver="liblinear", multi_class="ovr")

#     # YOUR CODE HERE
#     raise NotImplementedError()

#     # YOUR CODE HERE
#     raise NotImplementedError()

    def train(self, tagged_sentences: Iterator[Tuple[TokenSeq, TagSeq]]) -> Tuple[NDArray, NDArray]:
        """
        Trains the classifier on the part-of-speech tagged sentences,
        and returns the feature matrix and label vector on which it was trained.

        The feature matrix should have one row per training token. The number
        of columns is up to the implementation, but there must at least be 1
        feature for each token, named "token=T", where "T" is the token string,
        and one feature for the part-of-speech tag of the preceding token,
        named "pos-1=P", where "P" is the part-of-speech tag string, or "<s>" if
        the token was the first in the sentence. For example, if the input is:

            What	WP
            's	VBZ
            next	JJ
            ?	.

        Then the first row in the feature matrix should have features for
        "token=What" and "pos-1=<s>", the second row in the feature matrix
        should have features for "token='s" and "pos-1=WP", etc. The alignment
        between these feature names and the integer columns of the feature
        matrix is given by the `feature_index` method below.

        The label vector should have one entry per training token, and each
        entry should be an integer. The alignment between part-of-speech tag
        strings and the integers in the label vector is given by the
        `label_index` method below.

        :param tagged_sentences: An iterator over sentences, where each sentence
        is a tuple of a sequence of tokens and a corresponding sequence of
        part-of-speech tags.
        
        :return: A tuple of (feature-matrix, label-vector).
        """
        # YOUR CODE HERE
        # break the tagged sentences into feature matrix, label vector
        features = []
        labels = []
        for tokens_tags in tagged_sentences:
            tokens, pos_tags = tokens_tags
            # at least one feature for each token, and one for POS
#             features.append({'token': tokens[0], 'pos-1': '<s>'})
            features.append({'token': tokens[0], 'token-1': '<s>', 'pos-1': '<s>'})
            labels.append(pos_tags[0])
            
            if len(tokens) > 1:
                for i in range(1, len(tokens)):
#                     features.append({'token': tokens[i], 'pos-1': pos_tags[i - 1]})
                    features.append({'token': tokens[i], 'token-1': tokens[i - 1], 'pos-1': pos_tags[i-1]})
                    labels.append(pos_tags[i])
        
#         print("original features: ", features)
#         print("original labels: ", labels)
#         feature_matrix and label_vector
#         self.feature_encoder.fit(features)
#         print("features length: ", len(features))
        feature_matrix = self.feature_encoder.fit_transform(features)
#         print("feature_matrix after transform", feature_matrix)
        
#         print("label length: ", len(labels))
#         self.label_encoder.fit(labels)
        label_vector = self.label_encoder.fit_transform(labels)
#         print("label_vector after transform", label_vector)
        
        # train the model with features and labels
        self.trained_model = self.model.fit(feature_matrix, label_vector)
        
        return (feature_matrix, label_vector)
        
#         raise NotImplementedError()

    def feature_index(self, feature: Text) -> int:
        """
        Returns the column index corresponding to the given named feature.

        The `train` method should always be called before this method is called.

        :param feature: The string name of a feature.
        
        :return: The column index of the feature in the feature matrix returned
        by the `train` method.
        """
        # YOUR CODE HERE
        feature_names = self.feature_encoder.get_feature_names()
#         print("feature_names returned by feature index: ", feature_names)
        return feature_names.index(feature)
#         raise NotImplementedError()

    def label_index(self, label: Text) -> int:
        """
        Returns the integer corresponding to the given part-of-speech tag

        The `train` method should always be called before this method is called.

        :param label: The part-of-speech tag string.
        
        :return: The integer for the part-of-speech tag, to be used in the label
        vector returned by the `train` method.
        """
        # YOUR CODE HERE
        return list(self.label_encoder.classes_).index(label)
        
#         raise NotImplementedError()

    def predict(self, tokens: TokenSeq) -> TagSeq:
        """
        Predicts part-of-speech tags for the sequence of tokens.

        This method delegates to either `predict_greedy` or `predict_viterbi`.
        The implementer may decide which one to delegate to.

        :param tokens: A sequence of tokens representing a sentence.
        
        :return: A sequence of part-of-speech tags, one for each token.
        """
        _, pos_tags = self.predict_greedy(tokens)
        # _, _, pos_tags = self.predict_viterbi(tokens)
        return pos_tags

    def predict_greedy(self, tokens: TokenSeq) -> Tuple[NDArray, TagSeq]:
        """
        Predicts part-of-speech tags for the sequence of tokens using a
        greedy algorithm, and returns the feature matrix and predicted tags.

        Each part-of-speech tag is predicted one at a time, and each prediction
        is considered a hard decision, that is, when predicting the
        part-of-speech tag for token i, the model will assume that its
        prediction for token i-1 is correct and unchangeable.

        The feature matrix should have one row per input token, and be formatted
        in the same way as the feature matrix in `train`.

        :param tokens: A sequence of tokens representing a sentence.
        
        :return: The feature matrix and the sequence of predicted part-of-speech
        tags (one for each input token).
        """
        # YOUR CODE HERE
        # Greedy: consider i-1
        
        # format the feature matrix as in 'train': one token per row, the input tokens are in one sentence
        features = []
        features.append({'token': tokens[0], 'token-1': '<s>'})
        
        if len(tokens) > 1 :
            for i in range(1, len(tokens)):
                features.append({'token': tokens[i], 'token-1': tokens[i - 1]})
        
        feature_matrix = self.feature_encoder.transform(features)
        predicted_label_matrix = self.trained_model.predict(feature_matrix)
        
        predicted_labels = list(self.label_encoder.inverse_transform(predicted_label_matrix))
        
        p_features = []
        p_features.append({'pos-1': '#'})
        for labels in predicted_labels[:-1]:
            p_features.append({'pos-1': labels})
        
        predicted_feature_matrix = self.feature_encoder.transform(p_features).toarray()
        
        return (predicted_feature_matrix, predicted_labels)
        
#         raise NotImplementedError()

    # BONUS (not required)
    def predict_viterbi(self, tokens: TokenSeq) -> Tuple[NDArray, NDArray, TagSeq]:
        """
        Predicts part-of-speech tags for the sequence of tokens using the
        Viterbi algorithm, and returns the transition probability tensor,
        the Viterbi lattice, and the predicted tags.

        The entry i,j,k in the transition probability tensor should correspond
        to the log-probability estimated by the classifier of token i having
        part-of-speech tag k, given that the previous part-of-speech tag was j.
        Thus, the first dimension should match the number of tokens, the second
        dimension should be one more than the number of part of speech tags (the
        last entry in this dimension corresponds to "<s>"), and the third
        dimension should match the number of part-of-speech tags.

        The entry i,k in the Viterbi lattice should correspond to the maximum
        log-probability achievable via any path from token 0 to token i and
        ending at assigning token i the part-of-speech tag k.

        The predicted part-of-speech tags should correspond to the highest
        probability path through the lattice.

        :param tokens: A sequence of tokens representing a sentence.
        
        :return: The transition probability tensor, the Viterbi lattice, and the
        sequence of predicted part-of-speech tags (one for each input token).
        """
        # YOUR CODE HERE
        total_tags = list(self.label_encoder.classes_)
        pre_tags = total_tags + ["<s>"]
        
        # i, j, k tensor
        # i = len(tokens)
        # j = len(total_tags) + 1 = len(pre_tags)
        # k = len(total_tags)
        transition_table = np.zeros((len(tokens), len(pre_tags), len(total_tags)))
        
        # i = len(tokens)
        # k = len(total_tags)
        viterbi_lattice = np.zeros((len(tokens), len(total_tags)))
        
        print("transition prob table shape: ", transition_table.shape)
        print("viterbi lattice shape: ", viterbi_lattice.shape)
        
        # transition table grid
        for i in range(0, len(tokens)):
            for j in range(0, len(pre_tags)):
                word = [{'token': tokens[i], 'pos-1': pre_tags[j]}]
                feature_of_word = self.feature_encoder.transform(word)
                transition_table[i][j] = self.trained_model.predict_log_proba(feature_of_word)
        
        # initialize base case, run viterbi
        viterbi_lattice[0] = transition_table[0, len(total_tags)]
        predicted_labels = [total_tags[np.argmax(viterbi_lattice[0])]]
        for i in range(1, len(tokens)):
            for k in range(0, len(total_tags)):
                list_p = []
                for j in range(0, len(total_tags)):
                    list_p.append(viterbi_lattice[i-1][j] + transition_table[i][j][k])
                
                # highest probability
                max_index = np.argmax(list_p)
                viterbi_lattice[i][k] = list_p[max_index]
            
            max_pro_label = total_tags[np.argmax(viterbi_lattice[i])]
            predicted_labels.append(max_pro_label)
        
        return (transition_table, viterbi_lattice, predicted_labels)
                
                
#         raise NotImplementedError()

In [6]:
# part-of-speech tags from the Penn Treebank
PTB_TAGS = {
    "#", "$", "''", "``", ",", "-LRB-", "-RRB-", ".", ":", "CC", "CD", "DT",
    "EX", "FW", "IN", "JJ", "JJR", "JJS", "LS", "MD", "NN", "NNP", "NNPS",
    "NNS", "PDT", "POS", "PRP", "PRP$", "RB", "RBR", "RBS", "RP", "SYM", "TO",
    "UH", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "WDT", "WP", "WP$", "WRB",
}

## Test `.read_ptbtagged()` (3 pts)

Tests that you read in a) the correct number of sentences and tokens from the training data, b) that all `PTB_TAGS` were found in that partition of the data, and c) each token has exactly one corresponding tag.

In [7]:
def test_read_ptbtagged():
    # keep a counter here (instead of enumerate) in case the iterator is empty
    token_count = 0
    sentence_count = 0
    for sentence in read_ptbtagged("data/PTBSmall/train.tagged"):
        assert len(sentence) == 2
        tokens, pos_tags = sentence
        assert len(tokens) == len(pos_tags)
        assert all(pos in PTB_TAGS for pos in pos_tags)
        token_count += len(tokens)
        sentence_count += 1
    assert token_count == 191969
    assert sentence_count == 8020

    # check the sentence count in the dev set too
    assert sum(1 for _ in read_ptbtagged("data/PTBSmall/dev.tagged")) == 5039
    
test_read_ptbtagged()

## Test features (5 pts)

This test ensures you are, per the definition of MEMM, minimally representing **token** and **prior tag** ($t_{i-1}$) features.  

Use the special symbol `<s>` to represent the prior tag of the first token in a sequence.

In [8]:
def test_feature_vectors():
    clf       = Classifier()
    ptb_train = read_ptbtagged("data/PTBSmall/train.tagged")
    ptb_train = itertools.islice(ptb_train, 2)  # just the first 2 sentences
    features_matrix, labels_vector = clf.train(ptb_train)
    # num. tokens
    assert features_matrix.shape[0] == 31
    assert labels_vector.shape[0] == 31

    # train.tagged starts with
    # Pierre	NNP
    # Vinken	NNP
    # ,	,
    # 61	CD
    # years	NNS
    # old	JJ
    assert features_matrix[4, clf.feature_index("token=years")] == 1
    assert features_matrix[4, clf.feature_index("token=old")] == 0
    assert features_matrix[4, clf.feature_index("pos-1=CD")] == 1
    assert features_matrix[4, clf.feature_index("pos-1=NNS")] == 0
    assert features_matrix[0, clf.feature_index("pos-1=<s>")] == 1
    assert labels_vector[3] == clf.label_index("CD")
    assert labels_vector[4] == clf.label_index("NNS")
test_feature_vectors()

## Test greedy decoding (5pts)

In the greedy decoding approach, each tag is predicted one at a time, and each prediction is considered a **hard** decision.  In other words, when predicting the tag for token $t_{i}$, the model will assume that its prediction for the prior token $t_{i-1}$ is correct and unchangeable.

In [9]:
def test_predict_greedy():
    clf        = Classifier()
    ptb_train  = read_ptbtagged("data/PTBSmall/train.tagged")
    ptb_train  = itertools.islice(ptb_train, 2)  # just the 1st 2 sentences
    clf.train(ptb_train)

    tokens = "Vinken is a director .".split()
    features_matrix, pos_tags = clf.predict_greedy(tokens)

    # check that there is one feature vector per POS tag
    assert features_matrix.shape[0] == len(pos_tags)

    # check that all POS tags are in the PTB tagset
    assert all(pos_tag in PTB_TAGS for pos_tag in pos_tags)

    def last_pos_index(ptb_tag):
        return clf.feature_index("pos-1=" + ptb_tag)

    # check that the first word ("The") has no pos-1 feature
    for ptb_tag in {"NNP", ",", "CD", "NNS", "JJ", "MD", "VB", "DT", "NN", "IN",
                    "VBZ", "VBG"}:
        assert features_matrix[0, last_pos_index(ptb_tag)] == 0

    # check that the remaining words have the correct pos-1 features
    for i, pos_tag in enumerate(pos_tags[:-1]):
        assert features_matrix[i + 1, last_pos_index(pos_tag)] > 0

test_predict_greedy()

## Minimum accuracy (4pts)

Your model should achieve >= 93% acccuracy against the first 100 sentences of the Penn Treebank development partition.  To achieve this accuracy, you may need to include additional contextual features (i.e., features that represent information about the surrounding words and/or tags).

**WARNING**: _this test may be slow to run (2 min.+)_

In [75]:
def test_accuracy():
    clf       = Classifier()
    ptb_train = read_ptbtagged("data/PTBSmall/train.tagged")
    clf.train(ptb_train)

    total_count   = 0
    correct_count = 0
    ptb_dev = read_ptbtagged("data/PTBSmall/dev.tagged")
    ptb_dev = itertools.islice(ptb_dev, 100)  # just the 1st 100 sentences
    for tokens, pos_tags in ptb_dev:
        total_count += len(tokens)
        predicted_tags = clf.predict(tokens)
        assert len(predicted_tags) == len(pos_tags)
        for predicted_tag, true_tag in zip(predicted_tags, pos_tags):
            if predicted_tag == true_tag:
                correct_count += 1
    accuracy = correct_count / total_count

    # print out performance
    sg = f"\n{accuracy:.1%} accuracy on first 100 sentences of PTB dev"
    print(sg)
    
    assert accuracy >= 0.93

test_accuracy()


94.2% accuracy on first 100 sentences of PTB dev


## BONUS TEST (+5 pts)

In [17]:
def test_predict_viterbi():
    clf        = Classifier()
    ptb_train  = read_ptbtagged("data/PTBSmall/train.tagged")
    ptb_train  = itertools.islice(ptb_train, 2)  # just the 1st 2 sentences
    clf.train(ptb_train)

    # POS tags in first 2 sentences
    possible_tags = {"NNP", ",", "CD", "NNS", "JJ", "MD", "VB", "DT", "NN",
                     "IN", ".", "VBZ", "VBG"}
    n_tags = len(possible_tags)

    # sample sentence to be fed to classifier
    tokens = "Vinken is a director .".split()
    trans_probs, viterbi_lattice, pos_tags = clf.predict_viterbi(tokens)

    # check that the transition probabilities are the right shape
    # second axis is +1 since the last entry is transitions starting at <s>
    assert trans_probs.shape == (len(tokens), n_tags + 1, n_tags)

    # check that probability distribution from <s> to first word sums to 1
    s_index = n_tags
    prob_dist_sums = np.sum(np.exp(trans_probs), axis=-1)
    np.testing.assert_almost_equal(prob_dist_sums[0, s_index], 1)

    # check that probability distributions of all non-<s> pairs sum to 1
    for i in range(1, len(tokens)):
        for j in range(0, n_tags):
            np.testing.assert_almost_equal(prob_dist_sums[i, j], 1)

    # check that the lattice is the right shape
    assert viterbi_lattice.shape == (len(tokens), n_tags)

    # check that the numbers are not all the same in the lattice
    assert np.std(viterbi_lattice) > 0
    assert np.all(np.std(viterbi_lattice, axis=0) > 0)
    assert np.all(np.std(viterbi_lattice, axis=1) > 0)

    # check that the probabilities from <s> are on the first token
    np.testing.assert_almost_equal(viterbi_lattice[0], trans_probs[0, s_index])

    # check some probability calculations in the lattice
    np.testing.assert_almost_equal(viterbi_lattice[1, 2], max([
        viterbi_lattice[0, k] + trans_probs[1, k, 2] for k in range(n_tags)]))
    np.testing.assert_almost_equal(viterbi_lattice[3, 1], max([
        viterbi_lattice[2, k] + trans_probs[3, k, 1] for k in range(n_tags)]))
    np.testing.assert_almost_equal(viterbi_lattice[-1, 9], max([
        viterbi_lattice[-2, k] + trans_probs[-1, k, 9] for k in range(n_tags)]))

    # check that the POS tags are all valid tags
    assert all([pos_tag in PTB_TAGS for pos_tag in pos_tags])

    # check that the lattice's score for the predicted POS tag path matches
    # the score we would get from the transition probabilities
    pos_indexes = [clf.label_index(t) for t in pos_tags]
    np.testing.assert_almost_equal(
        viterbi_lattice[-1, pos_indexes[-1]],
        sum(trans_probs[i, pos_indexes[i - 1] if i else s_index, pos_index]
            for i, pos_index in enumerate(pos_indexes)))

    # check that the selected POS path has the highest score of all the possible
    # paths through the lattice
    np.testing.assert_almost_equal(
        viterbi_lattice[-1, pos_indexes[-1]],
        max(trans_probs[0, s_index, index1] +
            trans_probs[1, index1, index2] +
            trans_probs[2, index2, index3] +
            trans_probs[3, index3, index4] +
            trans_probs[4, index4, index5]
            for index1 in range(n_tags)
            for index2 in range(n_tags)
            for index3 in range(n_tags)
            for index4 in range(n_tags)
            for index5 in range(n_tags)))

test_predict_viterbi()

transition prob table shape:  (5, 14, 13)
viterbi lattice shape:  (5, 13)
