# Homework 8

### 8.1 (2p)

In this task, you will build a function *unigram_tagger (genre, tag)* that creates and evaluates a Unigram part-of-speech tagger. Your function should take a name of one of the categories from the
Brown corpus and a pos-tag tag (both as strings) and return an evaluation of the accuracy of your tagger for tag as the backoff strategy. The sentences of the given categories should be loaded and divided into the training, development and test sets as follows:

-  the training corpus – first 60% of the corpus
- the development corpus – 60-80% of the corpus
- the test corpus – last 20% of the corpus (not needed in this task)

As a backoff tagger we will use nltk’s DefaultTagger with 3 tag options —- “NNS”, “NN” and “VB” —- in order to
find the best value of this hyperparameter. Train your tagger on the training set and assess it on the development set.

In [86]:
# Dependencies.
from nltk.corpus import brown
import nltk
from nltk.probability import ConditionalFreqDist


In [52]:
def unigram_tagger(genre:str, tag:str) -> float:
    '''
    @param genre: name of one of the categories from the Brown coprus
    @param tag: pos tag
    @return float: evaluation of the tagger

    devide all sentences from the respected category into 3 parts
    initialize the unigram tagger
    train and evaluate
    '''
    # Initialization
    brown_tagged_Sents=brown.tagged_sents(categories=genre)
    brown_tokens=list(word[0] for sent in brown_tagged_Sents for word in sent)
    train_size=int(len(brown_tagged_Sents)*0.6)
    dev_size=int(len(brown_tagged_Sents)*0.2)
    train_sets=brown_tagged_Sents[:train_size]
    dev_sets=brown_tagged_Sents[train_size:train_size+dev_size]
    dev_tokens=brown_tokens[train_size:train_size+dev_size]
    test_sets=brown_tagged_Sents[train_size+dev_size:]
    
    # Initialize tagger
    default_tagger=nltk.DefaultTagger(tag)
    unigram_tagger=nltk.UnigramTagger(train_sets,backoff=default_tagger)
    return unigram_tagger.evaluate(test_sets)

Report your results for the following:

```
print(unigram_tagger("adventure", "NNS"))
print(unigram_tagger("adventure", "NN"))
print(unigram_tagger("adventure", "VB"))

```

Which Default Tagger should be preferred?
- NNS
- NN
- VB

In [53]:
print(unigram_tagger("adventure", "NNS"))
print(unigram_tagger("adventure", "NN"))
print(unigram_tagger("adventure", "VB"))

0.7991757474154791
0.8246018440905281
0.7882089969265158


### Homework 8.2

Let’s continue working on our Unigram tagger. We will slightly modify the function from the task 8.1 and turn it into the function unigram_tagger(genre, train_size). Your function should now takea name of one of the categories as a string and the size of the training corpus and return an evaluation of the accuracy of your tagger. Set the value of the hyperparameter tag to “NN” (now and in the remainder of this homework). The sentences of the given categories should be loaded and divided into the training, development and test sets as follows:

- the training corpus — first X%
- the test corpus — last 20% of the corpus
- the development corpus — remaining part of the corpus

Create a Unigram Tagger and use the DefaultTagger with “NN” as a backoff. Train your tagger on the training set and test it on the test set. Evaluate the accuracy of the tagger and return the result.

In [56]:
def unigram_tagger(genre:str, train_size:float) -> float:
    '''
    @param genre: name of one of the categories from the Brown coprus
    @param train_size: the size of the train corpus
    @return float: evaluation of the tagger
    devide all sentences from the respected category into 3 parts
    initialize the unigram tagger
    train and evaluate
    '''
    # Initialization
    # *sets is the list of tokens with its pos. *tokens is the list of tokens
    tag='NN'
    brown_tagged_Sents=brown.tagged_sents(categories=genre)
    brown_tokens=list(word[0] for sent in brown_tagged_Sents for word in sent)
    train_size=int(len(brown_tagged_Sents)*train_size)
    test_size=int(len(brown_tagged_Sents)*0.2)
    dev_size=int(len(brown_tagged_Sents)*(train_size-0.2))
    train_sets=brown_tagged_Sents[:train_size]
    dev_sets=brown_tagged_Sents[train_size:train_size+dev_size]
    dev_tokens=brown_tokens[train_size:train_size+dev_size]
    test_sets=brown_tagged_Sents[-test_size:]
    
    default_tagger=nltk.DefaultTagger(tag)
    unigram_tagger=nltk.UnigramTagger(train_sets,backoff=default_tagger)
    
    return unigram_tagger.evaluate(test_sets)

Report the value of the function for the following conditions:
```
print(unigram_tagger("adventure", 0.4))
print(unigram_tagger("adventure", 0.5))
print(unigram_tagger("adventure", 0.6))
print(unigram_tagger("adventure", 0.8))
```

In [57]:
print(unigram_tagger("adventure", 0.4))
print(unigram_tagger("adventure", 0.5))
print(unigram_tagger("adventure", 0.6))
print(unigram_tagger("adventure", 0.8))

0.8064042508564637
0.8172411382227505
0.8245822554708803
0.8412920366356709


### Homework 8.3

This task is similar to the previous one. Build a function bigram_tagger(genre,train_size) that creates a Bigram Tagger. Divide all sentences according to the scheme above. As a backoff you should use the Unigram Tagger like the one from the previous task. Train your tagger on the train set and test it on the test set. The output should be the evaluation of the performance of your tagger.

In [60]:
def bigram_tagger(genre:str, train_size:float) -> float:
    '''
    @param genre: name of one of the categories from the Brown coprus
    @param train_size: the size of the train corpus
    @return float: evaluation of the tagger

    devide all sentences from the respected category into 3 parts
    initialize the bigram tagger
    train and evaluate
    '''
    # Initialization
    # *sets is the list of tokens with its pos. *tokens is the list of tokens
    tag='NN'
    brown_tagged_Sents=brown.tagged_sents(categories=genre)
    brown_tokens=list(word[0] for sent in brown_tagged_Sents for word in sent)
    train_size=int(len(brown_tagged_Sents)*train_size)
    test_size=int(len(brown_tagged_Sents)*0.2)
    dev_size=int(len(brown_tagged_Sents)*(train_size-0.2))
    train_sets=brown_tagged_Sents[:train_size]
    dev_sets=brown_tagged_Sents[train_size:train_size+dev_size]
    dev_tokens=brown_tokens[train_size:train_size+dev_size]
    test_sets=brown_tagged_Sents[-test_size:]
    
    # initialize tagger
    default_tagger=nltk.DefaultTagger(tag)
    unigram_tagger=nltk.UnigramTagger(train_sets,backoff=default_tagger)
    bigram_tagger=nltk.BigramTagger(train_sets, backoff=unigram_tagger)
    
    return bigram_tagger.evaluate(test_sets)

Report the value of the function for the following conditions again:

```
print(bigram_tagger("adventure", 0.4))
print(bigram_tagger("adventure", 0.5))
print(bigram_tagger("adventure", 0.6))
print(bigram_tagger("adventure", 0.8))
```

In [61]:
print(bigram_tagger("adventure", 0.4))
print(bigram_tagger("adventure", 0.5))
print(bigram_tagger("adventure", 0.6))
print(bigram_tagger("adventure", 0.8))

0.815982660980214
0.8278682793819478
0.8352793120324408
0.8514996853806893


### Homework 8.4

Let’s return to the function unigram_tagger(genre, train_size) from the task 8.2 to investigate the most common types of errors that our tagger makes. Modify your function into a function unigram_tag_errors(genre, train_size), which should take a category name and the size of the training corpus and proceed with the creating a pos tagger like in the task 8.2. In the next step, your function should compare two lists of tagged sentences — test and gold. 

**The test** sentences should be the ones that have been automatically tagged (you might need the method `tag_sents` for it), and **the gold** should be the ones that have been manually labeled — the Brown corpus contains the correct human labels already. 

The result should be stored in a list of tuples with incorrect and correct tags. Create a **frequency distribution** on the error list and report the **two most frequent errors of your tagger together with their frequencies.** The output of your function should look like the following:

$$
[(('VB', 'NN'), 207), (('NN','NNS'),198)]
$$

meaning that the tagger wrongly labeled (confused) an ‘NN’ with a ‘VB’ in 207 cases, etc.

In [101]:
def unigram_tag_errors(genre:str, train_size:float) -> list:
    '''
    @param genre: name of one of the categories from the Brown coprus
    @param train_size: the size of the train corpus
    @return list: the two most frequent errors of your tagger with their frequencies
    devide all sentences from the respected category into 3 parts
    initialize the unigram tagger and train it

    create a list of tuples (wrong,correct) given automatically tagged data and the
    gold standard for that data

    errors=[]

    create a frequency distribution on that list and return the two most common
    errors with their frequencies

    return nltk.FreqDist(errors).most_common(2)
    '''
    # Initialization
    # *sets is the list of tokens with its pos. *tokens is the list of tokens
    tag='NN'
    brown_tagged_Sents=brown.tagged_sents(categories=genre)

    train_size=int(len(brown_tagged_Sents)*train_size)
    test_size=int(len(brown_tagged_Sents)*0.2)
    dev_size=int(len(brown_tagged_Sents)*(train_size-0.2))
    train_sets=brown_tagged_Sents[:train_size]
    dev_sets=brown_tagged_Sents[train_size:train_size+dev_size]
    test_sets=brown_tagged_Sents[-test_size:]
    
    test_sents=[]
    for sent in test_sets:
        s=[]
        for element in sent:
            s.append(element[0])
        test_sents.append(s)
    
    # Test sents
    unigram_tagger=nltk.UnigramTagger(train_sets)
    testTaggerRes=unigram_tagger.tag_sents(test_sents)
    
    
    # Gold sents
    goldsents=test_sets
    
    currentList=[]
    wrongList=[]
    for index_sent,sent in enumerate(goldsents):
        for index_word, word in enumerate(sent):
            wordinTest=testTaggerRes[index_sent][index_word] 
            if ((word[0]==wordinTest[0]) & (word[1]==wordinTest[1])):
                currentList.append(wordinTest)
            else:
                wrongList.append(wordinTest)
    
    freq=nltk.FreqDist(wrongList)
    return freq.most_common(2)

In [102]:

print(unigram_tag_errors("adventure", 0.4))
print(unigram_tag_errors("adventure", 0.5))
print(unigram_tag_errors("adventure", 0.6))
print(unigram_tag_errors("adventure", 0.8))


[(('to', 'TO'), 90), (('that', 'CS'), 47)]
[(('to', 'TO'), 90), (('that', 'CS'), 47)]
[(('to', 'TO'), 90), (('that', 'CS'), 47)]
[(('to', 'TO'), 90), (('that', 'CS'), 47)]
