# 4. Named Entity Recognition with Window Classifier

We will perform NER (Named Entity Recognition) with a window classifier. As you may have already noticed, recurrent networks such as RNNs, GRUs, and LSTMs, rather than feedforward networks, tend to work well for this kind of task, so we will revisit NER once we have covered those networks.
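A window classifier labels each word using only a fixed-size window of its neighbors. The sketch below shows one way to build such windows. It assumes 2 context words on each side of the center word (5 in total) and a hypothetical `<pad>` token at sentence boundaries. How `WindowClassifier` itself interprets `window_size` and pads sentences is not shown in this notebook.

```python
from typing import List, Sequence

PAD = "<pad>"  # hypothetical padding token for positions outside the sentence


def make_windows(words: Sequence[str], half_width: int = 2) -> List[List[str]]:
    """Return one window per word: `half_width` words on each side of the
    center word, padded with PAD at the sentence boundaries."""
    padded = [PAD] * half_width + list(words) + [PAD] * half_width
    size = 2 * half_width + 1
    # window i is centered on words[i], which sits at index half_width
    return [padded[i:i + size] for i in range(len(words))]


windows = make_windows(["Sao", "Paulo", "(", "Brasil", ")"], half_width=2)
for w in windows:
    print(w)
```

Each window, not the whole sentence, is what the classifier sees when it predicts the tag of its center word.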

### References
- [CS224n: Natural Language Processing with Deep Learning - Lecture 4](http://web.stanford.edu/class/cs224n/lectures/lecture4.pdf)



In [1]:
from models import WindowClassifier
import nltk
import random

## Load and Preprocess Corpus

In [2]:
corpus = nltk.corpus.conll2002.iob_sents()

In [3]:
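data = []
for sent in corpus:
    # each sentence is a list of (word, POS tag, IOB tag) triples;
    # transpose it and keep only the words and the IOB tags
    words, _, tags = zip(*sent)
    data.append([words, tags])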

In [4]:
print(len(data))
print(data[0])

35651
[('Sao', 'Paulo', '(', 'Brasil', ')', ',', '23', 'may', '(', 'EFECOM', ')', '.'), ('B-LOC', 'I-LOC', 'O', 'B-LOC', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'O', 'O')]
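In the IOB scheme shown in this output, `B-X` marks the first token of an entity of type X, `I-X` a following token inside the same entity, and `O` a token outside any entity. Here is a minimal sketch for recovering entity spans from a tag sequence. The helper `iob_to_spans` is hypothetical and not part of `models` or NLTK. It also treats a stray `I-X` with no matching open span as the start of a new span, which is a common lenient convention.

```python
from typing import List, Sequence, Tuple


def iob_to_spans(tags: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB tags into (entity_type, start, end) spans, end exclusive."""
    spans = []
    current = None  # [entity_type, start] of the open span, or None
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and (current is None or current[0] != tag[2:])
        )
        if starts_new:
            if current is not None:
                spans.append((current[0], current[1], i))
            current = [tag[2:], i]
        elif tag == "O":
            if current is not None:
                spans.append((current[0], current[1], i))
            current = None
        # otherwise: an I- tag continuing the open span, nothing to do
    if current is not None:
        spans.append((current[0], current[1], len(tags)))
    return spans


words = ('Sao', 'Paulo', '(', 'Brasil', ')', ',', '23', 'may', '(', 'EFECOM', ')', '.')
tags = ('B-LOC', 'I-LOC', 'O', 'B-LOC', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'O', 'O')
spans = iob_to_spans(tags)
entities = [(t, " ".join(words[s:e])) for t, s, e in spans]
print(entities)  # [('LOC', 'Sao Paulo'), ('LOC', 'Brasil'), ('ORG', 'EFECOM')]
```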


In [5]:
# shuffle with a fixed seed, then split 60/20/20 into train/validation/test
random.seed(1004)
random.shuffle(data)
idx1 = int(len(data) * 0.6)
idx2 = int(len(data) * 0.8)
train_data = data[:idx1]
valid_data = data[idx1:idx2]
test_data = data[idx2:]
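For the 35651 sentences printed above, the two cut points give the following split sizes. A small helper that mirrors the slicing (the name `split_sizes` is hypothetical) makes the arithmetic explicit. Because the split is made over whole sentences, no sentence appears in more than one set.

```python
def split_sizes(n: int, cut1: float = 0.6, cut2: float = 0.8):
    """Mirror idx1/idx2 above and return (train, valid, test) sizes for n items."""
    idx1 = int(n * cut1)
    idx2 = int(n * cut2)
    return idx1, idx2 - idx1, n - idx2


sizes = split_sizes(35651)
print(sizes)  # (21390, 7130, 7131)

# cross-check against actual list slicing
items = list(range(35651))
i1, i2 = int(len(items) * 0.6), int(len(items) * 0.8)
slice_sizes = (len(items[:i1]), len(items[i1:i2]), len(items[i2:]))
```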

## Fit and Train the WindowClassifier

In [6]:
model = WindowClassifier.WindowClassifier(word_embedding_size=100,
                                          window_size=5,
                                          hidden_size=300,
                                          learning_rate=0.001)
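A window classifier in the style of the CS224n lecture referenced above concatenates the word embeddings of a window, passes them through a hidden layer, and applies a softmax over tags. The NumPy sketch below is an assumption about what `WindowClassifier` computes, not its actual code. The tanh nonlinearity, the 9 output tags (the 8 entity tags in the reports below plus `O`), and the names `init_params`, `forward`, and `count_params` are all hypothetical.

```python
import numpy as np


def init_params(vocab_size, emb_size=100, window_size=5, hidden_size=300,
                n_tags=9, seed=0):
    """Randomly initialize a hypothetical one-hidden-layer window classifier."""
    rng = np.random.default_rng(seed)
    return {
        "E": rng.normal(0.0, 0.1, (vocab_size, emb_size)),  # word embeddings
        "W1": rng.normal(0.0, 0.1, (window_size * emb_size, hidden_size)),
        "b1": np.zeros(hidden_size),
        "W2": rng.normal(0.0, 0.1, (hidden_size, n_tags)),
        "b2": np.zeros(n_tags),
    }


def forward(params, window_ids):
    """window_ids: int array (batch, window_size) -> tag probabilities (batch, n_tags)."""
    x = params["E"][window_ids]                  # (batch, window, emb)
    x = x.reshape(x.shape[0], -1)                # concatenate the window's embeddings
    h = np.tanh(x @ params["W1"] + params["b1"])  # hidden layer
    logits = h @ params["W2"] + params["b2"]
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)  # softmax over tags


def count_params(params):
    return sum(p.size for p in params.values())


params = init_params(vocab_size=1000)
probs = forward(params, np.array([[1, 2, 3, 4, 5], [0, 0, 7, 8, 9]]))
print(probs.shape)  # (2, 9)
```

With the hyperparameters used here, this architecture has 100·V + 153,009 parameters for a vocabulary of V words: 500 × 300 + 300 for the hidden layer and 300 × 9 + 9 for the output layer. If these assumptions hold, the logged model size of 5280809 would correspond to V = 51,278. That figure is arithmetic on the assumed architecture, not something the notebook reports.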



In [7]:
model.fit_to_data(train_data, valid_data)



In [8]:
model.train(10, save_dir="save/04_ner", log_dir="log/04_ner", print_every=1000)

--------------------------------------------------------------------------------
Created and Initialized fresh model. Size: 5280809
--------------------------------------------------------------------------------
001000: 1 [01000/03170], train_loss = 0.42526925, secs/batch = 0.0031
002000: 1 [02000/03170], train_loss = 0.48944670, secs/batch = 0.0035
003000: 1 [03000/03170], train_loss = 0.28266060, secs/batch = 0.0036
Epoch training time: 10.522184371948242

Evaluating..
             precision    recall  f1-score   support

      B-LOC       0.68      0.54      0.60      2316
     B-MISC       0.71      0.19      0.30      1680
      B-ORG       0.70      0.55      0.62      2816
      B-PER       0.71      0.48      0.57      2542
      I-LOC       0.68      0.21      0.32       636
     I-MISC       0.60      0.15      0.24      1260
      I-ORG       0.66      0.49      0.57      2034
      I-PER       0.79      0.57      0.66      1918

avg / total       0.70      0.45      0.53     15202

Epoch training time: 10.2492835521698

Evaluating..
             precision    recall  f1-score   support

      B-LOC       0.75      0.77      0.76      2316
     B-MISC       0.57      0.58      0.58      1680
      B-ORG       0.80      0.76      0.78      2816
      B-PER       0.80      0.80      0.80      2542
      I-LOC       0.66      0.58      0.62       636
     I-MISC       0.58      0.49      0.53      1260
      I-ORG       0.73      0.74      0.73      2034
      I-PER       0.89      0.78      0.83      1918

avg / total       0.74      0.72      0.73     15202

Finished Epoch 10
train_loss = 0.00696038, validation_loss = 0.29741616
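The evaluation reports list token-level precision, recall, and F1 for each tag. No row for `O` appears, so that tag is evidently left out. The "avg / total" row is consistent with a support-weighted average of the per-tag rows: recomputing it from the rounded epoch-10 values gives precision ≈ 0.7449 and recall ≈ 0.7183, in line with the 0.74 and 0.72 shown. Below is a minimal pure-Python sketch of that computation. The function `tag_report` is hypothetical. The notebook's report looks like scikit-learn's `classification_report`, but the evaluation code is not shown.

```python
from collections import Counter


def tag_report(gold, pred, ignore=("O",)):
    """Token-level precision/recall/F1 per tag (skipping tags in `ignore`),
    plus a support-weighted average like the 'avg / total' row."""
    tp, fp, fn, support = Counter(), Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        support[g] += 1
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p where it was wrong
            fn[g] += 1  # missed the gold tag g
    labels = sorted(t for t in set(gold) | set(pred) if t not in ignore)
    rows = {}
    for t in labels:
        prec = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        rec = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        rows[t] = (prec, rec, f1, support[t])
    total = sum(r[3] for r in rows.values())
    avg = tuple(
        sum(r[i] * r[3] for r in rows.values()) / total if total else 0.0
        for i in range(3)
    )
    return rows, avg + (total,)


gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "O", "O", "B-LOC", "B-LOC"]
rows, avg = tag_report(gold, pred)
for tag, (p, r, f, s) in rows.items():
    print(f"{tag:>8} {p:.2f} {r:.2f} {f:.2f} {s}")
print(f"avg / total {avg[0]:.2f} {avg[1]:.2f} {avg[2]:.2f} {avg[3]}")
```

Note that these scores are per token. An entity-level score, which counts a multi-token entity as correct only if its whole span and type match, would generally differ.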



## Test
According to [Named Entity Recognition with Character-Level Models - Klein et al.](https://nlp.stanford.edu/cmanning/papers/conll-ner.pdf), "*because of data sparsity, sophisticated unknown word models are generally required for good performance.*"

In this model, however, we simply ignore unknown words at test time: for convenience, every unknown word is embedded as a zero vector. We may go deeper into NER after we cover some CNN and RNN models.
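Here is a minimal sketch of that convention. It assumes the vocabulary is built from the training sentences and that id 0 is reserved for an `<unk>` token whose embedding row is all zeros. The names `build_vocab`, `embed`, and `<unk>` are hypothetical and not taken from `models`.

```python
import numpy as np


def build_vocab(sentences):
    """Map each training word to an id; id 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for words in sentences:
        for w in words:
            vocab.setdefault(w, len(vocab))
    return vocab


def embed(words, vocab, E):
    """Look up embeddings; words missing from the vocabulary map to id 0."""
    ids = [vocab.get(w, 0) for w in words]
    return E[ids]


train_sents = [("Sao", "Paulo"), ("Brasil",)]
vocab = build_vocab(train_sents)
rng = np.random.default_rng(0)
E = rng.normal(0.0, 0.1, (len(vocab), 4))
E[0] = 0.0  # the <unk> row is a zero vector
vectors = embed(["Paulo", "Madrid"], vocab, E)  # "Madrid" was never seen in training
print(vectors[1])  # [0. 0. 0. 0.]
```

Because every training word has its own id, row 0 is never looked up for training words. The zero vector therefore only shows up for unseen words, which then contribute nothing to the concatenated window input.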

In [9]:
model.test(test_data, load_dir="save/04_ner")

INFO:tensorflow:Restoring parameters from save/04_ner/epoch010_0.2974.model
--------------------------------------------------------------------------------
Restored model from checkpoint for testing. Size: 5280809
--------------------------------------------------------------------------------
             precision    recall  f1-score   support

      B-LOC       0.75      0.78      0.76      2237
     B-MISC       0.56      0.59      0.58      1608
      B-ORG       0.81      0.74      0.77      2963
      B-PER       0.80      0.83      0.81      2534
      I-LOC       0.66      0.57      0.61       615
     I-MISC       0.56      0.42      0.48      1305
      I-ORG       0.74      0.75      0.75      2043
      I-PER       0.89      0.83      0.86      1859

avg / total       0.74      0.72      0.73     15164

test samples: 135642, time elapsed: 0.7597, time per one batch: 0.0007
