# (a)

### Window-based model:

\begin{align*}
e^{(t)} &= [x^{(t - w)}L, \cdots, x^{(t)}L, \cdots, x^{(t+w)}L] \\
h^{(t)} &= \mathrm{ReLU}(e^{(t)}W + b_1) \\
\hat{y}^{(t)} &= \mathrm{softmax}(h^{(t)}U + b_2) \\
J &= \mathrm{CE}(y^{(t)}, \hat{y}^{(t)}) = -\sum_{i=1}^{C} y_i^{(t)} \log(\hat{y}_i^{(t)})
\end{align*}

Variables and their dimensions:

* $x^{(t)}$: $1 \times V$ (one-hot)
* $L$: $V \times D$
* $e^{(t)}$: $1 \times (2w + 1)D$
* $W$: $(2w + 1)D \times H$
* $b_1$: $1 \times H$
* $h^{(t)}$: $1 \times H$
* $U$: $H \times C$
* $b_2$: $1 \times C$

### RNN:

\begin{align*}
e^{(t)} &= x^{(t)}L \\
h^{(t)} &= \sigma(h^{(t - 1)} W_h + e^{(t)} W_x + b_1) \\
\hat{y}^{(t)} &= \mathrm{softmax}(h^{(t)}U + b_2) \\
J &= \mathrm{CE}(y^{(t)}, \hat{y}^{(t)}) = -\sum_{i=1}^{C} y_i^{(t)} \log(\hat{y}_i^{(t)})
\end{align*}

Variables and their dimensions (only those that differ from the window-based model are shown):

* $e^{(t)}$: $1 \times D$
* $W_x$: $D \times H$
* $W_h$: $H \times H$

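So:

i. (1) The RNN has an additional recurrent matrix $W_h$ ($H \times H$), and (2) its input matrix $W_x$ is $D \times H$, whereas the corresponding $W$ in the window-based model is $(2w + 1)D \times H$.

ii. In terms of computational complexity for a sequence of length $T$, the window-based model costs $\mathcal{O}((2w + 1)DH + HC)$ per token, i.e. $\mathcal{O}(((2w + 1)DH + HC)T)$ overall, while the RNN costs $\mathcal{O}((H^2 + DH + HC)T)$. Given that $C$ is small, the $HC$ term may be dropped.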
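To make the comparison concrete, here is a rough sketch that counts parameters and multiply-accumulate operations for both models. The function names and the example sizes are made up for illustration, and the embedding step $x^{(t)}L$ is assumed to be a table lookup rather than a full matrix product.

```python
def window_costs(V, D, H, C, w, T):
    """Parameter count and multiply-accumulate count for the window-based model."""
    width = (2 * w + 1) * D                      # size of the concatenated e^(t)
    params = V * D + width * H + H + H * C + C   # L, W, b_1, U, b_2
    macs = T * (width * H + H * C)               # hidden + output layer, per token
    return params, macs


def rnn_costs(V, D, H, C, T):
    """Parameter count and multiply-accumulate count for the RNN."""
    params = V * D + D * H + H * H + H + H * C + C   # L, W_x, W_h, b_1, U, b_2
    macs = T * (D * H + H * H + H * C)               # input, recurrent, output, per step
    return params, macs


# Hypothetical sizes, chosen only for illustration.
sizes = dict(V=10000, D=50, H=200, C=5, T=20)
print("window:", window_costs(w=1, **sizes))
print("rnn:   ", rnn_costs(**sizes))
```

With these sizes the RNN's $H^2$ term dominates its cost, which is the term the window-based model does not have.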

# (b)

$$F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

F1 is computed from discrete predictions (counts of true and false positives), so it is not differentiable, and it can only be calculated over the entire corpus rather than per example. This makes it difficult to optimize for F1 directly.
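A small sketch of the corpus-level point: F1 computed on pooled counts is not the average of per-batch F1 scores, so it does not decompose into a sum of per-example losses the way cross-entropy does. The helper name and the two batches of counts are hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two made-up batches, each given as (tp, fp, fn).
batch_a = (8, 2, 0)
batch_b = (1, 0, 9)

# Averaging per-batch F1 scores.
mean_f1 = (f1_from_counts(*batch_a) + f1_from_counts(*batch_b)) / 2

# F1 over the pooled counts of both batches.
pooled = tuple(a + b for a, b in zip(batch_a, batch_b))
pooled_f1 = f1_from_counts(*pooled)

print(f"mean of per-batch F1: {mean_f1:.3f}")
print(f"F1 on pooled counts:  {pooled_f1:.3f}")
```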

# (f)

Output:

```
INFO:Epoch 10 out of 10
439/439 [==============================] - 33s - train loss: 0.0233      

INFO:Evaluating on development data
102/102 [==============================] - 40s     
DEBUG:Token-level confusion matrix:
go\gu           PER             ORG             LOC             MISC            O       
PER             2925.00         40.00           81.00           26.00           77.00   
ORG             85.00           1617.00         102.00          158.00          130.00  
LOC             20.00           64.00           1925.00         46.00           39.00   
MISC            25.00           17.00           35.00           1090.00         101.00  
O               24.00           30.00           20.00           52.00           42633.00

DEBUG:Token-level scores:
label   acc     prec    rec     f1   
PER     0.99    0.95    0.93    0.94 
ORG     0.99    0.91    0.77    0.84 
LOC     0.99    0.89    0.92    0.90 
MISC    0.99    0.79    0.86    0.83 
O       0.99    0.99    1.00    0.99 
micro   0.99    0.98    0.98    0.98 
macro   0.99    0.91    0.90    0.90 
not-O   0.99    0.90    0.88    0.89 

INFO:Entity level P/R/F1: 0.84/0.86/0.85

102/102 [==============================] - 41s    
```
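As a sanity check on the log above, the per-label token scores can be recomputed from the confusion matrix. This is a minimal sketch, assuming rows are gold labels and columns are guesses (as the `go\gu` header suggests); `LABELS`, `CONFUSION`, and `token_scores` are names made up for illustration.

```python
LABELS = ["PER", "ORG", "LOC", "MISC", "O"]

# Token-level confusion matrix copied from the log: rows = gold, columns = guess.
CONFUSION = [
    [2925, 40, 81, 26, 77],
    [85, 1617, 102, 158, 130],
    [20, 64, 1925, 46, 39],
    [25, 17, 35, 1090, 101],
    [24, 30, 20, 52, 42633],
]


def token_scores(matrix, labels):
    """Return {label: (precision, recall, f1)} from a gold-by-guess matrix."""
    scores = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        gold_total = sum(matrix[k])                   # all tokens whose gold label is k
        guess_total = sum(row[k] for row in matrix)   # all tokens guessed as k
        prec = tp / guess_total
        rec = tp / gold_total
        scores[label] = (prec, rec, 2 * prec * rec / (prec + rec))
    return scores


for label, (prec, rec, f1) in token_scores(CONFUSION, LABELS).items():
    print(f"{label:5s} {prec:.2f} {rec:.2f} {f1:.2f}")
```

Under that row/column assumption, the rounded values match the `prec`, `rec`, and `f1` columns of the five per-label rows in the log.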

### Limitations of the RNN

1. ORG is heavily confused with other labels: gold ORG tokens are often predicted as MISC (158) or O (130).
1. The RNN seems to have the same problem with non-contiguous entity predictions as the window-based model.

    ```
    x : starting on May 13 next year , the Test and County Cricket Board 
    y*: O        O  O   O  O    O    O O   ORG  ORG ORG    ORG     ORG   
    y': O        O  O   O  O    O    O O   MISC O   ORG    ORG     ORG   

    x : as well as one-day matches against the Minor Counties and 
    y*: O  O    O  O       O       O       O   ORG   ORG      O   
    y': O  O    O  O       O       O       O   ORG   MISC     O  

    x : May 14 Practice at Lord 's  
    y*: O   O  O        O  LOC  LOC 
    y': O   O  O        O  LOC  O   

    x : May 25 Third one-day international ( at Lord 's  , London ) 
    y*: O   O  O     O       O             O O  LOC  LOC O LOC    O 
    y': O   O  O     O       O             O O  LOC  O   O LOC    O 

    x : June 5-9 First test match ( at Edgbaston , Birmingham ) 
    y*: O    O   O     O    O     O O  LOC       O LOC        O 
    y': O    O   MISC  O    O     O O  LOC       O LOC        O 

    x : June 19-23 Second test ( at Lord 's  ) 
    y*: O    O     O      O    O O  LOC  LOC O 
    y': O    O     O      O    O O  LOC  O   O 

    x : SOCCER - SHEARER NAMED AS ENGLAND CAPTAIN . 
    y*: O      O PER     O     O  LOC     O       O 
    y': O      O O       O     O  LOC     O       O 

    x : BASKETBALL - INTERNATIONAL TOURNAMENT RESULT . 
    y*: O          O O             O          O      O 
    y': O          O MISC          O          O      O 
    ```

1. But it now recognizes person names correctly, which is cool!

    ```
    x : " I 'm an emotional player , " said the 104th-ranked Tarango . " 
    y*: O O O  O  O         O      O O O    O   O            PER     O O 
    y': O O O  O  O         O      O O O    O   O            PER     O O 
    ```
    
Not sure how the first two points could be fixed yet. :(
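The fragmentation in the second point can at least be measured by grouping token labels into entity spans and comparing gold against predicted spans. This is a rough sketch, not the assignment's own evaluation code; `label_spans` is a hypothetical helper, and the labels are taken from the first example above.

```python
def label_spans(labels, none_label="O"):
    """Group a label sequence into (label, start, end) spans; end is exclusive."""
    spans = []
    start, current = None, None
    for i, label in enumerate(labels):
        if label != current:
            # Close the span that just ended, unless it was a run of "O".
            if current is not None and current != none_label:
                spans.append((current, start, i))
            start, current = i, label
    # Close a span that runs to the end of the sentence.
    if current is not None and current != none_label:
        spans.append((current, start, len(labels)))
    return spans


# "starting on May 13 next year , the Test and County Cricket Board"
gold = ["O"] * 8 + ["ORG"] * 5
pred = ["O"] * 8 + ["MISC", "O", "ORG", "ORG", "ORG"]

gold_spans = label_spans(gold)
pred_spans = label_spans(pred)
correct = set(gold_spans) & set(pred_spans)

print("gold:", gold_spans)
print("pred:", pred_spans)
print("exact span matches:", len(correct))
```

Here three of the five ORG tokens are labeled correctly, yet no predicted span matches the gold entity exactly, which is why the entity-level F1 sits well below the token-level scores.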