### Exercise 1 - RegEx Chunker 
Using the RegExp tagger for guidance, write a tag pattern to cover noun phrases that contain gerunds, e.g. "the/DT receiving/VBG end/NN", "assistant/NN managing/VBG editor/NN". Add these patterns to the grammar, one per line. Test your work using some tagged sentences of your own devising. Note that you can provide multiple regexp patterns for a single chunk type by using the syntax below, and that the order of the patterns is important.

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [2]:
import nltk
import re

sent = 'Tim is managing well'
tag_sent = nltk.pos_tag(nltk.word_tokenize(sent))
print(tag_sent)
sent2 = 'The managing director'
tag_sent2 = nltk.pos_tag(nltk.word_tokenize(sent2))
print(tag_sent2)
sent3 = 'Tim is running around'
tag_sent3 = nltk.pos_tag(nltk.word_tokenize(sent3))
print(tag_sent3)

[('Tim', 'NNP'), ('is', 'VBZ'), ('managing', 'VBG'), ('well', 'RB')]
[('The', 'DT'), ('managing', 'VBG'), ('director', 'NN')]
[('Tim', 'NNP'), ('is', 'VBZ'), ('running', 'VBG'), ('around', 'RB')]


In [3]:
grammar = r"""
NP: {<VBZ>?<VBG>*<RB>}
{<NNP>+}
{<DT>*<VBG>?<NN>}
"""

cp = nltk.RegexpParser(grammar)
tree = cp.parse(tag_sent)
print(tree)
tree = cp.parse(tag_sent2)
print(tree)
tree = cp.parse(tag_sent3)
print(tree)

(S (NP Tim/NNP) (NP is/VBZ managing/VBG well/RB))
(S (NP The/DT managing/VBG director/NN))
(S (NP Tim/NNP) (NP is/VBZ running/VBG around/RB))


## Grading
This grammar recognizes gerunds that aren't NPs. NPs that contain gerunds do not start with VBZ; sequences such as "is/VBZ managing/VBG well/RB" are VPs.
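
One possible way to act on this feedback is sketched below. It is not the submitted solution: the tag pattern, the helper name `np_chunks`, and the hand-tagged inputs are assumptions made for illustration. The idea is to require at least one noun after the gerund, so that a VBG followed by an adverb is left unchunked.

```python
import nltk

# Assumed pattern: optional determiner, optional adjectives and nouns,
# a gerund, then at least one noun. The second rule chunks bare proper nouns.
gerund_grammar = r"""
NP: {<DT>?<JJ>*<NN.*>*<VBG><NN.*>+}
    {<NNP>+}
"""

gerund_chunker = nltk.RegexpParser(gerund_grammar)


def np_chunks(tagged_sentence):
    """Return the words of every NP chunk found in a POS-tagged sentence."""
    tree = gerund_chunker.parse(tagged_sentence)
    return [
        [word for word, tag in subtree.leaves()]
        for subtree in tree.subtrees(lambda t: t.label() == "NP")
    ]


# Hand-tagged inputs, so no tagger model needs to be downloaded.
receiving_end = [("the", "DT"), ("receiving", "VBG"), ("end", "NN")]
managing_editor = [("assistant", "NN"), ("managing", "VBG"), ("editor", "NN")]
managing_well = [("Tim", "NNP"), ("is", "VBZ"), ("managing", "VBG"), ("well", "RB")]

for tagged in (receiving_end, managing_editor, managing_well):
    print(np_chunks(tagged))
```

With this pattern, "is/VBZ managing/VBG well/RB" should stay outside any NP, while the two example phrases from the exercise are each chunked whole.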

### Exercise 2 - Named Entity Recognition
Develop an NER system specific to the category of names that you collected in exercise 1 of the written exercises. You can build a classifier or use a rule-based system. Either way, you will want to consider incorporating some of the features discussed during lecture, along with some that you found in your analysis. Evaluate your system on a collection of text likely to contain instances of these named entities. Discuss how the evaluation compared to what you expected.

List being used:
* Dota 2
* World of Warcraft
* Street Fighter IV
* Ultimate Marvel vs. Capcom 3
* Mortal Kombat X
* The Witcher 3: Wild Hunt
* The Witcher 2
* The Witcher
* Skyrim
* Batman: Arkham Knight
* Batman: Arkham City
* Batman: Arkham Origins
* Batman: Arkham Asylum
* Metal Gear Solid V: The Phantom Pain
* Wolfenstein 3D
* Wolfenstein: The New Order
* Wolfenstein II: The New Colossus
* Wolfenstein: The Old Blood
* Warframe
* IL-2 Sturmovik: Battle of Stalingrad
* Mass Effect
* Mass Effect 2
* Mass Effect 3
* Mass Effect: Andromeda

In [4]:
list_of_sents = ['the interface for skill trees has been redesigned as opposed to previous versions of World of Warcraft',
                'DOTA 2 retains the strategic aspect of lane control and positioning like the Warcraft mod it was based on',
                'Marvel vs. Capcom continues to be a ground for unbalanced and meta characters to completely destroy new players',
                'The Witcher 3 takes graphical fidelity and story telling to levels previously unheard of',
                'Skyrim is a game that will never end as long as its modding community exists',
                 'Wolfenstein 3D is a blast from the past',
                 'The New Order provides perspective into the alternate reality where the allied nations lost the war',
                'the gameplay for The New Collosus is polished but its story rarely ever makes sense',
                 'Arkham Knight was broken so badly that Warner Brothers had to pull the game from stores until it was fixed',
                 'Batman: Arkham City was a masterpiece in storytelling and lore',
                'Arkham Origins suffers from the same problems its predecessors had',
                 'Mass Effect: Andromeda has fantastic gameplay but the game overall pales in comparision to the original trilogy']

In [5]:
for sent in list_of_sents:
    tagged_sents = nltk.pos_tag(nltk.word_tokenize(sent))
    print(tagged_sents)
    print()

[('the', 'DT'), ('interface', 'NN'), ('for', 'IN'), ('skill', 'NN'), ('trees', 'NNS'), ('has', 'VBZ'), ('been', 'VBN'), ('redesigned', 'VBN'), ('as', 'IN'), ('opposed', 'VBN'), ('to', 'TO'), ('previous', 'JJ'), ('versions', 'NNS'), ('of', 'IN'), ('World', 'NNP'), ('of', 'IN'), ('Warcraft', 'NNP')]

[('DOTA', 'NNP'), ('2', 'CD'), ('retains', 'VBZ'), ('the', 'DT'), ('strategic', 'JJ'), ('aspect', 'NN'), ('of', 'IN'), ('lane', 'NN'), ('control', 'NN'), ('and', 'CC'), ('positioning', 'VBG'), ('like', 'IN'), ('the', 'DT'), ('Warcraft', 'NNP'), ('mod', 'NN'), ('it', 'PRP'), ('was', 'VBD'), ('based', 'VBN'), ('on', 'IN')]

[('Marvel', 'NNP'), ('vs.', 'FW'), ('Capcom', 'NNP'), ('continues', 'VBZ'), ('to', 'TO'), ('be', 'VB'), ('a', 'DT'), ('ground', 'NN'), ('for', 'IN'), ('unbalanced', 'JJ'), ('and', 'CC'), ('meta', 'JJ'), ('characters', 'NNS'), ('to', 'TO'), ('completely', 'RB'), ('destroy', 'VB'), ('new', 'JJ'), ('players', 'NNS')]

[('The', 'DT'), ('Witcher', 'NNP'), ('3', 'CD'), ('takes', 

In [10]:
rules = r'''
GAME: {<DT>*<NNP>}
{<NNP>*<CD>}
{<NNP>?<FW>*<NNP>}
{<NNP>?}
ORG: {<DT>?<NNP>*<NNPS>}
'''
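# Build the chunker once; the rules do not change between sentences
reg_pars = nltk.RegexpParser(rules)
for sent in list_of_sents:
    # Tokenize and POS-tag the sentence, then chunk it with the rules above
    tagged_sents = nltk.pos_tag(nltk.word_tokenize(sent))
    parsed_words = reg_pars.parse(tagged_sents)
    print(parsed_words)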

(S
  the/DT
  interface/NN
  for/IN
  skill/NN
  trees/NNS
  has/VBZ
  been/VBN
  redesigned/VBN
  as/IN
  opposed/VBN
  to/TO
  previous/JJ
  versions/NNS
  of/IN
  (GAME World/NNP)
  of/IN
  (GAME Warcraft/NNP))
(S
  (GAME DOTA/NNP)
  (GAME 2/CD)
  retains/VBZ
  the/DT
  strategic/JJ
  aspect/NN
  of/IN
  lane/NN
  control/NN
  and/CC
  positioning/VBG
  like/IN
  (GAME the/DT Warcraft/NNP)
  mod/NN
  it/PRP
  was/VBD
  based/VBN
  on/IN)
(S
  (GAME Marvel/NNP)
  vs./FW
  (GAME Capcom/NNP)
  continues/VBZ
  to/TO
  be/VB
  a/DT
  ground/NN
  for/IN
  unbalanced/JJ
  and/CC
  meta/JJ
  characters/NNS
  to/TO
  completely/RB
  destroy/VB
  new/JJ
  players/NNS)
(S
  (GAME The/DT Witcher/NNP)
  (GAME 3/CD)
  takes/VBZ
  graphical/JJ
  fidelity/NN
  and/CC
  story/NN
  telling/VBG
  to/TO
  levels/NNS
  previously/RB
  unheard/IN
  of/IN)
(S
  (GAME Skyrim/NNP)
  is/VBZ
  a/DT
  game/NN
  that/WDT
  will/MD
  never/RB
  end/VB
  as/RB
  long/RB
  as/IN
  its/PRP$
  modding/VBG
  communit

In [11]:
# reg_pars.parse(tagged_sents).draw()
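
In the parses above, multi-token titles come out as separate chunks, e.g. `(GAME World/NNP) of/IN (GAME Warcraft/NNP)` and `(GAME DOTA/NNP) (GAME 2/CD)`. The sketch below shows one way the rules could be reordered so that the longer patterns are tried first. The grammar and the helper name `game_chunks` are assumptions for illustration, and the inputs are hand-tagged using the tags shown in the tagger output earlier.

```python
import nltk

# Assumed rule order: most specific multi-token patterns first.
game_rules = r"""
GAME: {<NNP>+<IN><NNP>+}
      {<DT>?<NNP>+<FW><NNP>+}
      {<DT>?<NNP>+<CD>?}
"""

game_chunker = nltk.RegexpParser(game_rules)


def game_chunks(tagged_sentence):
    """Return the words of every GAME chunk in a POS-tagged sentence."""
    tree = game_chunker.parse(tagged_sentence)
    return [
        [word for word, tag in subtree.leaves()]
        for subtree in tree.subtrees(lambda t: t.label() == "GAME")
    ]


# Fragments of the tagged sentences shown above
wow = [("versions", "NNS"), ("of", "IN"), ("World", "NNP"),
       ("of", "IN"), ("Warcraft", "NNP")]
dota = [("DOTA", "NNP"), ("2", "CD"), ("retains", "VBZ")]
mvc = [("Marvel", "NNP"), ("vs.", "FW"), ("Capcom", "NNP"), ("continues", "VBZ")]
witcher = [("The", "DT"), ("Witcher", "NNP"), ("3", "CD"), ("takes", "VBZ")]

for tagged in (wow, dota, mvc, witcher):
    print(game_chunks(tagged))
```

Keeping `<DT>?` in the last rule is a trade-off: it captures "The Witcher 3" whole, but it would also still chunk "the Warcraft" in the DOTA 2 sentence.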

### Analysis:
My rules appear to work on the sentences collected so far. The patterns were derived from the sentences in the list, and more rules or classes could be added after testing against more sentences. The sentences came from reviews in game-related articles and posts on the Steam platform. Two things of note: converting Roman numerals to their cardinal equivalents helps with assigning classes, and assigning a separate class to words that share the same tags (as generated by the POS tagger) is difficult, because rules with similar patterns conflict with each other.
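
The Roman numeral conversion mentioned above could look something like the following sketch. The function names are hypothetical and the preprocessing is an assumption, not the code used for the results in this notebook. It only rewrites uppercase tokens that form a well-formed Roman numeral and skips a lone "I", which is far more likely to be the pronoun.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Strict shape of a Roman numeral up to 3999
VALID_ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")

# Candidate tokens: standalone runs of uppercase Roman-numeral letters
CANDIDATE = re.compile(r"\b[IVXLCDM]+\b")


def roman_to_int(numeral):
    """Convert a well-formed Roman numeral to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # A smaller value before a larger one is subtracted (IV = 4)
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


def normalize_numerals(text):
    """Replace standalone Roman numerals in text with digits."""
    def replace(match):
        token = match.group(0)
        # Assumption: a lone "I" is the pronoun, not the numeral 1
        if token == "I" or not VALID_ROMAN.fullmatch(token):
            return token
        return str(roman_to_int(token))

    return CANDIDATE.sub(replace, text)


for title in ["Street Fighter IV", "Mortal Kombat X",
              "Metal Gear Solid V: The Phantom Pain",
              "IL-2 Sturmovik: Battle of Stalingrad"]:
    print(normalize_numerals(title))
```

Once the numeral is a digit, the tagger typically labels it `CD`, as it does for "2" in DOTA 2 and "3" in The Witcher 3 above, so the `<NNP>*<CD>` style of rule can apply. A single uppercase letter such as "X" or "D" is always converted by this sketch, which may be wrong for some titles.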

## Grading
Good first attempt.  Adding additional features would improve both precision and recall.
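
As one illustration of an additional feature, the sketch below adds a gazetteer lookup built from part of the title list and measures precision and recall against hand-assigned gold labels. Everything here is an assumption made for illustration: the gazetteer subset, the gold labels, and the function names are hypothetical, and the numbers are not results from the system above.

```python
import re

# Hypothetical gazetteer: a subset of the title list used in this exercise
GAZETTEER = [
    "World of Warcraft",
    "Dota 2",
    "Skyrim",
    "Wolfenstein 3D",
    "Wolfenstein: The New Order",
    "Batman: Arkham City",
    "Mass Effect: Andromeda",
]


def find_games(sentence, gazetteer=GAZETTEER):
    """Return the set of gazetteer titles found in the sentence."""
    found = set()
    remaining = sentence
    # Longest titles first, so a short title cannot steal part of a longer one
    for title in sorted(gazetteer, key=len, reverse=True):
        pattern = re.compile(r"(?<!\w)" + re.escape(title) + r"(?!\w)", re.IGNORECASE)
        if pattern.search(remaining):
            found.add(title)
            remaining = pattern.sub(" ", remaining)  # mask the matched span
    return found


def precision_recall(predictions, gold):
    """Micro-averaged precision and recall over paired sets of titles."""
    true_pos = sum(len(p & g) for p, g in zip(predictions, gold))
    n_pred = sum(len(p) for p in predictions)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


# Three sentences from the test list with hand-assigned (assumed) gold labels
sentences = [
    "the interface for skill trees has been redesigned as opposed to previous versions of World of Warcraft",
    "DOTA 2 retains the strategic aspect of lane control and positioning like the Warcraft mod it was based on",
    "The New Order provides perspective into the alternate reality where the allied nations lost the war",
]
gold = [{"World of Warcraft"}, {"Dota 2"}, {"Wolfenstein: The New Order"}]

predictions = [find_games(s) for s in sentences]
precision, recall = precision_recall(predictions, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy sample the exact-match lookup makes no false predictions but misses the shortened title "The New Order", which suggests that alias handling would be a natural next feature.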