#### Dependency Grammar and Parsing

Previously we looked at `constituency grammars`, which describe the syntactic structure of sentences in terms of `hierarchical/nested phrasal constituents`. Another common and useful grammar formalism, which we will now explore, is the `dependency grammar`. In a dependency grammar, the syntax of a sentence is described entirely by `binary asymmetric grammatical relations` between words, called `dependencies`. Such a relation can be depicted by a `labelled arrow` that goes from a `head` word to its `dependent` word. The dependents of a particular head word act as modifiers of that head word.

All the dependency relations in a sentence are then captured in a `directed-acyclic-graph`, which we call a `dependency tree`, as shown in the example below. 

<img src="dependency_tree_example.png" width="430" height="150">

The `head of a sentence is usually a tensed verb`, also called the `predicate` (in the above example, the verb "cancelled"), and all other words connect to this head through a dependency path. In addition, each word is a dependent of exactly one head. The root node of the tree is designated as the head of the predicate, which is itself the head of the entire sentence.
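The structural constraints above (each word has exactly one head, and every word reaches the root through a dependency path) can be checked mechanically. The following is a minimal sketch, assuming the tree is stored as a list of head indices in which `0` stands for the root. The function name `is_valid_dependency_tree` and the head lists are illustrative and not part of this notebook's code. The example uses the sentence "Book me the morning flight", which is parsed step by step later in this section.

```python
def is_valid_dependency_tree(heads):
    """Check a head list, where heads[i] is the head of word i + 1 and 0 is ROOT.

    Storing one head per word already guarantees "exactly one head";
    the checks below cover the remaining constraints.
    """
    n = len(heads)
    # Every head index must refer to ROOT (0) or to a word in the sentence.
    if any(h < 0 or h > n for h in heads):
        return False
    # Assumption: exactly one word (the predicate) attaches to ROOT.
    if heads.count(0) != 1:
        return False
    # Every word must reach ROOT without revisiting a word (no cycles).
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


# Words: 1=Book, 2=me, 3=the, 4=morning, 5=flight
good_heads = [0, 1, 5, 5, 1]
cyclic_heads = [0, 1, 4, 3, 1]   # "the" and "morning" point at each other
print(is_valid_dependency_tree(good_heads))
print(is_valid_dependency_tree(cyclic_heads))
```

A flat head list is the same encoding that the HEAD_ID column of a treebank file uses, which is why it is a convenient representation here.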

Dependency relations can be broadly classified into two main categories: `clausal argument relations` and `nominal modifier relations`. Clausal relations describe the syntactic roles that words play with respect to the predicate, such as `nominal/noun subject` (the word "United" in the example) and `direct/indirect object` (the word "flights" in the example). Modifier relations categorize the ways in which words can modify their heads, such as `adjectival modifiers` (adjectives), `nominal modifiers` (nouns), `determiners` and `case modifiers` (prepositions). In the example above, for the phrase "morning flights", the head word is "flights" and the dependent word "morning" is a nominal modifier of this head.

Given any sentence, the goal of a `dependency parser` is to generate the dependency tree of that sentence. There are two main types of dependency parsing algorithms:

1) Greedy Transition-Based Parsers

2) Graph-Based Parsers

Transition-based parsers are implemented in terms of a `state machine`. Parsing starts from an initial state and executes a sequence of `shift-reduce` operations until a goal/terminal state is reached. An `oracle` decides which operation to execute at each step. Such an oracle is trained using supervised machine learning.

On the other hand, a graph-based method starts with a `fully-connected graph` (where the words are the vertices and the edges represent all possible head-dependent assignments). A `scoring model`, which can also be trained using supervised machine learning, then assigns a weight/score to each edge (along with scores for all possible labels of that edge). Parsing then amounts to finding the tree with the largest sum of edge scores, which can be done by constructing a `maximum spanning tree` from the initial fully-connected graph.
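To make the graph-based idea concrete, here is a small sketch that scores every possible head assignment for a three-word sentence and keeps the highest-scoring one that forms a tree. It deliberately uses brute-force enumeration rather than an efficient maximum spanning tree algorithm, so it is only practical for very short sentences. The function names and the score matrix are made up for illustration; in a real parser the scores would come from a trained scoring model.

```python
from itertools import product


def reaches_root(heads):
    """Return True if every word reaches ROOT (index 0) by following heads."""
    n = len(heads)
    for start in range(1, n + 1):
        node, steps = start, 0
        while node != 0:
            node = heads[node - 1]
            steps += 1
            if steps > n:          # more steps than words means a cycle
                return False
    return True


def best_tree(scores):
    """Exhaustively search for the highest-scoring dependency tree.

    scores[h][d] is the score of the arc h -> d, where index 0 is ROOT.
    Returns (heads, total), with heads[i] the head of word i + 1.
    """
    n = len(scores) - 1
    best_heads, best_total = None, float("-inf")
    for heads in product(range(n + 1), repeat=n):
        if not reaches_root(heads):
            continue               # not a tree (contains a cycle or self-loop)
        total = sum(scores[h][d] for d, h in enumerate(heads, start=1))
        if total > best_total:
            best_heads, best_total = heads, total
    return best_heads, best_total


# Hypothetical arc scores for: 1=Book, 2=the, 3=flight (row = head, column = dependent).
scores = [
    [0, 10, 1, 2],   # ROOT -> ...
    [0, 0, 2, 8],    # Book -> ...
    [0, 1, 0, 3],    # the -> ...
    [0, 4, 9, 0],    # flight -> ...
]
heads, total = best_tree(scores)
print(heads, total)
```

With these scores the search selects ROOT as the head of "Book", "flight" as the head of "the", and "Book" as the head of "flight".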

In this notebook, we will look at some of the basic ideas behind a greedy transition-based parser, and create a `training oracle` for generating training data, which we will later use in a different notebook to train an oracle using supervised machine learning.

#### Greedy Transition-Based Parsing Algorithm:

For this algorithm, we have the following components: a `stack`, a `buffer`, a `list of dependency relations`, a `set of operations` and an `oracle`. The stack is initialized with the designated `ROOT` of the tree, and the buffer holds the list of words of the sentence to be parsed. At each step, the oracle can choose from the following operations: `LEFTARC`, `RIGHTARC` and `SHIFT`.

The `LEFTARC` operation assigns a head-dependent relation between the word at the top of the stack (the head) and the second word on the stack (the dependent), then removes the second word. The second word cannot be the `ROOT`.

The `RIGHTARC` operation assigns a head-dependent relation between the second word on the stack (the head) and the top word on the stack (the dependent), then removes the top word.

The `SHIFT` operation takes the first word from the front of the buffer and places it on the top of the stack.

The `LEFTARC` and `RIGHTARC` operations are also called `reduce operations`. Each time one of these operations is executed, we add the corresponding head-dependent relation to the list of dependency relations. Note that these operations create unlabeled dependency relations. In order to accommodate labeled dependency relations, we need separate `LEFTARC` and `RIGHTARC` operations for each possible relation, e.g. for the direct-object label we would have `LEFTARC-OBJ` and `RIGHTARC-OBJ`.
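As a quick illustration of how the operation set grows with labels, the sketch below builds the full inventory for a small label set. The helper name `build_actions` and the three labels are assumptions chosen for the example; a real treebank has a larger label inventory.

```python
def build_actions(labels):
    """Return SHIFT plus one LEFTARC and one RIGHTARC operation per label."""
    actions = ["SHIFT"]
    for label in labels:
        actions.append(f"LEFTARC-{label}")
        actions.append(f"RIGHTARC-{label}")
    return actions


# Hypothetical label set, used only for illustration.
labels = ["NSUBJ", "OBJ", "DET"]
actions = build_actions(labels)
print(len(actions), actions)
```

With n labels the oracle therefore has to choose among 2n + 1 operations at each step, instead of 3 in the unlabeled case.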

The state/configuration of the parse is defined by the state of the stack, buffer and dependency relations list. The goal/terminal state is the state where the stack only contains the `ROOT` and the buffer is empty. Parsing involves starting from the initial state and performing a sequence of operations (chosen by the oracle) to arrive at the goal state. Since this is a greedy algorithm, once an operation is executed, the new state cannot be undone, so a single wrong operation can lead to the parse being incorrect at the end. 

We will consider a simplified example of parsing where we ignore the dependency labels and assume a perfect oracle.

Example sentence: "Book me the morning flight"

| Step | Stack                             | Buffer                           | Operation | Relation Added           |
|------|-----------------------------------|----------------------------------|-----------|--------------------------|
| 0    | [ROOT]                            | [Book, me, the, morning, flight] | SHIFT     |                          |
| 1    | [ROOT, Book]                      | [me, the, morning, flight]       | SHIFT     |                          |
| 2    | [ROOT, Book, me]                  | [the, morning, flight]           | RIGHTARC  | (Book $\to$ me)          |
| 3    | [ROOT, Book]                      | [the, morning, flight]           | SHIFT     |                          |
| 4    | [ROOT, Book, the]                 | [morning, flight]                | SHIFT     |                          |
| 5    | [ROOT, Book, the, morning]        | [flight]                         | SHIFT     |                          |
| 6    | [ROOT, Book, the, morning, flight] | []                              | LEFTARC   | (morning $\gets$ flight) |
| 7    | [ROOT, Book, the, flight]         | []                               | LEFTARC   | (the $\gets$ flight)     |
| 8    | [ROOT, Book, flight]              | []                               | RIGHTARC  | (Book $\to$ flight)      |
| 9    | [ROOT, Book]                      | []                               | RIGHTARC  | (ROOT $\to$ Book)        |
| 10   | [ROOT]                            | []                               | DONE      |                          |
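The trace in the table can be replayed in code. The following is a minimal sketch of the unlabeled transition system, assuming the stack and buffer are plain Python lists and each relation is stored as a `(head, dependent)` pair. The function names `apply_operation` and `run_operations` are illustrative; the operation sequence is the one a perfect oracle produces in the table.

```python
def apply_operation(op, stack, buffer, relations):
    """Apply one SHIFT / LEFTARC / RIGHTARC operation in place."""
    if op == "SHIFT":
        if not buffer:
            raise ValueError("SHIFT requires a non-empty buffer")
        stack.append(buffer.pop(0))          # front of buffer -> top of stack
    elif op == "LEFTARC":
        if len(stack) < 2 or stack[-2] == "ROOT":
            raise ValueError("LEFTARC needs two words and a non-ROOT second word")
        relations.append((stack[-1], stack[-2]))   # top is head of second
        del stack[-2]                              # remove the dependent
    elif op == "RIGHTARC":
        if len(stack) < 2:
            raise ValueError("RIGHTARC needs two items on the stack")
        relations.append((stack[-2], stack[-1]))   # second is head of top
        stack.pop()                                # remove the dependent
    else:
        raise ValueError(f"unknown operation: {op}")


def run_operations(words, operations):
    """Start from the initial state and apply the operations in order."""
    stack, buffer, relations = ["ROOT"], list(words), []
    for op in operations:
        apply_operation(op, stack, buffer, relations)
    return stack, buffer, relations


words = ["Book", "me", "the", "morning", "flight"]
operations = ["SHIFT", "SHIFT", "RIGHTARC", "SHIFT", "SHIFT", "SHIFT",
              "LEFTARC", "LEFTARC", "RIGHTARC", "RIGHTARC"]
final_stack, final_buffer, relations = run_operations(words, operations)
print(final_stack, final_buffer)
print(relations)
```

The run ends in the goal state (only `ROOT` on the stack, empty buffer), and the five collected relations match the "Relation Added" column of the table.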


To train a neural-network-based oracle, we need to pair features extracted from the current state of a parse with the corresponding ground-truth operation that should be executed next. We will draw instances from an annotated treebank dataset which contains full dependency parse trees, and create (state features, next operation) pairs from these trees.
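One common way to derive such pairs is to simulate the parse while consulting the gold tree at every step. The sketch below assumes the following rule: choose `LEFTARC` if the top of the stack is the gold head of the second word, choose `RIGHTARC` if the second word is the gold head of the top word and the top word has already collected all of its own dependents, and otherwise choose `SHIFT`. This rule, the function name `training_oracle`, and the use of the raw stack and buffer contents as a stand-in for state features are assumptions made for illustration; they are not necessarily what the later notebook code does.

```python
def training_oracle(words, heads):
    """Generate (state, operation) pairs from a gold dependency tree.

    heads[i] is the gold head index of word i + 1, with 0 meaning ROOT.
    The state is recorded as (stack words, buffer words).
    """
    n = len(words)
    stack, buffer = [0], list(range(1, n + 1))
    attached = set()                      # words that already have their head
    pairs = []

    def name(i):
        return "ROOT" if i == 0 else words[i - 1]

    def has_all_dependents(i):
        return all(d in attached for d in range(1, n + 1) if heads[d - 1] == i)

    while buffer or len(stack) > 1:
        top = stack[-1]
        second = stack[-2] if len(stack) > 1 else None
        if second is not None and second != 0 and heads[second - 1] == top:
            op = "LEFTARC"
        elif (second is not None and heads[top - 1] == second
              and has_all_dependents(top)):
            op = "RIGHTARC"
        elif buffer:
            op = "SHIFT"
        else:
            raise ValueError("gold tree cannot be derived with these operations")
        pairs.append((([name(i) for i in stack], [name(i) for i in buffer]), op))
        if op == "SHIFT":
            stack.append(buffer.pop(0))
        elif op == "LEFTARC":
            attached.add(second)
            del stack[-2]
        else:
            attached.add(top)
            stack.pop()
    return pairs


words = ["Book", "me", "the", "morning", "flight"]
gold_heads = [0, 1, 5, 5, 1]
pairs = training_oracle(words, gold_heads)
for state, op in pairs:
    print(state, "->", op)
```

For this sentence the generated operation sequence is the same as the one in the table above. The check that the top word has all of its dependents before `RIGHTARC` matters because a word removed from the stack can no longer become a head.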


In [4]:
""" 

Our training data consists of dependency parse trees expressed in `CoNLL-U format`. An example of a parsed sentence in this format is shown below:

1	The	_	DET	DT	_	4	det	_	_
2	luxury	_	NOUN	NN	_	4	compound	_	_
3	auto	_	NOUN	NN	_	4	compound	_	_
4	maker	_	NOUN	NN	_	7	nsubj	_	_
5	last	_	ADJ	JJ	_	6	amod	_	_
6	year	_	NOUN	NN	_	7	nmod:tmod	_	_
7	sold	_	VERB	VBD	_	0	root	_	_
8	1,214	_	NUM	CD	_	9	nummod	_	_
9	cars	_	NOUN	NNS	_	7	dobj	_	_
10	in	_	ADP	IN	_	12	case	_	_
11	the	_	DET	DT	_	12	det	_	_
12	U.S.	_	PROPN	NNP	_	7	nmod	_	_


Each line has a sequence of tab separated fields:  `TOKEN_ID    WORD_FORM   LEMMA   U_POS   X_POS   FEATS   HEAD_ID    DEPREL   DEPS    MISC`

where LEMMA is the base form of the word, U_POS is the universal part-of-speech tag, and X_POS is the language-specific part-of-speech tag. The HEAD_ID is the id of the token that is the parent of the current token in the parse tree, and DEPREL is the dependency relation between the current token and its parent. The DEPS field is a list of secondary dependencies, and the MISC field is a catch-all for other information.

A lot of these fields are blank in the file containing our dataset, because we don't need that information for our task. We will only use the WORD_FORM, U_POS, HEAD_ID, and DEPREL fields.

"""

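The format described above can be read with a few lines of plain Python. This is a minimal sketch, assuming one sentence per text block with ten tab-separated fields per token line. The function name `parse_conllu_sentence` and the dictionary keys are illustrative, and only the four fields used in this notebook are kept.

```python
def parse_conllu_sentence(block):
    """Parse one CoNLL-U sentence block into a list of token dictionaries."""
    tokens = []
    for line in block.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != 10:
            raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}")
        if not fields[0].isdigit():
            continue                      # skip multiword ranges (1-2) and empty nodes (1.1)
        tokens.append({
            "id": int(fields[0]),         # TOKEN_ID
            "form": fields[1],            # WORD_FORM
            "upos": fields[3],            # U_POS
            "head": int(fields[6]),       # HEAD_ID (0 means ROOT)
            "deprel": fields[7],          # DEPREL
        })
    return tokens


# First four token lines of the example sentence, built with explicit tabs.
rows = [
    ["1", "The", "_", "DET", "DT", "_", "4", "det", "_", "_"],
    ["2", "luxury", "_", "NOUN", "NN", "_", "4", "compound", "_", "_"],
    ["3", "auto", "_", "NOUN", "NN", "_", "4", "compound", "_", "_"],
    ["4", "maker", "_", "NOUN", "NN", "_", "7", "nsubj", "_", "_"],
]
sample = "\n".join("\t".join(row) for row in rows)
tokens = parse_conllu_sentence(sample)
for token in tokens:
    print(token)
```

The `head` values collected this way form the head list that a training oracle needs, and the `deprel` values supply the labels for labeled operations.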