# Syntax

### Natural Language Processing and Information Extraction, 2022 WS
Lecture 6, 11/18/2022

Gábor Recski

This material can be downloaded from [https://github.com/tuw-nlp-ie/tuw-nlp-ie-2022WS](https://github.com/tuw-nlp-ie/tuw-nlp-ie-2022WS)

## Topics and SLP3 chapters

- Parts-of-speech [8.1-8.4](https://web.stanford.edu/~jurafsky/slp3/8.pdf)

- Constituency [12.1-12.3](https://web.stanford.edu/~jurafsky/slp3/12.pdf), [13.1-13.3](https://web.stanford.edu/~jurafsky/slp3/13.pdf)

- Dependency [14.1](https://web.stanford.edu/~jurafsky/slp3/14.pdf)

## Dependencies

To run this notebook, you will need to install the **stanza** and **spacy** Python packages.

Make sure to restart the kernel afterwards.

Then you can use the cells below to download and initialize the necessary models.

### Download models, initialize pipelines

In [None]:
import stanza
stanza.download('en')
stanza_nlp = stanza.Pipeline(lang='en', logging_level='WARNING')

In [None]:
import spacy
from spacy.cli import download as spacy_download
spacy_download('en_core_web_sm')
spacy_nlp = spacy.load("en_core_web_sm")

## Recap

### Tokenization, lemmatization, decompounding

In [None]:
doc = stanza_nlp("Did you get me those muffins?")
print("\n".join([f"{word.text:<8}\t{word.lemma}" for word in doc.sentences[0].words]))

### What's next?


```
Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.
```
(Lewis Carroll: [Jabberwocky](https://en.wikipedia.org/wiki/Jabberwocky))

<br>
<br>
<br>
<br>
<br>
<br>

```
Es brillig war. Die schlichten Toven
Wirrten und wimmelten in Waben;
Und aller-mümsige Burggoven
Die mohmen Räth' ausgraben.
```
(Translated by Robert Scott)
<br>
<br>
<br>
<br>
<br>
<br>

They don't make much sense, but how come they make any?

## Part-of-speech (POS)

In [None]:
print("\n".join([f"{word.text:<8}\t{word.pos}" for word in doc.sentences[0].words]))

In [None]:
print("\n".join([f"{word.text:<8}\t{word.xpos}" for word in doc.sentences[0].words]))

### POS-tags are morphosyntactic categories

| Word | [UPOS](https://universaldependencies.org/u/pos/) |  | [PTB](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) | |
| :--- | :--- | :--- | :--- | :--- |
| Did | AUX | auxiliary | VBD | verb, past tense |
| you | PRON | pronoun | PRP | personal pronoun |
| get | VERB | verb | VB | verb, base form |
| me | PRON | pronoun | PRP | personal pronoun |
| those | DET | determiner | DT | determiner |
| muffins | NOUN | noun | NNS | noun, plural |
| ? | PUNCT | punctuation | . | punctuation |

There are always more morphosyntactic features to consider:

In [None]:
print("\n".join([f"{word.text:<8}\t{word.pos:<8}\t{word.feats}" for word in doc.sentences[0].words]))

## Difficulties of POS-tagging

_earnings growth took a __back/JJ__ seat_

_a small building in the **back/NN**_

_a clear majority of senators **back/VBP** the bill_

_Dave began to __back/VB__ toward the door_

_enable the country to buy __back/RP__ debt_

_I was twenty-one __back/RB__ then_

([SLP Ch.8](https://web.stanford.edu/~jurafsky/slp3/8.pdf))

### Why not implement grammar?

- grammar and vocabulary change too fast

- resolving ambiguities requires probabilistic reasoning

| _Time_ | _flies_ | _like_ | _an_ | _arrow_ |
| :----- | :------ | :----- | :--- | :------ |
| NOUN   | VERB    | ADP    | DET  | NOUN    |

| _Time_ | _flies_ | _like_ | _an_ | _arrow_ |
| :----- | :------ | :----- | :--- | :------ |
| VERB   | NOUN    | ADP    | DET  | NOUN    |

| _Time_ | _flies_ | _like_ | _an_ | _arrow_ |
| :----- | :------ | :----- | :--- | :------ |
| NOUN   | NOUN    | VERB   | DET  | NOUN    |

BTW: the second tagging still allows three interpretations. Can you think of all of them (without googling)?
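To see how quickly such ambiguity multiplies, here is a minimal sketch that enumerates every tag sequence licensed by a small lexicon. The lexicon `LEXICON` and the helper `candidate_taggings` are assumptions made up for this illustration; they are not taken from any tagger or treebank.

```python
from itertools import product

# Hypothetical toy lexicon: candidate UPOS tags per word (an assumption for illustration).
LEXICON = {
    "Time": ["NOUN", "VERB"],
    "flies": ["VERB", "NOUN"],
    "like": ["ADP", "VERB"],
    "an": ["DET"],
    "arrow": ["NOUN"],
}

def candidate_taggings(words):
    """Return every tag sequence the lexicon allows for the given words."""
    return [tuple(tags) for tags in product(*(LEXICON[w] for w in words))]

words = "Time flies like an arrow".split()
taggings = candidate_taggings(words)
for tags in taggings:
    print(" ".join(f"{w}/{t}" for w, t in zip(words, tags)))
```

Even with at most two tags per word the lexicon admits eight sequences, and only some of them correspond to a grammatical reading. Choosing among them is where probabilistic reasoning comes in.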

# Questions?

_See the supplementary material in 06b_POS_tagging_HMMs.ipynb on POS-tagging with Hidden Markov Models_

# Syntactic structure

## Two perspectives

- Constituency structure (SLP3 Ch. [12](https://web.stanford.edu/~jurafsky/slp3/12.pdf))

- Dependency structure (SLP3 Ch. [14](https://web.stanford.edu/~jurafsky/slp3/14.pdf))

# Constituency

## I shot an elephant in my pyjamas

In [None]:
doc = stanza_nlp("I shot an elephant in my pyjamas")
print("\n".join([f"{word.text:<12}{word.pos}" for word in doc.sentences[0].words]))

<br>
<br>
<br>
<br>
<br>
<br>

![elephant](elephant.jpg)

([SLP Ch.13](https://web.stanford.edu/~jurafsky/slp3/13.pdf))

![NP](np2_70.jpg)

> (NP <br/>
> $\quad$ (DET an) <br/>
> $\quad$ (Nominal <br/>
> $\quad \quad$ (Nominal <br/>
> $\quad \quad \quad$ (NOUN elephant) <br/>
> $\quad \quad$ ) <br/>
> $\quad \quad$ (PP <br/>
> $\quad \quad \quad$ (PREP in) <br/>
> $\quad \quad \quad$ (NP <br/>
> $\quad \quad \quad \quad$ (DET my) <br/>
> $\quad \quad \quad \quad$ (NOUN pyjamas) <br/>
> $\quad \quad \quad$ ) <br/>
> $\quad \quad$ ) <br/>
> $\quad$ ) <br/>
> )


### NP, PP, etc. are distributional categories. Just like POS-tags!

(DET an) (NOUN elephant) (PREP in) (DET my) (NOUN pyjamas)

(DET two) (NOUN pandas) (PREP behind) (DET his) (NOUN tent)

(NP I) (VERB shot) (NP an elephant) (PP in my pyjamas)

(NP My best friend) (VERB met) (NP two pandas) (PP behind his tent)

(NP I) (VP shot an elephant in my pyjamas)

(NP The guy driving the jeep) (VP fainted)

## Phrase structure grammars

```
S -> NP VP
VP -> VERB (NP)
NP -> (DET) NOUN (PP)
PP -> PREP NP
(...)
DET -> (an|the|my|his|...)
VERB -> (shot|met|fainted...)
PREP -> (in|behind|...)
NOUN -> (I|elephant|pyjamas|panda|tent|jeep|guy|...)
```
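The grammar above can be turned into a small recognizer. The sketch below is one possible encoding, not a reference implementation: the optional symbols in parentheses are expanded into separate rules, the elided parts (`...`) are left out, and the names `RULES`, `LEXICON`, `derives`, `matches` and `accepts` are introduced here for illustration only.

```python
from functools import lru_cache

# The phrase structure rules above, with optional symbols expanded into alternatives.
RULES = {
    "S": [("NP", "VP")],
    "VP": [("VERB",), ("VERB", "NP")],
    "NP": [("NOUN",), ("DET", "NOUN"), ("NOUN", "PP"), ("DET", "NOUN", "PP")],
    "PP": [("PREP", "NP")],
}
# Only the words listed explicitly in the grammar; the elided entries are omitted.
LEXICON = {
    "DET": {"an", "the", "my", "his"},
    "VERB": {"shot", "met", "fainted"},
    "PREP": {"in", "behind"},
    "NOUN": {"I", "elephant", "pyjamas", "panda", "tent", "jeep", "guy"},
}

@lru_cache(maxsize=None)
def derives(symbol, tokens):
    """True if `symbol` can be rewritten into exactly the tuple `tokens`."""
    if symbol in LEXICON:
        return len(tokens) == 1 and tokens[0] in LEXICON[symbol]
    return any(matches(rhs, tokens) for rhs in RULES[symbol])

def matches(rhs, tokens):
    """True if the symbol sequence `rhs` can cover `tokens` from left to right."""
    if not rhs:
        return not tokens
    first, rest = rhs[0], rhs[1:]
    # Every symbol covers at least one token, so leave enough tokens for the rest.
    return any(
        derives(first, tokens[:i]) and matches(rest, tokens[i:])
        for i in range(1, len(tokens) - len(rest) + 1)
    )

def accepts(sentence):
    """True if the whitespace-tokenized sentence is derivable from S."""
    return derives("S", tuple(sentence.split()))

print(accepts("I shot an elephant in my pyjamas"))
print(accepts("elephant the shot"))
```

The recognizer only answers yes or no. It does not return the tree, and with this rule set the PP can only attach inside the NP, so the ambiguity of the elephant sentence does not show up yet.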


## Probabilistic grammars

```
NOUN -> I (0.8)
NOUN -> elephant (0.1)
(...)
VP -> VERB (0.2)
VP -> VERB NP (0.8)
```
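In a probabilistic grammar, the probability of a derivation is the product of the probabilities of the rules it uses. The sketch below uses only the four rule probabilities listed above; the names `RULE_PROB` and `derivation_probability` are assumptions made for this example, and the derivation shown is deliberately partial because the remaining rules are elided.

```python
import math

# Rule probabilities copied from the fragment above; all other rules are elided there.
RULE_PROB = {
    ("NOUN", ("I",)): 0.8,
    ("NOUN", ("elephant",)): 0.1,
    ("VP", ("VERB",)): 0.2,
    ("VP", ("VERB", "NP")): 0.8,
}

def derivation_probability(rules_used):
    """Multiply the probabilities of the rules applied in a (partial) derivation."""
    return math.prod(RULE_PROB[rule] for rule in rules_used)

# A partial derivation that uses only rules listed above.
partial = [("VP", ("VERB", "NP")), ("NOUN", ("elephant",))]
print(derivation_probability(partial))
```

Note that the two VP rules sum to 1.0: the probabilities of all rules sharing a left-hand side form a distribution.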

## Constituency parsing

Parsing is the task of determining the possible derivations of a sentence (or the most likely one), given a (probabilistic) grammar.

### The CKY algorithm

See example in [cky.pdf](cky.pdf)

See SLP3 Chapters [13](https://web.stanford.edu/~jurafsky/slp3/13.pdf) and [14](https://web.stanford.edu/~jurafsky/slp3/14.pdf) for more.
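As a rough illustration of the chart-filling idea behind CKY, the sketch below counts the parse trees of a sentence. It is not the worked example from cky.pdf: the grammar in `BINARY` and `LEXICAL` is a hypothetical Chomsky normal form grammar written for this example (loosely following the toy grammar of this lecture, with `I` tagged directly as NP), and `cky_count` is a name introduced here.

```python
from collections import defaultdict

# Hypothetical CNF grammar (an assumption for illustration): (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "VERB", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "DET", "NOUN"),
    ("NP", "NP", "PP"),
    ("PP", "PREP", "NP"),
]
LEXICAL = {
    "I": ["NP"],
    "shot": ["VERB"],
    "an": ["DET"],
    "my": ["DET"],
    "elephant": ["NOUN"],
    "pyjamas": ["NOUN"],
    "in": ["PREP"],
}

def cky_count(words, start="S"):
    """Count the parse trees of `words` rooted in `start`."""
    n = len(words)
    # chart[i][j] maps a category to the number of ways it derives words[i:j].
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICAL.get(word, []):
            chart[i][i + 1][cat] += 1
    # Fill longer spans from shorter ones, trying every split point k.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]

print(cky_count("I shot an elephant in my pyjamas".split()))
```

Under this grammar the elephant sentence gets two parses: one where the PP attaches to the VP (`VP -> VP PP`) and one where it attaches to the NP (`NP -> NP PP`). A probabilistic version would store the best probability per cell instead of a count.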

# Questions?

# Dependency structure

![dep1](dep1.jpg)
![dep2](dep2.jpg)

- **NSUBJ**: nominal subject
- **OBJ**: object
- **DET**: determiner
- **OBL**: oblique nominal
- **NMOD**: nominal modifier
- **POSS**: possessive

In [None]:
doc = stanza_nlp("I shot an elephant in my pyjamas")
print("\n".join([f"{word.id:<4}{word.text:<12}{word.deprel:<12}{word.head:<8}" for word in doc.sentences[0].words]))

## Dependency parsing - approaches

### Arc-factored parsing
- model the likelihood of edges
- e.g. how likely is _nmod(elephant, pyjamas)_?
- find the dependency graph with the most likely edges
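
A minimal sketch of the arc-factored idea follows. The edge scores in `SCORES` are made up for illustration and are not the output of any model; `best_heads` is a hypothetical helper that simply picks the highest-scoring head for each word.

```python
# Made-up scores for (head, dependent) edges, purely for illustration (not model output).
# Words are used as keys, which assumes every word occurs only once in the sentence.
WORDS = ["ROOT", "I", "shot", "an", "elephant"]
SCORES = {
    ("ROOT", "shot"): 0.9, ("ROOT", "elephant"): 0.1,
    ("shot", "I"): 0.8, ("elephant", "I"): 0.2,
    ("elephant", "an"): 0.9, ("shot", "an"): 0.1,
    ("shot", "elephant"): 0.7, ("I", "elephant"): 0.1,
    ("elephant", "shot"): 0.05, ("I", "shot"): 0.05,
}

def best_heads(words, scores):
    """Pick the highest-scoring head for every word except ROOT."""
    heads = {}
    for dep in words[1:]:
        candidates = [h for h in words if h != dep and (h, dep) in scores]
        heads[dep] = max(candidates, key=lambda h: scores[(h, dep)])
    return heads

for dep, head in best_heads(WORDS, SCORES).items():
    print(f"{head} -> {dep}")
```

Choosing heads independently like this can produce cycles, so it is not guaranteed to yield a tree. Arc-factored parsers typically search for a maximum spanning tree instead, for example with the Chu-Liu/Edmonds algorithm.
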

### Transition-based parsing
- build dependency graphs by adding one word at a time
- model the likelihood of possible next steps
- e.g. should I attach _pyjamas_ to _elephant_ or _shot_?

# Shift-reduce parsing

![shiftreduce](shiftreduce.jpg)

([SLP Ch.14](https://web.stanford.edu/~jurafsky/slp3/14.pdf))

## Shift-reduce parsing


- transition-based approach
- processes words one-by-one, in linear order, no backtracking

- at each step, choose between:
    - **shift**: push the next word on the **stack**
    - **reduce**: add a dependency edge between the top two words on the stack, and remove the dependent.
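
The two operations can be sketched as follows. This is a simplified simulation under stated assumptions: the reduce step is split into `left_arc` and `right_arc` depending on which of the top two words becomes the head (as in the arc-standard system described in SLP3), and the action sequence is written by hand here, whereas a real parser predicts each action with a trained classifier. The function name `run_transitions` is introduced for this example.

```python
def run_transitions(words, actions):
    """Apply a given action sequence and return the (head, dependent) arcs it builds."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    for action in actions:
        if action == "shift":
            # Push the next word of the input onto the stack.
            stack.append(buffer.pop(0))
        elif action == "left_arc":
            # The top of the stack becomes the head of the word below it.
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == "right_arc":
            # The word below the top becomes the head of the top word.
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs, stack, buffer

words = "I shot an elephant".split()
# Hand-written action sequence (an assumption; normally chosen step by step by a model).
actions = ["shift", "shift", "left_arc", "shift", "shift",
           "left_arc", "right_arc", "right_arc"]
arcs, stack, buffer = run_transitions(words, actions)
for head, dependent in arcs:
    print(f"{head} -> {dependent}")
```

After the last action only `ROOT` remains on the stack and the buffer is empty, which is the usual termination condition. Since there is no backtracking, a single wrong action cannot be undone later.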

# Shift-reduce example

See [shiftreduce.pdf](shiftreduce.pdf)

## A historical note on the two perspectives

### Constituency structure
- Origins in **structural linguistics** (F. de Saussure, 1900s and later L. Bloomfield, 1930s)
- (The basic ideas actually date back to **Pāṇini** (~500 BCE))
- Application of **formal language theory** (e.g. PS grammars) in 1950s (N. Chomsky)
- Remains the mainstream perspective in theoretical linguistics (known as **generative grammar**)

### Dependency structure
- Origins in **Dependency grammar** (Tesnière, 1950s)
- (The basic ideas actually date back to **Pāṇini** (~500 BCE))
- Widespread use in NLP

# Questions?