# Parsing
***
## Table of Contents
1. [Introduction](#1-introduction)
    - [Types of Parsing](#types-of-parsing)
2. [Constituency Parsing](#2-constituency-parsing)
3. [Dependency Parsing](#3-dependency-parsing)
***

In [20]:
import nltk
import spacy
from spacy import displacy
from IPython.display import display, HTML

## 1. Introduction
Parsing in Natural Language Processing (NLP) is the process of analysing the grammatical structure of a sentence to determine the relationships between its words and phrases. Its main purpose is to build a structured representation, usually a tree, that shows how words group together and relate to each other in the sentence.

Without parsing, an NLP system has no explicit representation of sentence structure to draw on when interpreting text. Parsing acts as a bridge between raw text and higher-level language understanding, which makes it a useful building block for NLP applications.

### Types of Parsing
- **Constituency Parsing**: Breaks a sentence into sub-phrases (constituents), such as noun phrases (NP) and verb phrases (VP), then creates a tree-like structure that represents the grammatical relationships between words and phrases. It demonstrates how words are combined to form larger syntactic units.

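- **Dependency Parsing**: Focuses on binary relationships between words, identifying which word is the 'head' (main word) and which words are its 'dependents' (modifiers or arguments). The result is a dependency tree in which every word except the root attaches to exactly one head. Unlike constituency parsing, it introduces no phrase-level nodes and does not require a context-free grammar.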

## 2. Constituency Parsing
Constituency parsing divides a sentence into nested sub-phrases. Let's consider a sentence:
> John saw the car in the parking lot

`nltk.CFG.fromstring()` lets us define a simple context-free grammar:
- A sentence (`S`) consists of a noun phrase (`NP`) and a verb phrase (`VP`).
- A noun phrase can be 'John', 'the car', or 'the parking lot'.
- A verb phrase can be 'saw', 'saw' followed by an `NP`, or 'saw' followed by an `NP` and a prepositional phrase `PP`.
- A prepositional phrase is 'in' followed by an `NP`.
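
The four rules above can also be read as a recursive procedure: to recognise a symbol, try each of its productions in turn and match the pieces left to right. The following is a minimal sketch of that idea in plain Python. It is not NLTK's chart-parsing algorithm, and the names `GRAMMAR`, `expand`, `expand_sequence`, `parse` and `to_brackets` are hypothetical helpers introduced only for illustration.

```python
# Toy top-down parser for the grammar described above (illustrative only).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("John",), ("the", "car"), ("the", "parking", "lot")],
    "VP": [("saw",), ("saw", "NP"), ("saw", "NP", "PP")],
    "PP": [("in", "NP")],
}


def expand(symbol, tokens, start):
    """Yield (tree, end) for each way `symbol` can cover tokens[start:end]."""
    if symbol not in GRAMMAR:  # terminal: must match the next token exactly
        if start < len(tokens) and tokens[start] == symbol:
            yield symbol, start + 1
        return
    for production in GRAMMAR[symbol]:
        for children, end in expand_sequence(production, tokens, start):
            yield (symbol, *children), end


def expand_sequence(symbols, tokens, start):
    """Yield (trees, end) for each way to match `symbols` one after another."""
    if not symbols:
        yield (), start
        return
    first, rest = symbols[0], symbols[1:]
    for tree, mid in expand(first, tokens, start):
        for trees, end in expand_sequence(rest, tokens, mid):
            yield (tree, *trees), end


def parse(tokens):
    """Return every S tree that covers the whole token list."""
    return [tree for tree, end in expand("S", tokens, 0) if end == len(tokens)]


def to_brackets(tree):
    """Render a nested-tuple tree in bracketed notation."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "(" + label + " " + " ".join(to_brackets(c) for c in children) + ")"


for tree in parse("John saw the car in the parking lot".split()):
    print(to_brackets(tree))
# (S (NP John) (VP saw (NP the car) (PP in (NP the parking lot))))
```

This naive approach would loop forever on a left-recursive rule such as `NP -> NP PP`, which is one reason to prefer a chart parser for anything beyond a toy grammar.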

In [21]:
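sentence = "John saw the car in the parking lot"
# word_tokenize needs NLTK's Punkt tokenizer data, e.g. nltk.download('punkt')
# ('punkt_tab' on newer NLTK releases)
tokens = nltk.word_tokenize(sentence)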

# Define a simple grammar using CFG
grammar = nltk.CFG.fromstring("""
  S -> NP VP
  NP -> 'John' | 'the' 'car' | 'the' 'parking' 'lot'
  VP -> 'saw' | 'saw' NP | 'saw' NP PP
  PP -> 'in' NP
""")

parser = nltk.ChartParser(grammar)

for tree in parser.parse(tokens):
    print(tree)
    tree.pretty_print()  # ASCII visualisation

(S (NP John) (VP saw (NP the car) (PP in (NP the parking lot))))
                  S                         
  ________________|___                       
 |                    VP                    
 |     _______________|_______               
 |    |       |               PP            
 |    |       |        _______|_____         
 NP   |       NP      |             NP      
 |    |    ___|___    |    _________|_____   
John saw the     car  in the     parking lot
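
The bracketed line printed above encodes the same tree as the ASCII drawing. As a rough sketch of how to read it, the plain-Python snippet below turns the bracketed string into nested `(label, children)` pairs and lists the phrase covered by each constituent. The helpers `read_tree`, `leaves` and `constituents` are hypothetical names used only for this illustration; in practice `nltk.Tree` offers its own methods for this.

```python
def read_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested constituent
            else:
                children.append(tokens[pos])  # plain word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def leaves(tree):
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


def constituents(tree):
    """Yield (label, phrase) for every labelled node, top-down."""
    if isinstance(tree, str):
        return
    yield tree[0], " ".join(leaves(tree))
    for child in tree[1]:
        yield from constituents(child)


bracketed = "(S (NP John) (VP saw (NP the car) (PP in (NP the parking lot))))"
for label, phrase in constituents(read_tree(bracketed)):
    print(f"{label:3} {phrase}")
```

Each printed row pairs a label such as `NP` or `PP` with the words it spans, which makes the nesting of sub-phrases explicit.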



## 3. Dependency Parsing
Dependency parsing identifies relationships between individual words. Consider the sentence:
> The quick brown fox jumps over the lazy dog

In [28]:
nlp = spacy.load('en_core_web_sm')
text = 'The quick brown fox jumps over the lazy dog'
doc = nlp(text)

In [32]:
# Print each token and its dependency information
for token in doc:
    print(f"{token.text:10} {token.dep_:10} {token.head.text:10}")

The        det        fox       
quick      amod       fox       
brown      amod       fox       
fox        nsubj      jumps     
jumps      ROOT       jumps     
over       prep       jumps     
the        det        dog       
lazy       amod       dog       
dog        pobj       over      
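
The table above lists one head per token, which is enough to rebuild the whole tree. The sketch below does this in plain Python from the printed rows, without calling spaCy: it groups dependents under their heads and collects the words in a subtree. `ROWS`, `children` and `subtree` are hypothetical names, and keying by word text is an assumption that only holds here because no word form repeats ('The' and 'the' differ in case); with a real `doc` one would key by `token.i` instead.

```python
from collections import defaultdict

# (token, dependency label, head) rows copied from the output above.
ROWS = [
    ("The", "det", "fox"),
    ("quick", "amod", "fox"),
    ("brown", "amod", "fox"),
    ("fox", "nsubj", "jumps"),
    ("jumps", "ROOT", "jumps"),
    ("over", "prep", "jumps"),
    ("the", "det", "dog"),
    ("lazy", "amod", "dog"),
    ("dog", "pobj", "over"),
]

ORDER = [text for text, _, _ in ROWS]  # original word order

children = defaultdict(list)
root = None
for text, dep, head in ROWS:
    if dep == "ROOT":
        root = text                  # the root is listed as its own head
    else:
        children[head].append(text)


def subtree(word):
    """Return `word` and all of its descendants in sentence order."""
    words = [word]
    for child in children[word]:
        words.extend(subtree(child))
    return sorted(words, key=ORDER.index)


print(root)                      # jumps
print(" ".join(subtree("fox")))   # The quick brown fox
print(" ".join(subtree("over")))  # over the lazy dog
```

The subtree of 'fox' corresponds to the subject noun phrase and the subtree of 'over' to the prepositional phrase, so phrase-like groupings can still be recovered from word-to-word links.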


In [29]:
html = displacy.render(doc, style='dep', jupyter=False)
display(HTML(html))

Here's the explanation for each token:

1. **The** (`det`)
- Dependency: Determiner
- Head: 'fox'
- Role: Specifies which fox ('the' fox)

2. **quick** (`amod`)
- Dependency: Adjectival modifier
- Head: 'fox'
- Role: Describes the fox's quality

3. **brown** (`amod`)
- Dependency: Adjectival modifier
- Head: 'fox'
- Role: Adds another quality to the fox

4. **fox** (`nsubj`)
- Dependency: Nominal subject
- Head: 'jumps'
- Role: Performer of the jumping action

5. **jumps** (`ROOT`)
- Dependency: Root verb
- Head: 'jumps' (in spaCy, the root token is its own head)
- Role: Main action of the sentence

6. **over** (`prep`)
- Dependency: Prepositional modifier
- Head: 'jumps'
- Role: Indicates direction/position of the jump

7. **the** (`det`)
- Dependency: Determiner
- Head: 'dog'
- Role: Specifies which dog ('the' dog)

8. **lazy** (`amod`)
- Dependency: Adjectival modifier
- Head: 'dog'
- Role: Describes the dog's characteristic

9. **dog** (`pobj`)
- Dependency: Prepositional object
- Head: 'over'
- Role: Target of the preposition 'over'

Key relationships for these tokens (arrows point from dependent to head):
- **Core Action**: fox (`nsubj`) -> jumps (`ROOT`)
- **Prepositional Phrase**: jumps <- over (`prep`) <- dog (`pobj`)
- **Noun Phrase Modifiers**:
    - fox <- The (`det`)
    - fox <- quick (`amod`)
    - fox <- brown (`amod`)
    - dog <- the (`det`)
    - dog <- lazy (`amod`)
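
Following these links from any word towards its head eventually reaches the root. As a small sketch of that idea, the plain-Python snippet below walks the head chain for a word using the heads reported in the output earlier. `HEADS` and `path_to_root` are hypothetical names for this illustration, and keying by word text assumes no word form repeats in the sentence.

```python
# Head of each token, copied from the dependency output above.
HEADS = {
    "The": "fox",
    "quick": "fox",
    "brown": "fox",
    "fox": "jumps",
    "jumps": "jumps",  # the root is its own head
    "over": "jumps",
    "the": "dog",
    "lazy": "dog",
    "dog": "over",
}


def path_to_root(word):
    """Follow head links from `word` until the root is reached."""
    path = [word]
    while HEADS[path[-1]] != path[-1]:
        path.append(HEADS[path[-1]])
    return path


print(" -> ".join(path_to_root("lazy")))  # lazy -> dog -> over -> jumps
print(" -> ".join(path_to_root("The")))   # The -> fox -> jumps
```

The length of each path gives the depth of the word in the tree: 'lazy' sits three links below the root, while 'fox' and 'over' attach to it directly.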