![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/collab/Chunkers/NLU_Chunking_Example.ipynb)
# Grammatical Chunk Matching with NLU
With the chunker, you can filter a data set by matching regex patterns against Part of Speech tags.

For example, the following parameterization extracts every run of singular nouns (`NN`) or adjectives (`JJ`) in your data set.
```
pipe['default_chunker'].setRegexParsers(['<NN>+', '<JJ>+'])
```

See [here](https://www.rexegg.com/regex-quickstart.html) for a great reference on regex operators.
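
To make the matching behaviour concrete, here is a minimal pure-Python sketch of regex matching over a sequence of POS tags. It is an illustration only, not the chunker's actual implementation: `match_pos_chunks` is a hypothetical helper, the tags are written by hand rather than produced by a tagger, and the sketch assumes each pattern is built from literal `<TAG>` elements plus quantifiers.

```python
import re

def match_pos_chunks(tokens, tags, patterns):
    """Return the token chunks whose POS tag run matches any of the patterns."""
    # Encode the tag sequence as one string, e.g. "<NNP><CC><NNP>".
    encoded = ""
    start_index = {}  # character offset where a tag begins -> token index
    end_index = {}    # character offset where a tag ends -> token index
    for i, tag in enumerate(tags):
        start_index[len(encoded)] = i
        encoded += f"<{tag}>"
        end_index[len(encoded)] = i

    chunks = []
    for pattern in patterns:
        # Wrap every <TAG> in a group so a quantifier such as "+" applies
        # to the whole tag and not only to the closing ">".
        grouped = re.sub(
            r"<[^<>]+>",
            lambda m: "(?:" + re.escape(m.group(0)) + ")",
            pattern,
        )
        for m in re.finditer(grouped, encoded):
            if m.start() == m.end():
                continue  # skip empty matches
            first = start_index[m.start()]
            last = end_index[m.end()]
            chunks.append(" ".join(tokens[first:last + 1]))
    return chunks

# Hand-written tags for illustration (assumed, not tagger output).
tokens = "Jim and Joe went to the big blue market next to the town hall".split()
tags = ["NNP", "CC", "NNP", "VBD", "TO", "DT", "JJ", "JJ",
        "NN", "JJ", "TO", "DT", "NN", "NN"]

result = match_pos_chunks(tokens, tags, ["<NN>+", "<JJ>+"])
print(result)
# ['market', 'town hall', 'big blue', 'next']
```

In this sketch `<NN>+` does not match `NNP`, so the proper nouns Jim and Joe are skipped, and adjacent tokens with the same tag are merged into one chunk.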

## Overview of all Part of Speech Tags


|Tag |Description | Example|
|------|------------|------|
|CC| Coordinating conjunction | This batch of mushroom stew is savory **and** delicious    |
|CD| Cardinal number | Here are **five** coins    |
|DT| Determiner | **The** bunny went home    |
|EX| Existential there | **There** is a storm coming    |
|FW| Foreign word | I'm having a **déjà vu**    |
|IN| Preposition or subordinating conjunction | He is cleverer **than** I am   |
|JJ| Adjective | She wore a **beautiful** dress    |
|JJR| Adjective, comparative | My house is **bigger** than yours    |
|JJS| Adjective, superlative | I am the **shortest** person in my family   |
|LS| List item marker | Before starting a business, consider: **1.** premises, **2.** finance, **3.** staffing |
|MD| Modal | You **must** stop when the traffic lights turn red    |
|NN| Noun, singular or mass | The **dog** likes to run    |
|NNS| Noun, plural | The **cars** are fast    |
|NNP| Proper noun, singular | I ordered the chair from **Amazon**  |
|NNPS| Proper noun, plural | We visited the **Kennedys**   |
|PDT| Predeterminer | **Both** the children had a toy   |
|POS| Possessive ending | I built the dog'**s** house    |
|PRP| Personal pronoun | **You** need to stop    |
|PRP\$| Possessive pronoun | Remember not to judge a book by **its** cover |
|RB| Adverb | The dog barks **loudly**    |
|RBR| Adverb, comparative | You need to work **harder**   |
|RBS| Adverb, superlative | Everyone in the race ran fast, but John ran the **fastest** of all    |
|RP| Particle | He ate **up** all his dinner    |
|SYM| Symbol | The answer is 2 **+** 2    |
|TO| to | Please send it back **to** me    |
|UH| Interjection | **Wow**! You look gorgeous    |
|VB| Verb, base form | We **play** soccer |
|VBD| Verb, past tense | I **worked** at a restaurant    |
|VBG| Verb, gerund or present participle | **Smoking** kills people   |
|VBN| Verb, past participle | She has **done** her homework    |
|VBP| Verb, non-3rd person singular present | You **flit** from place to place    |
|VBZ| Verb, 3rd person singular present | He never **calls** me    |
|WDT| Wh-determiner | The store honored the complaints, **which** were less than 25 days old    |
|WP| Wh-pronoun | **Who** can help me?    |
|WP\$| Possessive wh-pronoun | **Whose** fault is it?    |
|WRB| Wh-adverb | **Where** are you going?  |
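
Each tag in the table is a separate label, so a pattern such as `<NN>+` only covers singular or mass nouns. A small sketch, assuming you want every noun and adjective variant, is to generate one pattern per tag. `tag_patterns` is a hypothetical helper and not part of NLU.

```python
# Tag families taken from the table above.
NOUN_TAGS = ["NN", "NNS", "NNP", "NNPS"]
ADJECTIVE_TAGS = ["JJ", "JJR", "JJS"]

def tag_patterns(tags):
    """Return one '<TAG>+' pattern per tag, each matching runs of that tag."""
    return [f"<{tag}>+" for tag in tags]

parsers = tag_patterns(NOUN_TAGS + ADJECTIVE_TAGS)
print(parsers)
# ['<NN>+', '<NNS>+', '<NNP>+', '<NNPS>+', '<JJ>+', '<JJR>+', '<JJS>+']

# With a loaded pipeline, the list could then be passed on as:
# pipe['default_chunker'].setRegexParsers(parsers)
```

Because every pattern covers a single tag, a mixed run such as a proper noun followed by a common noun would come back as two separate chunks.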








Chunks are Named 

# 1. Install Java and NLU


In [None]:
import os

# Refresh the package index and install Java 8
! apt-get update -qq > /dev/null
! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
# Point JAVA_HOME and PATH at the Java 8 installation
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
# Install NLU
! pip install nlu > /dev/null
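
Since the install commands discard their output, a quick sanity check can confirm that Java is visible before loading a pipeline. This is a minimal sketch using only the standard library; `java_status` is a hypothetical helper, not part of NLU.

```python
import os
import shutil

def java_status():
    """Report where the java executable and JAVA_HOME point, if anywhere."""
    return {
        "java_on_path": shutil.which("java"),      # None when java is not on PATH
        "java_home": os.environ.get("JAVA_HOME"),  # None when JAVA_HOME is unset
    }

status = java_status()
print(status)
```

If either value is `None`, the environment setup in the cell above probably did not take effect.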


# 2. Load the Chunker and print parameters

In [None]:
import nlu 

pipe = nlu.load('match.chunks')
# Print the pipeline info to see which components it contains and which parameters can be configured on each
pipe.print_info()

match_chunks download started this may take some time.
Approx size to download 4.3 MB
[OK!]
The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :
>>> pipe['document_assembler'] has settable params:
pipe['document_assembler'].setCleanupMode('disabled')         | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : disabled
>>> pipe['sentence_detector'] has settable params:
pipe['sentence_detector'].setCustomBounds([])                 | Info: characters used to explicitly mark sentence bounds | Currently set to : []
pipe['sentence_detector'].setDetectLists(True)                | Info: whether detect lists during sentence detection | Currently set to : True
pipe['sentence_detector'].setExplodeSentences(False)          | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False
pipe['sentence_det

# 3. Configure the pipe to match only nouns and adjectives, then predict on data

In [None]:
# Set the chunker to match runs of singular nouns (NN) or adjectives (JJ)
pipe['default_chunker'].setRegexParsers(['<NN>+', '<JJ>+'])
# Now we can predict with the configured pipeline
pipe.predict("Jim and Joe went to the big blue market next to the town hall")

|origin_index |pos | chunk|
|------|------------|------|
|0| [NNP, CC, NNP, VBD, TO, DT, JJ, JJ, NN, JJ, TO... | market |
|0| [NNP, CC, NNP, VBD, TO, DT, JJ, JJ, NN, JJ, TO... | town hall |
|0| [NNP, CC, NNP, VBD, TO, DT, JJ, JJ, NN, JJ, TO... | big blue |
|0| [NNP, CC, NNP, VBD, TO, DT, JJ, JJ, NN, JJ, TO... | next |
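
The output has one row per chunk, and all rows share the `origin_index` of the input sentence. When predicting on many sentences, it can help to collect the chunks per input row. Here is a minimal sketch on rows copied by hand from the output above. It uses plain tuples rather than the returned table, and `chunks_by_row` is a hypothetical helper.

```python
from collections import defaultdict

# (origin_index, chunk) pairs copied by hand from the prediction output.
rows = [(0, "market"), (0, "town hall"), (0, "big blue"), (0, "next")]

def chunks_by_row(rows):
    """Group chunk strings by the index of the input row they came from."""
    grouped = defaultdict(list)
    for origin_index, chunk in rows:
        grouped[origin_index].append(chunk)
    return dict(grouped)

grouped_chunks = chunks_by_row(rows)
print(grouped_chunks)
# {0: ['market', 'town hall', 'big blue', 'next']}
```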
