## Natural Language Understanding (NLU): 
  * NLU covers the structures and processes that help a computer understand natural language.
  * Natural Language Understanding (NLU), also called Natural Language Interpretation (NLI), is a subtopic of natural language processing in artificial intelligence.
  * NLU or NLI is **useful in evaluating the output of** Natural Language Generation, Machine Translation, and Speech Recognition applications.

&nbsp;
  
  * **Syntactic Analysis:** looks into the grammar and structure of a sentence (or part of a sentence). Syntactic analysis needs grammar rules to judge whether a sentence is well formed. Grammatical **parsing algorithms** are used to generate the sentence structure (a parse tree).

### POS Tagging
  * The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS-tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The collection of tags used for a particular task is known as a tagset.

### POS tagging with NLTK

In [1]:
#nltk.download()

In [2]:
import nltk
sentence = 'Ford has embarked on an aggressive strategy for future profitability and is streamlining the operations in the global markets by reducing the number of platforms used and discontinuing the products that are not garnering good sales volume.'
corpus = sentence.split()
print(nltk.pos_tag(corpus))

[('Ford', 'NNP'), ('has', 'VBZ'), ('embarked', 'VBN'), ('on', 'IN'), ('an', 'DT'), ('aggressive', 'JJ'), ('strategy', 'NN'), ('for', 'IN'), ('future', 'JJ'), ('profitability', 'NN'), ('and', 'CC'), ('is', 'VBZ'), ('streamlining', 'VBG'), ('the', 'DT'), ('operations', 'NNS'), ('in', 'IN'), ('the', 'DT'), ('global', 'JJ'), ('markets', 'NNS'), ('by', 'IN'), ('reducing', 'VBG'), ('the', 'DT'), ('number', 'NN'), ('of', 'IN'), ('platforms', 'NNS'), ('used', 'VBN'), ('and', 'CC'), ('discontinuing', 'VBG'), ('the', 'DT'), ('products', 'NNS'), ('that', 'WDT'), ('are', 'VBP'), ('not', 'RB'), ('garnering', 'VBG'), ('good', 'JJ'), ('sales', 'NNS'), ('volume.', 'NN')]
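Note that `sentence.split()` splits on whitespace only, so the last token above is `'volume.'` with the period still attached. A minimal sketch of a punctuation-aware tokenizer using only the standard library is shown below. The helper name `simple_tokenize` and its regular expression are illustrative assumptions, not the rules used by NLTK's own tokenizers, and the sketch would split abbreviations such as "U.K." into separate pieces.

```python
import re

def simple_tokenize(text):
    """Split text into word tokens and standalone punctuation tokens."""
    # \w+(?:[-']\w+)* keeps hyphenated words and contractions together;
    # [^\w\s] emits every other punctuation character as its own token.
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", text)

tokens = simple_tokenize("discontinuing the products that are not garnering good sales volume.")
print(tokens[-3:])  # ['sales', 'volume', '.']
```

Passing such tokens to the tagger lets the final period receive its own tag instead of being folded into the noun.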


In [3]:
nltk.help.upenn_tagset('NN')

NN: noun, common, singular or mass
    common-carrier cabbage knuckle-duster Casino afghan shed thermostat
    investment slide humour falloff slick wind hyena override subhumanity
    machinist ...


In [4]:
nltk.help.upenn_tagset('VBZ')

VBZ: verb, present tense, 3rd person singular
    bases reconstructs marks mixes displeases seals carps weaves snatches
    slumps stretches authorizes smolders pictures emerges stockpiles
    seduces fizzes uses bolsters slaps speaks pleads ...


In [5]:
nltk.help.upenn_tagset('NN.*')

NN: noun, common, singular or mass
    common-carrier cabbage knuckle-duster Casino afghan shed thermostat
    investment slide humour falloff slick wind hyena override subhumanity
    machinist ...
NNP: noun, proper, singular
    Motown Venneboerger Czestochwa Ranzer Conchita Trumplane Christos
    Oceanside Escobar Kreisler Sawyer Cougar Yvette Ervin ODI Darryl CTCA
    Shannon A.K.C. Meltex Liverpool ...
NNPS: noun, proper, plural
    Americans Americas Amharas Amityvilles Amusements Anarcho-Syndicalists
    Andalusians Andes Andruses Angels Animals Anthony Antilles Antiques
    Apache Apaches Apocrypha ...
NNS: noun, common, plural
    undergraduates scotches bric-a-brac products bodyguards facets coasts
    divestitures storehouses designs clubs fragrances averages
    subjectivists apprehensions muses factory-jobs ...
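The argument `'NN.*'` used above is a regular expression over tag names. The same idea can be applied to tagged output to pull out every noun. The sketch below uses a few pairs copied by hand from the tagger output shown earlier; the helper `words_with_tag` is a hypothetical name, not an NLTK function.

```python
import re
from collections import Counter

# A few (word, tag) pairs copied from the pos_tag output shown earlier.
tagged = [("Ford", "NNP"), ("has", "VBZ"), ("embarked", "VBN"),
          ("strategy", "NN"), ("operations", "NNS"), ("markets", "NNS")]

def words_with_tag(tagged_words, tag_pattern):
    """Return the words whose tag fully matches the regular expression."""
    return [word for word, tag in tagged_words if re.fullmatch(tag_pattern, tag)]

nouns = words_with_tag(tagged, r"NN.*")
tag_counts = Counter(tag for _, tag in tagged)

print(nouns)                      # ['Ford', 'strategy', 'operations', 'markets']
print(tag_counts.most_common(1))  # [('NNS', 2)]
```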


### Tagged Corpora
  * Several of the corpora included with NLTK have been tagged for their part-of-speech
    * Examples: nltk.corpus.brown.tagged_words(), nltk.corpus.nps_chat.tagged_words(), nltk.corpus.treebank.tagged_words(), etc.
  

In [6]:
nltk.corpus.indian.tagged_words()[:10] # POS-tagged corpora are available for four languages: Bangla, Hindi, Marathi, and Telugu

[('মহিষের', 'NN'),
 ('সন্তান', 'NN'),
 (':', 'SYM'),
 ('তোড়া', 'NNP'),
 ('উপজাতি', 'NN'),
 ('৷', 'SYM'),
 ('বাসস্থান-ঘরগৃহস্থালি', 'NN'),
 ('তোড়া', 'NNP'),
 ('ভাষায়', 'NN'),
 ('গ্রামকেও', 'NN')]

In [7]:
print(nltk.corpus.indian.tagged_sents()[:5])

[[('মহিষের', 'NN'), ('সন্তান', 'NN'), (':', 'SYM'), ('তোড়া', 'NNP'), ('উপজাতি', 'NN'), ('৷', 'SYM')], [('বাসস্থান-ঘরগৃহস্থালি', 'NN'), ('তোড়া', 'NNP'), ('ভাষায়', 'NN'), ('গ্রামকেও', 'NN'), ('বলে', 'VM'), ('`', 'SYM'), ('মোদ', 'NN'), ("'", 'SYM'), ('৷', 'SYM')], [('মোদের', 'NN'), ('আয়তন', 'NN'), ('খুব', 'INTF'), ('বড়ো', 'JJ'), ('নয়', 'VM'), ('৷', 'SYM')], [('প্রতি', 'QF'), ('মোদে', 'NN'), ('আছে', 'VM'), ('কিছু', 'QF'), ('কুঁড়েঘর', 'NN'), (',', 'SYM'), ('সাধারণ', 'JJ'), ('মহিষশালা', 'NN'), ('৷', 'SYM')], [('আর', 'CC'), ('গ্রামের', 'NN'), ('বাইরে', 'NST'), ('থাকে', 'VM'), ('ডেয়ারি-মন্দির', 'NN'), ('৷', 'SYM')]]


In [8]:
nltk.corpus.indian.fileids()

['bangla.pos', 'hindi.pos', 'marathi.pos', 'telugu.pos']

In [9]:
print(nltk.corpus.indian.tagged_words(fileids='telugu.pos')[:20])

[('4', 'QFNUM'), ('.', 'SYM'), ('ఆడిట్', 'NN'), ('నిర్వహణ', 'NN'), ('ఆడిటర్', 'NN'), ('ఒక', 'QFNUM'), ('కొత్త', 'JJ'), ('ఆడిట్', 'NN'), ('చేపట్టే', 'VRB'), ('ముందు', 'PREP'), ('సక్రమ', 'JJ'), ('పద్ధతి', 'NN'), ('లో', 'PREP'), ('కార్య', 'JJ'), ('ప్రణాళికను', 'NN'), ('రూపొందించాలి', 'VFM'), ('.', 'SYM'), ('దాని', 'PRP'), ('కనుగుణంగా', 'PREP'), ('వ్యవహరించాలి', 'VFM')]


In [10]:
print(nltk.corpus.indian.tagged_sents(fileids='telugu.pos')[:5])

[[('4', 'QFNUM'), ('.', 'SYM')], [('ఆడిట్', 'NN'), ('నిర్వహణ', 'NN'), ('ఆడిటర్', 'NN'), ('ఒక', 'QFNUM'), ('కొత్త', 'JJ'), ('ఆడిట్', 'NN'), ('చేపట్టే', 'VRB'), ('ముందు', 'PREP'), ('సక్రమ', 'JJ'), ('పద్ధతి', 'NN'), ('లో', 'PREP'), ('కార్య', 'JJ'), ('ప్రణాళికను', 'NN'), ('రూపొందించాలి', 'VFM'), ('.', 'SYM')], [('దాని', 'PRP'), ('కనుగుణంగా', 'PREP'), ('వ్యవహరించాలి', 'VFM'), ('.', 'SYM')], [('పత్రసహిత', 'JJ'), ('సాక్ష్యాధారాల', 'NN'), ('తో', 'PREP'), (',', 'SYM'), ('వ్యవహారాల', 'NN'), ('ను', 'PREP'), ('తనిఖీ', 'NVB'), ('చేయాలి', 'VFM'), ('.', 'SYM')], [('ఆడిట్', 'NVB'), ('చేసే', 'VJJ'), ('విధానం', 'NN'), ('సంస్థ', 'NN'), ('అవసరాల', 'NN'), ('ను', 'PREP'), ('బట్టి', 'CC'), (',', 'SYM'), ('అంతర్గత', 'JJ'), ('తనిఖీన్', 'NN'), ('బట్టి', 'CC'), (',', 'SYM'), ('ఇంకా', 'CC'), ('అనేక', 'QF'), ('ఇతర', 'JJ'), ('విషయాల', 'NN'), ('ను', 'PREP'), ('బట్టి', 'CC'), ('మారుతూఉంటుంది', 'VAUX'), ('.', 'SYM')]]


### SpaCy, a production ready framework for NLP tasks
  * Non-destructive tokenization
  * Named entity recognition
  * Support for 49+ languages
  * 16 statistical models for 9 languages
  * Pre-trained word vectors
  * State-of-the-art speed
  * Easy deep learning integration
  * Part-of-speech tagging
  * Labelled dependency parsing
  * Syntax-driven sentence segmentation
  * Built in visualizers for syntax and NER
  * Convenient string-to-hash mapping
  * Export to numpy data arrays
  * Efficient binary serialization
  * Easy model packaging and deployment
  * Robust, rigorously evaluated accuracy

### POS Tagging with SpaCy: 
  * https://spacy.io/api/annotation#pos-tagging

In [11]:
import spacy

nlp = spacy.load("en_core_web_sm")  # download small model -> python -m spacy download en_core_web_sm
doc = nlp(u"Apple is looking at buying U.K. startup for $1 billion")
for token in doc:
    print(token.text, '-', token.pos_, '-', token.dep_)

Apple - PROPN - nsubj
is - AUX - aux
looking - VERB - ROOT
at - ADP - prep
buying - VERB - pcomp
U.K. - PROPN - compound
startup - NOUN - dobj
for - ADP - prep
$ - SYM - quantmod
1 - NUM - compound
billion - NUM - pobj


### Syntactic Analysis - Sentence Structure and Parsers

#### Context Free Grammar (CFG, Constituency Grammar, or Phrase Structure Grammar):  
  * A context-free grammar is defined as a 4-tuple $G = (V, \Sigma, S, P)$.
    * $V$ = set of variables (non-terminal symbols)
    * $\Sigma$ = set of terminal symbols
    * $S$ = start symbol, $S \in V$
    * $P$ = set of production rules of the form $A \to \alpha$, where $A \in V$ and $\alpha \in (V \cup \Sigma)^*$
  <img src="img_nlp/cfg.png" width=700/>  
  
      * Example: generating the sentence "I like football" from a set of nouns N = {I|HE|SHE|BOY|GIRL|CRICKET|FOOTBALL} and a set of verbs V = {LIKE|READ|SING} (here V names the verb category, not the variable set defined above).
        * The production rules are as follows:
          * R1: S -> NP VP
          * R2: NP -> N
          * R3: VP -> V NP
          * R4: VP -> V
        
   <img src="img_nlp/cfg_example.png" />
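To make rules R1 to R4 concrete, the sketch below encodes them as a Python dictionary and enumerates every sentence the grammar can derive. This is an illustration only: the dictionary encoding, the lowercase word forms, and the helper name `expand` are assumptions made here, and exhaustive enumeration works only because this particular grammar is not recursive.

```python
from itertools import product

# R1-R4 plus the noun and verb sets from the example (lowercased for readability).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"], ["V"]],
    "N": [[w] for w in ["i", "he", "she", "boy", "girl", "cricket", "football"]],
    "V": [[w] for w in ["like", "read", "sing"]],
}

def expand(symbol):
    """Yield every terminal sequence derivable from `symbol`."""
    if symbol not in GRAMMAR:          # terminal symbol: nothing left to expand
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol, in order.
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield [word for part in parts for word in part]

sentences = {" ".join(words) for words in expand("S")}
print("i like football" in sentences)  # True
print(len(sentences))                  # 168
```

The count follows from the rules: 7 x 3 x 7 sentences of the form N V N plus 7 x 3 of the form N V. Many of them are grammatical under the rules yet meaningless, which is why syntax alone is not enough.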

In [12]:
groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

In [13]:
sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']
parser = nltk.ChartParser(groucho_grammar)

In [14]:
for tree in parser.parse(sent):
    print(tree)

(S
  (NP I)
  (VP
    (VP (V shot) (NP (Det an) (N elephant)))
    (PP (P in) (NP (Det my) (N pajamas)))))
(S
  (NP I)
  (VP
    (V shot)
    (NP (Det an) (N elephant) (PP (P in) (NP (Det my) (N pajamas))))))


* A parser processes input sentences according to the productions of a grammar, and builds one or more constituent structures that conform to the grammar.
* **Recursive Descent Parser Demo (simulator) - nltk.app.rdparser():** This tool lets you watch a recursive descent parser as it grows the parse tree and matches it against the input words, for example "The dog saw a man in the park". The parser expands the tree recursively from the start symbol downwards, hence the name recursive descent parser.

* **Recursive Grammar:** A grammar is said to be recursive if a category occurring on the left-hand side of a production also appears on the right-hand side of a production.
  * Direct recursion: the production Nom -> Adj Nom (where Nom is the category of nominals) has Nom on both sides.
  * Indirect recursion: recursion on S arises from the combination of two productions, namely S -> NP VP and VP -> V S.
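The distinction between direct and indirect recursion can be checked mechanically. The sketch below is a plain-Python illustration, not an NLTK feature: the grammar dictionary and the helper names `reachable` and `recursion_kind` are hypothetical, and symbols without productions (Det, Adj, N, V) are treated as pre-terminals.

```python
def reachable(grammar, start):
    """Return the non-terminals reachable from `start` in one or more steps."""
    seen, frontier = set(), [start]
    while frontier:
        symbol = frontier.pop()
        for rhs in grammar.get(symbol, []):
            for child in rhs:
                if child in grammar and child not in seen:
                    seen.add(child)
                    frontier.append(child)
    return seen

def recursion_kind(grammar, symbol):
    """Classify `symbol` as 'direct', 'indirect', or 'none'."""
    if any(symbol in rhs for rhs in grammar.get(symbol, [])):
        return "direct"            # symbol appears in one of its own productions
    if symbol in reachable(grammar, symbol):
        return "indirect"          # symbol comes back through other productions
    return "none"

grammar = {
    "S": [["NP", "VP"]],
    "VP": [["V", "S"], ["V", "NP"]],
    "NP": [["Det", "Nom"]],
    "Nom": [["Adj", "Nom"], ["N"]],
}

print(recursion_kind(grammar, "Nom"))  # direct
print(recursion_kind(grammar, "S"))    # indirect
print(recursion_kind(grammar, "NP"))   # none
```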

In [16]:
nltk.app.rdparser()

* **Shift-Reduce Parsing Demo (simulator) - nltk.app.srparser():** The shift-reduce parser repeatedly pushes the next input word onto a stack; this is the shift operation. If the top n items on the stack match the n items on the right hand side of some production, then they are all popped off the stack, and the item on the left-hand side of the production is pushed on the stack. This replacement of the top n items with a single item is the reduce operation.

In [18]:
nltk.app.srparser()
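The shift and reduce operations described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not NLTK's implementation: the toy grammar is hypothetical, lexical rules are handled as ordinary reductions, and the parser greedily reduces whenever any rule matches, so it cannot recover from a wrong choice on ambiguous input.

```python
# Each rule is (left-hand side, right-hand side); lexical rules are included.
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "N"]),
    ("VP", ["V", "NP"]),
    ("Det", ["the"]), ("Det", ["a"]),
    ("N", ["dog"]), ("N", ["man"]),
    ("V", ["saw"]),
]

def shift_reduce(tokens):
    """Greedy shift-reduce recognizer; returns (accepted, trace of operations)."""
    stack, buffer, trace = [], list(tokens), []
    while True:
        for lhs, rhs in RULES:
            # Reduce: the top n stack items equal the right-hand side of a rule.
            if stack[-len(rhs):] == rhs:
                del stack[-len(rhs):]
                stack.append(lhs)
                trace.append(("reduce", lhs))
                break
        else:
            if not buffer:          # nothing to reduce and nothing left to shift
                break
            # Shift: push the next input word onto the stack.
            stack.append(buffer.pop(0))
            trace.append(("shift", stack[-1]))
    return stack == ["S"], trace

accepted, trace = shift_reduce("the dog saw a man".split())
print(accepted)   # True
print(trace[:3])  # [('shift', 'the'), ('reduce', 'Det'), ('shift', 'dog')]
```

With an ambiguous grammar such as the "elephant in my pajamas" grammar used earlier, a greedy strategy like this one can reduce to S too early and then fail; handling such conflicts needs lookahead or backtracking, which this sketch leaves out.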



#### Dependency Parsing:
  * Unlike phrase structure grammar, which describes how words combine into constituents, dependency grammar focuses on how words relate to other words. Dependency is a binary asymmetric relation that holds between a head and its dependents. The head of a sentence is usually taken to be the tensed verb, and every other word is either dependent on the sentence head, or connects to it through a path of dependencies.

  * A dependency representation is a labeled directed graph, where the nodes are the lexical items and the labeled arcs represent dependency relations from heads to dependents. The diagram below illustrates a dependency graph, where arrows point from heads to their dependents.
  
<img src="img_nlp/depgraph0.png" />

  * The arcs are labeled with the grammatical function that holds between a dependent and its head. For example, I is the SBJ (subject) of shot (which is the head of the whole sentence), and in is an NMOD (noun modifier of elephant).
  * Here's one way of encoding a dependency grammar in NLTK. Note that it only captures bare dependency information without specifying the type of dependency.

In [69]:
groucho_dep_grammar = nltk.DependencyGrammar.fromstring("""
'shot' -> 'I' | 'elephant' | 'in'
'elephant' -> 'an' | 'in'
'in' -> 'pajamas'
'pajamas' -> 'my'
""")
pdp = nltk.ProjectiveDependencyParser(groucho_dep_grammar)
sent = 'I shot an elephant in my pajamas'.split()
trees = pdp.parse(sent)

In [70]:
for tree in trees:
    print(tree)

(shot I (elephant an (in (pajamas my))))
(shot I (elephant an) (in (pajamas my)))
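The two trees differ only in the head of "in": "elephant" in the first parse and "shot" in the second. A dependency tree can also be stored as a table of head pointers, which makes it easy to walk from any word up to the root. The sketch below encodes the first parse by hand; the `HEAD` table and the `ancestors` helper are illustrative names, not NLTK objects.

```python
# Head pointers for the first parse printed above:
# (shot I (elephant an (in (pajamas my))))
HEAD = {
    "I": "shot",
    "elephant": "shot",
    "an": "elephant",
    "in": "elephant",
    "pajamas": "in",
    "my": "pajamas",
    "shot": None,  # root of the sentence
}

def ancestors(word, head=HEAD):
    """Follow head pointers from `word` up to the root."""
    chain = []
    parent = head[word]
    while parent is not None:
        chain.append(parent)
        parent = head[parent]
    return chain

print(ancestors("my"))    # ['pajamas', 'in', 'elephant', 'shot']
print(ancestors("shot"))  # []
```

Changing `HEAD["in"]` to `"shot"` gives the second parse, in which the pajamas belong to the shooter rather than the elephant.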


In [71]:
from spacy import displacy
nlp = spacy.load("en_core_web_sm")  # download small model -> python -m spacy download en_core_web_sm
doc = nlp(u"Apple is looking at buying U.K. startup for $1 billion")
displacy.render(doc, style="dep", jupyter=True)

In [72]:
travel = 'I am traveling from Hyderabad to Bangalore'
travel_doc = nlp(travel)
displacy.render(travel_doc, style='dep', jupyter=True)

In [73]:
for token in travel_doc:
    print('token : ', token)
    print(10*'*')
    for a in token.ancestors:
        print("Ancestor : ", a)
        print("POS tags : ", a.pos_)
        print()
    print(10*'-')
    
for ent in travel_doc.ents:
    print("Named Entities: ", ent)

token :  I
**********
Ancestor :  traveling
POS tags :  VERB

----------
token :  am
**********
Ancestor :  traveling
POS tags :  VERB

----------
token :  traveling
**********
----------
token :  from
**********
Ancestor :  traveling
POS tags :  VERB

----------
token :  Hyderabad
**********
Ancestor :  from
POS tags :  ADP

Ancestor :  traveling
POS tags :  VERB

----------
token :  to
**********
Ancestor :  traveling
POS tags :  VERB

----------
token :  Bangalore
**********
Ancestor :  to
POS tags :  ADP

Ancestor :  traveling
POS tags :  VERB

----------
Named Entities:  Hyderabad
Named Entities:  Bangalore


  * The parser is loaded and enabled as part of the standard processing pipeline. If you don’t need any of the syntactic information, you should disable the parser. Disabling the parser will make spaCy load and run much faster. If you want to load the parser, but need to disable it for specific documents, you can also control its use on the nlp object.
    * nlp = spacy.load("en_core_web_sm", disable=["parser"])
    * doc = nlp(u"I don't want parsed", disable=["parser"])

#### Sentence Segmentation
  * A Doc object’s sentences are available via the Doc.sents property. **Unlike other libraries, spaCy uses the dependency parse to determine sentence boundaries.** This is usually more accurate than a rule-based approach, but it also means you’ll need a statistical model and accurate predictions. If your texts are closer to general-purpose news or web text, this should work well out-of-the-box.
  * For social media or conversational text that doesn’t follow the same rules, your application may benefit from a custom rule-based implementation. You can either use the built-in Sentencizer or plug an entirely custom rule-based function into your processing pipeline.

In [74]:
doc = nlp(u"This is a sentence. This is another sentence.")
for sent in doc.sents:
    print(sent.text)

This is a sentence.
This is another sentence.


  * Custom rule-based strategy

In [75]:
text = u"this is a sentence...hello...and another sentence."
doc = nlp(text)
print("Before:", [sent.text for sent in doc.sents])

Before: ['this is a sentence...hello...and another sentence.']


In [76]:
def set_custom_boundaries(doc):
    for token in doc[:-1]:
        if token.text == "...":
            doc[token.i+1].is_sent_start = True
    return doc

nlp.add_pipe(set_custom_boundaries, before="parser")
doc = nlp(text)
print("After:", [sent.text for sent in doc.sents])

After: ['this is a sentence...', 'hello...', 'and another sentence.']


#### Named Entity Recognition
  * spaCy features an extremely fast statistical entity recognition system, that assigns labels to contiguous spans of tokens. The default model identifies a variety of named and numeric entities, including companies, locations, organizations and products. You can add arbitrary classes to the entity recognition system, and update the model with new examples.
  * A named entity is a “real-world object” that’s assigned a name – for example, a person, a country, a product or a book title. spaCy can recognize various types of named entities in a document, by asking the model for a prediction. Because models are statistical and strongly depend on the examples they were trained on, this doesn’t always work perfectly and might need some tuning later, depending on your use case.

  * Named entities are available as the **ents** property of a **Doc**

In [77]:
nlp = spacy.load("en_core_web_sm") 
doc = nlp(u"Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY


In [78]:
displacy.render(doc, style="ent", jupyter=True)

  * doc.ents is the standard way to access **entity annotations** such as "text", "start_char", "end_char", "label_"

In [79]:
doc = nlp(u"San Francisco considers banning sidewalk delivery robots")

# document level
ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
print(ents)

[('San Francisco', 0, 13, 'GPE')]


In [80]:
print(doc[0]) # token at index 0

San


In [81]:
print(doc[1]) # token at index 1

Francisco


In [82]:
for t in doc:
    print(t)

San
Francisco
considers
banning
sidewalk
delivery
robots


  * Each token doc[index] carries **token-level entity annotations** such as "text", "ent_iob_" (IOB: inside | outside | beginning), and "ent_type_"

In [83]:
# token level
ent_san = [doc[0].text, doc[0].ent_iob_, doc[0].ent_type_]
ent_francisco = [doc[1].text, doc[1].ent_iob_, doc[1].ent_type_]
out_side = [doc[2].text, doc[2].ent_iob_, doc[2].ent_type_]
print(ent_san)  # [u'San', u'B', u'GPE']
print(ent_francisco)  # [u'Francisco', u'I', u'GPE']
print(out_side)

['San', 'B', 'GPE']
['Francisco', 'I', 'GPE']
['considers', 'O', '']
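The IOB scheme marks the first token of an entity with "B", continuation tokens with "I", and everything else with "O". The sketch below shows how token-level tags can be grouped back into entity spans. It is a plain-Python illustration with a hypothetical helper `iob_to_spans`; joining tokens with a single space is an assumption, whereas spaCy's own spans keep the original whitespace.

```python
def iob_to_spans(tokens):
    """Group (text, iob, label) triples into (entity_text, label) spans."""
    spans, current, label = [], [], None
    for text, iob, ent_type in tokens:
        if iob == "B" or (iob == "I" and not current):
            # A new entity starts here; close any entity still open.
            if current:
                spans.append((" ".join(current), label))
            current, label = [text], ent_type
        elif iob == "I":
            current.append(text)       # continue the open entity
        else:
            # "O" (or an empty tag): the token is outside any entity.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans

tokens = [("San", "B", "GPE"), ("Francisco", "I", "GPE"), ("considers", "O", "")]
print(iob_to_spans(tokens))  # [('San Francisco', 'GPE')]
```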


In [52]:
[doc[2].text, doc[2].ent_iob_, doc[2].ent_type_]

['considers', '', '']

#### Setting Named Entity Annotation (Custom tagging)
  * In the text below we would like spaCy to identify "FB" as a named entity, but the default model does not.

In [85]:
doc = nlp(u"FB is hiring a new Vice President of global policy")
ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
print('Before', ents)

Before []


  * **Adding FB to the named entities**

In [86]:
from spacy.tokens import Span
ORG = doc.vocab.strings[u"ORG"]  # get hash value of entity label
PERSON = doc.vocab.strings[u"PERSON"]
fb_ent = Span(doc, 0, 1, label=ORG) # create a Span for the new entity
fb_ent1= Span(doc, 5, 7, label=PERSON)
doc.ents = list(doc.ents) + [fb_ent, fb_ent1]

  * The Doc now lists FB as a named entity

In [87]:
ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
print('After', ents)

After [('FB', 0, 2, 'ORG'), ('Vice President', 19, 33, 'PERSON')]


In [89]:
doc = nlp(u"FB is hiring a new Vice President of global policy. FB is a abbreviated word for Facebook")
fb_ent = Span(doc, 0, 1, label=ORG) # create a Span for the new entity
doc.ents = list(doc.ents) + [fb_ent]
ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
print('After further modifications', ents)

After further modifications [('FB', 0, 2, 'ORG'), ('FB', 52, 54, 'ORG'), ('Facebook', 81, 89, 'PERSON')]


#### Rule-based entity recognition V2.1
  * The **EntityRuler** is an exciting new component that lets you add named entities based on pattern dictionaries, and makes it easy to combine rule-based and statistical named entity recognition for even more powerful models.
  * **Entity Patterns:** Entity patterns are dictionaries with two keys: 
    * **"label":** specifying the label to assign to the entity if the pattern is matched.
    * **"pattern":** the match pattern. 
  * The entity ruler accepts two types of patterns:
    * **Phrase patterns** for exact string matches (string).
      * {"label":"ORG", "pattern":"Apple"}
    * **Token Patterns** with one dictionary describing one token (list). 
      * {"label":"GPE", "pattern":\[{"lower":"san"}, {"lower":"francisci"}\]}
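To show what a token pattern means, the sketch below matches a list of per-token dictionaries against consecutive tokens. It is a deliberately simplified illustration, not spaCy's matcher: the helper `match_token_pattern` is a hypothetical name, only the "lower" key is supported, and tokens are produced with a plain whitespace split.

```python
def match_token_pattern(tokens, pattern):
    """Return (start, end) index pairs where `pattern` matches consecutive tokens.

    Each dictionary in `pattern` describes one token; only "lower" is handled.
    """
    matches = []
    n = len(pattern)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(tok.lower() == spec["lower"] for tok, spec in zip(window, pattern)):
            matches.append((start, start + n))
    return matches

tokens = "FB has an office in san franciSci".split()
pattern = [{"lower": "san"}, {"lower": "francisci"}]

for start, end in match_token_pattern(tokens, pattern):
    print(" ".join(tokens[start:end]), "GPE")  # san franciSci GPE
```

Because the comparison uses the lowercased token text, the pattern matches regardless of capitalization, which a phrase pattern with an exact string would not.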


In [93]:
from spacy.lang.en import English
from spacy.pipeline import EntityRuler

nlp = English()
ruler = EntityRuler(nlp)
patterns = [{"label": "ABC", "pattern": "FB"}]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)

doc = nlp(u"FB is hiring a new Vice President of global policy. FB is a abbreviated word for Facebook")
print([(ent.text, ent.label_) for ent in doc.ents])

[('FB', 'ABC'), ('FB', 'ABC')]


In [94]:
nlp = spacy.load("en_core_web_sm") 
ruler = EntityRuler(nlp)
patterns = [{"label": "ORG", "pattern": "FB"}]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)

doc = nlp(u"FB is hiring a new Vice President of global policy. FB is a abbreviated word for Facebook")
print([(ent.text, ent.label_) for ent in doc.ents])

[('FB', 'ORG'), ('FB', 'ORG'), ('Facebook', 'PERSON')]


In [43]:
displacy.render(doc, style="ent", jupyter=True)

In [92]:
nlp = English()
ruler = EntityRuler(nlp)
patterns = [{"label":"GPE", "pattern":[{"lower":"san"}, {"lower":"francisci"}]}]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)

doc = nlp(u"FB is has an office in san franciSci. FB is a abbreviated word for Facebook")
print([(ent.text, ent.label_) for ent in doc.ents])

[('san franciSci', 'GPE')]


#### Example: Using entities, part-of-speech tags and the dependency parse, refer to https://spacy.io/usage/rule-based-matching#models-rules-pos-dep

In [40]:
displacy.render(doc, style="ent", jupyter=True)

  * **Semantic Analysis:** focuses on the meaning of sentences, phrases, paragraphs, or even whole documents. Semantic analysis is heavily used in **Word Sense Disambiguation (WSD)** (word2vec is very useful for understanding semantic similarities).
  * **Lexical Semantics:** focuses on understanding the meaning of words, sub-words, compound words, and phrases.
    * **Hyponymy, Hypernym, Hyponyms: Hyponymy** describes the relationship between a generic term and its specific instances. The generic term is called the **hypernym** and the specific instances are called **hyponyms**. For example, "animal" is a hypernym of "dog", and "dog" is a hyponym of "animal".
    * **Homonymy:** Homonyms are words with the same spelling or pronunciation but different meanings.
      * Example: bear (an animal)/bear (to withstand or hold up), can (a metal container)/can (able to)
    * **Polysemy:** is an aspect of semantic ambiguity that concerns the multiplicity of word meanings.
      * Example: consider the meaning of the adjective good in the following sentences:
        * We had a good time yesterday. (good - meaning - pleasurable/enjoyable/satisfying)
        * That ticket is good for travel on any flight. (good - meaning - generally valid and acceptable.)  
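The hypernym and hyponym relations described above form a hierarchy that can be stored as a simple lookup table. The sketch below uses a hypothetical toy taxonomy invented for illustration; a real application would more likely query a lexical database such as WordNet.

```python
# Hypothetical toy taxonomy: each word points to its hypernym (the more generic term).
HYPERNYM = {"poodle": "dog", "beagle": "dog", "dog": "animal", "cat": "animal"}

def hypernym_chain(word):
    """Return the increasingly generic terms above `word`."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def hyponyms(term):
    """Return the direct specific instances of `term`, sorted alphabetically."""
    return sorted(w for w, h in HYPERNYM.items() if h == term)

print(hypernym_chain("poodle"))  # ['dog', 'animal']
print(hyponyms("dog"))           # ['beagle', 'poodle']
```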
          
&nbsp;          
          
  * **Handling Ambiguities:** First we need to understand the different kinds of ambiguity and how to handle them. There are four kinds: 1) lexical ambiguity, 2) syntactic ambiguity, 3) semantic ambiguity, and 4) pragmatic ambiguity.
    * **Lexical ambiguity:** is the presence of two or more possible meanings within a single word. It is sometimes also called **semantic ambiguity** or **homonymy**.
      * Example: I saw a bird (saw - verb), The saw machine is useful in cutting wood (saw - noun).
      * **How to fix:** Accurate POS tagging resolves cases like this one, where the senses differ in part of speech. On top of POS tagging, we can use WordNet, which lists separate senses for a word under each POS tag.
    * **Syntactic ambiguity:** a sequence of words can be parsed in more than one way. A common case is **Prepositional Phrase (PP) attachment ambiguity**. (A preposition relates its phrase to another word: "on" in "The keys are on the table" shows location, and "in" in "The movie starts in one hour" shows time.)
      * Example: "The man saw the girl with the telescope" can be interpreted in two ways:
        * 1) "the man saw the girl - with the telescope": the man used the telescope (the PP attaches to the verb "saw"), or
        * 2) "the man saw - the girl with the telescope": the girl has the telescope (the PP attaches to the noun "girl").
          * **How to fix:** compute the log-likelihood ratio between the probability of the preposition following the verb and the probability of the preposition following the noun, both estimated from co-occurrence counts.
            * $F(v, n, p) = \log \frac{P(p \mid v)}{P(p \mid n)}$
              * if $F(v, n, p) < 0$, attach the preposition to the noun
              * if $F(v, n, p) > 0$, attach the preposition to the verb
    * **Semantic ambiguity:** happens when the meaning of the words themselves can be misinterpreted.
      * Example: "ABC head seeks arms". Here "head" can be understood as the head of an organization or as a body part, and "arms" as weapons or as body parts.
      * **How to fix:** Handling semantic ambiguity with high accuracy is an open research area. Word representations such as word2vec can help.
    * **Pragmatic ambiguity:** occurs when the context of a phrase allows multiple interpretations. Pragmatic ambiguity is still an open area of research.
      * Example: "I have pens and books, give it to that boy." Here the ambiguity is about which one to give.
    * **Discourse Integration:** deals with how the immediately preceding sentences affect the meaning and interpretation of the next sentence. The context can be at the paragraph level or the document level. Discourse integration is useful in NLG applications such as chatbots.
    * **Pragmatic Analysis:** deals with outside world knowledge, that is, knowledge external to the document and/or query.
      * Example: "Pruning a tree is a long process". If the intended meaning is tree pruning in a computer science algorithm rather than cutting a physical tree, the statement is ambiguous without outside knowledge. This kind of ambiguity is still an open area of research.
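The log-likelihood ratio F(v, n, p) given above for prepositional phrase attachment can be computed directly from co-occurrence counts. The sketch below is illustrative only: the counts are made up, the function names are hypothetical, and no smoothing is applied, so a zero count would raise an error.

```python
import math

def attachment_score(count_v_p, count_v, count_n_p, count_n):
    """F(v, n, p) = log( P(p | v) / P(p | n) ), estimated from raw counts."""
    p_given_v = count_v_p / count_v   # how often the preposition follows the verb
    p_given_n = count_n_p / count_n   # how often the preposition follows the noun
    return math.log(p_given_v / p_given_n)

def attach(score):
    """Apply the decision rule: positive -> verb, negative -> noun."""
    if score > 0:
        return "verb"
    if score < 0:
        return "noun"
    return "undecided"

# Made-up counts for v = "saw", n = "girl", p = "with".
score = attachment_score(count_v_p=30, count_v=1000, count_n_p=5, count_n=500)
print(round(score, 3))  # 1.099
print(attach(score))    # verb
```

With these hypothetical counts, "with" follows "saw" three times as often as it follows "girl", so the phrase "with the telescope" attaches to the verb.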
  
  
## Natural-language generation (NLG):
  * NLG covers the probabilistic and rule-based models that make a computer generate natural language.
  * It can be used to produce long form content for organizations to automate custom reports, as well as produce custom content for a web or mobile application. It can also be used to generate short blurbs of text in interactive conversations (a chatbot) which might even be read out loud by a text-to-speech system.  
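As a minimal sketch of the rule-based end of this spectrum, the function below turns structured data into a report sentence using fixed templates. The function name, the fields, and the figures are hypothetical; probabilistic generators work very differently and are not shown here.

```python
def describe_sales(region, units, previous_units):
    """Render one report sentence from structured data using fixed templates."""
    change = units - previous_units
    if change > 0:
        trend = f"up {change} units from the previous period"
    elif change < 0:
        trend = f"down {-change} units from the previous period"
    else:
        trend = "unchanged from the previous period"
    return f"Sales in {region} were {units} units, {trend}."

print(describe_sales("Europe", 120, 100))
# Sales in Europe were 120 units, up 20 units from the previous period.
```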