In [1]:
# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})

# for custom notebook formatting.
from IPython.display import HTML, display
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
HTML(open('../custom.css').read())


<br><br><br>


## Natural Language Processing
### :::: Overview ::::


    
<br><br><br><br><br><br>


<br><br><br>

## What is Natural Language?

<br><br><br><br><br><br>

<br><br><br>
## Natural vs. Unnatural (Formal) Languages

**Natural**
- Emerges from intelligent beings
- We **discover** the grammar.
- Full of ambiguity
- English, Spanish, Dolphin Language?

**Formal**
- Defined by humans
- We **prescribe** the grammar.
- Designed to **remove** ambiguity
- Python, math, ...
<br><br><br><br><br><br>

## NLP Examples

![figs/watson.jpg](figs/watson.jpg)
<br><br><br><br><br><br>

![figs/siri.png](figs/siri.png)
<br><br><br><br><br><br>

![figs/translate.jpg](figs/translate.jpg)
<br><br><br><br><br><br>

![figs/echo.jpg](figs/echo.jpg)
<br><br><br><br><br><br>

![figs/her.jpg](figs/her.jpg)

In [2]:
%%HTML
<video width="800" height="600" controls>
  <source src="figs/dave.mp4" type="video/mp4">
</video>
<br><br><br><br><br><br><br><br><br><br><br><br>


![figs/chatgpt.png](figs/chatgpt.png)
<br><br><br><br><br><br>


<br><br><br><br><br><br>

### Natural Language Processing = Linguistics + Logic + Machine Learning

Each has a long history...

<br><br><br><br><br><br>



## NLP History

- Dates back to the earliest days of computing (Turing)
- Combines linguistics, formal logic, and statistics
<br><br><br><br><br><br>

## NLP History
### 1940s-1950s

**Logic**
- automaton (Turing)
- Context-free grammars (Chomsky)

**Statistics**
- McCulloch-Pitts neuron (precursor to the perceptron)
- Noisy channel model (Shannon)

### Turing test (Alan Turing, 1950)


Q: Please write me a sonnet on the topic of the Forth Bridge.  
A: Count me out on this one. I never could write poetry.  


Q: Add 34957 to 70764  
A: (Pause about 30 seconds and then give answer as) 105721.

<br><br><br><br>

## NLP History

### 1957-1970

**Logic**
- Theorem proving (Newell & Simon) 
- Logical inference

**Statistics**
- Neural nets
- Bayesian statistics (e.g., identify author of text)


#### ELIZA (1964, Weizenbaum)

User: You are like my father in some ways.  
ELIZA: WHAT RESEMBLANCE DO YOU SEE  
User: You are not very aggressive but I think you don’t want me to notice that.  
ELIZA: WHAT MAKES YOU THINK I AM NOT AGGRESSIVE  
User: You don’t argue with me.  
ELIZA: WHY DO YOU THINK I DON’T ARGUE WITH YOU?  
User: You are afraid of me.  
ELIZA: DOES IT PLEASE YOU TO BELIEVE I AM AFRAID OF YOU

<br><br>

Online example: <http://www.masswerk.at/elizabot/>
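
The exchange above can be approximated with a handful of keyword patterns plus pronoun "reflection" (swapping *me* for *you*, and so on). The following is only a minimal sketch in that spirit: the rules, the `REFLECTIONS` table, and the `respond` function are hypothetical illustrations, not Weizenbaum's original script.

```python
import re

# Pronoun swaps used to "reflect" the user's words back at them (illustrative subset).
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my", "are": "am"}

# Hypothetical (pattern, response template) rules, tried in order.
RULES = [
    (re.compile(r"\blike\b", re.I), "WHAT RESEMBLANCE DO YOU SEE"),
    (re.compile(r"you are (.*)", re.I), "WHAT MAKES YOU THINK I AM {0}"),
    (re.compile(r"you don['’]t (.*)", re.I), "WHY DO YOU THINK I DON'T {0}"),
]

def reflect(fragment):
    """Swap first- and second-person words in a captured fragment."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(utterance):
    """Return the response of the first matching rule, or a generic prompt."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups())).upper()
    return "PLEASE GO ON"

print(respond("You are afraid of me."))
print(respond("You don't argue with me."))
```

No understanding is involved: the program only rearranges the user's own words, which is why a few rules can already look conversational.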
<br><br><br><br>

## NLP History
### 1970-1983

**Logic**
  - Winograd's SHRDLU

**Statistics**
  - speech recognition (AT&T Bell Labs, IBM)
  - Hidden Markov Models

![figs/shrdlu.png](figs/shrdlu.png)


## NLP History

### 1983-1993
- Return to finite state models
- Empirical (data-driven) approach: IBM speech recognition

### 1994-1999
- Empirical approach widespread
- Bayesian statistics
- graphical models

### 2000s
- Combinations of logical and empirical approaches
  - Markov logic networks, etc.
- Deep learning
  - revival of neural nets from the 1960s
<br><br><br><br><br><br>

![figs/gpt_growth.png](figs/gpt_growth.png)

(Parmida Beigi, Amazon)

## Linguistics

- **Phonetics and Phonology:** The study of linguistic sounds.
  - /fəˈnediks/

<br><br><br><br>

- **Morphology:** The study of the meaningful components of words.

![figs/morph.png](figs/morph.png)

<br><br><br><br>

- **Syntax:** The study of the structural relationships between words.
  -  "*I’m I do, sorry that afraid Dave I’m can’t.*"
  
![figs/dog.png](figs/dog.png)

<br><br><br><br>

- **Semantics:** The study of meaning.

![figs/green.png](figs/green.png)


<br><br><br><br>
- **Pragmatics:** The study of how language is used to accomplish goals.
  - "*Honey, do you think it's cold in here?*"

<br><br><br><br>

- **Discourse:** The study of linguistic units larger than a single utterance.
  - **Dave**: Open the pod bay doors, HAL.
  - **HAL**: I'm sorry Dave, I can't do **<font color=blue>that</font>.**



<br><br><br><br>

## Ambiguity: The Good and the Bad

- Makes language fun and interesting for humans, but makes language difficult for computers.
- A central problem in NLP is **resolving ambiguity**.


- E.g., "*I made her duck*."

<br><br><br><br><br><br><br><br>



1. I cooked waterfowl for her.
2. I cooked waterfowl belonging to her.
3. I created the (plaster?) duck she owns.
4. I caused her to quickly lower her head or body.
5. I waved my magic wand and turned her into undifferentiated waterfowl.


- Syntactic ambiguity (1 vs 4): "duck" $\rightarrow$ verb or noun?  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **part-of-speech tagging, syntactic parsing**
- Semantic ambiguity (1 vs 3): "make" $\rightarrow$ *create* or *cook*? &nbsp;&nbsp; **word sense disambiguation**
- Phonetic ambiguity: "I" or "eye"; "made" or "maid"?  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **speech recognition**
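
To see why part-of-speech tagging is a search problem, the sketch below enumerates every tag sequence that a tiny lexicon allows for "I made her duck". The `LEXICON` and its tag names are hypothetical and chosen only for illustration. A real tagger would go on to score these candidates and pick the most plausible one.

```python
from itertools import product

# Hypothetical mini-lexicon: each word maps to the parts of speech it can take.
LEXICON = {
    "I": ["PRON"],
    "made": ["VERB"],
    "her": ["PRON", "DET"],    # object pronoun vs. possessive ("her duck")
    "duck": ["NOUN", "VERB"],  # the waterfowl vs. the action
}

def candidate_taggings(sentence):
    """Return every (word, tag) sequence the lexicon permits for the sentence."""
    words = sentence.split()
    options = [LEXICON[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*options)]

for tagging in candidate_taggings("I made her duck"):
    print(tagging)
```

Two ambiguous words already yield four candidates, and the count grows multiplicatively with sentence length, which is why the dynamic programming and probabilistic methods used in practice matter.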

<br><br><br>

## Models & Algorithms

- State machines
- Rule systems
- Logic
- Probability
- Dynamic programming
- Machine Learning
<br><br><br>

> While initial NLP approaches focused on writing complex rules to parse language, modern approaches instead rely on machine learning to:
- learn patterns from data to assign real-valued scores to different language constructs
- induce a probability over possible meanings

<br>

E.g., consider two ways of detecting whether an email is spam:

1. Write a bunch of `if...else` statements
  - `if "free" in email: return "spam"`
  
2. Collect a bunch of emails, annotate them as spam or not, and compute statistics over word occurrences
  - $p(\hbox{spam} \mid \hbox{free}) = 0.6$
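
The two approaches can be contrasted in a few lines. This is a sketch over a made-up, hand-labeled toy corpus (`EMAILS` is hypothetical, and emails are split on whitespace rather than matched as substrings). It estimates $p(\hbox{spam} \mid \hbox{word})$ by simple counting.

```python
from collections import Counter

# Hypothetical hand-labeled toy corpus: (email text, label).
EMAILS = [
    ("free money now", "spam"),
    ("free tickets inside", "spam"),
    ("claim your free prize", "spam"),
    ("lunch is free today", "ham"),
    ("are you free on friday", "ham"),
    ("meeting notes attached", "ham"),
    ("win money fast", "spam"),
    ("project update", "ham"),
]

def rule_based(email):
    """Approach 1: a hand-written rule."""
    return "spam" if "free" in email.split() else "ham"

def p_spam_given_word(word, emails):
    """Approach 2: fraction of emails containing `word` that are labeled spam."""
    counts = Counter(label for text, label in emails if word in text.split())
    total = sum(counts.values())
    return counts["spam"] / total if total else None

print(rule_based("lunch is free today"))   # the rule flags a harmless email
print(p_spam_given_word("free", EMAILS))   # 3 of the 5 emails with "free" are spam
```

The rule gives a hard yes/no answer and misfires on the lunch email, while the counted estimate expresses uncertainty that can be combined with evidence from other words.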

<br><br>

**To the syllabus!**
<https://github.com/tulane-cmps6730/main>

<br>

but first

<br><br><br><br>



I like telling Dad jokes.

<br><br><br><br><br><br><br><br><br><br><br><br>


Sometimes he laughs!!

<br><br><br>
Why did the natural language processor break up with the sentiment analyzer?

<br><br><br><br><br><br><br><br><br><br><br><br>

Because it couldn't handle its emotional baggage.

<br><br><br><br><br><br><br><br><br><br><br><br>

Interviewer: What's your biggest strength?

Interviewee: I'm good at Machine Learning

Interviewer: Okay, what's 21+17?

Interviewee: It's 5

Interviewer: Not even close. It's 38

Interviewee: It's 20

Interviewer: I said it's 38

Interviewee: It's 35

Interviewer: It's still 38

Interviewee: It's 38

Interviewer: Hired!

#### image sources

- https://www.cs.colorado.edu/~martin/SLP/

- https://www.washingtonpost.com/business/on-it/how-ibm-is-trying-to-commercialize-watson/2014/05/09/4f552506-d23c-11e3-937f-d3026234b51c_story.html

- http://www.howtogeek.com/229308/26-actually-useful-things-you-can-do-with-siri/

- http://mashable.com/2015/01/14/google-translate-word-lens/

- https://www.youtube.com/watch?v=ng7Sti29S5k

- http://www.kurzweilai.net/a-review-of-her-by-ray-kurzweil

- https://www.youtube.com/watch?v=9W5Am-a_xWw

- http://mosermichael.github.io/cstuff/all/blog/2015/02/05/nlp-revisited.html

- http://all-about-linguistics.group.shef.ac.uk/branches-of-linguistics/morphology/what-is-morphology/

- http://english.stackexchange.com/questions/294993/ambiguous-syntax-tree-and-phrase-structure-rules

- https://en.wikipedia.org/wiki/Talk%3AColorless_green_ideas_sleep_furiously

- http://www.salem-news.com/articles/september102009/oxycontin_wolf_9-10-09.php