# 1. What is Classification?

Classification means assigning the correct class label to a given input, chosen from a predefined set of classes.

##### **Examples of Text Classification**
- Topic identification (Politics, Sports, Technology, ...)
- Spam Detection
- Sentiment Analysis (Is this review positive or negative?)
- Spelling correction (weather or whether? color or colour?)

### 1) Supervised Learning

Humans learn from past experiences; machines learn from past instances!

##### **Supervised Classification**
Learning a **classification model** on properties ("features") and their importance ("weights") from labeled instances
- The labeled data set is split into training data and test data
- Training phase: labeled input is given to a classification algorithm, which produces a classification model
- Inference phase: the classification model is given unlabeled input and predicts its label
<br><br>
$X$ : the set of attributes or features, $\{x_1, x_2, ..., x_n\}$
<br><br>
$y$ : a "class" label from the label set $Y = \{y_1, y_2, ..., y_k\}$
<br><br>

Apply the model to new instances to predict their labels.
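
The split / train / infer workflow can be sketched in a few lines of plain Python. This is only a minimal illustration: the labeled messages are invented, and the word-voting "model" and the helper names `split`, `train`, and `predict` are hypothetical stand-ins for a real classification algorithm.

```python
import random
from collections import Counter

# Hypothetical labeled instances (text, label), invented for illustration.
labeled = [
    ("win a free prize now", "spam"),
    ("meeting moved to friday", "ham"),
    ("free offer just for you", "spam"),
    ("lunch tomorrow at noon", "ham"),
    ("claim your free reward", "spam"),
    ("project report attached", "ham"),
]

def split(data, n_test=2, seed=0):
    """Shuffle a copy of the data and hold out n_test instances as test data."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

def train(train_data):
    """Training phase: count how often each word appears under each label."""
    counts = {}
    for text, label in train_data:
        for word in text.split():
            counts.setdefault(word, Counter())[label] += 1
    # Fall back to the most frequent training label when no word is known.
    default = Counter(label for _, label in train_data).most_common(1)[0][0]
    return counts, default

def predict(model, text):
    """Inference phase: every known word votes for the labels it was seen with."""
    counts, default = model
    votes = Counter()
    for word in text.split():
        votes.update(counts.get(word, {}))
    return votes.most_common(1)[0][0] if votes else default

train_data, test_data = split(labeled)
model = train(train_data)
# Accuracy on the held-out test data estimates performance on new instances.
accuracy = sum(predict(model, t) == y for t, y in test_data) / len(test_data)
```

The test data is never shown to `train`, so `accuracy` reflects how the model behaves on instances it has not seen.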


### 2) Classification Paradigms
1. When there are only two possible classes -> **Binary Classification**
2. When there are more than two possible classes -> **Multi-class Classification**
3. When data instances can have two or more labels -> **Multi-label Classification**

##### **Questions to ask in supervised learning**
In the training phase,
- What are the features? How do you represent them?
- What is the classification model / algorithm?
- What are the model parameters?

In the inference phase,
- What is the expected performance? What is a good measure?

# 2. Identifying Features from Text

##### **Why is textual data unique?**
- Textual data presents a unique set of challenges
- All the information you need is in the text
- But features can be pulled out from text at different granularities!

### 1) Types of textual features

1. Words
    - By far the most common class of features
    - Handling commonly occurring words : **Stop words**
    - Normalization : Make lower case vs. leave as-is
    - Stemming / Lemmatization
    <br><br>
    
2. Characteristics of words
    - Capitalization (white house vs. White House)
    - Parts of speech in a sentence
    - Grammatical structure, sentence parsing
    - Grouping words of similar meaning, semantics 
        - {buy, purchase}
        - {Mr., Ms., ...}
        - Numbers, digits, dates, ...
3. Other
    - Depending on the classification task, features may come from word sequences or from inside words
        - Word sequences, e.g. bigrams, trigrams, n-grams : "White House"
        - Character sub-sequences in words : "ing", "ion", ...
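
The feature types listed above can be extracted with a few small functions. The sketch below uses only the standard library; the stop-word list is a tiny illustrative sample, and the function names are hypothetical.

```python
import re

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "to"}

def tokenize(text, lowercase=True):
    """Split text into word tokens, optionally normalizing to lower case."""
    if lowercase:
        text = text.lower()
    return re.findall(r"[A-Za-z0-9']+", text)

def remove_stop_words(tokens):
    """Drop commonly occurring words that carry little class information."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

def word_ngrams(tokens, n):
    """Word sequences of length n (n=2 gives bigrams, n=3 trigrams)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(word, n):
    """Character sub-sequences of length n inside a single word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

sentence = "The White House is in Washington"
words = remove_stop_words(tokenize(sentence))
# ['white', 'house', 'washington']
bigrams = word_ngrams(tokenize(sentence, lowercase=False), 2)
# ['The White', 'White House', 'House is', 'is in', 'in Washington']
suffix_like = char_ngrams("running", 3)
# ['run', 'unn', 'nni', 'nin', 'ing']
```

Keeping the original case for the bigrams preserves "White House" as a distinct feature, which lower-casing would merge with "white house".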

# 3. Naive Bayes Classifier

##### **Case study : Classifying text search queries**

Suppose you want to classify search queries into three classes:<br>*Entertainment, Computer Science, Zoology*

The most common class among the three is "Entertainment".

<br>**1. Suppose the query is "Python"**
- Python, the snake (Zoology)
- Python, the programming language (Computer Science)
- Python, as in Monty Python (Entertainment)

The most common class, given "Python", is Zoology.

<br>**2. Now suppose the query is "Python download"**

The most probable class is Computer Science.
<br><br><br>

**So what is happening?**

### 1) Probabilistic Model
Update the likelihood of the class given new information.

***Prior Probability*** : <br>Pr(y = Entertain), Pr(y = CS), Pr(y = Zoology)

***Posterior Probability*** :<br>Pr(y = Entertain | x = "Python")
<br><br>
##### ***Bayes' Rule*** : <br>
$$ Pr( y | X ) = \frac{ Pr(y)*Pr(X|y)}{Pr(X)} $$
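
To make the update from prior to posterior concrete, here is a small numeric sketch of Bayes' rule for the query "Python". All probabilities are invented for illustration (they are not estimates from real query logs); they are chosen only so that Entertainment is the most common class a priori, as in the case study.

```python
# Hypothetical prior probabilities Pr(y), invented for illustration.
priors = {"Entertainment": 0.5, "Computer Science": 0.2, "Zoology": 0.3}

# Hypothetical likelihoods Pr(x = "Python" | y), also invented.
likelihood_python = {"Entertainment": 0.01, "Computer Science": 0.04, "Zoology": 0.05}

def posterior(priors, likelihoods):
    """Apply Bayes' rule: Pr(y | X) = Pr(y) * Pr(X | y) / Pr(X)."""
    joint = {y: priors[y] * likelihoods[y] for y in priors}
    evidence = sum(joint.values())  # Pr(X), summed over all classes
    return {y: joint[y] / evidence for y in joint}

post = posterior(priors, likelihood_python)
most_common_prior = max(priors, key=priors.get)  # 'Entertainment'
most_probable_given_python = max(post, key=post.get)  # 'Zoology'
```

With these assumed numbers the new information flips the decision: Entertainment has the largest prior, but Zoology has the largest posterior once "Python" is observed.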

### 2) Naive Bayes Classification

$$ Pr(y=CS | \text{"Python"}) = \frac{Pr(y=CS)*Pr(\text{"Python"} | y=CS)}{Pr(\text{"Python"})} $$

<br>
$$ \text{if } Pr(y=CS | \text{"Python"}) > Pr(y=Zoology | \text{"Python"}), \quad y = CS $$

<br>
$$ y^* = \arg\max_y Pr(y|X) = \arg\max_y Pr(y)*Pr(X|y) $$
<br>
##### ***Naive assumption***
-> Given the class label, features are assumed to be independent of each other.

$$ y^* = \arg\max_y Pr(y|X) = \arg\max_y Pr(y)*\prod_{i=1}^n Pr(x_i|y) $$


##### ***For example,***
-> Query : "Python download"

$$ y^* = \arg\max_y Pr(y)*Pr(\text{"Python"}|y)*Pr(\text{"download"}|y) $$
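
The decision rule above can be sketched as a short function. The priors and per-word likelihoods below are hypothetical values chosen to reproduce the case study, and `classify` is an illustrative name. The sketch sums logarithms instead of multiplying probabilities, a common way to avoid numeric underflow that leaves the argmax unchanged.

```python
import math

# Hypothetical parameters, invented for illustration.
PRIORS = {"Entertainment": 0.5, "Computer Science": 0.2, "Zoology": 0.3}
LIKELIHOODS = {
    "Entertainment": {"python": 0.01, "download": 0.02},
    "Computer Science": {"python": 0.04, "download": 0.05},
    "Zoology": {"python": 0.05, "download": 0.001},
}

def classify(query):
    """Return the label maximizing Pr(y) * prod_i Pr(x_i | y), plus all scores."""
    words = query.lower().split()
    scores = {}
    for label, prior in PRIORS.items():
        # Naive assumption: words are independent given the class,
        # so their log-likelihoods simply add up.
        score = math.log(prior)
        for word in words:
            score += math.log(LIKELIHOODS[label][word])
        scores[label] = score
    return max(scores, key=scores.get), scores

label_one_word, _ = classify("Python")  # 'Zoology'
label_two_words, _ = classify("Python download")  # 'Computer Science'
```

This sketch only handles words that appear in the hypothetical `LIKELIHOODS` table; an unknown word raises a `KeyError`.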


### 3) What are the parameters?

1. Prior probabilities : Pr(y) for all y in Y
2. Likelihood : $Pr(x_i|y)$ for all features $x_i$ and labels y in Y
<br>
If there are 3 classes and the dimension of the data element (features) is 100, how many parameters does the naive Bayes model have?

A naive Bayes classifier has two kinds of parameters:
1. Pr(y) for every y in Y: so if |Y| = 3, there are three such parameters.
2. Pr(x_i | y) for every binary feature x_i in X and y in Y. <br> Specifically, for a particular feature x_1, the parameters are Pr(x_1=1 | y) and Pr(x_1=0 | y). So if |X| = 100 binary features and |Y| = 3, there are (2*100) * 3 = 600 such parameters.
3. Hence in all, there are 3 + 600 = 603 parameters.
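
The same count can be written as a small helper, which makes it easy to try other numbers of classes and features. The function name is hypothetical, and it uses the same counting convention as the answer above (two likelihood parameters per binary feature per class).

```python
def count_naive_bayes_parameters(num_classes, num_binary_features):
    """Count parameters as (priors, likelihoods, total)."""
    prior_params = num_classes  # one Pr(y) per class
    # Pr(x_i = 1 | y) and Pr(x_i = 0 | y) for every feature and class
    likelihood_params = 2 * num_binary_features * num_classes
    return prior_params, likelihood_params, prior_params + likelihood_params

counts = count_naive_bayes_parameters(num_classes=3, num_binary_features=100)
# (3, 600, 603)
```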


### 4) Training Parameters

1. Prior Probabilities : Pr(y) for all y in Y
    - Count the number of instances in each class
    - If there are N instances in all, and n of them are labeled as class y, then Pr(y) = n / N<br><br>
2. Likelihood : Pr(x_i | y) for all features x_i and labels y in Y
    - Count how many times feature x_i appears in instances labeled as class y
    - If there are p instances of class y, and x_i appears in k of them, then Pr(x_i | y) = k / p<br><br>
3. **Smoothing**
    - What happens if Pr(x_i | y) = 0, i.e. x_i never occurs in instances labeled y?
    - Then the posterior probability Pr(y | x_i) will be 0, and so will the posterior for any input that contains x_i.
    - Instead, we smooth the parameters by adding a dummy count.
    - This is called ***Laplace smoothing*** or ***Additive smoothing***.
    - Pr(x_i | y) = (k + 1) / (p + n), where n is the number of features<br>
    (the denominator is p + n because one dummy count has been added for each of the n features)
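
The counting steps above can be sketched as a single training function. The four labeled queries are invented for illustration and `train_naive_bayes` is a hypothetical name. The smoothed estimate follows the formula as written in these notes, (k + 1) / (p + n); other conventions for the denominator exist.

```python
from collections import Counter, defaultdict

def train_naive_bayes(instances):
    """Estimate priors and Laplace-smoothed likelihoods.

    instances: list of (set_of_features, label) pairs.
    """
    class_counts = Counter(label for _, label in instances)  # n (or p) per class
    vocabulary = sorted({f for feats, _ in instances for f in feats})
    feature_counts = defaultdict(Counter)  # k: instances of class y containing x_i
    for feats, label in instances:
        feature_counts[label].update(set(feats))

    total = len(instances)   # N
    n = len(vocabulary)      # number of features
    priors = {y: count / total for y, count in class_counts.items()}
    likelihoods = {
        y: {f: (feature_counts[y][f] + 1) / (class_counts[y] + n) for f in vocabulary}
        for y in class_counts
    }
    return priors, likelihoods

# Hypothetical labeled queries, invented for illustration.
data = [
    ({"python", "download"}, "Computer Science"),
    ({"python", "code"}, "Computer Science"),
    ({"python", "snake"}, "Zoology"),
    ({"monty", "python"}, "Entertainment"),
]
priors, likelihoods = train_naive_bayes(data)
# "download" never occurs with Zoology, yet its smoothed likelihood is
# (0 + 1) / (1 + 5) rather than 0.
```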
    

# 4. Naive Bayes Variations

### 1) Two classic Naive Bayes Variants for Text

You will commonly encounter two variants of Naive Bayes classification for text.


1. Multinomial Naive Bayes
    - Assumes the data follows a multinomial distribution
    - Each feature value is a count (word occurrence count, TF-IDF weighting, ...)
    - Often used for text documents
    <br><br>
2. Bernoulli Naive Bayes
    - Assumes the data follows a multivariate Bernoulli distribution
    - Each feature is binary (word is present / absent)
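
The practical difference between the two variants shows up in how a document is turned into a feature vector. The sketch below builds both representations for the same tokens; the vocabulary and function names are hypothetical.

```python
from collections import Counter

def count_features(tokens, vocabulary):
    """Multinomial-style features: how many times each word occurs."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def binary_features(tokens, vocabulary):
    """Bernoulli-style features: 1 if the word is present, 0 if absent."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]

vocabulary = ["download", "python", "snake"]
tokens = "python python download".split()

as_counts = count_features(tokens, vocabulary)   # [1, 2, 0]
as_binary = binary_features(tokens, vocabulary)  # [1, 1, 0]
```

The repeated "python" is visible in the count vector but not in the binary one, so the Bernoulli representation discards frequency information while still recording the absence of "snake".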