# [The Perceptron](https://towardsdatascience.com/what-the-hell-is-perceptron-626217814f53)

The perceptron is a linear binary classifier that is analogous to a neuron in
that it takes in one or more inputs, processes them, and produces an output.

### Online Learning
* The learning algorithm can update the model's parameters using:
  * A single training instance rather than the entire batch
* Useful for learning from training sets too large to fit in memory

### Activation Functions 
The perceptron classifies instances by passing a linear combination of features and model parameters
through an activation function  
$y = \phi(\sum^n_{i=1} w_i x_i + b)$, where $w_i$ are the model's parameters, $x_i$ are the feature values, *b* is a constant bias term,
and $\phi$ is the activation function

The linear combination of the parameters and inputs is sometimes called the __preactivation__
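
As a minimal sketch of the weighted sum inside $\phi$, the snippet below computes $\sum w_i x_i + b$ for a single instance. The function name `preactivation` and the numeric values are hypothetical, chosen only for illustration.

```python
def preactivation(weights, features, bias):
    """Return the weighted sum of the features plus the bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical values, chosen only for illustration.
weights = [0.5, -1.0, 2.0]
features = [2.0, 1.0, 0.5]
bias = 0.25

z = preactivation(weights, features, bias)
print(z)  # 1.0 - 1.0 + 1.0 + 0.25 = 1.25
```

The activation function is then applied to `z` to produce the prediction.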

##### Heaviside step function
Rosenblatt's original perceptron used a __Heaviside step function__, also called a unit step function  
g(x) = 1, if x>0  
g(x) = 0, otherwise
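
The step function can be sketched in a few lines. The helper names `heaviside` and `predict`, and the weights used for the logical AND example, are assumptions made for illustration rather than part of any library.

```python
def heaviside(x):
    """Unit step: 1 for strictly positive input, 0 otherwise."""
    return 1 if x > 0 else 0


def predict(weights, features, bias):
    """Apply the step function to the weighted sum of the inputs."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return heaviside(z)


# Hypothetical hand-picked parameters that implement logical AND.
and_weights = [1.0, 1.0]
and_bias = -1.5

for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inputs, predict(and_weights, inputs, and_bias))
```

Only the input `(1, 1)` produces a positive weighted sum (0.5), so it is the only instance assigned to class 1.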

##### Logistic Sigmoid
$g(x) = \frac{1}{1 + e^{-x}}$, where *x* is the weighted sum of the inputs. Note: unlike the step function, it is differentiable everywhere
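
A small sketch of the logistic sigmoid and its derivative, $g'(x) = g(x)(1 - g(x))$, follows. The function names are hypothetical, and the branch on the sign of `x` is an implementation choice to avoid floating-point overflow for large negative inputs, not something the formula itself requires.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x is negative, so exp(x) is at most 1
    return e / (1.0 + e)


def sigmoid_derivative(x):
    """Derivative of the sigmoid: g(x) * (1 - g(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
```

The derivative is largest at zero and shrinks toward zero for inputs of large magnitude, where the sigmoid flattens out.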

#### Perceptron Learning Algorithm
$w_i(t+1) = w_i(t) + \alpha(d_j - y_j(t))x_{i,j}$, for all features $0\le i \le n$  

$d_j$ - true class for instance j  
$y_j(t)$ - predicted class for instance j  
$x_{i,j}$ - value of the $i^{th}$ feature for instance j  
$\alpha$ - hyperparameter that controls the learning rate   
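
As a worked example under assumed values ($\alpha = 0.5$, current weights $w_1 = w_2 = 0$, feature values $x_{1,j} = 1$ and $x_{2,j} = 2$, true class $d_j = 1$, predicted class $y_j(t) = 0$), an update with the error term $d_j - y_j(t)$ gives:

$$w_1(t+1) = 0 + 0.5\,(1 - 0)(1) = 0.5, \qquad w_2(t+1) = 0 + 0.5\,(1 - 0)(2) = 1.0$$

When the prediction is correct, $d_j - y_j(t) = 0$ and the weights are left unchanged.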

* Set weights to zero/small random values
* Predict class for training instances
  * If correct: Continue to next instance without updating weights
  * If incorrect: Update the weights
    * Compute: $(d_j - y_j(t)) \times$ *the feature value* $\times$ *the learning rate*, and add the result to the weight
* Each pass through the training instances is called an epoch
* The learning algorithm has converged when it completes an epoch without misclassifying
any of the instances
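
The steps above can be sketched as a short training loop. This is a minimal illustration, not a reference implementation: the function name `train_perceptron`, the `max_epochs` cap, and the logical AND dataset are assumptions, and the bias is kept as a separate variable (equivalent to a weight $w_0$ on a constant input of 1).

```python
def heaviside(x):
    """Unit step: 1 for strictly positive input, 0 otherwise."""
    return 1 if x > 0 else 0


def train_perceptron(instances, labels, learning_rate=1.0, max_epochs=100):
    """Train until an epoch finishes with no misclassification.

    Returns (weights, bias, epochs_used); epochs_used is None if the
    max_epochs cap is reached without converging.
    """
    weights = [0.0] * len(instances[0])  # start from zero weights
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, d in zip(instances, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            y = heaviside(z)
            if y != d:  # update only on a misclassification
                errors += 1
                step = learning_rate * (d - y)
                weights = [w + step * xi for w, xi in zip(weights, x)]
                bias += step
        if errors == 0:  # a clean epoch means convergence
            return weights, bias, epoch
    return weights, bias, None


# Logical AND is linearly separable, so training converges.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
weights, bias, epochs = train_perceptron(X, y)
print(weights, bias, epochs)
```

The `max_epochs` cap is a design choice: without it the loop would never terminate on data that is not linearly separable.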

#### Document Classification
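scikit-learn's `Perceptron` estimator provides three methods relevant to this task:
* `fit` - trains the classifier on the full training set
* `predict` - predicts the class of new instances
* `partial_fit` - allows the classifier to be trained incrementally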
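
Incremental training can be sketched with scikit-learn's `Perceptron.partial_fit`, which needs the full list of classes on its first call because a single batch may not contain every class. The toy data, the batch size of 10, and the variable names below are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Hypothetical toy data: two well-separated clusters in 2-D.
rng = np.random.RandomState(0)
X_pos = rng.normal(loc=3.0, scale=0.5, size=(50, 2))
X_neg = rng.normal(loc=-3.0, scale=0.5, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [0] * 50)

# Shuffle so that each batch mixes both classes.
order = rng.permutation(len(y))
X, y = X[order], y[order]

clf = Perceptron(random_state=11)
classes = np.array([0, 1])

# Feed the data in small batches, as if it did not fit in memory at once.
for start in range(0, len(y), 10):
    batch = slice(start, start + 10)
    clf.partial_fit(X[batch], y[batch], classes=classes)

accuracy = clf.score(X, y)
print(accuracy)
```

In a real out-of-core setting the batches would be read from disk or a stream instead of sliced from an in-memory array.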

```python
# Document classification with the perceptron
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron
from sklearn.metrics import classification_report

# Fetch three newsgroup categories, stripping metadata that would leak the label
categories = ['rec.sport.hockey', 'rec.sport.baseball', 'rec.autos']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories,
                                      remove=('headers', 'footers', 'quotes'))
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories,
                                     remove=('headers', 'footers', 'quotes'))

# Convert the raw text to TF-IDF features; fit the vocabulary on the training set only
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(newsgroups_train.data)
X_test = vectorizer.transform(newsgroups_test.data)

# Train the perceptron and evaluate it on the held-out test set
clf = Perceptron(random_state=11)
clf.fit(X_train, newsgroups_train.target)
predictions = clf.predict(X_test)
print(classification_report(newsgroups_test.target, predictions))
```

```
             precision    recall  f1-score   support

          0       0.81      0.92      0.86       396
          1       0.87      0.76      0.81       397
          2       0.86      0.85      0.86       399

avg / total       0.85      0.84      0.84      1192
```





# Limitations of the Perceptron

* The perceptron uses a hyperplane to separate the positive and negative classes
* In other words: it cannot correctly classify every instance of a dataset that is not linearly separable
  * Example: XOR
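
The XOR case can be checked with a small experiment: no hyperplane separates its classes, so every epoch contains at least one misclassification. The function name `errors_per_epoch` and the choice of 50 epochs are assumptions for this sketch.

```python
def heaviside(x):
    """Unit step: 1 for strictly positive input, 0 otherwise."""
    return 1 if x > 0 else 0


def errors_per_epoch(instances, labels, learning_rate=1.0, epochs=50):
    """Run the perceptron update rule and record misclassifications per epoch."""
    weights = [0.0] * len(instances[0])
    bias = 0.0
    history = []
    for _ in range(epochs):
        errors = 0
        for x, d in zip(instances, labels):
            y = heaviside(sum(w * xi for w, xi in zip(weights, x)) + bias)
            if y != d:
                errors += 1
                step = learning_rate * (d - y)
                weights = [w + step * xi for w, xi in zip(weights, x)]
                bias += step
        history.append(errors)
    return history


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]

history = errors_per_epoch(X, xor_labels)
print(min(history))  # never reaches 0: the algorithm does not converge on XOR
```

Running more epochs does not help, because an error-free epoch would require a single hyperplane that classifies all four XOR points correctly.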
