# Perceptron, Adaline and Logistic Regression

We are interested in implementing the perceptron algorithm (Rosenblatt, 68), Adaline (Widrow and Hoff, 60) and Logistic Regression (Cox, 66), whose pseudo-code is the following:

Perceptron:
`Input: Train, eta, MaxEp
init: w
epoch = 0
err = 1
m = len(Train)
while epoch <= MaxEp and err != 0
    err = 0
    for i in 1:m
        choose randomly an example (x,y)
        h <- w * x
        if (y * h <= 0)
            w <- w + eta * y * x
            err <- err + 1
    epoch <- epoch + 1
output: w`

Adaline:
`input: Train, eta, MaxEp
init : w
epoch = 0
err = 1
m = len(Train)
while epoch <= MaxEp and err != 0
    err = 0
    for i in 1:m
        choose randomly an example (x,y)
        h <- w * x
        if (y * h <= 0)
            err <- err + 1
        w <- w + eta * (y - h) * x
    epoch <- epoch + 1
output: w`

Logistic Regression:
`input: Train, eta, MaxEp
init : w
epoch = 0
err = 1
m = len(Train)
while epoch <= MaxEp and err != 0
    err = 0
    for i in 1:m
        choose randomly an example (x,y)
        h <- w * x
        if (y * h <= 0)
            err <- err + 1
        w <- w + eta * y * (1 - sigm(y * h)) * x
    epoch <- epoch + 1
output: w`
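
The three update rules can be read as stochastic (sub)gradient steps on per-example losses. The block below is a short derivation sketch using standard results rather than anything stated in this notebook; it writes $h = \mathbf{w}\cdot\mathbf{x}$ and $\sigma(z) = 1/(1+e^{-z})$ for the function called `sigm` in the pseudo-code.

$$
\begin{aligned}
\text{Perceptron:}\quad & \ell(\mathbf{w}) = \max(0,\,-y\,h), & -\nabla_{\mathbf{w}}\ell &= y\,\mathbf{x} \quad \text{when } y\,h < 0,\\
\text{Adaline:}\quad & \ell(\mathbf{w}) = \tfrac{1}{2}\,(y - h)^2, & -\nabla_{\mathbf{w}}\ell &= (y - h)\,\mathbf{x},\\
\text{Logistic regression:}\quad & \ell(\mathbf{w}) = \ln\left(1 + e^{-y\,h}\right), & -\nabla_{\mathbf{w}}\ell &= y\,\bigl(1 - \sigma(y\,h)\bigr)\,\mathbf{x}.
\end{aligned}
$$

In each case the rule is $\mathbf{w} \leftarrow \mathbf{w} - \eta\,\nabla_{\mathbf{w}}\ell$. The perceptron pseudo-code also updates on the boundary case $y\,h = 0$, which is why a weight vector that scores an example at exactly zero still counts as a mistake.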

1. Create a list of 4 elements corresponding to the logical AND example called `Train`:
$Train=\{((1,+1,+1),+1),((1,-1,+1),-1),((1,-1,-1),-1),((1,+1,-1),-1)\}$

Each element of the list is a list whose last entry is the class of the example and whose first entries are its coordinates, with the bias '1' included at the beginning of each vector.

    

In [26]:
Train=[
    ([1, +1, +1], +1),
    ([1, -1, +1], -1),
    ([1, -1, -1], -1),
    ([1, +1, -1], -1),
]
Train

[([1, 1, 1], 1), ([1, -1, 1], -1), ([1, -1, -1], -1), ([1, 1, -1], -1)]

In [55]:
def dot(x, y):
    return sum(x_i*y_i for x_i, y_i in zip(x, y))

from math import exp

def sigmoid(z):
    return (1.0/(1.0+exp(-z)))

2. Code the Perceptron, Adaline and LR (Logistic regression) programs

Hint: You can write a function that calculates the dot product between an example $\mathbf{x} = (x_1, \ldots, x_d)$ and the weight vector $\mathbf{w} = (w_0, w_1, \ldots, w_d)$: 
$h(\mathbf{x},\mathbf{w}) = w_0 + \sum_{j=1}^{d} w_j x_j$.


In [56]:
def perceptron_update(w,x,h,y,eta):
    return [w_i + eta * y * x_i for w_i,x_i in zip(w,x)]

In [39]:
def adaline_update(w,x,h,y,eta):
    return [w_i + eta * (y - h) * x_i for w_i,x_i in zip(w,x)]
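
The two cells above cover the Perceptron and Adaline rules. The logistic regression rule from the pseudo-code could be sketched with the same `(w, x, h, y, eta)` signature. The name `logistic_update` and the overflow-safe variant of `sigmoid` are assumptions made for this sketch, not part of the original notebook.

```python
from math import exp

def sigmoid(z):
    # Same logistic function as above, split by sign so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def logistic_update(w, x, h, y, eta):
    # w <- w + eta * y * (1 - sigm(y * h)) * x, applied coordinate by coordinate.
    coeff = eta * y * (1.0 - sigmoid(y * h))
    return [w_i + coeff * x_i for w_i, x_i in zip(w, x)]
```

It would be passed to `gradient_descent` in the same way as `adaline_update`, leaving `update_on_error` at `False`, since the pseudo-code updates the weights on every example.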

In [46]:
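from random import choice

def gradient_descent(train, eta, max_epoch, update_rule, update_on_error=False):
    # One weight per input coordinate (the bias '1' is already part of x).
    w = [1 for _ in range(len(train[0][0]))]
    err = 1
    epoch = 0

    # Stop after max_epoch epochs, or earlier if an epoch made no mistake.
    while epoch < max_epoch and err != 0:
        err = 0
        for _ in range(len(train)):
            # Draw one example uniformly at random (with replacement).
            x, y = choice(train)
            h = dot(w, x)

            error = (y * h <= 0)
            if error:
                err += 1

            # Perceptron updates only on mistakes; Adaline and LR update every time.
            if error or not update_on_error:
                w = update_rule(w, x, h, y, eta)

        epoch += 1

    print(f"stopping after {epoch} epochs")

    return w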

In [50]:
w_adaline = gradient_descent(Train, 0.005, 100, adaline_update)

stopping after 100 epochs


In [54]:
dot(w_adaline, [1,1,1])

0.8202167060338319

In [64]:
w_perceptron = gradient_descent(Train, 0.005, 200, perceptron_update, update_on_error=True)
dot(w_perceptron, [1,-1,1])

stopping after 200 epochs


-7.771561172376096e-16

3. Apply the three learning models to the logical AND, and compute each model's error rate on this dataset.

Hint: You can write a function that takes a weight vector $\mathbf{w}$ and a set of examples $(\mathbf{x},y)$ and calculates the error rate of the model with weight $\mathbf{w}$ on that set.

In [6]:
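def EmpiricalRisk(Test, W):
    # Empirical error of a model with weight W on a test set of size m.
    # Each observation is a flat list [x_1, ..., x_d, y]; W = [w_0, w_1, ..., w_d].
    E = 0.0
    m = len(Test)
    for Obs in Test:
        y = Obs[-1]
        # Score h(x, w) = w_0 + sum_j w_j * x_j, as in the hint of question 2.
        h_w = W[0] + sum(w_j * x_j for w_j, x_j in zip(W[1:], Obs[:-1]))
        if y * h_w <= 0:
            E += 1.0
    return E / float(m)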
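
As a quick sanity check of this kind of error-rate function, the sketch below evaluates two hand-picked weight vectors on the logical AND. It assumes flat rows `[x_1, x_2, y]` without a leading bias entry, so that `w[0]` plays the role of the bias; the names `error_rate` and `and_flat` are illustrative only.

```python
def error_rate(data, w):
    """Fraction of rows [x_1, ..., x_d, y] with y * (w_0 + sum_j w_j x_j) <= 0."""
    mistakes = 0
    for row in data:
        x, y = row[:-1], row[-1]
        score = w[0] + sum(w_j * x_j for w_j, x_j in zip(w[1:], x))
        mistakes += y * score <= 0
    return mistakes / len(data)

# Logical AND in flat format: only (+1, +1) belongs to the positive class.
and_flat = [[+1, +1, +1], [-1, +1, -1], [-1, -1, -1], [+1, -1, -1]]

print(error_rate(and_flat, [-1.0, 1.0, 1.0]))  # 0.0: these weights separate AND
print(error_rate(and_flat, [1.0, 1.0, 1.0]))   # 0.5: two of the four rows are wrong
```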



4. We are now going to focus on the behavior of the three models on the following datasets: http://archive.ics.uci.edu/ml/datasets/connectionist+bench+(sonar,+mines+vs.+rocks), https://archive.ics.uci.edu/ml/datasets/spambase, https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29, https://archive.ics.uci.edu/ml/datasets/Ionosphere. These files are in the current repository under the names `sonar.txt`, `spam.txt`, `wdbc.txt` and `ionosphere.txt`. The following `ReadCollection` function reads each file into the training-set format described above.

In [2]:
from math import sqrt
import pandas as pd
import random
from sklearn.model_selection import train_test_split

def Normalize(x):
    # Scale x in place to unit Euclidean norm; an all-zero vector is left unchanged.
    norm = sqrt(sum(e**2 for e in x))
    if norm > 0:
        for i in range(len(x)):
            x[i] /= norm
    return x

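def ReadCollection(filename):
    tag_df = pd.read_table(filename, sep=',', header=None)
    # Map each collection's class labels to -1 / +1.
    if "wdbc" in filename:
        Dic = {'M': -1, 'B': +1}
    elif "sonar" in filename:
        Dic = {'R': -1, 'M': +1}
    elif "iono" in filename:
        Dic = {'g': -1, 'b': +1}
    elif "spam" in filename:
        Dic = {0: -1, 1: +1}
    else:
        raise ValueError(f"unknown collection: {filename}")

    X = []
    for e in range(len(tag_df)):
        x = list(tag_df.loc[e, :])
        if "wdbc" in filename:
            x.pop(0)          # drop the first column
            cls = x.pop(0)    # the class label is the second column
        else:
            cls = x.pop()     # the class label is the last column
        x = Normalize(x)
        x.append(Dic[cls])    # store the label as the last entry
        X.append(x)
    random.shuffle(X)

    return X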

wdbc_col = ReadCollection("wdbc.txt")
sonar_col = ReadCollection("sonar.txt")
iono_col = ReadCollection("ionosphere.txt")
spam_col = ReadCollection("spam.txt")
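
To make the effect of `Normalize` concrete, here is a small standalone check on a vector whose norm is easy to compute by hand. The helper name `unit_norm` is hypothetical; unlike `Normalize`, it returns a copy instead of modifying its argument.

```python
from math import sqrt

def unit_norm(x):
    """Return a copy of x scaled to Euclidean norm 1 (zero vectors are copied as-is)."""
    norm = sqrt(sum(e * e for e in x))
    return [e / norm for e in x] if norm > 0 else list(x)

v = unit_norm([3.0, 4.0])   # the norm of (3, 4) is 5
print(v)                    # [0.6, 0.8]
```

After this step every observation has norm 1, so the size of an update `eta * ... * x` no longer depends on the raw scale of the features.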


 2. Run the three models on these files with $\eta=0.01$ and $\eta=0.1$, using `MaxEp=500`.
 
 3. Report in the tables below the average error rate on the test set, repeating each experiment 10 times.
 
 <br>
 <br>
 
 
 <center> $\eta=0.01$, MaxEp$=500$ </center>
    
    
  | Collection | Perceptron | Adaline |    LR    |
  |------------|------------|---------|----------|
  | WDBC       |            |         |          |
  | Ionosphere |            |         |          |
  | Sonar      |            |         |          |
  | Spam       |            |         |          |
 
 <br><br>
  
  <center> $\eta=0.1$, MaxEp$=500$ </center>
    
    
  | Collection | Perceptron | Adaline |    LR    |
  |------------|------------|---------|----------|
  | WDBC       |            |         |          |
  | Ionosphere |            |         |          |
  | Sonar      |            |         |          |
  | Spam       |            |         |          |
  
  Hint: you can use the following code

In [7]:
n_runs = 20  # number of random train/test splits averaged per table cell

for eta in [0.01, 0.1]:
    print(f'eta={eta}\n-------')
    collections = [wdbc_col, iono_col, sonar_col, spam_col]
    for name, X in zip(["WDBC", "Ionosphere", "Sonar", "Spam"], collections):
        errP = errA = errL = 0.0
        for i in range(n_runs):
            # Fresh 75% / 25% split for every repetition.
            x_train, x_test = train_test_split(X, test_size=0.25)
            WLP = Perceptron(x_train, eta, 500)
            errP += EmpiricalRisk(x_test, WLP)
            WLA = Adaline(x_train, eta, 500)
            errA += EmpiricalRisk(x_test, WLA)
            WLR = LR(x_train, eta, 500)
            errL += EmpiricalRisk(x_test, WLR)

        # Average test error over the n_runs repetitions.
        print(f"| {name} | {errP/n_runs:.4f} | {errA/n_runs:.4f} | {errL/n_runs:0.4}")

eta=0.01
-------
| WDBC | 1.0000 | 1.0000 | 1.0
| Ionosphere | 1.0000 | 1.0000 | 1.0
| Sonar | 1.0000 | 1.0000 | 1.0
| Spam | 1.0000 | 1.0000 | 1.0
eta=0.1
-------
| WDBC | 1.0000 | 1.0000 | 1.0
| Ionosphere | 1.0000 | 1.0000 | 1.0
| Sonar | 1.0000 | 1.0000 | 1.0
| Spam | 1.0000 | 1.0000 | 1.0
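
The loop above calls `Perceptron`, `Adaline` and `LR`, which are not defined in this section. One possible sketch follows the three pseudo-codes directly. It assumes flat rows `[x_1, ..., x_d, y]` as returned by `ReadCollection`, a separate bias weight `w[0]`, and zero initialisation (the pseudo-code only says `init: w`). The helper names `h`, `sigm` and `_train` are illustrative.

```python
import random
from math import exp

def h(x, w):
    """Linear score w_0 + sum_j w_j x_j (x carries no leading bias entry)."""
    return w[0] + sum(w_j * x_j for w_j, x_j in zip(w[1:], x))

def sigm(z):
    """Logistic function, split by sign so exp() never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def _train(train, eta, max_ep, step, only_on_error):
    """Shared stochastic loop; step(y, score) is the scalar that multiplies x."""
    d = len(train[0]) - 1
    w = [0.0] * (d + 1)            # assumption: start from the zero vector
    epoch, err = 0, 1
    while epoch <= max_ep and err != 0:
        err = 0
        for _ in range(len(train)):
            obs = random.choice(train)         # sample with replacement
            x, y = obs[:-1], obs[-1]
            score = h(x, w)
            mistake = y * score <= 0
            err += mistake
            if mistake or not only_on_error:
                g = eta * step(y, score)
                w[0] += g                      # bias input is implicitly 1
                for j, x_j in enumerate(x, start=1):
                    w[j] += g * x_j
        epoch += 1
    return w

def Perceptron(train, eta, max_ep):
    return _train(train, eta, max_ep, lambda y, s: y, only_on_error=True)

def Adaline(train, eta, max_ep):
    return _train(train, eta, max_ep, lambda y, s: y - s, only_on_error=False)

def LR(train, eta, max_ep):
    return _train(train, eta, max_ep,
                  lambda y, s: y * (1.0 - sigm(y * s)), only_on_error=False)
```

Because examples are drawn with replacement, `err == 0` only means that no sampled example was misclassified during that epoch, so an early stop does not guarantee zero training error.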


 4. Normalize the vector representations of the observations by dividing them by their norm, and repeat questions 2 and 3. Is there any significant change after normalizing? Please explain.


<b>Resp:</b>  