# Perceptron, Adaline and Logistic Regression

We are interested in implementing the perceptron algorithm (Rosenblatt, 68), Adaline (Widrow and Hoff, 60) and Logistic Regression (Cox, 66), whose pseudocode is given below:

Perceptron:
`Input: Train, eta, MaxEp
init: w
epoch = 0
err = 1
m = len(Train)
while epoch <= MaxEp and err != 0
    err = 0
    for i in 1:m
        choose an example (x,y) at random
        h <- w * x
        if (y * h <= 0)
            w <- w + eta * y * x
            err <- err + 1
    epoch <- epoch + 1
output: w`

Adaline:
`input: Train, eta, MaxEp
init : w
epoch=0
err=1
m = len(Train)
while epoch<=MaxEp and err!=0
    err=0
    for i in 1:m
        choose an example (x,y) at random
        h <- w*x
        if(y*h<=0)
           err <- err+1
        w <- w + eta*(y-h)*x
    epoch <- epoch+1
output: w`

Logistic Regression:
`input: Train, eta, MaxEp
init : w
epoch=0
err=1
m = len(Train)
while epoch<=MaxEp and err!=0
    err=0
    for i in 1:m
        choose an example (x,y) at random
        h <- w*x
        if(y*h<=0)
           err <- err+1
        w <- w + eta*y*(1-sigm(y*h))*x
    epoch <- epoch+1
output: w`
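
The three update rules above can be read as stochastic (sub)gradient steps on three per-example losses. This is the standard reading and is not stated in the pseudocode itself; here $h = \langle \mathbf{w}, \mathbf{x} \rangle$ and $\sigma$ stands for the sigmoid `sigm`.

$$
\begin{aligned}
\text{Perceptron:} \quad & \ell(\mathbf{w}) = \max(0, -y\,h), & -\nabla_{\mathbf{w}} \ell &= y\,\mathbf{x} \ \text{ if } y\,h < 0, \ \text{ and } 0 \text{ if } y\,h > 0 \\
\text{Adaline:} \quad & \ell(\mathbf{w}) = \tfrac{1}{2}\,(y - h)^2, & -\nabla_{\mathbf{w}} \ell &= (y - h)\,\mathbf{x} \\
\text{Logistic Regression:} \quad & \ell(\mathbf{w}) = \ln\left(1 + e^{-y\,h}\right), & -\nabla_{\mathbf{w}} \ell &= y\,\frac{e^{-y\,h}}{1 + e^{-y\,h}}\,\mathbf{x} = y\,\bigl(1 - \sigma(y\,h)\bigr)\,\mathbf{x}
\end{aligned}
$$

In each case the update is $\mathbf{w} \leftarrow \mathbf{w} - \eta\,\nabla_{\mathbf{w}} \ell$. At $y\,h = 0$ the perceptron loss is not differentiable; the pseudocode picks the subgradient $-y\,\mathbf{x}$ there, which is why its test is `y * h <= 0`.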

1. Create a list of 4 elements called `Train` that encodes the logical AND example:
$Train=\{((1,+1,+1),+1),((1,-1,+1),-1),((1,-1,-1),-1),((1,+1,-1),-1)\}$

Each element of the list is a pair: its first component is the vector of coordinates of the example, with the bias term '1' placed at the beginning, and its second component is the class of the example.

    

In [1]:
Train=[
    ([1, +1, +1], +1),
    ([1, -1, +1], -1),
    ([1, -1, -1], -1),
    ([1, +1, -1], -1),
]
Train

[([1, 1, 1], 1), ([1, -1, 1], -1), ([1, -1, -1], -1), ([1, 1, -1], -1)]

In [2]:
def dot(x, y):
    return sum(x_i*y_i for x_i, y_i in zip(x, y))

from math import exp

def sigmoid(z):
    return (1.0/(1.0+exp(-z)))
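
One caveat with the `sigmoid` above: for a very negative `z`, `exp(-z)` exceeds the float range and `math.exp` raises `OverflowError`. A minimal sketch of a variant that avoids this is given below; the name `stable_sigmoid` is an assumption for illustration and is not used elsewhere in this notebook.

```python
from math import exp

def stable_sigmoid(z):
    """Sigmoid that never calls exp() on a large positive argument."""
    if z >= 0:
        # exp(-z) is in (0, 1], so this branch cannot overflow
        return 1.0 / (1.0 + exp(-z))
    # For z < 0, exp(z) is in (0, 1), so this branch cannot overflow either
    ez = exp(z)
    return ez / (1.0 + ez)

print(stable_sigmoid(0.0))      # 0.5
print(stable_sigmoid(-1000.0))  # 0.0 (exp(-1000) underflows to 0.0)
print(stable_sigmoid(1000.0))   # 1.0
```

On the normalized collections used later, `y * h` stays small, so the simpler definition is unlikely to fail there; the guard mostly matters for unnormalized inputs.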

2. Code the Perceptron, Adaline and LR (Logistic regression) programs

Hint: You can write a function that calculates the dot product between an example $\mathbf{x} = (x_1, \ldots, x_d)$ and the weight vector $\mathbf{w} = (w_0, w_1, \ldots, w_d)$:
$h(\mathbf{x},\mathbf{w}) = w_0 + \sum_{j=1}^{d} w_j x_j$.


In [3]:
def _compute_update(w, x, term, eta):
    return [w_i + eta * term * x_i for w_i,x_i in zip(w,x)]

def perceptron_update(w,x,h,y,eta):
    return _compute_update(w, x, y, eta)

def adaline_update(w,x,h,y,eta):
    return _compute_update(w, x, y - h, eta)

def logreg_update(w,x,h,y,eta):
    return _compute_update(w,x, y * (1 - sigmoid(y * h)), eta)
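
To see how the three rules differ, it may help to apply each of them once from the same starting point. The sketch below is self-contained (it redefines a generic `step` helper, a hypothetical name) and uses the positive AND example with zero initial weights, so that $h = 0$.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def step(w, x, term, eta):
    # Generic update: w <- w + eta * term * x
    return [w_i + eta * term * x_i for w_i, x_i in zip(w, x)]

w = [0.0, 0.0, 0.0]      # initial weights, so h = <w, x> = 0
x, y = [1, 1, 1], 1      # the positive AND example, bias first
eta = 0.1
h = sum(w_i * x_i for w_i, x_i in zip(w, x))

w_perceptron = step(w, x, y, eta)                      # term = y
w_adaline = step(w, x, y - h, eta)                     # term = y - h
w_logreg = step(w, x, y * (1 - sigmoid(y * h)), eta)   # term = y * (1 - sigm(y*h))

print(w_perceptron)  # [0.1, 0.1, 0.1]
print(w_adaline)     # [0.1, 0.1, 0.1]
print(w_logreg)      # [0.05, 0.05, 0.05]
```

With $h = 0$ the logistic step is half the size of the other two, because $1 - \sigma(0) = 0.5$.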

In [4]:
from random import randint

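def gradient_descent(train, eta, max_epoch, update_rule, update_on_error=False):
    # One weight per input coordinate (the bias is already part of x)
    w = [0.0 for _ in range(len(train[0][0]))]
    err = 1
    epoch = 0
    
    while (epoch < max_epoch) and err > 0:
        err = 0
        
        for i in range(len(train)):
            # Draw one training example uniformly at random (with replacement)
            x,y = train[randint(0, len(train) - 1)]
            h = dot(w,x)
            
            misclassified = (y * h <= 0)
            if misclassified: err += 1
            
            # With update_on_error=True (perceptron), update only on a mistake;
            # otherwise (Adaline, LogReg), update on every example
            if misclassified or not update_on_error:
                w = update_rule(w,x,h,y,eta)
                
        epoch += 1
                
    return w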
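
Note that `gradient_descent` draws examples with `randint`, that is, with replacement, so an epoch may never visit some examples and `err == 0` does not guarantee that every training example is classified correctly. A common alternative, sketched below under the assumption that one epoch should be a full random pass over the data, is to shuffle the training set at each epoch. The function name `perceptron_shuffled` and its default arguments are illustrative choices, not part of the exercise.

```python
import random

def perceptron_shuffled(train, eta=1.0, max_epoch=100, seed=0):
    """Perceptron where each epoch visits every example once, in random order."""
    rng = random.Random(seed)        # local generator, for reproducible runs
    w = [0.0] * len(train[0][0])     # one weight per coordinate (bias included)
    for epoch in range(max_epoch):
        order = list(train)
        rng.shuffle(order)
        err = 0
        for x, y in order:
            h = sum(w_i * x_i for w_i, x_i in zip(w, x))
            if y * h <= 0:           # misclassified (or on the boundary)
                w = [w_i + eta * y * x_i for w_i, x_i in zip(w, x)]
                err += 1
        if err == 0:                 # a full clean pass: every example is correct
            break
    return w

and_data = [([1, 1, 1], 1), ([1, -1, 1], -1), ([1, -1, -1], -1), ([1, 1, -1], -1)]
w = perceptron_shuffled(and_data)
# The AND problem is linearly separable, so the perceptron converges
print(all(y * sum(w_i * x_i for w_i, x_i in zip(w, x)) > 0 for x, y in and_data))  # True
```

Here a clean pass is a real stopping certificate, since every example has been checked against the final weights.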

In [5]:
w_adaline = gradient_descent(Train, 0.005, 50, adaline_update)
dot(w_adaline, [1,1,1])

0.029358700443557935

In [6]:
w_perceptron = gradient_descent(Train, 0.005, 50, perceptron_update, update_on_error=True)
dot(w_perceptron, [1,-1,1])

-0.009999999999999998

In [7]:
w_logreg = gradient_descent(Train, 0.005, 50, logreg_update)
dot(w_logreg, [1,-1,1])

0.009956225789509505

3. Apply the three learning models to the logical AND example, and calculate the error rate of each model on this dataset.

Hint: You can write a function that takes a weight vector $\mathbf{w}$ and a set of examples $(\mathbf{x},y)$ and calculates the error rate of the model with weight $\mathbf{w}$ on that set.

In [8]:
Test = [
    ([1, +1, +1], +1),
    ([1, -1, +1], -1),
    ([1, -1, -1], -1),
    ([1, +1, -1], -1),
]

In [9]:
def EmpiricalRisk(Test,W):
    E=0.0
    m=len(Test)
    # The empirical error of a model with weight W on a test set of size m
    for xi, yi in Test:
        h_w = dot(W, xi)
        if (yi*h_w <= 0):
            E+=1.0
    return E/float(m)
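
As a quick sanity check of this definition, the error rate can be worked out by hand for a few fixed weight vectors on the AND data. The sketch below is self-contained; `empirical_risk` is a compact restatement of the function above, and the weight vectors are hand-picked for illustration rather than learned.

```python
def empirical_risk(data, w):
    """Fraction of examples (x, y) with y * <w, x> <= 0."""
    mistakes = sum(
        1 for x, y in data
        if y * sum(w_i * x_i for w_i, x_i in zip(w, x)) <= 0
    )
    return mistakes / len(data)

and_data = [([1, 1, 1], 1), ([1, -1, 1], -1), ([1, -1, -1], -1), ([1, 1, -1], -1)]

print(empirical_risk(and_data, [-1, 1, 1]))  # 0.0: separates AND perfectly
print(empirical_risk(and_data, [0, 1, 0]))   # 0.25: only (1, +1, -1) is wrong
print(empirical_risk(and_data, [0, 0, 0]))   # 1.0: h = 0 counts as an error
```

The last line shows that an example lying exactly on the decision boundary ($h = 0$) is counted as misclassified.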

In [10]:
print(f"Empirical risk Perceptron = {EmpiricalRisk(Test, w_perceptron)}")
print(f"Empirical risk Adaline = {EmpiricalRisk(Test, w_adaline)}")
print(f"Empirical risk LogReg = {EmpiricalRisk(Test, w_logreg)}")

Empirical risk Perceptron = 0.0
Empirical risk Adaline = 0.0
Empirical risk LogReg = 0.25


4. We now turn to the behavior of the three models on four UCI collections: Sonar (http://archive.ics.uci.edu/ml/datasets/connectionist+bench+(sonar,+mines+vs.+rocks)), Spambase (https://archive.ics.uci.edu/ml/datasets/spambase), Breast Cancer Wisconsin (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29) and Ionosphere (https://archive.ics.uci.edu/ml/datasets/Ionosphere). These files are in the current repository under the names `sonar.txt`, `spam.txt`, `wdbc.txt` and `ionosphere.txt`. The following `ReadCollection` function reads each file into the training-set format described above.

In [11]:
from math import sqrt
import pandas as pd
import random
from sklearn.model_selection import train_test_split

def Normalize(x):
    # Divide x in place by its Euclidean norm; a zero vector is left unchanged
    norm=sqrt(sum(e**2 for e in x))
    if norm > 0:
        for i in range(len(x)):
            x[i]/=norm
    return x

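def ReadCollection(filename, normalize=True):
    tag_df=pd.read_table(filename,sep=',',header=None)
    # Map the class labels of each collection to -1/+1
    if("wdbc" in filename):
        Dic={'M': -1, 'B': +1}
    elif("sonar" in filename):
        Dic={'R': -1, 'M': +1}
    elif("iono" in filename):
        Dic={'g': -1, 'b': +1}
    elif("spam" in filename):
        Dic={0:-1, 1:+1}
    else:
        raise ValueError(f"Unknown collection: {filename}")
        
    X=[]
    for e in range(len(tag_df)):
        x=list(tag_df.loc[e,:])
        if("wdbc" in filename):
            x.pop(0)        # drop the first column (not a feature)
            cls=x.pop(0)    # the class is in the second column
        else:
            cls=x.pop()     # the class is in the last column
            
        if normalize: x=Normalize(x)
        x.append(Dic[cls])  # store the -1/+1 label as the last element
        X.append(x)
    random.shuffle(X)

    return X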
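
To make the effect of `Normalize` concrete, here is a minimal sketch of L2 normalization on a small vector. Unlike `Normalize`, the hypothetical helper `l2_normalize` returns a new list instead of modifying its argument, which is a design choice made for this illustration only.

```python
from math import sqrt

def l2_normalize(x):
    """Return x / ||x||_2 as a new list; a zero vector is returned unchanged."""
    norm = sqrt(sum(e * e for e in x))
    if norm == 0.0:
        return list(x)
    return [e / norm for e in x]

v = l2_normalize([3.0, 4.0])
print(v)                            # [0.6, 0.8]
print(sqrt(sum(e * e for e in v)))  # approximately 1
print(l2_normalize([0.0, 0.0]))     # [0.0, 0.0]
```

After normalization every example has norm 1, so $|h| = |\langle \mathbf{w}, \mathbf{x} \rangle| \le \|\mathbf{w}\|$ for all examples, which keeps the size of each update comparable across examples.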

In [12]:
models = {
    'Perceptron': perceptron_update,
    'Adaline': adaline_update,
    'LogReg': logreg_update
}

In [13]:
datasets = {
    "WDBC": ReadCollection("wdbc.txt"), 
    "Ionosphere": ReadCollection("ionosphere.txt"),
    "Sonar": ReadCollection("sonar.txt"),
    "Spam": ReadCollection("spam.txt")
}

 2. Run the three models on these files with $\eta=0.01$ and $\eta=0.1$ and `MaxEp=500`.
 
 3. Report in the tables below the average error rate on the test set, obtained by repeating each experiment 10 times. 
 
 <br>
 
 
 <center> $\eta=0.01$, MaxEp$=500$ </center>
    
    
  | Collection | Perceptron | Adaline |    LR    |
  |------------|------------|---------|----------|
  | WDBC       |            |         |          |
  | Ionosphere |            |         |          |
  | Sonar      |            |         |          |
  | Spam       |            |         |          |
 
 <br><br>
  
  <center> $\eta=0.1$, MaxEp$=500$ </center>
    
    
  | Collection | Perceptron | Adaline |    LR    |
  |------------|------------|---------|----------|
  | WDBC       |            |         |          |
  | Ionosphere |            |         |          |
  | Sonar      |            |         |          |
  | Spam       |            |         |          |
  
  Hint: you can use the following function

In [14]:
def get_x_y(dataset):
    return [(vector[:-1], vector[-1]) for vector in dataset]

In [15]:
for eta in [0.01,0.1]:
    print(f'eta = {eta}\n--------')
    
    results = pd.DataFrame(
            index=datasets.keys(),
            columns=models.keys(),
            dtype=float
    ).fillna(0.0)
    
    for name, X in datasets.items():
        for i in range(20):
            train, test = train_test_split(X, test_size=0.25)

            train = get_x_y(train)
            test = get_x_y(test)
            
            for model, update_fnc in models.items():
                
                results.loc[name, model] += EmpiricalRisk(
                    test, gradient_descent(train, eta, 500, update_fnc)
                )
                
    results = results.applymap(lambda x: round(x/20.0, 4))
    
    print(results)

eta = 0.01
--------
            Perceptron  Adaline  LogReg
WDBC            0.3699   0.0860  0.1017
Ionosphere      0.3114   0.1562  0.1517
Sonar           0.4587   0.2471  0.2548
Spam            0.3970   0.2626  0.2249
eta = 0.1
--------
            Perceptron  Adaline  LogReg
WDBC            0.3650   0.0892  0.0888
Ionosphere      0.3330   0.1841  0.1676
Sonar           0.4913   0.2769  0.2462
Spam            0.3987   0.2663  0.1470
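
When comparing averaged error rates such as those above, it can help to report the spread across repetitions next to the mean. The following is a sketch only: `repeated_holdout` and the `train_and_score` callback are hypothetical names, the split is done with the standard library rather than `train_test_split`, and the toy data and constant classifier exist purely to make the example runnable.

```python
import random
from statistics import mean, stdev

def repeated_holdout(data, train_and_score, repeats=10, test_size=0.25, seed=0):
    """Mean and standard deviation of a test error over random train/test splits.

    train_and_score(train, test) is a caller-supplied function returning an error rate.
    """
    rng = random.Random(seed)
    n_test = max(1, int(round(test_size * len(data))))
    scores = []
    for _ in range(repeats):
        shuffled = list(data)
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(train_and_score(train, test))
    return mean(scores), stdev(scores)

# Toy usage: a "model" that ignores the training set and always predicts +1
toy = [([1.0, float(i)], +1 if i % 2 == 0 else -1) for i in range(40)]

def always_plus(train, test):
    return sum(1 for _, y in test if y != 1) / len(test)

avg, sd = repeated_holdout(toy, always_plus)
print(f"mean error = {avg:.3f}, std = {sd:.3f}")
```

A difference between two configurations that is small compared with this standard deviation should be read with caution.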


 4. Normalize the vector representations of the observations by dividing them by their norm, and repeat questions 2 and 3. Does normalizing lead to any significant change? Please explain.


<b>Resp:</b>  