# MACHINE LEARNING LAB ASSIGNMENT


# PERCEPTRON LEARNING ALGORITHM


### NAME     : **MOHIT TALREJA**

### ROLL NO. : **177237**

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

The `predict` function takes two parameters:

- the weights array $w_{0},w_{1},...,w_{n}$, where $n$ is the number of features
- the features array for a single row of the dataset, of the form $x_{0},x_{1},...,x_{n}$

The function computes the weighted sum

$\sum_{i=0}^{n} w_{i} x_{i}$

where $x_{0} = 1$, so that $w_{0}$ acts as the bias term.

This is the dot product of the two vectors, weights and features.

If this value is greater than 0, the perceptron predicts 1; otherwise it predicts -1.
The function returns this prediction to the caller.
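As a small worked example of the thresholded dot product, the sketch below evaluates one input row against two weight vectors. The helper name `step_predict` and the sample values are illustrative assumptions; it returns 1 or -1, as the `predict` cell in this notebook does.

```python
import numpy as np

def step_predict(weights, features):
    # Hypothetical stand-in for predict: threshold the dot product at 0
    summation = np.dot(weights, features)
    return 1 if summation > 0 else -1

# x0 = 1 is the constant bias input; x1 = 0 and x2 = 1 are the actual features
x = np.array([1, 0, 1])

# 0.2*1 + 0.2*0 + 0.2*1 = 0.4 > 0, so the prediction is 1
pred_a = step_predict(np.array([0.2, 0.2, 0.2]), x)
# -1.0*1 + (-0.2)*0 + (-0.2)*1 = -1.2 <= 0, so the prediction is -1
pred_b = step_predict(np.array([-1.0, -0.2, -0.2]), x)
print(pred_a, pred_b)
```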

In [2]:
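def predict(weights,features):
    
    #weighted sum w.x; weights has shape (1,n+1), features is one row with x0 = 1
    summation = np.dot(np.ravel(weights),np.ravel(features))

    #step activation: 1 if the sum is positive, otherwise -1
    if(summation>0):
        prediction = 1
    else:
        prediction = -1
    
    return prediction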

The `perceptronTrain` function below takes 4 parameters:
- features: the 2D features array containing all data points and their feature values (the first column must be filled with 1s)
- labels: the observed/actual values from the dataset
- num_iter: the number of epochs for which the training algorithm is run
- l_rate: the learning rate hyperparameter

It first initializes every weight to 0.2. The size of the weights array equals the number of columns in the passed features array; because the first column is all 1s, this is the number of features in the dataset + 1.
In each epoch, it iterates through all the rows in the dataset and calls the `predict` function above to get the predicted label for each row.

If the observed value is the same as the predicted value, the row contributes nothing to the update.
Otherwise, its contribution is accumulated as per the equation

$\Delta w_{i} = \Delta w_{i} + \alpha (observed - predicted) x_{i}$

$\alpha$ = learning rate hyperparameter value

At the end of each epoch, the accumulated change is applied to the weights, $w_{i} = w_{i} + \Delta w_{i}$, so all rows in an epoch are predicted with the same weights.

The function returns the weights array to the caller.
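A single epoch of this update can be sketched in vectorized form. The AND gate inputs, the learning rate of 0.2, and the initial weights of 0.2 mirror the code cells of this notebook; the vectorized formulation and the variable names are illustrative assumptions, not the notebook's implementation.

```python
import numpy as np

# AND gate inputs with a leading column of 1s (x0), labels encoded as 1 / -1
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
y = np.array([-1, -1, -1, 1])

alpha = 0.2           # learning rate
w = np.full(3, 0.2)   # every weight starts at 0.2

# Predict every row with the current weights (1 if the sum is > 0, else -1)
predicted = np.where(X @ w > 0, 1, -1)

# Accumulate alpha * (observed - predicted) * x over all rows, then apply once
delta_w = alpha * (y - predicted) @ X
w_after_epoch = w + delta_w
print(predicted, delta_w, w_after_epoch)
```

With these values every row is initially predicted as 1, so the three rows labelled -1 each contribute to the update, and the weights after the epoch are approximately [-1, -0.2, -0.2], which agrees with the "After epoch 0" line in the training log of the logic-gate cell.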

In [3]:
def perceptronTrain(features,labels,num_iter,l_rate):
    
    weights = np.full((1,features.shape[1]),0.2) #initializing every weight to 0.2
  
    for epoch in range(num_iter):
        delta_w = np.zeros(shape=(1,features.shape[1])) #accumulated weight change for this epoch
        for i in range(len(features)):
            
            prediction = predict(weights,features.iloc[i,:]) #prediction for current row
            print("Train ex = ",features.iloc[i,:],"prediction = ",prediction)
            delta_w += l_rate *(labels[i]-prediction)*features.iloc[i,:].to_numpy() #accumulating this row's contribution
            print("after Train example ",i,"delta w is ",delta_w)
        weights+=delta_w #applying the accumulated change once per epoch
        print("After epoch ",epoch,"weights are ",weights)
    return weights

**Dataset:**
Predicting whether a breast tumor is malignant or benign

Multivariate dataset with 32 attributes:
1. ID number
2. Diagnosis (M = malignant, B = benign) -> Output Label

The remaining 30 attributes are real-valued features. Ten quantities are computed for each cell nucleus, and the mean, standard error, and worst (largest) value of each are recorded:

1. radius (mean of distances from center to points on the perimeter)
2. texture (standard deviation of gray-scale values)
3. perimeter
4. area
5. smoothness (local variation in radius lengths)
6. compactness (perimeter^2 / area - 1.0)
7. concavity (severity of concave portions of the contour)
8. concave points (number of concave portions of the contour)
9. symmetry
10. fractal dimension ("coastline approximation" - 1)

In [4]:
# data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data', 
#                    header = None)
# print(data)

### Preprocessing

The first 2 columns are excluded from the feature vector, as the 1st column is the ID number and the 2nd column is the output label.
The output labels are of type str; they are converted to the numeric values 1 for Malignant (M) and -1 for Benign (B), matching the values returned by `predict`.

In [5]:
# X = data.iloc[:,2:] 
# X.insert(0,0,1) #Setting 1st column to 1s, i.e, setting x0 in equation to 1
# y = data.iloc[:,1] #Output labels are as M,B string values
# y_ = [] #making new output labels as 1 or -1
# for label in y:
#     if(label == "M"):
#         y_.append(1) #1 for Malignant
#     else:
#         y_.append(-1) #-1 for Benign, the value predict returns for the negative class
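The same kind of label conversion can be sketched without an explicit loop by using `Series.map`. The `diagnosis` Series below is made-up illustration data, not rows from the dataset, and encoding Benign as -1 is an assumption chosen so that the labels use the same two values that `predict` returns.

```python
import pandas as pd

# Hypothetical diagnosis column (made-up values, not taken from the dataset)
diagnosis = pd.Series(["M", "B", "B", "M"])

# Map the string labels to 1 (malignant) and -1 (benign)
label_map = {"M": 1, "B": -1}
y_encoded = diagnosis.map(label_map).tolist()

print(y_encoded)
```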

### Training

In [6]:
# #Splitting the dataset into training and testing
# X_train, X_test, y_train, y_test = train_test_split(X, y_, test_size=0.33,random_state =  37)

In [7]:
# NUM_EPOCHS = 10
# LEARNING_RATE = 0.1

In [8]:
# #calling perceptron training algorithm to get learned weights 
# w = perceptronTrain(X_train,y_train,NUM_EPOCHS,LEARNING_RATE)
# print("Weights = ", end='')
# print(w)



### Prediction Metrics on Test Data

The number of correctly predicted output labels on the test data is counted, and the accuracy of the model is then calculated as

$\text{Accuracy} = 100 \times \frac{\text{Number of correctly predicted outputs of Test Data}}{\text{Size of Test Data}}$
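As a minimal sketch of this calculation, the helper below compares two equally long sequences and returns the percentage of matching positions. The function name `accuracy_percent` and the sample values are hypothetical and are not results from the dataset.

```python
import numpy as np

def accuracy_percent(predictions, labels):
    # Hypothetical helper: percentage of positions where prediction equals label
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    return 100 * np.mean(predictions == labels)

# Made-up example: 3 of the 4 predictions match the labels
example_predictions = [1, -1, 1, 1]
example_labels = [1, -1, -1, 1]
example_accuracy = accuracy_percent(example_predictions, example_labels)
print(example_accuracy)
```

Here three of four entries agree, so the helper returns 75.0.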

In [9]:
# correct_predictions = 0

# for i in range(len(y_test)):
#     #getting prediction for corresponding test data values from the learned model
#     prediction = predict(w,X_test.iloc[i,:])
    
#     if(prediction == y_test[i]):
#         #if predicted value is the same as observed, it is a correct prediction
#         correct_predictions+=1

# print("Correctly predicted "+str(correct_predictions)+" out of "+str(len(y_test)))
# print("Accuracy = "+str(100*correct_predictions/len(y_test)))

### Perceptron Learning Algorithm on Logic Gates

In [10]:
X_LOGIC = pd.DataFrame([[1,0,0],[1,0,1],[1,1,0],[1,1,1]])
Y_AND = [-1,-1,-1,1] #output labels for AND gate (1 = true, -1 = false)
Y_OR = [-1,1,1,1] #output labels for OR gate, using -1 for false to match predict

#training on inputs
weights_AND = perceptronTrain(X_LOGIC,Y_AND,2,0.2)
weights_OR = perceptronTrain(X_LOGIC,Y_OR,2,0.2)

print("Weights for AND logic perceptron: ")
print(weights_AND)
print("Weights for OR logic perceptron: ")
print(weights_OR)

Train ex =  0    1
1    0
2    0
Name: 0, dtype: int64 prediction =  1
after Train example  0 delta w is  [[-0.4  0.   0. ]]
Train ex =  0    1
1    0
2    1
Name: 1, dtype: int64 prediction =  1
after Train example  1 delta w is  [[-0.8  0.  -0.4]]
Train ex =  0    1
1    1
2    0
Name: 2, dtype: int64 prediction =  1
after Train example  2 delta w is  [[-1.2 -0.4 -0.4]]
Train ex =  0    1
1    1
2    1
Name: 3, dtype: int64 prediction =  1
after Train example  3 delta w is  [[-1.2 -0.4 -0.4]]
After epoch  0 weights are  [[-1.  -0.2 -0.2]]
Train ex =  0    1
1    0
2    0
Name: 0, dtype: int64 prediction =  -1
after Train example  0 delta w is  [[0. 0. 0.]]
Train ex =  0    1
1    0
2    1
Name: 1, dtype: int64 prediction =  -1
after Train example  1 delta w is  [[0. 0. 0.]]
Train ex =  0    1
1    1
2    0
Name: 2, dtype: int64 prediction =  -1
after Train example  2 delta w is  [[0. 0. 0.]]
Train ex =  0    1
1    1
2    1
Name: 3, dtype: int64 prediction =  -1
after Train example  

In [11]:
print('AND LOGIC perceptron\n')
for i in range(len(X_LOGIC)):
    prediction = predict(weights_AND,X_LOGIC.iloc[i,:])
    print(X_LOGIC.iloc[[i],[1,2]].to_string(index=False,header=False),end = ' output = ')
    print(prediction)

AND LOGIC perceptron

 0  0 output = -1
 0  1 output = -1
 1  0 output = -1
 1  1 output = -1


In [12]:
print('OR LOGIC perceptron\n')
for i in range(len(X_LOGIC)):
    prediction = predict(weights_OR,X_LOGIC.iloc[i,:])
    print(X_LOGIC.iloc[[i],[1,2]].to_string(index=False,header=False),end = ' output = ')
    print(prediction)

OR LOGIC perceptron

 0  0 output = 1
 0  1 output = 1
 1  0 output = 1
 1  1 output = 1
