In [5]:
import numpy as np

## Perceptron class 

This is the original Perceptron class provided by the professor. Our goal is to implement a new Adaline class whose fit method applies the Adaline learning rule, and to test it on the Iris dataset.

In [6]:
class Perceptron:
    def __init__(self, eta):
        self.w = None
        self.b = None
        self.eta = eta

    # Heaviside step activation function
    def activation(self, z):
        return np.heaviside(z, 0) # Returns 0 if z ≤ 0, else 1

    # Fit method to train the perceptron model
    def fit(self, X, y, epochs):
        n_features = X.shape[1]
        n_samples = X.shape[0]

        # Initializing weights, bias and predictions
        self.w = np.random.rand(n_features)
        self.b = 0
        y_pred = np.zeros(n_samples)

        # Iterating until the indicated number of epochs
        for epoch in range(epochs):
            change=0
            for i in range(n_samples):
                y_old = y_pred[i]
                # Computing the dot product between sample and weights and adding the bias:
                z = np.dot(X[i], self.w) + self.b 
                y_pred[i] = self.activation(z) # Passing through an activation function
                if y_old != y_pred[i]:
                    change += 1
                # Updating weights and bias using the error
                self.w = self.w + self.eta * (y[i] - y_pred[i]) * X[i]
                self.b = self.b + self.eta * (y[i] - y_pred[i])
            print(f"\t Epoch: {epoch +1} with {change} changes.")
            if not change:
                print(f"No changes. The perceptron model converged.")
                break

    def predict(self, X):
        z = np.dot(X, self.w) + self.b
        return self.activation(z)

    def get_params(self):
        return self.w, self.b
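A single Perceptron update can be traced by hand. The sketch below reproduces the inner loop of `fit` for one sample and shows that the weights move only when the prediction is wrong. The helper `perceptron_step` and all the numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def perceptron_step(w, b, x, y, eta):
    """One Perceptron update, mirroring the inner loop of Perceptron.fit."""
    z = np.dot(x, w) + b
    y_hat = np.heaviside(z, 0)  # 0 if z <= 0, else 1
    # The error y - y_hat can only be -1, 0 or 1
    w_new = w + eta * (y - y_hat) * x
    b_new = b + eta * (y - y_hat)
    return w_new, b_new, y_hat

# Hypothetical weights and sample: z = 0.5*1 + (-0.5)*2 + 0 = -0.5, so y_hat = 0
w0 = np.array([0.5, -0.5])
b0 = 0.0
x_step = np.array([1.0, 2.0])

# Misclassified (true label 1): weights move by eta * x, bias by eta
w1, b1, y_hat0 = perceptron_step(w0, b0, x_step, 1, eta=1.0)
# w1 == [1.5, 1.5], b1 == 1.0

# Correctly classified (true label 0): the error is 0, so nothing changes
w2, b2, _ = perceptron_step(w0, b0, x_step, 0, eta=1.0)
# w2 == [0.5, -0.5], b2 == 0.0
```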


## Adaline Class

We implement the Adaline learning rule by subclassing the Perceptron class and overriding its fit and predict methods.
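For reference, the Adaline rule implemented in the class below is one step of (stochastic) gradient descent on the per-sample squared error, also known as the Widrow-Hoff or LMS rule. Writing $H$ for the Heaviside step function used by the Perceptron (our notation), the two update rules compare as follows:

```latex
\begin{aligned}
z_i &= \mathbf{w}^\top \mathbf{x}_i + b, \qquad J_i(\mathbf{w}, b) = \tfrac{1}{2}\,(y_i - z_i)^2 \\
\frac{\partial J_i}{\partial \mathbf{w}} &= -(y_i - z_i)\,\mathbf{x}_i, \qquad
\frac{\partial J_i}{\partial b} = -(y_i - z_i) \\
\text{Adaline:}\quad \mathbf{w} &\leftarrow \mathbf{w} - \eta\,\frac{\partial J_i}{\partial \mathbf{w}} = \mathbf{w} + \eta\,(y_i - z_i)\,\mathbf{x}_i, \qquad b \leftarrow b + \eta\,(y_i - z_i) \\
\text{Perceptron:}\quad \mathbf{w} &\leftarrow \mathbf{w} + \eta\,\bigl(y_i - H(z_i)\bigr)\,\mathbf{x}_i, \qquad b \leftarrow b + \eta\,\bigl(y_i - H(z_i)\bigr)
\end{aligned}
```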

In [7]:
class Adaline(Perceptron):
    def fit(self, X, y, epochs):
        n_features = X.shape[1]
        n_samples = X.shape[0]

        # Initializing weights and bias
        self.w = np.random.rand(n_features)
        self.b = 0

        # Iterating until the indicated number of epochs
        for epoch in range(epochs):
            cost = 0
            for i in range(n_samples):
                # Computing the dot product between sample and weights and adding the bias:
                z = np.dot(X[i], self.w) + self.b
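                # Error: difference between the true label and the raw linear output z
                # (not the thresholded prediction, as in the Perceptron)
                error = y[i] - z

                # Widrow-Hoff (LMS) update: one gradient-descent step on 0.5 * error**2
                self.w = self.w + self.eta * error * X[i]
                self.b = self.b + self.eta * error

                # Accumulate the cost (half the sum of squared errors) over the epoch
                cost += 0.5 * error**2
            print(f"\t Epoch: {epoch + 1} - Cost: {cost:.4f}")
            # Stop early only if every training label is fitted exactly
            if cost == 0:
                print("Zero cost. The Adaline model converged.")
                break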
    
    def predict(self, X):
        # Adaline is trained by minimizing the squared error between the raw output z and the true label y.
        # That means the model tries to make:
        #     - z = 0 when the true label is 0
        #     - z = 1 when the true label is 1
        # so the natural decision threshold lies halfway between them, at 0.5.
        z = np.dot(X, self.w) + self.b
        return (z >= 0.5).astype(int)  # Threshold at 0.5 instead of 0 as in the Perceptron
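As a sanity check on the fit method above, a small numerical sketch can confirm that the per-sample update η·(y − z)·x is exactly a gradient-descent step on ½(y − z)². The values of `w_chk`, `b_chk`, `x_chk` and `y_chk` are made up, and `sample_loss` is a helper introduced only for this check.

```python
import numpy as np

def sample_loss(w, b, x, y):
    """Per-sample Adaline loss 0.5 * (y - z)**2 on the raw linear output z."""
    z = np.dot(x, w) + b
    return 0.5 * (y - z) ** 2

# Hypothetical parameters and sample for the check
w_chk = np.array([0.2, -0.4])
b_chk = 0.3
x_chk = np.array([0.8, -1.2])
y_chk = 1.0
eta = 0.01

# Update exactly as written in Adaline.fit
error = y_chk - (np.dot(x_chk, w_chk) + b_chk)
delta_w = eta * error * x_chk
delta_b = eta * error

# Central finite differences (exact up to rounding for a quadratic loss)
h = 1e-6
grad_w = np.array([
    (sample_loss(w_chk + h * e, b_chk, x_chk, y_chk)
     - sample_loss(w_chk - h * e, b_chk, x_chk, y_chk)) / (2 * h)
    for e in np.eye(2)
])
grad_b = (sample_loss(w_chk, b_chk + h, x_chk, y_chk)
          - sample_loss(w_chk, b_chk - h, x_chk, y_chk)) / (2 * h)

# The Adaline update is one gradient-descent step: -eta * gradient
print(np.allclose(delta_w, -eta * grad_w), np.isclose(delta_b, -eta * grad_b))
```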
    

## Iris Dataset

Let's run our custom models on the Iris dataset to see how they perform and to experiment with different parameters.

In [8]:
from sklearn.datasets import load_iris
iris = load_iris()

Let's keep only two features and turn the task into a binary, one-vs-rest classification problem: setosa versus the other two species.

In [9]:
X = iris.data[:, (0,1)] # Keep only sepal length and sepal width (2 features)
y = (iris.target == 0).astype(int) # Setosa (class 0) -> 1, versicolor/virginica -> 0
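The one-vs-rest relabeling can be seen on a toy label array. In this sketch, `target` is a hypothetical stand-in for `iris.target`, using the same class codes.

```python
import numpy as np

# Toy stand-in for iris.target: 0 = setosa, 1 = versicolor, 2 = virginica
target = np.array([0, 1, 2, 0, 2, 1])

# One-vs-rest: setosa becomes the positive class (1), everything else becomes 0
y_binary = (target == 0).astype(int)
print(y_binary)  # [1 0 0 1 0 0]
```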

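Let's split the dataset, then standardize it. The scaler is fitted on the training set only and reused on the test set, so no information from the test set leaks into training.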

In [10]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# No random_state is passed, so the split (and the results below) will vary between runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1,
                                                    shuffle=True, stratify=y)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
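The fit/transform pattern above amounts to the following. This numpy-only sketch uses made-up numbers (`toy_train` and `toy_test` are hypothetical) and assumes StandardScaler's default settings (`with_mean=True`, `with_std=True`).

```python
import numpy as np

# Hypothetical toy data: two features, a few training rows and test rows
toy_train = np.array([[4.9, 3.0], [6.3, 2.5], [5.8, 2.7], [7.1, 3.0]])
toy_test = np.array([[5.0, 3.4], [6.5, 2.8]])

# Statistics are computed on the training rows only
mean = toy_train.mean(axis=0)
std = toy_train.std(axis=0)  # population std (ddof=0), which StandardScaler also uses

toy_train_std = (toy_train - mean) / std
toy_test_std = (toy_test - mean) / std  # test rows reuse the training mean and std

# Training columns now have mean ~0 and std ~1; test columns generally do not
print(toy_train_std.mean(axis=0), toy_train_std.std(axis=0))
```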

Let's initialize and train the perceptron model and see the results.

In [11]:
np.random.seed(seed=42) # For reproducibility

# Initialize the perceptron model with learning rate η=1
model = Perceptron(1)

model.fit(X_train, y_train, 5)
w, b = model.get_params()
print(f"model weights:{w}, model bias:{b}")

y_train_predicted = model.predict(X_train)
y_test_predicted = model.predict(X_test)

	 Epoch: 1 with 46 changes.
	 Epoch: 2 with 6 changes.
	 Epoch: 3 with 1 changes.
	 Epoch: 4 with 2 changes.
	 Epoch: 5 with 0 changes.
No changes. The perceptron model converged.
model weights:[-3.11008597  1.64546991], model bias:-2.0


In [12]:
from sklearn.metrics import accuracy_score
print(f"train accuracy:{accuracy_score(y_train, y_train_predicted)}")
print(f"test accuracy:{accuracy_score(y_test, y_test_predicted)}")

train accuracy:1.0
test accuracy:1.0


Now let's train and evaluate the Adaline model on the same split.

In [13]:
np.random.seed(seed=42) # For reproducibility

# Initialize the Adaline model with learning rate η=0.01
model = Adaline(0.01)

model.fit(X_train, y_train, 10)
w, b = model.get_params()
print(f"model weights:{w}, model bias:{b}")

y_train_predicted = model.predict(X_train)
y_test_predicted = model.predict(X_test)

	 Epoch: 1 - Cost: 28.9331
	 Epoch: 2 - Cost: 5.7398
	 Epoch: 3 - Cost: 3.6971
	 Epoch: 4 - Cost: 3.4944
	 Epoch: 5 - Cost: 3.4680
	 Epoch: 6 - Cost: 3.4630
	 Epoch: 7 - Cost: 3.4617
	 Epoch: 8 - Cost: 3.4614
	 Epoch: 9 - Cost: 3.4613
	 Epoch: 10 - Cost: 3.4612
model weights:[-0.29250365  0.24496291], model bias:0.3408293533098461


In [14]:
from sklearn.metrics import accuracy_score
print(f"train accuracy:{accuracy_score(y_train, y_train_predicted)}")
print(f"test accuracy:{accuracy_score(y_test, y_test_predicted)}")

train accuracy:0.9925925925925926
test accuracy:1.0
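The Adaline train accuracy can be read as a count. With `test_size=0.1` on the 150 Iris samples, scikit-learn's `train_test_split` puts 15 samples in the test set (it rounds a fractional test size up) and 135 in the training set. The arithmetic below shows that Adaline misclassified exactly one training sample, while all 15 test samples were classified correctly.

```python
import math

n_samples = 150   # the Iris dataset has 150 samples
test_size = 0.1

# train_test_split rounds a fractional test size up: ceil(0.1 * 150) = 15
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# A train accuracy of 0.9925925925925926 over 135 samples means 134 correct predictions
n_correct = round(0.9925925925925926 * n_train)
print(n_test, n_train, n_correct, n_train - n_correct)
```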


## Observed results

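The main difference from the Perceptron is that Adaline does not apply the Heaviside step function during training. Instead, it computes the error on the continuous linear output z = w·x + b, as y − z, and updates the weights to minimize the squared error. The threshold is applied only in predict.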
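To make the difference concrete, here is a minimal sketch with hand-picked, hypothetical values. The sample is already on the correct side of the boundary, but its linear output overshoots the target. The Perceptron's error is therefore zero, while Adaline's is not, so Adaline keeps adjusting the weights.

```python
import numpy as np

# Hypothetical weights and sample, chosen so that z = 2.0 for a sample with label 1
w_demo = np.array([1.0, 1.0])
b_demo = 0.0
x_demo = np.array([1.0, 1.0])
y_demo = 1

z_demo = np.dot(x_demo, w_demo) + b_demo  # 2.0

# Perceptron: error on the thresholded prediction -> 1 - H(2.0) = 0, no update
perceptron_error = y_demo - np.heaviside(z_demo, 0)

# Adaline: error on the raw output -> 1 - 2.0 = -1.0, so the weights still move
adaline_error = y_demo - z_demo
print(perceptron_error, adaline_error)
```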

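To create the Adaline class we overrode the fit method, so that it uses the continuous error y − z, and the predict method, so that it thresholds z at 0.5. Adaline seems more sensitive to changes in the learning rate than the Perceptron: the Perceptron above trains with η = 1, while the Adaline run uses η = 0.01. This is expected. The Perceptron's update η(y − ŷ)x uses an error that can only be −1, 0 or 1, whereas Adaline's update η(y − z)x scales with the continuous error. A learning rate that is too large makes each step overshoot, so the error grows from epoch to epoch, while one that is too small slows convergence. Looking this up and asking LLMs confirms this: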

"In Adaline, the learning rate becomes more critical because the weight updates are proportional to the gradient of the error function. You need to ensure the learning rate is chosen carefully to avoid slow convergence or instability."
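To see why a large learning rate is risky for Adaline, the sketch below reruns the same per-sample rule as `Adaline.fit` on a tiny hypothetical one-feature data set (`X_toy`, `y_toy`). It starts from zero weights instead of random ones so the run is deterministic. For a single sample, one update multiplies that sample's error by 1 − η(‖x‖² + 1). Once η is large enough that this factor exceeds 1 in magnitude, the updates overshoot and the cost grows. The data, the helper `adaline_sgd_costs` and the two η values are our own illustrative choices, not results from the Iris experiment.

```python
import numpy as np

def adaline_sgd_costs(X, y, eta, epochs):
    """Per-sample Adaline updates as in Adaline.fit, but with zero initial weights
    so the run is deterministic. Returns the summed cost of each epoch."""
    w = np.zeros(X.shape[1])
    b = 0.0
    costs = []
    for _ in range(epochs):
        cost = 0.0
        for xi, target in zip(X, y):
            error = target - (np.dot(xi, w) + b)
            w = w + eta * error * xi
            b = b + eta * error
            cost += 0.5 * error ** 2
        costs.append(cost)
    return costs

# Tiny hypothetical one-feature data set (deliberately left unscaled)
X_toy = np.array([[1.0], [2.0], [3.0]])
y_toy = np.array([0.0, 1.0, 1.0])

small = adaline_sgd_costs(X_toy, y_toy, eta=0.01, epochs=200)
large = adaline_sgd_costs(X_toy, y_toy, eta=0.5, epochs=30)

print(f"eta=0.01: first cost {small[0]:.3f}, last cost {small[-1]:.3f}")
print(f"eta=0.5:  first cost {large[0]:.3f}, last cost {large[-1]:.3e}")
```

With η = 0.01 the epoch cost shrinks, while with η = 0.5 it grows without bound. Standardizing the features, as done earlier in this notebook, keeps ‖x‖ small and tends to widen the range of learning rates that work.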