<a href="https://colab.research.google.com/github/werowe/HypatiaAcademy/blob/master/ml/perceptron.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Perceptron

This code and the graphics are copied from https://towardsdatascience.com/first-neural-network-for-beginners-explained-with-code-4cfd37e06eaf, by Arthur Arnx, writing in Towards Data Science.

Each circle is a **neuron**.  Reading from left to right, we have the input layer, which receives the training data.  The middle layers are the **hidden layers**.  The rightmost layer is the **output** layer. The outputs are scalars or probabilities. In this case there are two.  This network would work for a binary classification problem.  An example of that is what we describe below.

![](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*v1ohAG82xmU6WGsG2hoE8g.png)


#The XOR Problem
The XOR (**Exclusive Or**) problem is a classic problem in neural networks.  It shows why the activation function cannot be linear: if it were, the network would only work on problems with linear solutions.

XOR is the exclusive OR.  That means A XOR B is true when A or B is true, but not both.

We have four possibilities, where 1 is true and 0 is false.

| A | B | XOR |
| --- | --- | --- |
| 1 | 1 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 0 | 0 | 0 |
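
The truth table can be checked with a few lines of Python. This is only an illustrative sketch; it relies on Python's built-in `^` operator, which is not part of the original example.

```python
# Enumerate every combination of A and B and compute A XOR B.
# For the integers 0 and 1, Python's bitwise ^ operator is exactly XOR.
rows = [(a, b, a ^ b) for a in (1, 0) for b in (1, 0)]

print("A B XOR")
for a, b, xor in rows:
    print(a, b, xor)
```

XOR is 1 only on the two rows where A and B differ.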


# Neuron
You can think of a neural network as a solution to the Y = W X + b problem, where W holds the coefficients (the weights) of the model and b is the bias.  Unlike linear regression, each layer passes its result through a function before handing it to the next layer.  It is easiest to think of the network as a compound function like g(f(x)), where each hidden layer is one more level in this nesting.

The neuron computes its value by multiplying the inputs x1, x2, x3 by the weights w1, w2, and w3 and adding up the results.  x1 and x2 are the inputs and x3 is the bias input, which is held constant (it is 1 in the code below).  w1 and w2 are the weights on the inputs and w3 is the weight on the bias.

Initially w1, w2, w3 are just guesses.  The process then repeats, adjusting the weights until the error reaches its minimum value (this is called **backpropagation**).  This is usually done by taking the partial derivative of the error with respect to each weight and looking for where the derivative is 0 or closest to zero, which is the **stochastic gradient descent** technique.  Each pass through the network moves the value of each weight in the opposite direction of the gradient.

For example, with inputs x1 = 1 and x2 = 1 and the bias input x3 = 1:

$$\begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = 1 \cdot w_1 + 1 \cdot w_2 + 1 \cdot w_3 = w_1 + w_2 + w_3 $$
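
The weighted sum is a dot product, so it can be computed in one call with NumPy. The sketch below uses hypothetical input and weight values chosen only for illustration; they do not come from the trained model.

```python
import numpy as np

# Hypothetical values, for illustration only.
x = np.array([1, 1, 1])              # two inputs plus a constant bias input of 1
w = np.array([0.5, -0.25, 0.125])    # w1, w2, w3

# Same as 1*w1 + 1*w2 + 1*w3
weighted_sum = np.dot(x, w)
print(weighted_sum)
```

This number is what the activation function receives as its input.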


![](https://miro.medium.com/v2/resize:fit:1302/format:webp/1*UA30b0mJUPYoPvN8yJr2iQ.jpeg)

#Activation Function
This is a function that takes the weighted sum above and, for the functions used here, turns it into a number between 0 and 1.

François Chollet, who wrote Keras, says, "It introduces non-linearity into the model, allowing it to learn complex patterns. Common activation functions include ReLU, sigmoid, and softmax."



![](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*JHWL_71qml0kP_Imyx4zBg.png)
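
As a rough sketch, here are three common activation functions written with NumPy. The Heaviside step and the sigmoid are the two used later in this notebook; ReLU is included only for comparison. The sample values in `z` are hypothetical.

```python
import numpy as np

def step(z):
    """Heaviside step: 1 where z > 0, otherwise 0."""
    return np.where(z > 0, 1, 0)

def sigmoid(z):
    """Squashes any real number into the open interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

def relu(z):
    """Keeps positive values and replaces negative values with 0."""
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 2.0])   # hypothetical weighted sums
print("step   ", step(z))
print("sigmoid", sigmoid(z))
print("relu   ", relu(z))
```

The step and sigmoid outputs stay between 0 and 1, while ReLU has no upper bound.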

#Loss Function

François Chollet:  A **loss function** is a mathematical function used in machine learning to measure the difference between the predicted output of a model and the actual target values. It quantifies how well or poorly the model is performing. The goal of training a model is to minimize the loss function. The **sparse_categorical_crossentropy loss function** is used in machine learning for multi-class classification problems where the target labels are integers (i.e., not one-hot encoded).
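
To make the idea concrete, here is a plain NumPy sketch of what a sparse categorical cross-entropy computes: the average of minus the log of the probability the model gave to the correct class. It is not the Keras implementation, and the labels and probabilities are made up for illustration.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability assigned to the true class).

    labels: integer class indices, shape (n,)
    probs:  predicted class probabilities, shape (n, n_classes)
    """
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=float)
    # Pick, for each row, the probability of that row's true class.
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(true_class_probs)))

# Hypothetical predictions for three samples and three classes.
labels = [0, 2, 1]
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.1, 0.8],
         [0.3, 0.4, 0.3]]

loss = sparse_categorical_crossentropy(labels, probs)
print(loss)
```

The labels are plain integers (0, 2, 1), not one-hot vectors, which is what "sparse" refers to. The loss shrinks toward 0 as the probability on the correct class approaches 1.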

#Learning Rate
This controls how much the weights are adjusted based on the loss.  A larger number makes the model converge more quickly, but it has a drawback: on a function with more than one minimum it can skip over the best minimum and settle on the wrong one, leading to the wrong answer.  A very small learning rate means the model takes longer to converge.
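
The effect is easy to see on a toy problem. The sketch below runs gradient descent on f(w) = w², a function chosen here only for illustration, with three hypothetical learning rates.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimise f(w) = w**2, whose gradient is 2*w, starting from w."""
    for _ in range(steps):
        grad = 2 * w
        w = w - lr * grad   # move against the gradient
    return w

small = gradient_descent(lr=0.01)   # creeps toward the minimum at 0
medium = gradient_descent(lr=0.1)   # gets close to 0 in the same number of steps
large = gradient_descent(lr=1.1)    # overshoots on every step and moves away from 0

print(small, medium, large)
```

After the same 20 steps, the small rate is still far from the minimum, the medium rate is close to it, and the large rate has ended up further away than where it started.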

#Make a Single Perceptron

Here we take Arthur Arnx's example solution to the XOR problem and add explanations to make it easier to understand.  It has three neurons and no hidden layer. So it's 1 or 0 going in and 1 or 0 going out.  



A single-layer perceptron cannot solve the XOR problem, because XOR is not linearly separable: no single straight line separates the inputs that should give 1 from the inputs that should give 0. It can, however, solve the AND and OR functions, so the code below trains on the AND gate.
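
The difference can be demonstrated with a short experiment. This is a sketch rather than the notebook's own code: the helper `train_perceptron` is hypothetical, the weights start at zero instead of random values so the run is repeatable, and the learning rate of 0.5 was picked so the arithmetic is exact.

```python
def train_perceptron(samples, epochs=100, lr=0.5):
    """Train with the perceptron rule; return the error count for each epoch."""
    w1, w2, wb = 0.0, 0.0, 0.0   # fixed start so the run is repeatable
    errors_per_epoch = []
    for _ in range(epochs):
        total_error = 0
        for x1, x2, target in samples:
            # Heaviside step on the weighted sum (bias input fixed at 1)
            predicted = 1 if (x1 * w1 + x2 * w2 + wb) > 0 else 0
            error = target - predicted
            total_error += abs(error)
            w1 += lr * error * x1
            w2 += lr * error * x2
            wb += lr * error
        errors_per_epoch.append(total_error)
        if total_error == 0:     # every sample classified correctly
            break
    return errors_per_epoch

AND = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
XOR = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

and_errors = train_perceptron(AND)
xor_errors = train_perceptron(XOR)

print("AND converged:", and_errors[-1] == 0)
print("XOR converged:", xor_errors[-1] == 0)
```

AND reaches an epoch with no errors, while XOR makes at least one mistake in every epoch however long it trains.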

#Explanation

1. AND Gate Training Data:

Inputs and outputs are defined for the AND problem.

2. Parameters:

* Learning rate, bias, and number of epochs are set.

3. Weights Initialization:

* Weights are initialized randomly.

4. Training Loop:

* For each epoch:
  * Iterate over all training examples.
  * Compute the weighted sum and apply the activation function.
  * Calculate the error and update the weights.
  * Track the total error for the epoch.
* Print the error every 100 epochs to monitor training progress.
* Stop training if the total error becomes zero.

5. Testing Function:

* test_perceptron(input1, input2): Performs a forward pass using the learned weights and applies the Heaviside step function to predict the output.

6. Testing the Model:

* The code tests the model with various inputs and compares the predicted output with the expected output.
* It prints the test input, expected output, and predicted output, and checks if the prediction is correct.

##Running the Test
By running the complete code, you can train the perceptron and test it with different inputs for the AND gate problem. The output will indicate whether the model correctly predicts the output for each test input. This way, you can verify the functionality of the perceptron with the AND gate.

```python
import random
import numpy as np

# Define the inputs and outputs for the AND problem
inputs = np.array([[0, 0],
                   [0, 1],
                   [1, 0],
                   [1, 1]])

outputs = np.array([0, 0, 0, 1])

# Initialize parameters
lr = 0.1  # Learning rate
bias = 1
epochs = 1000

# Initialize weights randomly
weights = [random.random(), random.random(), random.random()]

# Training loop
for epoch in range(epochs):
    total_error = 0
    for i in range(len(inputs)):
        input1 = inputs[i][0]
        input2 = inputs[i][1]
        output = outputs[i]

        # Forward pass: calculate weighted sum of inputs and bias
        outputP = input1 * weights[0] + input2 * weights[1] + bias * weights[2]

        # Activation function: Heaviside step function
        if outputP > 0:
            outputP = 1
        else:
            outputP = 0

        # Calculate error
        error = output - outputP
        total_error += abs(error)

        # Update weights
        weights[0] += error * input1 * lr
        weights[1] += error * input2 * lr
        weights[2] += error * bias * lr

    # Print the error every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch+1} \t Error: {total_error}")

    # Check for convergence
    if total_error == 0:
        print("Converged!")
        break

print("Final weights:", weights)

# Define a function to test the model with a given input
def test_perceptron(input1, input2):
    # Forward pass: calculate weighted sum of inputs and bias
    outputP = input1 * weights[0] + input2 * weights[1] + bias * weights[2]

    # Activation function: Heaviside step function
    if outputP > 0:
        outputP = 1
    else:
        outputP = 0

    return outputP

# Test the model with various inputs
test_cases = [
    (0, 0, 0),
    (0, 1, 0),
    (1, 0, 0),
    (1, 1, 1)
]

for test_input1, test_input2, expected_output in test_cases:
    predicted_output = test_perceptron(test_input1, test_input2)
    print(f"Test input: [{test_input1}, {test_input2}]")
    print(f"Expected output: {expected_output}")
    print(f"Predicted output: {predicted_output}")
    if predicted_output == expected_output:
        print("The model correctly predicts the output.\n")
    else:
        print("The model does not correctly predict the output.\n")
```


```
Epoch 1 	 Error: 3
Converged!
Final weights: [0.414495426706317, 0.11419809765335864, -0.49757664145338365]
Test input: [0, 0]
Expected output: 0
Predicted output: 0
The model correctly predicts the output.

Test input: [0, 1]
Expected output: 0
Predicted output: 0
The model correctly predicts the output.

Test input: [1, 0]
Expected output: 0
Predicted output: 0
The model correctly predicts the output.

Test input: [1, 1]
Expected output: 1
Predicted output: 1
The model correctly predicts the output.
```


# Use Sigmoid

Using the sigmoid function as the activation function requires some adjustments to both the training and testing phases. The sigmoid function outputs values between 0 and 1, so we will need to use a threshold to decide whether the perceptron's output is closer to 0 or 1.
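
Two details are worth a quick check before reading the full program: thresholding at 0.5, and the derivative used in the weight update. The derivative of the sigmoid can be written in terms of its own output, s * (1 - s), which is why `sigmoid_derivative` in the code below is given the sigmoid's output rather than the raw weighted sum. The sketch compares that formula with a numerical derivative; the values in `z` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

z = np.array([-3.0, -0.5, 0.5, 3.0])   # hypothetical weighted sums
s = sigmoid(z)

# Threshold at 0.5 to turn sigmoid outputs into 0/1 predictions.
predictions = (s > 0.5).astype(int)
print(predictions)   # [0 0 1 1]

# Compare s * (1 - s) with a central-difference estimate of the derivative.
h = 1e-6
numerical = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = s * (1 - s)
print(np.allclose(numerical, analytic))   # True
```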

```python
# Use sigmoid function

import random
import numpy as np

# Define the sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Define the derivative of the sigmoid function.
# Note: x here is the sigmoid's output, not the raw weighted sum.
def sigmoid_derivative(x):
    return x * (1 - x)

# Define the inputs and outputs for the AND problem
inputs = np.array([[0, 0],
                   [0, 1],
                   [1, 0],
                   [1, 1]])

outputs = np.array([0, 0, 0, 1])

# Initialize parameters
lr = 0.1  # Learning rate
bias = 1
epochs = 10000

# Initialize weights randomly
weights = [random.random(), random.random(), random.random()]

# Training loop
for epoch in range(epochs):
    total_error = 0
    for i in range(len(inputs)):
        input1 = inputs[i][0]
        input2 = inputs[i][1]
        output = outputs[i]

        # Forward pass: calculate weighted sum of inputs and bias
        weighted_sum = input1 * weights[0] + input2 * weights[1] + bias * weights[2]

        # Apply sigmoid activation function
        outputP = sigmoid(weighted_sum)

        # Calculate error
        error = output - outputP
        total_error += abs(error)

        # Calculate delta (error * sigmoid_derivative)
        delta = error * sigmoid_derivative(outputP)

        # Update weights
        weights[0] += delta * input1 * lr
        weights[1] += delta * input2 * lr
        weights[2] += delta * bias * lr

    # Print the error every 1000 epochs
    if epoch % 1000 == 0:
        print(f"Epoch {epoch+1} \t Error: {total_error}")

    # Check for convergence
    if total_error < 1e-5:  # Convergence criteria
        print("Converged!")
        break

print("Final weights:", weights)

# Define a function to test the model with a given input
def test_perceptron(input1, input2):
    # Forward pass: calculate weighted sum of inputs and bias
    weighted_sum = input1 * weights[0] + input2 * weights[1] + bias * weights[2]

    # Apply sigmoid activation function
    outputP = sigmoid(weighted_sum)

    # Use a threshold to determine the final output (0 or 1)
    if outputP > 0.5:
        return 1
    else:
        return 0

# Test the model with various inputs
test_cases = [
    (0, 0, 0),
    (0, 1, 0),
    (1, 0, 0),
    (1, 1, 1)
]

for test_input1, test_input2, expected_output in test_cases:
    predicted_output = test_perceptron(test_input1, test_input2)
    print(f"Test input: [{test_input1}, {test_input2}]")
    print(f"Expected output: {expected_output}")
    print(f"Predicted output: {predicted_output}")
    if predicted_output == expected_output:
        print("The model correctly predicts the output.\n")
    else:
        print("The model does not correctly predict the output.\n")
```


```
Epoch 1 	 Error: 2.4556852197233354
Epoch 1001 	 Error: 0.6294930979876645
Epoch 2001 	 Error: 0.43547063092688776
Epoch 3001 	 Error: 0.34745748332905735
Epoch 4001 	 Error: 0.29564468027080426
Epoch 5001 	 Error: 0.2608602695241691
Epoch 6001 	 Error: 0.23557525167747478
Epoch 7001 	 Error: 0.21618945841715168
Epoch 8001 	 Error: 0.20074820148190664
Epoch 9001 	 Error: 0.1880910777666297
Final weights: [5.482220333554342, 5.481856657916964, -8.31506764646324]
Test input: [0, 0]
Expected output: 0
Predicted output: 0
The model correctly predicts the output.

Test input: [0, 1]
Expected output: 0
Predicted output: 0
The model correctly predicts the output.

Test input: [1, 0]
Expected output: 0
Predicted output: 0
The model correctly predicts the output.

Test input: [1, 1]
Expected output: 1
Predicted output: 1
The model correctly predicts the output.
```
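
Notice that this run never printed "Converged!": the total error was still about 0.19 at epoch 9001, far above the 1e-5 threshold. The sketch below plugs the final weights from the run above back into the sigmoid to show the raw outputs before thresholding. The weights are copied from the printed output; a fresh run would give slightly different numbers because the starting weights are random.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Final weights copied from the run above: [w1, w2, bias weight]
weights = [5.482220333554342, 5.481856657916964, -8.31506764646324]

probs = []
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    # The bias input is 1, so the bias weight is added as-is.
    p = sigmoid(x1 * weights[0] + x2 * weights[1] + weights[2])
    probs.append(p)
    print(f"[{x1}, {x2}] -> {p:.4f}")
```

The outputs for [0, 1] and [1, 0] are small but not 0, and the output for [1, 1] is high but not 1. A sigmoid only approaches 0 and 1, so the summed error keeps shrinking without reaching the threshold, even though every prediction is correct once the 0.5 cutoff is applied.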

