<a href="https://colab.research.google.com/github/xhxuciedu/CS284A/blob/master/simple_neural_net_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## A basic neural network for binary classification, implemented from scratch in Python

Let's build a simple neural network for binary classification using the sigmoid activation function and binary cross-entropy loss.

1. **Binary Cross-Entropy Loss**: The standard loss for binary classification problems. For a true label $y \in \{0, 1\}$ and a predicted probability $\hat{y} \in (0, 1)$, the formula is:

$L(y, \hat{y}) = -\left( y \log(\hat{y}) + (1-y) \log(1-\hat{y}) \right)$




2. **Sigmoid Activation**: Given an input $x$, the sigmoid function returns a value strictly between 0 and 1:

$\sigma(x) = \frac{1}{1 + e^{-x}}$


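3. **Gradient Descent**: We'll use gradient descent to minimize the loss. Each step moves every weight and bias $\theta$ a small distance against the gradient of the loss, scaled by a learning rate $\eta$:

$\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}$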
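To see the three ingredients working together before the full network, here is a minimal sketch that applies gradient descent to a single sigmoid unit (plain logistic regression). The toy dataset (logical OR), the variable names, the learning rate `eta`, and the step count are illustrative assumptions, not part of the network built below.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def bce(y, y_hat, eps=1e-15):
    # Mean binary cross-entropy, clipped to avoid log(0)
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Assumed toy data: logical OR, which a single unit can separate
X_toy = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_toy = np.array([[0.], [1.], [1.], [1.]])

w = np.zeros((2, 1))  # weights
b = 0.0               # bias
eta = 0.5             # learning rate (illustrative value)

loss_before = bce(y_toy, sigmoid(X_toy @ w + b))
for _ in range(2000):
    y_hat = sigmoid(X_toy @ w + b)
    # For sigmoid + mean BCE, dL/dz simplifies to (y_hat - y) / n
    dz = (y_hat - y_toy) / len(X_toy)
    w -= eta * (X_toy.T @ dz)  # theta <- theta - eta * dL/dtheta
    b -= eta * dz.sum()
loss_after = bce(y_toy, sigmoid(X_toy @ w + b))

toy_pred = (sigmoid(X_toy @ w + b) > 0.5).astype(int)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
print("predictions:", toy_pred.flatten())
```

A single unit is enough here only because OR is linearly separable; the hidden layer becomes necessary for problems that are not.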

Here's the neural network implemented in Python:

In [3]:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

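def sigmoid_derivative(x):
    # Expects x to be a sigmoid *output* s = sigmoid(z), not the pre-activation z:
    # d(sigmoid)/dz = s * (1 - s).
    return x * (1 - x)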

def binary_cross_entropy_loss(y_true, y_pred):
    epsilon = 1e-15  # To prevent log(0)
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
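A few hand-checkable values help confirm what these helpers compute, in particular that `sigmoid_derivative` must be given the sigmoid's output rather than its input. The sketch below redefines the helpers so that it runs on its own; the test point `z` and the finite-difference step `h` are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)

def binary_cross_entropy_loss(y_true, y_pred):
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# At z = 0 the sigmoid is 0.5 and its slope is 0.5 * (1 - 0.5) = 0.25
s0 = sigmoid(0.0)
slope0 = sigmoid_derivative(s0)
print(s0, slope0)

# Central finite difference at an arbitrary point z (assumed values)
z, h = 0.7, 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid_derivative(sigmoid(z))  # pass the activation, not z
print("derivative matches:", abs(numeric - analytic) < 1e-8)

# Predicting 0.5 for every sample gives a loss of ln(2), whatever the labels
loss_half = binary_cross_entropy_loss(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
print("uninformed loss equals ln 2:", np.isclose(loss_half, np.log(2)))
```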


In [4]:
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Weights start as standard-normal samples; biases start at zero.
        self.weights_input_hidden = np.random.randn(input_size, hidden_size)
        self.weights_hidden_output = np.random.randn(hidden_size, output_size)
        self.bias_hidden = np.zeros((1, hidden_size))
        self.bias_output = np.zeros((1, output_size))

    def feedforward(self, X):
        # Both activations are cached on self because backpropagation reads them.
        self.hidden = sigmoid(np.dot(X, self.weights_input_hidden) + self.bias_hidden)
        self.output = sigmoid(np.dot(self.hidden, self.weights_hidden_output) + self.bias_output)
        return self.output

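    def backpropagation(self, X, y, learning_rate):
        # Must be called after feedforward(X), which caches self.hidden and self.output.
        # Gradient of the loss w.r.t. the output activation; 1e-15 guards against division by zero.
        d_loss_d_output = -(y / (self.output + 1e-15) - (1 - y) / (1 - self.output + 1e-15))
        # sigmoid_derivative takes activations (sigmoid outputs), not pre-activations.
        d_output_d_z = sigmoid_derivative(self.output)
        # Error signal passed back to the hidden layer. As written, this propagates
        # d_loss_d_output directly; the exact chain rule would propagate
        # d_loss_d_output * d_output_d_z (which simplifies to self.output - y).
        hidden_layer_error = d_loss_d_output.dot(self.weights_hidden_output.T)
        d_hidden_d_z = sigmoid_derivative(self.hidden)

        # Gradient descent step. Gradients are summed (not averaged) over the batch.
        self.weights_hidden_output -= learning_rate * self.hidden.T.dot(d_loss_d_output * d_output_d_z)
        self.bias_output -= learning_rate * np.sum(d_loss_d_output * d_output_d_z, axis=0)
        self.weights_input_hidden -= learning_rate * X.T.dot(hidden_layer_error * d_hidden_d_z)
        self.bias_hidden -= learning_rate * np.sum(hidden_layer_error * d_hidden_d_z, axis=0)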

    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            y_pred = self.feedforward(X)
            self.backpropagation(X, y, learning_rate)

            if epoch % 1000 == 0:
                loss = binary_cross_entropy_loss(y, y_pred)
                print(f"Epoch {epoch}, Loss: {loss:.4f}")
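One common way to validate a backpropagation routine is to compare analytic gradients against finite-difference estimates. The sketch below is a standalone check under stated assumptions, not a replacement for the class above: it takes the summed binary cross-entropy as the objective, carries the chain rule through the output sigmoid (so the output-layer error simplifies to `output - y`), and uses hypothetical helper names (`forward`, `summed_bce`, `analytic_grads`, `numeric_grads`), an arbitrary seed, and an arbitrary step size `h`.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(params, X):
    W1, b1, W2, b2 = params
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    return hidden, output

def summed_bce(params, X, y):
    # Summed (not averaged) binary cross-entropy over the batch
    _, output = forward(params, X)
    return -np.sum(y * np.log(output) + (1 - y) * np.log(1 - output))

def analytic_grads(params, X, y):
    W1, b1, W2, b2 = params
    hidden, output = forward(params, X)
    delta_out = output - y  # dL/dz at the output layer (sigmoid + BCE)
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)
    return [X.T @ delta_hidden,
            delta_hidden.sum(axis=0, keepdims=True),
            hidden.T @ delta_out,
            delta_out.sum(axis=0, keepdims=True)]

def numeric_grads(params, X, y, h=1e-6):
    # Central differences, perturbing one parameter entry at a time
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + h
            plus = summed_bce(params, X, y)
            p[idx] = old - h
            minus = summed_bce(params, X, y)
            p[idx] = old  # restore the original value
            g[idx] = (plus - minus) / (2 * h)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)  # arbitrary seed
X_check = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_check = np.array([[0.], [1.], [1.], [0.]])
params = [rng.standard_normal((2, 5)), np.zeros((1, 5)),
          rng.standard_normal((5, 1)), np.zeros((1, 1))]

max_diff = max(
    np.max(np.abs(a - n))
    for a, n in zip(analytic_grads(params, X_check, y_check),
                    numeric_grads(params, X_check, y_check))
)
print(f"max |analytic - numeric| = {max_diff:.2e}")
```

A difference far below the size of the gradients themselves suggests that the analytic expressions match the objective; a large difference usually points to a missing chain-rule factor or a transposed matrix.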

In [18]:
# Example data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Create and train the neural network
np.random.seed(42)
nn = NeuralNetwork(input_size=2, hidden_size=5, output_size=1)
nn.train(X, y, epochs=10000, learning_rate=0.1)

Epoch 0, Loss: 1.3156
Epoch 1000, Loss: 0.0153
Epoch 2000, Loss: 0.0057
Epoch 3000, Loss: 0.0035
Epoch 4000, Loss: 0.0026
Epoch 5000, Loss: 0.0021
Epoch 6000, Loss: 0.0017
Epoch 7000, Loss: 0.0015
Epoch 8000, Loss: 0.0013
Epoch 9000, Loss: 0.0011


In [19]:
y

array([[0],
       [1],
       [1],
       [0]])

In [20]:
# Threshold the predicted probabilities at 0.5 to get class labels
nn.feedforward(X).flatten() > 0.5

array([False,  True,  True, False])

## Some Comments
*  This script creates a simple neural network that learns the XOR problem, in which the output is 1 only when the two inputs differ. XOR is not linearly separable, which is why the network needs a hidden layer.

*  Note that this example omits many aspects of a robust neural network, such as regularization, principled weight initialization (the weights here are plain standard-normal samples), advanced optimization techniques, and mini-batch processing.

*  The neural network defined here uses full-batch gradient descent, which means it updates weights using the gradients calculated on the entire dataset. For large datasets, mini-batch gradient descent or stochastic gradient descent is typically used.
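The mini-batch variant mentioned in the last comment differs from full-batch training only in which rows each update sees. Below is a minimal sketch of that batching step, assuming a hypothetical helper `iterate_minibatches`, made-up demo arrays, and an illustrative `batch_size`; each yielded pair could be fed to `feedforward` followed by `backpropagation` in place of the full dataset.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that cover the dataset once."""
    indices = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield X[batch_idx], y[batch_idx]

rng = np.random.default_rng(0)  # arbitrary seed
# Made-up demo data: 10 samples with 2 features each
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = (np.arange(10) % 2).reshape(10, 1)

batches = list(iterate_minibatches(X_demo, y_demo, batch_size=4, rng=rng))
for X_batch, y_batch in batches:
    print(X_batch.shape, y_batch.shape)
```

With `batch_size=1` this reduces to stochastic gradient descent, and with `batch_size=len(X)` it recovers the full-batch behavior used in this notebook.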