# Intro to Neural Networks
Building a simple neural network in Python is a great way to understand the fundamentals of deep learning. We'll use only the NumPy library for numerical operations, which keeps every step visible instead of hiding it behind a higher-level framework.
Here, we'll create a basic feedforward neural network with one hidden layer to solve a simple classification problem (e.g., the XOR problem).

## Conceptual Overview
1. **Input Layer:** Receives the data.
2. **Hidden Layer(s):** Performs computations and learns patterns.
3. **Output Layer:** Produces the final prediction.

## Key Components
- **Weights and Biases:** Parameters that the network learns during training. Weights determine the strength of the connection between neurons, and biases shift the activation function.
- **Activation Function:** Introduces non-linearity into the network, allowing it to learn complex relationships. Common ones include Sigmoid, ReLU, and Tanh. For this example, we'll use Sigmoid.
- **Forward Propagation:** The process of passing input data through the network to get an output.
- **Loss Function:** Measures how well the network's predictions match the actual values (e.g., Mean Squared Error).
- **Backpropagation:** The core algorithm for training neural networks. It calculates the gradients of the loss function with respect to the weights and biases, indicating how much to adjust them.
- **Gradient Descent:** An optimization algorithm that uses the gradients to iteratively update the weights and biases to minimize the loss.
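
To make the gradient descent idea concrete before it is applied to a network, here is a minimal sketch on a one-parameter problem. The function `(w - 3)^2`, the helper name `run_gradient_descent`, and the step counts are illustrative assumptions, not part of the network built below.

```python
def grad(w):
    # Gradient of the toy loss (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

def run_gradient_descent(lr, steps=100, w_start=0.0):
    # Hypothetical helper: repeatedly step against the gradient
    w = w_start
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w_small_lr = run_gradient_descent(lr=0.1)   # shrinks the distance to 3 by 0.8x per step
w_large_lr = run_gradient_descent(lr=1.1)   # grows the distance to 3 by 1.2x per step

print("lr=0.1 ->", w_small_lr)
print("lr=1.1 ->", w_large_lr)
```

With the small learning rate the parameter settles at the minimum `w = 3`; with the large one every step overshoots further, which is the failure mode a learning rate that is too high can cause.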


# Implementation in Python

In [1]:
import numpy as np

## 1. Define the Sigmoid Activation Function and its Derivative 
$$\sigma(x) = \frac{1}{1+e^{-x}}$$
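$$\frac{d}{dx}\sigma(x) = \sigma(x)\,(1-\sigma(x))$$

In the code below, `sigmoid_derivative` receives a value that has already been passed through the sigmoid, $a = \sigma(x)$, so it computes $a(1-a)$.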

In [8]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

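def sigmoid_derivative(x):
    # Expects x to already be a sigmoid output a = sigmoid(z),
    # so this returns sigma'(z) = a * (1 - a)
    return x * (1 - x)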
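
As a sanity check, the derivative can be compared against a central finite difference. This is a minimal sketch: the two functions are repeated so the snippet runs on its own, and the test points `z` and step size `eps` are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # a is assumed to be a sigmoid output, not a raw input
    return a * (1 - a)

z = np.linspace(-4.0, 4.0, 9)          # raw (pre-activation) inputs
a = sigmoid(z)                          # activations
analytic = sigmoid_derivative(a)        # derivative computed from activations

eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
max_diff = np.max(np.abs(analytic - numeric))

# Passing raw inputs instead of activations computes a different quantity
wrong = sigmoid_derivative(z)

print("max |analytic - numeric|:", max_diff)
print("matches when given raw inputs?", np.allclose(wrong, analytic))
```

The check only passes when the function is given activations, which is how the training loop below calls it.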

## 2. Prepare the Dataset (XOR Problem)

In [9]:
# Input dataset
X = np.array([[0,0],
              [0,1],
              [1,0],
              [1,1]])

# Output dataset
y = np.array([[0],
              [1],
              [1],
              [0]])
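
The sketch below confirms that `y` really is the XOR of the two input columns and illustrates why a hidden layer is needed. The least-squares linear fit is an illustrative baseline added here, not part of the network in this notebook.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Recompute the labels with NumPy's element-wise XOR
xor_labels = np.logical_xor(X[:, 0], X[:, 1]).astype(int).reshape(-1, 1)
labels_match = np.array_equal(xor_labels, y)

# Best linear model (with a bias column) in the least-squares sense
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
linear_pred = A @ coef

print("labels match XOR:", labels_match)
print("best linear predictions:\n", linear_pred)
```

The best linear fit predicts the same value for all four inputs, so it cannot separate the two classes. The non-linear hidden layer is what lets the network solve the problem.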

## 3. Set up Network Parameters

In [20]:
# Seed for reproducibility
np.random.seed(1)

# Number of neurons in the input layer
input_neurons = X.shape[1] # 2 features

# Number of neurons in the hidden layer
hidden_neurons = 50 # You can experiment with this number

# Number of neurons in the output layer
output_neurons = y.shape[1] # 1 output

# Initialize weights uniformly in [-1, 1) (mean 0); biases start at zero
# Weights for input to hidden layer
weights_input_hidden = 2 * np.random.random((input_neurons, hidden_neurons)) - 1
# Biases for hidden layer
bias_hidden = np.zeros((1, hidden_neurons))

# Weights for hidden to output layer
weights_hidden_output = 2 * np.random.random((hidden_neurons, output_neurons)) - 1
# Biases for output layer
bias_output = np.zeros((1, output_neurons))
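
It can help to confirm the array shapes before training. The following is a minimal sketch: `W1`, `b1`, `W2`, and `b2` are hypothetical stand-ins for the parameters above, drawn from a local random generator so the notebook's global seed is left untouched.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

rng = np.random.default_rng(0)
W1 = rng.uniform(-1, 1, size=(2, 50))   # input -> hidden
b1 = np.zeros((1, 50))
W2 = rng.uniform(-1, 1, size=(50, 1))   # hidden -> output
b2 = np.zeros((1, 1))

# One forward pass: each row of X is one example
hidden = sigmoid(X @ W1 + b1)           # (4, 2) @ (2, 50) -> (4, 50)
output = sigmoid(hidden @ W2 + b2)      # (4, 50) @ (50, 1) -> (4, 1)

n_params = W1.size + b1.size + W2.size + b2.size
print(hidden.shape, output.shape, n_params)
```

The `(1, 50)` and `(1, 1)` bias arrays are broadcast across the four rows, which is why they can be added directly to the matrix products.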

## 4. Training the Neural Network

In [21]:
epochs = 10000 # Number of iterations
learning_rate = 0.1 # How much to adjust weights each time

print("Training the Neural Network...")
for epoch in range(epochs):
    
    # --- Forward Propagation ---
    # Layer 1 (Hidden Layer)
    # Dot product of X and weights_input_hidden + bias
    layer1_input = np.dot(X, weights_input_hidden) + bias_hidden
    # Apply sigmoid activation
    layer1_output = sigmoid(layer1_input)

    # Layer 2 (Output Layer)
    # Dot product of layer1_output and weights_hidden_output + bias
    layer2_input = np.dot(layer1_output, weights_hidden_output) + bias_output
    # Apply sigmoid activation (our prediction)
    predicted_output = sigmoid(layer2_input)

    # --- Backpropagation ---
    # Calculate the error for the output layer
    # Error is the difference between actual (y) and predicted
    output_error = y - predicted_output
    # Apply the derivative of the sigmoid to the error
    output_delta = output_error * sigmoid_derivative(predicted_output)

    # Calculate the error for the hidden layer
    # How much did the hidden layer weights contribute to the output error?
    hidden_error = output_delta.dot(weights_hidden_output.T)
    # Apply the derivative of the sigmoid to the hidden layer error
    hidden_delta = hidden_error * sigmoid_derivative(layer1_output)

    # --- Update Weights and Biases ---
    # Adjust weights and biases based on the deltas and learning rate
    weights_hidden_output += layer1_output.T.dot(output_delta) * learning_rate
    bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate

    weights_input_hidden += X.T.dot(hidden_delta) * learning_rate
    bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate

    # Optional: Report the mean absolute error every 1000 epochs to monitor progress
    if epoch % 1000 == 0:
        loss = np.mean(np.abs(output_error))
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

print("\nTraining complete!")
print(f"Final Loss: {np.mean(np.abs(y - predicted_output)):.4f}")


Training the Neural Network...
Epoch 0, Loss: 0.5025
Epoch 1000, Loss: 0.3922
Epoch 2000, Loss: 0.1850
Epoch 3000, Loss: 0.1094
Epoch 4000, Loss: 0.0796
Epoch 5000, Loss: 0.0639
Epoch 6000, Loss: 0.0542
Epoch 7000, Loss: 0.0475
Epoch 8000, Loss: 0.0426
Epoch 9000, Loss: 0.0388

Training complete!
Final Loss: 0.0358
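
The value reported as "Loss" above is the mean absolute error, while the Key Components list names Mean Squared Error as an example loss. The sketch below computes both for comparison. The `predictions` array is an assumption: it holds the rounded raw predictions printed in section 5, not freshly computed values.

```python
import numpy as np

y = np.array([[0], [1], [1], [0]])
# Rounded raw predictions copied from the output in section 5 (illustrative)
predictions = np.array([[0.03], [0.96], [0.96], [0.04]])

mae = np.mean(np.abs(y - predictions))      # what the training loop reports
mse = np.mean((y - predictions) ** 2)       # Mean Squared Error

print(f"MAE: {mae:.4f}")
print(f"MSE: {mse:.6f}")
```

Because every error here is well below 1, squaring makes the MSE much smaller than the mean absolute error, so the two numbers should not be compared against each other directly.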


## 5. Make Predictions (Test the network)

In [22]:
print("\nPredictions after training:")

# You can use the same forward propagation logic to test with new data
def predict(X_new):
    layer1_input_test = np.dot(X_new, weights_input_hidden) + bias_hidden
    layer1_output_test = sigmoid(layer1_input_test)

    layer2_input_test = np.dot(layer1_output_test, weights_hidden_output) + bias_output
    predicted_output_test = sigmoid(layer2_input_test)
    return predicted_output_test

# Test with our training data to see how well it learned
predictions = predict(X)
print("Input:\n", X)
print("Actual Output:\n", y)
print("Predicted Output (Raw):\n", predictions)
print("Predicted Output (Rounded):\n", np.round(predictions))

# You can also test with a single new input
new_input = np.array([[0, 1]])
new_prediction = predict(new_input)
print(f"\nPrediction for input {new_input}: {new_prediction[0][0]:.4f} (Rounded: {np.round(new_prediction)[0][0]})")



Predictions after training:
Input:
 [[0 0]
 [0 1]
 [1 0]
 [1 1]]
Actual Output:
 [[0]
 [1]
 [1]
 [0]]
Predicted Output (Raw):
 [[0.03]
 [0.96]
 [0.96]
 [0.04]]
Predicted Output (Rounded):
 [[0.]
 [1.]
 [1.]
 [0.]]

Prediction for input [[0 1]]: 0.9645 (Rounded: 1.0)
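
For classification it is often clearer to apply an explicit threshold than to rely on `np.round`, which rounds exact ties such as 0.5 to the nearest even number. This is a minimal sketch under two assumptions: the 0.5 cutoff is a conventional choice, and `raw` holds the rounded predictions printed above.

```python
import numpy as np

y = np.array([[0], [1], [1], [0]])
raw = np.array([[0.03], [0.96], [0.96], [0.04]])   # copied from the output above

labels = (raw >= 0.5).astype(int)      # explicit decision threshold
accuracy = np.mean(labels == y)        # fraction of correct predictions

tie_rounded = np.round(0.5)            # NumPy rounds half to even
tie_thresholded = int(0.5 >= 0.5)

print("labels:", labels.ravel())
print("accuracy:", accuracy)
print("np.round(0.5):", tie_rounded, "threshold:", tie_thresholded)
```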


# Explanation of the Code
1. **sigmoid(x) and sigmoid_derivative(x):**
- The sigmoid function maps any real input to a value strictly between 0 and 1. This is useful for outputs interpreted as probabilities and wherever a smooth, differentiable activation is needed.
- Its derivative is crucial for backpropagation: it gives the slope of the activation function at a given point, which scales how much each weight is adjusted. Note that sigmoid_derivative expects a value that has already passed through sigmoid (an activation), not the raw input.
  
2. **Dataset (X and y):**
- X: Our input data, representing the four possible combinations for the XOR problem ([0,0], [0,1], [1,0], [1,1]).
- y: The corresponding desired output for each input (0 for [0,0], 1 for [0,1], 1 for [1,0], 0 for [1,1]).
  
3. **Network Parameters:**
- np.random.seed(1): Ensures that the random weight initialization is the same every time you run the code, making results reproducible.
- input_neurons, hidden_neurons, output_neurons: Define the architecture of our network.
- weights_input_hidden, bias_hidden, weights_hidden_output, bias_output: These are the learnable parameters. The weights are initialized randomly to break symmetry, so that hidden neurons receive different gradients and can learn different features; the biases can safely start at zero. The expression 2 * np.random.random(...) - 1 draws weights uniformly from [-1, 1).
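
The symmetry problem can be seen directly by giving every weight the same value. The sketch below is illustrative: it uses a small hidden layer (`n_hidden = 3`) and a constant of 0.5, neither of which is used in the network above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

n_hidden = 3
W1 = np.full((2, n_hidden), 0.5)       # every hidden neuron starts identical
b1 = np.zeros((1, n_hidden))
W2 = np.full((n_hidden, 1), 0.5)
b2 = np.zeros((1, 1))

# Forward pass and deltas, following the same formulas as the training loop
a1 = sigmoid(X @ W1 + b1)
out = sigmoid(a1 @ W2 + b2)
output_delta = (y - out) * out * (1 - out)
hidden_delta = (output_delta @ W2.T) * a1 * (1 - a1)

update_W1 = X.T @ hidden_delta         # one column per hidden neuron
columns_identical = np.allclose(update_W1, update_W1[:, [0]])

print(update_W1)
print("all hidden neurons get the same update:", columns_identical)
```

Since every hidden neuron receives the same update, the neurons stay identical after each step and the layer behaves like a single neuron. Random initial weights avoid this.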

4. **Training Loop (for epoch in range(epochs)):**
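- **epochs:** The number of times the entire dataset is passed forward and backward through the neural network. Here each epoch is a single update computed from all four examples. More epochs usually lower the training loss; on larger datasets with held-out data, training for too long can lead to overfitting.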
- **learning_rate:** Controls the step size during weight updates. A smaller learning rate means slower but potentially more stable learning. A larger learning rate can speed up training but might overshoot the optimal solution.
- **Forward Propagation:**
  - layer1_input: The weighted sum of inputs plus the bias for the hidden layer. With one example per row of X, this is Z = XW + b.
  - layer1_output: The result of applying the sigmoid activation function to layer1_input. This is A = sigmoid(Z).
  - The same process is repeated for the output layer.
- **Backpropagation:**
  - **output_error:** The difference between what the network should have predicted (y) and what it actually predicted (predicted_output). This is our primary error signal.
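  - **output_delta:** The error scaled by the derivative of the output layer's activation function. If the loss is taken to be half the sum of squared errors, this is the negative gradient of that loss with respect to layer2_input, and it drives the updates to the output layer's weights and bias.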
  - **hidden_error:** To calculate the error for the hidden layer, we backpropagate the output_delta through the weights_hidden_output. This tells us how much each hidden neuron contributed to the output error.
  - **hidden_delta:** Similar to output_delta, but for the hidden layer.
- **Update Weights and Biases:**
  - weights_hidden_output += ...: The update adds learning_rate times the dot product of the previous layer's output (the input to these weights) and the current layer's delta. Adding is correct here because the delta is built from y - predicted_output, so it already points in the direction that reduces the error.
  - bias_output += ...: Biases are updated by summing the delta values over all examples and multiplying by the learning rate, which shifts the neuron's pre-activation up or down.
  - The same update logic applies to weights_input_hidden and bias_hidden.
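
One way to check the backpropagation formulas is to compare them with a numerical gradient. The sketch below assumes the loss being minimized is half the sum of squared errors, which is the loss whose gradient matches the deltas above. The helper `half_squared_error`, the small 3-neuron hidden layer, and the local random generator are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def half_squared_error(W1, b1, W2, b2, X, y):
    # Hypothetical helper: forward pass followed by 0.5 * sum of squared errors
    a1 = sigmoid(X @ W1 + b1)
    out = sigmoid(a1 @ W2 + b2)
    return 0.5 * np.sum((y - out) ** 2)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

rng = np.random.default_rng(0)
W1 = rng.uniform(-1, 1, size=(2, 3))
b1 = np.zeros((1, 3))
W2 = rng.uniform(-1, 1, size=(3, 1))
b2 = np.zeros((1, 1))

# Backpropagation, using the same formulas as the training loop
a1 = sigmoid(X @ W1 + b1)
out = sigmoid(a1 @ W2 + b2)
output_delta = (y - out) * out * (1 - out)
hidden_delta = (output_delta @ W2.T) * a1 * (1 - a1)
backprop_step_W1 = X.T @ hidden_delta   # the quantity added to the weights

# Central finite-difference gradient of the loss with respect to W1
eps = 1e-6
numeric_grad_W1 = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        W_plus = W1.copy()
        W_plus[i, j] += eps
        W_minus = W1.copy()
        W_minus[i, j] -= eps
        numeric_grad_W1[i, j] = (
            half_squared_error(W_plus, b1, W2, b2, X, y)
            - half_squared_error(W_minus, b1, W2, b2, X, y)
        ) / (2 * eps)

# The step added in the loop should equal the negative gradient
max_diff = np.max(np.abs(backprop_step_W1 + numeric_grad_W1))
print("max |step + gradient|:", max_diff)
```

A difference near zero indicates that adding `X.T.dot(hidden_delta) * learning_rate` moves the weights against the gradient of this loss, which is a gradient descent step.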

5. **Making Predictions:**
- After training, the predict function essentially performs forward propagation using the learned weights and biases on new input data.
