Why Do We Need Hidden Layers?

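In the XOR problem, a simple linear model (without hidden layers) cannot solve the problem, because XOR is not linearly separable: no single straight line in the (X1, X2) plane separates the inputs that should output 1 ([0, 1] and [1, 0]) from those that should output 0 ([0, 0] and [1, 1]).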
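A quick way to see this: a single linear threshold unit predicts 1 when w1*x1 + w2*x2 + b > 0. The plain-Python sketch below tries every combination of weights and bias on an arbitrarily chosen grid and reports the best number of XOR inputs classified correctly. A grid search is only an illustration, not a proof, but it agrees with the geometric argument: the best any line can do is 3 out of 4.

```python
from itertools import product

# XOR truth table: ((x1, x2), label)
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def accuracy(w1, w2, b):
    """Count the XOR points a linear threshold unit classifies correctly."""
    correct = 0
    for (x1, x2), label in XOR:
        pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        correct += pred == label
    return correct

# Candidate values -2.0, -1.5, ..., 2.0 (an arbitrary illustrative grid)
grid = [i * 0.5 for i in range(-4, 5)]
best = max(accuracy(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print(f"best linear unit: {best}/4 correct")  # best linear unit: 3/4 correct
```

The unit with w1 = w2 = 1 and b = -0.5 (logical OR) is one of the lines that reaches 3/4; it misclassifies [1, 1].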

However, when we introduce hidden layers, we give the model more capacity to learn complex, non-linear patterns. 

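Multi-layer Perceptrons (MLPs) can learn non-linear decision boundaries because their hidden layers apply a non-linear activation function. Without that activation, stacking linear layers would still produce a single linear function of the input, which fails on XOR for the same reason as the model without hidden layers.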
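As a hand-worked illustration (the weights below are picked by hand, not learned by the notebook's model, and the name `tiny_mlp` is hypothetical), two ReLU hidden units are enough: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), and output = h1 - 2 * h2. Swapping ReLU for the identity function turns the same weights into the linear map 2 - (x1 + x2), which gets [0, 0] wrong.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def tiny_mlp(x1, x2, activation=relu):
    # Hidden layer: two units with hand-picked weights (1, 1) and biases 0 and -1
    h1 = activation(x1 + x2)
    h2 = activation(x1 + x2 - 1.0)
    # Output layer: linear combination with weights (1, -2) and no bias
    return h1 - 2.0 * h2

inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
print([tiny_mlp(a, b) for a, b in inputs])            # [0.0, 1.0, 1.0, 0.0]
print([tiny_mlp(a, b, identity) for a, b in inputs])  # [2.0, 1.0, 1.0, 0.0]
```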

Structure of the Model:

Input Layer: Takes 2 features (X1 and X2) from the dataset.

Hidden Layer: A layer that applies a non-linear transformation to the input. The model here uses ReLU; Sigmoid is another common choice.

Output Layer: Produces a single value between 0 and 1 (through a sigmoid), which can be thresholded at 0.5 to give the binary XOR output (0 or 1).

Steps:

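Define a Neural Network: A model with one hidden layer.

Use Backpropagation: Compute the gradient of the loss with respect to every weight, then update the weights with gradient descent (SGD).

Train the Model: Minimize the Mean Squared Error (MSE) loss on the XOR dataset.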
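Backpropagation is the chain rule applied layer by layer. The plain-Python sketch below derives the MSE gradients by hand for a 2 → 4 → 1 network with ReLU hidden units and a sigmoid output (the same shape as the model in this notebook) and checks them against central finite differences. The random weights, the seed, the step size `eps`, and all helper names are assumptions made for illustration; in PyTorch, `loss.backward()` computes these gradients automatically.

```python
import math
import random

def unpack(theta):
    """Split a flat list of 17 numbers into (W1, b1, w2, b2)."""
    W1 = [theta[0:2], theta[2:4], theta[4:6], theta[6:8]]
    return W1, theta[8:12], theta[12:16], theta[16]

def forward(params, X):
    W1, b1, w2, b2 = params
    preds, cache = [], []
    for x in X:
        z1 = [W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j] for j in range(4)]
        h = [max(0.0, z) for z in z1]                    # ReLU hidden layer
        z2 = sum(w2[j] * h[j] for j in range(4)) + b2
        p = 1.0 / (1.0 + math.exp(-z2))                  # sigmoid output
        preds.append(p)
        cache.append((x, z1, h, p))
    return preds, cache

def mse(preds, y):
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def backward(params, X, y):
    """Chain rule from the loss back to every weight and bias."""
    _, _, w2, _ = params
    _, cache = forward(params, X)
    n = len(X)
    gW1 = [[0.0, 0.0] for _ in range(4)]
    gb1, gw2, gb2 = [0.0] * 4, [0.0] * 4, 0.0
    for (x, z1, h, p), t in zip(cache, y):
        dz2 = 2.0 * (p - t) / n * p * (1.0 - p)  # dL/dp * sigmoid'(z2)
        gb2 += dz2
        for j in range(4):
            gw2[j] += dz2 * h[j]
            dz1 = dz2 * w2[j] * (1.0 if z1[j] > 0 else 0.0)  # ReLU'(z1)
            gb1[j] += dz1
            for i in range(2):
                gW1[j][i] += dz1 * x[i]
    return [g for row in gW1 for g in row] + gb1 + gw2 + [gb2]

X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [0.0, 1.0, 1.0, 0.0]
random.seed(0)
theta = [random.uniform(-1.0, 1.0) for _ in range(17)]

analytic = backward(unpack(theta), X, y)
eps = 1e-6
numeric = []
for k in range(len(theta)):
    plus, minus = theta.copy(), theta.copy()
    plus[k] += eps
    minus[k] -= eps
    diff = mse(forward(unpack(plus), X)[0], y) - mse(forward(unpack(minus), X)[0], y)
    numeric.append(diff / (2 * eps))

max_diff = max(abs(a - b) for a, b in zip(analytic, numeric))
print(f"max |analytic - numeric| = {max_diff:.2e}")
```

A tiny `max_diff` means the hand-derived gradients match the numerical slope of the loss, which is exactly what backpropagation is supposed to deliver.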

In [42]:
import torch

import torch.nn as nn

import torch.optim as optim

In [43]:
# Step 1: Create XOR Dataset

X = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], dtype=torch.float32)

y = torch.tensor([[0.0], [1.0], [1.0], [0.0]], dtype=torch.float32)

In [44]:
# Step 2: Define a simple Neural Network with 1 hidden layer

class XORNetwork(nn.Module):

    def __init__(self):
        super().__init__()

        # 2 input features -> 4 neurons in hidden layer -> 1 output

        self.fc1 = nn.Linear(2, 4)  # Input layer (2 features) to hidden layer (4 neurons)

        self.fc2 = nn.Linear(4, 1)  # Hidden layer (4 neurons) to output layer (1 neuron)

        self.sigmoid = nn.Sigmoid() # Activation function for output layer

    def forward(self, x):

        x = torch.relu(self.fc1(x)) # ReLU activation for hidden layer

        x = self.sigmoid(self.fc2(x)) # Sigmoid activation for output

        return x
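Each `nn.Linear(in_features, out_features)` layer holds an `out_features × in_features` weight matrix plus `out_features` biases, so this 2 → 4 → 1 network has 2·4 + 4 = 12 parameters in `fc1` and 4·1 + 1 = 5 in `fc2`, 17 in total. A small framework-free helper (the name `count_params` is hypothetical) makes the arithmetic explicit; in PyTorch, `sum(p.numel() for p in model.parameters())` should report the same count.

```python
def count_params(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total

print(count_params([2, 4, 1]))  # 17
```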

In [45]:
# Step 3: Instantiate the model

model = XORNetwork()

In [46]:
# Step 4: Define a loss function and an optimizer

loss_fn = nn.MSELoss() # Mean Squared Error loss

optimizer = optim.SGD(model.parameters(), lr=0.1)
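For reference, `nn.MSELoss()` with its default `reduction='mean'` averages the squared errors, and plain `optim.SGD` (no momentum or weight decay, as configured here) moves each parameter against its gradient: p ← p - lr · ∂L/∂p. A plain-Python sketch of both rules, with hypothetical helper names:

```python
def mse_loss(preds, targets):
    # Mean of squared errors, matching MSELoss's default 'mean' reduction
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

def sgd_step(params, grads, lr=0.1):
    # Vanilla SGD: move each parameter a step of size lr against its gradient
    return [p - lr * g for p, g in zip(params, grads)]

# A model that outputs 0.5 for every XOR input scores an MSE of 0.25
print(mse_loss([0.5, 0.5, 0.5, 0.5], [0.0, 1.0, 1.0, 0.0]))  # 0.25
print(sgd_step([1.0, -2.0], [0.5, -1.0]))
```

The 0.25 baseline is worth keeping in mind: a constant prediction of 0.5 is also the least-squares optimum for a purely linear model on XOR, so a trained network should end up well below it.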

In [47]:
# Step 5: Train the model using backpropagation

epochs = 10000 # Number of training iterations

for epoch in range(epochs):

    # Forward pass: Compute predicted y by passing X to the model

    predictions = model(X)

    # Compute the loss (error)

    loss = loss_fn(predictions, y)

    # Zero the gradients before backward pass

    optimizer.zero_grad()

    # Backward pass: Compute gradients

    loss.backward()

    # Update weights and biases using gradient descent

    optimizer.step()

    # Print loss every 1000 epochs to monitor progress

    if epoch % 1000 == 0:
        print(f'Epoch {epoch+1}/{epochs}, Loss: {loss.item()}')

Epoch 1/10000, Loss: 0.2650403082370758
Epoch 1001/10000, Loss: 0.18173818290233612
Epoch 2001/10000, Loss: 0.16746756434440613
Epoch 3001/10000, Loss: 0.1669771671295166
Epoch 4001/10000, Loss: 0.16685958206653595
Epoch 5001/10000, Loss: 0.16679951548576355
Epoch 6001/10000, Loss: 0.16676753759384155
Epoch 7001/10000, Loss: 0.166743665933609
Epoch 8001/10000, Loss: 0.16673122346401215
Epoch 9001/10000, Loss: 0.16672486066818237


In [48]:
# Inspect learned weights and bias for each layer

learned_weights_fc1 = model.fc1.weight.data
learned_bias_fc1 = model.fc1.bias.data

learned_weights_fc2 = model.fc2.weight.data
learned_bias_fc2 = model.fc2.bias.data

print(f"Learned Weights (Input → Hidden Layer):\n{learned_weights_fc1}")
print(f"Learned Bias (Hidden Layer):\n{learned_bias_fc1}")

print(f"Learned Weights (Hidden → Output Layer):\n{learned_weights_fc2}")
print(f"Learned Bias (Output Layer):\n{learned_bias_fc2}")


Learned Weights (Input → Hidden Layer):
tensor([[ 0.2998,  0.2307],
        [-0.6119, -0.6878],
        [-0.5591, -0.4051],
        [-1.9148,  1.9110]])
Learned Bias (Hidden Layer):
tensor([-0.5309, -0.3873, -0.2462, -0.0032])
Learned Weights (Hidden → Output Layer):
tensor([[0.0307, 0.3604, 0.0539, 2.6001]])
Learned Bias (Output Layer):
tensor([-0.6918])
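Feeding the printed weights back through the forward pass by hand shows what this particular run learned. The sketch below copies the four-decimal values from the output above, so it should reproduce the final predictions to roughly four decimal places. Only the fourth hidden unit, roughly relu(1.911·x2 - 1.915·x1), is ever active, and only for [0, 1]; the first unit's pre-activation at [1, 1] is about -0.0004, just below zero. As a result, [0, 0], [1, 0] and [1, 1] all collapse to the same output, sigmoid(-0.6918) ≈ 0.3336.

```python
import math

# Values copied from the printed output above (rounded to 4 decimals)
W1 = [[0.2998, 0.2307],
      [-0.6119, -0.6878],
      [-0.5591, -0.4051],
      [-1.9148, 1.9110]]
b1 = [-0.5309, -0.3873, -0.2462, -0.0032]
w2 = [0.0307, 0.3604, 0.0539, 2.6001]
b2 = -0.6918

def hidden(x):
    # ReLU(W1 x + b1), one entry per hidden unit
    return [max(0.0, row[0] * x[0] + row[1] * x[1] + b) for row, b in zip(W1, b1)]

def predict(x):
    z = sum(w * h for w, h in zip(w2, hidden(x))) + b2
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

for x in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print(x, [round(h, 4) for h in hidden(x)], round(predict(x), 4))
```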


In [49]:
# Step 6: Final predictions after training

with torch.no_grad(): # Disable gradient computation for inference

    final_predictions = model(X)

    print('Final Predictions\n')

    print(final_predictions)

Final Predictions

tensor([[0.3336],
        [0.9862],
        [0.3336],
        [0.3336]])
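Thresholding the sigmoid outputs at 0.5 (a common convention, assumed here; in PyTorch, `(final_predictions >= 0.5).float()`) turns them into class labels. Applied to the printed values, it shows that this run classifies only three of the four XOR inputs correctly: [1, 0] gets 0.3336 and is labelled 0. The mean squared error of these four values is about 0.1667, matching the plateau in the training log, so the run appears to have settled into a poor local minimum rather than solving XOR. Re-running with a different random initialization (for example, another `torch.manual_seed`) or a wider hidden layer may help, though results vary from run to run.

```python
# Final predictions copied from the output above (rounded to 4 decimals)
preds = [0.3336, 0.9862, 0.3336, 0.3336]
targets = [0.0, 1.0, 1.0, 0.0]

labels = [1.0 if p >= 0.5 else 0.0 for p in preds]  # threshold at 0.5
correct = sum(l == t for l, t in zip(labels, targets))
mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

print(f"predicted labels: {labels}")         # [0.0, 1.0, 0.0, 0.0]
print(f"accuracy: {correct}/{len(targets)}")  # 3/4
print(f"MSE: {mse:.4f}")                      # 0.1667
```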
