# Learning XOR with PyTorch

This notebook provides a neural network that learns the XOR logic operation:

|   x1  |   x2  |  y  |
|:-----:|:-----:|:---:|
|   1   |   1   |  0  |
|   1   |   0   |  1  |
|   0   |   1   |  1  |
|   0   |   0   |  0  |

This example provides insight into a simple neural network and, at the same time, a gentle introduction to [PyTorch](https://pytorch.org).
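
As a quick reference, the truth table above can be reproduced in plain Python with the bitwise XOR operator `^`. This is only an illustrative sketch; the name `xor_table` is not used elsewhere in the notebook.

```python
# Enumerate the XOR truth table in the same row order as the table above.
xor_table = [(x1, x2, x1 ^ x2) for x1 in (1, 0) for x2 in (1, 0)]

for x1, x2, y in xor_table:
    print(f"x1={x1}  x2={x2}  y={y}")
```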

## Importing the libraries

In [None]:
# Import libraries
import torch
import torch.nn as nn

## Device

PyTorch can run on either the CPU or a GPU, depending on what is available:

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using {} device'.format(device))
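
Once a device has been selected, tensors can be created on it directly or moved there with `.to()`. The sketch below illustrates both options; the names `a`, `b` and `c` are only for illustration.

```python
import torch

# Same device selection as in the cell above.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Create a tensor directly on the selected device...
a = torch.zeros(2, 2, device=device)
# ...or move an existing CPU tensor there with .to().
b = torch.ones(2, 2).to(device)

# Both operands live on the same device, so the addition is valid.
c = a + b
print(c.device.type)
```

Operations that mix tensors from different devices raise an error, which is why the model and its inputs must live on the same device.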

## Dataset

The training dataset consists of the network inputs and the corresponding target outputs. Here, the inputs are stored in a [tensor](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html) of shape $4 \times 2$ (four examples with two inputs each) and the outputs (labels) in a tensor of shape $4 \times 1$.

Tensors are a specialized data structure, very similar to arrays and matrices. In PyTorch, tensors encode the inputs and outputs of a model, as well as the model’s parameters. We also use a [`Dataset`](https://pytorch.org/docs/1.10.0/data.html#iterable-style-datasets) together with a `DataLoader` to iterate over the training set.
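
As a minimal sketch of the shapes described above, the two tensors can be built and inspected on their own. The variable names here are illustrative.

```python
import torch

# Inputs: one row per example, one column per input variable (4 x 2).
Xs = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
# Labels: one row per example, a single output column (4 x 1).
y = torch.tensor([[0.], [1.], [1.], [0.]])

print(Xs.shape, Xs.dtype)
print(y.shape)
```

Writing the values as floats (`0.` rather than `0`) makes PyTorch infer a floating-point dtype, which is what the linear layers and the loss function expect.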

In [None]:
from torch.utils.data import Dataset

class XorDataset(Dataset):
    def __init__(self):
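        # Inputs: all four combinations of (x1, x2), shape 4 x 2.
        self.Xs = torch.tensor([[0., 0.],
                                [0., 1.],
                                [1., 0.],
                                [1., 1.]])
        # Labels: XOR of each input pair, shape 4 x 1 to match the network output.
        self.y = torch.tensor([[0.], [1.], [1.], [0.]])

    def __len__(self):
        return len(self.Xs)

    def __getitem__(self, idx):
        features = self.Xs[idx]
        label = self.y[idx]
        return features, label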


In [None]:
from torch.utils.data import DataLoader

training_data = XorDataset()
train_dataloader = DataLoader(training_data, batch_size=1, shuffle=False)

In [None]:
# Display the features and label of the first batch.
train_features, train_labels = next(iter(train_dataloader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
feature = train_features[0]
label = train_labels[0]
print(f"Features: {feature}; Label: {label}")
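
The loader above yields one example per batch. As an alternative sketch (not what this notebook uses), the same four examples could be wrapped in PyTorch's built-in `TensorDataset` and served as a single shuffled batch. The names `full_batch_dataset` and `full_batch_loader` are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

Xs = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
ys = torch.tensor([[0.], [1.], [1.], [0.]])

# TensorDataset pairs up rows of the given tensors, so no custom class is needed.
full_batch_dataset = TensorDataset(Xs, ys)
# batch_size=4 returns the whole dataset at once; shuffle=True randomizes row order.
full_batch_loader = DataLoader(full_batch_dataset, batch_size=4, shuffle=True)

batch_X, batch_y = next(iter(full_batch_loader))
print(batch_X.shape, batch_y.shape)
```

With a batch size of 1, the weights are updated after every example; with a batch size of 4, they are updated once per pass over the data.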

## Neural Network

The neural network is defined by subclassing `nn.Module` and initializing its layers in `__init__`. Every `nn.Module` subclass implements the operations on the input data in its `forward` method.

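For the XOR network, we need two layers (refer to the slides): a hidden layer with two units and an output layer with a single unit, each followed by a sigmoid activation.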

In [None]:
class XOR(nn.Module):
    def __init__(self):
        super(XOR, self).__init__()
        self.linear_xor_stack = nn.Sequential(
            nn.Linear(2, 2),
            nn.Sigmoid(),
            nn.Linear(2, 1),
            nn.Sigmoid(),
        )

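    def forward(self, x):
        # The final Sigmoid squashes the output into (0, 1), so this is a
        # probability-like value rather than a raw logit.
        output = self.linear_xor_stack(x)
        return output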
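
To see why two hidden units are enough, the sketch below sets the weights of an identical 2-2-1 network by hand instead of training it. The weight values are one illustrative choice (an assumption, not the values training will find): the first hidden unit approximates OR, the second approximates NAND, and the output unit approximates AND of the two.

```python
import torch
import torch.nn as nn

# Same 2-2-1 architecture as the XOR class, built directly with nn.Sequential.
hand_net = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1), nn.Sigmoid())

with torch.no_grad():
    # Hidden unit 0 approximates OR, hidden unit 1 approximates NAND.
    hand_net[0].weight.copy_(torch.tensor([[20., 20.], [-20., -20.]]))
    hand_net[0].bias.copy_(torch.tensor([-10., 30.]))
    # The output unit approximates AND of the two hidden units.
    hand_net[2].weight.copy_(torch.tensor([[20., 20.]]))
    hand_net[2].bias.copy_(torch.tensor([-30.]))

    inputs = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    hand_out = hand_net(inputs).round()

print(hand_out.flatten().tolist())  # [0.0, 1.0, 1.0, 0.0]
```

Training starts from random weights and has to discover a comparable solution on its own.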

The neural network is instantiated by creating an instance of the class and moving it to the processing device:

In [None]:
xor_network = XOR()
model = xor_network.to(device)
print(model)
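
Note that, for modules, `.to()` moves the parameters in place and returns the module itself, so `xor_network` and `model` refer to the same object. A small sketch with a stand-in layer (the names `net` and `moved` are illustrative):

```python
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

net = nn.Linear(2, 1)
moved = net.to(device)

# For nn.Module, .to() returns the same object rather than a copy.
print(moved is net)
# The parameters now live on the selected device.
print(next(net.parameters()).device.type)
```

This differs from tensors, where `.to()` returns a new tensor when a device change is needed.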

With the dataset and the model architecture defined, we can train the network. We use backpropagation, as implemented in the training and evaluation loops below:

In [None]:
all_losses = []

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Move the batch to the same device as the model
        X, y = X.to(device), y.to(device)

        # Compute prediction and loss; reshape y so it matches pred exactly
        # and MSELoss does not have to broadcast mismatched shapes
        pred = model(X)
        loss = loss_fn(pred, y.reshape(pred.shape))

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Report the loss after every batch
        loss, current = loss.item(), batch * len(X)
        print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            y = y.reshape(pred.shape)
            test_loss += loss_fn(pred, y).item()
            # A prediction is correct when the rounded output equals the label
            correct += (pred.round() == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    all_losses.append(test_loss)
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
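
The three calls in the backpropagation step (`zero_grad`, `backward`, `step`) can be followed by hand on a toy problem. The sketch below uses a single parameter `w` and the made-up loss $(w - 3)^2$, both chosen only for illustration.

```python
import torch

# One trainable parameter and a toy loss: loss(w) = (w - 3)^2.
w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = ((w - 3.0) ** 2).sum()

optimizer.zero_grad()   # clear gradients left over from earlier steps
loss.backward()         # autograd computes d(loss)/dw = 2 * (w - 3) = -4
grad = w.grad.clone()
optimizer.step()        # w <- w - lr * grad = 1 - 0.1 * (-4) = 1.4

print(grad.item(), w.item())
```

The training loop above does the same thing for every weight and bias of the network at once.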

We use the Mean Squared Error (MSE) loss function and Stochastic Gradient Descent (SGD), training for 2000 epochs.

In [None]:
learning_rate = 1
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 2000
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(train_dataloader, model, loss_fn)
print("Done!")
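
For reference, the mean squared error used above is the average of the squared differences between predictions and targets. The sketch below recomputes it by hand and compares it with `nn.MSELoss`; the prediction values are made up for illustration.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[0.1], [0.8], [0.6], [0.3]])   # made-up network outputs
target = torch.tensor([[0.], [1.], [1.], [0.]])     # XOR labels, same shape as pred

# Mean of the squared differences: (0.01 + 0.04 + 0.16 + 0.09) / 4 = 0.075
manual_mse = ((pred - target) ** 2).mean()
library_mse = nn.MSELoss()(pred, target)

print(manual_mse.item(), library_mse.item())
```

Keeping `pred` and `target` in the same shape matters here: if the shapes differ, the subtraction broadcasts and the mean is taken over unintended pairs.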

Let's plot the average loss recorded after each epoch:

In [None]:
import matplotlib.pyplot as plt

plt.plot(all_losses)
plt.xlabel('Epoch')
plt.ylabel('Average loss')
plt.show()

We can now check the network parameters:

In [None]:
# Show the weights and biases of each layer
for name, param in xor_network.named_parameters():
    if param.requires_grad:
        print(name, param.data)
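
The number of trainable values follows directly from the layer sizes. As a sketch, the count below rebuilds the same 2-2-1 stack (under the illustrative name `net`) and sums the sizes of its parameter tensors.

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1), nn.Sigmoid())

# Hidden layer: 2*2 weights + 2 biases; output layer: 2*1 weights + 1 bias.
n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # 9
```

The sigmoid activations contribute no parameters of their own.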

## Check the model

Let's check if the model behaves as expected:

In [None]:
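# Test inputs: all four XOR cases, on the same device as the model
inputs = torch.tensor([[0., 0.],
                       [0., 1.],
                       [1., 0.],
                       [1., 1.]], device=device)
# Call the model directly (rather than .forward) and skip gradient tracking
with torch.no_grad():
    out = xor_network(inputs)
print(out.round())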
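
Rounding the outputs is equivalent to thresholding them at 0.5. A small helper along these lines can turn the check into a single number. The function `xor_accuracy` and the example outputs are hypothetical and only illustrate the idea.

```python
import torch

def xor_accuracy(outputs, targets, threshold=0.5):
    """Fraction of outputs that fall on the correct side of the threshold."""
    predictions = (outputs >= threshold).float()
    return (predictions == targets).float().mean().item()

targets = torch.tensor([[0.], [1.], [1.], [0.]])
good = torch.tensor([[0.04], [0.95], [0.96], [0.05]])   # made-up, well-trained outputs
partial = torch.tensor([[0.1], [0.9], [0.4], [0.2]])    # made-up, one case still wrong

print(xor_accuracy(good, targets))     # 1.0
print(xor_accuracy(partial, targets))  # 0.75
```

Depending on the random initialization, a network this small may occasionally stall before reaching full accuracy; re-running the notebook from the model instantiation cell is usually the simplest remedy.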