In [1]:
import torch

In [5]:
# Creating a multilayer perceptron with two hidden layers

class NeuralNetwork(torch.nn.Module):
  def __init__(self, num_inputs, num_outputs):
    super().__init__()
    self.layers = torch.nn.Sequential(
        # First hidden layer
        torch.nn.Linear(num_inputs, 30),
        torch.nn.ReLU(),

        # Second hidden layer
        torch.nn.Linear(30, 20),
        torch.nn.ReLU(),

        # Output layer
        torch.nn.Linear(20, num_outputs)
    )

  def forward(self, x):
    logits = self.layers(x)
    return logits


In [7]:
# Instantiate a new neural network object
model = NeuralNetwork(50, 3)
print(model)

NeuralNetwork(
  (layers): Sequential(
    (0): Linear(in_features=50, out_features=30, bias=True)
    (1): ReLU()
    (2): Linear(in_features=30, out_features=20, bias=True)
    (3): ReLU()
    (4): Linear(in_features=20, out_features=3, bias=True)
  )
)


In [9]:
print(model.layers[0].weight)
print(model.layers[0].weight.shape)

Parameter containing:
tensor([[-0.0851, -0.0246,  0.0369,  ..., -0.0976, -0.0555,  0.0996],
        [-0.1009, -0.0855, -0.1353,  ...,  0.1170, -0.0491, -0.1183],
        [-0.0175, -0.0101, -0.1335,  ..., -0.0947, -0.0027,  0.1162],
        ...,
        [-0.1405,  0.0285, -0.0759,  ...,  0.0632, -0.0161,  0.0787],
        [ 0.0219, -0.1214, -0.0889,  ..., -0.0393,  0.0574,  0.0632],
        [-0.1153,  0.0546, -0.0619,  ..., -0.0803, -0.0399,  0.0012]],
       requires_grad=True)
torch.Size([30, 50])
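
The `[30, 50]` shape above means the first layer alone holds 30 × 50 weights plus 30 biases. As a small sketch, the total number of trainable parameters can be counted by summing `numel()` over all parameters. The `layers` variable is introduced here only for illustration; it rebuilds the same layer sizes as `NeuralNetwork(50, 3)` so the snippet runs on its own.

```python
import torch

# Same layer sizes as NeuralNetwork(50, 3), rebuilt with Sequential
# so that this snippet is self-contained (illustrative variable name)
layers = torch.nn.Sequential(
    torch.nn.Linear(50, 30), torch.nn.ReLU(),
    torch.nn.Linear(30, 20), torch.nn.ReLU(),
    torch.nn.Linear(20, 3),
)

# Count every element of every parameter tensor that requires gradients
num_params = sum(p.numel() for p in layers.parameters() if p.requires_grad)
print("Total number of trainable parameters:", num_params)

# Per-parameter breakdown: weights and biases of each Linear layer
for name, p in layers.named_parameters():
    print(name, tuple(p.shape), p.numel())
```

The total is (50·30 + 30) + (30·20 + 20) + (20·3 + 3) = 2213.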


The model weights are initialised with small random numbers, which differ each time we instantiate the network. Small random initial values are needed to break symmetry during training: if every weight in a layer started at the same value, all nodes in that layer would compute the same output and receive the same update during backpropagation, so the network could not learn complex mappings from inputs to outputs.
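
The symmetry problem can be illustrated with a toy example. The sketch below uses a tiny hypothetical network (4 inputs, 3 hidden units, 1 output; not the model from this notebook) and deliberately sets every weight to the same constant. After one backward pass, every hidden unit receives exactly the same gradient, so the units would stay identical after every update.

```python
import torch

torch.manual_seed(123)

# Tiny hypothetical network, used only for this illustration
net = torch.nn.Sequential(
    torch.nn.Linear(4, 3),
    torch.nn.ReLU(),
    torch.nn.Linear(3, 1),
)

# Deliberately give every weight the same value and zero all biases
with torch.no_grad():
    for p in net.parameters():
        p.fill_(0.5 if p.dim() > 1 else 0.0)

x = torch.rand(1, 4)
loss = net(x).sum()
loss.backward()

# One row of gradients per hidden unit, shape (3, 4)
grad = net[0].weight.grad
rows_identical = bool(
    torch.allclose(grad[0], grad[1]) and torch.allclose(grad[1], grad[2])
)
print(grad)
print("All hidden units receive the same gradient:", rows_identical)
```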

However, while we want to keep using small random numbers as initial values for our layer weights, we can make the random number initialisation reproducible by seeding PyTorch’s random number generator via `manual_seed`:


In [11]:
torch.manual_seed(123)
model = NeuralNetwork(50, 3)
print(model.layers[0].weight)

Parameter containing:
tensor([[-0.0577,  0.0047, -0.0702,  ...,  0.0222,  0.1260,  0.0865],
        [ 0.0502,  0.0307,  0.0333,  ...,  0.0951,  0.1134, -0.0297],
        [ 0.1077, -0.1108,  0.0122,  ...,  0.0108, -0.1049, -0.1063],
        ...,
        [-0.0787,  0.1259,  0.0803,  ...,  0.1218,  0.1303, -0.1351],
        [ 0.1359,  0.0175, -0.0673,  ...,  0.0674,  0.0676,  0.1058],
        [ 0.0790,  0.1343, -0.0293,  ...,  0.0344, -0.0971, -0.0509]],
       requires_grad=True)
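
As a quick check of this behaviour, the sketch below creates single `Linear` layers (standing in for the full model to keep the example short) and compares their weights. Reseeding with the same value before each instantiation should give identical weights, while a layer created without reseeding should differ.

```python
import torch

torch.manual_seed(123)
layer_a = torch.nn.Linear(50, 30)

# Reseed with the same value before creating the second layer
torch.manual_seed(123)
layer_b = torch.nn.Linear(50, 30)

# No reseed: the generator continues from its current state
layer_c = torch.nn.Linear(50, 30)

same_seed_equal = torch.equal(layer_a.weight, layer_b.weight)
no_reseed_equal = torch.equal(layer_a.weight, layer_c.weight)
print("Same seed, same weights:", same_seed_equal)
print("No reseed, same weights:", no_reseed_equal)
```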


Now that we have spent some time inspecting the `NeuralNetwork` instance, let’s briefly see how it’s used via the forward pass:

In [17]:
torch.manual_seed(123)

# Create a 1x50 tensor (1 row, 50 columns) with random values drawn from a
# uniform distribution on the interval [0, 1)
x = torch.rand(1, 50)

# Calling the model executes its forward pass automatically
out = model(x)

# The three numbers returned are the scores assigned to the three output nodes
print(out)

tensor([[-0.1262,  0.1080, -0.1792]], grad_fn=<AddmmBackward0>)


Notice that the output tensor also includes a `grad_fn` value.

Here, `grad_fn=<AddmmBackward0>` identifies the function that was last used to compute this tensor in the computational graph. `Addmm` stands for a matrix multiplication (`mm`) followed by an addition (`Add`), which is the operation the final `Linear` layer performs: it multiplies its input by the weight matrix and adds the bias. PyTorch uses this information when it computes gradients during backpropagation.
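
As a rough check of what `Addmm` computes, the sketch below reproduces the output of a single `Linear` layer with `torch.addmm`, which adds its first argument to the matrix product of the other two. The layer sizes and the name `h` are illustrative assumptions; the exact `grad_fn` name that gets printed may vary between PyTorch versions.

```python
import torch

torch.manual_seed(123)

# Stand-in for the model's output layer (20 inputs, 3 outputs)
layer = torch.nn.Linear(20, 3)
h = torch.rand(1, 20)  # hypothetical activations from the previous layer

out = layer(h)

# bias + h @ weight.T, written as one fused add-and-matrix-multiply
manual = torch.addmm(layer.bias, h, layer.weight.t())

print(out.grad_fn)
print("Matches the Linear layer:", torch.allclose(out, manual, atol=1e-6))
```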

If we just want to use a network without training or backpropagation, for example for prediction after training, constructing this computational graph is wasteful: it performs unnecessary computations and consumes additional memory. So, when we use a model for inference rather than training, the best practice is to wrap the forward pass in the `torch.no_grad()` context manager. This tells PyTorch that it doesn’t need to keep track of the gradients, which can result in significant savings in memory and computation:

In [18]:
with torch.no_grad():
  out = model(x)
print(out)

tensor([[-0.1262,  0.1080, -0.1792]])
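
The missing `grad_fn` in this output can also be confirmed programmatically. The sketch below uses a single `Linear` layer as a stand-in for the model and compares a tracked forward pass with one run under `torch.no_grad()`.

```python
import torch

torch.manual_seed(123)
layer = torch.nn.Linear(50, 3)  # stand-in for the full model
x = torch.rand(1, 50)

# Regular forward pass: the result is part of the computational graph
tracked = layer(x)

# Forward pass without gradient tracking
with torch.no_grad():
    untracked = layer(x)

print(tracked.requires_grad, tracked.grad_fn is not None)
print(untracked.requires_grad, untracked.grad_fn)
```

The values of the two outputs are the same; only the bookkeeping for backpropagation differs.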


`Logits` are the **raw**, **unnormalised** scores produced by the last layer of a neural network before any non-linear activation such as softmax or sigmoid is applied. These values can be positive or negative and are not constrained to a specific range.

In PyTorch, it’s common practice to code classification models so that they return these logits rather than probabilities. That’s because PyTorch’s commonly used loss functions combine the `softmax` (or `sigmoid` for binary classification) operation with the negative log-likelihood loss in a single class, for reasons of numerical efficiency and stability. So, if we want to compute class-membership probabilities for our predictions, we have to call the `softmax` function explicitly.
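
To make the point about loss functions concrete, the sketch below checks that `torch.nn.functional.cross_entropy`, applied directly to logits, gives the same value as applying `log_softmax` followed by the negative log-likelihood loss. The logits are copied from the forward-pass output above; the target label is a made-up assumption for illustration.

```python
import torch
import torch.nn.functional as F

# Logits copied from the forward pass above
logits = torch.tensor([[-0.1262, 0.1080, -0.1792]])
target = torch.tensor([1])  # hypothetical true class label

# One step: the loss takes raw logits and applies log-softmax internally
loss_from_logits = F.cross_entropy(logits, target)

# Two steps: explicit log-softmax, then negative log-likelihood
loss_two_steps = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(loss_from_logits, loss_two_steps)
```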

In [19]:
with torch.no_grad():
  out = torch.softmax(model(x), dim=1)
  print(out)

tensor([[0.3113, 0.3934, 0.2952]])
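
From these probabilities, a class prediction can be obtained with `argmax`. The sketch below starts from the logits printed earlier so that it runs on its own; the variable names are illustrative. Because softmax preserves the ordering of the scores, taking `argmax` of the logits directly gives the same label.

```python
import torch

# Logits copied from the forward pass above
logits = torch.tensor([[-0.1262, 0.1080, -0.1792]])

probas = torch.softmax(logits, dim=1)
print(probas.sum(dim=1))  # the probabilities in each row sum to 1

# Index of the most probable class for each row
predicted = torch.argmax(probas, dim=1)
print(predicted)  # tensor([1])

# Softmax is monotonic, so argmax over the logits gives the same label
print(torch.equal(predicted, torch.argmax(logits, dim=1)))
```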
