Version: Jan 2024

# Concise Implementation of Linear Regression

## Generating the Dataset

To start, we will generate the same dataset as in :numref:`sec_linear_scratch`.


In [22]:
import numpy as np
import torch
from torch.utils import data

In [23]:
def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise and return X and y."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b  # torch.mv does not broadcast, so use matmul
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

In [24]:
true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)
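
As a quick sanity check, the sketch below regenerates the data and inspects the shapes. It repeats `synthetic_data` so that it runs on its own. The seed value is an assumption added only for repeatability and is not part of the original notebook.

```python
import torch

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise and return X and y."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

torch.manual_seed(0)  # assumed seed, only for repeatability
true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

# One row per example: two features and one label each
print(features.shape, labels.shape)

# The label of the first example next to its noise-free value
noise_free = torch.dot(features[0], true_w) + true_b
print(features[0], labels[0], noise_free)
```

Each row of `features` holds one example, and `labels` is a column vector so that it lines up with the model output later on.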

## Reading the Dataset

Rather than rolling our own iterator,
we can call upon the existing API in a framework to read data.
We pass in `features` and `labels` as arguments and specify `batch_size`
when instantiating a data iterator object.
In addition, the boolean value `is_train`
indicates whether or not
we want the data iterator object to shuffle the data
on each epoch (pass through the dataset).


In [25]:
def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)
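
To see what `is_train` changes, the following sketch runs `load_array` on a tiny toy dataset with and without shuffling. The toy tensors and their names are hypothetical and chosen only so that the order of examples is easy to read off.

```python
import torch
from torch.utils import data

def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

# Toy data (hypothetical): 6 examples whose label equals the row index
toy_features = torch.arange(12, dtype=torch.float32).reshape(6, 2)
toy_labels = torch.arange(6, dtype=torch.float32).reshape(6, 1)

# is_train=False keeps the original order on every pass
ordered = load_array((toy_features, toy_labels), batch_size=2, is_train=False)
ordered_labels = torch.cat([y for _, y in ordered]).reshape(-1)
print(ordered_labels)

# is_train=True draws a new random order on each pass,
# but every example still appears exactly once per pass
shuffled = load_array((toy_features, toy_labels), batch_size=2, is_train=True)
shuffled_labels = torch.cat([y for _, y in shuffled]).reshape(-1)
print(shuffled_labels)
```

Shuffling is normally wanted for training and switched off for evaluation, where a fixed order makes results easier to compare.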

In [26]:
batch_size = 10
data_iter = load_array((features, labels), batch_size)

In [27]:
print(type(data_iter))
print(type(iter(data_iter)))
print(next(iter(data_iter)))

<class 'torch.utils.data.dataloader.DataLoader'>
<class 'torch.utils.data.dataloader._SingleProcessDataLoaderIter'>
[tensor([[-0.0039, -1.5041],
        [ 1.3929,  0.2728],
        [ 0.1892, -0.0184],
        [-1.0933,  0.5864],
        [ 1.2372,  0.6163],
        [-1.0500,  0.3037],
        [-0.2264, -0.8773],
        [-0.0738, -0.1811],
        [-1.8291, -0.3908],
        [-0.7532, -0.5771]]), tensor([[9.2941],
        [6.0587],
        [4.6453],
        [0.0245],
        [4.5793],
        [1.0715],
        [6.7165],
        [4.6761],
        [1.8675],
        [4.6562]])]


In [29]:
test_list = [5,3,7,0]
iter(test_list)

iter(9)

TypeError: 'int' object is not iterable

Now we can use `data_iter` in much the same way as we called
the `data_iter` function in :numref:`sec_linear_scratch`.
To verify that it is working, we read and printed
the first minibatch of examples above.
In contrast to :numref:`sec_linear_scratch`,
here we use `iter` to construct a Python iterator and `next` to obtain the first item from it.
As the last cell shows, `iter` accepts iterables such as lists but raises a `TypeError` for a non-iterable object such as an integer.
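
The relationship between `iter`, `next`, and an ordinary `for` loop can be illustrated with a minimal sketch. The dataset size of 25 is an assumption chosen so that the last minibatch is incomplete.

```python
import torch
from torch.utils import data

# Toy dataset (hypothetical): 25 examples with 2 features each
X = torch.randn(25, 2)
y = torch.randn(25, 1)
loader = data.DataLoader(data.TensorDataset(X, y), batch_size=10, shuffle=True)

# len() of a DataLoader is the number of minibatches per epoch
num_batches = len(loader)
print(num_batches)

# next(iter(...)) starts a new pass and returns only its first minibatch
first_X, first_y = next(iter(loader))
print(first_X.shape, first_y.shape)

# A for loop calls iter() once and then next() until the data is exhausted;
# the last minibatch is smaller because 25 is not a multiple of 10
batch_sizes = [len(batch_X) for batch_X, _ in loader]
print(batch_sizes)
```

Because every call to `iter` starts a fresh pass, `next(iter(data_iter))` is handy for peeking at one minibatch, while training code normally uses the `for` loop form.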


## Defining the Model

For standard operations, we can use a framework's predefined layers,
which allow us to focus on which layers make up the model
rather than on how each layer is implemented.

### nn.Sequential
We will first define a model variable `net`,
which will refer to an instance of the [`Sequential` class](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html).

**The `Sequential` class defines an ordered container
of modules.** Data is passed through all the modules in the order in which they were defined. You can use a sequential container to quickly put together a network whose layers are chained one after another.

Given input data, a `Sequential` instance passes it through
the first layer, in turn passing the output
as the second layer's input and so forth.
In the following example, our model consists of only one layer,
so we do not really need `Sequential`.
But since nearly all of our future models
will involve multiple layers,
we will use it anyway just to familiarize you
with the most standard workflow.
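
The chaining behaviour can be checked with a small sketch that compares a `Sequential` container with calling the same layers by hand. The two-layer architecture and its sizes are assumptions made only for illustration. The model in this section has a single layer.

```python
import torch
from torch import nn

# Two layers chosen only for illustration; the sizes are assumptions
layer1 = nn.Linear(2, 4)
layer2 = nn.Linear(4, 1)
chained = nn.Sequential(layer1, layer2)

X = torch.randn(3, 2)  # a minibatch of 3 examples with 2 features each

# Sequential feeds the output of each module into the next one
out_sequential = chained(X)
out_manual = layer2(layer1(X))

print(out_sequential.shape)
print(torch.allclose(out_sequential, out_manual))
```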


### nn.Linear
In PyTorch, the fully-connected layer is defined in the [`Linear` class](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html). This module applies a linear transformation on the input using its stored weights and biases.

We pass two arguments into `nn.Linear`. The first one specifies the input feature dimension, which is 2, and the second one is the output feature dimension, which is a single scalar and therefore 1.
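
As a rough illustration of what the layer computes, the sketch below reproduces the output of `nn.Linear` with an explicit matrix product. The minibatch of 5 random examples is an assumption.

```python
import torch
from torch import nn

layer = nn.Linear(in_features=2, out_features=1)

# The weight is stored with shape (out_features, in_features),
# the bias with shape (out_features,)
print(layer.weight.shape, layer.bias.shape)

X = torch.randn(5, 2)  # 5 examples with 2 features each
out_layer = layer(X)

# The same linear transformation written out by hand: X W^T + b
out_manual = X @ layer.weight.T + layer.bias

print(out_layer.shape)
print(torch.allclose(out_layer, out_manual, atol=1e-6))
```

Note that the weight matrix is transposed in the product, which is why `net[0].weight` has shape `(1, 2)` rather than `(2, 1)`.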


In [32]:
# `nn` is an abbreviation for neural networks
from torch import nn

# linear regression with 2 features
#Define and initialise model
net = nn.Sequential(nn.Linear(in_features=2, out_features=1))

# nn.ReLU()

print(type(net))
print(net)

<class 'torch.nn.modules.container.Sequential'>
Sequential(
  (0): Linear(in_features=2, out_features=1, bias=True)
)


## Initializing Model Parameters

Before using `net`, we need to initialize the model parameters.

Here we specify that

-  Each weight parameter is randomly sampled from a normal distribution with mean 0 and standard deviation 0.01.
-  The bias parameter is initialized to zero.


As we have specified the input and output dimensions when constructing `nn.Linear`,
we can now access the parameters directly to specify their initial values.
We first locate the layer by `net[0]`, which is the first layer in the network,
and then use the `weight.data` and `bias.data` attributes to access the parameter tensors.
Next we use the in-place methods `normal_` and `fill_` to overwrite the parameter values.


In [34]:
net[0]
print(type(net[0]))

<class 'torch.nn.modules.linear.Linear'>


In [35]:
#Initialise weights
net[0].weight.data.normal_(0, 0.01)

#Initialise bias
net[0].bias.data.fill_(0)

#print out initialised weight and bias
print(net[0].weight.data)
print(net[0].bias.data)

tensor([[ 0.0100, -0.0092]])
tensor([0.])
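
As an alternative to writing through `.data`, the same initialization can be sketched with the functions in `torch.nn.init`. This is a substitute technique rather than the approach used above, and both should leave the parameters in an equivalent state.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(2, 1))

# The trailing underscore marks functions that modify their argument in place
nn.init.normal_(net[0].weight, mean=0.0, std=0.01)
nn.init.zeros_(net[0].bias)

print(net[0].weight.data)
print(net[0].bias.data)
```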


## Defining the Loss Function


The `MSELoss` class computes the mean squared error (without the $1/2$ factor in :eqref:`eq_mse`).
By default it returns the average loss over examples.


In [36]:
# Initialise the loss function: mean squared error (MSE)
loss = nn.MSELoss()
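
To make the averaging concrete, here is a small worked example with three hand-picked predictions (the values are hypothetical). It compares the default `reduction='mean'` with `reduction='sum'` and with the squared error computed by hand.

```python
import torch
from torch import nn

y_hat = torch.tensor([[1.0], [2.0], [3.0]])  # predictions
y = torch.tensor([[1.0], [1.0], [1.0]])      # labels

# The squared errors are 0, 1 and 4
mean_loss = nn.MSELoss()(y_hat, y)                # default: reduction='mean'
sum_loss = nn.MSELoss(reduction='sum')(y_hat, y)  # total instead of average
manual_mean = ((y_hat - y) ** 2).mean()

print(mean_loss, sum_loss, manual_mean)
```

The mean is 5/3 and the sum is 5, so the two reductions differ exactly by the number of examples in the minibatch.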

## Defining the Optimization Algorithm


Minibatch stochastic gradient descent is a standard tool
for optimizing neural networks
and thus PyTorch supports it alongside a number of
variations on this algorithm in the `optim` module.
When we instantiate an `SGD` instance,
we will specify the parameters to optimize over
(obtainable from our net via `net.parameters()`), with a dictionary of hyperparameters
required by our optimization algorithm.
Minibatch stochastic gradient descent just requires that
we set the learning rate `lr`, which is set to 0.03 here.


In [37]:
#Choose SGD with learning rate, say, 0.03
#Initialise SGD
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

print(net.parameters())
print(net.parameters)
test_net_param = net.parameters()
next(test_net_param)

<generator object Module.parameters at 0x117c31dd0>
<bound method Module.parameters of Sequential(
  (0): Linear(in_features=2, out_features=1, bias=True)
)>


Parameter containing:
tensor([[ 0.0100, -0.0092]], requires_grad=True)
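
What a single call to `step` does can be sketched on a toy problem. The parameter vector, the objective, and the learning rate of 0.1 below are assumptions chosen so that the arithmetic is easy to follow. Plain SGD without momentum or weight decay is assumed.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

objective = (w ** 2).sum()   # its gradient is 2 * w = [2, 4]
optimizer.zero_grad()
objective.backward()

grad = w.grad.clone()
w_before = w.detach().clone()

optimizer.step()             # plain SGD: w <- w - lr * grad

# Both lines show the same values: [1 - 0.1 * 2, 2 - 0.1 * 4]
print(w.detach())
print(w_before - 0.1 * grad)
```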

## Training

The training loop itself is strikingly similar
to what we did when implementing everything from scratch.

To refresh your memory: for some number of epochs,
we will make a complete pass over the dataset (via `data_iter`),
iteratively grabbing one minibatch of inputs
and the corresponding ground-truth labels.
For each minibatch, we go through the following ritual:

* Generate predictions by calling `net(X)` and calculate the loss `l` (the forward pass/propagation).
* Calculate gradients by running backpropagation.
* Update the model parameters by invoking our optimizer.

For good measure, we compute the loss after each epoch and print it to monitor progress.


In [15]:
num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:
        # === forward pass ===
        l = loss(net(X), y)

        # === backward pass ===
        trainer.zero_grad()    # reset gradients before backprop
        l.backward()           # compute gradients using backprop

        # === update parameters ===
        trainer.step()
    # === log results after each epoch; the loss is computed on the whole training set ===
    l = loss(net(features), labels)
    print(f'epoch {epoch + 1}, loss {l:f}')

epoch 1, loss 0.000261
epoch 2, loss 0.000100
epoch 3, loss 0.000101
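
The reason for resetting gradients before every backward pass is that PyTorch adds new gradients to the ones already stored. A minimal sketch on a single scalar parameter (a hypothetical toy setup) shows the effect.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
grad_first = w.grad.item()        # d(3w)/dw = 3

(3 * w).backward()                # no reset: the new gradient is added
grad_accumulated = w.grad.item()

w.grad.zero_()                    # clear the stored gradient, as
                                  # trainer.zero_grad() does for every parameter
(3 * w).backward()
grad_after_reset = w.grad.item()

print(grad_first, grad_accumulated, grad_after_reset)
```

Without the reset, each update would use the sum of the gradients from all previous minibatches instead of the current one.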


Below, we compare the model parameters learned by training on finite data
and the actual parameters that generated our dataset.
To access parameters,
we first access the layer that we need from `net`
and then access that layer's weights and bias.
As in our from-scratch implementation,
note that our estimated parameters are
close to their ground-truth counterparts.


In [38]:
# Compare true to estimated parameters
w = net[0].weight.data
print('error in estimating w:', true_w - w.reshape(true_w.shape))
b = net[0].bias.data
print('error in estimating b:', true_b - b)

error in estimating w: tensor([ 1.9900, -3.3908])
error in estimating b: tensor([4.2000])

Note that the errors printed above equal `true_w` minus the initial weights (`2 - 0.0100 = 1.9900` and `-3.4 - (-0.0092) = -3.3908`) and `true_b` minus the initial bias of 0.
The cell numbers show that the model was re-created and re-initialized (`In [32]` to `In [35]`) after the training cell (`In [15]`) had run, so these numbers describe an untrained model.
Rerunning the training cell just before this comparison yields errors close to zero.


In [39]:
next(net.parameters())

Parameter containing:
tensor([[ 0.0100, -0.0092]], requires_grad=True)
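
The comparison is only meaningful when it runs after training on the same `net` object. The self-contained sketch below executes every step of this section in order. The fixed seed is an assumption added for repeatability, and the exact numbers will differ from run to run without it.

```python
import torch
from torch import nn
from torch.utils import data

torch.manual_seed(0)  # assumed seed, only for repeatability

# Generate the synthetic dataset
true_w = torch.tensor([2, -3.4])
true_b = 4.2
features = torch.normal(0, 1, (1000, 2))
labels = (features @ true_w + true_b).reshape(-1, 1)
labels += torch.normal(0, 0.01, labels.shape)

# Data iterator, model, loss and optimizer
dataset = data.TensorDataset(features, labels)
data_iter = data.DataLoader(dataset, batch_size=10, shuffle=True)
net = nn.Sequential(nn.Linear(2, 1))
net[0].weight.data.normal_(0, 0.01)
net[0].bias.data.fill_(0)
loss = nn.MSELoss()
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

# Train, then compare without re-creating the model in between
for epoch in range(3):
    for X, y in data_iter:
        l = loss(net(X), y)
        trainer.zero_grad()
        l.backward()
        trainer.step()

w_error = true_w - net[0].weight.data.reshape(true_w.shape)
b_error = true_b - net[0].bias.data
print('error in estimating w:', w_error)
print('error in estimating b:', b_error)
```

Run this way, both errors should be small compared with the true parameter values, in line with the training losses reported above.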

## Summary


* Using PyTorch's high-level APIs, we can implement models much more concisely.
* In PyTorch, the `data` module provides tools for data processing, and the `nn` module defines a large number of neural network layers and common loss functions.
* We can initialize the parameters by replacing their values with methods ending with `_`.


## Activity
Things to try:

1. In the SGD training loop, can we start with the backward pass? Try it.
1. In training, split the single line of code that runs the model and computes the loss into two separate lines: one that produces the output and one that computes the loss from that output.
1. If we replace `nn.MSELoss()` with `nn.MSELoss(reduction='sum')`, how can we change the learning rate for the code to behave identically? Why?
1. Review the PyTorch documentation to see what loss functions and initialization methods are provided. Replace the loss by some other loss like [absolute(L1)](https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html) or [Huber loss](https://pytorch.org/docs/stable/generated/torch.nn.HuberLoss.html).


[Discussions](https://discuss.d2l.ai/t/45)
