# Linear Regression Implementation from Scratch

We will implement every part of linear regression from scratch: the data pipeline, the model, the loss function, and the gradient descent optimizer. We will rely only on PyTorch's ``tensor`` and ``autograd`` packages.

In [None]:
# import packages
import random  # used by data_iter to shuffle example indices
import torch   # tensors and autograd

## Getting Data Sets

We will construct a simple artificial data set with 2 features and 1000 examples, so that $\mathbf{X} \in \mathbb{R}^{1000 \times 2}$. We will synthesize our data by sampling each point $\mathbf{x}_i$ from a Gaussian distribution. Moreover, we will assume that the linearity assumption holds with true underlying parameters $\mathbf{w} = [2, 3.4]^{\top}$ and $b = 4.2$. Our synthetic labels are therefore given by the following linear model, which includes a noise term $\epsilon$ to account for measurement errors on the features and labels: $$\mathbf{y} = \mathbf{X} \mathbf{w} + b + \epsilon$$
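One possible way to generate such data is sketched here. The standard normal distribution for the features and the noise standard deviation of 0.01 (the hypothetical `noise_std` argument) are assumptions; the text only states that the points and the noise are random.

```python
import torch

def synthetic_data(w, b, num_examples, noise_std=0.01):
    """Generate y = Xw + b + noise. noise_std is an assumed value."""
    # Each row of X is one example drawn from a standard normal distribution.
    X = torch.normal(0.0, 1.0, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    # Additive Gaussian noise plays the role of epsilon.
    y += torch.normal(0.0, noise_std, y.shape)
    # Return the labels as a column vector of shape (num_examples, 1).
    return X, y.reshape(-1, 1)

true_w = torch.tensor([2.0, 3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)
print(features.shape, labels.shape)
```

Returning the labels as a column vector keeps their shape aligned with the output of a model that maps a batch of examples to one prediction per row.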

In [None]:
# code that generates our synthetic data set
def synthetic_data(w, b, num_examples):
    # insert your code here
    return X, y

true_w = # insert your code here
true_b = # insert your code here
num_examples = # insert your code here

features, labels = # insert your code here

By generating a scatter plot using one of the features and labels, we can clearly observe the linear relationship between the two.

In [None]:
# insert your code here

Recall that training a model consists of making multiple passes over the dataset, grabbing one mini-batch of examples at a time and using it to update the model. Since this process is so fundamental to training machine learning algorithms, we need a utility that shuffles the data and serves it in mini-batches.
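A minimal sketch of such an iterator follows, written as a Python generator. The toy tensors and the batch size of 4 are illustrative assumptions chosen so the output is easy to inspect; the last batch is simply smaller when the number of examples is not a multiple of the batch size.

```python
import random
import torch

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    # Shuffle so that examples are visited in random order on each pass.
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        # Slicing past the end is safe: the final batch may be shorter.
        batch_indices = torch.tensor(indices[i:i + batch_size])
        yield features[batch_indices], labels[batch_indices]

# Toy data: 10 examples with 2 features each.
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.arange(10, dtype=torch.float32).reshape(10, 1)

batches = list(data_iter(4, features, labels))
for X, y in batches:
    print(X.shape, y.shape)
```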

In [None]:
batch_size = 64
def data_iter(batch_size, features, labels):
    num_examples = # insert your code here
    indices = # insert your code here
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        # insert your code here

In [None]:
# test your iterator
# insert your code here

## Initialize Model Parameters

Before we can begin optimizing our model’s parameters by gradient descent, we need to have some parameters
in the first place. In the following code, we initialize the weights by sampling random numbers from a normal
distribution with mean 0 and standard deviation 0.01, and we set the bias $b$ to 0.
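As a sketch, the initialization could look like this. Storing the weights as a column of shape `(num_features, 1)` is an assumption made so that a matrix product with the features yields one prediction per row; `num_features` is a name introduced here for illustration.

```python
import torch

num_features = 2  # the data set described above has two features

# Weights: small random values from N(0, 0.01^2), stored as a column vector.
w = torch.normal(0.0, 0.01, size=(num_features, 1))
# Bias: a single scalar initialized to zero.
b = torch.zeros(1)
print(w.shape, b)
```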

In [None]:
w = # insert your code here
b = # insert your code here

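We use automatic differentiation to compute the gradients. For autograd to track operations on ``w`` and ``b``, their ``requires_grad`` attribute must be set to ``True``, for example by calling the in-place method ``requires_grad_()``.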
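The following sketch enables gradient tracking on freshly created parameters and checks it on a single made-up input `x`. The input and the scalar `out` exist only for illustration.

```python
import torch

w = torch.normal(0.0, 0.01, size=(2, 1))
b = torch.zeros(1)

# Ask autograd to record operations involving w and b.
w.requires_grad_(True)
b.requires_grad_(True)

# Tiny check: for out = x @ w + b, d(out)/dw = x^T and d(out)/db = 1.
x = torch.tensor([[1.0, 2.0]])
out = (x @ w + b).sum()
out.backward()
print(w.grad, b.grad)
```

After `backward()`, the gradients are stored in the `.grad` attribute of each parameter.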

In [None]:
# insert your code here

## Define the Model

Here, we define our linear model, which maps a batch of inputs $\mathbf{X}$ to predictions $\hat{\mathbf{y}} = \mathbf{X} \mathbf{w} + b$.
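A minimal sketch of the model, assuming `w` is a column vector of shape `(num_features, 1)` and `b` a one-element tensor that broadcasts across the batch:

```python
import torch

def linreg(X, w, b):
    # Matrix-vector product gives one prediction per row; b is broadcast.
    return torch.matmul(X, w) + b

X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
w = torch.tensor([[2.0], [3.4]])
b = torch.tensor([4.2])
y_hat = linreg(X, w, b)
print(y_hat.shape)
```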

In [None]:
def linreg(X, w, b):
    # insert your code here

## Define the Loss Function

Here, we use the squared loss, $\ell(\hat{y}, y) = \frac{1}{2} (\hat{y} - y)^2$, computed separately for each example.
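One way to write this loss is sketched here. Reshaping `y` to the shape of `y_hat` is a defensive choice, made on the assumption that labels may arrive as a flat vector while predictions are a column; without it, broadcasting would silently produce a matrix of pairwise differences.

```python
import torch

def squared_loss(y_hat, y):
    # Per-example loss 0.5 * (y_hat - y)^2, with y aligned to y_hat's shape.
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

y_hat = torch.tensor([[2.0], [4.0]])
y = torch.tensor([1.0, 4.0])
l = squared_loss(y_hat, y)
print(l.shape)
```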

In [None]:
def squared_loss(y_hat, y):
    # insert your code here

## Define the Optimization Algorithm

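Here, we introduce your first working example of mini-batch stochastic gradient descent (SGD): each parameter is moved a small step, scaled by the learning rate, in the direction opposite to its gradient.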
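A sketch of the update rule follows. Because the signature takes no batch size, this version assumes the caller has already averaged the loss over the mini-batch before calling `backward()`. Resetting the gradients inside the function is a design choice, since PyTorch accumulates gradients across `backward()` calls by default.

```python
import torch

def sgd(params, lr):
    # Updates must not be recorded by autograd.
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad
            # Clear the gradient so the next backward() starts from zero.
            param.grad.zero_()

# Toy check on f(w) = w^2 at w = 1, where the gradient is 2.
w = torch.tensor([1.0], requires_grad=True)
(w ** 2).sum().backward()
sgd([w], lr=0.1)
print(w)
```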

In [None]:
def sgd(params, lr):
    # insert your code here

## Training

Now that we have all of the parts in place, we are ready to implement the main training loop. It is crucial
that you understand this code because you will see training loops that are nearly identical to this one over
and over again throughout your career in deep learning.
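For orientation, a complete and self-contained sketch of such a loop is given here. It restates compact versions of the helper functions so that it runs on its own. The fixed seeds, the noise level of 0.01, and the choice to average the loss over each mini-batch are assumptions, not requirements of the exercise.

```python
import random
import torch

torch.manual_seed(0)
random.seed(0)

# Synthetic data from y = Xw + b + noise.
true_w, true_b = torch.tensor([2.0, 3.4]), 4.2
features = torch.normal(0.0, 1.0, (1000, 2))
labels = (features @ true_w + true_b).reshape(-1, 1)
labels += torch.normal(0.0, 0.01, labels.shape)

def data_iter(batch_size, features, labels):
    indices = list(range(len(features)))
    random.shuffle(indices)
    for i in range(0, len(indices), batch_size):
        idx = torch.tensor(indices[i:i + batch_size])
        yield features[idx], labels[idx]

def linreg(X, w, b):
    return X @ w + b

def squared_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

def sgd(params, lr):
    with torch.no_grad():
        for p in params:
            p -= lr * p.grad
            p.grad.zero_()

w = torch.normal(0.0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr, num_epochs, batch_size = 0.03, 10, 64
net, loss = linreg, squared_loss

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).mean()  # average loss on the mini-batch
        l.backward()                      # gradients w.r.t. w and b
        sgd([w, b], lr)                   # update, then reset gradients
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels).mean()
    print(f'epoch {epoch + 1}, loss {float(train_l):f}')

print('Error in estimating w', true_w - w.detach().reshape(true_w.shape))
print('Error in estimating b', true_b - b.detach())
```

Each inner iteration follows the same three steps: forward pass and loss, backward pass, parameter update.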

In [None]:
lr = 0.03  # learning rate
num_epochs = 10  # number of passes over the training data
net = linreg  # our linear model
loss = squared_loss  # 0.5 * (y_hat - y)^2

# insert your code here

In [None]:
print('Error in estimating w', # insert your code here)
print('Error in estimating b', # insert your code here)

Let's plot predicted values versus the true values.

In [None]:
# insert your code here

Now, let's compute the analytic solution for $\mathbf{w}$:
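For least squares, the closed-form solution is given by the normal equations, $\mathbf{w}^* = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$. The sketch below regenerates synthetic data so that it runs on its own. Appending a column of ones to estimate the bias together with the weights is an assumption; `X_aug`, `theta`, and `b_analytic` are names introduced here for illustration.

```python
import torch

torch.manual_seed(0)

true_w, true_b = torch.tensor([2.0, 3.4]), 4.2
X = torch.normal(0.0, 1.0, (1000, 2))
y = (X @ true_w + true_b).reshape(-1, 1) + torch.normal(0.0, 0.01, (1000, 1))

# Append a column of ones so that the last coefficient acts as the bias b.
X_aug = torch.cat([X, torch.ones(len(X), 1)], dim=1)

# Solve (X^T X) theta = X^T y instead of forming an explicit inverse.
theta = torch.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
w_analytic, b_analytic = theta[:2], theta[2]

print('Analytic solution for w is', w_analytic.reshape(-1))
print('Analytic solution for b is', b_analytic)
```

Solving the linear system is generally more numerically stable than computing the inverse of $\mathbf{X}^\top \mathbf{X}$ directly.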

In [None]:
w_analytic = # insert your code here

In [None]:
print('Analytic solution for w is', # insert your code here)