## Implement Linear Regression on Boston Housing Dataset by PyTorch
https://medium.com/analytics-vidhya/implement-linear-regression-on-boston-housing-dataset-by-pytorch-c5d29546f938


This article shares how to implement linear regression on a real dataset, covering data analysis, preprocessing, the train/test split, and construction of the regression model itself. To help you learn PyTorch, I'll build the regression with PyTorch and show how convenient its forward and backward passes are.
This article assumes that readers are familiar with the principle of linear regression, in particular the meaning of W and b in the equation `Y = XW + b` and how to solve for them. It also helps to understand gradient descent, which we use to solve the problem, and the mean squared error (MSE), which we use to evaluate the regression.
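
As a quick refresher, here is a minimal NumPy sketch of full-batch gradient descent minimizing the MSE of `Y = XW + b`. The synthetic data, the learning rate, the step count, and the helper name `fit_linear_gd` are illustrative assumptions and do not come from the rest of this article.

```python
import numpy as np


def fit_linear_gd(X, Y, lr=0.1, n_steps=500):
    """Fit Y ~ XW + b by full-batch gradient descent on the MSE."""
    n_samples, n_features = X.shape
    W = np.zeros((n_features, 1))
    b = 0.0
    for _ in range(n_steps):
        error = X @ W + b - Y                      # residuals, shape (n_samples, 1)
        grad_W = 2.0 / n_samples * (X.T @ error)   # dMSE/dW
        grad_b = 2.0 * error.mean()                # dMSE/db
        W -= lr * grad_W
        b -= lr * grad_b
    mse = float(np.mean((X @ W + b - Y) ** 2))
    return W, b, mse


# Synthetic, noise-free data generated from known (made-up) parameters.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_W = np.array([[2.0], [-1.0], [0.5]])
true_b = 4.0
Y_demo = X_demo @ true_W + true_b

W_hat, b_hat, final_mse = fit_linear_gd(X_demo, Y_demo)
print(W_hat.ravel(), b_hat, final_mse)
```

Because the toy data contains no noise, the recovered `W_hat` and `b_hat` should land very close to the parameters used to generate it.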

## Boston Housing Dataset processing
The Boston Housing Dataset was collected by the U.S. Census Service and concerns housing in the area of Boston, Massachusetts.

### Packages we need

In [3]:
from sklearn.datasets import load_boston
import pandas as pd

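We use the dataset loader built into scikit-learn (`sklearn`) to load the housing data and process it with pandas. Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in version 1.2, so the import above requires an older release.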
### Peek dataset

In [7]:
bos = load_boston()
print(bos.keys())
print(bos.data.shape)

dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
(506, 13)


The loaded dataset is a dict-like object, so we can list its fields with the `.keys()` method.

As we can see, there are five fields:

1. data: the feature values, which are the model inputs.
2. target: the house prices, which are what we need to predict.
3. feature_names: the names of the features, one per column of data.
4. DESCR: the description of the dataset.
5. filename: the path of the file in which the dataset is stored.

Also note the size of the dataset: 506 samples with 13 features each.


### Preprocessing
First, load the data into a pandas DataFrame. A DataFrame can be thought of as a labeled spreadsheet; here we use it as a two-dimensional matrix.


In [8]:
df = pd.DataFrame(bos.data)
df.columns = bos.feature_names
df['Price'] = bos.target
df.head()

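| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |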


For easy viewing, we assign the feature names to the DataFrame columns. After adding a ‘Price’ column to the data, we peek at the first 5 rows with .head().


Check the description of the data by .describe().

In [9]:
df.describe()

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 |
| mean | 3.613524 | 11.363636 | 11.136779 | 0.06917 | 0.554695 | 6.284634 | 68.574901 | 3.795043 | 9.549407 | 408.237154 | 18.455534 | 356.674032 | 12.653063 | 22.532806 |
| std | 8.601545 | 23.322453 | 6.860353 | 0.253994 | 0.115878 | 0.702617 | 28.148861 | 2.10571 | 8.707259 | 168.537116 | 2.164946 | 91.294864 | 7.141062 | 9.197104 |
| min | 0.00632 | 0.0 | 0.46 | 0.0 | 0.385 | 3.561 | 2.9 | 1.1296 | 1.0 | 187.0 | 12.6 | 0.32 | 1.73 | 5.0 |
| 25% | 0.082045 | 0.0 | 5.19 | 0.0 | 0.449 | 5.8855 | 45.025 | 2.100175 | 4.0 | 279.0 | 17.4 | 375.3775 | 6.95 | 17.025 |
| 50% | 0.25651 | 0.0 | 9.69 | 0.0 | 0.538 | 6.2085 | 77.5 | 3.20745 | 5.0 | 330.0 | 19.05 | 391.44 | 11.36 | 21.2 |
| 75% | 3.677083 | 12.5 | 18.1 | 0.0 | 0.624 | 6.6235 | 94.075 | 5.188425 | 24.0 | 666.0 | 20.2 | 396.225 | 16.955 | 25.0 |
| max | 88.9762 | 100.0 | 27.74 | 1.0 | 0.871 | 8.78 | 100.0 | 12.1265 | 24.0 | 711.0 | 22.0 | 396.9 | 37.97 | 50.0 |


The features have very different value ranges (for example, NOX lies between 0.385 and 0.871, while TAX lies between 187 and 711), so we standardize them. Suppose a feature has mean μ and standard deviation σ over the whole dataset. We subtract μ from each value of the feature and divide the result by σ, which gives every feature zero mean and unit standard deviation.

In [10]:
data = df[df.columns[:-1]]
data = data.apply(
    lambda x: (x - x.mean()) / x.std()
)

data['Price'] = df.Price

A lambda expression keeps the code short: `apply` calls it once per column.
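
To check what this transformation does, the following sketch applies the same lambda to a tiny made-up frame. The column names are borrowed from the dataset, but the four rows and the name `toy` are illustrative assumptions. One detail worth knowing is that pandas' `.std()` uses the sample standard deviation (`ddof=1`), whereas NumPy's `np.std` defaults to `ddof=0`.

```python
import numpy as np
import pandas as pd

# Toy frame with two columns on very different scales (made-up rows).
toy = pd.DataFrame({
    "TAX": [296.0, 242.0, 222.0, 311.0],
    "NOX": [0.538, 0.469, 0.458, 0.524],
})

# Same transformation as above: (x - mean) / std, applied column by column.
standardized = toy.apply(lambda x: (x - x.mean()) / x.std())

# Equivalent NumPy computation; ddof=1 matches the pandas default.
values = toy.to_numpy()
manual = (values - values.mean(axis=0)) / values.std(axis=0, ddof=1)

print(standardized)
print(np.allclose(standardized.to_numpy(), manual))
```

After standardization each column has a mean of approximately 0 and a sample standard deviation of 1, regardless of its original scale.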

### Split training data and testing data
First, convert the data to NumPy arrays.

In [11]:
import numpy as np

X = data.drop('Price', axis=1).to_numpy()
Y = data['Price'].to_numpy()

Then split the data into a training set and a test set.

In [12]:
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)

(354, 13)
(152, 13)
(354,)
(152,)


The output above shows the result: 354 samples for training and 152 for testing.
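
The sizes follow from `test_size=0.3` applied to 506 samples. The sketch below reproduces the arithmetic and a shuffled index split with plain NumPy. It assumes the test count is rounded up, which is consistent with the shapes printed above. It is not scikit-learn's implementation, and its random stream differs, so the selected rows will not match those of `train_test_split`.

```python
import math

import numpy as np

n_samples, test_size = 506, 0.3
n_test = math.ceil(test_size * n_samples)   # 152
n_train = n_samples - n_test                # 354

# Shuffle the row indices, then cut them into two disjoint groups.
rng = np.random.default_rng(42)             # not the same stream as scikit-learn
indices = rng.permutation(n_samples)
test_idx, train_idx = indices[:n_test], indices[n_test:]

print(len(train_idx), len(test_idx))
```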

## Construct Linear Regression by PyTorch
Import PyTorch first.

In [13]:
import torch

print(torch.__version__)

1.4.0


### Data processing
Convert the data to tensors, the data type PyTorch operates on.


In [14]:
n_train = X_train.shape[0]
X_train = torch.tensor(X_train, dtype=torch.float)
X_test = torch.tensor(X_test, dtype=torch.float)
Y_train = torch.tensor(Y_train, dtype=torch.float).view(-1, 1)
Y_test = torch.tensor(Y_test, dtype=torch.float).view(-1, 1)
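
The `.view(-1, 1)` calls reshape the targets from shape `(N,)` to `(N, 1)` so that they match the network output. The small sketch below, with made-up values, shows what broadcasting would do if the shapes were left mismatched.

```python
import torch

pred = torch.zeros(4, 1)                           # network output shape: (N, 1)
target_flat = torch.tensor([1.0, 2.0, 3.0, 4.0])   # shape (N,)
target_col = target_flat.view(-1, 1)               # shape (N, 1)

print(target_flat.shape, target_col.shape)

# Matching shapes: the subtraction is element-wise and stays (4, 1).
print((pred - target_col).shape)

# Mismatched shapes: broadcasting silently produces a (4, 4) matrix,
# which would make a mean-squared-error computation meaningless.
print((pred - target_flat).shape)
```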

### Construct the neural network
We use nn.Sequential to define a neural network with a single linear layer, then initialize its parameters.

In [16]:
w_num = X_train.shape[1]
net = torch.nn.Sequential(
    torch.nn.Linear(w_num, 1)
)

torch.nn.init.normal_(net[0].weight, mean=0, std=0.1)
torch.nn.init.constant_(net[0].bias, val=0)

Parameter containing:
tensor([0.], requires_grad=True)

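nn.Linear takes two required arguments: the number of input features (here the number of weights, `w_num`) and the number of output features (here 1).
Explicit initialization is optional in this example because nn.Linear initializes its parameters automatically; the two `init` calls above simply show how to set them by hand. The printed output is the return value of the last call, the bias.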

### The usage of DataLoader
DataLoader, provided by PyTorch, returns an iterable that yields the training data batch by batch. It is easy to use: first build a TensorDataset from the tensors, then create a DataLoader from that dataset.

In [17]:
datasets = torch.utils.data.TensorDataset(X_train, Y_train)
train_iter = torch.utils.data.DataLoader(datasets, batch_size=10, shuffle=True)

batch_size is the number of samples in each batch. If shuffle is True, the samples are drawn in a new random order every epoch.
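
As a sanity check on the batching, this sketch iterates once over a loader built from random tensors with the same shapes as the training data. The tensors and the `demo_` names are stand-ins, not the real dataset.

```python
import torch

# Random stand-ins with the training-set shapes: 354 samples, 13 features.
X_demo = torch.randn(354, 13)
Y_demo = torch.randn(354, 1)
demo_set = torch.utils.data.TensorDataset(X_demo, Y_demo)
demo_iter = torch.utils.data.DataLoader(demo_set, batch_size=10, shuffle=True)

batch_sizes = [x.shape[0] for x, _ in demo_iter]
print(len(demo_iter), batch_sizes[-1], sum(batch_sizes))
```

With 354 samples and a batch size of 10, one epoch consists of 36 batches, the last of which holds only the remaining 4 samples.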
### Loss function and optimizer
Before training the network we must define a loss function; here we use the mean squared error (MSE).

We then optimize the network with stochastic gradient descent (SGD).


In [19]:
loss = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.05)
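
For reference, the two objects correspond to the following formulas. The notation is an assumption introduced here: $m$ is the number of samples in a mini-batch $B$, $\hat{y}_i = x_i W + b$ is the prediction for sample $i$, and $\eta$ is the learning rate (`lr=0.05` above). With its default settings, `MSELoss` averages the squared errors over the batch:

$$
\mathrm{MSE} = \frac{1}{m} \sum_{i \in B} \left( \hat{y}_i - y_i \right)^2
$$

Plain SGD without momentum then moves each parameter against its gradient:

$$
W \leftarrow W - \eta \, \frac{\partial\, \mathrm{MSE}}{\partial W}, \qquad
b \leftarrow b - \eta \, \frac{\partial\, \mathrm{MSE}}{\partial b}
$$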

### Training and evaluation
Now, let’s start training.

In [20]:
num_epochs = 5
for epoch in range(num_epochs):
    for x, y in train_iter:
        output = net(x)
        l = loss(output, y)
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
    print("epoch {} loss: {:.4f}".format(epoch + 1, l.item()))

epoch 1 loss: 7.7347
epoch 2 loss: 18.9003
epoch 3 loss: 24.1309
epoch 4 loss: 21.3636
epoch 5 loss: 14.5073


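We train on the training set for 5 epochs. Note that the printed value is the loss of the last batch in each epoch, not an average over the epoch, which is why it fluctuates. Each iteration proceeds roughly as follows.


1. Load a batch of data.
2. Predict on the batch with net.
3. Compute the loss from the predicted and true values.
4. Clear the gradients accumulated from the previous step.
5. Backpropagate the loss to compute new gradients.
6. Update the parameters with the optimizer.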
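
The steps above can be sketched as a self-contained loop. This variant also accumulates a sample-weighted mean loss per epoch, which is less noisy than the loss of a single batch. The synthetic data and the `demo_` names are assumptions for illustration, so the printed numbers will not match the results on the housing data.

```python
import torch

torch.manual_seed(0)
# Synthetic stand-in for the standardized training data (assumed shapes).
X_demo = torch.randn(354, 13)
Y_demo = X_demo @ torch.randn(13, 1) + 22.5

demo_net = torch.nn.Sequential(torch.nn.Linear(13, 1))
demo_loss = torch.nn.MSELoss()
demo_opt = torch.optim.SGD(demo_net.parameters(), lr=0.05)
demo_iter = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(X_demo, Y_demo), batch_size=10, shuffle=True
)

epoch_losses = []
for epoch in range(5):
    running, seen = 0.0, 0
    for x, y in demo_iter:               # 1. load a batch
        output = demo_net(x)             # 2. forward pass
        l = demo_loss(output, y)         # 3. compute the batch loss
        demo_opt.zero_grad()             # 4. clear old gradients
        l.backward()                     # 5. backpropagate
        demo_opt.step()                  # 6. update the parameters
        running += l.item() * x.shape[0]
        seen += x.shape[0]
    epoch_losses.append(running / seen)  # mean loss over all samples seen
    print("epoch {} mean loss: {:.4f}".format(epoch + 1, epoch_losses[-1]))
```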


Now, let’s check its performance on the testing dataset.

In [21]:
print(loss(net(X_test), Y_test).item())

25.897653579711914


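The test MSE of about 25.9 is close to the larger batch losses seen during training (about 21 to 24), although those were single-batch values rather than an average over the whole training set.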
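
When only evaluating, it is common to wrap the forward pass in `torch.no_grad()` so that no autograd graph is built, and to report the root of the MSE (RMSE), which is in the same units as the target. For the value above, the RMSE is roughly 5.09, in thousands of dollars for this dataset. The sketch below uses an untrained network and random tensors as stand-ins, so its numbers are not meaningful in themselves.

```python
import torch

torch.manual_seed(0)
# Untrained stand-in network and random stand-in test data (assumed shapes).
demo_net = torch.nn.Sequential(torch.nn.Linear(13, 1))
X_demo = torch.randn(152, 13)
Y_demo = torch.randn(152, 1)

demo_net.eval()
with torch.no_grad():                    # gradients are not tracked here
    pred = demo_net(X_demo)
    mse = torch.nn.functional.mse_loss(pred, Y_demo).item()

rmse = mse ** 0.5                        # same units as the target
print(pred.requires_grad, mse, rmse)
```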