# Boston House Prices

## Columns
- 0 - **CRIM** per capita crime rate by town
- 1 - **ZN** proportion of residential land zoned for lots over 25,000 sq.ft.
- 2 - **INDUS** proportion of non-retail business acres per town
- 3 - **CHAS** Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- 4 - **NOX** nitric oxides concentration (parts per 10 million)
- 5 - **RM** average number of rooms per dwelling
- 6 - **AGE** proportion of owner-occupied units built prior to 1940
- 7 - **DIS** weighted distances to five Boston employment centres
- 8 - **RAD** index of accessibility to radial highways
- 9 - **TAX** full-value property-tax rate per \$10,000
- 10 - **PTRATIO** pupil-teacher ratio by town
- 11 - **B** 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- 12 - **LSTAT** % lower status of the population
- 13 - **MEDV** Median value of owner-occupied homes in \$1000's
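
The data file has no header row, so a name-to-index lookup can make individual columns easier to address. The following is a minimal sketch: `COLUMNS`, `FEATURE_NAMES`, `LABEL_NAME` and `column_index` are hypothetical names introduced here for illustration and are not part of the original notebook.

```python
# Column names in the order listed above; index 13 (MEDV) is the label.
COLUMNS = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV",
]

# The first 13 columns are features, the last one is the regression target.
FEATURE_NAMES = COLUMNS[:-1]
LABEL_NAME = COLUMNS[-1]


def column_index(name):
    """Return the zero-based column index for a column name (hypothetical helper)."""
    return COLUMNS.index(name)


print(column_index("RM"))
print(len(FEATURE_NAMES), LABEL_NAME)
```

With such a lookup, the rooms-per-dwelling value of a parsed row could be read as `row[column_index("RM")]` instead of a bare `row[5]`.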

In [17]:
import urllib.request

def load_data_from_github():
    with urllib.request.urlopen('https://raw.githubusercontent.com/xeloo/dll-7/master/housing/housing.csv') as f:
        return parse_data(f)

def load_data():
    with open('housing.csv') as f:
        return parse_data(f)

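def parse_data(f):
    # Each row holds whitespace-separated numbers; the last one is the label (MEDV).
    features = []
    labels = []
    for line in f:
        fields = line.split()
        if not fields:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        features.append([float(n) for n in fields[:-1]])
        labels.append(float(fields[-1]))
    return features, labels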
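
To see what the parser returns without downloading anything, it can be run on an in-memory file. The sketch below repeats the notebook's `parse_data` with a guard for blank lines, so that it is self-contained. The two sample rows are made-up values with only three feature columns, not rows from the housing data.

```python
import io


def parse_data(f):
    # Copy of the notebook's parser: whitespace-separated floats,
    # last field is the label, blank lines are skipped.
    features = []
    labels = []
    for line in f:
        fields = line.split()
        if not fields:
            continue
        features.append([float(n) for n in fields[:-1]])
        labels.append(float(fields[-1]))
    return features, labels


# Made-up sample: three feature columns followed by one label column.
sample = io.StringIO("0.1 2.0 3.5 24.0\n0.2 0.0 7.1 21.6\n\n")
features, labels = parse_data(sample)
print(features)
print(labels)
```

The parser does not depend on the number of columns, so the same function handles the 13-feature housing rows.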

In [32]:
import torch

# features, labels = load_data()
features, labels = load_data_from_github()

features = torch.tensor(features)
labels = torch.tensor(labels)

In [33]:
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 20
NUM_EPOCHS = 20_000
LR = 1e-7

dataset = TensorDataset(features, labels)
batch_iter = DataLoader(dataset, BATCH_SIZE, shuffle=True)
model = torch.nn.Sequential(torch.nn.Linear(len(features[0]), 1))
loss = torch.nn.MSELoss(reduction='mean')
trainer = torch.optim.SGD(model.parameters(), lr=LR)

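for epoch in range(1, NUM_EPOCHS + 1):
    # One pass of mini-batch SGD over the shuffled training set
    for batch_features, batch_labels in batch_iter:
        trainer.zero_grad()
        batch_loss = loss(model(batch_features).reshape(-1), batch_labels)
        batch_loss.backward()
        trainer.step()

    # Every 1000 epochs, report the loss on the full training set
    if epoch % 1000 == 0:
        with torch.no_grad():  # evaluation only, no gradient tracking needed
            train_loss = loss(model(features).reshape(-1), labels)
        weight = model[0].weight[0].detach().numpy()
        bias = model[0].bias[0].detach().numpy()
        print(f'Epoch {epoch}, loss {train_loss}, W ({weight}), B {bias}')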


Epoch 1000, loss 67.50804901123047, W ([-0.18727596  0.09416303 -0.1659706  -0.21928729 -0.00580175 -0.1694187
  0.03244738 -0.01084442 -0.13882078  0.01088397  0.28539988  0.04264949
 -0.12453445]), B 0.21817080676555634
Epoch 2000, loss 63.10318374633789, W ([-0.1787045   0.11728089 -0.17330188 -0.21710706 -0.00479079 -0.13765936
  0.06374736 -0.01162837 -0.12226164  0.00902472  0.29705265  0.03982295
 -0.2147703 ]), B 0.22060246765613556
Epoch 3000, loss 60.1593132019043, W ([-0.17062223  0.12067683 -0.17792766 -0.21500935 -0.00380981 -0.10755022
  0.07637528 -0.01271831 -0.10658175  0.00859999  0.3086477   0.03861367
 -0.29441768]), B 0.22295576333999634
Epoch 4000, loss 57.680946350097656, W ([-0.16277708  0.12000422 -0.18036896 -0.21296851 -0.00281846 -0.07838359
  0.0848378  -0.01371609 -0.09240991  0.00892879  0.32067227  0.0382662
 -0.36359844]), B 0.22530409693717957
Epoch 5000, loss 55.815128326416016, W ([-0.15513408  0.1193818  -0.18116768 -0.21097988 -0.0018141  -0.049949