# Boston House Prices

## Columns
- 0 - **CRIM** per capita crime rate by town
- 1 - **ZN** proportion of residential land zoned for lots over 25,000 sq.ft.
- 2 - **INDUS** proportion of non-retail business acres per town
- 3 - **CHAS** Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- 4 - **NOX** nitric oxides concentration (parts per 10 million)
- 5 - **RM** average number of rooms per dwelling
- 6 - **AGE** proportion of owner-occupied units built prior to 1940
- 7 - **DIS** weighted distances to five Boston employment centres
- 8 - **RAD** index of accessibility to radial highways
- 9 - **TAX** full-value property-tax rate per $10,000
- 10 - **PTRATIO** pupil-teacher ratio by town
- 11 - **B** 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- 12 - **LSTAT** % lower status of the population
- 13 - **MEDV** median value of owner-occupied homes in $1000s (the last column, used as the label below)
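
The index-to-name mapping above can be kept in code so that columns are looked up by name rather than by position. This is only a sketch: `COLUMNS`, `FEATURE_NAMES`, `TARGET_NAME` and `COLUMN_INDEX` are names introduced here for illustration and are not used elsewhere in the notebook.

```python
# Column names in the order listed above; the last one (MEDV) is the label.
COLUMNS = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
           'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']

FEATURE_NAMES = COLUMNS[:-1]   # the 13 input columns
TARGET_NAME = COLUMNS[-1]      # 'MEDV'

# Reverse lookup: column name -> column index
COLUMN_INDEX = {name: i for i, name in enumerate(COLUMNS)}

print(COLUMN_INDEX['RM'])      # 5
print(len(FEATURE_NAMES))      # 13
```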


In [1]:
import torch
from torch.utils.data import DataLoader, TensorDataset

In [10]:
import urllib.request  # a bare `import urllib` does not reliably expose urllib.request

def load_data_from_github():
    with urllib.request.urlopen('https://raw.githubusercontent.com/xeloo/dll-7/main/housing/housing.csv') as f:
        return parse_data(f)

def load_data():
    with open('housing.csv') as f:
        return parse_data(f)

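def parse_data(f):
    # Each row is whitespace-separated: the feature columns come first,
    # and the last column (MEDV) is the label. Works for text and byte lines.
    features = []
    labels = []
    for line in f:
        fields = line.split()
        if not fields:  # skip blank lines, e.g. a trailing newline at end of file
            continue
        features.append([float(n) for n in fields[:-1]])
        labels.append(float(fields[-1]))
    return features, labels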
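
To check what `parse_data` returns without touching the network or the disk, it can be fed an in-memory file. The sketch below repeats a compact version of the parser so that it runs on its own; the two rows are made-up values in the 14-column layout, not real records from the dataset, and `sample_features` / `sample_labels` are names chosen here for illustration.

```python
import io


def parse_data(f):
    # Compact equivalent of the parser above, repeated so this snippet is self-contained.
    features, labels = [], []
    for line in f:
        fields = line.split()
        features.append([float(n) for n in fields[:-1]])
        labels.append(float(fields[-1]))
    return features, labels


# Two illustrative rows (NOT real data): 13 feature values followed by the label.
sample = io.StringIO(
    '0.1 0.0 5.0 0 0.5 6.0 50.0 4.0 1 300.0 15.0 390.0 5.0 24.0\n'
    '0.2 12.5 7.0 1 0.4 6.5 70.0 5.0 2 250.0 18.0 395.0 9.0 21.5\n'
)

sample_features, sample_labels = parse_data(sample)
print(len(sample_features), len(sample_features[0]))  # 2 13
print(sample_labels)                                  # [24.0, 21.5]
```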

In [16]:
# features, labels = load_data()
features, labels = load_data_from_github()

features = torch.tensor(features)
labels = torch.tensor(labels)

HTTPError: HTTP Error 404: Not Found
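
The 404 above means the hard-coded GitHub URL did not resolve when this cell ran. One way to keep the notebook usable in that case is to fall back to a local `housing.csv` when the download fails. The following is a minimal sketch under that assumption; `read_lines_with_fallback` is a hypothetical helper, not part of the original notebook, and it presumes a local copy of the file exists.

```python
import urllib.error
import urllib.request


def read_lines_with_fallback(url, local_path='housing.csv'):
    """Return the file's lines from `url`, or from `local_path` if the request fails."""
    try:
        with urllib.request.urlopen(url) as response:
            text = response.read().decode('utf-8')
    except urllib.error.URLError as err:
        # HTTPError (e.g. a 404) is a subclass of URLError, so it is caught here too.
        print(f'Download failed ({err}); reading {local_path} instead')
        with open(local_path) as f:
            text = f.read()
    return text.splitlines()
```

The returned list of lines can be passed straight to `parse_data`, which only needs an iterable of lines.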

In [15]:
BATCH_SIZE = 20
NUM_EPOCHS = 200
LR = 1e-7

dataset = TensorDataset(features, labels)
batch_iter = DataLoader(dataset, BATCH_SIZE, shuffle=True)
model = torch.nn.Sequential(torch.nn.Linear(len(features[0]), 1))
loss = torch.nn.MSELoss(reduction='mean')
trainer = torch.optim.SGD(model.parameters(), lr=LR)

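for epoch in range(1, NUM_EPOCHS + 1):
    for batch_features, batch_labels in batch_iter:
        trainer.zero_grad()
        # The model output has shape (batch, 1); flatten it to match the labels' shape (batch,)
        batch_loss = loss(model(batch_features).reshape(-1), batch_labels)
        batch_loss.backward()
        trainer.step()

    if epoch % 100 == 0:
        # Report the loss on the full training set; no gradients are needed here
        with torch.no_grad():
            train_loss = loss(model(features).reshape(-1), labels)
        print(f'Epoch {epoch}, loss {train_loss}, W ({model[0].weight.data.numpy()}), B {model[0].bias.data.numpy()}')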


Epoch 100, loss 103.60211944580078, W ([[ 0.05708145  0.02877379 -0.03246138  0.26366943  0.12525679 -0.19989628
  -0.16121057 -0.20068854  0.1839577   0.01257791 -0.06925323  0.08150072
  -0.00768908]]), B [-0.22627367]
Epoch 200, loss 94.06243133544922, W ([[ 0.05112356  0.03825352 -0.03191194  0.26392016  0.12550278 -0.1949134
  -0.11426363 -0.20005128  0.18086499  0.00749515 -0.06435675  0.07822218
  -0.01430069]]), B [-0.22578734]
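
The printed loss is a mean squared error, and MEDV is expressed in $1000s, so its square root gives a typical prediction error in the same units. The sketch below converts the two loss values printed above; `rmse_in_dollars` is a helper name introduced here for illustration.

```python
import math


def rmse_in_dollars(mse_loss):
    """Convert an MSE measured in ($1000)^2 into an RMSE in dollars."""
    return math.sqrt(mse_loss) * 1000


# Loss values copied from the training output above
for epoch, mse in [(100, 103.60211944580078), (200, 94.06243133544922)]:
    print(f'Epoch {epoch}: RMSE of about ${rmse_in_dollars(mse):,.0f}')
```

This works out to roughly $10,200 after 100 epochs and roughly $9,700 after 200, measured on the training data itself.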
