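# Julia Machine Learning: Linear Regression with GLM

## Assignment 027: Boston Housing Price Prediction Dataset

Use a model from GLM to build a predictive model for Boston housing prices.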

In [1]:
using GLM, RDatasets, MLDataUtils

## Loading the Data

In [2]:
boston = dataset("MASS", "Boston")
first(boston, 10)

| Row | Crim | Zn | Indus | Chas | NOx | Rm | Age | Dis | Rad | Tax |
|---|---|---|---|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Int64 | Float64 | Float64 | Float64 | Float64 | Int64 | Int64 |
| 1 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 |
| 2 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 |
| 3 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 |
| 4 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 |
| 5 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 |
| 6 | 0.02985 | 0.0 | 2.18 | 0 | 0.458 | 6.43 | 58.7 | 6.0622 | 3 | 222 |
| 7 | 0.08829 | 12.5 | 7.87 | 0 | 0.524 | 6.012 | 66.6 | 5.5605 | 5 | 311 |
| 8 | 0.14455 | 12.5 | 7.87 | 0 | 0.524 | 6.172 | 96.1 | 5.9505 | 5 | 311 |
| 9 | 0.21124 | 12.5 | 7.87 | 0 | 0.524 | 5.631 | 100.0 | 6.0821 | 5 | 311 |
| 10 | 0.17004 | 12.5 | 7.87 | 0 | 0.524 | 6.004 | 85.9 | 6.5921 | 5 | 311 |


## Transforming the Data

In [3]:
boston[!, :LogMedV] = log.(boston[!, :MedV])

506-element Array{Float64,1}:
 3.1780538303479458
 3.0726933146901194
 3.5467396869528134
 3.5085558999826545
 3.5890591188317256
 3.3568971227655755
 3.131136910560194
 3.299533727885655
 2.803360380906535
 2.9391619220655967
 2.70805020110221
 2.9391619220655967
 3.077312260546414
 ⋮
 3.1986731175506815
 3.139832617527748
 2.9806186357439426
 2.9069010598473755
 3.054001181677967
 2.8622008809294686
 2.8213788864092133
 3.109060958860994
 3.0252910757955354
 3.173878458937465
 3.091042453358316
 2.4765384001174837

## Training and Test Sets

In [4]:
# Shuffle the row indices, then split them 80/20 into training and test indices.
indices = MLDataUtils.shuffleobs(collect(1:nrow(boston)))
train_ind, test_ind = MLDataUtils.splitobs(indices, at = 0.8);
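
Because `shuffleobs` draws a new random permutation on every run, the rows that land in each set, and every number reported below, will differ between runs. The following is a minimal sketch of a reproducible 80/20 split using only Julia's standard library. The helper name `split_indices`, the seed value `42`, and the variable names `train_rows` and `test_rows` are assumptions for illustration and are not part of the original notebook.

```julia
using Random

# Hypothetical helper: shuffle 1:n with a seeded RNG and split at fraction `at`.
function split_indices(n::Integer; at::Real = 0.8, seed::Integer = 42)
    rng = MersenneTwister(seed)            # seeded generator, so the shuffle is repeatable
    idx = shuffle(rng, collect(1:n))       # random permutation of the row indices
    n_train = round(Int, at * n)           # number of rows assigned to the training set
    return idx[1:n_train], idx[n_train+1:end]
end

# 506 is the number of rows in the Boston data shown above.
train_rows, test_rows = split_indices(506)
```

With 506 rows and `at = 0.8`, this yields 405 training rows and 101 test rows. The resulting index vectors could be used in place of `train_ind` and `test_ind` when subsetting the data frame.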

In [5]:
train = boston[train_ind, :]
test = boston[test_ind, :]

| Row | Crim | Zn | Indus | Chas | NOx | Rm | Age | Dis | Rad | Tax |
|---|---|---|---|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Int64 | Float64 | Float64 | Float64 | Float64 | Int64 | Int64 |
| 1 | 0.21038 | 20.0 | 3.33 | 0 | 0.4429 | 6.812 | 32.2 | 4.1007 | 5 | 216 |
| 2 | 0.00906 | 90.0 | 2.97 | 0 | 0.4 | 7.088 | 20.8 | 7.3073 | 1 | 285 |
| 3 | 0.06162 | 0.0 | 4.39 | 0 | 0.442 | 5.898 | 52.3 | 8.0136 | 3 | 352 |
| 4 | 88.9762 | 0.0 | 18.1 | 0 | 0.671 | 6.968 | 91.9 | 1.4165 | 24 | 666 |
| 5 | 0.01965 | 80.0 | 1.76 | 0 | 0.385 | 6.23 | 31.5 | 9.0892 | 1 | 241 |
| 6 | 0.06129 | 20.0 | 3.33 | 1 | 0.4429 | 7.645 | 49.7 | 5.2119 | 5 | 216 |
| 7 | 0.09103 | 0.0 | 2.46 | 0 | 0.488 | 7.155 | 92.2 | 2.7006 | 3 | 193 |
| 8 | 0.11132 | 0.0 | 27.74 | 0 | 0.609 | 5.983 | 83.5 | 2.1099 | 4 | 711 |
| 9 | 0.34006 | 0.0 | 21.89 | 0 | 0.624 | 6.458 | 98.9 | 2.1185 | 4 | 437 |
| 10 | 0.06047 | 0.0 | 2.46 | 0 | 0.488 | 6.153 | 68.8 | 3.2797 | 3 | 193 |


## Fitting a Linear Regression Model

In [6]:
model = lm(@formula(LogMedV ~ Crim + Chas + NOx + Rm + Dis + Rad + Tax + PTRatio + Black + LStat), train)

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

LogMedV ~ 1 + Crim + Chas + NOx + Rm + Dis + Rad + Tax + PTRatio + Black + LStat

Coefficients:
───────────────────────────────────────────────────────────────────────────────────────
                 Estimate   Std. Error    t value  Pr(>|t|)     Lower 95%     Upper 95%
───────────────────────────────────────────────────────────────────────────────────────
(Intercept)   4.13296      0.226896      18.2152     <1e-53   3.68689       4.57904
Crim         -0.00823228   0.00181616    -4.53279    <1e-5   -0.0118029    -0.0046617
Chas          0.107854     0.0373998      2.88381    0.0041   0.0343258     0.181382
NOx          -0.698663     0.151816      -4.60205    <1e-5   -0.997132     -0.400193
Rm            0.0842509    0.0177436      4.74825    <1e-5    0.049367      0.119135
Dis          -0.0418938    0.00708269    -5.9

## Prediction

In [7]:
exp.(predict(model, test))

101-element Array{Float64,1}:
 37.16698000821248
 27.88827621653114
 17.273591211359612
  9.708507428065937
 18.37646209147955
 44.4133115271711
 34.253570122355406
 15.461026650861474
 18.13499688456796
 23.742390244559296
  8.496982534938871
 24.64921505177357
 36.913262548995384
  ⋮
 25.082619670017458
 11.17036776171758
 16.639479297863645
 20.659693746874623
 31.245948400597115
 11.779712764226371
 30.64610634228063
 26.848530034619174
  8.413988374221933
 23.226916628776465
 21.47384235527036
 17.997012974313197
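
The predictions are passed through `exp` because the model was fit on `log(MedV)`, so exponentiating maps them back to the original price scale. One common way to summarize test error on that scale is the root-mean-square error (RMSE). The sketch below assumes a hypothetical helper `rmse` and uses small made-up vectors in place of the real predictions so that it runs on its own; none of the toy values come from the Boston data.

```julia
using Statistics

# Hypothetical helper: root-mean-square error between predictions and targets.
rmse(pred, actual) = sqrt(mean((pred .- actual) .^ 2))

# Toy values for illustration only (not taken from the Boston data).
log_pred = log.([20.0, 30.0, 10.0])   # stand-in for model output on the log scale
actual   = [22.0, 27.0, 10.0]         # stand-in for observed prices

pred = exp.(log_pred)                 # back-transform to the price scale
err  = rmse(pred, actual)             # error in the same units as the prices
```

In this notebook the equivalent call would be `rmse(exp.(predict(model, test)), test[!, :MedV])`, which compares the back-transformed predictions with the held-out `MedV` values.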

## Evaluating the Model

In [8]:
adjr²(model)

0.7862249645410637
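
Adjusted R² discounts the ordinary R² for the number of predictors, so adding uninformative variables does not inflate the score. As a sketch of the textbook definition for a model with an intercept, the hypothetical helper `adjusted_r2` below computes 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors excluding the intercept. The input values are illustrative assumptions, not results from this notebook.

```julia
# Hypothetical helper implementing the textbook adjusted R²:
#   1 - (1 - R²) * (n - 1) / (n - p - 1)
# n = number of observations, p = number of predictors (intercept excluded).
function adjusted_r2(r2::Real, n::Integer, p::Integer)
    n > p + 1 || throw(ArgumentError("need n > p + 1 observations"))
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
end

# Illustrative inputs: R² = 0.80, 405 observations, 10 predictors.
example = adjusted_r2(0.80, 405, 10)
```

The penalty grows as p approaches n, and the result never exceeds the unadjusted R² when p is at least 1.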