# Ridge regression

## Using MLJ

In [1]:
using MLJ, RDatasets

### Load data

In [2]:
boston = RDatasets.dataset("MASS", "Boston")
first(boston, 6)

| Row | Crim | Zn | Indus | Chas | NOx | Rm | Age | Dis | Rad | Tax |
|---|---|---|---|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Int64 | Float64 | Float64 | Float64 | Float64 | Int64 | Int64 |
| 1 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 |
| 2 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 |
| 3 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 |
| 4 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 |
| 5 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 |
| 6 | 0.02985 | 0.0 | 2.18 | 0 | 0.458 | 6.43 | 58.7 | 6.0622 | 3 | 222 |


### Casting scientific types

In [3]:
y, X = unpack(boston, ==(:MedV), colname -> true);
first(X, 6)

| Row | Crim | Zn | Indus | Chas | NOx | Rm | Age | Dis | Rad | Tax |
|---|---|---|---|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Int64 | Float64 | Float64 | Float64 | Float64 | Int64 | Int64 |
| 1 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 |
| 2 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 |
| 3 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 |
| 4 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 |
| 5 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 |
| 6 | 0.02985 | 0.0 | 2.18 | 0 | 0.458 | 6.43 | 58.7 | 6.0622 | 3 | 222 |


In [4]:
first(X, 6) |> pretty

┌────────────┬────────────┬────────────┬───────┬────────────┬────────────┬──── ⋯
│ Crim       │ Zn         │ Indus      │ Chas  │ NOx        │ Rm         │ Age ⋯
│ Float64    │ Float64    │ Float64    │ Int64 │ Float64    │ Float64    │ Flo ⋯
│ Continuous │ Continuous │ Continuous │ Count │ Continuous │ Continuous │ Con ⋯
├────────────┼────────────┼────────────┼───────┼────────────┼────────────┼──── ⋯
│ 0.00632    │ 18.0       │ 2.31       │ 0.0   │ 0.538      │ 6.575      │ 65. ⋯
│ 0.02731    │ 0.0        │ 7.07       │ 0.0   │

In [5]:
X = coerce(X, autotype(X, rules=(:discrete_to_continuous,)))
first(X, 6)

| Row | Crim | Zn | Indus | Chas | NOx | Rm | Age | Dis | Rad |
|---|---|---|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 |
| 2 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 |
| 3 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 |
| 4 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 |
| 5 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 |
| 6 | 0.02985 | 0.0 | 2.18 | 0.0 | 0.458 | 6.43 | 58.7 | 6.0622 | 3.0 |


### Training/testing set

In [6]:
train, test = partition(eachindex(y), 0.7, shuffle=true)

([255, 72, 476, 426, 325, 504, 155, 254, 247, 299  …  201, 382, 193, 9, 471, 212, 423, 452, 279, 459], [128, 7, 422, 46, 232, 65, 444, 329, 365, 122  …  393, 499, 367, 81, 96, 465, 134, 251, 317, 357])
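The split above is unseeded, so the indices change on every run. As a rough illustration of what a shuffled 70/30 split involves, here is a minimal sketch using only the `Random` standard library. The helper `split_indices` is hypothetical and not part of MLJ, and the rounding rule is an assumption; it is not a description of how `partition` is implemented.

```julia
using Random

# Hypothetical helper (not an MLJ function): shuffle 1:n with the given RNG,
# then cut the permutation at the requested fraction.
function split_indices(rng::AbstractRNG, n::Integer, fraction::Real)
    idx = shuffle(rng, collect(1:n))   # random permutation of the row indices
    cut = round(Int, fraction * n)     # assumed rounding rule for the training size
    return idx[1:cut], idx[cut+1:end]
end

# A fixed seed makes the split repeatable; the seed value 42 is arbitrary.
train_idx, test_idx = split_indices(MersenneTwister(42), 506, 0.7)
length(train_idx), length(test_idx)    # (354, 152)
```

With 506 rows this gives 354 training and 152 test indices, the same test-set size as the prediction vector in this notebook.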

### Model

In [7]:
model = @load RidgeRegressor pkg=MLJLinearModels

RidgeRegressor(
    lambda = 1.0,
    fit_intercept = true,
    penalize_intercept = false,
    solver = nothing) @ 1…32

In [8]:
match = machine(model, X, y)

Machine{RidgeRegressor} @ 1…33


### Training

In [9]:
fit!(match, rows=train)

┌ Info: Training Machine{RidgeRegressor} @ 1…33.
└ @ MLJBase C:\Users\a504082002\.julia\packages\MLJBase\qJs1o\src\machines.jl:182


Machine{RidgeRegressor} @ 1…33


### Predict

In [10]:
ŷ = MLJ.predict(match, rows=test)

152-element Array{Float64,1}:
 15.217052283390242
 22.01135621148103
 17.76646453908079
 21.26911071558601
 32.93606280886065
 25.327841606587825
 19.167540247748445
 21.662498423911835
 41.48793795396132
 20.645349641925307
 24.948324646679673
  5.891139083639469
 30.572772126072202
  ⋮
 29.51882103788002
 27.987121029077247
  9.12650922573252
 21.69075919133258
 13.383495882561519
 28.71224521238985
 28.108047839276328
 20.553184295956253
 16.001575594037806
 25.418675611054447
 17.689178302918886
 21.620093819675205

### Evaluation

In [11]:
rms(ŷ, y[test])

5.463355765275269
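For reference, the root-mean-square error is the square root of the mean squared residual. The sketch below computes it by hand on made-up toy values; `rmse`, `ŷ_toy` and `y_toy` are illustrative names, not part of MLJ.

```julia
using Statistics

# Root-mean-square error: square root of the mean of the squared residuals.
rmse(ŷ, y) = sqrt(mean(abs2, ŷ .- y))

# Toy values, made up for illustration only.
ŷ_toy = [2.0, 4.0, 6.0]
y_toy = [1.0, 4.0, 8.0]

# Residuals are 1, 0 and -2, so the result is sqrt(5/3) ≈ 1.291.
rmse(ŷ_toy, y_toy)
```

The result is in the same units as the target, so the value above can be read directly on the scale of `MedV`.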

### View model parameters

In [12]:
coefs, intercept = fitted_params(match)
coefs

13-element Array{Float64,1}:
 -0.10027994610962947
  0.0323462650835045
 -0.06119807960814794
  3.649152571146082
 -4.883391203835472
  5.361128741051077
 -0.027397232551043378
 -1.0875535497431095
  0.16402046279194696
 -0.007924616949429912
 -0.5852599954560127
  0.011015236434199221
 -0.3993216460290642
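To give a sense of where such coefficients come from, here is a minimal closed-form sketch on toy data. It assumes the objective ‖Xθ − y‖²/2 + λ‖θ‖²/2 with an unpenalised intercept, which may differ in scaling from what `RidgeRegressor` actually minimises. The function `ridge_closed_form` and the toy arrays are hypothetical and are not expected to reproduce the numbers above.

```julia
using LinearAlgebra

# Hypothetical helper: solve the regularised normal equations directly.
# Assumed objective: ‖X1*θ - y‖²/2 + λ‖θ[1:p]‖²/2 (intercept not penalised).
function ridge_closed_form(X::AbstractMatrix, y::AbstractVector, λ::Real)
    n, p = size(X)
    X1 = hcat(X, ones(n))                # append a column of ones for the intercept
    D = Diagonal(vcat(ones(p), 0.0))     # penalise the p coefficients, not the intercept
    θ = (X1' * X1 + λ * D) \ (X1' * y)   # (X1'X1 + λD) θ = X1'y
    return θ[1:p], θ[end]
end

# Toy data generated from known coefficients [2, -1] and intercept 0.5.
X_toy = [1.0 2.0; 2.0 1.0; 3.0 4.0; 4.0 3.0; 5.0 5.0]
y_toy = X_toy * [2.0, -1.0] .+ 0.5

coefs_ols, intercept_ols = ridge_closed_form(X_toy, y_toy, 0.0)     # λ = 0: ordinary least squares
coefs_ridge, intercept_ridge = ridge_closed_form(X_toy, y_toy, 10.0) # λ = 10: shrunken coefficients
```

With λ = 0 the sketch recovers the generating coefficients up to floating-point error; increasing λ shrinks the norm of the coefficient vector while leaving the intercept unpenalised.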