# Ridge regression

## Using MLJ

In [1]:
using MLJ, RDatasets

### Load data

In [2]:
boston = RDatasets.dataset("MASS", "Boston")
first(boston, 6)

Row,Crim,Zn,Indus,Chas,NOx,Rm,Age,Dis,Rad,Tax
,Float64,Float64,Float64,Int64,Float64,Float64,Float64,Float64,Int64,Int64
1,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296
2,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242
3,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242
4,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222
5,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222
6,0.02985,0.0,2.18,0,0.458,6.43,58.7,6.0622,3,222


### Casting scientific types

In [3]:
y, X = unpack(boston, ==(:MedV), colname -> true);
first(X, 6)

Row,Crim,Zn,Indus,Chas,NOx,Rm,Age,Dis,Rad,Tax
,Float64,Float64,Float64,Int64,Float64,Float64,Float64,Float64,Int64,Int64
1,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296
2,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242
3,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242
4,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222
5,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222
6,0.02985,0.0,2.18,0,0.458,6.43,58.7,6.0622,3,222


In [4]:
first(X, 6) |> pretty

┌────────────┬────────────┬────────────┬───────┬────────────┬────────────┬──── ⋯
│ Crim       │ Zn         │ Indus      │ Chas  │ NOx        │ Rm         │ Age ⋯
│ Float64    │ Float64    │ Float64    │ Int64 │ Float64    │ Float64    │ Flo ⋯
│ Continuous │ Continuous │ Continuous │ Count │ Continuous │ Continuous │ Con ⋯
├────────────┼────────────┼────────────┼───────┼────────────┼────────────┼──── ⋯
│ 0.00632    │ 18.0       │ 2.31       │ 0.0   │ 0.538      │ 6.575      │ 65. ⋯
│ 0.02731    │ 0.0        │ 7.07       │ 0.0   │

In [5]:
X = coerce(X, autotype(X, rules=(:discrete_to_continuous,)))
first(X, 6)

Row,Crim,Zn,Indus,Chas,NOx,Rm,Age,Dis,Rad
,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64
1,0.00632,18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0
2,0.02731,0.0,7.07,0.0,0.469,6.421,78.9,4.9671,2.0
3,0.02729,0.0,7.07,0.0,0.469,7.185,61.1,4.9671,2.0
4,0.03237,0.0,2.18,0.0,0.458,6.998,45.8,6.0622,3.0
5,0.06905,0.0,2.18,0.0,0.458,7.147,54.2,6.0622,3.0
6,0.02985,0.0,2.18,0.0,0.458,6.43,58.7,6.0622,3.0


### Training/testing set

In [6]:
train, test = partition(eachindex(y), 0.7, shuffle=true)

([371, 472, 32, 59, 8, 75, 250, 11, 133, 121  …  129, 66, 200, 461, 142, 102, 374, 253, 40, 320], [190, 450, 55, 455, 314, 197, 122, 350, 256, 356  …  136, 305, 290, 232, 229, 300, 29, 251, 6, 244])
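
The split above can be pictured as a shuffle of the row indices followed by a cut at 70%. The sketch below is an illustration only, not MLJ's implementation: `split_indices` is a hypothetical helper, and rounding the training size to the nearest integer is an assumption. For the 506 rows of the Boston data it yields 354 training and 152 test indices, which is consistent with the 152 predictions in the Predict section.

```julia
using Random

# Hypothetical helper (not part of MLJ): shuffle 1:n, then cut at `fraction`.
function split_indices(n::Integer, fraction::Real; rng::AbstractRNG=Random.GLOBAL_RNG)
    idx = randperm(rng, n)              # random permutation of 1:n
    ntrain = round(Int, fraction * n)   # assumed rounding rule for the cut point
    return idx[1:ntrain], idx[ntrain+1:end]
end

# A fixed seed makes the split reproducible across runs.
train_idx, test_idx = split_indices(506, 0.7; rng=MersenneTwister(42))
(length(train_idx), length(test_idx))   # (354, 152)
```

Passing an explicit random number generator is a design choice for reproducibility; without it, every run produces a different split and therefore a different test error.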

### Model

In [7]:
model = @load RidgeRegressor pkg=MLJLinearModels

RidgeRegressor(
    lambda = 1.0,
    fit_intercept = true,
    penalize_intercept = false,
    solver = nothing) @ 8…61
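
As a reminder of what `lambda` controls: ridge regression minimizes a squared-error loss plus an L2 penalty on the coefficients. One common way to write the objective is shown below. The 1/2 scaling factors are an assumed convention, so check the MLJLinearModels documentation for the exact form used by your version. With `fit_intercept = true` and `penalize_intercept = false`, as in the output above, the intercept $b$ is fitted but left out of the penalty.

$$
\min_{\theta,\, b} \; \frac{1}{2}\,\lVert X\theta + b\,\mathbf{1} - y \rVert_2^2 \;+\; \frac{\lambda}{2}\,\lVert \theta \rVert_2^2
$$

Larger values of $\lambda$ shrink the coefficients $\theta$ toward zero; $\lambda = 0$ recovers ordinary least squares.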

In [8]:
match = machine(model, X, y)

Machine{RidgeRegressor} @ 6…38


### Training

In [9]:
fit!(match, rows=train)

┌ Info: Training Machine{RidgeRegressor} @ 6…38.
└ @ MLJBase /home/yuehhua/.julia/packages/MLJBase/O5b6j/src/machines.jl:187


Machine{RidgeRegressor} @ 6…38


### Predict

In [10]:
ŷ = MLJ.predict(match, rows=test)

152-element Array{Float64,1}:
 33.75442075227571
 18.844818142865357
 15.86352195427326
 16.327806631319355
 25.68445933452029
 35.48503112035486
 23.546774592502864
 23.87256125385063
 21.133343844667458
 18.381282624607838
 26.90960496182523
 29.332656873024444
 14.519445842951926
  ⋮
 18.622908655566526
 21.07971112973137
 19.399021954943493
 34.08938727509364
 26.609477552192395
 33.93171852276875
 36.19119271251318
 31.491193143841627
 21.242348221265107
 24.585590803592137
 25.40200969689247
 26.671274629370657

### Evaluation

In [11]:
rms(ŷ, y[test])

4.329127011345622
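
The value above is a root-mean-square error: the square root of the mean squared residual. The sketch below shows the formula on a toy example; `rmse` is a hypothetical helper written for illustration, not the MLJ function.

```julia
using Statistics

# Root-mean-square error: sqrt of the mean of squared residuals.
rmse(ŷ, y) = sqrt(mean(abs2, ŷ .- y))

# Toy data: residuals are 0, 0 and -2, so the mean squared error is 4/3.
toy_error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because the error is in the same units as the target, the 4.33 reported above reads directly in units of `MedV`.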

### View model parameters

In [12]:
coefs, intercept = fitted_params(match)
coefs

13-element Array{Pair{Symbol,Float64},1}:
    :Crim => -0.11780029949322966
      :Zn => 0.055558632922787524
   :Indus => 0.05511214911088222
    :Chas => 2.1549069063045105
     :NOx => -6.401888402053627
      :Rm => 5.142170086807805
     :Age => 0.003307246228248168
     :Dis => -1.1707498937513303
     :Rad => 0.28855929116455653
     :Tax => -0.01457888864025983
 :PTRatio => -0.608630261655872
   :Black => 0.012784703609545563
   :LStat => -0.49976568964713336
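
For intuition about where such coefficients come from, here is a minimal closed-form sketch on synthetic data. It is not how MLJLinearModels is implemented. `ridge_fit` is a hypothetical helper, and the objective scaling (squared error over 2 plus `λ` times the squared coefficient norm over 2, with an unpenalized intercept) is an assumption.

```julia
using LinearAlgebra, Random

# Hypothetical helper: closed-form ridge fit with an unpenalized intercept.
# Minimizes ‖Xθ + b - y‖²/2 + λ‖θ‖²/2 (assumed convention).
function ridge_fit(X::AbstractMatrix, y::AbstractVector, λ::Real)
    n, p = size(X)
    A = hcat(X, ones(n))                     # last column carries the intercept
    P = Matrix{Float64}(λ * I, p + 1, p + 1) # penalty matrix λI ...
    P[end, end] = 0.0                        # ... with no penalty on the intercept
    β = (A' * A + P) \ (A' * y)              # penalized normal equations
    return β[1:p], β[end]
end

# Synthetic data with known coefficients [2, -1, 0.5] and intercept 4.
rng = MersenneTwister(0)
Xs = randn(rng, 200, 3)
ys = Xs * [2.0, -1.0, 0.5] .+ 4.0 .+ 0.1 .* randn(rng, 200)

θ, b = ridge_fit(Xs, ys, 1.0)          # mild penalty: close to the true values
θ_big, _ = ridge_fit(Xs, ys, 1000.0)   # strong penalty: coefficients shrink
```

Comparing `θ` with `θ_big` shows the effect of `lambda`: the stronger penalty pulls the coefficient vector toward zero while the intercept stays free.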