# LASSO regression
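
For reference, the LASSO estimate is commonly written as the minimizer of a squared-error loss plus an L1 penalty on the coefficients. The form below is the usual textbook convention, with an unpenalized intercept $\beta_0$. The scaling of the loss and of $\lambda$ differs between packages, so read it as a sketch rather than as the exact objective either package below uses.

$$
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta} \; \frac{1}{2n} \sum_{i=1}^{n} \bigl(y_i - \beta_0 - x_i^\top \beta\bigr)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
$$

Larger values of $\lambda$ shrink more coefficients to exactly zero, which is why LASSO also acts as a variable selector.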

## Using Lasso

In [1]:
using Lasso, RDatasets, MLDataUtils

### Load data

In [2]:
boston = RDatasets.dataset("MASS", "Boston")
first(boston, 6)

Row,Crim,Zn,Indus,Chas,NOx,Rm,Age,Dis,Rad,Tax
,Float64,Float64,Float64,Int64,Float64,Float64,Float64,Float64,Int64,Int64
1,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296
2,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242
3,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242
4,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222
5,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222
6,0.02985,0.0,2.18,0,0.458,6.43,58.7,6.0622,3,222


### Training/testing set

In [3]:
indices = MLDataUtils.shuffleobs(collect(1:nrow(boston)))
train_ind, test_ind = MLDataUtils.splitobs(indices, at=0.8);

In [4]:
train = boston[train_ind, :]
test = boston[test_ind, :];
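
Conceptually, the two calls above permute the row indices and cut the permutation at 80%. Below is a minimal standard-library sketch of that idea. `shuffled_split` is a hypothetical helper, not part of MLDataUtils, and the rounding rule at the cut point is an assumption.

```julia
using Random

# Hypothetical helper (not an MLDataUtils function): shuffle 1:n and cut at fraction `at`.
function shuffled_split(n::Integer; at::Real=0.8, rng::AbstractRNG=Random.default_rng())
    idx = shuffle(rng, 1:n)        # random permutation of the row indices
    ncut = round(Int, at * n)      # assumed rounding rule for the cut point
    return idx[1:ncut], idx[ncut+1:end]
end

# The Boston data has 506 rows; a fixed seed makes the split reproducible.
train_idx, test_idx = shuffled_split(506; at=0.8, rng=MersenneTwister(42))
println(length(train_idx), " training rows, ", length(test_idx), " test rows")
```

Passing an explicit random number generator is a design choice that keeps the split reproducible between runs.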

### Model

In [5]:
model = fit(LassoModel,
    @formula(MedV ~ Crim + Zn + Indus + Chas + NOx + Rm + Age + Dis + Rad + Tax + PTRatio + Black + LStat), train)

LassoModel using MinAICc(2) segment of the regularization path.

Coefficients:
─────────────────
         Estimate
─────────────────
x1    33.6991
x2    -0.0901501
x3     0.0456008
x4     0.0
x5     2.89879
x6   -16.4004
x7     4.0931
x8     0.0
x9    -1.42865
x10    0.220366
x11   -0.010076
x12   -0.945803
x13    0.00874348
x14   -0.493709
─────────────────
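
Two of the fitted coefficients above are exactly `0.0`, which is the characteristic effect of the L1 penalty. The sketch below illustrates the mechanism with a plain cyclic coordinate descent on the objective `(1/2n)‖y − β0 − Xβ‖² + λ‖β‖₁`. It is an illustration only, not Lasso.jl's code: `soft_threshold`, `lasso_cd`, the toy data, and the fixed iteration count are all assumptions made for the example.

```julia
using LinearAlgebra, Statistics

# Soft-thresholding: shrink z toward zero by γ, returning exactly zero when |z| <= γ.
soft_threshold(z, γ) = z > γ ? z - γ : (z < -γ ? z + γ : zero(z))

# Hypothetical helper: cyclic coordinate descent for
#   (1/2n) * ||y - β0 - Xβ||² + λ * ||β||₁   (intercept β0 is not penalized)
function lasso_cd(X::AbstractMatrix, y::AbstractVector, λ::Real; iters::Integer=1000)
    n, p = size(X)
    β = zeros(p)
    β0 = mean(y)
    r = y .- β0                      # residual for the current (β0, β)
    for _ in 1:iters
        for j in 1:p
            xj = view(X, :, j)
            r .+= xj .* β[j]         # take feature j's contribution out of the fit
            ρ = dot(xj, r) / n       # correlation of feature j with the partial residual
            β[j] = soft_threshold(ρ, λ) / (dot(xj, xj) / n)
            r .-= xj .* β[j]         # put the updated contribution back in
        end
        δ = mean(r)                  # intercept update: re-centre the residual
        β0 += δ
        r .-= δ
    end
    return β0, β
end

# Toy data: y depends only on the first column.
X = [1.0 1.0; 2.0 -1.0; 3.0 1.0; 4.0 -1.0]
y = 2 .* X[:, 1]
β0, β = lasso_cd(X, y, 0.5)
println("intercept = ", β0, ", coefficients = ", β)
```

On this toy problem the coefficient of the irrelevant second column is set to exactly zero, while the first is shrunk from its unpenalized value of 2 to about 1.6.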


### Prediction

In [6]:
predict(model)

405-element Array{Float64,1}:
 27.752020451183977
 21.59252478272552
 20.671747131917282
 24.118865630632897
 20.582304142058305
 31.029472416364648
 20.94414136737522
 17.899650100562912
 19.00507055075345
 37.04001902169538
 23.245148065028637
 23.928919139630473
 19.76534956949493
  ⋮
 42.62485230644667
 19.50268223125579
 19.39546884999927
 17.419972655856235
 37.632685471412344
 14.191526815145519
 22.206978391075197
 37.667079042304785
  9.523972979410038
 23.864934702026268
 24.52991781120857
 27.89109519735464

## Using MLJ

In [7]:
using MLJ



### Casting scientific types

In [8]:
y, X = unpack(boston, ==(:MedV), colname -> true);
first(X, 6)

Row,Crim,Zn,Indus,Chas,NOx,Rm,Age,Dis,Rad,Tax
,Float64,Float64,Float64,Int64,Float64,Float64,Float64,Float64,Int64,Int64
1,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296
2,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242
3,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242
4,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222
5,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222
6,0.02985,0.0,2.18,0,0.458,6.43,58.7,6.0622,3,222


In [9]:
first(X, 6) |> pretty

┌────────────────────────────┬────────────────────────────┬─────────────────── ⋯
│ Crim                       │ Zn                         │ Indus              ⋯
│ Float64                    │ Float64                    │ Float64            ⋯
│ ScientificTypes.Continuous │ ScientificTypes.Continuous │ ScientificTypes.Co ⋯
├────────────────────────────┼────────────────────────────┼─────────────────── ⋯
│ 0.00632                    │ 18.0                       │ 2.31               ⋯
│ 0.02731                    │ 0.0                        │ 7.07               ⋯
│ 0.02729                    │ 0.0                        │ 7.07               ⋯
│ 0.03237                    │ 0.0                        │ 2.18               ⋯
│ 0.06905

In [10]:
X = coerce(X, autotype(X, rules=(:discrete_to_continuous,)))
# X = coerce(X, :Chas => MLJ.Continuous, :Rad => MLJ.Continuous, :Tax => MLJ.Continuous)
first(X, 6)

Row,Crim,Zn,Indus,Chas,NOx,Rm,Age,Dis,Rad
,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64
1,0.00632,18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0
2,0.02731,0.0,7.07,0.0,0.469,6.421,78.9,4.9671,2.0
3,0.02729,0.0,7.07,0.0,0.469,7.185,61.1,4.9671,2.0
4,0.03237,0.0,2.18,0.0,0.458,6.998,45.8,6.0622,3.0
5,0.06905,0.0,2.18,0.0,0.458,7.147,54.2,6.0622,3.0
6,0.02985,0.0,2.18,0.0,0.458,6.43,58.7,6.0622,3.0


### Training/testing set

In [11]:
train, test = partition(eachindex(y), 0.7, shuffle=true)

([29, 169, 108, 265, 159, 158, 59, 122, 302, 105  …  133, 82, 307, 17, 429, 464, 295, 249, 432, 2], [308, 325, 346, 53, 397, 468, 153, 165, 181, 315  …  258, 76, 282, 71, 331, 54, 63, 423, 204, 465])

### Model

In [12]:
model = @load LassoRegressor pkg=MLJLinearModels

LassoRegressor(
    lambda = 1.0,
    fit_intercept = true,
    penalize_intercept = false,
    solver = nothing) @ 9…58

In [13]:
match = machine(model, X, y)

Machine{LassoRegressor} @ 1…83


### Training

In [14]:
fit!(match, rows=train)

┌ Info: Training Machine{LassoRegressor} @ 1…83.
└ @ MLJBase /home/yuehhua/.julia/packages/MLJBase/O5b6j/src/machines.jl:187
└ @ MLJLinearModels /home/yuehhua/.julia/packages/MLJLinearModels/CcbwD/src/fit/proxgrad.jl:64


Machine{LassoRegressor} @ 1…83


### Prediction

In [15]:
ŷ = MLJ.predict(match, rows=test)

152-element Array{Float64,1}:
 32.11530869304262
 26.0724510137471
 20.608349361198176
 25.97220223000138
 20.88871052087215
 17.462522176049067
 18.907501532748036
 21.677195017837096
 30.610023510379676
 27.04583128861639
 31.30811352589208
 26.748596200294866
 25.41468700286563
  ⋮
 17.93660134176985
 21.829839116846056
 33.113267947188426
 22.255873127337704
 27.962330409175202
 21.986123614712017
 18.369908251888
 22.826954106185987
 31.115898487386985
 19.741700408739543
 36.513513387911665
 22.140502451646594

### Evaluation

In [16]:
rms(ŷ, y[test])

5.9351483048236835
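
The score above is in the units of the target. Assuming `rms` is the usual root-mean-squared error, it can be sketched with the standard library as follows. `rmse` is a hypothetical helper name and the numbers are toy values, not taken from the Boston data.

```julia
using Statistics

# Hypothetical helper: square root of the mean squared difference
# between predictions ŷ and observed values y.
rmse(ŷ, y) = sqrt(mean(abs2, ŷ .- y))

# Toy check: only the last prediction is off, by 3, so the result is sqrt(9 / 3).
toy_score = rmse([2.0, 4.0, 6.0], [2.0, 4.0, 9.0])
println(toy_score)
```

Because the errors are squared before averaging, a few large misses raise this score more than many small ones.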

### View model parameters

In [17]:
coefs, intercept = fitted_params(match)
coefs

13-element Array{Pair{Symbol,Float64},1}:
    :Crim => -0.13774540096392754
      :Zn => 0.09744794499844583
   :Indus => 0.04743046557190644
    :Chas => 0.16870104133046204
     :NOx => 0.07888240302941212
      :Rm => 1.6769195516397581
     :Age => 0.08493152015487405
     :Dis => -0.045530907543701295
     :Rad => 0.24386518278638805
     :Tax => -0.013329468899205818
 :PTRatio => 0.5394016798529406
   :Black => 0.02051275165213367
   :LStat => -0.7336475888699403