# LASSO regression

## Using Lasso

In [1]:
using Lasso, RDatasets, MLDataUtils

### Load data

In [2]:
boston = RDatasets.dataset("MASS", "Boston")
first(boston, 6)

| Row | Crim | Zn | Indus | Chas | NOx | Rm | Age | Dis | Rad | Tax |
|-----|------|----|-------|------|-----|----|-----|-----|-----|-----|
| | Float64 | Float64 | Float64 | Int64 | Float64 | Float64 | Float64 | Float64 | Int64 | Int64 |
| 1 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 |
| 2 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 |
| 3 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 |
| 4 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 |
| 5 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 |
| 6 | 0.02985 | 0.0 | 2.18 | 0 | 0.458 | 6.43 | 58.7 | 6.0622 | 3 | 222 |


### Training/testing set

In [3]:
indices = MLDataUtils.shuffleobs(collect(1:nrow(boston)))
train_ind, test_ind = MLDataUtils.splitobs(indices, at=0.8);

In [4]:
train = boston[train_ind, :]
test = boston[test_ind, :];
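
The shuffle-then-split step above can be sketched with the standard library alone, which makes it easier to see what the two index sets contain. `shuffle_split` is a hypothetical helper, not part of MLDataUtils, and rounding the cut point with `round` is an assumption made here so that 506 rows at `at = 0.8` give 405 training rows.

```julia
using Random

# Hypothetical helper (not an MLDataUtils function): shuffle 1:n, then cut at `at`.
function shuffle_split(rng::AbstractRNG, n::Integer; at::Real = 0.8)
    idx = shuffle(rng, 1:n)        # random permutation of the row indices
    ntrain = round(Int, at * n)    # assumed rounding rule for the cut point
    return idx[1:ntrain], idx[ntrain+1:end]
end

# 506 is the number of rows in the Boston data set.
train_idx, test_idx = shuffle_split(MersenneTwister(42), 506; at = 0.8)
```

Passing an explicit random number generator keeps the split reproducible across runs.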

### Model

In [5]:
model = fit(LassoModel,
    @formula(MedV ~ Crim + Zn + Indus + Chas + NOx + Rm + Age + Dis + Rad + Tax + PTRatio + Black + LStat), train)

LassoModel using MinAICc(2) segment of the regularization path.

Coefficients:
─────────────────
         Estimate
─────────────────
x1    35.5457
x2    -0.120674
x3     0.0365959
x4    -0.00453331
x5     1.44784
x6   -18.1414
x7     4.19204
x8     0.0
x9    -1.44372
x10    0.239212
x11   -0.00997407
x12   -0.988778
x13    0.00706187
x14   -0.501752
─────────────────
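
One coefficient in the table above is exactly `0.0`, which is the characteristic effect of the L1 penalty. The sketch below illustrates that effect with a plain cyclic coordinate descent on the objective (1/2n)‖y − b0 − Xb‖² + λ‖b‖₁. It is not Lasso.jl's implementation: `soft_threshold` and `lasso_cd` are hypothetical names, the data are synthetic, and a single fixed λ is used instead of a regularization path with AICc selection.

```julia
using LinearAlgebra, Random, Statistics

# Soft-thresholding: shrink z toward zero by γ, returning exactly 0 when |z| <= γ.
soft_threshold(z, γ) = sign(z) * max(abs(z) - γ, 0)

# Hypothetical helper: cyclic coordinate descent for
#   (1/2n)‖y - b0 - X*b‖² + λ‖b‖₁   with an unpenalized intercept b0.
function lasso_cd(X::AbstractMatrix, y::AbstractVector, λ::Real; iters::Int = 200)
    n, p = size(X)
    μ = vec(mean(X, dims = 1))
    Xc = X .- μ'                  # centre the columns so the intercept decouples
    yc = y .- mean(y)
    b = zeros(p)
    r = copy(yc)                  # residual yc - Xc*b (b starts at zero)
    for _ in 1:iters, j in 1:p
        xj = view(Xc, :, j)
        v = dot(xj, xj) / n       # scaled squared norm of column j
        ρ = dot(xj, r) / n + b[j] * v   # covariance with the partial residual
        bj = soft_threshold(ρ, λ) / v
        r .-= xj .* (bj - b[j])   # keep the residual in sync with the new b[j]
        b[j] = bj
    end
    b0 = mean(y) - dot(μ, b)      # recover the intercept on the original scale
    return b0, b
end

# Synthetic data: only the first column influences y.
rng = MersenneTwister(1)
X = randn(rng, 200, 3)
y = 1.0 .+ 2.0 .* X[:, 1] .+ 0.1 .* randn(rng, 200)
b0, b = lasso_cd(X, y, 0.2)
```

With this setup the coefficient on the first column is shrunk below its true value of 2, while the two irrelevant columns are typically set to exactly zero rather than merely made small.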


### Prediction

In [6]:
predict(model)

405-element Array{Float64,1}:
 24.338883895359977
 35.23638223205804
 14.359030934640193
 30.30466339689464
 24.37197276392295
 17.49238762819723
 11.868696245591039
  9.553004384303525
 15.08646568648892
 15.876649737378907
 28.643294908666473
 31.360788549815965
 21.422336361071665
  ⋮
 40.7290018314796
 27.448872794987025
  8.96614303664683
 35.06115007350472
 20.187362267528712
 27.71703316919858
 33.11804168819215
 13.561358907105879
 29.125522572551255
 17.11812971031536
 39.81074560724911
 16.8968216429447

## Using MLJ

In [7]:
using MLJ



### Casting scientific types

In [8]:
y, X = unpack(boston, ==(:MedV), colname -> true);
first(X, 6)

| Row | Crim | Zn | Indus | Chas | NOx | Rm | Age | Dis | Rad | Tax |
|-----|------|----|-------|------|-----|----|-----|-----|-----|-----|
| | Float64 | Float64 | Float64 | Int64 | Float64 | Float64 | Float64 | Float64 | Int64 | Int64 |
| 1 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 |
| 2 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 |
| 3 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 |
| 4 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 |
| 5 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 |
| 6 | 0.02985 | 0.0 | 2.18 | 0 | 0.458 | 6.43 | 58.7 | 6.0622 | 3 | 222 |


In [9]:
first(X, 6) |> pretty

┌────────────────────────────┬────────────────────────────┬─────────────────── ⋯
│ Crim                       │ Zn                         │ Indus              ⋯
│ Float64                    │ Float64                    │ Float64            ⋯
│ ScientificTypes.Continuous │ ScientificTypes.Continuous │ ScientificTypes.Co ⋯
├────────────────────────────┼────────────────────────────┼─────────────────── ⋯
│ 0.00632                    │ 18.0                       │ 2.31               ⋯
│ 0.02731                    │ 0.0                        │ 7.07               ⋯
│ 0.02729                    │ 0.0                        │ 7.07               ⋯
│ 0.03237                    │ 0.0                        │ 2.18               ⋯
│ 0.06905

In [10]:
X = coerce(X, autotype(X, rules=(:discrete_to_continuous,)))
# X = coerce(X, :Chas => MLJ.Continuous, :Rad => MLJ.Continuous, :Tax => MLJ.Continuous)
first(X, 6)

| Row | Crim | Zn | Indus | Chas | NOx | Rm | Age | Dis | Rad |
|-----|------|----|-------|------|-----|----|-----|-----|-----|
| | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 |
| 2 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 |
| 3 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 |
| 4 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 |
| 5 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 |
| 6 | 0.02985 | 0.0 | 2.18 | 0.0 | 0.458 | 6.43 | 58.7 | 6.0622 | 3.0 |


### Training/testing set

In [11]:
train, test = partition(eachindex(y), 0.7, shuffle=true)

([94, 164, 71, 383, 42, 166, 102, 489, 413, 459  …  169, 109, 266, 281, 138, 29, 480, 337, 469, 221], [237, 177, 263, 227, 265, 210, 97, 136, 157, 162  …  224, 376, 447, 402, 382, 373, 437, 202, 339, 390])

### Model

In [12]:
model = @load LassoRegressor pkg=MLJLinearModels

LassoRegressor(
    lambda = 1.0,
    fit_intercept = true,
    penalize_intercept = false,
    solver = nothing) @ 1…53

In [13]:
match = machine(model, X, y)

Machine{LassoRegressor} @ 1…68


### Training

In [14]:
fit!(match, rows=train)

┌ Info: Training Machine{LassoRegressor} @ 1…68.
└ @ MLJBase /home/yuehhua/.julia/packages/MLJBase/qJs1o/src/machines.jl:182
└ @ MLJLinearModels /home/yuehhua/.julia/packages/MLJLinearModels/4VdUV/src/fit/proxgrad.jl:64


Machine{LassoRegressor} @ 1…68


### Prediction

In [15]:
ŷ = MLJ.predict(match, rows=test)

152-element Array{Float64,1}:
 25.850097644777854
 21.03718466814825
 32.551221144157324
 34.32851636766815
 28.65879599998246
 15.575606664892303
 22.30631152778553
 22.04262865249816
 11.217096979274036
 31.504583913450993
 20.637152874686464
 26.66842441869621
 31.246982451875834
  ⋮
 25.555506344325902
 27.762318363321068
 27.87238835962927
 25.444564809234635
 19.856848095776606
 19.00249345847234
 18.61601052635859
 25.81775408501557
 13.462414659684093
 26.668709547890046
 24.78082377944837
 17.21425900011676

### Evaluation

In [16]:
rms(ŷ, y[test])

6.629712200477897
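
The score above is a root-mean-square error: the square root of the mean squared difference between predictions and targets, in the same units as the target. A minimal stand-alone sketch follows, where `rmse` is a hypothetical helper written for illustration and not the MLJ function itself.

```julia
# Hypothetical helper: root-mean-square error between predictions and targets.
rmse(yhat, y) = sqrt(sum(abs2, yhat .- y) / length(y))

# Only the last prediction is off (by 2), so the mean squared error is 4/3.
err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because the errors are squared before averaging, a few large misses weigh more heavily on this score than many small ones.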

### View model parameters

In [17]:
coefs, intercept = fitted_params(match)
coefs

13-element Array{Float64,1}:
 -0.10412475612038934
  0.09355993006835857
  0.07229933436987476
  0.10863995394904914
  0.07733168450700742
  1.9967222411087018
  0.07209829603014707
 -0.2846335116479614
  0.26537070302347343
 -0.016925905489813865
  0.6060300792360868
  0.018549699694932364
 -0.7477371916374584