Add svm_linear() using LiblineaR engine #424

Merged
merged 6 commits into from
Feb 8, 2021
juliasilge

This PR adds a new model svm_linear() with the LiblineaR engine.

The LiblineaR package does support sparse matrices for modeling, but unfortunately 😩 not via the Matrix package. Instead it uses the SparseM package, and I don't think this is going to work out well for us. The sparse matrix implementation in SparseM does not keep column names, for example. Overall, it is not as nice, and our current infrastructure does not support it. Right now, I have set allow_sparse_x = FALSE for this new model. I wonder if the LiblineaR maintainers would be open to supporting Matrix sparse matrices instead of SparseM ones? They are not on GitHub. We can think about that more.

For the future, adding LiblineaR as an engine will mean we can use it as another engine for logistic regression as well. 🤠

🎯 The different types of SVM can be specified right now through the engine-specific `type` argument. I probably should add some error checking that the `type` value matches the mode (classification vs. regression).

None of the LiblineaR SVM models support class probabilities.

library(tidymodels)

data(two_class_dat, package = "modeldata")
example_split <- initial_split(two_class_dat, prop = 0.99)
example_train <- training(example_split)
example_test  <-  testing(example_split)

set.seed(123)
mod <- svm_linear() %>%
  set_engine("LiblineaR", type = 2) %>%
  set_mode("classification") %>%
  fit(Class ~ ., example_train)

mod
#> parsnip model object
#> 
#> Fit time:  3ms 
#> $TypeDetail
#> [1] "L2-regularized L2-loss support vector classification primal (L2R_L2LOSS_SVC)"
#> 
#> $Type
#> [1] 2
#> 
#> $W
#>              A         B     Bias
#> [1,] 0.4168073 -1.343176 1.335922
#> 
#> $Bias
#> [1] 1
#> 
#> $ClassNames
#> [1] Class1 Class2
#> Levels: Class1 Class2
#> 
#> $NbClass
#> [1] 2
#> 
#> attr(,"class")
#> [1] "LiblineaR"

predict(mod, new_data = example_test)
#> # A tibble: 7 x 1
#>   .pred_class
#>   <fct>      
#> 1 Class1     
#> 2 Class2     
#> 3 Class1     
#> 4 Class2     
#> 5 Class2     
#> 6 Class1     
#> 7 Class1
predict(mod, new_data = example_test, type = "prob")
#> Error: The LiblineaR engine does not support class probabilities for any `svm` models.
predict(mod, new_data = example_test, type = "raw")
#> $predictions
#> [1] Class1 Class2 Class1 Class2 Class2 Class1 Class1
#> Levels: Class1 Class2

Created on 2021-02-01 by the reprex package (v1.0.0)

There are several options for regression SVMs.

library(tidymodels)

car_split <- initial_split(mtcars)
car_tr <- training(car_split)
car_te <- testing(car_split)

mod <- svm_linear() %>%
  set_engine("LiblineaR") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., car_tr)

mod
#> parsnip model object
#> 
#> Fit time:  3ms 
#> $TypeDetail
#> [1] "L2-regularized L2-loss support vector regression primal (L2R_L2LOSS_SVR)"
#> 
#> $Type
#> [1] 11
#> 
#> $W
#>            cyl        disp         hp      drat        wt     qsec         vs
#> [1,] 0.1453677 -0.04193837 0.05087321 0.2278812 0.0690725 1.032539 0.06168365
#>             am      gear         carb       Bias
#> [1,] 0.0553116 0.2206979 -0.001296262 0.05185603
#> 
#> $Bias
#> [1] 1
#> 
#> $NbClass
#> [1] 2
#> 
#> attr(,"class")
#> [1] "LiblineaR"

predict(mod, new_data = car_te)
#> # A tibble: 8 x 1
#>   .pred
#>   <dbl>
#> 1  14.2
#> 2  20.3
#> 3  18.4
#> 4  18.6
#> 5  19.0
#> 6  14.2
#> 7  22.9
#> 8  17.2
predict(mod, new_data = car_te, type = "raw")
#> $predictions
#> [1] 14.20855 20.28900 18.41048 18.59350 19.00997 14.21174 22.92824 17.23101

Created on 2021-02-01 by the reprex package (v1.0.0)

juliasilge commented Feb 2, 2021

Addresses #336

Addresses #419 for SVMs, but I would still like to do another PR for logistic regression.


@topepo topepo left a comment

It would be good to add a kernlab engine too (but that could be a different PR).

Please add a note to the NEWS file.

As an aside, I've found tuning the SVM cost with linear kernels to be fruitless. With caret, I always got identical performance across different cost values. Not sure why. The same seems to be the case here:

library(tidymodels)
#> ── Attaching packages ────────────────────────────────────── tidymodels 0.1.2 ──
#> ✓ broom     0.7.3           ✓ recipes   0.1.15.9000
#> ✓ dials     0.0.9.9000      ✓ rsample   0.0.8.9001 
#> ✓ dplyr     1.0.3           ✓ tibble    3.0.6      
#> ✓ ggplot2   3.3.3           ✓ tidyr     1.1.2      
#> ✓ infer     0.5.3           ✓ tune      0.1.2.9000 
#> ✓ modeldata 0.1.0.9000      ✓ workflows 0.2.1      
#> ✓ parsnip   0.1.5           ✓ yardstick 0.0.7.9000 
#> ✓ purrr     0.3.4
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()
library(doMC)
#> Loading required package: foreach
#> 
#> Attaching package: 'foreach'
#> The following objects are masked from 'package:purrr':
#> 
#>     accumulate, when
#> Loading required package: iterators
#> Loading required package: parallel
registerDoMC(cores = 20)

data(two_class_dat)

set.seed(1)
folds <- vfold_cv(two_class_dat)

svm_spec <- 
  svm_linear(cost = tune()) %>% 
  set_engine("LiblineaR") %>% 
  set_mode("classification")

grid <- tibble(cost = 10^(-10:3))

set.seed(1)
svm_res <-
  svm_spec %>% 
  tune_grid(Class ~ ., resamples = folds, grid = grid, metrics = metric_set(accuracy), 
            control = control_grid(parallel_over = "everything"))

svm_res %>% 
  collect_metrics() %>% 
  pluck("mean") %>% 
  range()
#> [1] 0.8153323 0.8153323

Created on 2021-02-02 by the reprex package (v0.3.0)

This is probably good (since it is reproducible) but I'm not really sure why it is the case.
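
One way to probe this further (a sketch only, not something run as part of this PR) would be to call LiblineaR directly at a few cost values and compare the fitted weights. If the weights differ while accuracy stays flat, the decision boundary is simply insensitive to cost on this data. The choice of `type = 2` mirrors the earlier example in this thread, and the three cost values are arbitrary picks from the grid above.

```r
library(LiblineaR)

data(two_class_dat, package = "modeldata")

# LiblineaR expects a numeric matrix of predictors and a factor outcome
x <- as.matrix(two_class_dat[, c("A", "B")])
y <- two_class_dat$Class

# Arbitrary cost values spanning the tuning grid used above
costs <- c(1e-10, 1, 1e3)

# Fit once per cost and keep only the weight matrix (coefficients plus bias)
weights <- lapply(costs, function(cost) {
  LiblineaR(data = x, target = y, type = 2, cost = cost)$W
})
names(weights) <- format(costs)

weights
```

Comparing the elements of `weights` side by side shows whether the cost value reaches the underlying fit at all.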

juliasilge and others added 5 commits February 4, 2021 11:07
Merge branch 'master' into liblinear-svm

# Conflicts:
#	DESCRIPTION
Co-authored-by: Max Kuhn <mxkuhn@gmail.com>
@juliasilge

Added that error checking for LiblineaR's type:

library(parsnip)

svm_linear() %>%
  set_engine("LiblineaR", type = 1) %>%
  set_mode("regression") %>%
  fit(mpg ~ ., mtcars)
#> Error: The LiblineaR engine argument of `type` = 1 does not correspond to an SVM regression model.

svm_linear() %>%
  set_engine("LiblineaR", type = "potato") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., mtcars)
#> Error: The LiblineaR engine argument of `type` = potato does not correspond to an SVM regression model.

Created on 2021-02-04 by the reprex package (v1.0.0)
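
For readers following along, a check like the one demonstrated above could be sketched roughly as follows. This is an illustration only, not the code merged in this PR: `check_liblinear_type()` is a hypothetical helper name, and the type codes are taken from the LiblineaR documentation (1 to 5 for support vector classification, 11 to 13 for support vector regression).

```r
# Hypothetical helper, NOT parsnip's actual implementation.
# LiblineaR `type` codes (per the LiblineaR documentation):
#   1-5   support vector classification
#   11-13 support vector regression
check_liblinear_type <- function(type, mode) {
  valid <- switch(
    mode,
    classification = 1:5,
    regression = 11:13,
    stop("Unknown mode: ", mode, call. = FALSE)
  )

  # Reject non-numeric values, vectors, and codes from the wrong family
  ok <- is.numeric(type) && length(type) == 1 && type %in% valid
  if (!ok) {
    stop(
      "The LiblineaR engine argument of `type` = ", type,
      " does not correspond to an SVM ", mode, " model.",
      call. = FALSE
    )
  }

  invisible(type)
}

check_liblinear_type(2, "classification")  # passes silently
check_liblinear_type(11, "regression")     # passes silently
```

Calling `check_liblinear_type(1, "regression")` or `check_liblinear_type("potato", "regression")` raises an error with the same wording as the messages shown above.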

@juliasilge juliasilge merged commit 687b251 into master Feb 8, 2021
@juliasilge juliasilge deleted the liblinear-svm branch February 8, 2021 18:11
github-actions bot commented Mar 6, 2021

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 6, 2021