Add svm_linear() using LiblineaR engine #424
It would be good to add a kernlab engine too (but that could be a different PR).
Please add a note to the NEWS file.
As an aside, I've found tuning the SVM cost with linear kernels to be fruitless. With caret, I always got identical performance across different cost values. Not sure why. The same seems to be the case here:
library(tidymodels)
#> ── Attaching packages ────────────────────────────────────── tidymodels 0.1.2 ──
#> ✓ broom 0.7.3 ✓ recipes 0.1.15.9000
#> ✓ dials 0.0.9.9000 ✓ rsample 0.0.8.9001
#> ✓ dplyr 1.0.3 ✓ tibble 3.0.6
#> ✓ ggplot2 3.3.3 ✓ tidyr 1.1.2
#> ✓ infer 0.5.3 ✓ tune 0.1.2.9000
#> ✓ modeldata 0.1.0.9000 ✓ workflows 0.2.1
#> ✓ parsnip 0.1.5 ✓ yardstick 0.0.7.9000
#> ✓ purrr 0.3.4
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
#> x recipes::step() masks stats::step()
library(doMC)
#> Loading required package: foreach
#>
#> Attaching package: 'foreach'
#> The following objects are masked from 'package:purrr':
#>
#> accumulate, when
#> Loading required package: iterators
#> Loading required package: parallel
registerDoMC(cores = 20)
data(two_class_dat)
set.seed(1)
folds <- vfold_cv(two_class_dat)
svm_spec <-
  svm_linear(cost = tune()) %>%
  set_engine("LiblineaR") %>%
  set_mode("classification")
grid <- tibble(cost = 10^(-10:3))
set.seed(1)
svm_res <-
  svm_spec %>%
  tune_grid(
    Class ~ .,
    resamples = folds,
    grid = grid,
    metrics = metric_set(accuracy),
    control = control_grid(parallel_over = "everything")
  )
svm_res %>%
  collect_metrics() %>%
  pluck("mean") %>%
  range()
#> [1] 0.8153323 0.8153323
Created on 2021-02-02 by the reprex package (v0.3.0)
This is probably fine (since it is reproducible), but I'm not really sure why it happens.
Merge branch 'master' into liblinear-svm (conflicts: DESCRIPTION)
Co-authored-by: Max Kuhn <mxkuhn@gmail.com>
Added that error checking for LiblineaR's `type` argument:
library(parsnip)
svm_linear() %>%
  set_engine("LiblineaR", type = 1) %>%
  set_mode("regression") %>%
  fit(mpg ~ ., mtcars)
#> Error: The LiblineaR engine argument of `type` = 1 does not correspond to an SVM regression model.
svm_linear() %>%
  set_engine("LiblineaR", type = "potato") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., mtcars)
#> Error: The LiblineaR engine argument of `type` = potato does not correspond to an SVM regression model.
Created on 2021-02-04 by the reprex package (v1.0.0)
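For illustration only, here is a minimal base-R sketch of the kind of mode/`type` check that the reprex exercises. The helper `check_liblinear_svm_type()` and the lookup list `liblinear_svm_types` are hypothetical and are not parsnip's internal code. The solver codes come from the `LiblineaR::LiblineaR()` documentation: 1 to 5 are SVM classifiers, 11 to 13 are SVM regressors, and 0, 6 and 7 are logistic regression. Whether parsnip accepts exactly these codes is an assumption.

```r
# Hypothetical helper, NOT parsnip's implementation.
# Solver codes as documented for LiblineaR::LiblineaR():
#   1-5   support vector classification (4 = Crammer and Singer multi-class)
#   11-13 support vector regression
#   0, 6, 7 are logistic regression, so they are rejected for SVM models
liblinear_svm_types <- list(
  classification = c(1, 2, 3, 4, 5),
  regression     = c(11, 12, 13)
)

check_liblinear_svm_type <- function(type, mode) {
  # Fails early if mode is neither "classification" nor "regression"
  mode <- match.arg(mode, names(liblinear_svm_types))
  valid <- liblinear_svm_types[[mode]]
  # Non-numeric values such as "potato" short-circuit to FALSE here
  ok <- length(type) == 1 && is.numeric(type) && type %in% valid
  if (!ok) {
    stop(
      "The LiblineaR engine argument of `type` = ", format(type),
      " does not correspond to an SVM ", mode, " model.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}
```

For example, `check_liblinear_svm_type(12, "regression")` returns invisibly, while `try(check_liblinear_svm_type(1, "regression"))` fails with the same wording as the first error in the reprex.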
This PR adds a new model, `svm_linear()`, with the LiblineaR engine.

The LiblineaR package does support sparse matrices for modeling, but unfortunately 😩 not via the Matrix package. Instead it uses the SparseM package, and I don't think this is going to work out well for us. For example, the sparse matrix implementation in SparseM does not keep column names. Overall, it is not as nice, and our current infrastructure does not support it. Right now, I have set `allow_sparse_x = FALSE` for this new model. I wonder if the LiblineaR maintainers would be open to supporting Matrix sparse matrices instead of SparseM ones? They are not on GitHub. We can think about that more.

For the future, adding LiblineaR as an engine will mean we can use it as another engine for logistic regression as well. 🤠

🎯 The different types of SVM can be specified right now through the engine-specific `type` parameter. I probably should add some error checking for classification vs. regression and that `type` argument.

None of the SVM models support class probabilities.
Created on 2021-02-01 by the reprex package (v1.0.0)
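Here is a minimal classification sketch, assuming parsnip and LiblineaR are installed. The two-class outcome built from `mtcars` and the choice of `type = 2` are illustrative assumptions; per the LiblineaR documentation, type 2 is L2-regularized L2-loss support vector classification (primal). Only hard class predictions are requested, because none of these SVM models produce class probabilities.

```r
# Sketch assuming parsnip and LiblineaR are installed; data choice is illustrative.
library(parsnip)

# Two-class outcome from mtcars (am: 0 = automatic, 1 = manual)
cars_cls <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

svm_cls_fit <-
  svm_linear(cost = 1) %>%
  # type = 2: L2-regularized L2-loss SVC (primal), per the LiblineaR docs
  set_engine("LiblineaR", type = 2) %>%
  set_mode("classification") %>%
  fit(am ~ mpg + wt + hp, data = cars_cls)

# Hard class predictions only; the SVM solvers do not estimate probabilities
svm_cls_preds <- predict(svm_cls_fit, new_data = cars_cls, type = "class")
```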
There are several options for regression SVMs.
Created on 2021-02-01 by the reprex package (v1.0.0)
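A similar sketch for regression, again assuming parsnip and LiblineaR are installed. LiblineaR documents three SVR solvers: 11 (L2-regularized L2-loss, primal), 12 (L2-regularized L2-loss, dual) and 13 (L2-regularized L1-loss, dual). The example uses `type = 12`. Standardizing the predictors is our own choice, made to keep them on comparable scales, and is not something the PR prescribes.

```r
# Sketch assuming parsnip and LiblineaR are installed.
library(parsnip)

# Keep mpg as the outcome; center and scale the remaining predictors
cars_scaled <- data.frame(mpg = mtcars$mpg, scale(mtcars[, -1]))

svm_reg_fit <-
  svm_linear(cost = 1) %>%
  # type = 12: L2-regularized L2-loss SVR (dual); 11 and 13 are the other SVR options
  set_engine("LiblineaR", type = 12) %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = cars_scaled)

# Numeric predictions are returned in a `.pred` column
svm_reg_preds <- predict(svm_reg_fit, new_data = cars_scaled)
```

Swapping `type = 12` for 11 or 13 selects one of the other solvers. A classification code such as `type = 1` should instead trigger the error shown in the reprex.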