Add kernlab engine for svm_linear() #438

Merged: 4 commits, Mar 3, 2021

juliasilge

Closes #336

This PR adds a second engine, "kernlab", for svm_linear(), alongside the existing "LiblineaR" engine.

library(tidymodels)

data(two_class_dat, package = "modeldata")
example_split <- initial_split(two_class_dat, prop = 0.99)
example_train <- training(example_split)
example_test  <-  testing(example_split)

set.seed(123)
mod <- svm_linear() %>%
  set_engine("kernlab") %>%
  set_mode("classification") %>%
  fit(Class ~ ., example_train)
#>  Setting default kernel parameters

mod
#> parsnip model object
#> 
#> Fit time:  856ms 
#> Support Vector Machine object of class "ksvm" 
#> 
#> SV type: C-svc  (classification) 
#>  parameter : cost C = 1 
#> 
#> Linear (vanilla) kernel function. 
#> 
#> Number of Support Vectors : 358 
#> 
#> Objective Function Value : -355.0963 
#> Training error : 0.179847 
#> Probability model included.

predict(mod, new_data = example_test)
#> # A tibble: 7 x 1
#>   .pred_class
#>   <fct>      
#> 1 Class2     
#> 2 Class1     
#> 3 Class2     
#> 4 Class1     
#> 5 Class1     
#> 6 Class1     
#> 7 Class2
predict(mod, new_data = example_test, type = "prob")
#> # A tibble: 7 x 2
#>   .pred_Class1 .pred_Class2
#>          <dbl>        <dbl>
#> 1        0.457       0.543 
#> 2        0.850       0.150 
#> 3        0.172       0.828 
#> 4        0.980       0.0203
#> 5        0.698       0.302 
#> 6        0.983       0.0166
#> 7        0.439       0.561
predict(mod, new_data = example_test, type = "raw")
#> [1] Class2 Class1 Class2 Class1 Class1 Class1 Class2
#> Levels: Class1 Class2

Created on 2021-03-01 by the reprex package (v1.0.0)

Unlike the "LiblineaR" engine, the "kernlab" engine does support class probabilities.
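
As a quick sanity check on the probability output, the sketch below uses base R only and copies the seven predictions from the reprex above. It checks whether the hard class predictions agree with the class that has the larger predicted probability. They do for these seven rows, but that is not guaranteed in general: as far as I understand, kernlab fits its probability model separately from the decision function, so the two can disagree near the boundary.

```r
# Values copied from the type = "prob" and default class output above.
prob <- data.frame(
  .pred_Class1 = c(0.457, 0.850, 0.172, 0.980, 0.698, 0.983, 0.439),
  .pred_Class2 = c(0.543, 0.150, 0.828, 0.0203, 0.302, 0.0166, 0.561)
)
class_levels <- c("Class1", "Class2")
hard_class <- factor(
  c("Class2", "Class1", "Class2", "Class1", "Class1", "Class1", "Class2"),
  levels = class_levels
)

# For each row, pick the class whose probability column is largest.
from_prob <- factor(
  class_levels[max.col(as.matrix(prob), ties.method = "first")],
  levels = class_levels
)

all(from_prob == hard_class)
#> [1] TRUE
```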

@juliasilge juliasilge requested a review from DavisVaughan March 1, 2021 20:11
})

test_that('engine arguments', {

LiblineaR_type <- svm_linear(mode = "regression") %>% set_engine("LiblineaR", type = 12)
kernlab_cv <- svm_linear(mode = "regression") %>% set_engine("kernlab", cross = 10)
Member

So this cross arg seems to do internal cross-validation, using accuracy as the metric for classification. When doing binary classification, do you happen to know whether it treats the first or the second level as the event level? Does that matter at all here?

Member Author

@juliasilge juliasilge Mar 3, 2021

It looks like kernlab called directly gives the same result as kernlab called through parsnip, with no problems like the ones xgboost had:

library(parsnip)
library(tidyverse)
library(kernlab)
#> 
#> Attaching package: 'kernlab'
#> The following object is masked from 'package:purrr':
#> 
#>     cross
#> The following object is masked from 'package:ggplot2':
#> 
#>     alpha

data("PimaIndiansDiabetes", package = "mlbench")
df <- PimaIndiansDiabetes %>%
  mutate(diabetes = fct_relevel(diabetes, 'pos'))


set.seed(234)
parsnip_fit <- svm_linear(mode = "classification") %>% 
  set_engine("kernlab", cross = 10) %>%
  fit(diabetes ~ ., df)
#>  Setting default kernel parameters

set.seed(234)
kernlab_fit <- ksvm(diabetes ~ ., data = df, kernel = "vanilladot", cross = 10)
#>  Setting default kernel parameters

parsnip_fit
#> parsnip model object
#> 
#> Fit time:  833ms 
#> Support Vector Machine object of class "ksvm" 
#> 
#> SV type: C-svc  (classification) 
#>  parameter : cost C = 1 
#> 
#> Linear (vanilla) kernel function. 
#> 
#> Number of Support Vectors : 401 
#> 
#> Objective Function Value : -396.4286 
#> Training error : 0.226562 
#> Cross validation error : 0.233117 
#> Probability model included.
kernlab_fit
#> Support Vector Machine object of class "ksvm" 
#> 
#> SV type: C-svc  (classification) 
#>  parameter : cost C = 1 
#> 
#> Linear (vanilla) kernel function. 
#> 
#> Number of Support Vectors : 401 
#> 
#> Objective Function Value : -396.4286 
#> Training error : 0.226562 
#> Cross validation error : 0.233117

identical(parsnip_fit$fit@alpha, kernlab_fit@alpha)
#> [1] TRUE

Created on 2021-03-03 by the reprex package (v1.0.0)

If there is a problem (I don't think there is), it would apply to all the kernlab engines and we should open a new issue and fix it in a new PR.
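
On the event-level question itself: accuracy only counts matches between truth and prediction, so it should not depend on which factor level comes first, whereas a metric like sensitivity does. The base R sketch below illustrates this with made-up labels. It assumes the cross-validation error reported by kernlab is a plain misclassification rate, as the review comment suggests; the `accuracy()` and `sensitivity()` helpers are hypothetical and are not parsnip, yardstick, or kernlab functions.

```r
# Made-up labels for illustration only; not data from this PR.
truth <- factor(c("pos", "neg", "pos", "pos", "neg", "neg", "neg", "neg"),
                levels = c("neg", "pos"))
pred  <- factor(c("pos", "neg", "neg", "pos", "neg", "neg", "neg", "pos"),
                levels = c("neg", "pos"))

# Hypothetical helper: share of predictions that match the truth.
accuracy <- function(truth, pred) {
  mean(as.character(truth) == as.character(pred))
}

# Hypothetical helper: recall for the FIRST factor level, treated as the event.
sensitivity <- function(truth, pred) {
  event <- levels(truth)[1]
  mean(as.character(pred[truth == event]) == event)
}

# Same labels, but with "pos" moved to the first level.
truth_relevel <- relevel(truth, ref = "pos")
pred_relevel  <- relevel(pred, ref = "pos")

# Accuracy is unchanged by the level order.
acc_default <- accuracy(truth, pred)
acc_relevel <- accuracy(truth_relevel, pred_relevel)

# Sensitivity changes, because the event level changed.
sens_default <- sensitivity(truth, pred)
sens_relevel <- sensitivity(truth_relevel, pred_relevel)
```

Here `acc_default` and `acc_relevel` are both 0.75, while `sens_default` is 0.8 (event = "neg") and `sens_relevel` is 2/3 (event = "pos").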

Member

Sounds good, just wanted to check

@juliasilge juliasilge merged commit 65a5ab8 into master Mar 3, 2021
@juliasilge juliasilge deleted the kernlab-linear-svm branch March 3, 2021 17:55
@github-actions github-actions bot locked and limited conversation to collaborators Mar 18, 2021