tune_bayes() error: Gaussian process model: data length exceeds size of matrix #269

Closed
jcpsantiago opened this issue Aug 30, 2020 · 4 comments
Labels: feature (a feature request or enhancement)

jcpsantiago commented Aug 30, 2020

The problem

I'm having trouble using tune_bayes() in my pipeline. It also fails, with exactly the same error, on the example from the docs: https://www.tidymodels.org/learn/work/bayes-opt/

Reproducible example

library(tidymodels)
#> ── Attaching packages ───────────────────────────────────────────── tidymodels 0.1.1 ──
#> ✓ broom     0.7.0      ✓ recipes   0.1.13
#> ✓ dials     0.0.8      ✓ rsample   0.0.7 
#> ✓ dplyr     1.0.1      ✓ tibble    3.0.3 
#> ✓ ggplot2   3.3.2      ✓ tidyr     1.1.1 
#> ✓ infer     0.5.3      ✓ tune      0.1.1 
#> ✓ modeldata 0.0.2      ✓ workflows 0.1.3 
#> ✓ parsnip   0.1.3      ✓ yardstick 0.0.7 
#> ✓ purrr     0.3.4
#> ── Conflicts ──────────────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()
library(modeldata)

# Load data
data(cells)

set.seed(2369)
tr_te_split <- initial_split(cells %>% select(-case), prop = 3/4)
cell_train <- training(tr_te_split)
cell_test  <- testing(tr_te_split)

set.seed(1697)
folds <- vfold_cv(cell_train, v = 10)

library(themis)
#> Registered S3 methods overwritten by 'themis':
#>   method               from   
#>   bake.step_downsample recipes
#>   bake.step_upsample   recipes
#>   prep.step_downsample recipes
#>   prep.step_upsample   recipes
#>   tidy.step_downsample recipes
#>   tidy.step_upsample   recipes
#> 
#> Attaching package: 'themis'
#> The following objects are masked from 'package:recipes':
#> 
#>     step_downsample, step_upsample, tunable.step_downsample,
#>     tunable.step_upsample

cell_pre_proc <-
  recipe(class ~ ., data = cell_train) %>%
  step_YeoJohnson(all_predictors()) %>%
  step_normalize(all_predictors()) %>%
  step_pca(all_predictors(), num_comp = tune()) %>%
  step_downsample(class)

svm_mod <-
  svm_rbf(mode = "classification", cost = tune(), rbf_sigma = tune()) %>%
  set_engine("kernlab")

svm_wflow <-
  workflow() %>%
  add_model(svm_mod) %>%
  add_recipe(cell_pre_proc)

svm_set <- parameters(svm_wflow)
svm_set
#> Collection of 3 parameters for tuning
#> 
#>         id parameter type object class
#>       cost           cost    nparam[+]
#>  rbf_sigma      rbf_sigma    nparam[+]
#>   num_comp       num_comp    nparam[+]

svm_set <- 
  svm_set %>% 
  update(num_comp = num_comp(c(0L, 20L)))

set.seed(12)
search_res <-
  svm_wflow %>% 
  tune_bayes(
    resamples = folds,
    # To use non-default parameter ranges
    param_info = svm_set,
    # Generate five at semi-random to start
    initial = 1,
    iter = 50,
    # How to measure performance?
    metrics = metric_set(roc_auc),
    control = control_bayes(no_improve = 30, verbose = TRUE)
  )
#> 
#> >  Generating a set of 1 initial parameter results
#> ✓ Initialization complete
#> 
#> Optimizing roc_auc using the expected improvement
#> 
#> ── Iteration 1 ────────────────────────────────────────────────────────────────────────
#> 
#> i Current best:      roc_auc=0.8678 (@iter 0)
#> i Gaussian process model
#> ! Gaussian process model: X should be in range (0, 1), data length exceeds...
#> x Gaussian process model: Error in GP_deviance(beta = row, X = X, Y = Y, n...
#> ! An error occurred when creating candidates parameters:  Error in GP_deviance(beta = row, X = X, Y = Y, nug_thres = nug_thres,  : 
#>   Infinite values of the Deviance Function, 
#>             unable to find optimum parameters
#> x Skipping to next iteration
#> Error in eval(expr, p): no loop for break/next, jumping to top level
#> x Optimization stopped prematurely; returning current results.

Created on 2020-08-30 by the reprex package (v0.3.0)

Session info
sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.6
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices datasets  utils     methods   base     
#> 
#> other attached packages:
#>  [1] themis_0.1.2     yardstick_0.0.7  workflows_0.1.3  tune_0.1.1      
#>  [5] tidyr_1.1.1      tibble_3.0.3     rsample_0.0.7    recipes_0.1.13  
#>  [9] purrr_0.3.4      parsnip_0.1.3    modeldata_0.0.2  infer_0.5.3     
#> [13] ggplot2_3.3.2    dplyr_1.0.1      dials_0.0.8      scales_1.1.1    
#> [17] broom_0.7.0      tidymodels_0.1.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] splines_4.0.2      foreach_1.5.0      prodlim_2019.11.13 assertthat_0.2.1  
#>  [5] highr_0.8          unbalanced_2.0     GPfit_1.0-8        renv_0.11.0       
#>  [9] yaml_2.2.1         globals_0.12.5     ipred_0.9-9        pillar_1.4.6      
#> [13] backports_1.1.8    lattice_0.20-41    glue_1.4.1         pROC_1.16.2       
#> [17] digest_0.6.25      checkmate_2.0.0    hardhat_0.1.4      colorspace_1.4-1  
#> [21] mlr_2.17.1         htmltools_0.5.0    Matrix_1.2-18      plyr_1.8.6        
#> [25] timeDate_3043.102  pkgconfig_2.0.3    lhs_1.0.2          DiceDesign_1.8-1  
#> [29] listenv_0.8.0      parallelMap_1.5.0  RANN_2.6.1         gower_0.2.2       
#> [33] lava_1.6.7         generics_0.0.2     ellipsis_0.3.1     withr_2.2.0       
#> [37] furrr_0.1.0        ROSE_0.0-3         nnet_7.3-14        cli_2.0.2         
#> [41] survival_3.2-3     magrittr_1.5       crayon_1.3.4       evaluate_0.14     
#> [45] future_1.18.0      fansi_0.4.1        doParallel_1.0.15  MASS_7.3-51.6     
#> [49] class_7.3-17       FNN_1.1.3          data.table_1.13.0  tools_4.0.2       
#> [53] BBmisc_1.11        lifecycle_0.2.0    stringr_1.4.0      kernlab_0.9-29    
#> [57] munsell_0.5.0      compiler_4.0.2     rlang_0.4.7        grid_4.0.2        
#> [61] iterators_1.0.12   rstudioapi_0.11    rmarkdown_2.3      gtable_0.3.0      
#> [65] codetools_0.2-16   R6_2.4.1           ParamHelpers_1.14  lubridate_1.7.9   
#> [69] knitr_1.29         fastmatch_1.1-0    stringi_1.4.6      parallel_4.0.2    
#> [73] Rcpp_1.0.5         vctrs_0.3.2        rpart_4.1-15       tidyselect_1.1.0  
#> [77] xfun_0.16

Setting the initial argument to a value greater than 1 fixes this issue.

topepo (Member) commented Aug 30, 2020

I think the issue is related to:

    # Generate five at semi-random to start
    initial = 1,

You can't fit the Gaussian process model to a single data point (not without getting infinitely many solutions). I try to make the initial set at least the number of tuning parameters + 1 (here, 3 + 1 = 4).

Can you try that and let us know the results?
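To make the single-point problem concrete, here is a minimal base-R sketch. It assumes a simplified Gaussian process profile deviance (constant mean, Gaussian correlation, no nugget) of the general form log|R| + n * log((y - mu)' R^-1 (y - mu)); it is not GPfit's actual implementation, and gp_corr(), gp_deviance(), and all the input values are hypothetical. With one design point, the estimated mean reproduces the single observation exactly, the residual is zero, and the deviance is -Inf for every correlation setting, so there is no finite optimum. That is consistent with the "Infinite values of the Deviance Function" error in the log above.

```r
# Illustrative sketch only (not GPfit's code): simplified GP profile deviance
# with a constant mean and Gaussian (squared-exponential) correlation.

# Correlation matrix: R[i, j] = exp(-sum_k theta_k * (x_ik - x_jk)^2)
gp_corr <- function(X, theta) {
  n <- nrow(X)
  R <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      R[i, j] <- exp(-sum(theta * (X[i, ] - X[j, ])^2))
    }
  }
  R
}

gp_deviance <- function(theta, X, y) {
  n <- nrow(X)
  R <- gp_corr(X, theta)
  one <- rep(1, n)
  # Generalised least-squares estimate of the constant mean
  mu_hat <- sum(one * solve(R, y)) / sum(one * solve(R, one))
  resid <- y - mu_hat
  # Quadratic form used for the profile variance estimate
  s2 <- sum(resid * solve(R, resid))
  log_det_R <- as.numeric(determinant(R, logarithm = TRUE)$modulus)
  log_det_R + n * log(s2)
}

# Hypothetical inputs: 3 tuning parameters scaled to (0, 1), like the reprex.
# The roc_auc values are made up for illustration.
x_one <- matrix(c(0.3, 0.5, 0.7), nrow = 1)
y_one <- 0.8678

x_four <- rbind(
  c(0.1, 0.2, 0.9),
  c(0.4, 0.8, 0.3),
  c(0.7, 0.1, 0.6),
  c(0.9, 0.6, 0.1)
)
y_four <- c(0.85, 0.79, 0.88, 0.81)

# One point: residual is 0, log(0) = -Inf, whatever theta is
sapply(c(0.1, 1, 10), function(t) gp_deviance(rep(t, 3), x_one, y_one))
#> [1] -Inf -Inf -Inf

# Parameters + 1 = 4 distinct points: the deviance is finite
gp_deviance(c(1, 1, 1), x_four, y_four)
```

With four distinct points the residual is no longer forced to zero, so the optimiser has a finite objective to work with.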

jcpsantiago (Author) commented

Yes, that works :) It wasn't obvious at the time, but now it makes sense, of course. I had misunderstood how the system works. Maybe tune_bayes() should have a guard-rail (an if check) for distracted people like me?
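A guard-rail along these lines could be sketched as follows. check_initial() is a hypothetical helper, not part of tune's API; it encodes the "number of parameters + 1" rule from the reply above and only checks the case where initial is a single number, since tune_bayes() can also take previous tuning results as its initial value.

```r
# Hypothetical guard-rail (not part of tune): refuse an integer `initial`
# smaller than the number of tuning parameters + 1.
check_initial <- function(initial, n_params) {
  # Only check the integer case; previous results (e.g. a data frame of
  # tuning results) are passed through unchanged.
  if (is.numeric(initial) && length(initial) == 1) {
    min_initial <- n_params + 1
    if (initial < min_initial) {
      stop(
        sprintf(
          paste(
            "`initial` is %d but there are %d tuning parameters;",
            "use at least %d initial points so the Gaussian process can be fit."
          ),
          as.integer(initial), as.integer(n_params), as.integer(min_initial)
        ),
        call. = FALSE
      )
    }
  }
  invisible(initial)
}

# Usage with the 3 parameters from the reprex (cost, rbf_sigma, num_comp)
check_initial(5, n_params = 3)       # passes silently
try(check_initial(1, n_params = 3))  # stops with an informative error
```

Failing early with a message that names the minimum is cheaper for the user than letting the first Bayesian iteration fail inside the GP fit.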

topepo (Member) commented Aug 30, 2020

That's a good point!

juliasilge added the feature (a feature request or enhancement) label Aug 31, 2020
topepo closed this as completed Oct 7, 2020

github-actions bot commented Mar 6, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Mar 6, 2021
Development

No branches or pull requests

3 participants