step_embed() "argument is of length zero" error #49

Closed
quantumlinguist opened this issue Jul 4, 2020 · 18 comments

@quantumlinguist

I'm trying to learn how to do entity embedding for categorical variables but I keep getting this error and I can't figure out why.

rec <- recipe(Case ~ AnimacyObj + Participants + AgencySubj, data = causee) %>%
  step_embed(AnimacyObj, Participants, AgencySubj,
             outcome = vars(Case),
             num_terms = 3,
             hidden_units = 10,
             options = embed_control(epochs = 25, validation_split = 0.2)) %>%
  prep()

Error in if (is.na(b)) return(1L) : argument is of length zero

@juliasilge juliasilge transferred this issue from tidymodels/tidymodels.org-legacy Jul 4, 2020
@juliasilge
Member

That sounds frustrating! Can you create a reprex, a small reproducible example, so that we can find the source of your problem? If you haven't created a reprex before, this is a helpful introduction.

@quantumlinguist
Author

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(embed)
rec <-  recipe(Case~AnimacyObj+ Participants+ AgencySubj, data=causee)%>%
  step_embed(Participants,
             outcome=vars(Case),
             num_terms=1,
             hidden_units=10,
             options = embed_control(epochs = 25, validation_split=0.2)) %>%
  prep()
#> Error in is_tibble(data): object 'causee' not found

Created on 2020-07-05 by the reprex package (v0.3.0)

@quantumlinguist
Author

I see now it says my dataset is not a tibble? So I should just transform it?

@juliasilge
Member

No, in that reprex, you haven't loaded any data at all, so that's an error for not finding causee. Check out this section and this section about possibilities for getting some example data into the reprex.

@quantumlinguist
Author

I can't get my dataset to be found. It certainly is loaded.

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(embed)
library(datapasta)
dpasta(causee)
#> Error in is_tibble(input): object 'causee' not found
rec <-  recipe(Case~AnimacyObj+ Participants+ AgencySubj, data=causee)%>%
  step_embed(Participants,
             outcome=vars(Case),
             num_terms=1,
             hidden_units=10,
             options = embed_control(epochs = 25, validation_split=0.2)) %>%
  prep()
#> Error in is_tibble(data): object 'causee' not found

Created on 2020-07-05 by the reprex package (v0.3.0)

@juliasilge
Member

Take a look at the animated GIF in this section, and notice when you get out some output that can be pasted into a reprex (the wings object).

The idea here, which this explains really well if you can take some time to look through the slides, is to create a self-contained, rigorous example (including, inside it, a hopefully small dataset that reproduces your problem) so that others can understand your problem.
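
As a minimal sketch of what "self-contained" can mean here: base R's dput() prints code that rebuilds an object, and that code can be pasted at the top of a reprex. The causee_small data frame below is a made-up stand-in, not the real data from this issue.

```r
# Hypothetical stand-in for a few rows of the real data (values are made up).
causee_small <- data.frame(
  Case         = factor(c("DAT", "ACC", "ACC")),
  Participants = c(1L, 1L, 0L)
)

# dput() prints R code that recreates the object; paste that output into the
# reprex in place of a call that loads a file from disk.
dput(causee_small)

# Round trip: evaluating the printed code gives back an equivalent data frame.
rebuilt <- eval(parse(text = capture.output(dput(causee_small))))
all.equal(rebuilt, causee_small)
```

For a large dataset, running dput() on head() of the data, or on a small sample of rows that still triggers the error, keeps the pasted code short.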

@quantumlinguist
Author

Thanks, here is what I got now.

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(embed)
library(datapasta)

str(mini_df)
#> Error in str(mini_df): object 'mini_df' not found
dpasta(mini_df)
#> Error in is_tibble(input): object 'mini_df' not found
mini_df=tibble::tribble(
     ~Case.Participants.AnimacyObj.AgencySubj,
  "1  DAT            1          1          1",
  "2  DAT            1          1          0",
  "3  ACC            1          1          1",
  "4  ACC            1          1          0",
  "5  ACC            1          1          0",
  "6  ACC            0          1          0"
  )
head(mini_df)
#> # A tibble: 6 x 1
#>   Case.Participants.AnimacyObj.AgencySubj  
#>   <chr>                                    
#> 1 1  DAT            1          1          1
#> 2 2  DAT            1          1          0
#> 3 3  ACC            1          1          1
#> 4 4  ACC            1          1          0
#> 5 5  ACC            1          1          0
#> 6 6  ACC            0          1          0


rec <-  recipe(Case~AnimacyObj+ Participants+ AgencySubj, data=mini_df)%>%
  step_embed(Participants,
             outcome=vars(Case),
             num_terms=3,
             hidden_units=10,
             options = embed_control(epochs = 25, validation_split=0.2)) %>%
  prep()
#> Error in eval(predvars, data, env): object 'Case' not found

Created on 2020-07-05 by the reprex package (v0.3.0)
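
The head(mini_df) output above shows a single character column, which would explain why Case is not found: the printed table was pasted as one string per row. One possible way to turn that kind of whitespace-aligned text back into four columns is base R's read.table(text = ...). This is only a sketch using the six rows shown above, not something suggested in the thread, and the final conversion to factors assumes that step_embed() expects factor predictors.

```r
# The same six rows as in the reprex above, as one block of text.
raw_text <- "
  Case Participants AnimacyObj AgencySubj
1  DAT            1          1          1
2  DAT            1          1          0
3  ACC            1          1          1
4  ACC            1          1          0
5  ACC            1          1          0
6  ACC            0          1          0
"

# The header has one field fewer than the data rows, so read.table() treats
# the first field of each row as a row name rather than as a column.
mini_df <- read.table(text = raw_text, header = TRUE, stringsAsFactors = TRUE)

# Assumption: the 0/1 predictors are categorical, so convert every column to
# a factor before handing the data to a recipe.
mini_df[] <- lapply(mini_df, factor)

str(mini_df)
```

With the data in this shape, the formula Case ~ AnimacyObj + Participants + AgencySubj can at least find its variables.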

@topepo
Member

topepo commented Jul 5, 2020

Can you run sessioninfo::session_info() after one of the failures (or use reprex::reprex(si = TRUE))?

@quantumlinguist
Author

Here it is

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(embed)
library(datapasta)

causee= read.csv("causee_only_data.csv", sep=";")
#> Warning in file(file, "rt"): cannot open file 'causee_only_data.csv': No such
#> file or directory
#> Error in file(file, "rt"): cannot open the connection
mini_df= select(causee,Case, Participants, AnimacyObj, AgencySubj)%>%
  mutate_if(is.character, factor)
#> Error in select(causee, Case, Participants, AnimacyObj, AgencySubj): object 'causee' not found
rec <-  recipe(Case~., data=mini_df)%>%
  step_embed(Participants,
             outcome=vars(Case),
             num_terms=3,
             hidden_units=10,
             options = embed_control(epochs = 25, validation_split=0.2)) %>%
  prep(training=mini_df)
#> Error in is_tibble(data): object 'mini_df' not found

Created on 2020-07-05 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.3 (2020-02-29)
#>  os       macOS Mojave 10.14.6        
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Oslo                 
#>  date     2020-07-05                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date       lib source                             
#>  assertthat     0.2.1      2019-03-21 [1] CRAN (R 3.6.0)                     
#>  backports      1.1.8      2020-06-17 [1] CRAN (R 3.6.2)                     
#>  base64enc      0.1-3      2015-07-28 [1] CRAN (R 3.6.0)                     
#>  bayesplot      1.7.2.9000 2020-06-21 [1] Github (stan-dev/bayesplot@7709e43)
#>  boot           1.3-24     2019-12-20 [2] CRAN (R 3.6.3)                     
#>  callr          3.4.3      2020-03-28 [1] CRAN (R 3.6.2)                     
#>  class          7.3-15     2019-01-01 [2] CRAN (R 3.6.3)                     
#>  cli            2.0.2      2020-02-28 [1] CRAN (R 3.6.0)                     
#>  codetools      0.2-16     2018-12-24 [2] CRAN (R 3.6.3)                     
#>  colorspace     1.4-1      2019-03-18 [1] CRAN (R 3.6.0)                     
#>  colourpicker   1.0        2017-09-27 [1] CRAN (R 3.6.0)                     
#>  crayon         1.3.4      2017-09-16 [1] CRAN (R 3.6.0)                     
#>  crosstalk      1.1.0.1    2020-03-13 [1] CRAN (R 3.6.0)                     
#>  datapasta    * 3.1.0      2020-01-17 [1] CRAN (R 3.6.0)                     
#>  desc           1.2.0      2018-05-01 [1] CRAN (R 3.6.0)                     
#>  devtools       2.3.0      2020-04-10 [1] CRAN (R 3.6.2)                     
#>  digest         0.6.25     2020-02-23 [1] CRAN (R 3.6.0)                     
#>  dplyr        * 1.0.0      2020-05-29 [1] CRAN (R 3.6.2)                     
#>  DT             0.13       2020-03-23 [1] CRAN (R 3.6.0)                     
#>  dygraphs       1.1.1.6    2018-07-11 [1] CRAN (R 3.6.0)                     
#>  ellipsis       0.3.1      2020-05-15 [1] CRAN (R 3.6.2)                     
#>  embed        * 0.1.1      2020-07-03 [1] CRAN (R 3.6.3)                     
#>  evaluate       0.14       2019-05-28 [1] CRAN (R 3.6.0)                     
#>  fansi          0.4.1      2020-01-08 [1] CRAN (R 3.6.0)                     
#>  fastmap        1.0.1      2019-10-08 [1] CRAN (R 3.6.0)                     
#>  fs             1.4.1      2020-04-04 [1] CRAN (R 3.6.3)                     
#>  generics       0.0.2      2018-11-29 [1] CRAN (R 3.6.0)                     
#>  ggplot2        3.3.2      2020-06-19 [1] CRAN (R 3.6.3)                     
#>  ggridges       0.5.2      2020-01-12 [1] CRAN (R 3.6.0)                     
#>  glue           1.4.1      2020-05-13 [1] CRAN (R 3.6.2)                     
#>  gower          0.2.1      2019-05-14 [1] CRAN (R 3.6.0)                     
#>  gridExtra      2.3        2017-09-09 [1] CRAN (R 3.6.0)                     
#>  gtable         0.3.0      2019-03-25 [1] CRAN (R 3.6.0)                     
#>  gtools         3.8.2      2020-03-31 [1] CRAN (R 3.6.2)                     
#>  highr          0.8        2019-03-20 [1] CRAN (R 3.6.0)                     
#>  htmltools      0.5.0      2020-06-16 [1] CRAN (R 3.6.2)                     
#>  htmlwidgets    1.5.1      2019-10-08 [1] CRAN (R 3.6.0)                     
#>  httpuv         1.5.4      2020-06-06 [1] CRAN (R 3.6.2)                     
#>  igraph         1.2.5      2020-03-19 [1] CRAN (R 3.6.0)                     
#>  inline         0.3.15     2018-05-18 [1] CRAN (R 3.6.0)                     
#>  ipred          0.9-9      2019-04-28 [1] CRAN (R 3.6.0)                     
#>  jsonlite       1.6.1      2020-02-02 [1] CRAN (R 3.6.0)                     
#>  keras          2.3.0.0    2020-05-19 [1] CRAN (R 3.6.2)                     
#>  knitr          1.28       2020-02-06 [1] CRAN (R 3.6.0)                     
#>  later          1.1.0.1    2020-06-05 [1] CRAN (R 3.6.2)                     
#>  lattice        0.20-38    2018-11-04 [2] CRAN (R 3.6.3)                     
#>  lava           1.6.7      2020-03-05 [1] CRAN (R 3.6.0)                     
#>  lifecycle      0.2.0      2020-03-06 [1] CRAN (R 3.6.0)                     
#>  lme4           1.1-23     2020-04-07 [1] CRAN (R 3.6.2)                     
#>  loo            2.2.0      2019-12-19 [1] CRAN (R 3.6.0)                     
#>  lubridate      1.7.9      2020-06-08 [1] CRAN (R 3.6.2)                     
#>  magrittr       1.5        2014-11-22 [1] CRAN (R 3.6.0)                     
#>  markdown       1.1        2019-08-07 [1] CRAN (R 3.6.0)                     
#>  MASS           7.3-51.5   2019-12-20 [2] CRAN (R 3.6.3)                     
#>  Matrix         1.2-18     2019-11-27 [1] CRAN (R 3.6.0)                     
#>  matrixStats    0.56.0     2020-03-13 [1] CRAN (R 3.6.0)                     
#>  memoise        1.1.0      2017-04-21 [1] CRAN (R 3.6.0)                     
#>  mime           0.9        2020-02-04 [1] CRAN (R 3.6.0)                     
#>  miniUI         0.1.1.1    2018-05-18 [1] CRAN (R 3.6.0)                     
#>  minqa          1.2.4      2014-10-09 [1] CRAN (R 3.6.0)                     
#>  munsell        0.5.0      2018-06-12 [1] CRAN (R 3.6.0)                     
#>  nlme           3.1-144    2020-02-06 [2] CRAN (R 3.6.3)                     
#>  nloptr         1.2.2.1    2020-03-11 [1] CRAN (R 3.6.0)                     
#>  nnet           7.3-14     2020-04-26 [1] CRAN (R 3.6.2)                     
#>  pillar         1.4.4      2020-05-05 [1] CRAN (R 3.6.2)                     
#>  pkgbuild       1.0.8      2020-05-07 [1] CRAN (R 3.6.2)                     
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 3.6.0)                     
#>  pkgload        1.1.0      2020-05-29 [1] CRAN (R 3.6.2)                     
#>  plyr           1.8.6      2020-03-03 [1] CRAN (R 3.6.0)                     
#>  prettyunits    1.1.1      2020-01-24 [1] CRAN (R 3.6.0)                     
#>  processx       3.4.2      2020-02-09 [1] CRAN (R 3.6.0)                     
#>  prodlim        2019.11.13 2019-11-17 [1] CRAN (R 3.6.0)                     
#>  promises       1.1.1      2020-06-09 [1] CRAN (R 3.6.2)                     
#>  ps             1.3.3      2020-05-08 [1] CRAN (R 3.6.2)                     
#>  purrr          0.3.4      2020-04-17 [1] CRAN (R 3.6.2)                     
#>  R6             2.4.1      2019-11-12 [1] CRAN (R 3.6.0)                     
#>  Rcpp           1.0.4.6    2020-04-09 [1] CRAN (R 3.6.3)                     
#>  RcppParallel   5.0.1      2020-05-06 [1] CRAN (R 3.6.2)                     
#>  recipes      * 0.1.13     2020-06-23 [1] CRAN (R 3.6.2)                     
#>  remotes        2.1.1      2020-02-15 [1] CRAN (R 3.6.0)                     
#>  reshape2       1.4.4      2020-04-09 [1] CRAN (R 3.6.2)                     
#>  reticulate     1.16       2020-05-27 [1] CRAN (R 3.6.2)                     
#>  rlang          0.4.6      2020-05-02 [1] CRAN (R 3.6.2)                     
#>  rmarkdown      2.3        2020-06-18 [1] CRAN (R 3.6.2)                     
#>  rpart          4.1-15     2019-04-12 [2] CRAN (R 3.6.3)                     
#>  rprojroot      1.3-2      2018-01-03 [1] CRAN (R 3.6.0)                     
#>  rsconnect      0.8.16     2019-12-13 [1] CRAN (R 3.6.3)                     
#>  rstan          2.19.3     2020-02-11 [1] CRAN (R 3.6.0)                     
#>  rstanarm       2.19.2     2019-10-03 [1] CRAN (R 3.6.0)                     
#>  rstantools     2.0.0      2019-09-15 [1] CRAN (R 3.6.0)                     
#>  scales         1.1.1      2020-05-11 [1] CRAN (R 3.6.2)                     
#>  sessioninfo    1.1.1      2018-11-05 [1] CRAN (R 3.6.0)                     
#>  shiny          1.4.0.2    2020-03-13 [1] CRAN (R 3.6.0)                     
#>  shinyjs        1.1        2020-01-13 [1] CRAN (R 3.6.0)                     
#>  shinystan      2.5.0      2018-05-01 [1] CRAN (R 3.6.0)                     
#>  shinythemes    1.1.2      2018-11-06 [1] CRAN (R 3.6.0)                     
#>  StanHeaders    2.21.0-5   2020-06-09 [1] CRAN (R 3.6.2)                     
#>  statmod        1.4.34     2020-02-17 [1] CRAN (R 3.6.0)                     
#>  stringi        1.4.6      2020-02-17 [1] CRAN (R 3.6.0)                     
#>  stringr        1.4.0      2019-02-10 [1] CRAN (R 3.6.0)                     
#>  survival       3.2-3      2020-06-13 [1] CRAN (R 3.6.2)                     
#>  tensorflow     2.2.0      2020-05-11 [1] CRAN (R 3.6.2)                     
#>  testthat       2.3.2      2020-03-02 [1] CRAN (R 3.6.0)                     
#>  tfruns         1.4        2018-08-25 [1] CRAN (R 3.6.0)                     
#>  threejs        0.3.3      2020-01-21 [1] CRAN (R 3.6.0)                     
#>  tibble         3.0.1      2020-04-20 [1] CRAN (R 3.6.2)                     
#>  tidyr          1.1.0      2020-05-20 [1] CRAN (R 3.6.2)                     
#>  tidyselect     1.1.0      2020-05-11 [1] CRAN (R 3.6.2)                     
#>  timeDate       3043.102   2018-02-21 [1] CRAN (R 3.6.0)                     
#>  usethis        1.6.1      2020-04-29 [1] CRAN (R 3.6.2)                     
#>  uwot           0.1.8      2020-03-16 [1] CRAN (R 3.6.0)                     
#>  vctrs          0.3.1      2020-06-05 [1] CRAN (R 3.6.2)                     
#>  whisker        0.4        2019-08-28 [1] CRAN (R 3.6.0)                     
#>  withr          2.2.0      2020-04-20 [1] CRAN (R 3.6.2)                     
#>  xfun           0.14       2020-05-20 [1] CRAN (R 3.6.2)                     
#>  xtable         1.8-4      2019-04-21 [1] CRAN (R 3.6.0)                     
#>  xts            0.12-0     2020-01-19 [1] CRAN (R 3.6.0)                     
#>  yaml           2.2.1      2020-02-01 [1] CRAN (R 3.6.0)                     
#>  zeallot        0.1.0      2018-01-28 [1] CRAN (R 3.6.0)                     
#>  zoo            1.8-8      2020-05-02 [1] CRAN (R 3.6.2)                     
#> 
#> [1] /Users/ggu020/Library/R/3.6/library
#> [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

@topepo
Member

topepo commented Jul 6, 2020

Can you execute this code and send the output?

library(tidymodels)
library(embed)
library(modeldata)


data(ames, package = "modeldata")

rec <- 
  recipe(Sale_Price ~ MS_SubClass + Neighborhood, data = ames) %>% 
  step_log(Sale_Price, base = 10) %>% 
  step_embed(all_predictors(), outcome = vars(Sale_Price))

rec %>% prep()

tensorflow::tf_version()

rec %>% prep()

@quantumlinguist
Author

Yes, thanks for your help by the way.

library(tidymodels)
#> ── Attaching packages ────────────────────────────────────── tidymodels 0.1.0 ──
#> ✓ broom     0.5.6      ✓ recipes   0.1.13
#> ✓ dials     0.0.7      ✓ rsample   0.0.7 
#> ✓ dplyr     1.0.0      ✓ tibble    3.0.1 
#> ✓ ggplot2   3.3.2      ✓ tune      0.1.0 
#> ✓ infer     0.5.2      ✓ workflows 0.1.1 
#> ✓ parsnip   0.1.1      ✓ yardstick 0.0.6 
#> ✓ purrr     0.3.4
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()
library(embed)
library(modeldata)


data(ames, package = "modeldata")

rec <- 
  recipe(Sale_Price ~ MS_SubClass + Neighborhood, data = ames) %>% 
  step_log(Sale_Price, base = 10) %>% 
  step_embed(all_predictors(), outcome = vars(Sale_Price))

rec %>% prep()
#> Error in if (is.na(b)) return(1L): argument is of length zero

tensorflow::tf_version()
#> NULL

rec %>% prep()
#> Error in if (is.na(b)) return(1L): argument is of length zero

Created on 2020-07-06 by the reprex package (v0.3.0)
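
One reading of the output above, offered as an assumption and not a confirmed diagnosis: tf_version() returned NULL, and a NULL reaching an if (is.na(...)) test produces exactly this message, because is.na(NULL) has length zero. The sketch below reproduces the message in base R without embed or TensorFlow.

```r
# NULL stands in for what tensorflow::tf_version() returned in the reprex above.
tf_ver <- NULL

# is.na(NULL) is logical(0), so there is nothing for if () to test.
length(is.na(tf_ver))

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  if (is.na(tf_ver)) 1L,
  error = function(e) conditionMessage(e)
)
msg
```

So the message says nothing about the recipe or the data; it points at a missing value further upstream, here the TensorFlow version.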

@topepo
Member

topepo commented Jul 6, 2020

Can you run this command twice and send the results?

tensorflow::tf_config()

@quantumlinguist
Author

tensorflow::tf_config()
#> Installation of TensorFlow not found.
#> 
#> Python environments searched for 'tensorflow' package:
#>  /Users/ggu020/Library/r-miniconda/envs/r-reticulate/bin/python3.6
#> 
#> You can install TensorFlow using the install_tensorflow() function.
#> 

Created on 2020-07-06 by the reprex package (v0.3.0)

tensorflow::tf_config()
#> Installation of TensorFlow not found.
#> 
#> Python environments searched for 'tensorflow' package:
#>  /Users/ggu020/Library/r-miniconda/envs/r-reticulate/bin/python3.6
#> 
#> You can install TensorFlow using the install_tensorflow() function.
#> 

Created on 2020-07-06 by the reprex package (v0.3.0)

@topepo
Member

topepo commented Jul 6, 2020

You can install TensorFlow using the install_tensorflow() function... so give that a try and let us know if it works.

@quantumlinguist
Author

quantumlinguist commented Jul 6, 2020

I followed the steps for installing TensorFlow and it works now. I had installed it via install.packages(); I didn't know it required a different method. Thanks for your help!
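
For readers hitting the same thing: install.packages("tensorflow") installs the R interface only, while the Python TensorFlow library is installed separately with install_tensorflow(), as the tf_config() message says. Below is a minimal sketch of a check to run before prep(). tf_ready() is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: TRUE only if the R package is installed and it can
# see a TensorFlow installation. In this thread tf_version() returned NULL
# when TensorFlow was missing.
tf_ready <- function() {
  if (!requireNamespace("tensorflow", quietly = TRUE)) {
    return(FALSE)
  }
  ver <- tryCatch(tensorflow::tf_version(), error = function(e) NULL)
  !is.null(ver)
}

if (!tf_ready()) {
  message("TensorFlow not found; try tensorflow::install_tensorflow() and restart R.")
}
```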

@topepo
Member

topepo commented Jul 6, 2020

Sooo... here's why this is confusing (for everyone). There is a bug in the CRAN version of reticulate that keeps it from finding the Python install the first time you ask for it. That is fixed in the development version.

For some reason, this does work in a reprex. I should have been more clear; it needed to be run (without reprex) in a new R session twice.

The overall solution is to install the version of reticulate that is on GitHub.
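
A sketch of that install, assuming the development version is hosted at rstudio/reticulate on GitHub and that the remotes package is an acceptable way to install it. install_dev_reticulate() is a hypothetical wrapper, used here so that nothing is installed until it is called.

```r
# Hypothetical wrapper around the install; nothing runs until it is called.
# Assumption: the development version lives at rstudio/reticulate on GitHub.
install_dev_reticulate <- function() {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("rstudio/reticulate")
}

# Run once, then restart R:
# install_dev_reticulate()
```

Restarting R afterwards makes sure the newly installed version is the one that gets loaded.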

@quantumlinguist
Author

Ohh I see! Ok, thanks for the clarification!

@github-actions

github-actions bot commented Mar 6, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 6, 2021