Cannot specify col_types for columns with missing names #311

Closed
stephenbalogun opened this issue Feb 13, 2021 · 3 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@stephenbalogun

I routinely analyze a relatively large .csv file (over 110 MB) and import it with vroom(). Recently I decided to be explicit with the col_types parameter, but vroom keeps throwing an error because one of the column names is missing. This does not happen with readr::read_csv(), but that function is significantly slower. It would be helpful to have this addressed in vroom as well. Thank you.

library(vroom)
vroom("x,\n1,2\n3,4", delim = ",", col_types = cols(
    x = col_double(),
    `...2` = col_double()
))
#> New names:                                                                                                             
#> * NA -> ...2
#> Error: Invalid input type, expected 'list' actual 'NULL'
#> In addition: Warning message:
#> The following named parsers don't match the column names: ...2 

library(readr)
read_csv("x,\n1,2\n3,4", col_types = cols(
    x = col_double(),
    `X2` = col_double()
))

#> # A tibble: 2 x 2
#>     x    X2
#>  <dbl> <dbl>
#> 1     1     2
#> 2     3     4
#> Warning message:
#> Missing column names filled in: 'X2' [2] 
@jimhester jimhester added the bug an unexpected problem or unintended behavior label Feb 16, 2021
@jimhester
Collaborator

Thanks for opening the issue and a reproducible example! I will see if we can fix this.

A workaround is to specify the types without names, e.g.

vroom::vroom("x,\n1,2\n3,4", delim = ",", col_types = "dd")
#> New names:
#> * `` -> ...2
#> # A tibble: 2 x 2
#>       x  ...2
#>   <dbl> <dbl>
#> 1     1     2
#> 2     3     4

Created on 2021-02-16 by the reprex package (v1.0.0)

@stephenbalogun
Author

Thank you for the feedback. Your recommendation is exactly what I am currently doing. The only challenge is that the number of columns in my data file keeps changing as new variables are added. In the original script, I used cols_only() with the col_types argument so that my code does not break when new variables are added, e.g.

# this works
vroom("x,y\n1,2\n3,4", delim = ",", col_types = cols_only(
   x = col_double(),
   y = col_double()
))
#> # A tibble: 2 x 2                                                                                                           
#>      x     y
#>  <dbl> <dbl>
#> 1     1     2
#> 2     3     4


# when additional columns are added, it still works!
vroom("x,y,z\n1,2,3\n4,5,6", delim = ",", col_types = cols_only(
   x = col_double(),
   y = col_double()
))
#> # A tibble: 2 x 2                                                                                                           
#>      x     y
#>  <dbl> <dbl>
#> 1     1     2
#> 2     3     4

# unlike this, which fails when the number of variables changes
vroom("x,y,z\n1,2,3\n4,5,6", delim = ",", col_types = "dd")
#> Error: Unnamed `col_types` must have the same length as `col_names`. 

I hope that your team is able to fix this soon.

Again, thank you for providing an incredibly fast package for importing flat files that works well with other tidyverse packages.

@jimhester
Collaborator

Thank you for opening the issue and for supplying a reproducible example; it is a big help!


2 participants