Cannot specify col_types for columns with missing names #311

stephenbalogun · 2021-02-13T04:57:50Z

I have a fairly large .csv file (over 110 MB) that I routinely analyze, and I use vroom() to import it. Recently I decided to be explicit with the col_types parameter, but vroom keeps throwing an error because one of the column names is missing. This does not happen with readr::read_csv(), but read_csv() is significantly slower. It would be helpful to have this addressed in vroom as well. Thank you.

library(vroom)
vroom("x,\n1,2\n3,4", delim = ",", col_types = cols(
    x = col_double(),
    `...2` = col_double()
))
#> New names:
#> * NA -> ...2
#> Error: Invalid input type, expected 'list' actual 'NULL'
#> In addition: Warning message:
#> The following named parsers don't match the column names: ...2

library(readr)
read_csv("x,\n1,2\n3,4", col_types = cols(
    x = col_double(),
    `X2` = col_double()
))

#> # A tibble: 2 x 2
#>     x    X2
#>  <dbl> <dbl>
#> 1     1     2
#> 2     3     4
#> Warning message:
#> Missing column names filled in: 'X2' [2]

jimhester · 2021-02-16T14:11:33Z

Thanks for opening the issue and a reproducible example! I will see if we can fix this.

A workaround is to specify the types without names, e.g.

vroom::vroom("x,\n1,2\n3,4", delim = ",", col_types = "dd")
#> New names:
#> * `` -> ...2
#> # A tibble: 2 x 2
#>       x  ...2
#>   <dbl> <dbl>
#> 1     1     2
#> 2     3     4

Created on 2021-02-16 by the reprex package (v1.0.0)

stephenbalogun · 2021-02-16T14:36:26Z

Thank you for the feedback. Your recommendation is exactly what I am currently doing. The only challenge is that the number of columns in my data file keeps changing as new variables are added. In the original script, I used cols_only() with the col_types argument so that my code does not break when new variables are added, e.g.

# this works
vroom("x,y\n1,2\n3,4", delim = ",", col_types = cols_only(
   x = col_double(),
   y = col_double()
))
#> # A tibble: 2 x 2
#>      x     y
#>  <dbl> <dbl>
#> 1     1     2
#> 2     3     4


# when additional columns are added, it still works!
vroom("x,y,z\n1,2,3\n4,5,6", delim = ",", col_types = cols_only(
   x = col_double(),
   y = col_double()
))
#> # A tibble: 2 x 2
#>      x     y
#>  <dbl> <dbl>
#> 1     1     2
#> 2     3     4

# unlike this, which fails when the number of variables changes
vroom("x,y,z\n1,2,3\n4,5,6", delim = ",", col_types = "dd")
#> Error: Unnamed `col_types` must have the same length as `col_names`.

I hope that your team is able to fix this soon.

Again, thank you for providing an incredibly fast package for importing flat files that works well with other tidyverse packages.
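
Until a fix lands, one possible stopgap is to build the unnamed (compact) col_types string from the header line, so that its length always matches the current number of columns. The sketch below is only an illustration: compact_types() is a hypothetical helper, not part of vroom, and it assumes that the columns of interest keep their positions and that the header contains no quoted delimiters. It relies on "_" meaning "skip this column" in a compact specification.

```r
# Hypothetical helper (not part of vroom): build a compact col_types string
# from a header line, typing selected columns by position and skipping the rest.
compact_types <- function(header, positions, codes, delim = ",") {
  stopifnot(length(positions) == length(codes))
  # Count delimiters instead of splitting, so that a trailing empty name
  # (as in "x,") still counts as a column. Assumes no quoted delimiters.
  hits <- gregexpr(delim, header, fixed = TRUE)[[1]]
  n_cols <- sum(hits > 0) + 1
  stopifnot(all(positions <= n_cols))
  types <- rep("_", n_cols)  # "_" skips a column in a compact specification
  types[positions] <- codes
  paste(types, collapse = "")
}

# For a file on disk, the header could be read with readLines(path, n = 1).
compact_types("x,", c(1, 2), c("d", "d"))     # "dd"
compact_types("x,,z", c(1, 2), c("d", "d"))   # "dd_"
```

The resulting string could then be passed as col_types = compact_types(...) in the vroom() call, which keeps the first two columns as doubles and skips any columns added later.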

jimhester · 2021-04-28T17:14:44Z

Thank you for opening the issue and for supplying a reproducible example; it is a big help!

jimhester added the "bug" label (an unexpected problem or unintended behavior) on Feb 16, 2021

jimhester closed this as completed in b3553f0 on Apr 28, 2021
