I have a relatively large .csv file (over 110 MB) that I routinely analyze, and I use vroom() to import it. Recently, I decided to specify the col_types argument explicitly, but vroom keeps throwing an error because one of the column names is missing. This does not happen with readr::read_csv(), but that function is significantly slower. It would be helpful to have this addressed in vroom as well. Thank you.
library(vroom)
vroom("x,\n1,2\n3,4", delim = ",", col_types = cols(
x = col_double(),
`...2` = col_double()
))
#> New names:
#> * NA -> ...2
#> Error: Invalid input type, expected 'list' actual 'NULL'
#> In addition: Warning message:
#> The following named parsers don't match the column names: ...2
library(readr)
read_csv("x,\n1,2\n3,4", col_types = cols(
x = col_double(),
`X2` = col_double()
))
#> # A tibble: 2 x 2
#> x X2
#> <dbl> <dbl>
#> 1 1 2
#> 2 3 4
#> Warning message:
#> Missing column names filled in: 'X2' [2]
Thank you for the feedback. Your recommendation is exactly what I am currently doing. The only challenge is that the number of columns in my data file keeps changing as new variables are added. In the original script, I passed cols_only() to the col_types argument so that my code does not break when new variables are added, e.g.:
# this works
vroom("x,y\n1,2\n3,4", delim = ",", col_types = cols_only(
x = col_double(),
y = col_double()
))
#> # A tibble: 2 x 2
#> x y
#> <dbl> <dbl>
#> 1 1 2
#> 2 3 4
# when additional columns are added, it still works!
vroom("x,y,z\n1,2,3\n4,5,6", delim = ",", col_types = cols_only(
x = col_double(),
y = col_double()
))
#> # A tibble: 2 x 2
#> x y
#> <dbl> <dbl>
#> 1 1 2
#> 2 3 4
# unlike this, which fails when the number of variables changes
vroom("x,y,z\n1,2,3\n4,5,6", delim = ",", col_types = "dd")
#> Error: Unnamed `col_types` must have the same length as `col_names`.
I hope that your team is able to fix this soon.
Again, thank you for providing an incredibly fast package for importing flat files that works well with other tidyverse packages.