Vroom doesn't fail, stop, or raise any errors when a file has a row with more columns than expected. Instead, any remaining values (separator and all) are forced into the final column of the output. A warning is given, but it's cryptic.
> vroom::vroom("test.tsv")
Rows: 3 Columns: 4
── Column specification ──────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (3): chr, num2, num3
dbl (1): num
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 3 × 4
    num chr     num2  num3
  <dbl> <chr>   <chr> <chr>
1     1 charab  123   "434"
2     2 charact NA    "345\t2345"
3     3 NA      chaaa "3123\t1231"
Warning message:
One or more parsing issues, see `problems()` for details
> problems()
Error in vroom_materialize(x, replace = FALSE) :
argument "x" is missing, with no default
The output I would expect is a more descriptive error, like data.table::fread() gives:
num chr num2 num3
1: 1 charab 123 434
Warning message:
In data.table::fread("test.tsv") :
Stopped early on line 3. Expected 4 fields but found 5. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<2 charact 345 2345>>
It should also either raise an error, discard the offending rows, or stop the read after the first offending row.
After reading up on problems() and passing it the object returned by vroom() as an argument, I find the error descriptions sufficient, though a bit obfuscated for my tastes. It would still be nice to have the offending rows dealt with in some way other than forcing all the extra values into the last column.
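For anyone landing here, this is a minimal sketch of that workaround: pass the result of vroom() to problems() and turn any reported issue into a hard error. The file contents below are a made-up stand-in for test.tsv (the original file isn't shown in this thread), and `stop_on_problems()` is a hypothetical helper name, not part of vroom.

```r
library(vroom)

# Hypothetical stand-in for test.tsv: the second data row has five
# tab-separated fields, while the header declares only four columns.
path <- tempfile(fileext = ".tsv")
writeLines(
  c(
    "num\tchr\tnum2\tnum3",
    "1\tcharab\t123\t434",
    "2\tcharact\t\t345\t2345"
  ),
  path
)

# Hypothetical helper (not a vroom function): raise an error if vroom
# recorded any parsing problems for this data frame.
stop_on_problems <- function(df) {
  # problems() needs the object returned by vroom(); calling it with no
  # argument gives the 'argument "x" is missing' error shown above.
  probs <- vroom::problems(df)
  if (nrow(probs) > 0) {
    stop(
      sprintf("%d parsing problem(s) found; inspect them with problems()", nrow(probs)),
      call. = FALSE
    )
  }
  df
}

df <- vroom(path, delim = "\t", show_col_types = FALSE)

# Capture the error message here instead of aborting the script.
result <- tryCatch(stop_on_problems(df), error = function(e) conditionMessage(e))
print(result)
```

This only covers the "raise an error" option. Discarding the offending rows would mean mapping the `row` column of the problems() output back onto the data, which I have not attempted here.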
Your data would actually be read correctly with readr::read_table() which handles whitespace delimited files with any number of whitespace characters between columns. Unfortunately, we are not currently pursuing replicating this feature in vroom (see #186).
text <- glue::glue('x\ty\tz\n1\t2\t\t3\n4\t\t5\t6\n')
tf <- withr::local_tempfile(lines = text)

# read_table() handles this messy data
readr::read_table(tf, show_col_types = FALSE)
#> # A tibble: 2 × 3
#>       x     y     z
#>   <dbl> <dbl> <dbl>
#> 1     1     2     3
#> 2     4     5     6
Created on 2022-08-26 by the reprex package (v2.0.1.9000)