fwf_positions with start column > end column produces unusual failure behavior. #217

Closed
aaronrudkin opened this issue Feb 27, 2020 · 0 comments

aaronrudkin commented Feb 27, 2020

When vroom_fwf is called with a fwf_positions specification in which a column's start is greater than its end, vroom fails with "R character strings are limited to 2^31-1 bytes" rather than a useful user-facing error. If the user specifies a default column type, however, vroom reads the file without complaint and only produces the character string error later, when the invalid column is accessed or the resulting data frame is converted.

The example pastebin file used in the reprex below contains the following text:

ABCDE12345hello
FGHIJ67890world

library(reprex)
#> Warning: package 'reprex' was built under R version 3.5.2
library(vroom)
#> Warning: package 'vroom' was built under R version 3.5.2
begin_col <- c(1, 6, 11)
end_col <- c(5, 10, 2)
col_names <- c("alpha", "numeral", "error")
positions <- fwf_positions(start = begin_col, 
                           end = end_col, 
                           col_names = col_names)

# Will immediately produce character string error
error_immediately <- vroom_fwf("https://pastebin.com/raw/eRZqfMRd", 
                               col_positions = positions)
#> Error in vroom_fwf_(file, col_positions$begin, col_positions$end, trim_ws = trim_ws, : R character strings are limited to 2^31-1 bytes

error_later <- vroom_fwf("https://pastebin.com/raw/eRZqfMRd", 
                         col_positions = positions,
                         col_types = cols(.default = col_character()))
error_later$alpha # Works fine
#> [1] "ABCDE"
error_later$error # Character string error
#> Error in print.default(x): R character strings are limited to 2^31-1 bytes

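It seems to me that this could be fixed in fwf_positions by raising a friendly error whenever a column's start is greater than its end (at a minimum, something like stopifnot(all(start <= end)); start == end has to stay legal, since end is inclusive and that describes a one-character column), or else fixed downstream in the compiled code.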
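For illustration, here is a rough sketch of what such a check could look like in plain R. The helper name check_fwf_positions and its error wording are hypothetical and not part of vroom; it also assumes that an NA end marks a ragged final column and should be skipped rather than rejected.

```r
# Hypothetical helper, not part of vroom: validate fixed-width column bounds
# before they are handed to fwf_positions().
check_fwf_positions <- function(start, end) {
  if (length(start) != length(end)) {
    stop("`start` and `end` must have the same length.", call. = FALSE)
  }
  # Assumption: an NA end means "read to the end of the line", so skip it.
  bad <- which(!is.na(end) & start > end)
  if (length(bad) > 0) {
    stop(
      "`start` must not be greater than `end`. Problem at position(s): ",
      paste(bad, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# The positions from the reprex above: the third column runs from 11 to 2.
begin_col <- c(1, 6, 11)
end_col <- c(5, 10, 2)

# Capture the message instead of aborting so the sketch runs to completion.
result <- tryCatch(
  check_fwf_positions(begin_col, end_col),
  error = function(e) conditionMessage(e)
)
print(result)
```

With the positions from the reprex, the helper flags the third column up front, before any file is read. Reporting the offending positions costs a few extra lines over a bare stopifnot, but it tells the user which column to fix.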

Tagging @deholliday who alerted me to the bug when he ran into it in an upstream package.
