delim = "" should generate clear error #557

Closed
cboettig opened this issue Nov 29, 2016 · 3 comments
Labels
feature a feature request or enhancement read 📖

Comments

@cboettig
Contributor

Consider this minimal example with a classic CO2 dataset.

The base R version works fine:

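# sep = "" splits on any run of whitespace; lines starting with "#" are skipped.
# header = FALSE keeps the first data row, since the file has no uncommented header line.
co2 <- read.delim("ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt",
                  sep = "", comment = "#", header = FALSE,
                  col.names = c("year", "month", "decimal_date", "average", "interpolated", "trend", "days"),
                  na.strings = c("-1", "-99.99"))
head(co2)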

The readr function, not so much:

library(readr)

# delim = "" is not interpreted as "any whitespace" here, so the file is not parsed as intended.
co2 <- read_delim("ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt", trim_ws = TRUE,
                  delim = "", comment = "#",
                  col_names = c("year", "month", "decimal_date", "average", "interpolated", "trend", "days"),
                  col_types = "iiddddi",
                  na = c("-1", "-99.99"))
head(co2)

The problem seems to be that the file is whitespace-delimited. read.delim interprets sep = "" somewhat surprisingly (but conveniently in this case) as "one or more whitespace characters"; read_delim does not.
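To make the difference concrete, here is a small base R sketch on made-up inline data (the sample rows and the four column names are illustrative assumptions, not the real file). It contrasts sep = "" with splitting on a single literal space:

```r
# Made-up sample: one comment line plus rows separated by runs of spaces.
txt <- c(
  "# hypothetical header comment",
  "1958   3    1958.208      315.71",
  "1958   4    1958.292      317.45"
)

# sep = "" tells read.table() to split on one or more whitespace characters.
ws <- read.table(text = txt, sep = "", comment.char = "#",
                 col.names = c("year", "month", "decimal_date", "average"))

# Splitting the first data row on a single literal space instead yields
# an empty field for every extra space in a run.
n_single <- lengths(strsplit(txt[2], " ", fixed = TRUE))

print(ws)
print(n_single)
```

With sep = "" the rows come out as four clean columns, whereas the single-space split produces many more (mostly empty) fields for the same row.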

I haven't figured out a way to parse this file with readr functions, though I could be missing something obvious. It seems like a delim_whitespace option (as in pandas) or, perhaps better, the ability to use regular expressions as delimiters would help?

(A bit unrelated, but it might also be convenient for the comment symbol to permit regex patterns?)
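As a rough illustration of what regex-based delimiters and comment patterns could look like, here is a base R sketch that does both by hand. It is not a readr feature; the sample rows, column names, and the `co2_sketch` variable are hypothetical:

```r
# Made-up sample with an indented comment line and ragged whitespace.
txt <- c(
  "   # hypothetical indented comment",
  "  1958   3    1958.208      315.71",
  "  1958   4    1958.292      317.45"
)

# Regex comment pattern: optional leading whitespace, then "#".
body <- txt[!grepl("^[[:space:]]*#", txt)]

# Regex delimiter: any run of whitespace, after trimming both ends.
fields <- strsplit(trimws(body), "[[:space:]]+")

# Assemble the rows into a data frame and guess column types.
co2_sketch <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(co2_sketch) <- c("year", "month", "decimal_date", "average")
co2_sketch[] <- lapply(co2_sketch, type.convert, as.is = TRUE)

str(co2_sketch)
```

This assumes every data row has the same number of fields; a real parser would need to handle ragged rows and missing-value codes.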

@lukas-rokka

Use `readr::read_table()` for whitespace-separated columns.

@cboettig
Contributor Author

@lukas-rokka Thanks, that's great. Unfortunately, it looks like read_table lacks an argument for the comment symbol. Is there a good reason for this, or could it be added?

(Of course one could use skip, but having to count comment lines is obviously not ideal.)
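One possible stopgap is to count the leading comment lines programmatically rather than by hand. The sketch below uses base R on made-up inline data; `n_skip` and the sample rows are illustrative assumptions, and in practice the lines would come from readLines() on the file:

```r
# Made-up sample: two leading comment lines, then whitespace-separated data.
txt <- c(
  "# hypothetical comment 1",
  "# hypothetical comment 2",
  "1958   3    315.71",
  "1958   4    317.45"
)

# Count the unbroken run of comment lines at the top of the file:
# cumprod() turns to 0 at the first non-comment line and stays there.
n_skip <- sum(cumprod(startsWith(txt, "#")))

# Parse the rest with comment handling switched off, relying on skip alone.
co2_skip <- read.table(text = txt, skip = n_skip, comment.char = "",
                       col.names = c("year", "month", "average"))

print(n_skip)
print(co2_skip)
```

The same `n_skip` value could presumably be passed as the skip argument of readr::read_table(). Note that this only handles a comment block at the top of the file, not comment lines interleaved with data.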

@cboettig
Contributor Author

cboettig commented Dec 6, 2016

Proposed fix in PR #563.

@hadley changed the title from "unexpected behavior of delim in read_delim" to "delim = "" should generate clear error" on Dec 22, 2016
@hadley added the "feature" (a feature request or enhancement) and "read 📖" labels on Dec 22, 2016
@lock (bot) locked and limited conversation to collaborators on Sep 24, 2018