delim = "" should generate clear error #557

Closed
cboettig opened this issue Nov 29, 2016 · 3 comments

@cboettig cboettig commented Nov 29, 2016

Consider this minimal example with a classic CO2 dataset.

The base R version works fine:

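# sep = "" splits on runs of whitespace; header = FALSE keeps the first data row
# from being consumed as a header line (read.delim defaults to header = TRUE).
co2 <- read.delim("ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt",
                  header = FALSE, sep = "", comment.char = "#",
                  col.names = c("year", "month", "decimal_date", "average", "interpolated", "trend", "days"),
                  na.strings = c("-1", "-99.99"))
head(co2)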

The readr version, not so much:

library(readr)

# delim = "" is not treated as "any whitespace" here, so this call does not parse the file.
co2 <- read_delim("ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt", trim_ws = TRUE,
                  delim = "", comment = "#",
                  col_names = c("year", "month", "decimal_date", "average", "interpolated", "trend", "days"),
                  col_types = "iiddddi",
                  na = c("-1", "-99.99"))
head(co2)

The problem seems to be that the file is whitespace-delimited. read.delim interprets sep = "" somewhat surprisingly (but conveniently in this case) as "one or more whitespace characters"; read_delim does not.
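To make the sep = "" behaviour concrete, here is a minimal base R sketch on inline text, so it runs without network access. The rows in `txt` are made-up sample values laid out like the NOAA file, not real measurements, and `demo` is just an illustrative name.

```r
# Made-up sample rows in the same layout as the NOAA file (not real data).
txt <- c(
  "# comment line",
  "# another comment line",
  "1958   3    1958.208      315.71      315.71      314.62     -1",
  "1958   4    1958.292      -99.99      317.45      315.29     28"
)

# sep = "" tells read.table/read.delim to split on runs of whitespace.
demo <- read.delim(
  text = txt, header = FALSE, sep = "", comment.char = "#",
  col.names = c("year", "month", "decimal_date", "average",
                "interpolated", "trend", "days"),
  na.strings = c("-1", "-99.99")
)

str(demo)
```

Both rows come back with seven columns, and the -1 and -99.99 sentinels are read as NA.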

I haven't figured out a way to parse this file with readr functions, though I could be missing something obvious. It seems like an option such as delim_whitespace (as in pandas) or, perhaps better, the ability to use regular expressions as delimiters would help.
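As a rough illustration of what a regex delimiter would do, the splitting can be done by hand in base R. This is only a workaround sketch, not a readr feature; the sample lines and the three column names are invented for the example.

```r
# Invented sample lines: a comment, then rows separated by mixed spaces and tabs.
lines <- c("# comment", "  1958   3  315.71", "1958\t4   317.45")

# Drop lines whose first non-blank character is the comment symbol.
data_lines <- lines[!grepl("^[[:space:]]*#", lines)]

# Split each remaining line on one or more whitespace characters.
fields <- strsplit(trimws(data_lines), "[[:space:]]+")
mat <- do.call(rbind, fields)

parsed <- data.frame(
  year    = as.integer(mat[, 1]),
  month   = as.integer(mat[, 2]),
  average = as.numeric(mat[, 3])
)
parsed
```

This assumes every data line has the same number of fields; a ragged file would need a check before the rbind step.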

(A bit unrelated, but it might also be convenient for the comment symbol to accept regex patterns.)

@lukas-rokka lukas-rokka commented Nov 30, 2016

Use `readr::read_table()` for whitespace-separated columns.

@cboettig cboettig commented Nov 30, 2016

@lukas-rokka Thanks, that's great. Unfortunately, it looks like read_table lacks an argument for a comment symbol. Is there a good reason for this, or could it be added?

(Of course one could use skip, but having to count comment lines by hand is not ideal.)
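If skip is the only option, the count at least does not have to be done by hand. A minimal sketch, assuming the comments form one block at the top of the file; `count_leading_comments` is a hypothetical helper written for this example, and the temp file holds made-up rows.

```r
# Hypothetical helper: number of consecutive comment lines at the top of a file.
count_leading_comments <- function(lines, prefix = "#") {
  is_comment <- startsWith(trimws(lines, which = "left"), prefix)
  first_data <- which(!is_comment)[1]
  if (is.na(first_data)) length(lines) else first_data - 1L
}

# Made-up file: three comment lines followed by two data rows.
path <- tempfile(fileext = ".txt")
writeLines(c("# a", "# b", "# c", "1958 3 315.71", "1958 4 317.45"), path)

n_skip <- count_leading_comments(readLines(path))

# comment.char = "" switches off base R's own comment handling, so skip does the work.
# The same count could be passed as the skip argument of read_table.
dat <- read.table(path, skip = n_skip, comment.char = "",
                  col.names = c("year", "month", "average"))
dat
```

This only handles comments at the top; comment lines scattered through the data would still need real comment support in the reader.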

@cboettig cboettig commented Dec 6, 2016

Proposed fix in PR #563.

hadley changed the title from "unexpected behavior of delim in read_delim" to "delim = "" should generate clear error" on Dec 22, 2016
jimhester added a commit that referenced this issue Feb 3, 2017
jimhester added a commit that referenced this issue Feb 15, 2017
lock bot locked and limited conversation to collaborators on Sep 24, 2018
3 participants