Support multiple comment symbols #766

Open: zeehio wants to merge 2 commits into master from zeehio:multiple_comment_prefixes

zeehio commented Dec 17, 2017

This pull request allows passing more than one comment pattern to readr. Example:

readr::read_delim(c(
  "# A comment",
  "x,y",
  "1,2",
  "// another comment",
  "3,4"), delim = ",", comment = c("#", "//"))

It builds on PR #677, so merging this one will be easier once that PR is reviewed.

zeehio added some commits Dec 17, 2017

Provide encoding to datasource()
If this encoding is ambiguous in endianness (like UTF-16 or UTF-32, which mandate
a Byte Order Mark), the BOM is detected and skipped (as before) and the encoding
is updated to reflect the endianness (UTF-16LE, UTF-16BE...)
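
To illustrate the BOM handling described in this commit message, here is a minimal sketch. It is not the code from this PR: `BomResult` and `resolveBom` are hypothetical names, and the sketch assumes the caller passes the declared encoding name together with the first bytes of the input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type (not part of readr): the encoding name with its
// endianness resolved, plus the number of BOM bytes the caller should skip.
struct BomResult {
  std::string encoding;
  std::size_t skip;
};

// Inspect the first bytes of the input. If the declared encoding is
// endianness-ambiguous and a BOM is present, return the specific variant.
BomResult resolveBom(const std::string& encoding,
                     const unsigned char* data, std::size_t size) {
  if (encoding == "UTF-32" && size >= 4) {
    if (data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
      return {"UTF-32LE", 4};
    if (data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF)
      return {"UTF-32BE", 4};
  }
  if (encoding == "UTF-16" && size >= 2) {
    if (data[0] == 0xFF && data[1] == 0xFE) return {"UTF-16LE", 2};
    if (data[0] == 0xFE && data[1] == 0xFF) return {"UTF-16BE", 2};
  }
  if (encoding == "UTF-8" && size >= 3 &&
      data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
    return {"UTF-8", 3};
  // No BOM found: keep the declared encoding and skip nothing.
  return {encoding, 0};
}
```

The UTF-32 check only runs when UTF-32 was declared, because the UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM.
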
Allow multiple comment patterns
This commit allows users to define multiple comment patterns using a character vector.

Additionally, comment detection is now handled exclusively by the datasource; previously it was split between the datasource and the tokenizer.

The `comment` argument in the tokenizer functions is deprecated and will be removed in future versions.
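
A rough sketch of matching several comment prefixes in one place is shown below. It is an illustration only, not the implementation in this PR: `startsWithComment` and `dropCommentLines` are hypothetical names, and the sketch assumes the input is already split into lines and that only whole-line comments need to be dropped.

```cpp
#include <string>
#include <vector>

// Returns true if `line` begins with any of the given comment prefixes.
// Empty prefixes are ignored so that they never match every line.
bool startsWithComment(const std::string& line,
                       const std::vector<std::string>& comments) {
  for (const std::string& prefix : comments) {
    if (!prefix.empty() && line.compare(0, prefix.size(), prefix) == 0) {
      return true;
    }
  }
  return false;
}

// Keeps only the lines that do not start with a comment prefix, so that a
// later tokenizing step would not need to know about comments at all.
std::vector<std::string> dropCommentLines(
    const std::vector<std::string>& lines,
    const std::vector<std::string>& comments) {
  std::vector<std::string> kept;
  for (const std::string& line : lines) {
    if (!startsWithComment(line, comments)) {
      kept.push_back(line);
    }
  }
  return kept;
}
```

With the five lines from the example at the top of this pull request and the prefixes `#` and `//`, this sketch keeps `x,y`, `1,2` and `3,4`.
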

zeehio force-pushed the zeehio:multiple_comment_prefixes branch from 828d243 to b5760c3 Dec 17, 2017
