Implement comment skipping for read_fwf() #334

Closed
yportier opened this Issue Dec 12, 2015 · 4 comments

yportier commented Dec 12, 2015

I have fixed-width text files with headers whose size (the number of lines to skip) varies greatly from file to file. I could first read a file to work out how many lines to skip and then reload it with the skip value set, but that feels clumsy, and it would be nice to do it in one go. Header/comment lines can often be identified by their first few characters (in my case, each header line starts with the letter H).
It would also be nice to be able to save the skipped lines (the whole header) to a variable or a file at the same time, since one may need to parse the header further to collect information from it.
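
The two-pass workaround described above could be sketched in R roughly as follows. The helper name `count_header_lines`, the sample lines, and the column widths are made up for illustration, and the sketch assumes the header consists of exactly the leading lines that start with `H`.

```r
# Hypothetical helper: count the leading lines that start with `prefix`.
# Only the run of matching lines at the top is counted, so later data rows
# that happen to start with the prefix are not treated as header.
count_header_lines <- function(lines, prefix = "H") {
  is_header <- startsWith(lines, prefix)
  first_data <- match(FALSE, is_header)
  if (is.na(first_data)) length(lines) else first_data - 1L
}

# Made-up sample file: two header lines, then two fixed-width records.
path <- tempfile(fileext = ".txt")
writeLines(c("H source: example", "H created: 2015-12-12",
             "John  25", "Hana  31"), path)

# First pass: find the header and keep it for later parsing.
lines  <- readLines(path)
n_skip <- count_header_lines(lines)
header <- lines[seq_len(n_skip)]

# Second pass: read only the data rows (widths 6 and 2 are assumptions).
if (requireNamespace("readr", quietly = TRUE)) {
  data <- readr::read_fwf(path,
                          readr::fwf_widths(c(6, 2), c("name", "age")),
                          skip = n_skip)
}
```

For large files, the first pass could pass a limit to `readLines()` via its `n` argument instead of loading everything, provided an upper bound on the header length is known.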

Member

jennybc commented Dec 12, 2015

I was about to say that the new-ish `comment` argument should work for skipping the header (as long as no data lines start with H). See #68, now closed/fixed. But now I see that `comment` is not (yet?) available for read_fwf().

yportier commented Dec 14, 2015

Something like `comment` could definitely work, yes.
It may be worth noting, though, that for a header it is not necessary to test every single line: once the first row of data has been encountered, there is no more header.

holstius commented May 19, 2016

Two suggestions for interested parties (not just Hadley!):

  1. Assign comments to the returned object using 'comment<-' (that is, comment(x) <- value).
  2. Design an API for a more general 2-part approach. Maybe 'meta=list(prefix="#", parser=as.character, simplify=FALSE)' would accomplish the above? Support for emerging CSV metadata conventions could be implemented as extensions.
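
As a rough illustration of the first suggestion, base R's `comment<-` can attach header lines to a data frame after it has been read. The data frame and header lines here are made-up stand-ins, and this is a manual sketch, not something readr does itself.

```r
# Made-up stand-ins for a parsed file and its skipped header lines.
data   <- data.frame(name = c("John", "Hana"), age = c(25L, 31L))
header <- c("H source: example", "H created: 2015-12-12")

# Store the header as the object's "comment" attribute.
comment(data) <- header

# Retrieve it later for further parsing.
saved <- comment(data)
```

The comment attribute is not shown when the object is printed, and whether it survives later transformations depends on the operation, so it may be safer to extract it early.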

@hadley hadley changed the title from Feature Request: conditional skip to Add comment argument to `read_fwf()` Jun 2, 2016

@hadley hadley changed the title from Add comment argument to `read_fwf()` to Implement comment skipping for read_fwf() Jun 2, 2016

Member

hadley commented Jun 2, 2016

@holstius that's unfortunately rather difficult to do without losing the performance benefits of the way that readr is structured.

@hadley hadley modified the milestone: 0.3.0 Jul 13, 2016

jimhester added a commit to jimhester/readr that referenced this issue Jul 14, 2016

Allow skipping lines in fwf based on a comment string
I also modified the fixed width example file to be a little more
substantial than the previous examples.

Fixes tidyverse#334
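
A minimal sketch of how the resulting feature might be used, assuming an installed readr version that includes this change (a `comment` argument on `read_fwf()`). The file contents, widths, and column names are made up for illustration.

```r
# Made-up sample file: one header line marked with "H", then two records.
path <- tempfile(fileext = ".txt")
writeLines(c("H source: example", "John  25", "Mary  31"), path)

if (requireNamespace("readr", quietly = TRUE)) {
  # Lines beginning with the comment string are skipped while reading.
  data <- readr::read_fwf(path,
                          readr::fwf_widths(c(6, 2), c("name", "age")),
                          comment = "H")
}
```

As noted earlier in the thread, this only works if no data line starts with the comment string.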

@jimhester jimhester self-assigned this Jul 14, 2016

@jimhester jimhester added in progress and removed ready labels Jul 14, 2016

@jimhester jimhester removed the in progress label Jul 14, 2016

@lock lock bot locked and limited conversation to collaborators Sep 25, 2018
