Implement comment skipping for read_fwf() #334

Closed
yportier opened this issue Dec 12, 2015 · 4 comments

Comments

@yportier

I have fixed-width text files with headers whose size (the number of lines to skip) varies greatly from file to file. I could first read a file, work out how many lines to skip, and then reload it with the skip value set, but that feels clumsy; it would be nice to be able to do it in one go. Headers and comments are often identifiable by their first few characters (in my case, each header line starts with the letter H).
It would also be nice to be able to save those skipped lines (the whole header) in a variable or a file at the same time, as one may need to parse the header further to collect some information from it.
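
For readers following along, the two-pass workaround described above might look like the minimal sketch below. It uses only base R (`readLines()`, `startsWith()`, `utils::read.fwf()`) rather than readr, and the sample lines, column widths, and column names are hypothetical, chosen only for illustration.

```r
# Hypothetical sample file: two header lines starting with "H",
# followed by two fixed-width data lines.
sample_lines <- c(
  "H created 2015-12-12",
  "H units: counts",
  "AAA  1",
  "BBB 22"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Pass 1: read the raw lines and count the leading header lines.
# This assumes the file contains at least one data line.
raw <- readLines(path)
is_header <- startsWith(raw, "H")
first_data <- which(!is_header)[1]  # index of the first non-header line
n_skip <- first_data - 1L

# Keep the skipped header so it can be parsed further later on.
header <- raw[seq_len(n_skip)]

# Pass 2: re-read the file, skipping the header.
# The widths and column names are assumptions for this sample only.
data <- utils::read.fwf(path, widths = c(3, 3), skip = n_skip,
                        col.names = c("id", "value"))
```

This reads the file twice, which is exactly the clumsiness the request is about; it is shown only to make the workaround concrete.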

@jennybc
Member

jennybc commented Dec 12, 2015

I was about to say that the new-ish argument comment should work for the header skipping (as long as data lines don't start with H). See #68, now closed/fixed. But now I see that comment is not (yet?) available for read_fwf().
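
As a hedged illustration of the idea: the sketch below assumes an installed readr release in which `read_fwf()` accepts a `comment` argument (it did not at the time of this comment). The sample file, widths, and column names are hypothetical. The data lines deliberately contain no "H", because the comment string may also be recognised inside a data line.

```r
library(readr)

# Hypothetical sample file: two header lines starting with "H",
# followed by two fixed-width data lines.
sample_lines <- c(
  "H created 2015-12-12",
  "H units: counts",
  "AAA  1",
  "BBB 22"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Assumption: this readr version supports `comment` in read_fwf().
# Lines starting with "H" are treated as comments and dropped.
data <- read_fwf(
  path,
  fwf_widths(c(3, 3), c("id", "value")),
  comment = "H"
)
```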

@yportier
Author

Something like comment could definitely work, yes.
It may be worth noting, though, that in the case of a header it is not necessary to test every single line: once we've encountered the first row of data, there is no more header.
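
The early-stopping idea can be sketched in base R by reading one line at a time from a connection and stopping at the first line that is not a header. `count_header_lines()` is a hypothetical helper written for this illustration, not a readr function, and the sample file is made up.

```r
# Hypothetical sample file: two header lines, then data.
sample_lines <- c(
  "H created 2015-12-12",
  "H units: counts",
  "AAA  1",
  "BBB 22"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Hypothetical helper: count leading lines that start with `prefix`,
# without reading the rest of the file.
count_header_lines <- function(path, prefix = "H") {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    line <- readLines(con, n = 1L)
    # Stop at end of file or at the first line that is not a header.
    if (length(line) == 0L || !startsWith(line, prefix)) break
    n <- n + 1L
  }
  n
}

n_skip <- count_header_lines(path)
```

The resulting `n_skip` could then be passed as the `skip` value of whichever reader is used. A first data line that itself begins with the prefix would be miscounted as header.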

@dholstius

Two suggestions for interested parties (not just Hadley!)

  1. Assign comments to the returned object using 'comment()<-'.
  2. Design an API for a more general 2-part approach. Maybe 'meta=list(prefix="#", parser=as.character, simplify=FALSE)' would accomplish the above? Support for emerging CSV metadata conventions could be implemented as extensions.
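
The first suggestion can be illustrated with base R's `comment()` replacement function, which stores a character vector in the object's "comment" attribute. The data frame and header lines below are hypothetical stand-ins for what a reader would return; this is a sketch of the idea, not something readr does itself.

```r
# Hypothetical parsed data and the header lines that were skipped.
data <- data.frame(id = c("AAA", "BBB"), value = c(1L, 22L))
header <- c("H created 2015-12-12", "H units: counts")

# Attach the skipped header to the returned object.
comment(data) <- header

# Retrieve it later for further parsing.
stored <- comment(data)
```

The "comment" attribute is not shown when the object is printed, so the header travels with the data without cluttering its display.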

@hadley hadley changed the title from "Feature Request: conditional skip" to "Add comment argument to read_fwf()" Jun 2, 2016
@hadley hadley changed the title from "Add comment argument to read_fwf()" to "Implement comment skipping for read_fwf()" Jun 2, 2016
@hadley
Member

hadley commented Jun 2, 2016

@dholstius that's unfortunately rather difficult to do without losing the performance benefits of the way that readr is structured.

@hadley hadley added the feature (a feature request or enhancement) and ready labels Jun 2, 2016
@hadley hadley modified the milestone: 0.3.0 Jul 13, 2016
jimhester added a commit to jimhester/readr that referenced this issue Jul 14, 2016
I also modified the fixed width example file to be a little more
substantial than the previous examples.

Fixes tidyverse#334
@jimhester jimhester self-assigned this Jul 14, 2016
@lock lock bot locked and limited conversation to collaborators Sep 25, 2018