Ignore comments in tokenizer #68

Closed
hadley opened this Issue Mar 11, 2015 · 13 comments

Member

hadley commented Mar 11, 2015

read.csv() etc support a comment argument to ignore (e.g.) everything after #.

Please 👍 this issue if you'd like this feature.
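For reference, a minimal sketch of the base R behaviour being described. In base R the argument is spelled `comment.char`, and `read.csv()` leaves it off (`""`) by default. The sample data below is made up for illustration.

```r
# Hypothetical sample data: one whole-line comment and one trailing comment.
csv_text <- "# exported by instrument\nx,y\n1,2#inline note\n3,4\n"

# With comment.char = "#", everything from "#" to the end of a line is ignored.
df <- read.csv(text = csv_text, comment.char = "#")
print(df)
```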

@hadley hadley added the community label Mar 11, 2015

@hadley hadley changed the title from Support for comments to Ignore comments in tokenizer Mar 11, 2015

lmullen commented Mar 11, 2015

👍 I sometimes encounter CSV files with comments. I'm sorry to say I even used to add them myself.

Member

jimhester commented Mar 11, 2015

👎 for me. The only time I have used this feature in read.csv() is when it caused an incorrect parse because my data had # in it, and I had to read the man page to turn it off.

If you do decide to add it please make the default off!

leondutoit commented Mar 12, 2015

👍 default == off, I see people commenting in delimited files all the time

Member

hadley commented Mar 12, 2015

The default would definitely be off - I've also had bad experiences where I had to turn it off

Member

jennybc commented Mar 12, 2015

👍

Some instruments write reasonable delimited files but put metadata in the file itself, not necessarily at the top, so this would be a good complement to skip =. Also more flexible than skip =, which I assume requires a specific number of lines.
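A rough base R sketch of how the two options complement each other: `skip =` drops a fixed number of leading lines, while `comment.char =` catches marked lines wherever they appear. The instrument output below is invented for illustration.

```r
# Hypothetical instrument output: two unmarked metadata lines at the top
# and a "#" note in the middle of the data.
lines <- c(
  "Instrument: hypothetical-01",
  "Run: 7",
  "t,value",
  "0,1.5",
  "# recalibrated",
  "1,1.7"
)

# skip = 2 removes exactly the first two lines; comment.char = "#"
# handles the mid-file note, which skip alone cannot reach.
df <- read.csv(text = lines, skip = 2, comment.char = "#")
print(df)
```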

PeteHaitch commented Mar 16, 2015

👍 for the reasons @jennybc said.

davharris commented Apr 10, 2015

👍

I was just about to open an issue to suggest this. My current use case is reading in files produced by the Stan package for MCMC (example output; note the comment lines at the beginning, middle, and end of the file).

Currently, I have to make two passes through the file and do some extra fiddling, and it would be nice if all of that could be automated.

If it doesn't add too much complication, it might be nice if the comments (or at least their positions) could be included as an attribute as well, similar to how "problems" are handled.

Thanks for making yet another great package!
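The two-pass workaround described above might look roughly like this in base R. This is only a sketch under assumptions: the input lines are a made-up stand-in for real output, and `"comments"` is a hypothetical attribute name, not something any package defines.

```r
# Hypothetical stand-in for a file with comments at the start, middle, and end.
lines <- c("# header note", "a,b", "1,2", "# mid-run note", "3,4", "# footer note")

# Pass 1: locate the comment lines.
is_comment <- grepl("^\\s*#", lines)

# Pass 2: parse only the remaining lines.
df <- read.csv(text = lines[!is_comment])

# Keep the comments and their original line numbers as an attribute,
# loosely analogous to how "problems" are attached to a result.
attr(df, "comments") <- data.frame(
  line = which(is_comment),
  text = lines[is_comment],
  stringsAsFactors = FALSE
)
```

With a real file, `lines` would come from `readLines()` on the file path.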

defconst commented Apr 26, 2015

👍

Sometimes needed for comments and metadata.

rpruim commented May 8, 2015

My bad for just adding an issue to request this -- sorry for the cruft. Seems to me that it does no harm when not used and can be make-or-break when the file you need to read has comments in it.

Frankly, I'd like to see more people put metadata in commented portions of delimited files because otherwise it tends to go missing.

sjackman commented Jun 19, 2015

👍

stefano-meschiari commented Jul 2, 2015

👍

kcha commented Sep 1, 2015

👍

@hadley hadley closed this in 2ccdde4 Sep 23, 2015

Member

jennybc commented Sep 23, 2015

This is very exciting. 🎉

@lock lock bot locked and limited conversation to collaborators Sep 25, 2018
