Ignore comments in tokenizer #68

Closed
hadley opened this issue Mar 11, 2015 · 13 comments

Comments

@hadley
Member

@hadley hadley commented Mar 11, 2015

read.csv() etc. support a comment argument (comment.char) to ignore, e.g., everything after #.

Please 👍 this issue if you'd like this feature.
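
For reference, a minimal sketch of the base R behaviour mentioned above, assuming the `comment.char` argument of `read.csv()` with `#` as the comment character. The sample data is made up for illustration, and `strip.white = TRUE` is only there to trim the space left in front of the trailing comment.

```r
# Small in-memory CSV with a full-line comment and a trailing comment
# (illustrative data, not taken from this thread).
csv_text <- "x,y
# a full-line comment
1,2
3,4 # a trailing comment"

# read.csv() leaves comment handling off by default (comment.char = "");
# setting it to "#" drops everything from # to the end of each line.
df <- read.csv(text = csv_text, comment.char = "#", strip.white = TRUE)
print(df)
```

With comment handling switched on, the full-line comment disappears entirely and the trailing comment is cut from the last row, so the result should have two rows and the columns `x` and `y`.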

@hadley hadley added the community label Mar 11, 2015
@hadley hadley changed the title Support for comments Ignore comments in tokenizer Mar 11, 2015
@lmullen

@lmullen lmullen commented Mar 11, 2015

👍 I sometimes encounter CSV files with comments. I'm sorry to say I even used to add them myself.

@jimhester
Member

@jimhester jimhester commented Mar 11, 2015

👎 for me, the only time I have used this feature in read.csv() is when it causes an incorrect parse due to my data having # in it and I have to read the man page to turn it off.

If you do decide to add it please make the default off!
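
The failure mode described here can be reproduced with a short sketch in base R, assuming an unquoted field that happens to contain `#`. The data and the `on`/`off` names are hypothetical.

```r
# An unquoted field containing "#" (illustrative data).
csv_text <- "id,tag\n1,a#b\n2,c"

# Comment handling on: everything from # onward is treated as a comment,
# so the first tag is cut short.
on <- read.csv(text = csv_text, comment.char = "#", stringsAsFactors = FALSE)

# Comment handling off: the field is read as written.
off <- read.csv(text = csv_text, comment.char = "", stringsAsFactors = FALSE)

print(on$tag)
print(off$tag)
```

The truncation happens without any warning, which is one argument for keeping comment handling off unless it is requested.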

@leondutoit

@leondutoit leondutoit commented Mar 12, 2015

👍 default == off, I see people commenting in delimited files all the time

@hadley
Member Author

@hadley hadley commented Mar 12, 2015

The default would definitely be off - I've also had bad experiences where I had to turn it off

@jennybc
Member

@jennybc jennybc commented Mar 12, 2015

👍

Some instruments write reasonable delimited files but put metadata in the file itself, not necessarily at the top, so this would be a good complement to skip =. Also more flexible than skip =, which I assume requires a specific number of lines.

@PeteHaitch

@PeteHaitch PeteHaitch commented Mar 16, 2015

👍 for the reasons @jennybc said.

@davharris

@davharris davharris commented Apr 10, 2015

👍

I was just about to open an issue to suggest this. My current use case is reading in files produced by the Stan package for MCMC (example output; note the comment lines at the beginning, middle, and end of the file).

Currently, I have to make two passes through the file and do some extra fiddling, and it would be nice if all of that could be automated.

If it doesn't add too much complication, it might be nice if the comments (or at least their positions) could be included as an attribute as well, similar to how "problems" are handled.

Thanks for making yet another great package!
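
A rough sketch of the kind of two-pass workaround described above, in base R. It assumes comment lines start with `#` (optionally after whitespace); the sample lines and the `"comments"` attribute name are hypothetical and are not part of any existing package API.

```r
# Illustrative file contents with comments at the start, middle, and end.
lines <- c(
  "# model = example",
  "a,b",
  "1,2",
  "# adaptation finished",
  "3,4",
  "# elapsed time: 0.1s"
)

# Pass 1: flag the comment lines.
is_comment <- grepl("^[[:space:]]*#", lines)

# Pass 2: parse only the remaining lines as CSV.
df <- read.csv(text = paste(lines[!is_comment], collapse = "\n"))

# Keep the comments and their original line positions as an attribute.
attr(df, "comments") <- data.frame(
  line = which(is_comment),
  text = lines[is_comment],
  stringsAsFactors = FALSE
)

print(df)
print(attr(df, "comments"))
```

Storing the line positions alongside the text keeps enough information to tell where each comment sat relative to the data rows.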

@defconst

@defconst defconst commented Apr 26, 2015

👍

Sometimes needed for comments and metadata.

@rpruim

@rpruim rpruim commented May 8, 2015

My bad for just adding an issue to request this -- sorry for the cruft. Seems to me that it does no harm when not used and can be make-or-break when the file you need to read has comments in it.

Frankly, I'd like to see more people put metadata in commented portions of delimited files because otherwise it tends to go missing.

@sjackman

@sjackman sjackman commented Jun 19, 2015

👍

@stefano-meschiari

@stefano-meschiari stefano-meschiari commented Jul 2, 2015

👍

@kcha

@kcha kcha commented Sep 1, 2015

👍

@hadley hadley closed this in 2ccdde4 Sep 23, 2015
@jennybc
Member

@jennybc jennybc commented Sep 23, 2015

This is very exciting. 🎉

@lock lock bot locked and limited conversation to collaborators Sep 25, 2018