Dealing with horrible files #94
I think so. Could you outline a few examples? (Maybe using commas so it's a bit easier to see.)
Totally!

h1 h2 h3

We've got an unquoted text field, which shouldn't exist but does, because humans are terrible. As a result, any comma is potentially a delimiter, and a machine can't tell which one is which. In R-core, the response is to create an additional field at the end and then (confusingly) warn you about incomplete lines. In readr, the response is to say "more columns than column names, abort!". It would be nice to be able to handle this by saying: "I don't care if you've got a comma in you, field. I've been told there are six fields and you're no. 7, so I'm going to issue a warning and then fields[6] += fields[7]. In you go now."
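The "merge the surplus into the last field" rule described above can be sketched in a few lines of Python. This is an illustration only, not R and not readr's code, and `split_with_overflow` is a hypothetical helper. Splitting with a `maxsplit` of `n_fields - 1` leaves everything after the last expected delimiter in the final field, which has the same effect as `fields[6] += fields[7]` with the comma put back. It assumes the stray delimiters belong to the last field, which, as noted above, a machine cannot actually verify.

```python
import warnings


def split_with_overflow(line, n_fields, delim=","):
    """Split `line` into at most `n_fields` fields (hypothetical helper).

    Surplus delimiters are assumed to belong to the last field, so they are
    left inside it instead of creating extra columns. Rows with too few
    fields are returned as-is (no padding).
    """
    # maxsplit = n_fields - 1: everything after the (n_fields - 1)th delimiter
    # stays untouched in the final element, delimiters included.
    fields = line.split(delim, n_fields - 1)
    found = line.count(delim) + 1
    if found > n_fields:
        warnings.warn(
            f"expected {n_fields} fields, found {found}; "
            f"merged the surplus into field {n_fields}"
        )
    return fields
```

For example, `split_with_overflow("1,2,3,4,5,hello, world", 6)` warns once and returns six fields, the last being `"hello, world"`.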
Hmmm, this is the same basic behaviour as
Maybe
Yep, sounds perfect!
Related to #189
I think this is OK now: readr will just expand the columns as needed, and you can do the cleanup afterwards. (i.e. it's like
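A rough Python sketch of the "expand now, clean up later" approach mentioned above. It assumes ragged rows are padded with `None` (standing in for R's `NA`) up to the widest row. `expand_ragged` is a hypothetical helper and does not reflect how readr is implemented. It also reports which lines were longer than the header, so those rows are easy to find during cleanup.

```python
def expand_ragged(rows):
    """Pad every row to the width of the widest row (hypothetical helper).

    `rows[0]` is treated as the header. Returns the padded rows plus the
    1-based line numbers (header = line 1) of rows longer than the header.
    """
    n_header = len(rows[0])
    width = max(len(r) for r in rows)
    # Short rows get None in the columns they are missing.
    padded = [r + [None] * (width - len(r)) for r in rows]
    # Rows with more fields than the header are the ones needing cleanup.
    overflow = [i + 1 for i, r in enumerate(rows) if len(r) > n_header]
    return padded, overflow
```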
Some humans (some terrible, terrible humans) leave tabs in content-insertable fields, and those tabs are not consistently escaped. Would it be possible to add an option to read_delim that, when set to FALSE, keeps the current behaviour for a row that unexpectedly has an extra field and, when set to TRUE, mashes the extra field into the last "expected" field and issues a warning noting the row number?
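The requested option might behave something like the following Python sketch. `read_delim_merging` and its `merge_extra` flag are hypothetical names, not readr's `read_delim` API. The header row is assumed to fix the expected number of fields, the `False` branch raises an error as a stand-in for the "more columns than column names" abort described earlier in the thread, and `csv.QUOTE_NONE` is used so that re-joining the surplus fields with a tab reproduces the original field text.

```python
import csv
import io
import warnings


def read_delim_merging(text, delim="\t", merge_extra=False):
    """Parse delimited text whose header fixes the field count (hypothetical).

    merge_extra=False: a row with too many fields raises ValueError.
    merge_extra=True: surplus fields are joined back onto the last expected
    field with the delimiter, and a warning names the offending row.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delim, quoting=csv.QUOTE_NONE)
    header = next(reader)
    n = len(header)
    rows = []
    for row_no, row in enumerate(reader, start=2):  # header is row 1
        if len(row) > n:
            if not merge_extra:
                raise ValueError(f"row {row_no}: {len(row)} fields, expected {n}")
            warnings.warn(
                f"row {row_no}: merged {len(row) - n} extra field(s) into '{header[-1]}'"
            )
            # Glue fields n, n+1, ... back together, restoring the stray tabs.
            row = row[: n - 1] + [delim.join(row[n - 1 :])]
        rows.append(row)
    return header, rows
```

With `merge_extra=True`, a line like `2<TAB>bob<TAB>has<TAB>a tab` under a three-column header becomes `["2", "bob", "has\ta tab"]`, plus one warning that mentions its row number.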