Make it easy to catch mutations in the data by emitting col_types string #314
One of the nice features of readr is that it introspects column types and gives you data frames with columns of "appropriate" data types. This is cool. It does so heuristically by reading a bunch of rows (but not all) and guessing.
This is very nice for interactive use; however, I think it could also be turned into a neat way to catch data schema mutation.
I would like to propose a new function which reads a whole CSV file and returns a col_types string derived from the entire file. A user can then take that string and put it in their production script's read_csv call, so that in the future, if a new data file comes in with different data types, instead of (for example) silently returning a data frame which has chars where a date once was, it will give a proper error.
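To make the idea concrete, here is a rough sketch of the workflow. It is written in plain Python, not R, and it is not readr code. The names `infer_col_types` and `check_col_types` are hypothetical, and the type rules (integer, then double, then ISO date, else character) are a simplifying assumption. The one-letter codes follow readr's compact col_types convention (`i`, `d`, `D`, `c`).

```python
import csv
import io
from datetime import datetime


def _parses(convert, value):
    """Return True if convert(value) succeeds without a ValueError."""
    try:
        convert(value)
        return True
    except ValueError:
        return False


# Candidate types, most specific first. Codes mirror readr's compact
# col_types letters; the parsing rules themselves are an assumption.
CHECKS = [
    ("i", lambda v: _parses(int, v)),
    ("d", lambda v: _parses(float, v)),
    ("D", lambda v: _parses(lambda s: datetime.strptime(s, "%Y-%m-%d"), v)),
]


def infer_col_types(text):
    """Scan every row of the CSV text and return one type code per column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Each column starts with all candidates and loses those a value rejects.
    candidates = [list(CHECKS) for _ in header]
    for row in reader:
        for idx, value in enumerate(row[: len(header)]):
            if value == "":
                continue  # treat empty cells as missing, not as evidence
            candidates[idx] = [(c, ok) for c, ok in candidates[idx] if ok(value)]
    # First surviving candidate wins; fall back to character ("c").
    return "".join(col[0][0] if col else "c" for col in candidates)


def check_col_types(text, expected):
    """Raise if the file's inferred types differ from the recorded string."""
    actual = infer_col_types(text)
    if actual != expected:
        raise ValueError(
            f"column types changed: expected {expected!r}, got {actual!r}"
        )


if __name__ == "__main__":
    old_file = "id,price,day\n1,2.5,2016-01-07\n2,3,2016-01-08\n"
    recorded = infer_col_types(old_file)  # run once, paste into the script
    print(recorded)
    check_col_types(old_file, recorded)   # passes silently
```

The recorded string would be generated once, interactively, and then hard-coded in the production script, so a later file with a non-date value in the `day` column fails loudly instead of quietly becoming a character column.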
Added 2016-02-10: W3C also has a draft document on "standard ways to express useful metadata about CSV files and other kinds of tabular data": http://w3c.github.io/csvw/primer/.
Is it OK to add a comment to this completed enhancement, or should I open a new enhancement request?
While spitting out the column specification (I like this) is a great feature as a default, it gets a tad tedious to see the same red text on the screen once the spec has been figured out. My suggestion is to add a "silent = TRUE" option to suppress the column specifications if they are not wanted.