Registration of custom column types and parsers #865
Comments
I'll second this! It would be very welcome if we could extend the functionality of [...] Our AMR package is all about antimicrobial resistance (AMR). The new class [...]
Consider this thirded! With col_types, extend the abbreviation string assignment to allow users to instead use the key for their input parser.
This would still be awesome to have.
I also have a use for this. In my case values are written to a character field that are probably best represented by a matrix in R. The CSV records look like this:
These are measurements of acceleration made in a 5 Hz burst over 2 axes. Alternatively, if I code them as raw vectors I can considerably reduce the memory consumption (a 66% reduction).
I also have a use case for this in [...]

Contrary to the posts above, I'd suggest that parsers be specified explicitly with a [...] Likewise, I wouldn't extend this to the compact string representation, as that will quickly lead to collisions. We've already used a quarter of the (lower/upper case alphabet) namespace between [...] Indeed, I'd expect that some parsers will benefit from configuration (as do [...]). Instead I'd suggest this work with the [...]:

readr::read_csv("data.csv", col_types = list(mypackage::col_logical("yes" = TRUE, "no" = FALSE)))

One thing that strikes me looking at the source is that the collectors are implemented in C++, presumably for speed. Would it be possible to modify [...]? This would still require that packages implement their collectors in C++. An alternative might be to allow people to provide collectors in R code, accepting that this might dramatically slow down parsing. This would lower the barriers to extension without prohibiting package authors from re-implementing their parsers in C++ if it proved too slow in R.
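To make the "configurable parser written in R" idea from this comment concrete, here is a minimal sketch in base R. The function name make_logical_parser is hypothetical and is not part of readr; it only mimics what the commenter's illustrative mypackage::col_logical("yes" = TRUE, "no" = FALSE) might do once a column has been read as character.

```r
# Hypothetical helper (not part of readr): build a parser that maps
# string labels to logical values, e.g. "yes" -> TRUE, "no" -> FALSE.
make_logical_parser <- function(...) {
  mapping <- c(...)
  # The mapping must be a named logical vector such as c(yes = TRUE, no = FALSE).
  stopifnot(is.logical(mapping), !is.null(names(mapping)))
  function(x) {
    # Look each value up by name; unmatched labels and NA inputs become NA.
    unname(mapping[as.character(x)])
  }
}

parse_yes_no <- make_logical_parser(yes = TRUE, no = FALSE)
parse_yes_no(c("yes", "no", "maybe", NA))
```

Until something like this can be plugged into readr directly, one workaround is to read the column with readr::col_character() and apply the parser afterwards. As the comment anticipates, an R-level parser of this kind is likely to be slower than a C++ collector.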
This is a feature request that should not impact readr's current behaviour, but that would open up the possibility of enhancement from third-party packages (cc @edzer).

Motivation

Consider the cases in which a rectangular data source contains, e.g., 1.53(2) m/s or POINT(0, 1). Currently, those data types are read as character (and other use cases may be stripped to numbers), and the user needs to convert them. The idea would be to allow packages to register custom column types and parsers into readr so that, in this example, if packages quantities and sf were loaded, readr would automatically generate columns with quantities and sf objects respectively.

Changes required
I could be missing something, but the general changes needed for this would be:

- [...] inst/include to expose the Collector class, so that other packages can link to readr and derive safely from this class.
- [...] col_* and parse_*.

The only drawback I can think of is that a package may, e.g., register a parser that catches everything and messes things up. To avoid this issue, readr::read_* may gain a flag to enable external parsers, so that using them requires an action from the user.

If this enhancement is considered, I would be more than happy to work on it.
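The registration mechanism and the opt-in flag described above could look roughly like the following base R sketch. Everything here is an assumption made for illustration: register_parser(), apply_external_parsers() and the external_parsers argument do not exist in readr, and the POINT parser is a toy that returns numeric pairs where a real package such as sf would build its own objects.

```r
# Hypothetical registry of third-party parsers (not part of readr).
.parser_registry <- new.env(parent = emptyenv())

# A package would call this when loaded. `detect` decides whether a
# character column belongs to the parser; `parse` converts the column.
register_parser <- function(name, detect, parse) {
  stopifnot(is.function(detect), is.function(parse))
  assign(name, list(detect = detect, parse = parse), envir = .parser_registry)
  invisible(name)
}

# Post-process a data frame whose columns were read as character.
# Registered parsers run only when the caller opts in via the flag.
apply_external_parsers <- function(df, external_parsers = FALSE) {
  if (!external_parsers) return(df)
  for (col in names(df)) {
    x <- df[[col]]
    if (!is.character(x)) next
    for (name in ls(.parser_registry)) {
      p <- get(name, envir = .parser_registry)
      if (isTRUE(p$detect(x))) {
        df[[col]] <- p$parse(x)
        break  # the first matching parser wins
      }
    }
  }
  df
}

# Toy parser for strings such as "POINT(0, 1)".
point_pattern <- "^POINT\\( *(-?[0-9.]+) *, *(-?[0-9.]+) *\\)$"
register_parser(
  "point",
  detect = function(x) {
    present <- x[!is.na(x)]
    length(present) > 0 && all(grepl(point_pattern, present))
  },
  parse = function(x) {
    # Returns a list column holding one numeric c(x, y) pair per row.
    lapply(x, function(s) {
      c(as.numeric(sub(point_pattern, "\\1", s)),
        as.numeric(sub(point_pattern, "\\2", s)))
    })
  }
)

df <- data.frame(id = 1:2,
                 geom = c("POINT(0, 1)", "POINT(2.5, -3)"),
                 stringsAsFactors = FALSE)
str(apply_external_parsers(df))                           # geom stays character
str(apply_external_parsers(df, external_parsers = TRUE))  # geom becomes a list column
```

Because the flag defaults to FALSE, a badly behaved parser that matches everything cannot change the output unless the user explicitly enables external parsers, which is the safeguard the proposal asks for.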