Description
This is a feature request that should not affect readr's current behaviour, but that would open up the possibility of enhancements from third-party packages (cc @edzer).
Motivation
Consider the cases in which a rectangular data source contains, e.g.,
- Quantities with units and/or errors: 1.53(2) m/s.
- Spatial data: POINT(0, 1).
readr::read_csv("
quantity,point
1.53(3) m/s,\"POINT(0,1)\"
5.21(1) m/s,\"POINT(1,5)\"
")
#> # A tibble: 3 x 2
#> quantity point
#> <chr> <chr>
#> 1 quantity point
#> 2 1.53(3) m/s POINT(0,1)
#> 3 5.21(1) m/s POINT(1,5)
Currently, those data types are read as character (and in other use cases the values may be stripped down to numbers), and the user needs to convert them. The idea would be to allow packages to register custom column types and parsers into readr so that, in this example, if packages quantities and sf were loaded, readr would automatically generate columns with quantities and sf objects respectively.
Changes required
I could be missing something, but the general changes needed for this would be:
- Move things to inst/include to expose the Collector class, so that other packages can link to readr and derive safely from this class.
- Some mechanism to register
  - (C++) the custom collector into the list of available subclasses (and the means to insert it in a specific position in the chain?) and a parser (guesser) function.
  - (R) custom col_* and parse_*.
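To make the C++ side of the registration idea more concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical: the `Collector` class below is only a stand-in for whatever readr would expose under inst/include, and `CollectorRegistry`, `RegistryEntry`, `register_collector` and `looks_like_point` are illustrative names, not part of readr. It only shows the shape of a guesser chain where a collector can be inserted at a given position.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the base class readr would expose;
// the real Collector interface is different and much richer.
class Collector {
public:
  virtual ~Collector() = default;
  virtual std::string type_name() const = 0;
};

// Fallback used when no registered guesser accepts the column.
class CollectorCharacter : public Collector {
public:
  std::string type_name() const override { return "character"; }
};

// What a third-party package (e.g. a spatial one) might derive.
class CollectorPoint : public Collector {
public:
  std::string type_name() const override { return "point"; }
};

// One link of the chain: a guesser plus a factory for the collector.
struct RegistryEntry {
  std::string name;
  std::function<bool(const std::string&)> guess;
  std::function<std::unique_ptr<Collector>()> make;
};

class CollectorRegistry {
  std::vector<RegistryEntry> chain_;

public:
  // Insert an entry at position `pos` in the chain (clamped to the end).
  void register_collector(RegistryEntry e, std::size_t pos) {
    assert(e.guess && e.make);  // both callbacks must be set
    if (pos > chain_.size()) pos = chain_.size();
    chain_.insert(chain_.begin() + static_cast<std::ptrdiff_t>(pos),
                  std::move(e));
  }

  // The first entry whose guesser accepts every value wins;
  // otherwise fall back to character.
  std::unique_ptr<Collector> guess(const std::vector<std::string>& values) const {
    for (const auto& e : chain_) {
      bool all = !values.empty();
      for (const auto& v : values) {
        if (!e.guess(v)) { all = false; break; }
      }
      if (all) return e.make();
    }
    return std::make_unique<CollectorCharacter>();
  }
};

// Toy guesser: accepts strings of the form POINT(...).
bool looks_like_point(const std::string& s) {
  return s.rfind("POINT(", 0) == 0 && s.back() == ')';
}
```

A package would call something like `register_collector({"point", looks_like_point, factory}, 0)` when it is loaded, and the reader would then consult the chain while guessing column types. The position argument is one possible answer to the "specific position in the chain" question above; priorities or named anchors would work as well.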
The only drawback I can think of is that a package may, e.g., register a parser that catches everything and messes things up. To avoid this issue, readr::read_* may gain a flag to enable external parsers, so that using them requires an action from the user.
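The opt-in idea can be sketched in a few lines. This is not readr code: `GuesserEntry`, `guess_type`, `allow_external` and `example_chain` are hypothetical names, and the "greedy" guesser is a deliberately bad example of a parser that accepts everything. The sketch only shows that externally registered guessers would be skipped unless the caller asks for them.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// One guesser in the chain, tagged with whether it came from another package.
struct GuesserEntry {
  std::string type;
  bool external;
  std::function<bool(const std::string&)> accepts;
};

// Guess the type of a single field. External guessers are ignored
// unless the caller opts in through `allow_external`.
std::string guess_type(const std::vector<GuesserEntry>& chain,
                       const std::string& field,
                       bool allow_external = false) {
  for (const auto& g : chain) {
    assert(g.accepts);  // every entry must carry a predicate
    if (g.external && !allow_external) continue;
    if (g.accepts(field)) return g.type;
  }
  return "character";  // default when nothing matches
}

// Example chain: a greedy external guesser placed ahead of a built-in one.
std::vector<GuesserEntry> example_chain() {
  return {
      {"greedy", true, [](const std::string&) { return true; }},
      {"logical", false,
       [](const std::string& s) { return s == "TRUE" || s == "FALSE"; }},
  };
}
```

With the default `allow_external = false`, the greedy guesser cannot shadow the built-in ones; passing `true` lets it take over, which is then the user's explicit choice.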
If this enhancement is considered, I would be more than happy to work on it.