Registration of custom column types and parsers #865

Enchufa2 opened this issue Jun 14, 2018 · 3 comments


@Enchufa2 Enchufa2 commented Jun 14, 2018

This is a feature request that should not affect readr's current behaviour, but would open up the possibility of enhancements from third-party packages (cc @edzer).


Consider the cases in which a rectangular data source contains, e.g.,

  • Quantities with units and/or errors: 1.53(2) m/s.
  • Spatial data: POINT(0, 1).
1.53(3) m/s,"POINT(0,1)"
5.21(1) m/s,"POINT(1,5)"
#> # A tibble: 3 x 2
#>   quantity    point     
#>   <chr>       <chr>     
#> 1 quantity    point     
#> 2 1.53(3) m/s POINT(0,1)
#> 3 5.21(1) m/s POINT(1,5)

Currently, those data types are read as character (and other use cases may be stripped to numbers), and the user needs to convert them. The idea would be to allow packages to register custom column types and parsers into readr so that, in this example, if packages quantities and sf were loaded, readr would have automatically generated columns with quantities and sf objects respectively.
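To make the conversion step concrete, here is a minimal sketch of what a custom parser for the quantity column might do. It assumes the parenthesised digits give the uncertainty in the last decimal places of the value (so 1.53(3) is read as 1.53 ± 0.03). The names `Quantity` and `parse_quantity` are hypothetical and do not exist in readr or quantities.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type; not part of readr or the quantities package.
struct Quantity {
  double value = 0.0;        // central value, e.g. 1.53
  double uncertainty = 0.0;  // absolute uncertainty, e.g. 0.03
  std::string unit;          // unit text, e.g. "m/s"
};

// Parses "<number>(<digits>) <unit>". Returns false and leaves `out`
// untouched when the field does not match, so a caller could fall back
// to a character column.
inline bool parse_quantity(const std::string& field, Quantity& out) {
  const std::size_t open = field.find('(');
  const std::size_t close = field.find(')');
  if (open == std::string::npos || close == std::string::npos ||
      open == 0 || close < open + 2) {
    return false;
  }

  const std::string number = field.substr(0, open);
  const std::string digits = field.substr(open + 1, close - open - 1);
  // Plain decimals only: exponents, "inf" and "nan" are rejected.
  if (number.find_first_not_of("+-0123456789.") != std::string::npos) return false;
  if (digits.find_first_not_of("0123456789") != std::string::npos) return false;

  Quantity parsed;
  try {
    std::size_t used = 0;
    parsed.value = std::stod(number, &used);
    if (used != number.size()) return false;  // e.g. "1.2.3"
    const std::size_t dot = number.find('.');
    const double decimals =
        dot == std::string::npos ? 0.0 : static_cast<double>(number.size() - dot - 1);
    // Scale the digits to the last decimal places of the value.
    parsed.uncertainty = std::stod(digits) * std::pow(10.0, -decimals);
  } catch (const std::logic_error&) {
    return false;  // std::stod found no number or the value overflowed
  }

  // Everything after ')' is the unit, with surrounding spaces trimmed.
  const std::size_t first = field.find_first_not_of(' ', close + 1);
  if (first != std::string::npos) {
    const std::size_t last = field.find_last_not_of(' ');
    assert(last >= first);
    parsed.unit = field.substr(first, last - first + 1);
  }
  out = parsed;
  return true;
}
```

A field such as POINT(0,1) is rejected by this function, which is what would let a different parser claim that column.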

Changes required

I could be missing something, but the general changes needed for this would be:

  • Move things to inst/include to expose the Collector class, so that other packages can link to readr and derive safely from this class.
  • Some mechanism to register
    • (C++) the custom collector into the list of available subclasses (and the means to insert it in a specific position in the chain?) and a parser (guesser) function.
    • (R) custom col_* and parse_*.
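The C++ side of the registration mechanism could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Collector` class here is a simplified stand-in (readr's real class has a different, richer interface), and `CollectorRegistry`, `Registration`, `PointCollector`, `looks_like_point` and `register_point` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a column collector (not readr's actual class).
class Collector {
public:
  virtual ~Collector() = default;
  virtual void set_value(std::size_t row, const std::string& field) = 0;
};

// A guesser answers: could every sampled field belong to this type?
using Guesser = std::function<bool(const std::vector<std::string>&)>;
using Factory = std::function<std::unique_ptr<Collector>()>;

struct Registration {
  std::string name;
  Guesser guess;
  Factory make;
};

class CollectorRegistry {
public:
  // Inserts at `position` in the guessing chain (0 is tried first);
  // positions past the end append.
  void add(Registration reg, std::size_t position) {
    assert(reg.guess && reg.make);
    if (position > chain_.size()) position = chain_.size();
    chain_.insert(chain_.begin() + static_cast<std::ptrdiff_t>(position),
                  std::move(reg));
  }

  // Name of the first registered type whose guesser accepts the sample,
  // or "character" when none does.
  std::string guess(const std::vector<std::string>& sample) const {
    for (const auto& reg : chain_) {
      if (reg.guess(sample)) return reg.name;
    }
    return "character";
  }

  // New collector for a registered type name, or nullptr if unknown.
  std::unique_ptr<Collector> create(const std::string& name) const {
    for (const auto& reg : chain_) {
      if (reg.name == name) return reg.make();
    }
    return nullptr;
  }

private:
  std::vector<Registration> chain_;
};

// Example third-party collector: stores the raw text of each POINT field.
class PointCollector : public Collector {
public:
  void set_value(std::size_t row, const std::string& field) override {
    if (row >= fields_.size()) fields_.resize(row + 1);
    fields_[row] = field;
  }

private:
  std::vector<std::string> fields_;
};

// Accepts a non-empty sample in which every field starts with "POINT(".
inline bool looks_like_point(const std::vector<std::string>& sample) {
  if (sample.empty()) return false;
  for (const auto& field : sample) {
    if (field.compare(0, 6, "POINT(") != 0) return false;
  }
  return true;
}

// What a package such as sf might call when it is loaded.
inline void register_point(CollectorRegistry& registry, std::size_t position) {
  registry.add(
      Registration{"point", looks_like_point,
                   [] { return std::unique_ptr<Collector>(new PointCollector()); }},
      position);
}
```

The `position` argument corresponds to the question raised above about inserting a collector at a specific place in the chain: earlier entries win when several guessers accept the same sample.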

The only drawback I can think of is that a package may, e.g., register a parser that catches everything and messes things up. To avoid this issue, readr::read_* may gain a flag to enable external parsers, so that using them requires an action from the user.
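A minimal sketch of the opt-in idea, assuming a hypothetical `use_external` flag that defaults to off. `guess_builtin` is a toy stand-in for readr's own guessing, and `ExternalGuesser` and `guess_column` are illustrative names only.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using Sample = std::vector<std::string>;

// Hypothetical record for a guesser contributed by another package.
struct ExternalGuesser {
  std::string type_name;
  std::function<bool(const Sample&)> accepts;
};

// Toy stand-in for built-in guessing: "integer" if every field consists
// only of digits, otherwise "character".
inline std::string guess_builtin(const Sample& sample) {
  if (sample.empty()) return "character";
  for (const auto& field : sample) {
    if (field.empty() ||
        field.find_first_not_of("0123456789") != std::string::npos) {
      return "character";
    }
  }
  return "integer";
}

// External guessers are consulted only when the caller opts in, so a
// greedy third-party guesser cannot change the default results.
inline std::string guess_column(const Sample& sample,
                                const std::vector<ExternalGuesser>& external,
                                bool use_external = false) {
  if (use_external) {
    for (const auto& guesser : external) {
      assert(guesser.accepts && "external guesser must be callable");
      if (guesser.accepts(sample)) return guesser.type_name;
    }
  }
  return guess_builtin(sample);
}
```

With the flag off, a guesser that accepts everything has no effect; with it on, that guesser takes over every column, which is the failure mode the flag is meant to contain.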

If this enhancement is considered, I would be more than happy to work on it.


@msberends msberends commented Nov 5, 2018

I'll second this! Would be very welcome if we could extend the functionality of readr with other packages.

Our AMR package is all about antimicrobial resistance (AMR). The new class rsi makes sure columns only contain valid antimicrobial interpretations: resistant (R), susceptible (S) or intermediate (I). This can be forced upon a vector with as.rsi.
Anyway, if data are read from a microbiological laboratory system (in a hospital), all columns with antibiotic results will contain just R, S or I values. Such columns could be parsed as rsi when the AMR package is loaded 😄
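A guesser for such columns could be as small as the sketch below. It assumes that empty fields and "NA" denote missing values, which is an assumption made here and not something AMR or readr prescribes. The function names are hypothetical.

```cpp
#include <string>
#include <vector>

// Assumption: empty fields and "NA" denote missing values.
inline bool is_missing(const std::string& field) {
  return field.empty() || field == "NA";
}

// A valid interpretation is exactly one of R, S or I.
inline bool is_rsi_value(const std::string& field) {
  return field == "R" || field == "S" || field == "I";
}

// True when every non-missing field is R, S or I and at least one
// such value is present, so an all-missing column is not claimed.
inline bool looks_like_rsi(const std::vector<std::string>& sample) {
  bool seen_value = false;
  for (const auto& field : sample) {
    if (is_missing(field)) continue;
    if (!is_rsi_value(field)) return false;
    seen_value = true;
  }
  return seen_value;
}
```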

@jimhester jimhester added the feature label Nov 13, 2018
@jimhester jimhester added this to the backlog milestone Nov 15, 2018

@matthewstrasiotto matthewstrasiotto commented Feb 6, 2019

Consider this thirded!
Some notes:
IMO, users should be able to supply parsers as a named list, and perhaps to override built-in parsers (throwing a warning when they do, with an option to suppress it).

With col_types, extend the abbreviation string assignment to allow users to instead use the key for their input parser.
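As a rough sketch of the named-parser idea: a table keyed by name, where replacing an existing entry warns unless the caller suppresses it, and where a column specification can look a parser up by its key. `ParserTable` and `FieldParser` are hypothetical names. The keys "c" and "i" mirror readr's compact abbreviations for character and integer columns, but the checks behind them here are toy versions.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>

// For this sketch a parser only reports whether it accepts a field.
using FieldParser = std::function<bool(const std::string&)>;

class ParserTable {
public:
  ParserTable() {
    // Toy built-ins: "c" accepts anything, "i" accepts digit-only fields.
    parsers_["c"] = [](const std::string&) { return true; };
    parsers_["i"] = [](const std::string& field) {
      return !field.empty() &&
             field.find_first_not_of("0123456789") == std::string::npos;
    };
  }

  // Registers `parser` under `key`. Returns true if an existing entry
  // was replaced; in that case a warning is printed unless `quiet`.
  bool add(const std::string& key, FieldParser parser, bool quiet = false) {
    assert(static_cast<bool>(parser));
    const bool replaced = parsers_.count(key) > 0;
    if (replaced && !quiet) {
      std::cerr << "warning: overriding parser '" << key << "'\n";
    }
    parsers_[key] = std::move(parser);
    return replaced;
  }

  // Looks up the parser for a key; nullptr if the key is unknown.
  const FieldParser* find(const std::string& key) const {
    const auto it = parsers_.find(key);
    return it == parsers_.end() ? nullptr : &it->second;
  }

private:
  std::map<std::string, FieldParser> parsers_;
};
```

Returning whether an entry was replaced lets the calling code decide how loudly to report an override, independently of the `quiet` option.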


@msberends msberends commented Nov 4, 2019

This would still be awesome to have 🙄😄
