baseverse is a collection of functions intended to support the continued use of base R in the modern era. There are three main types of functions included in the package:
- wrapper functions for existing base-R functions: These begin with p_ and support native piping. For example, p_lm() is a wrapper for lm() supporting native piping.
- wrapper functions for existing base-R features: These are named after the underlying symbols. For example, dollar() is a wrapper for dollar-sign notation.
- functions that mimic tidyverse functions: These include base_match() and base_when(), which mimic case_match() and case_when() from dplyr (see the section below).
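
To make the first two categories concrete, here is a minimal sketch of why such wrappers help with native piping. The names p_lm_sketch() and dollar_sketch() are hypothetical stand-ins written for this illustration; they are not the baseverse implementations, and the actual signatures of p_lm() and dollar() may differ. The example uses the built-in mtcars data and needs R 4.1 or later for the native pipe.

```r
# Hypothetical stand-ins for illustration only, NOT the baseverse functions.

# lm() takes the formula as its first argument, so data piped in with |>
# would land in the wrong place. A data-first wrapper avoids that.
p_lm_sketch <- function(data, formula, ...) {
  lm(formula, data = data, ...)
}

# A function form of column extraction, so it can sit inside a pipeline.
# Uses [[ ]], which matches names exactly (unlike $ on lists).
dollar_sketch <- function(x, name) {
  x[[name]]
}

# Fit a model with the data frame on the left of the pipe.
fit <- mtcars |> p_lm_sketch(mpg ~ wt)

# Extract a column and summarise it without leaving the pipeline.
mpg_mean <- mtcars |> dollar_sketch("mpg") |> mean()
```

The design point is only that the piped object arrives as the first argument; everything else is passed through to the underlying base-R function.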
The package is now available on CRAN! 🥳
To install the GitHub version instead (which may be more recent than the CRAN version), use install_github() from the remotes package:

    remotes::install_github('yea-hung/baseverse')

As mentioned elsewhere, case_match() and case_when() do not return a factor. A typical tidyverse solution for getting a factor out of case_match(), with the levels in a desired order, is something like this:

    library(dplyr)  # provides %>%, mutate(), and case_match()

    nhanes<-nhanes %>%
      mutate(
        country=factor(
          case_match(dmdborn4,1 ~ 'USA',2 ~ 'Other'),
          levels=c('USA','Other')
        )
      )

In this sort of solution, we have to type the level labels twice. The first occurrence defines the label-level mapping, while the second occurrence defines the order of the levels. I think this is inefficient. Worse, it may introduce human error.
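
The human-error risk can be seen in a small base-R sketch. The character vector below is a made-up stand-in for what case_match() would return, so the example runs without dplyr; the typo is deliberate.

```r
# Made-up stand-in for the character vector that case_match() would return.
labels_from_match <- c('USA', 'Other', 'Other', 'USA')

# Second typing of the labels, with a typo ('Others' instead of 'Other').
country <- factor(labels_from_match, levels = c('USA', 'Others'))

# factor() raises no error or warning here; values that match no level
# silently become NA.
n_lost <- sum(is.na(country))
n_lost  # 2
```

Because nothing fails loudly, a mismatch between the two typings can go unnoticed until the missing values show up in a later table or model.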
Compare the above with the following base-R solution:

    dmdborn4_codebook<-c('USA'=1,'Other'=2)
    nhanes$country<-factor(nhanes$dmdborn4,levels=dmdborn4_codebook,
                           labels=names(dmdborn4_codebook))

Here, we only have to type the level labels once: that one occurrence defines both the label-level mapping and the order of the levels.
My starting principle in writing base_match() and base_when() is that one should only have to type the level labels once.
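
As a rough illustration of that principle, the codebook idiom above can be wrapped in a function. The name match_sketch() is hypothetical and this is not how base_match() is necessarily implemented; the sketch assumes each label maps to a single code, and the input vector is toy data.

```r
# Hypothetical sketch, NOT baseverse's base_match(): each label is typed
# once, as the name of the code it stands for.
match_sketch <- function(x, ...) {
  codebook <- c(...)  # e.g. c('USA' = 1, 'Other' = 2)
  # Every code must carry a label.
  stopifnot(!is.null(names(codebook)), all(names(codebook) != ''))
  # The same vector supplies both the mapping and the level order.
  factor(x, levels = codebook, labels = names(codebook))
}

# Toy codes, made up for illustration.
country <- match_sketch(c(2, 1, 1, NA), 'USA' = 1, 'Other' = 2)
levels(country)  # "USA" "Other"
```

The level order comes from the order in which the labels are typed, so there is no second list to keep in sync.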
base_match() using native piping:

    nhanes<-nhanes |>
      transform(country=base_match(dmdborn4,'USA'=1,'Other'=2))

base_when() using native piping:

    nhanes<-nhanes |>
      transform(
        cholesterol=base_when(
          'Desirable' = (lbxtc<200),
          'Borderline high' = (lbxtc>=200)&(lbxtc<240),
          'High' = (lbxtc>=240)
        )
      )

base_when() does not exactly mimic case_when(), and I do not intend it to. A key difference is that base_when() evaluates all of the conditions it is given, whereas case_when(), for each position, stops at the first condition that is met.
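
To show what "evaluates all conditions" can look like in practice, here is a minimal base-R sketch. The function when_sketch() is hypothetical and is not the baseverse implementation. In particular, stopping with an error when conditions overlap is a choice made for this sketch; how base_when() itself handles overlapping conditions is not stated here. The lbxtc values are toy data.

```r
# Hypothetical sketch, NOT baseverse's base_when().
when_sketch <- function(...) {
  # list(...) forces every condition to be evaluated in full up front.
  conditions <- list(...)
  n <- length(conditions[[1]])
  n_true <- integer(n)          # how many conditions each position meets
  out <- rep(NA_character_, n)
  for (label in names(conditions)) {
    hit <- conditions[[label]] %in% TRUE  # NA counts as "not met"
    n_true <- n_true + hit
    out[hit] <- label
  }
  # Assumption of this sketch: overlapping conditions are treated as an error.
  if (any(n_true > 1)) stop('some positions satisfy more than one condition')
  # Labels are typed once; their order defines the level order.
  factor(out, levels = names(conditions))
}

# Toy cholesterol values, made up for illustration.
lbxtc <- c(150, 210, 260, NA)
cholesterol <- when_sketch(
  'Desirable' = (lbxtc<200),
  'Borderline high' = (lbxtc>=200)&(lbxtc<240),
  'High' = (lbxtc>=240)
)
```

With first-match semantics, as in case_when(), a later condition can rely on earlier ones having already claimed their rows. When every condition is evaluated independently, each one has to be written so that it stands on its own, as the three mutually exclusive ranges above do.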