yea-hung/baseverse

Overview

baseverse is a collection of functions intended to support the continued use of base R in the modern era. There are three main types of functions included in the package:

  • wrapper functions for existing base-R functions: These begin with p_ and support native piping. For example, p_lm() is a wrapper for lm() supporting native piping.
  • wrapper functions for existing base-R features: These are named after the underlying symbols. For example, dollar() is a wrapper for dollar-sign notation.
  • functions that mimic tidyverse functions: These include base_match() and base_when(), which mimic case_match() and case_when() from dplyr (see the section below).
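To make the first two categories concrete, here is a minimal sketch of what pipe-friendly wrappers can look like. The names sketch_p_lm() and sketch_dollar() and their argument order are hypothetical stand-ins written for this illustration; they are not baseverse's actual code or signatures. The sketch uses the built-in mtcars data and assumes R 4.1 or later for the native pipe.

```r
# Hypothetical stand-in for a pipe-friendly lm() wrapper (not baseverse's code).
# Taking the data frame first lets it sit on the right-hand side of |>.
sketch_p_lm <- function(data, formula, ...) {
  lm(formula, data = data, ...)
}

# Hypothetical stand-in for a function form of dollar-sign notation.
sketch_dollar <- function(x, name) {
  x[[name]]
}

fit <- mtcars |> sketch_p_lm(mpg ~ wt)
mpg <- mtcars |> sketch_dollar("mpg")
```

The point of the design is only that the data argument comes first, so the call composes with |> without an anonymous function.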

Installation

The package is now available on CRAN! 🥳
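Installing from CRAN should look like the following, assuming the CRAN package name matches the repository name (baseverse):

```r
# Assumes the package is published on CRAN under the name 'baseverse'
install.packages('baseverse')
```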

To install the GitHub version instead (which may be more recent than the CRAN version), use install_github() from the remotes package:

remotes::install_github('yea-hung/baseverse')

base_match() and base_when()

Motivation

As mentioned elsewhere, case_match() and case_when() do not return a factor. A typical tidyverse solution for getting a factor out of case_match() with the levels in a desired order is something like this:

library(dplyr) # provides %>%, mutate(), and case_match()

nhanes<-nhanes %>%
  mutate(
    country=factor(
      case_match(dmdborn4,1 ~ 'USA',2 ~ 'Other'),
      levels=c('USA','Other')
    )
  )

In this sort of solution, we have to type the level labels twice. The first occurrence defines the label-level mapping, while the second occurrence defines the order of the levels. I think this is inefficient. Worse, it may introduce human error.

Compare the above with the following base-R solution:

dmdborn4_codebook<-c('USA'=1,'Other'=2)
nhanes$country<-factor(nhanes$dmdborn4,levels=dmdborn4_codebook,
                       labels=names(dmdborn4_codebook))

Here, we only have to type the level labels once: that one occurrence defines both the label-level mapping and the order of the levels.
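The following small check illustrates that claim on a toy vector. The vector dmdborn4 below is made-up data for illustration, not the NHANES variable itself.

```r
# Toy values standing in for nhanes$dmdborn4 (made up for illustration)
dmdborn4 <- c(1, 2, 2, 1, NA)

# The labels are typed once, as the names of the codebook
dmdborn4_codebook <- c('USA' = 1, 'Other' = 2)

country <- factor(dmdborn4, levels = dmdborn4_codebook,
                  labels = names(dmdborn4_codebook))

levels(country)        # "USA" "Other"
as.character(country)  # "USA" "Other" "Other" "USA" NA
```

The order of the levels follows the order of the codebook, so reordering the levels means reordering one vector.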

My starting principle in writing base_match() and base_when() is that one should only have to type the level labels once.
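As a rough sketch of how that principle can be turned into a function, the label-value pairs can be collected through the dots and passed to factor(). The function sketch_match() below is hypothetical and is not how base_match() is necessarily implemented.

```r
# Hypothetical sketch (not baseverse's implementation): collect the
# label = value pairs from ... into a named vector and reuse it for both
# the mapping and the level order.
sketch_match <- function(x, ...) {
  codebook <- c(...)
  factor(x, levels = codebook, labels = names(codebook))
}

country <- sketch_match(c(1, 2, 2, 1), 'USA' = 1, 'Other' = 2)
levels(country)  # "USA" "Other"
```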

Examples

base_match() using native piping:

nhanes<-nhanes |>
  transform(country=base_match(dmdborn4,'USA'=1,'Other'=2))

base_when() using native piping:

nhanes<-nhanes |>
  transform(
    cholesterol=base_when(
      'Desirable' = (lbxtc<200),
      'Borderline high' = (lbxtc>=200)&(lbxtc<240),
      'High' = (lbxtc>=240)
    )
  )

Further details

base_when() does not exactly mimic case_when(), and I do not intend it to. A key difference is that base_when() evaluates every condition it is given, whereas case_when() stops, for each position, at the first condition that is met.
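The distinction can be seen in plain base R, without making any claim about how base_when() is implemented. In the sketch below, the vector x and the list conditions are made up for illustration. The first part looks at all conditions for every position; the second part applies a first-match rule.

```r
# Toy total-cholesterol values (made up for illustration)
x <- c(150, 220, 260)

conditions <- list(
  'Desirable'       = x < 200,
  'Borderline high' = x >= 200 & x < 240,
  'High'            = x >= 240
)

# Evaluate-all view: one logical column per label, one row per position.
hits <- do.call(cbind, conditions)
rowSums(hits)  # number of conditions each position meets

# First-match view: for each position, keep only the first TRUE condition.
first <- apply(hits, 1, function(r) which(r)[1])
result <- factor(names(conditions)[first], levels = names(conditions))
```

When the conditions are mutually exclusive, as here, the two views agree. They differ only when a position meets more than one condition.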

About

Modern functions for base R, supporting native piping.
