Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorised switch (aka SQL DECODE) #1710

Closed
datalove opened this issue Mar 13, 2016 · 4 comments
Closed

Vectorised switch (aka SQL DECODE) #1710

datalove opened this issue Mar 13, 2016 · 4 comments
Labels
Milestone

Comments

@datalove
Copy link

@datalove datalove commented Mar 13, 2016

I do a lot of SQL querying in Oracle and often find myself reaching for the DECODE SQL function (link), which is a bit like a lightweight CASE statement, with simple replacements and an optional default replacement.

I thought it might also be useful with tbl_df using mutate in R so I wrote a decode function that similarly takes pairs of arguments like so:

decode(x, target_1, replacement_1, target_2, replacement_2, ... ,optional_default)

And set it up to work with named arguments (because I'm lazy).

mtcars %>% mutate(cyl_name = decode(cyl,4,'four',6,'six',8,'eight'))
mtcars %>% mutate(cyl_name = decode(cyl,4,'four',6,'six','guzzler'))
mtcars %>% mutate(cyl_name = decode(cyl,4,four,6,six,guzzler))
mtcars %>% mutate(cyl_name = decode(cyl,4,four,6,six))

Is it worth submitting a pull request to this decode function? I'm sure I can improve what's below.

I think adding a decode function reduces a bit of the need to solve this issue (#631) relating to a general purpose SQL-like CASE function.

decode <- function(x, ...) {

  odds <- function(x) { unlist(x[1:length(x) %% 2 == 1][1:floor(length(x)/2)]) }
  even <- function(x) { unlist(x[1:length(x) %% 2 == 0]) }
  last <- function(x) { unlist(if(length(x) %% 2 == 1) tail(x,1)) }

  interpret_args <- function(x) { if(is.call(x)) {eval(x)} else if(is.name(x)) {as.character(x)} else {x} }

  args <- eval(substitute(alist(...)))
  args <- lapply(args, interpret_args)

  targets      <- odds(args)
  replacements <- even(args)
  default      <- last(args)

  res <- x

  if(!is.null(default))
    res[! x %in% targets & ! is.na(x)] <- default

  for(i in seq_along(targets)) {
    t <- targets[[i]];  r <- replacements[[i]]
    res[res == t | (is.na(t) & is.na(res))] <- r    
  }

  if(inherits(x, "factor"))
    res <- as.factor(res)

  res
} 
@hadley hadley added this to the 0.5 milestone Mar 14, 2016
@hadley
Copy link
Member

@hadley hadley commented Mar 14, 2016

I think it would make more sense to work like a vectorised switch(), but I agree that it's a good idea and would be a nice accompaniment to the new na_if() and coalesce().

@hadley
Copy link
Member

@hadley hadley commented Mar 14, 2016

Note that DECODE appears to be an oracle extension, so would need to be converted to a CASE statement by default.

@joranE
Copy link
Contributor

@joranE joranE commented Mar 14, 2016

FWIW, I am frequently also using decode with an Oracle db, and have ended up with the following R implementations for character and factor arguments:

decode.character <- function(x,...,default = NULL){
    replacements <- unlist(list(...))
    unique_x <- unique(x)
    replacements <- replacements[names(replacements) %in% unique_x]
    if (is.null(default)){
        key <- setNames(unique_x,unique_x)
    }else{
        key <- setNames(rep(default,length(unique_x)),unique_x)
    }
    key[names(replacements)] <- replacements
    return(unname(key[x]))
}
decode.factor <- function(x,...,default = NULL){
    replacements <- unlist(list(...))
    unique_x <- levels(x)
    replacements <- replacements[names(replacements) %in% unique_x]
    if (is.null(default)){
        key <- setNames(unique_x,unique_x)
    }else{
        key <- setNames(rep(default,length(unique_x)),unique_x)
    }
    key[names(replacements)] <- replacements
    levels(x) <- key[levels(x)]
    return(x)
}

...and they are called a bit more in the style of switch, with name = value pairs in ..., where name is the old and value is the new.

@hadley hadley changed the title Simple substitution in mutate using a 'decode' function Vectorised switch (aka SQL DECODE) Mar 14, 2016
@hadley hadley closed this in ee097ad Mar 14, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
@hadley @datalove @joranE and others