Implement if_else #241

Merged
merged 3 commits into from Jan 24, 2023

thisisnic
Contributor

@thisisnic thisisnic commented Jan 22, 2023

Fixes #156

There is an issue with the return type of logical values in DuckDB (they come back as integers, as the last example below shows), but it won't affect the TPC-H query calculations, so I was going to leave that as something to come back to.

library(substrait)
library(dplyr)

# using Arrow
mtcars %>%
  arrow_substrait_compiler() %>%
  mutate(am_chr = if_else(am == 0, "automatic", "manual")) %>%
  select(am, am_chr) %>%
  collect()
#> # A tibble: 32 × 2
#>       am am_chr   
#>    <dbl> <chr>    
#>  1     1 manual   
#>  2     1 manual   
#>  3     1 manual   
#>  4     0 automatic
#>  5     0 automatic
#>  6     0 automatic
#>  7     0 automatic
#>  8     0 automatic
#>  9     0 automatic
#> 10     0 automatic
#> # … with 22 more rows

# using DuckDB
mtcars %>%
  duckdb_substrait_compiler() %>%
  mutate(am_chr = if_else(am == 0, "automatic", "manual")) %>%
  select(am, am_chr) %>%
  collect()
#> # A tibble: 32 × 2
#>       am am_chr   
#>    <dbl> <chr>    
#>  1     1 manual   
#>  2     1 manual   
#>  3     1 manual   
#>  4     0 automatic
#>  5     0 automatic
#>  6     0 automatic
#>  7     0 automatic
#>  8     0 automatic
#>  9     0 automatic
#> 10     0 automatic
#> # … with 22 more rows

## issue with TRUE/FALSE in DuckDB
# using Arrow
mtcars %>%
  arrow_substrait_compiler() %>%
  mutate(automatic = if_else(am == 0, TRUE, FALSE)) %>%
  select(am, automatic) %>%
  collect()
#> # A tibble: 32 × 2
#>       am automatic
#>    <dbl> <lgl>    
#>  1     1 FALSE    
#>  2     1 FALSE    
#>  3     1 FALSE    
#>  4     0 TRUE     
#>  5     0 TRUE     
#>  6     0 TRUE     
#>  7     0 TRUE     
#>  8     0 TRUE     
#>  9     0 TRUE     
#> 10     0 TRUE     
#> # … with 22 more rows

# using DuckDB
mtcars %>%
  duckdb_substrait_compiler() %>%
  mutate(automatic = if_else(am == 0, TRUE, FALSE)) %>%
  select(am, automatic) %>%
  collect()
#> # A tibble: 32 × 2
#>       am automatic
#>    <dbl>     <int>
#>  1     1         0
#>  2     1         0
#>  3     1         0
#>  4     0         1
#>  5     0         1
#>  6     0         1
#>  7     0         1
#>  8     0         1
#>  9     0         1
#> 10     0         1
#> # … with 22 more rows
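
One possible stopgap for the integer column above, until the return type is fixed, is to coerce it back to logical after collect(). This is a minimal base R sketch, not part of this PR. The duckdb_result data frame is a hand-built stand-in for the first few rows of the DuckDB output shown, not actual output from the compiler.

```r
# Hand-built stand-in for the collected DuckDB result (an assumption for
# illustration): the logical column has come back as integer 0/1.
duckdb_result <- data.frame(
  am = c(1, 1, 1, 0, 0),
  automatic = c(0L, 0L, 0L, 1L, 1L)
)

# as.logical() maps 0 to FALSE and any non-zero integer to TRUE.
duckdb_result$automatic <- as.logical(duckdb_result$automatic)

# Inspect the column type after coercion.
str(duckdb_result$automatic)
```

This only papers over the difference on the R side; the type mismatch itself would still need to be addressed where the result is produced.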

@thisisnic thisisnic marked this pull request as ready for review January 23, 2023 14:23
skip_if_not(has_arrow_with_substrait())

expect_equal(
example_data[1:5, "dbl"] %>%
Contributor

When writing new tests, it would be nice to move away from example_data (so that when I'm reviewing these tests I don't have to remember what example_data contains to know whether the expected answer is correct).

Contributor Author

I quite like having the same dataset to reuse across multiple tests. I guess it could make sense to leave it exclusively for the compare_dplyr_bindings() functions, where you don't need to know what the dataset looks like?

Contributor

I think that's a good rule of thumb. Those tests should pass no matter what example_data is (provided that it has reasonable type coverage), so recreating a dataset every time doesn't help much when reading them.
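
The rule of thumb discussed above could look roughly like this for a test whose expected answer matters. It is a sketch only: plain dplyr stands in for the substrait pipeline, and the names inline_data and expected are made up for illustration. In the package, the data would presumably go through arrow_substrait_compiler() or duckdb_substrait_compiler() and collect(), as in the PR description.

```r
library(testthat)
library(dplyr)

# Small inline dataset (hypothetical): the expected answer can be checked
# by eye without knowing what example_data contains.
inline_data <- data.frame(am = c(1, 0, 0))
expected <- c("manual", "automatic", "automatic")

test_that("if_else() maps 0 to automatic and anything else to manual", {
  # Plain dplyr here; a package test would insert a substrait compiler
  # step before mutate() and call collect() afterwards.
  result <- inline_data %>%
    mutate(am_chr = if_else(am == 0, "automatic", "manual"))

  expect_equal(result$am_chr, expected)
})
```

With three rows, a reviewer can verify the expected vector directly from the test body.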

R/pkg-arrow.R Outdated
@@ -259,6 +259,20 @@ arrow_funs[[">="]] <- function(lhs, rhs) {
)
}

arrow_funs[["if_else"]] <- function(condition, true, false) {
Contributor

After your previous PR, you can put this in substrait_funs since it's the same for arrow and duckdb, correct?

Contributor Author

Indeed
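
To illustrate the idea behind the suggestion above (define a translation once when it is identical for both engines), here is a generic base R sketch of a shared registry with engine-specific overrides. Everything in it is hypothetical: shared_funs, arrow_only_funs, duckdb_only_funs, and resolve_fun() are invented names, the function body is a placeholder, and the package's actual lookup mechanism may differ.

```r
# Hypothetical registries standing in for the package's internal ones.
shared_funs <- new.env()
arrow_only_funs <- new.env()
duckdb_only_funs <- new.env()

# Defined once for both engines. Placeholder body: it just records the
# call instead of building a real Substrait expression.
shared_funs[["if_else"]] <- function(condition, true, false) {
  list(fun = "if_else", args = list(condition, true, false))
}

# Look in the engine-specific registry first, then fall back to the
# shared one (an assumed lookup order, chosen for illustration).
resolve_fun <- function(name, engine_funs) {
  if (exists(name, envir = engine_funs, inherits = FALSE)) {
    get(name, envir = engine_funs, inherits = FALSE)
  } else {
    get(name, envir = shared_funs, inherits = FALSE)
  }
}

# Both engines resolve to the same shared definition.
same_fun <- identical(
  resolve_fun("if_else", arrow_only_funs),
  resolve_fun("if_else", duckdb_only_funs)
)
```

The benefit of this layout is that an engine only needs its own entry when its behavior actually diverges.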

Contributor

@paleolimbot paleolimbot left a comment

Looks great!

@thisisnic thisisnic merged commit 93e3340 into voltrondata:main Jan 24, 2023
Development

Successfully merging this pull request may close these issues.

Implement if_else()
2 participants