Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete Empty Factors #285

Merged
merged 8 commits into from Nov 20, 2017
Merged

Conversation

colearendt
Copy link
Contributor

Close #270, close #271, close #276

Address documentation issues.

One documentation issue involved code that did not execute (#271). Another was not a documentation issue, per say, it involved complete inhering an expand parameter (#276).

Decision on behavior for factors is yet to be decided. This PR will change behavior so that empty factors are not dropped (by expand, crossing, complete), but "completed." nesting will continue to drop empty factors per the original behavior.

This required a parameter on the "drop_empty" function to alter behavior with factors. factor=FALSE is the legacy behavior.

Added tests with more clarity on different argument types. Also added explicit calls to the dplyr package in some of the relevant tests.

Alter complete, expand, crossing to complete empty factors

Address documentation issues
…ure/completebug

Conflicts:
	R/expand.R
	tests/testthat/test-complete.R
	tests/testthat/test-expand.R
Copy link

@rtaph rtaph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR brings in great functionality for factors and is really useful for consistent treatment of factors-- even when they are empty. I hope the code gets folded in soon!

@hadley
Copy link
Member

hadley commented Nov 15, 2017

It's been a long time, so I wanted to check that you're still interested in this PR before I start the review process.

@colearendt
Copy link
Contributor Author

Thanks for the note! Yes, I definitely am. I'm not sure if there is a better way to implement, but I think the functionality is useful for improving the consistency of how factors are treated.

@hadley
Copy link
Member

hadley commented Nov 15, 2017

BTW in the future, it's easier for me to review if you do one PR per issue; before I closely read the linked issues I'd assumed that they were all related.

R/expand.R Outdated
@@ -150,8 +150,8 @@ cross_df <- function(x, y) {
y_idx <- rep(seq_len(nrow(y)), nrow(x))
dplyr::bind_cols(x[x_idx, , drop = FALSE], y[y_idx, , drop = FALSE])
}
drop_empty <- function(x) {
empty <- map_lgl(x, function(x) length(x) == 0)
drop_empty <- function(x, factor=TRUE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please follow http://style.tidyverse.org/syntax.html#spacing and put space around = andbefore,`?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I don't think this actually needs to be an option, but it does need a comment mentioning that we need to keep it consistent with ulevels()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, of course. The option there is due to the nesting discussion below (nesting has different behavior).

@@ -13,3 +13,11 @@ test_that("preserves grouping", {
expect_s3_class(out, "grouped_df")
expect_equal(dplyr::groups(out), dplyr::groups(df))
})

test_that('expands empty factors', {
emptyfactor <- factor(character(),levels=c('a','b','c'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just write factor(levels = c('a', 'b', 'c'))

(But please use " for strings)

expect_equal(crossing(x = tb$x, y = character()), tb)
})

test_that('zero length factor inputs are completed by expand & crossing, dropped by nesting', {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does nesting drop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fantastic question and one that I should have noted. Per the docs:

nesting() is the complement to crossing(): it only keeps combinations of all variables that appear in the data

As a result, I was not sure if an empty factor should be considered a "combination appearing in the data." If so, I am happy to work on an implementation. I can see where this might come across as a lack of consistency - however, this is a context where nesting / crossing are designed to behave differently.

For instance, non-empty factors are currently handled differently by nesting and crossing. My aim was to maintain that dichotomy:

tb <- tibble::tibble(x=1:2)                       
emptyfactor <- factor(levels = c("a","b"))        
                                                  
simplefactor <- factor(c("a"),levels = c("a","b"))
                                                  
## nesting factor behavior                        
nesting(x = tb$x, y = simplefactor)               
#> # A tibble: 2 x 2
#>       x      y
#>   <int> <fctr>
#> 1     1      a
#> 2     2      a
                                                  
## crossing factor behavior                       
crossing(x = tb$x, y = simplefactor)              
#> # A tibble: 4 x 2
#>       x      y
#>   <int> <fctr>
#> 1     1      a
#> 2     1      b
#> 3     2      a
#> 4     2      b

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't see how you can "nest" the empty levels of a factor because they don't appear in the data. I think this is the issue at the heart of #193

@hadley hadley merged commit c5b8e24 into tidyverse:master Nov 20, 2017
hadley added a commit that referenced this pull request Nov 20, 2017
@colearendt colearendt deleted the feature/completebug branch November 14, 2023 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants