Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete Empty Factors #285

Merged
merged 8 commits into from Nov 20, 2017
Merged

Complete Empty Factors #285

merged 8 commits into from Nov 20, 2017

Conversation

@colearendt
Copy link
Contributor

@colearendt colearendt commented Mar 25, 2017

Close #270, close #271, close #276

Address documentation issues.

One documentation issue involved code that did not execute (#271). Another was not a documentation issue, per say, it involved complete inhering an expand parameter (#276).

Decision on behavior for factors is yet to be decided. This PR will change behavior so that empty factors are not dropped (by expand, crossing, complete), but "completed." nesting will continue to drop empty factors per the original behavior.

This required a parameter on the "drop_empty" function to alter behavior with factors. factor=FALSE is the legacy behavior.

Added tests with more clarity on different argument types. Also added explicit calls to the dplyr package in some of the relevant tests.

Alter complete, expand, crossing to complete empty factors

Address documentation issues
colearendt added 2 commits Aug 27, 2017
…ure/completebug

Conflicts:
	R/expand.R
	tests/testthat/test-complete.R
	tests/testthat/test-expand.R
Copy link

@rtaph rtaph left a comment

This PR brings in great functionality for factors and is really useful for consistent treatment of factors-- even when they are empty. I hope the code gets folded in soon!

@hadley
Copy link
Member

@hadley hadley commented Nov 15, 2017

It's been a long time, so I wanted to check that you're still interested in this PR before I start the review process.

@colearendt
Copy link
Contributor Author

@colearendt colearendt commented Nov 15, 2017

Thanks for the note! Yes, I definitely am. I'm not sure if there is a better way to implement, but I think the functionality is useful for improving the consistency of how factors are treated.

@hadley
Copy link
Member

@hadley hadley commented Nov 15, 2017

BTW in the future, it's easier for me to review if you do one PR per issue; before I closely read the linked issues I'd assumed that they were all related.

R/expand.R Outdated
@@ -150,8 +150,8 @@ cross_df <- function(x, y) {
y_idx <- rep(seq_len(nrow(y)), nrow(x))
dplyr::bind_cols(x[x_idx, , drop = FALSE], y[y_idx, , drop = FALSE])
}
drop_empty <- function(x) {
empty <- map_lgl(x, function(x) length(x) == 0)
drop_empty <- function(x, factor=TRUE) {

This comment has been minimized.

@hadley

hadley Nov 15, 2017
Member

Can you please follow http://style.tidyverse.org/syntax.html#spacing and put space around = andbefore,`?

This comment has been minimized.

@hadley

hadley Nov 15, 2017
Member

Also I don't think this actually needs to be an option, but it does need a comment mentioning that we need to keep it consistent with ulevels()

This comment has been minimized.

@colearendt

colearendt Nov 15, 2017
Author Contributor

Yes, of course. The option there is due to the nesting discussion below (nesting has different behavior).

@@ -13,3 +13,11 @@ test_that("preserves grouping", {
expect_s3_class(out, "grouped_df")
expect_equal(dplyr::groups(out), dplyr::groups(df))
})

test_that('expands empty factors', {
emptyfactor <- factor(character(),levels=c('a','b','c'))

This comment has been minimized.

@hadley

hadley Nov 15, 2017
Member

You can just write factor(levels = c('a', 'b', 'c'))

(But please use " for strings)

expect_equal(crossing(x = tb$x, y = character()), tb)
})

test_that('zero length factor inputs are completed by expand & crossing, dropped by nesting', {

This comment has been minimized.

@hadley

hadley Nov 15, 2017
Member

Why does nesting drop?

This comment has been minimized.

@colearendt

colearendt Nov 15, 2017
Author Contributor

This is a fantastic question and one that I should have noted. Per the docs:

nesting() is the complement to crossing(): it only keeps combinations of all variables that appear in the data

As a result, I was not sure if an empty factor should be considered a "combination appearing in the data." If so, I am happy to work on an implementation. I can see where this might come across as a lack of consistency - however, this is a context where nesting / crossing are designed to behave differently.

For instance, non-empty factors are currently handled differently by nesting and crossing. My aim was to maintain that dichotomy:

tb <- tibble::tibble(x=1:2)                       
emptyfactor <- factor(levels = c("a","b"))        
                                                  
simplefactor <- factor(c("a"),levels = c("a","b"))
                                                  
## nesting factor behavior                        
nesting(x = tb$x, y = simplefactor)               
#> # A tibble: 2 x 2
#>       x      y
#>   <int> <fctr>
#> 1     1      a
#> 2     2      a
                                                  
## crossing factor behavior                       
crossing(x = tb$x, y = simplefactor)              
#> # A tibble: 4 x 2
#>       x      y
#>   <int> <fctr>
#> 1     1      a
#> 2     1      b
#> 3     2      a
#> 4     2      b

This comment has been minimized.

@hadley

hadley Nov 20, 2017
Member

Yeah, I don't see how you can "nest" the empty levels of a factor because they don't appear in the data. I think this is the issue at the heart of #193

@hadley hadley merged commit c5b8e24 into tidyverse:master Nov 20, 2017
1 check was pending
1 check was pending
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
hadley added a commit that referenced this pull request Nov 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants