New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete Empty Factors #285

Merged
merged 8 commits into from Nov 20, 2017

Conversation

Projects
None yet
3 participants
@colearendt
Contributor

colearendt commented Mar 25, 2017

Close #270, close #271, close #276

Address documentation issues.

One documentation issue involved code that did not execute (#271). Another was not a documentation issue, per say, it involved complete inhering an expand parameter (#276).

Decision on behavior for factors is yet to be decided. This PR will change behavior so that empty factors are not dropped (by expand, crossing, complete), but "completed." nesting will continue to drop empty factors per the original behavior.

This required a parameter on the "drop_empty" function to alter behavior with factors. factor=FALSE is the legacy behavior.

Added tests with more clarity on different argument types. Also added explicit calls to the dplyr package in some of the relevant tests.

Close #270, close #271, close #276
Alter complete, expand, crossing to complete empty factors

Address documentation issues

colearendt added some commits Aug 27, 2017

Merge branch 'master' of https://github.com/tidyverse/tidyr into feat…
…ure/completebug

Conflicts:
	R/expand.R
	tests/testthat/test-complete.R
	tests/testthat/test-expand.R
@rtaph

This PR brings in great functionality for factors and is really useful for consistent treatment of factors-- even when they are empty. I hope the code gets folded in soon!

@hadley

This comment has been minimized.

Member

hadley commented Nov 15, 2017

It's been a long time, so I wanted to check that you're still interested in this PR before I start the review process.

@colearendt

This comment has been minimized.

Contributor

colearendt commented Nov 15, 2017

Thanks for the note! Yes, I definitely am. I'm not sure if there is a better way to implement, but I think the functionality is useful for improving the consistency of how factors are treated.

@hadley

This comment has been minimized.

Member

hadley commented Nov 15, 2017

BTW in the future, it's easier for me to review if you do one PR per issue; before I closely read the linked issues I'd assumed that they were all related.

R/expand.R Outdated
@@ -150,8 +150,8 @@ cross_df <- function(x, y) {
y_idx <- rep(seq_len(nrow(y)), nrow(x))
dplyr::bind_cols(x[x_idx, , drop = FALSE], y[y_idx, , drop = FALSE])
}
drop_empty <- function(x) {
empty <- map_lgl(x, function(x) length(x) == 0)
drop_empty <- function(x, factor=TRUE) {

This comment has been minimized.

@hadley

hadley Nov 15, 2017

Member

Can you please follow http://style.tidyverse.org/syntax.html#spacing and put space around = andbefore,`?

This comment has been minimized.

@hadley

hadley Nov 15, 2017

Member

Also I don't think this actually needs to be an option, but it does need a comment mentioning that we need to keep it consistent with ulevels()

This comment has been minimized.

@colearendt

colearendt Nov 15, 2017

Contributor

Yes, of course. The option there is due to the nesting discussion below (nesting has different behavior).

@@ -13,3 +13,11 @@ test_that("preserves grouping", {
expect_s3_class(out, "grouped_df")
expect_equal(dplyr::groups(out), dplyr::groups(df))
})
test_that('expands empty factors', {
emptyfactor <- factor(character(),levels=c('a','b','c'))

This comment has been minimized.

@hadley

hadley Nov 15, 2017

Member

You can just write factor(levels = c('a', 'b', 'c'))

(But please use " for strings)

expect_equal(crossing(x = tb$x, y = character()), tb)
})
test_that('zero length factor inputs are completed by expand & crossing, dropped by nesting', {

This comment has been minimized.

@hadley

hadley Nov 15, 2017

Member

Why does nesting drop?

This comment has been minimized.

@colearendt

colearendt Nov 15, 2017

Contributor

This is a fantastic question and one that I should have noted. Per the docs:

nesting() is the complement to crossing(): it only keeps combinations of all variables that appear in the data

As a result, I was not sure if an empty factor should be considered a "combination appearing in the data." If so, I am happy to work on an implementation. I can see where this might come across as a lack of consistency - however, this is a context where nesting / crossing are designed to behave differently.

For instance, non-empty factors are currently handled differently by nesting and crossing. My aim was to maintain that dichotomy:

tb <- tibble::tibble(x=1:2)                       
emptyfactor <- factor(levels = c("a","b"))        
                                                  
simplefactor <- factor(c("a"),levels = c("a","b"))
                                                  
## nesting factor behavior                        
nesting(x = tb$x, y = simplefactor)               
#> # A tibble: 2 x 2
#>       x      y
#>   <int> <fctr>
#> 1     1      a
#> 2     2      a
                                                  
## crossing factor behavior                       
crossing(x = tb$x, y = simplefactor)              
#> # A tibble: 4 x 2
#>       x      y
#>   <int> <fctr>
#> 1     1      a
#> 2     1      b
#> 3     2      a
#> 4     2      b

This comment has been minimized.

@hadley

hadley Nov 20, 2017

Member

Yeah, I don't see how you can "nest" the empty levels of a factor because they don't appear in the data. I think this is the issue at the heart of #193

colearendt and others added some commits Nov 16, 2017

Merge branch 'master' of https://github.com/tidyverse/tidyr into feat…
…ure/completebug

resolve conflicts with NEWS.md and test-expand.R

@hadley hadley merged commit c5b8e24 into tidyverse:master Nov 20, 2017

1 check was pending

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details

hadley added a commit that referenced this pull request Nov 20, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment