New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add weighting to fct_lump #70

Merged
merged 1 commit into from Feb 10, 2018

Conversation

Projects
None yet
4 participants
@wilkox
Copy link
Contributor

wilkox commented Dec 29, 2016

Adds a 'weights' argument to fct_lump that allows frequencies to be weighted. Issue #68.

@hadley
Copy link
Member

hadley left a comment

Thanks for this!

Overall the strategy looks good, but there are some code style fixes needed.

R/lump.R Outdated
ties.method = c("min", "average", "first", "last", "random", "max")) {
f <- check_factor(f)
ties.method <- match.arg(ties.method)

if (! missing(weights)) {

This comment has been minimized.

@hadley

hadley Dec 30, 2016

Member

Please remove the spaces after !, and use != not ! x == y

R/lump.R Outdated
if (! is.numeric(weights)) {
stop("`weights` must be a numeric vector")
} else if (! length(f) == length(weights)) {
stop("different lengths of `f` and `weights`")

This comment has been minimized.

@hadley

hadley Dec 30, 2016

Member

Error messages should start with a capital letter and always use call. = FALSE

R/lump.R Outdated
if (missing(weights)) {
count <- table(f)
} else {
count <- tapply(weights, f, FUN=sum)

This comment has been minimized.

@hadley

hadley Dec 30, 2016

Member

Need spaces around =

@@ -2,44 +2,76 @@ context("fct_lump")

test_that("positive values keeps most commmon", {
f <- c("a", "a", "a", "b", "b", "c", "d", "e", "f", "g")
w <- c( 1, 1, 1, 1, 1, 4, 2, 1, 1, 1) # Sum = 14

This comment has been minimized.

@hadley

hadley Dec 30, 2016

Member

I think the tests would be clearer if you tested weighting in its own block. I think you only need a couple of tests to verify that weighted and the equivalent unweighted return the same values. The code is quite simple so it doesn't need extensive tests.

You might want to add one other test that checks that bad weights generate error messages.

@hadley

This comment has been minimized.

Copy link
Member

hadley commented Dec 30, 2016

Also in one of the commit message please include Fixes #68 that way the issue will be automatically closed when the pull is merged.

@hadley hadley closed this Dec 30, 2016

@hadley hadley reopened this Dec 30, 2016

@wilkox wilkox force-pushed the wilkox:master branch from bb72e9d to da5ebf9 Dec 30, 2016

@wilkox

This comment has been minimized.

Copy link
Contributor

wilkox commented Dec 30, 2016

Tidied up the style issues, simplified the tests and added tests for errors, and rebased.

@wilkox wilkox force-pushed the wilkox:master branch from da5ebf9 to a2d6626 Jul 12, 2017

@wilkox

This comment has been minimized.

Copy link
Contributor

wilkox commented Jul 12, 2017

Thanks @instantkaffee – fixed the error and rebased.

@andwilson

This comment has been minimized.

Copy link

andwilson commented Nov 20, 2017

Is this going to merge into the master? This is extremely useful functionality.

@tiernanmartin

This comment has been minimized.

Copy link

tiernanmartin commented Jan 6, 2018

I'd also like to see this feature merged into the master - thanks!

R/lump.R Outdated
#' # Use ties.method to control how tied factors are collapsed
#' fct_lump(x, n = 6)
#' fct_lump(x, n = 6, ties.method = "max")
#'
fct_lump <- function(f, n, prop, other_level = "Other",
fct_lump <- function(f, n, prop, weights, other_level = "Other",

This comment has been minimized.

@hadley

hadley Feb 10, 2018

Member

I realise now that weights should have a default value since it's option. @wilkox are you still interested in this PR? If not, I'm happy to make the change, but since you've put so much time into it, I thought you might like to carry it home.

This comment has been minimized.

@wilkox

wilkox Feb 10, 2018

Contributor

Still interested in seeing this merged 😃 Added a default weights = NULL, which seems less confusing than weights = 1.

f2 <- c("a", rep("b", 4), rep("c", 6), rep("d", 4), rep("e", 2),
rep("f", 2), "g")

expect_equal(levels(fct_lump(f, weights = w)),

This comment has been minimized.

@hadley

hadley Feb 10, 2018

Member

Can you please use the style described at http://style.tidyverse.org/syntax.html#long-lines ?

@wilkox wilkox force-pushed the wilkox:master branch from a2d6626 to 860f6e5 Feb 10, 2018

@hadley

This comment has been minimized.

Copy link
Member

hadley commented Feb 10, 2018

One last change before I’ll merge - can you please add a bullet to news following http://style.tidyverse.org/news.html#during-development

@wilkox wilkox force-pushed the wilkox:master branch from 860f6e5 to 92d17f7 Feb 10, 2018

Add weighting to fct_lump. Fixes #68
* Add tests for weighted fct_lump
* Update fct_lump documentation

@wilkox wilkox force-pushed the wilkox:master branch from 92d17f7 to 382e52a Feb 10, 2018

@wilkox

This comment has been minimized.

Copy link
Contributor

wilkox commented Feb 10, 2018

Added the bullet to NEWS.md, rebased.

@hadley hadley merged commit 81f01c0 into tidyverse:master Feb 10, 2018

3 checks passed

codecov/patch 100% of diff hit (target 76.92%)
Details
codecov/project 78.36% (+1.44%) compared to d31997e
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@hadley

This comment has been minimized.

Copy link
Member

hadley commented Feb 10, 2018

Thanks for bearing with me through this long process!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment