Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducible jitter #1996

Merged
merged 20 commits into from
Jul 27, 2017
Merged

Reproducible jitter #1996

merged 20 commits into from
Jul 27, 2017

Conversation

krlmlr
Copy link
Member

@krlmlr krlmlr commented Jan 13, 2017

by adding a seed argument to position_jitter(). Useful to apply the same jitter to both points and labels.

We should probably move with_seed() to withr before merging.

Reprex:

library(ggplot2)
ggplot(data.frame(x = 1, y = 1:10, text = letters[1:10]), aes(x = x, y = y)) +
  geom_point(position = position_jitter()) +
  geom_label(aes(label = text, y = y - 0.5), position = position_jitter())

ggplot(data.frame(x = 1, y = 1:10, text = letters[1:10]), aes(x = x, y = y)) +
  geom_point(position = position_jitter(seed = 1L)) +
  geom_label(aes(label = text, y = y - 0.5), position = position_jitter(seed = 1L))

@jeff-mettel
Copy link

@krlmlr - Great enhancement. As soon as this gets accepted, we'll plan to use it.

trans_x <- if (params$width > 0) function(x) jitter(x, amount = params$width)
trans_y <- if (params$height > 0) function(x) jitter(x, amount = params$height)
trans_x <- if (params$width > 0) function(x) with_seed(params$seed, jitter(x, amount = params$width))
trans_y <- if (params$height > 0) function(x) with_seed(params$seed, jitter(x, amount = params$height))

transform_position(data, trans_x, trans_y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the with_seed() should probably go here, otherwise the x and y jitter will use the same random values

R/utilities.r Outdated
@@ -299,6 +299,16 @@ find_args <- function(...) {
# global data
dummy_data <- function() data.frame(x = NA)

with_seed <- function(seed, code) {
if (!is.null(seed)) {
old_seed <- get0(".Random.seed", globalenv())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this definitely ok to do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yihui: knitr seems to cache/restore .Random.seed, too. The caching in knitr is explicit, but it looks like the restore happens implicitly. Could you please comment if my way of restoring the random seed is safe?

with_seed <- function(seed, code) {
  if (!is.null(seed)) {
    old_seed <- get0(".Random.seed", globalenv())
    if (!is.null(old_seed)) {
      on.exit(assign(".Random.seed", old_seed, globalenv()), add = TRUE)
    }
  }
  code
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is safe (I'd add mode = 'integer' to get0(), though), but it seems that you forgot to set.seed(seed) before code is evaluated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The code worked by accident because resetting the random seed after applying the jitter also made it reproducible.

@hadley
Copy link
Member

hadley commented Jan 26, 2017

Could you please also add that example to the examples in the doc?

I also wonder if position_jitter() should be reproducible "by default", i.e. should we default to seed = sample(1e6)? (or whatever the correct value is)

@jeff-mettel
Copy link

@hadley @krlmlr - I agree with Hadley's suggestion. The ideal behavior would be a default seed and consistent jittering. Thanks guys!

@jimhester
Copy link
Contributor

👍 to this PR, also the correct value is 42.

@krlmlr
Copy link
Member Author

krlmlr commented Feb 2, 2017

@jimhester: Some think it's 4.

@hadley
Copy link
Member

hadley commented Feb 4, 2017

I still think that the seed should be random each time you call geom_jitter - you should have to explicitly opt in to reusing the seed by constructing your own position_jitter (or setting the seed).

Otherwise you'll get exactly the same offsets every single time. That's not terrible but it feels like a different geom to jitter (and you could just use an existing precomputed pseudo random sequence)

in this mode, the random seed is not changed, but reset after processing
@krlmlr
Copy link
Member Author

krlmlr commented Feb 4, 2017

It's difficult to achieve both: consistent jitter by default, and jitter dependent on random seed. I have added a mode (now default) that reuses the current random seed and resets it afterwards.

@hadley
Copy link
Member

hadley commented Feb 5, 2017

I think my original proposal to just pick a random seed would be simpler.

@krlmlr
Copy link
Member Author

krlmlr commented Feb 5, 2017

@hadley: Do you mean

position_jitter <- function(..., seed = sample.int(2147483647L, 1L))

? I suspect this isn't going to work as expected, two consecutive position_jitter() calls still will use a different initial random seed.

@hadley
Copy link
Member

hadley commented Feb 6, 2017

Right, that's what I mean. If you want to reuse the same jitter, you'll need to share a position_jitter() object.

@krlmlr
Copy link
Member Author

krlmlr commented Feb 7, 2017

Fixed and clarified example, the default now picks a random seed and advances the RNG.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please make the one change and merge/rebase and then we'll get it in.

NEWS Outdated
@@ -1,3 +1,6 @@
* `position_jitter()` gains a `seed` argument that allows specifying a random
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, bullet should go in NEWS.md

@hadley
Copy link
Member

hadley commented Jul 13, 2017

Bump

@karawoo karawoo merged commit d870275 into tidyverse:master Jul 27, 2017
slowkow added a commit to slowkow/ggplot2 that referenced this pull request Feb 9, 2018
In pull tidyverse#1996 position_jitter got a new argument called seed.

position_jitterdodge might also benefit from having a seed.
@lock
Copy link

lock bot commented Jan 18, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jan 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants