Caching and reproducibility: globally setting random seed #274

Closed
renozao opened this issue Jun 13, 2012 · 7 comments
@renozao

renozao commented Jun 13, 2012

Hi,

I saw in the NEWS file that you changed the way the cache interacts with .Random.seed.
I am sure there is a technical reason for the change, but in my opinion the result (having to call set.seed in each cached chunk) is really not practical. What prevents restoring the random seed to its state after the cached computation?

To me this jeopardizes the reproducibility of knitted documents.
E.g., the following chunks give different results if run twice, due to caching. Worse: the same random seed is used in both computations.

set.seed(1234)
x <- runif(3)
x
runif(3)
runif(3)

Result for first run is:

set.seed(1234)
x <- runif(3)
x
## [1] 0.1137 0.6223 0.6093
runif(3)
## [1] 0.6234 0.8609 0.6403
runif(3)
## [1] 0.009496 0.232551 0.666084

The second run gives:

set.seed(1234)
x <- runif(3)
x
## [1] 0.1137 0.6223 0.6093
runif(3)
## [1] 0.1137 0.6223 0.6093
runif(3)
## [1] 0.6234 0.8609 0.6403
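
For illustration, here is a minimal base-R sketch of what "restoring the random seed after a cached computation" could look like. It is not knitr's implementation: `seed_cache` and `run_cached()` are hypothetical names, and a chunk is modelled as a quoted expression. The cache stores the RNG state recorded right after the first evaluation and puts it back on a cache hit.

```r
# Hypothetical stand-in for a chunk cache; not how knitr stores its cache.
seed_cache <- new.env()

run_cached <- function(key, expr) {
  hit <- seed_cache[[key]]
  if (!is.null(hit)) {
    # Cache hit: restore the RNG state saved after the original computation,
    # so code that follows sees the same stream as in an uncached run.
    assign(".Random.seed", hit$seed_after, envir = globalenv())
    return(hit$value)
  }
  value <- eval(expr, envir = globalenv())
  seed_cache[[key]] <- list(
    value = value,
    seed_after = get(".Random.seed", envir = globalenv())
  )
  value
}

chunk <- quote({ set.seed(1234); runif(3) })

x1 <- run_cached("chunk1", chunk); y1 <- runif(3)  # first run: computed
x2 <- run_cached("chunk1", chunk); y2 <- runif(3)  # second run: cache hit
identical(y1, y2)  # TRUE
```

With the seed restored, the uncached `runif(3)` after the cached chunk returns the same values in both runs, which is the behaviour asked for above.
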
@yihui
Owner

yihui commented Jun 14, 2012

I'll think about it. My opinion on set.seed() and reproducibility is very different from that of most people. Thanks!

@renozao
Author

renozao commented Jun 14, 2012

:)
I tend to think that different opinions are generally a good thing, and in software in particular they can often be implemented as different options that users can choose from. This could be a nice way to address and improve on this issue, and to satisfy both the majority of people (~ the majority of users?) and personal preferences ;)


@yihui
Owner

yihui commented Jun 15, 2012

Yes, I understand that. I'm at useR! 2012 this week, and I'll try to find the best way to solve this issue when I get back. Thanks!

@yihui
Owner

yihui commented Jun 23, 2012

This is a really tricky issue. The root cause is that RNG (random number generation) modifies .Random.seed; this side effect makes it difficult to maintain reproducibility. For example, you first write chunks A and B, both of which involve RNG; later you insert a chunk C between A and B that also involves RNG. In this case, the seed for B must be updated because C modified the seed, but with cache turned on, B will not be aware of the change, and subsequent chunks will still use the seed from B (because the seed is loaded from B's cache), which is wrong.

That is why I said in NEWS that if you want perfect reproducibility, you must call set.seed() in every single chunk that involves RNG.
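
The A/B/C scenario can be reproduced in plain R without knitr or any cache. The sketch below models chunks as functions (the names and the seed values 1 and 2 are only illustrative assumptions): with a shared RNG stream, inserting C changes what B draws, whereas a B that sets its own seed is unaffected by whatever runs before it.

```r
# Chunks modelled as plain functions; A, B, C follow the naming above.
A <- function() runif(2)
B <- function() runif(2)
C <- function() runif(2)

# Shared stream: B's result depends on everything that ran before it.
set.seed(1); invisible(A()); b_without_c <- B()
set.seed(1); invisible(A()); invisible(C()); b_with_c <- B()
identical(b_without_c, b_with_c)  # FALSE: inserting C shifted B's draws

# Per-chunk seed: B fixes its own starting state.
B_seeded <- function() { set.seed(2); runif(2) }
set.seed(1); invisible(A()); b1 <- B_seeded()
set.seed(1); invisible(A()); invisible(C()); b2 <- B_seeded()
identical(b1, b2)                 # TRUE: B no longer depends on C
```

A cached B in the first variant would keep returning `b_without_c` even after C is inserted, which is the stale result described above.
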

@yihui
Owner

yihui commented Jun 23, 2012

OK, I've got an idea on how to solve this issue: when .Random.seed has changed, the cached chunk can be automatically invalidated and re-run. I will document how to do it later at http://yihui.name/knitr/demo/cache/
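
As a rough sketch of that idea, in base R and not taken from knitr's source: the cache entry can record the RNG state the chunk started from, and be reused only when the incoming state is identical. `seed_aware_cache`, `current_seed()` and `run_seed_aware()` are hypothetical names. In knitr itself, I believe the documented way to get this behaviour is `opts_chunk$set(cache.extra = rand_seed)`, but treat that spelling as an assumption and check the cache page linked above.

```r
seed_aware_cache <- new.env()

# Current RNG state, or NULL if no random numbers have been drawn yet.
current_seed <- function() {
  if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    get(".Random.seed", envir = globalenv())
  } else {
    NULL
  }
}

run_seed_aware <- function(key, expr) {
  seed_before <- current_seed()
  hit <- seed_aware_cache[[key]]
  # Reuse the cache only if the chunk would start from the same RNG state.
  if (!is.null(hit) && identical(hit$seed_before, seed_before)) {
    if (!is.null(hit$seed_after)) {
      assign(".Random.seed", hit$seed_after, envir = globalenv())
    }
    return(list(value = hit$value, recomputed = FALSE))
  }
  value <- eval(expr, envir = globalenv())
  seed_aware_cache[[key]] <- list(
    value = value, seed_before = seed_before, seed_after = current_seed()
  )
  list(value = value, recomputed = TRUE)
}

B <- quote(runif(2))

set.seed(1); r1 <- run_seed_aware("B", B)  # first run: computed
set.seed(1); r2 <- run_seed_aware("B", B)  # same incoming seed: cache hit
set.seed(1); invisible(runif(2))           # an inserted chunk advances the RNG
r3 <- run_seed_aware("B", B)               # incoming seed changed: recomputed
```

Here `r2` reuses the cached value, while `r3` is recomputed because the RNG state on entry no longer matches the one stored with the cache.
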

@yihui
Owner

yihui commented Jun 23, 2012

Done. Caching the random seed should be safe if you read the last section of http://yihui.name/knitr/demo/cache/
