diff --git a/_posts/2011-12-04-cache.md b/_posts/2011-12-04-cache.md
index 58f187f9e5..74d2d21e2b 100644
--- a/_posts/2011-12-04-cache.md
+++ b/_posts/2011-12-04-cache.md
@@ -66,3 +66,18 @@ The issue [#238](https://github.com/yihui/knitr/issues/238) shows another good u
 
 Sometimes we may want to use different cache directories for different input files by default, and there is one solution in issue [#234](https://github.com/yihui/knitr/issues/234). However, I still recommend you to do this setting inside your source document to make it self-contained (use `opts_chunk$set(cache.path = ...)`).
 
+## Reproducibility with RNG
+
+**Knitr** also caches `.Random.seed` and restores it before evaluating each chunk, so that chunks involving random number generation (RNG) stay reproducible. However, there is a problem here. Suppose chunks A and B have been cached; if we now insert a chunk C between A and B (all three chunks use RNG), in theory B should be updated because RNG modifies `.Random.seed` as a side effect, but in fact B will not be updated; in other words, the reproducibility of B is bogus.
+
+To guarantee reproducibility with RNG, we need to associate `.Random.seed` with the cache: whenever it is modified, the chunk must be updated. This is easy to do by using an _unevaluated_ R expression in the `cache.extra` chunk option, e.g.
+
+{% highlight r %}
+opts_chunk$set(cache.extra = quote({
+  if (exists('.Random.seed', envir = globalenv()))
+    get('.Random.seed', envir = globalenv())
+}))
+{% endhighlight %}
+
+Here `quote()` protects the expression from being evaluated when the option is set (important!!), but inside **knitr** this expression will be evaluated. In this case, each chunk will first check whether `.Random.seed` has changed since the last run; a different `.Random.seed` will force the current chunk to rebuild its cache.
+