knitr engine API and cache compatibility with reticulate engine #1505
Comments
I'm afraid I'll have to defer this issue to @kevinushey (author of the python engine in reticulate).
For what it's worth, I never tried wiring cache support into the reticulate engine as I wasn't exactly sure what that would entail, but it sounds like we'd need:
Thanks @kevinushey. Something else to look into is
And the reason I opened the issue here is because I think If this is something you'd like for reticulate, I'd be interested in helping out with PRs. I'd like to be able to write Python with rmarkdown using all the knitr features.
I'd definitely be open to reviewing a PR, but it seems like this will be tough to get right and I unfortunately won't have that much time to help with the actual implementation in the coming months.
Hi! I'm working on it (by sheer necessity). There are some serious problems on the Beyond basic usage, I find that a Python cache engine for
I think whether or not this is a knitr or reticulate bug depends on the knitr engine API, which I do not completely understand. I carefully searched for this bug in the knitr and reticulate issues and didn't see anything, so I apologize if this is already known.
The bug
Suppose we have the following file called `python_test.Rmd`:

When you first press the Knit button, the document compiles successfully.

Now suppose I change chunk2 so that it has `print(x + 10)`, and I save the file. If I try clicking the Knit button, I get the following error:

My efforts to debug
Here's what I've learned about the error:

It reliably happens when I call `knitr::knit('python_test.Rmd', envir = new.env())` after a session restart, so I don't think it is an `rmarkdown::render` error.

The error message points to `py_run_string_impl`, which is a `reticulate` function. But I believe the problem arises before `knitr` reaches the python engine.

When you call `knitr::knit('python_test.Rmd', envir = new.env())`, chunk1 eventually enters the `call_block` function in `knitr/R/block.R`. It passes `if (params$cache > 0)`, and the hash comes up with the same value. Then `cache$load` tries to bring the saved data into `knit_global`.

At this point, if you run `ls(knit_global())` you'll see `character(0)`. So it isn't clear if the python object `x=1` was even saved.

However, whether it was saved doesn't even matter, because chunk2 never gets a chance to use it. When `chunk2` starts down the same path, its hash has changed, so it moves on to `block_exec`. If this were R code, it would have access to the `cache$load`ed objects from chunk1 with `env = knit_global()`, but non-R engines go down a separate branch.

`block_exec` tells the reticulate engine to execute `print(x + 9)`, but it fails because it doesn't know `x`. You can verify this in `eng_python_synchronize_before` by checking to see if `main` contains `x`, which it doesn't (assuming you called `knit` after a session restart). The only thing that passes to `eng_python` is `options`, which as far as I can tell doesn't include any environment information such as `x=1`.
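The sequence described above can be reproduced in miniature without knitr or reticulate. The sketch below is plain Python and only an analogy: `knit`, `chunk_hash`, the `cache` dictionary, and the per-session namespace dictionaries are hypothetical stand-ins, not knitr or reticulate internals, and the chunk bodies assign to `y` instead of printing so the result can be inspected. It assumes the behaviour reported here, namely that a cache hit skips the chunk and nothing restores its Python variables.

```python
import hashlib


def chunk_hash(code: str) -> str:
    """Hypothetical stand-in for a chunk hash: a digest of the chunk source."""
    return hashlib.md5(code.encode("utf-8")).hexdigest()


def knit(chunks, cache, namespace):
    """Run chunks in order, skipping any chunk whose hash is already cached.

    A cache hit skips execution entirely, and nothing restores the Python
    variables that the skipped chunk would have defined.
    """
    for name, code in chunks:
        digest = chunk_hash(code)
        if cache.get(name) == digest:
            continue  # cache hit: the chunk is not executed
        exec(code, namespace)  # cache miss: run in the shared namespace
        cache[name] = digest


cache = {}

# First knit in a fresh session: both chunks run and both are cached.
session = {}
knit([("chunk1", "x = 1"), ("chunk2", "y = x + 9")], cache, session)

# Edit chunk2, then knit again after a "session restart" (empty namespace).
restarted = {}
try:
    knit([("chunk1", "x = 1"), ("chunk2", "y = x + 10")], cache, restarted)
    error = None
except NameError as exc:
    error = exc

print(type(error).__name__)
```

The second run skips chunk1 on a cache hit, so `x` is never defined in the restarted namespace and the edited chunk2 raises `NameError`.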
What is not clear to me

Despite crawling through the reticulate source code, I'm not actually sure how the state is saved between chunks when `cache=FALSE`. Each time a python chunk is executed by `eng_python` (in `reticulate/R/knitr-engine.R`), it calls `import_main`, which provides an object `main` that has the previous chunk's variables (`x = 1`), but I don't see where this data is saved chunk to chunk.

I know the `main` data has to be saved somewhere, because if you don't restart the session after calling `knitr::knit('python_test.Rmd', envir = new.env())`, the `main` variable will still contain `x = 1`, even though the `knit(..)` call never runs `x = 1` or loads it into memory since it was cached.

What to do?
I know everyone's busy, so I'm happy to help by making a PR but it is not clear to me how to fix this.
Ideas:
1. Factor out the chunk hashing. If `autodep=FALSE`, then if any cache fails you have to `block_exec` every chunk.
2. Rethink the engine API, so the language can provide loading/saving for its chunks. In fact, reticulate already supports pickle, which is a python module that can save and load python objects. This would still require refactoring some knitr code, since as far as I can tell engines don't touch caching at all at the moment.
3. Suppose `.RData` is able to save the python object (I don't know if that is possible). Similar to number 2, you could slightly change the engine API so you pass that object to the engine to process.

Session data
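Idea 2 in the list above (letting the engine save and load its own chunk state) could look roughly like the following. This is a minimal sketch in plain Python using the standard `pickle` module; `save_chunk_state`, `load_chunk_state`, and the cache file name are hypothetical and are not part of knitr or reticulate, and it assumes chunk variables live in an ordinary dictionary namespace.

```python
import os
import pickle
import tempfile


def save_chunk_state(namespace, path):
    """Pickle the user-defined, picklable variables of a chunk namespace."""
    state = {}
    for name, value in namespace.items():
        if name.startswith("__"):
            continue  # skip __builtins__ and other dunder entries
        try:
            pickle.dumps(value)
        except Exception:
            continue  # skip objects that cannot be pickled (e.g. modules)
        state[name] = value
    with open(path, "wb") as fh:
        pickle.dump(state, fh)


def load_chunk_state(namespace, path):
    """Restore previously pickled variables into a chunk namespace."""
    with open(path, "rb") as fh:
        namespace.update(pickle.load(fh))


cache_file = os.path.join(tempfile.mkdtemp(), "chunk1.pkl")

# First session: chunk1 runs and its variables are written to the cache.
first_session = {}
exec("x = 1", first_session)
save_chunk_state(first_session, cache_file)

# After a restart: chunk1 is a cache hit, so its state is loaded instead of run.
restarted = {}
load_chunk_state(restarted, cache_file)
exec("y = x + 10", restarted)
print(restarted["y"])  # 11
```

The design choice here is that a cache hit triggers a load step owned by the language engine, so a later chunk that misses the cache still finds the variables it depends on. Objects that cannot be pickled are silently skipped in this sketch; a real engine would need to decide how to report them.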