Eager execution does not work for R interface under Python 3 #20701

jjallaire · 2018-07-11T15:57:32Z

Hi there, I am the maintainer of the R interface to TensorFlow. We are currently in the process of porting various Eager examples to R. We haven't had trouble with Python 2 versions of TensorFlow, but with Python 3 versions we get some strange errors.

I realize that this is within the R interface, so it technically falls outside the scope of TF for Python. However, to address it we need some insight into what might be different for Eager under Python 3. I'll provide a detailed repro and an explanation of its under-the-hood behavior below.

cc @martinwicke @random-forests

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
TensorFlow installed from (source or binary): Binary
TensorFlow version (use command below): TensorFlow v1.10.0-dev20180710
Python version: 3.6.5
Bazel version (if compiling from source): N/A
GCC/Compiler version (if compiling from source): N/A
CUDA/cuDNN version: N/A
GPU model and memory: N/A
Exact command to reproduce: See below

Describe the problem

Using the R interface to TensorFlow:

library(tensorflow)
tf$enable_eager_execution()
x <- tf$constant(1)
tf$add(x, x)

Results in this error:

SystemError: <built-in function TFE_Py_FastPathExecute> returned a result with an error set

This error occurs within the definition of add() within gen_math_ops.py:

_result = _pywrap_tensorflow.TFE_Py_FastPathExecute(
    _ctx._context_handle, _ctx._eager_context.device_name, "Add", name,
    _ctx._post_execution_callbacks, x, y)

This code works as expected under TF w/ Python 2.

Again, I realize that this is the R interface, so you might not have an intuition about what could be wrong. Conceptually, you can think of the R interface as just using the Python C API to invoke functions. So in the above code we are essentially using:

PyImport_Import to import the tensorflow module
PyObject_CallFunctionObjArgs to call Python functions (e.g. tf.enable_eager_execution, tf.constant, etc.)
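At the Python level, each `tf$...` call from R roughly amounts to importing a module by name, looking up an attribute, and calling it. The sketch below restates that; `bridge_import` and `bridge_call` are hypothetical names, and `importlib.import_module` plus a plain call are stand-ins for `PyImport_Import` and `PyObject_CallFunctionObjArgs`, not what the R package literally executes.

```python
import importlib


def bridge_import(name):
    """Import a module by name (stand-in for PyImport_Import)."""
    return importlib.import_module(name)


def bridge_call(module, attr, *args):
    """Look up module.attr and call it with positional arguments.

    Stand-in for fetching the attribute and passing it to
    PyObject_CallFunctionObjArgs; nothing else runs between the two steps.
    """
    func = getattr(module, attr)
    return func(*args)


# Shape of `tf$add(x, x)`, using the standard library's operator module
# in place of tensorflow:
operator_mod = bridge_import("operator")
print(bridge_call(operator_mod, "add", 1.0, 1.0))  # 2.0
```

The point of the sketch is that nothing beyond the import, the attribute lookup, and the call is involved, so any difference from the interpreter has to come from what happens around those calls.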

My theory is that under Python 3 there is something being done at the Python language level that we aren't emulating or capturing when calling through the Python C API. Hopefully this gives you some clues as to what that might be, so that we can make whatever changes are required to make this work within R.


martinwicke · 2018-07-11T18:44:35Z

I have seen this error before (not in TF) when I made a mistake that I believe was related to reference counting in connection with exceptions. But I can't say what's wrong here. I also think Py3 is stricter about what it lets you get away with, which is probably why you only see this with Py3.

Added Alex, who has more context on Eager specifically.

alextp · 2018-07-11T18:47:17Z

Interesting. @akshaym can you take a look at this? Maybe FastPathExecute is doing something funny.

@jjallaire , can you also show us some more information about what error is being set? If you print the TF_Status error code it'll be really helpful.

jjallaire · 2018-07-11T19:25:05Z

Interesting. To test whether the R interface might be leaving an error set before calling, I added a pre-emptive call to PyErr_Clear() right before we call PyObject_CallFunctionObjArgs(). Unfortunately the error still occurs, so it may be that a Python error is raised somewhere within the call to TFE_Py_FastPathExecute.

So this particular error condition could be somewhat of a red herring: i.e. there may be an error during execution b/c some precondition isn't met when calling via the C API, and we just don't see it.

Looking at the code for TFE_Py_FastPathExecute() (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/eager/pywrap_tfe_src.cc#L2232), there are lots of reasons an error might occur, but I don't have any intuition about which of these might only be tickled when calling a Python op via the C API as opposed to the Python interpreter.

jjallaire · 2018-07-11T19:28:08Z

Is there a way to get the TF_Status error code from the Python interface?

martinwicke · 2018-07-11T19:34:28Z

(sorry)

alextp · 2018-07-11T19:35:36Z

We usually raise an exception from the TF_Status error code. It might be easier to print it in TFE_Py_FastPathExecute, though, since I don't know how exceptions work across the Python/R boundary.


akshaym · 2018-07-11T20:12:56Z

Hi @jjallaire, Do you have steps to replicate your environment?

I'm pretty unfamiliar with how to set up R (I tried the r-base docker, but wasn't able to install TF using https://tensorflow.rstudio.com/tensorflow/).

Some questions I have:
Since the TF py3 Docker image has Python 3.5.2 (and your version is 3.6.5), can you check whether this error also occurs when calling from the Python interpreter directly? If it does, it might be easier for me to debug that.
Does it happen with all ops (perhaps try a matmul)?
Does tf.constant(1) return a reasonable looking Tensor?

(I'm trying to answer the first one myself, but responded in case it's easy enough for you to run.)

jjallaire · 2018-07-11T20:44:52Z

The error doesn't occur when I execute from the Python interpreter directly.

It does appear to happen with other ops (I tried tf.subtract and tf.matmul and it occurred for both of those). For tf.matmul the error is slightly different, it occurs on this line of code:

with ops.name_scope(name, "MatMul", [a, b]) as name:

The specific error is:

SystemError: <class 'tensorflow.python.framework.ops.name_scope'> returned a result with an error set

tf$constant(1) does in fact return a reasonable looking tensor.

Here's how I would suggest replicating:

Start from a system that already has TF for Python installed and working.
Install R

Install the R tensorflow package from the R console:

install.packages("tensorflow", repos = "https://cran.rstudio.com")

Execute this R script:

library(tensorflow)
tf$enable_eager_execution()
x <- tf$constant(1)
tf$add(x,x)

R should be able to find your installation of TensorFlow. If it's in a virtualenv you may need to add this to give it a hint:

library(tensorflow)
use_virtualenv("/path/to/virtualenv")

jjallaire · 2018-07-11T20:48:26Z

To install R on Debian just do this:

sudo apt-get install r-base

jjallaire · 2018-07-11T20:48:49Z

Then to run R:

$ R

jjallaire · 2018-07-11T20:53:05Z

So to summarize:

$ sudo apt-get install r-base
$ R

Then from within R:

> install.packages("tensorflow", repos = "https://cran.rstudio.com")
> library(tensorflow)
> use_virtualenv("/path/to/virtualenv") # if necessary
> tf$enable_eager_execution()
> x <- tf$constant(1)
> tf$add(x,x)

Or, after installing the R tensorflow package with install.packages(), just put the following in a text file, e.g. "eager.R":

library(tensorflow)
use_virtualenv("/path/to/virtualenv") # if necessary
tf$enable_eager_execution()
x <- tf$constant(1)
tf$add(x,x)

And then execute:

$ Rscript eager.R

If you need to keep the process alive for debugging then you can go into R and do this:

source("eager.R")

akshaym · 2018-07-11T22:31:27Z

Thanks @jjallaire!

I'm able to reproduce with your steps.

The following fails for me though:

library(tensorflow)
tf$enable_eager_execution()
x <- tf$constant(1.0)
print(x) # fails with "returned a result with an error set" in EagerTensor_datatype_enum
print(x) # the second call succeeds on the same tensor.

So it seems as if the constant call is actually returning with an error set (regardless, the tf$add calls also fail after this). So something isn't working right there.

I'll try to spend some more time on this soon.

jjallaire · 2018-07-11T22:57:08Z

Okay, great!

In terms of a conceptual model, think of the R interface as just using the C API to do everything. So perhaps there is some side-effect or state associated with using Eager via the Python interpreter that is not being replicated? Just a hunch about one angle to consider. Hopefully once you can see the actual failure on the C side everything will become clear!

akshaym · 2018-07-12T18:48:24Z

@jjallaire,

This seems to happen because the EagerTensor Python type doesn't have a __module__ attribute, which is accessed here: https://github.com/rstudio/reticulate/blob/c9e222fc709a6dcbdda586bbb49d93757fc86086/src/python.cpp#L287
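The missing attribute can be reproduced without TensorFlow or R. The sketch below builds a class whose type dict has no `__module__` entry, which, as far as we understand, is the state a C extension type ends up in when its type name carries no dotted module prefix. The `exec` trick is a stand-in, not how EagerTensor is defined: `type()` normally copies `__name__` from the calling globals into the new class as `__module__`, and running it under globals that lack `__name__` skips that step. Reading `__module__` then raises `AttributeError`; a C caller using `PyObject_GetAttrString` gets `NULL` instead, with that exception left pending unless it clears it. `make_type_without_module` and `EagerLike` are hypothetical names.

```python
def make_type_without_module(name):
    """Create a class whose type dict lacks a ``__module__`` key.

    type() fills in __module__ from the caller's globals["__name__"]; the
    exec() namespace below has no "__name__", so nothing is filled in.
    """
    namespace = {}
    exec(f"cls = type({name!r}, (), {{}})", namespace)
    return namespace["cls"]


EagerLike = make_type_without_module("EagerLike")  # hypothetical stand-in for EagerTensor
print("__module__" in EagerLike.__dict__)  # False
print(hasattr(EagerLike, "__module__"))    # False: the lookup raises AttributeError
try:
    EagerLike.__module__
except AttributeError as err:
    print("AttributeError:", err)
```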

I'm looking into how to include a __module__ attribute for the EagerTensor class (looks like we might just need a fully qualified name for the EagerTensor).

I also sent a pull request to fix reticulate so that it doesn't generate the Python error in any case: rstudio/reticulate#312 (I'm not entirely sure how to actually test this).
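Why an uncleared lookup failure shows up later as `SystemError`, and why a second `print(x)` then succeeds, can be illustrated with a toy model. To our knowledge, Python 3 (since 3.5) checks, after a C function returns a result, whether an exception is still pending, and reports that as `SystemError` naming whichever function happened to return; Python 2 has no such check, which would fit the Py2/Py3 difference. The model below imitates that check in plain Python. `ToyInterpreter`, `get_attr`, `call_builtin`, and the `NULL` sentinel are hypothetical names; this is not how CPython is implemented internally.

```python
NULL = object()  # sentinel standing in for a C NULL return


class ToyInterpreter:
    """Toy model of CPython's pending-error indicator (not real CPython code)."""

    def __init__(self):
        # Stands in for the per-thread error indicator.
        self.pending_error = None

    def get_attr(self, obj, name):
        """Like PyObject_GetAttrString: NULL plus a pending error on failure."""
        try:
            return getattr(obj, name)
        except AttributeError as exc:
            self.pending_error = exc
            return NULL

    def call_builtin(self, label, func, *args):
        """Call a 'C function' that succeeds, then apply Python 3's result check.

        A successful result while an error is still pending becomes
        SystemError, and the pending error is consumed in the process.
        """
        result = func(*args)
        if self.pending_error is not None:
            cause, self.pending_error = self.pending_error, None
            raise SystemError(
                f"<built-in function {label}> returned a result with an error set"
            ) from cause
        return result


interp = ToyInterpreter()
interp.get_attr(object(), "no_such_attribute")  # failed lookup; caller ignores NULL
try:
    interp.call_builtin("len", len, [1, 2])  # innocent call gets blamed
except SystemError as err:
    print(err)
print(interp.call_builtin("len", len, [1, 2]))  # second call succeeds: 2
```

In this model the fix on the caller's side is simply to check for `NULL` and clear the pending error right after the failed lookup.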

I'll leave this open till I get the __module__ to work for EagerTensor.

jjallaire · 2018-07-12T19:19:47Z

Brilliant!!! So happy we figured this out.

Here is the fix I made in reticulate: rstudio/reticulate@b1728da#diff-849f590f08cba3094f269e0a4d69398d

I only know of one other case in the wild where a Python object didn't have a __module__, so clients are likely conditioned to expect one (whether or not it is formally required; I'm guessing it isn't).
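On the client side, the defensive version of the lookup treats a missing `__module__` as normal and falls back to a placeholder. The sketch below is a Python-level analogue of checking `PyObject_GetAttrString()` for `NULL` and calling `PyErr_Clear()` before moving on; it is not reticulate's actual code, and `qualified_class_name` and the `"unknown"` fallback are hypothetical.

```python
def qualified_class_name(obj, fallback_module="unknown"):
    """Return "module.QualName" for type(obj), tolerating a missing __module__.

    fallback_module is a hypothetical placeholder, not the value reticulate uses.
    """
    cls = type(obj)
    module = getattr(cls, "__module__", None)  # swallows the AttributeError
    if not isinstance(module, str):
        module = fallback_module
    return f"{module}.{cls.__qualname__}"


print(qualified_class_name(1.0))  # builtins.float
```

Using `getattr` with a default only suppresses `AttributeError`, so unrelated failures inside the lookup still propagate.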

I'll watch for this issue to be closed and sync to the new module name for EagerTensor (right now we default it to something generic, but we'll want to use whatever you end up with once it's checked in).

martinwicke assigned alextp Jul 11, 2018

martinwicke closed this as completed Jul 11, 2018

martinwicke reopened this Jul 11, 2018

akshaym self-assigned this Jul 11, 2018

tensorflow-copybara closed this as completed in fd04b76 Jul 16, 2018

peteboothroyd mentioned this issue Dec 18, 2018 in "Attribute Error when finding modules" pytest-dev/pyfakefs#460 (closed)
