Make TensorFlow compatible with PyPy #252

Open
FabHan opened this Issue Nov 17, 2015 · 17 comments

@FabHan

FabHan commented Nov 17, 2015

I know it's not a priority and it will be a long way to get there, but making TF compatible with PyPy would be super cool.

Thoughts?

@vrv vrv added the enhancement label Nov 17, 2015

@mrry

mrry (Contributor) commented Nov 19, 2015

As a team we don't use PyPy day-to-day, but we would welcome contributions if it's easy to do this without breaking CPython compatibility.

My guess is that the two stumbling blocks would be TensorFlow's reliance on NumPy in the Python front-end, and SWIG for interfacing with the C++ backend. Are you aware of any other issues that one would face?

@girving

girving (Contributor) commented Mar 8, 2016

I'm not sure what the state of PyPy binding layers is these days, but there's a good chance this would require rewriting the whole SWIG interface.

@lvella

lvella commented Jul 13, 2016

PyPy's only good interface with native code seems to be CFFI, but that is a C <-> Python interface library. Besides being slower than CPython's native interface (when running on CPython, of course), it would be too painful to use for interfacing with C++ (Python <-> C <-> C++).
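To illustrate the kind of C-level boundary lvella describes, here is a minimal sketch using the standard-library ctypes module as a stand-in for CFFI (the libm lookup assumes a Unix-like system): Python can only talk to plain C symbols, so a C++ backend would first have to be wrapped behind C entry points like this one.

```python
import ctypes
import ctypes.util

# Locate and load the C math library (assumes a Unix-like system).
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C signature: double sqrt(double).
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))  # calls straight into the C function
```

Every C++ class and template in the backend would need such a flat C shim, which is the "too painful" part.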

@girving

girving (Contributor) commented Jul 13, 2016

The medium-term plan for language bindings is to go through the C API. We're in the process of moving functionality from Python to C++, at which point it will also be exposed through an extended C API. Moving stuff out of Python to C++ isn't relevant for PyPy, but the improved C API might make it easy to write a separate PyPy interface.

@classner

classner commented Jan 24, 2017

Great news from the PyPy side: since version 5.6.0 (released in November) the devs have included a compatibility layer for the CPython C API (cpyext). With this, I was able to build numpy, scipy and sklearn out of the box, so machine learning with PyPy is finally getting interesting.

Of course I then also wanted to check TensorFlow, with pretty good results! The build completes successfully out of the box (!!), so there is no need to adapt any interfaces. However, import tensorflow then fails with the well-known "ImportError: No module named _pywrap_tensorflow" (when run outside the tensorflow directory, of course). Looking into it, ldd -r _pywrap_tensorflow.so gave me some "undefined symbol: PyPy...." errors, indicating that just an additional link command during the build could fix the issue! Since I'm not so familiar with bazel, I didn't go further to resolve it...
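When debugging this kind of build-vs-interpreter mismatch, a quick first step is to confirm which implementation is running and which C-extension filename suffix it expects. A generic sketch, not TensorFlow-specific:

```python
import sys
import sysconfig

# Which Python implementation is running: 'cpython' or 'pypy'.
impl = sys.implementation.name

# The filename suffix this interpreter expects for C extension modules,
# e.g. '.cpython-311-x86_64-linux-gnu.so' on CPython or '.pypy3-...-.so' on PyPy.
suffix = sysconfig.get_config_var("EXT_SUFFIX")

print(impl, suffix)
```

If a built extension doesn't carry a suffix the interpreter recognizes, the import machinery never finds it, which matches the "No module named _pywrap_tensorflow" symptom.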

@girving

girving (Contributor) commented Jan 24, 2017

@classner Very cool! Presumably you have to link against a cpyext library. @martinwicke Who would be the right person to ask for assistance here?

@martinwicke

martinwicke (Member) commented Jan 24, 2017

@classner can you try the cmake build? If you can make that work, then we know what needs to be added, and we can put it into the bazel build. Bazel really wants all its dependencies declared, so it would be good to have a clear picture of what exactly is missing.

@classner

classner commented Jan 25, 2017

Just posted on the pypy-dev mailing list to get a bit more input from that side! Will post further info here...

@classner

classner commented Jan 25, 2017

One of the devs was incredibly quick to reply: apparently, all the symbols I mentioned are part of libpypy-c.so, which is in the bin folder of the pypy distribution. I'm currently super-short on time and can't look at it right now, but could do sometime during the next weeks.

@classner

classner commented Jan 31, 2017

Alright, I managed to build with both bazel and cmake successfully (v1.0.0-rc0). The trick to make it work is to rename the generated _pywrap_tensorflow.so to _pywrap_tensorflow.pypy-XX.so, where XX is the pypy version code (in my case 41 for pypy 5.6.0). With this naming scheme, import tensorflow works nicely. So second base touched.

This part of the 'get started' code runs:

import tensorflow as tf
import numpy as np

# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3

# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but TensorFlow will
# figure that out for us.)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b

# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# Before starting, initialize the variables.  We will 'run' this first.
init = tf.global_variables_initializer()

# Launch the graph.
sess = tf.Session()

sess.run(init) fails with a TypeError:

TypeError: in method 'TF_DeleteBuffer', argument 1 of type 'TF_Buffer *'.
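The rename trick above can be automated: ask the running interpreter for its preferred extension suffix and rename the library accordingly. A hedged sketch (the _pywrap_tensorflow filename comes from the comment above; a throwaway file stands in for the real library):

```python
import importlib.machinery
import os
import tempfile

def rename_for_interpreter(path):
    """Rename a plain '<name>.so' so the running interpreter will import it.

    importlib.machinery.EXTENSION_SUFFIXES[0] is the interpreter's preferred
    C-extension suffix, e.g. '.pypy-41.so' on PyPy 5.6.0 or
    '.cpython-311-x86_64-linux-gnu.so' on CPython 3.11.
    """
    base, ext = os.path.splitext(path)
    assert ext == ".so", "expected a bare .so file"
    target = base + importlib.machinery.EXTENSION_SUFFIXES[0]
    os.rename(path, target)
    return target

# Demo with a throwaway file standing in for _pywrap_tensorflow.so:
tmpdir = tempfile.mkdtemp()
dummy = os.path.join(tmpdir, "_pywrap_tensorflow.so")
open(dummy, "w").close()
print(os.path.basename(rename_for_interpreter(dummy)))
```

Run under PyPy, this computes the .pypy-XX suffix automatically rather than hard-coding the version number.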


@classner

classner commented Feb 6, 2017

Any feedback on this from the interface developers? Could this TypeError be resolved easily?

@mrry

mrry (Contributor) commented Feb 6, 2017

Can you provide more information about the TypeError? From looking at session.py, it seems the most likely candidate for raising this exception is this line. What type does run_metadata_ptr have in your version of the code?

@classner

classner commented Feb 6, 2017

The line in question is indeed causing the TypeError. This is what I get when adding an ipdb breakpoint just before:

    772       import ipdb; ipdb.set_trace()
--> 773       tf_session.TF_DeleteBuffer(run_metadata_ptr)
    774       if options:

ipdb> run_metadata_ptr
<tensorflow.python.pywrap_tensorflow.TF_Buffer;  >
ipdb> type(run_metadata_ptr)
<class 'tensorflow.python.pywrap_tensorflow.TF_Buffer'>
ipdb> n
TypeError: "in method 'TF_DeleteBuffer', argument 1 of type 'TF_Buffer *'"

(str and repr don't give any more helpful information)


@girving

girving (Contributor) commented Feb 6, 2017

I don't have anything constructive to say, but I can't resist chiming in to gripe about error messages in one of these two forms:

  1. "You gave me an X. I expected something else!"
  2. "I expected a Y. You gave me something else!"

Head...desk.

@mrry

mrry (Contributor) commented Feb 6, 2017

@girving Unfortunately that error message is in generated code. Maybe take it up with the authors of https://github.com/swig/swig? :)

Judging by that generated code, we seem to be in case 2 ("I expected a TF_Buffer*"). SWIG converts this C type to and from a pointer-wrapper object. I don't know enough about PyPy or cpyext to know if this type could be getting swizzled somehow. Here are the relevant generated code fragments for the wrappers for TF_NewBuffer() and TF_DeleteBuffer():

SWIGINTERN PyObject *_wrap_TF_NewBuffer(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
  PyObject *resultobj = 0;
  TF_Buffer *result = 0 ;
  
  if (!PyArg_ParseTuple(args,(char *)":TF_NewBuffer")) SWIG_fail;
  {
    Py_BEGIN_ALLOW_THREADS;
    result = (TF_Buffer *)TF_NewBuffer();
    Py_END_ALLOW_THREADS;
  }
  resultobj = SWIG_NewPointerObj(SWIG_as_voidptr(result), SWIGTYPE_p_TF_Buffer, 0 |  0 );
  return resultobj;
fail:
  return NULL;
}

SWIGINTERN PyObject *_wrap_TF_DeleteBuffer(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
  PyObject *resultobj = 0;
  TF_Buffer *arg1 = (TF_Buffer *) 0 ;
  void *argp1 = 0 ;
  int res1 = 0 ;
  PyObject * obj0 = 0 ;
  
  if (!PyArg_ParseTuple(args,(char *)"O:TF_DeleteBuffer",&obj0)) SWIG_fail;
  res1 = SWIG_ConvertPtr(obj0, &argp1,SWIGTYPE_p_TF_Buffer, 0 |  0 );
  if (!SWIG_IsOK(res1)) {
    SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "TF_DeleteBuffer" "', argument " "1"" of type '" "TF_Buffer *""'"); 
  }
  arg1 = reinterpret_cast< TF_Buffer * >(argp1);
  {
    Py_BEGIN_ALLOW_THREADS;
    TF_DeleteBuffer(arg1);
    Py_END_ALLOW_THREADS;
  }
  resultobj = SWIG_Py_Void();
  return resultobj;
fail:
  return NULL;
}
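For readers unfamiliar with SWIG's pointer-wrapper objects, the check that fails above can be mimicked in pure Python: the wrapper carries a type tag, and the "delete" wrapper rejects any object whose tag does not match. This is only an illustrative analogy to SWIG_ConvertPtr's behavior, not SWIG's actual implementation:

```python
class SwigPointer:
    """Toy stand-in for SWIG's pointer-wrapper: a raw address plus a type tag."""
    def __init__(self, address, type_tag):
        self.address = address
        self.type_tag = type_tag

def tf_delete_buffer(obj):
    # Mimics the SWIG_ConvertPtr check: reject anything not tagged TF_Buffer*.
    if not isinstance(obj, SwigPointer) or obj.type_tag != "TF_Buffer *":
        raise TypeError("in method 'TF_DeleteBuffer', argument 1 of type 'TF_Buffer *'")
    # ... the real wrapper would free the underlying C buffer here ...

buf = SwigPointer(0x1234, "TF_Buffer *")
tf_delete_buffer(buf)        # accepted: tag matches

try:
    tf_delete_buffer(object())   # rejected, like the error seen in the thread
except TypeError as e:
    print(e)
```

Under cpyext, the suspicion is that the wrapper object arriving at the check no longer looks like the expected pointer-wrapper, so the conversion fails even though the Python-level type appears correct.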

@aselle aselle added type:feature and removed enhancement labels Feb 9, 2017

@standy66

standy66 (Contributor) commented Jul 20, 2017

I was able to reproduce @classner's steps for tensorflow 1.2 with CUDA on pypy3 version 5.8.0-beta0 (Python 3.5.3). In order to do this, the _pywrap_tensorflow_internal.so library in site-packages/tensorflow/python needs to be renamed to _pywrap_tensorflow_internal.pypy3-58-x86_64-linux-gnu.so. I am getting a slightly different error during sess.run:

$ LD_LIBRARY_PATH=/usr/local/cuda/lib64 pypy3.5

Python 3.5.3 (a37ecfe5f142bc971a86d17305cc5d1d70abec64, Jun 08 2017, 19:43:54)
[PyPy 5.8.0-beta0 with GCC 6.3.0] on linux

>>>> import tensorflow as tf
>>>> x = tf.constant(1)
>>>> y = tf.constant(2)
>>>> z = x * y
>>>> with tf.Session() as sess:
....     sess.run(z)
....     
2017-07-20 14:43:47.795964: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-20 14:43:47.796979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.936
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.67GiB
2017-07-20 14:43:47.796992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-07-20 14:43:47.796995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-07-20 14:43:47.797003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/andrew/.virtualenvs/tensorflow_pypy/site-packages/tensorflow/python/client/session.py", line 780, in run
    run_metadata_ptr = tf_session.TF_NewBuffer()
TypeError: argument after * must be an iterable, not NoneType

It seems that the first line in session.run fails to execute.
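The wording of that TypeError suggests a None leaked into an argument-unpacking call somewhere along the cpyext path. As an illustration only (not the actual failing call site inside TF_NewBuffer), CPython raises essentially the same message when a function is called with *None:

```python
def call_with_star(args):
    def f(*values):
        return values
    return f(*args)  # unpacking None here raises the TypeError

try:
    call_with_star(None)
except TypeError as e:
    print(e)  # the message says the value after * must be an iterable
```

So the bug is likely in whatever builds the argument tuple for the wrapped C call, rather than in TF_NewBuffer itself.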


@GSam

GSam commented Mar 10, 2018

I'm quite interested in this working under pypy. I didn't have any luck with CNTK either but I've discovered that Theano actually works under pypy (with some minor hacks), so for now, it seems I'll be using that instead.

Edit: Patches are now upstream for anyone interested.

