RPyC with Tensorflow: AttributeError when saving model #431

Open
mtsdalmolin opened this issue Jan 19, 2021 · 0 comments

Hey, everyone!

I'm trying to use RPyC with TensorFlow to distribute machine learning steps across a cluster. Right now I'm struggling to save a trained model: when I call model.save inside a node, it throws the error below:

Traceback (most recent call last):
  File "test_MLFV.py", line 82, in <module>
    single(sys.argv[1])
  File "test_MLFV.py", line 58, in single
    x = send_chain(c,p)
  File "test_MLFV.py", line 48, in send_chain
    ret = con.root.exec_chain(c,p)
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/netref.py", line 253, in __call__
    return syncreq(_self, consts.HANDLE_CALL, args, kwargs)
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/netref.py", line 76, in syncreq
    return conn.sync_request(handler, proxy, *args)
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/protocol.py", line 469, in sync_request
    return self.async_request(handler, *args, timeout=timeout).value
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/async_.py", line 102, in value
    raise self._obj
AttributeError: 'list' object has no attribute '__name__'

========= Remote Traceback (3) =========
Traceback (most recent call last):
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/protocol.py", line 320, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/protocol.py", line 593, in _handle_call
    return obj(*args, **dict(kwargs))
  File "MLFV_Module.py", line 20, in exposed_exec_chain
    x = parse_chain(c, p, db)
  File "/home/matheus/Desktop/mlfv-3.0/server/MLFV_Parsing.py", line 9, in parse_chain
    return parse_seq(c,p,db)
  File "/home/matheus/Desktop/mlfv-3.0/server/MLFV_Parsing.py", line 20, in parse_seq
    parse_seq(i,p,db)
  File "/home/matheus/Desktop/mlfv-3.0/server/MLFV_Parsing.py", line 25, in parse_seq
    return exec_chain_function(c, p, ret, obj, pp, db)
  File "/home/matheus/Desktop/mlfv-3.0/server/MLFV_Manager.py", line 24, in exec_chain_function
    r = send_function(con, cc) # send the function to be executed there
  File "/home/matheus/Desktop/mlfv-3.0/server/MLFV_Manager.py", line 10, in send_function
    run = rpyc.utils.classic.teleport_function(con, obj.run)(obj)
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/netref.py", line 253, in __call__
    return syncreq(_self, consts.HANDLE_CALL, args, kwargs)
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/netref.py", line 76, in syncreq
    return conn.sync_request(handler, proxy, *args)
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/protocol.py", line 469, in sync_request
    return self.async_request(handler, *args, timeout=timeout).value
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/async_.py", line 102, in value
    raise self._obj
AttributeError: 'list' object has no attribute '__name__'

========= Remote Traceback (2) =========
Traceback (most recent call last):
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/protocol.py", line 320, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/protocol.py", line 593, in _handle_call
    return obj(*args, **dict(kwargs))
  File "/home/matheus/Desktop/mlfv-3.0/server/training.py", line 51, in run
    model.save('trained_model.h5')
  File "/home/matheus/.local/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/network.py", line 1008, in save
    signatures, options)
  File "/home/matheus/.local/lib/python2.7/site-packages/tensorflow_core/python/keras/saving/save.py", line 112, in save_model
    model, filepath, overwrite, include_optimizer)
  File "/home/matheus/.local/lib/python2.7/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 103, in save_model_to_hdf5
    v, default=serialization.get_json_type).encode('utf8')
  File "/usr/lib/python2.7/json/__init__.py", line 251, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
  File "/home/matheus/.local/lib/python2.7/site-packages/tensorflow_core/python/util/serialization.py", line 54, in get_json_type
    return obj.__name__
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/netref.py", line 166, in __getattr__
    return syncreq(self, consts.HANDLE_GETATTR, name)
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/netref.py", line 76, in syncreq
    return conn.sync_request(handler, proxy, *args)
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/protocol.py", line 469, in sync_request
    return self.async_request(handler, *args, timeout=timeout).value
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/async_.py", line 102, in value
    raise self._obj
AttributeError: 'list' object has no attribute '__name__'

========= Remote Traceback (1) =========
Traceback (most recent call last):
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/protocol.py", line 320, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/protocol.py", line 609, in _handle_getattr
    return self._access_attr(obj, name, (), "_rpyc_getattr", "allow_getattr", getattr)
  File "/home/matheus/.local/lib/python2.7/site-packages/rpyc/core/protocol.py", line 537, in _access_attr
    return accessor(obj, name, *args)
AttributeError: 'list' object has no attribute '__name__'
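
One plausible reading of the remote tracebacks, not confirmed by the maintainers: json.dumps meets a value that is not a plain Python list (here, apparently an RPyC netref pointing at a list on the other side), so it hands the value to the `default` hook, which reads `obj.__name__`; the proxy forwards that lookup to the real list, which has no such attribute. The sketch below reproduces that chain with the standard library only. `FakeNetref` and `get_json_type_like` are hypothetical stand-ins written for this illustration, not RPyC or TensorFlow code.

```python
import json


class FakeNetref(object):
    """Hypothetical stand-in for an RPyC netref: it forwards attribute
    lookups to a wrapped object but is not itself a list or dict."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Called only for attributes missing on the proxy itself,
        # similar to a lookup being forwarded to the remote object.
        return getattr(self._target, name)


def get_json_type_like(obj):
    """Simplified imitation of a json `default` hook that falls back to
    obj.__name__, the line on which Remote Traceback (2) ends."""
    return obj.__name__


def try_dump(value):
    """Serialize value, returning the error text if the hook fails."""
    try:
        return json.dumps(value, default=get_json_type_like)
    except AttributeError as exc:
        return "AttributeError: {}".format(exc)


# A real list is encoded directly and never reaches the hook.
plain = try_dump({"units": [128, 10]})
# A proxied list is not recognised as a list, so the hook is called.
proxied = try_dump({"units": FakeNetref([128, 10])})

print(plain)
print(proxied)
```

If this reading is right, the value that breaks model.save would be a list that reached the node as a proxy, for example an attribute of the `obj` passed to the teleported `run` function, rather than something inside TensorFlow itself.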

I also tried to pickle the trained model and got this error:
TypeError: can't pickle SwigPyObject objects

Environment
  • rpyc 4.1.5
  • python 2.7
  • ubuntu 20.04
  • tensorflow >= 2.1.0
Minimal example

The code example can be found here:
https://github.com/mlfv-ufsm/mlfv-3.0/tree/feature/tensorflow-rpyc

To start the server, go to the server directory and run:
python2.7 MLFV_Module.py

Then initialize the clients with the init_client.py script. Go to the client directory and run:
python2.7 init_client.py localhost 15089 "numpy,pandas,tensorflow" "os,sys,timeit,numpy,pandas,tensorflow" 256000000 2 100

To run a function chain, go to the server directory again and run:
python2.7 test_MLFV.py

I'm not sure whether the problem is in rpyc or in tensorflow, because the traceback goes through both packages.

The strangest thing is that I built a boilerplate to test it and it worked fine. The boilerplate has the following code:
Server

import rpyc
from rpyc.utils.server import ThreadedServer

class TFService(rpyc.Service):
  def exposed_train(self):
    import tensorflow as tf
    mnist = tf.keras.datasets.mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.models.Sequential([
      tf.keras.layers.Flatten(input_shape=(28, 28)),
      tf.keras.layers.Dense(128, activation='relu'),
      tf.keras.layers.Dropout(0.2),
      tf.keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    model.fit(x_train, y_train, epochs=5)

    model.save('/tmp/trained_model.h5')
    return 'saved model'

  def exposed_load(self):
    import tensorflow as tf
    loaded_model = tf.keras.models.load_model('/tmp/trained_model.h5')
    loaded_model.summary()
    return 'loaded model'


if __name__ == "__main__":
  rpyc.lib.setup_logger()
  server = ThreadedServer(TFService, port=12345, backlog=10, protocol_config=rpyc.core.protocol.DEFAULT_CONFIG)
  server.start()

Client

from __future__ import print_function
import rpyc
import sys

if __name__ == "__main__":
  func = sys.argv[1]
  c = rpyc.connect("localhost", 12345)
  # Look up the exposed method by name instead of building code for exec()
  print(getattr(c.root, func)())

So, what is wrong with my code that makes it throw AttributeError in my application but not in the boilerplate? Is it something related to rpyc, or something I did wrong in my implementation?
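
One way to narrow the question down, assuming the failing value sits somewhere in the configuration that Keras serializes to JSON, is to walk that structure on the node and list every value whose real type the json encoder would not handle natively. The helper below is a hypothetical debugging sketch (`find_non_json_native` and `FakeProxy` are names invented here). It checks `type(value)` rather than using `isinstance`, on the assumption that a proxy may report a different `__class__` than its actual type.

```python
# Types the standard json encoder serializes without calling `default`.
# On Python 2, `unicode` and `long` would need to be added to this tuple.
JSON_NATIVE = (dict, list, tuple, str, int, float, bool, type(None))


def find_non_json_native(value, path="config"):
    """Return (path, type name) pairs for every nested value whose
    actual type is not one of the JSON-native types above."""
    if not issubclass(type(value), JSON_NATIVE):
        return [(path, type(value).__name__)]
    found = []
    if issubclass(type(value), dict):
        for key, item in value.items():
            found.extend(
                find_non_json_native(item, "{}[{!r}]".format(path, key)))
    elif issubclass(type(value), (list, tuple)):
        for index, item in enumerate(value):
            found.extend(
                find_non_json_native(item, "{}[{}]".format(path, index)))
    return found


class FakeProxy(object):
    """Hypothetical stand-in for a remote-object proxy."""

    def __init__(self, target):
        self._target = target


# Illustrative data only; not taken from the real model.
sample = {
    "layers": [{"units": 128}, {"units": FakeProxy([10])}],
    "name": "demo",
}
report = find_non_json_native(sample)
print(report)
```

In the real application this could be called as `find_non_json_native(model.get_config())` just before `model.save(...)`. The optimizer, loss, and metrics settings are serialized as well, so a clean report for the model configuration alone would not rule out a proxy elsewhere.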

@comrumino comrumino self-assigned this Feb 16, 2021
@comrumino comrumino added the To Start Description reviewed and a maintainer needs "to start" triage label Feb 16, 2021