custom_op (Registered only GPU kernel) failed to load #48

Open
jb892 opened this issue Apr 25, 2019 · 1 comment

jb892 commented Apr 25, 2019

Hi,

I'm new to TensorFlow serving. I'm trying to serve my trained model via simple_tensorflow_serving. However, when I run the following command, it fails to load the custom ops that are registered only with GPU kernels.

simple_tensorflow_serving --model_base_path="./models/pointnet2_sem_seg/" --custom_op_paths="./custom_ops/" --session_config='{"log_device_placement": true, "allow_soft_placement": true, "allow_growth": true, "per_process_gpu_memory_fraction": 0.5}'

Result:

2019-04-25 10:26:55 INFO     custom_op_paths: ./custom_ops/
2019-04-25 10:26:55 INFO     debug: False
2019-04-25 10:26:55 INFO     enable_cors: True
2019-04-25 10:26:55 INFO     model_config_file: 
2019-04-25 10:26:55 INFO     host: 0.0.0.0
2019-04-25 10:26:55 INFO     secret_key: secret.key
2019-04-25 10:26:55 INFO     model_name: default
2019-04-25 10:26:55 INFO     port: 8500
2019-04-25 10:26:55 INFO     enable_auth: False
2019-04-25 10:26:55 INFO     model_platform: tensorflow
2019-04-25 10:26:55 INFO     reload_models: False
2019-04-25 10:26:55 INFO     enable_colored_log: False
2019-04-25 10:26:55 INFO     log_level: info
2019-04-25 10:26:55 INFO     auth_username: admin
2019-04-25 10:26:55 INFO     auth_password: admin
2019-04-25 10:26:55 INFO     model_base_path: ./models/pointnet2_sem_seg/
2019-04-25 10:26:55 INFO     gen_client: 
2019-04-25 10:26:55 INFO     bind: 0.0.0.0:8500
2019-04-25 10:26:55 INFO     session_config: {"log_device_placement": true, "allow_soft_placement": true, "allow_growth": true, "per_process_gpu_memory_fraction": 0.5}
2019-04-25 10:26:55 INFO     download_inference_images: True
2019-04-25 10:26:55 INFO     secret_pem: secret.pem
2019-04-25 10:26:55 INFO     enable_ssl: False
2019-04-25 10:26:55 INFO     Load the so file from: ./custom_ops/tf_grouping_so.so
2019-04-25 10:26:55 INFO     Load the so file from: ./custom_ops/tf_interpolate_so.so
2019-04-25 10:26:55 INFO     Load the so file from: ./custom_ops/tf_sampling_so.so
2019-04-25 10:26:55.137247: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
2019-04-25 10:26:55.140876: I tensorflow/core/common_runtime/direct_session.cc:307] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

2019-04-25 10:26:55 INFO     Put the model version: 1 online, path: ./models/pointnet2_sem_seg/1
INFO:tensorflow:Restoring parameters from ./models/pointnet2_sem_seg/1/variables/variables
2019-04-25 10:26:55 INFO     Restoring parameters from ./models/pointnet2_sem_seg/1/variables/variables
Traceback (most recent call last):
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _run_fn
    self._extend_graph()
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1352, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'FarthestPointSample' with these attrs.  Registered devices: [CPU,XLA_CPU], Registered kernels:
  device='GPU'

	 [[{{node layer1/FarthestPointSample}} = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1546, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'FarthestPointSample' with these attrs.  Registered devices: [CPU,XLA_CPU], Registered kernels:
  device='GPU'

	 [[node layer1/FarthestPointSample (defined at /home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py:175)  = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]

Caused by op 'layer1/FarthestPointSample', defined at:
  File "/home/jake/anaconda3/envs/PointNetPP/bin/simple_tensorflow_serving", line 7, in <module>
    from simple_tensorflow_serving.server import main
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load
  File "<frozen importlib._bootstrap>", line 957, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 697, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/server.py", line 252, in <module>
    session_config)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 72, in __init__
    self.load_saved_model_version(model_version)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 175, in load_saved_model_version
    session, [tf.saved_model.tag_constants.SERVING], model_file_path)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 197, in load
    return loader.load(sess, tags, import_scope, **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 350, in load
    **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 278, in load_graph
    meta_graph_def, import_scope=import_scope, **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1696, in _import_meta_graph_with_return_elements
    **kwargs))
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
    _ProcessNewOps(graph)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3440, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3440, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3299, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'FarthestPointSample' with these attrs.  Registered devices: [CPU,XLA_CPU], Registered kernels:
  device='GPU'

	 [[node layer1/FarthestPointSample (defined at /home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py:175)  = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jake/anaconda3/envs/PointNetPP/bin/simple_tensorflow_serving", line 7, in <module>
    from simple_tensorflow_serving.server import main
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/server.py", line 252, in <module>
    session_config)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 72, in __init__
    self.load_saved_model_version(model_version)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 175, in load_saved_model_version
    session, [tf.saved_model.tag_constants.SERVING], model_file_path)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 197, in load
    return loader.load(sess, tags, import_scope, **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 351, in load
    self.restore_variables(sess, saver, import_scope)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 303, in restore_variables
    saver.restore(sess, self._variables_path)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1582, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

No OpKernel was registered to support Op 'FarthestPointSample' with these attrs.  Registered devices: [CPU,XLA_CPU], Registered kernels:
  device='GPU'

	 [[node layer1/FarthestPointSample (defined at /home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py:175)  = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]

Caused by op 'layer1/FarthestPointSample', defined at:
  File "/home/jake/anaconda3/envs/PointNetPP/bin/simple_tensorflow_serving", line 7, in <module>
    from simple_tensorflow_serving.server import main
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load
  File "<frozen importlib._bootstrap>", line 957, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 697, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/server.py", line 252, in <module>
    session_config)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 72, in __init__
    self.load_saved_model_version(model_version)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 175, in load_saved_model_version
    session, [tf.saved_model.tag_constants.SERVING], model_file_path)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 197, in load
    return loader.load(sess, tags, import_scope, **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 350, in load
    **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 278, in load_graph
    meta_graph_def, import_scope=import_scope, **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1696, in _import_meta_graph_with_return_elements
    **kwargs))
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
    _ProcessNewOps(graph)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3440, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3440, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3299, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

No OpKernel was registered to support Op 'FarthestPointSample' with these attrs.  Registered devices: [CPU,XLA_CPU], Registered kernels:
  device='GPU'

	 [[node layer1/FarthestPointSample (defined at /home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py:175)  = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]

Has anyone come across this issue? What should I do next?
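
For anyone reading the traceback above, the key line is the mismatch between "Registered devices" (what the runtime can see) and "Registered kernels" (where the op is implemented). Below is a small sketch that pulls those two lists out of the error text. The helper name `parse_no_kernel_error` is made up for illustration and is not part of TensorFlow or simple_tensorflow_serving; it only uses the standard library.

```python
import re


def parse_no_kernel_error(message):
    """Extract the op name, the devices the runtime sees, and the kernel devices.

    Hypothetical helper for illustration; it only parses the text of a
    "No OpKernel was registered" error message.
    """
    op = re.search(r"support Op '([^']+)'", message)
    devices = re.search(r"Registered devices: \[([^\]]*)\]", message)
    # Kernel entries look like: device='GPU'
    kernels = re.findall(r"device='([^']+)'", message)
    visible = [d.strip() for d in devices.group(1).split(",")] if devices else []
    return {
        "op": op.group(1) if op else None,
        "visible_devices": visible,
        "kernel_devices": kernels,
        # Devices that have a kernel but are not visible to the runtime.
        "missing": sorted(set(kernels) - set(visible)),
    }


# Error text copied from the log above.
message = (
    "No OpKernel was registered to support Op 'FarthestPointSample' "
    "with these attrs.  Registered devices: [CPU,XLA_CPU], Registered kernels:\n"
    "  device='GPU'\n"
)
info = parse_no_kernel_error(message)
print(info["op"], info["visible_devices"], info["missing"])
```

Here the op only has a GPU kernel while the runtime lists only CPU and XLA_CPU, so `allow_soft_placement` has no CPU kernel to fall back to.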

jb892 commented Apr 25, 2019

It seems that the GPU is not activated when restoring from the checkpoint, right?
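
One way to check this guess is to ask TensorFlow directly which devices it can see in the same conda environment. The sketch below is only a diagnostic, not a fix. It assumes `device_lib.list_local_devices()` and `tf.test.is_built_with_cuda()` are available in the installed TensorFlow version; the function names `gpu_device_names` and `report` are made up for illustration. Possible causes such as a CPU-only TensorFlow package or a `CUDA_VISIBLE_DEVICES` setting that hides the GPU are assumptions, not something confirmed in this thread.

```python
def gpu_device_names(devices):
    """Return the names of devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def report():
    """Summarize whether this TensorFlow install can see a GPU."""
    try:
        import tensorflow as tf
        from tensorflow.python.client import device_lib
    except ImportError:
        return "TensorFlow is not importable in this environment"
    gpus = gpu_device_names(device_lib.list_local_devices())
    return "built with CUDA: {}, visible GPUs: {}".format(
        tf.test.is_built_with_cuda(), gpus
    )


if __name__ == "__main__":
    print(report())
```

If the list of visible GPUs is empty, the serving process would also start without a GPU device, which matches the "Registered devices: [CPU,XLA_CPU]" line in the log.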
