mnist_tpu.py errors #3724

Closed
hyoo opened this issue Mar 23, 2018 · 2 comments
hyoo commented Mar 23, 2018

Please go to Stack Overflow for help and support:

http://stackoverflow.com/questions/tagged/tensorflow

Also, please understand that many of the models included in this repository are experimental and research-style code. If you open a GitHub issue, here is our policy:

  1. It must be a bug, a feature request, or a significant problem with documentation (for small docs fixes please send a PR instead).
  2. The form below must be filled out.

Here's why we have that policy: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.


System information

  • What is the top-level directory of the model you are using:
    official/mnist

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    No

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Linux tpu-vm 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux
    This is a GCP VM configured to use a TPU.

  • TensorFlow installed from (source or binary):
    Used ml-images when creating the VM

  • TensorFlow version (use command below):
    1.6.0 ('v1.6.0-0-gd2e24b6039')

  • Bazel version (if compiling from source):

  • CUDA/cuDNN version:

  • GPU model and memory:

  • Exact command to reproduce:

You can collect some of this information using our environment capture script:

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

('v1.6.0-0-gd2e24b6039', '1.6.0')

Describe the problem

Running mnist.py works, but mnist_tpu.py --tpu_name '<tpu_name>' gives the error below.

Source code / logs

$ python mnist_tpu.py --tpu_name 'tpu-node-1'
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpPrbzhd
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
log_device_placement: true
, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0cd7de4810>, '_evaluation_master': u'grpc://10.240.1.2:8470', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tpu_config': TPUConfig(iterations_per_loop=50, num_shards=8, per_host_input_for_training=True, tpu_job_name=None, initial_infeed_sleep_secs=None), '_tf_random_seed': None, '_master': u'grpc://10.240.1.2:8470', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': '/tmp/tmpPrbzhd', '_save_summary_steps': 100}
INFO:tensorflow:Calling model_fn.
Downloading https://storage.googleapis.com/cvdf-datasets/mnist/train-images-idx3-ubyte.gz to train-images-idx3-ubyte.gz
Downloading https://storage.googleapis.com/cvdf-datasets/mnist/train-labels-idx1-ubyte.gz to train-labels-idx1-ubyte.gz
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:TPU job name tpu_worker
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Init TPU system
INFO:tensorflow:Start infeed thread controller
INFO:tensorflow:Starting infeed thread controller.
INFO:tensorflow:Start outfeed thread controller
INFO:tensorflow:Starting outfeed thread controller.
INFO:tensorflow:Enqueue next (50) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (50) batch(es) of data from outfeed.
WARNING:tensorflow:
Error occurred during infeed/outfeed. This may be due to a compile error in the main session. Waiting for a short time for the main session to come back.
File system scheme '[local]' not implemented (file: 'train-images-idx3-ubyte')
[[Node: input_pipeline_task0/IteratorGetNext = IteratorGetNextoutput_shapes=[[1024,784], [1024]], output_types=[DT_FLOAT, DT_INT32], _device="/job:tpu_worker/replica:0/task:0/device:CPU:0"]]
Caused by op u'input_pipeline_task0/IteratorGetNext', defined at:
File "mnist_tpu.py", line 176, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
UnimplementedError (see above for traceback): File system scheme '[local]' not implemented (file: 'train-images-idx3-ubyte')
_sys.exit(main(argv))
File "mnist_tpu.py", line 166, in main
estimator.train(input_fn=train_input_fn, max_steps=FLAGS.train_steps)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 352, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 812, in _train_model
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 793, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2065, in _model_fn
input_holders.generate_infeed_enqueue_ops_and_dequeue_fn())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1149, in generate_infeed_enqueue_ops_and_dequeue_fn
self._invoke_input_fn_and_record_structure())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1202, in _invoke_input_fn_and_record_structure
self._batch_axis, host_device))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 918, in generate_per_host_enqueue_ops_fn_for_host
inputs = _Inputs.from_input_fn(input_fn())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2036, in _input_fn
return input_fn(**kwargs)
File "mnist_tpu.py", line 116, in train_input_fn
images, labels = ds.make_one_shot_iterator().get_next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 330, in get_next
name=name)), self._output_types,
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 866, in iterator_get_next
output_shapes=output_shapes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

UnimplementedError (see above for traceback): File system scheme '[local]' not implemented (file: 'train-images-idx3-ubyte')
[[Node: input_pipeline_task0/IteratorGetNext = IteratorGetNextoutput_shapes=[[1024,784], [1024]], output_types=[DT_FLOAT, DT_INT32], _device="/job:tpu_worker/replica:0/task:0/device:CPU:0"]]

ERROR:tensorflow:Feed error: Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 666, in _run_infeed
session.run(self._enqueue_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
UnimplementedError: File system scheme '[local]' not implemented (file: 'train-images-idx3-ubyte')
[[Node: input_pipeline_task0/IteratorGetNext = IteratorGetNextoutput_shapes=[[1024,784], [1024]], output_types=[DT_FLOAT, DT_INT32], _device="/job:tpu_worker/replica:0/task:0/device:CPU:0"]]

Caused by op u'input_pipeline_task0/IteratorGetNext', defined at:
File "mnist_tpu.py", line 176, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "mnist_tpu.py", line 166, in main
estimator.train(input_fn=train_input_fn, max_steps=FLAGS.train_steps)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 352, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 812, in _train_model
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 793, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2065, in _model_fn
input_holders.generate_infeed_enqueue_ops_and_dequeue_fn())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1149, in generate_infeed_enqueue_ops_and_dequeue_fn
self._invoke_input_fn_and_record_structure())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1202, in _invoke_input_fn_and_record_structure
self._batch_axis, host_device))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 918, in generate_per_host_enqueue_ops_fn_for_host
inputs = _Inputs.from_input_fn(input_fn())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2036, in _input_fn
return input_fn(**kwargs)
File "mnist_tpu.py", line 116, in train_input_fn
images, labels = ds.make_one_shot_iterator().get_next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 330, in get_next
name=name)), self._output_types,
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 866, in iterator_get_next
output_shapes=output_shapes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

UnimplementedError (see above for traceback): File system scheme '[local]' not implemented (file: 'train-images-idx3-ubyte')
[[Node: input_pipeline_task0/IteratorGetNext = IteratorGetNextoutput_shapes=[[1024,784], [1024]], output_types=[DT_FLOAT, DT_INT32], _device="/job:tpu_worker/replica:0/task:0/device:CPU:0"]]

hyoo closed this as completed Mar 24, 2018

@deepakmeena635

How did you resolve this issue?

@bmd-drepecka

Is that related to this: https://cloud.google.com/tpu/docs/troubleshooting#cannot_use_local_filesystem? I am guessing you would need to modify the code so that it reads its data from Google Cloud Storage (GCS) buckets instead of the local filesystem.
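
If that guess is right, the failure comes from handing the TPU worker paths that only exist on the VM: the log shows a bare file name (train-images-idx3-ubyte) and a temporary model directory (/tmp/tmpPrbzhd). Below is a minimal pre-flight sketch, assuming the TPU worker can only read gs:// URLs. The helper names (is_gcs_path, check_tpu_paths) and the bucket name my-bucket are hypothetical and not part of mnist_tpu.py, and the snippet is written for Python 3 rather than the Python 2.7 used in the log.

```python
from urllib.parse import urlparse


def is_gcs_path(path):
    """Return True if `path` is a gs://bucket/... URL with a non-empty bucket."""
    parsed = urlparse(path)
    return parsed.scheme == "gs" and bool(parsed.netloc)


def check_tpu_paths(**paths):
    """Raise ValueError naming every path that is not a gs:// URL.

    Assumption: anything without the gs:// scheme lives on the local VM
    and is therefore not visible to the TPU worker.
    """
    bad = {name: p for name, p in paths.items() if not is_gcs_path(p)}
    if bad:
        raise ValueError("Not readable from the TPU worker: %r" % (bad,))


# Hypothetical bucket; passes silently because both paths are gs:// URLs.
check_tpu_paths(data_dir="gs://my-bucket/mnist",
                model_dir="gs://my-bucket/mnist-model")
```

Calling check_tpu_paths with the paths from the log above (the bare file name or /tmp/tmpPrbzhd) would raise ValueError before any TPU session starts, which is easier to read than the UnimplementedError raised from the infeed thread.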
