Failed to get matching files #47

Closed
liuziyi219 opened this issue Jan 20, 2020 · 4 comments

@liuziyi219

When I ran run_cmrc_drcd.py, it failed with "Failed to get matching files" while creating the checkpoint. I guessed this was because there is no xlnet_model.ckpt among the pretrained model files, so I renamed xlnet_modal.ckpt.meta to xlnet_modal.ckpt. It still cannot find xlnet_modal.ckpt.

INFO:tensorflow:Create CheckpointSaverHook.
I0120 13:58:49.598975 140028613015424 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Done calling model_fn.
I0120 13:58:50.135126 140028613015424 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:TPU job name tpu_worker
I0120 13:58:53.244385 140028613015424 tpu_estimator.py:506] TPU job name tpu_worker
INFO:tensorflow:Graph was finalized.
I0120 13:58:55.594104 140028613015424 monitored_session.py:240] Graph was finalized.
ERROR:tensorflow:Error recorded from training_loop: From /job:tpu_worker/replica:0/task:0:
Unsuccessful TensorSliceReader constructor: Failed to get matching files on /content/drive/My Drive/chinese_xlnet_mid_L-24_H-768_A-12/xlnet_model.ckpt: Unimplemented: File system scheme '[local]' not implemented (file: '/content/drive/My Drive/chinese_xlnet_mid_L-24_H-768_A-12/xlnet_model.ckpt')
	 [[node checkpoint_initializer_117 (defined at usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'checkpoint_initializer_117':
  File "content/drive/My Drive/Chinese-PreTrained-XLNet-master/src/run_cmrc_drcd.py", line 1292, in <module>
    tf.app.run()
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "content/drive/My Drive/Chinese-PreTrained-XLNet-master/src/run_cmrc_drcd.py", line 1193, in main
    estimator.train(input_fn=train_input_fn, max_steps=FLAGS.train_steps)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
    saving_listeners=saving_listeners)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
    config)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3184, in _model_fn
    scaffold = _get_scaffold(scaffold_fn)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3749, in _get_scaffold
    scaffold = scaffold_fn()
  File "content/drive/My Drive/Chinese-PreTrained-XLNet-master/src/model_utils.py", line 77, in tpu_scaffold
    tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/checkpoint_utils.py", line 291, in init_from_checkpoint
    init_from_checkpoint_fn)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1940, in merge_call
    return self._merge_call(merge_fn, args, kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1947, in _merge_call
    return merge_fn(self._strategy, *args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/checkpoint_utils.py", line 286, in <lambda>
    ckpt_dir_or_file, assignment_map)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/checkpoint_utils.py", line 334, in _init_from_checkpoint
    _set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/checkpoint_utils.py", line 458, in _set_variable_or_list_initializer
    _set_checkpoint_initializer(variable_or_list, ckpt_file, tensor_name, "")
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/checkpoint_utils.py", line 412, in _set_checkpoint_initializer
    ckpt_file, [tensor_name], [slice_spec], [base_type], name=name)[0]
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
@ymcui
Owner

ymcui commented Jan 21, 2020

It is xlnet_model.ckpt, not xlnet_modal.ckpt.
Could you check your command line to make sure the model file names are correct?
You are also encouraged to post your command line here so that I can look into what is wrong.

@liuziyi219
Author

> It is xlnet_model.ckpt not xlnet_modal.ckpt.
> Would you please check your command lines to see if your model names are correct.
> Also, you are encouraged to post your command lines here so that I can look into what's wrong.

Yes, I copied the file path directly.

Here is my command line:

xlnet="python -u /content/drive/'My Drive'/Chinese-PreTrained-XLNet-master/src/run_cmrc_drcd.py \
	--spiece_model_file=/content/drive/'My Drive'/spiece.model \
	--model_config_path=/content/drive/'My Drive'/xlnet_config.json \
	--init_checkpoint=/content/drive/'My Drive'/chinese_xlnet_mid_L_24_H_768_A_12/xlnet_model.ckpt \
	--use_tpu=True \
	--num_hosts=1 \
	--num_core_per_host=8 \
	--output_dir=/content/drive/'My Drive'/cmrc2018-master/squad-style-data \
	--model_dir=/content/drive/'My Drive'/chinese_xlnet_mid_L-24_H-768_A-12 \
	--predict_dir=/content/drive/'My Drive'/chinese_xlnet_mid_L-24_H-768_A-12/eval \
	--train_file=/content/drive/'My Drive'/cmrc2018-master/squad-style-data/cmrc2018_train.json \
	--uncased=False \
	--max_answer_length=40 \
	--max_seq_length=512 \
	--do_train=True \
	--train_batch_size=16 \
	--do_predict=False \
	--predict_batch_size=16 \
	--learning_rate=3e-5 \
	--adam_epsilon=1e-6 \
	--iterations=1000 \
	--save_steps=2000 \
	--train_steps=2400 \
	--warmup_steps=240"
!{xlnet}

I ran it in Colab; could that be the cause?
The log shows that it does initialize from the checkpoint:

I0120 14:25:44.918656 139675852879744 model_utils.py:71] Initialize from the ckpt /content/drive/My Drive/chinese_xlnet_mid_L_24_H_768_A_12/xlnet_model.ckpt
INFO:tensorflow:**** Global Variables ****
I0120 14:25:44.925628 139675852879744 model_utils.py:85] **** Global Variables ****
INFO:tensorflow:  name = model/transformer/r_w_bias:0, shape = (24, 12, 64), *INIT_FROM_CKPT*
I0120 14:25:44.925852 139675852879744 model_utils.py:91]   name = model/transformer/r_w_bias:0, shape = (24, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/r_r_bias:0, shape = (24, 12, 64), *INIT_FROM_CKPT*
I0120 14:25:44.926049 139675852879744 model_utils.py:91]   name = model/transformer/r_r_bias:0, shape = (24, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/word_embedding/lookup_table:0, shape = (32000, 768), *INIT_FROM_CKPT*
I0120 14:25:44.926191 139675852879744 model_utils.py:91]   name = model/transformer/word_embedding/lookup_table:0, shape = (32000, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/r_s_bias:0, shape = (24, 12, 64), *INIT_FROM_CKPT*
I0120 14:25:44.926325 139675852879744 model_utils.py:91]   name = model/transformer/r_s_bias:0, shape = (24, 12, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/seg_embed:0, shape = (24, 2, 12, 64), *INIT_FROM_CKPT*

But after the graph is finalized, it reports "Failed to get matching files".

@ymcui
Owner

ymcui commented Jan 21, 2020

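OK, I see.
As far as I know, if you are using a TPU for computing, the checkpoint should be loaded from GCS (Google Cloud Storage) instead of the local file system. The TPU worker runs on a separate host and cannot read files on the Colab VM or a mounted Drive, which is what "File system scheme '[local]' not implemented" in your log indicates.
A GCS path looks like 'gs://your-bucket-name/dir/file'.
I suggest referring to the BERT fine-tuning tutorial on Colab.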
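
For illustration only, here is a minimal Python sketch of that advice. It maps a local checkpoint path to a GCS URI and flags any path that is still local when use_tpu is on. The helper names to_gcs_uri and check_tpu_paths, the bucket name, and the choice of which flags to check are assumptions made for this example; they are not part of run_cmrc_drcd.py.

```python
import posixpath
import shlex


def to_gcs_uri(local_path: str, bucket: str, prefix: str = "") -> str:
    """Map a local file path to gs://<bucket>/<prefix>/<parent dir>/<file name>."""
    parent, name = posixpath.split(local_path.rstrip("/"))
    parts = [p for p in (prefix.strip("/"), posixpath.basename(parent), name) if p]
    return "gs://" + bucket + "/" + "/".join(parts)


def check_tpu_paths(use_tpu: bool, **paths: str) -> list:
    """Return the names of the paths that do not start with gs:// when use_tpu is set."""
    if not use_tpu:
        return []
    return [flag for flag, path in paths.items() if not path.startswith("gs://")]


# Hypothetical values: replace the bucket name with your own.
local_ckpt = "/content/drive/My Drive/chinese_xlnet_mid_L-24_H-768_A-12/xlnet_model.ckpt"
gcs_ckpt = to_gcs_uri(local_ckpt, bucket="your-bucket-name")
print(gcs_ckpt)
# gs://your-bucket-name/chinese_xlnet_mid_L-24_H-768_A-12/xlnet_model.ckpt

# A checkpoint is several files sharing one prefix (.index, .meta, .data-*),
# so the copy command uses a trailing wildcard on the prefix.
copy_cmd = ["gsutil", "cp", local_ckpt + "*", posixpath.dirname(gcs_ckpt) + "/"]
print(shlex.join(copy_cmd))

# Which flags must live on GCS is an assumption here; check the ones your run touches.
bad = check_tpu_paths(True, init_checkpoint=local_ckpt, model_dir="gs://your-bucket-name/out")
print(bad)  # ['init_checkpoint']
```

After copying, pass the gs:// value (the checkpoint prefix, without a file extension) as --init_checkpoint. Other directories the TPU job reads or writes will probably need gs:// paths as well.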

@liuziyi219
Author

Thank you, my problem is solved!
