This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

--decode_to_file does not create output file #48

Closed
mehmedes opened this issue Jun 25, 2017 · 7 comments

Comments

@mehmedes

During inference, I'm not able to create a file containing the inference output.
I've tried --decode_to_file, but no output file is created.

@lukaszkaiser
Contributor

It should be sufficient to just use --decode_from_file=path; it creates a file named path.decodes..., where "..." includes the model name and so on. Did you try that?

@mehmedes
Author

mehmedes commented Jun 25, 2017

Actually, I did.
This is my decoding command:

PROBLEM=wmt_ende_bpe32k
MODEL=transformer
HPARAMS=transformer_base

DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS


BEAM_SIZE=4
ALPHA=0.6

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=0 \
  --eval_steps=0 \
  --decode_beam_size=$BEAM_SIZE \
  --decode_alpha=$ALPHA \
  --decode_from_file /.../newsdev2016.tok.bpe.32000.en

But I couldn't find any output file anywhere.

@mehmedes
Author

Now, I see. It's in the tmp folder. Thanks!

@mehmedes
Author

While translating a file, I got the following error. What could this be related to?

File "/usr/local/bin/t2t-trainer", line 83, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/usr/local/bin/t2t-trainer", line 79, in main
    schedule=FLAGS.schedule)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
    run_locally(exp_fn(output_dir))
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 543, in run_locally
    decode_from_file(estimator, FLAGS.decode_from_file)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 645, in decode_from_file
    result_iter = estimator.predict(input_fn=input_fn.next, as_iterable=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 590, in predict
    as_iterable=as_iterable)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 883, in _infer_model
    features = self._get_features_from_input_fn(input_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 863, in _get_features_from_input_fn
    result = input_fn()
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 725, in _decode_batch_input_fn
    input_ids = vocabulary.encode(inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 132, in encode
    ret = [self._token_to_id[tok] for tok in sentence.strip().split()]
KeyError: '\xc3\x88'
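The last frame of the traceback shows why: text_encoder.py looks each token up directly in the vocabulary table, so any out-of-vocabulary token raises a KeyError. A minimal sketch with a hypothetical toy vocabulary (the real table is self._token_to_id in TokenTextEncoder):

```python
# Stand-in for self._token_to_id; the real one is loaded from the vocab file.
token_to_id = {"hello": 0, "world": 1}

sentence = "hello È"  # "È" is not in the vocabulary
try:
    ids = [token_to_id[tok] for tok in sentence.strip().split()]
except KeyError as e:
    print("KeyError:", e)
```

Any character or subword missing from the vocab file triggers the same failure.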

@mehmedes
Author

Oh, that's just a character issue while translating a word containing the character "È"!

@cshanbo
Contributor

cshanbo commented Jun 26, 2017

Hi all,
I met the same issues.

  1. Indeed, the output ends up in a file in the TMP directory, but --decode_to_file doesn't work.
  2. The KeyError: '\xc3\x88' exception is raised there.

I think this is caused by OOV tokens, as @mehmedes said. I'm not sure whether the vocab.bpe.32000 in this dataset covers the pre-processed training data for the English-to-German translation task.

I used my own data, whose training set contains some words outside the vocabulary. I instead rewrote the code to something like:

def encode(self, sentence):
    """Converts a space-separated string of tokens to a list of ids."""
    ret = [self._token_to_id.get(tok, self._token_to_id['UNK'])
           for tok in sentence.strip().split()]
    return ret[::-1] if self._reverse else ret

where I can ensure the UNK symbol is in my vocabulary.
I'm testing this setting to see if it works.
Any advice would be helpful.

Thank you
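For illustration, the fallback above can be checked standalone with a hypothetical toy vocabulary that contains an explicit 'UNK' entry (names here are stand-ins, not tensor2tensor's):

```python
# Toy vocabulary with an explicit UNK entry.
token_to_id = {"hello": 0, "world": 1, "UNK": 2}

def encode(sentence, reverse=False):
    """Map tokens to ids, falling back to the UNK id for OOV tokens."""
    ids = [token_to_id.get(tok, token_to_id["UNK"])
           for tok in sentence.strip().split()]
    return ids[::-1] if reverse else ids

print(encode("hello È world"))  # → [0, 2, 1]; the OOV "È" maps to UNK
```

Note this only avoids the crash; translation quality for OOV-heavy input still depends on the vocabulary covering the data.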

@lukaszkaiser
Contributor

Does this only happen with BPE, or with the standard "tokens_32k" vocabulary too? We don't have a built-in tokenizer for BPE; it was used only for papers, to make perplexities comparable with other papers. It cannot be detokenized, so I believe it's better to use our own tokenizer. Or is the problem the same either way?
