
How to run the reverse_decimal40 task? #3

Closed
0b01 opened this issue Jun 18, 2017 · 5 comments

Comments

0b01 commented Jun 18, 2017

Here is my run script:

[g@pc:/home/g/Desktop/tensor2tensor/reverse]$ cat run.sh 
PROBLEM=algorithmic_reverse_decimal40
MODEL=baseline_lstm_seq2seq
HPARAMS=basic1
DATA_DIR=./t2t_data
TMP_DIR=./t2t_datagen
TRAIN_DIR=./t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

# mv $TMP_DIR/tokens.vocab.32768 $DATA_DIR

# Train
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR

# Decode

DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE

BEAM_SIZE=4
ALPHA=0.6

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=0 \
  --eval_steps=0 \
  --beam_size=$BEAM_SIZE \
  --alpha=$ALPHA \
  --decode_from_file=$DECODE_FILE

cat $DECODE_FILE.$MODEL.$HPARAMS.beam$BEAM_SIZE.alpha$ALPHA.decodes

Output:

[g@pc:/home/g/Desktop/tensor2tensor/reverse]$ bash run.sh 
INFO:tensorflow:Generating training data for algorithmic_reverse_decimal40.
INFO:tensorflow:Generating case 0 for algorithmic_reverse_decimal40-unshuffled-train.
INFO:tensorflow:Generating development data for algorithmic_reverse_decimal40.
INFO:tensorflow:Generating case 0 for algorithmic_reverse_decimal40-unshuffled-dev.
INFO:tensorflow:Shuffling data...
INFO:tensorflow:read: 10000
INFO:tensorflow:read: 20000
INFO:tensorflow:read: 30000
INFO:tensorflow:read: 40000
INFO:tensorflow:read: 50000
INFO:tensorflow:read: 60000
INFO:tensorflow:read: 70000
INFO:tensorflow:read: 80000
INFO:tensorflow:read: 90000
INFO:tensorflow:read: 100000
INFO:tensorflow:write: 0
INFO:tensorflow:write: 10000
INFO:tensorflow:write: 20000
INFO:tensorflow:write: 30000
INFO:tensorflow:write: 40000
INFO:tensorflow:write: 50000
INFO:tensorflow:write: 60000
INFO:tensorflow:write: 70000
INFO:tensorflow:write: 80000
INFO:tensorflow:write: 90000
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:Registry contents:

  Models: ['multi_model', 'baseline_lstm_seq2seq', 'slice_net', 'diagonal_neural_gpu', 'byte_net', 'transformer', 'attention_lm', 'neural_gpu', 'xception']

  HParams: ['transformer_h32', 'transformer_big_dr2', 'transformer_big_dr3', 'transformer_big_dr1', 'slicenet1', 'transformer_tiny', 'xception_base', 'transformer_dr2', 'transformer_parsing_base_dr6', 'basic1', 'transformer_k256', 'transformer_h16', 'transformer_ff1024', 'transformer_k128', 'slicenet1tiny', 'transformer_big_enfr', 'multimodel1p8', 'transformer_dr0', 'transformer_base', 'transformer_l8', 'transformer_parsing_big', 'transformer_hs1024', 'slicenet1noam', 'transformer_big_single_gpu', 'attention_lm_base', 'transformer_ff4096', 'transformer_single_gpu', 'transformer_ls2', 'transformer_ls0', 'transformer_hs256', 'neural_gpu1', 'transformer_h1', 'transformer_h4', 'transformer_l4', 'transformer_l2', 'bytenet_base']

  RangedHParams: ['transformer_big_single_gpu', 'basic1', 'slicenet1']
  
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 20, '_task_type': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f670e058c10>, '_model_dir': './t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic1', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': allow_soft_placement: true
graph_options {
  optimizer_options {
  }
}
, '_tf_random_seed': None, '_environment': 'local', '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_evaluation_master': '', '_master': ''}
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Performing local training.
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Doing model_fn_body took 0.418 sec.
INFO:tensorflow:This model_fn took 0.649 sec.
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_0                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_10                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_11                                            shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_12                                            shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_13                                            shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_14                                            shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_15                                            shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_1                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_2                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_3                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_4                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_5                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_6                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_7                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_8                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_9                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_0                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_10                                              shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_11                                              shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_12                                              shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_13                                              shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_14                                              shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_15                                              shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_1                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_2                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_3                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_4                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_5                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_6                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_7                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_8                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_9                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_0                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_10                                           shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_11                                           shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_12                                           shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_13                                           shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_14                                           shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_15                                           shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_1                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_2                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_3                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_4                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_5                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_6                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_7                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_8                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_9                                            shape    (1, 64)                size    64
INFO:tensorflow:Total trainable variables size: 266304
INFO:tensorflow:Total embedding variables size: 0
INFO:tensorflow:Total non-embedding variables size: 266304
INFO:tensorflow:Computing gradients for global model_fn.
INFO:tensorflow:Global model_fn finished.
INFO:tensorflow:Create CheckpointSaverHook.
2017-06-18 14:00:13.936233: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:13.936254: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:13.936261: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:13.936270: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:13.936276: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:14.068278: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-18 14:00:14.068727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:01:00.0
Total memory: 10.91GiB
Free memory: 10.57GiB
2017-06-18 14:00:14.068740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-06-18 14:00:14.068744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-06-18 14:00:14.068749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
2017-06-18 14:00:17.068205: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 4488 get requests, put_count=3034 evicted_count=1000 eviction_rate=0.329598 and unsatisfied allocation rate=0.569073
2017-06-18 14:00:17.068238: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Saving checkpoints for 1 into ./t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic1/model.ckpt.
INFO:tensorflow:loss = inf, step = 1
ERROR:tensorflow:Model diverged with loss = NaN.
Traceback (most recent call last):
  File "/usr/local/bin/t2t-trainer", line 4, in <module>
    __import__('pkg_resources').run_script('tensor2tensor==1.0.2', 't2t-trainer')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1511, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor-1.0.2-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 55, in <module>
    
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor-1.0.2-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 51, in main
    
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 234, in run
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 562, in run_locally
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
    loss = self._train_model(input_fn=input_fn, hooks=hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
    _, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 505, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 842, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 798, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 960, in run
    run_metadata=run_metadata))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/basic_session_run_hooks.py", line 477, in after_run
    raise NanLossDuringTrainingError
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.
INFO:tensorflow:Registry contents:

  Models: ['multi_model', 'baseline_lstm_seq2seq', 'slice_net', 'diagonal_neural_gpu', 'byte_net', 'transformer', 'attention_lm', 'neural_gpu', 'xception']

  HParams: ['transformer_h32', 'transformer_big_dr2', 'transformer_big_dr3', 'transformer_big_dr1', 'slicenet1', 'transformer_tiny', 'xception_base', 'transformer_dr2', 'transformer_parsing_base_dr6', 'basic1', 'transformer_k256', 'transformer_h16', 'transformer_ff1024', 'transformer_k128', 'slicenet1tiny', 'transformer_big_enfr', 'multimodel1p8', 'transformer_dr0', 'transformer_base', 'transformer_l8', 'transformer_parsing_big', 'transformer_hs1024', 'slicenet1noam', 'transformer_big_single_gpu', 'attention_lm_base', 'transformer_ff4096', 'transformer_single_gpu', 'transformer_ls2', 'transformer_ls0', 'transformer_hs256', 'neural_gpu1', 'transformer_h1', 'transformer_h4', 'transformer_l4', 'transformer_l2', 'bytenet_base']

  RangedHParams: ['transformer_big_single_gpu', 'basic1', 'slicenet1']
  
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 20, '_task_type': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f35ad359c10>, '_model_dir': './t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic1', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': allow_soft_placement: true
graph_options {
  optimizer_options {
  }
}
, '_tf_random_seed': None, '_environment': 'local', '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_evaluation_master': '', '_master': ''}
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Performing Decoding from a file.
INFO:tensorflow:Getting sorted inputs
INFO:tensorflow: batch 1
INFO:tensorflow:Deocding batch 0
Traceback (most recent call last):
  File "/usr/local/bin/t2t-trainer", line 4, in <module>
    __import__('pkg_resources').run_script('tensor2tensor==1.0.2', 't2t-trainer')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1511, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor-1.0.2-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 55, in <module>
    
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor-1.0.2-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 51, in main
    
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 234, in run
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 623, in run_locally
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 590, in predict
    as_iterable=as_iterable)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 883, in _infer_model
    features = self._get_features_from_input_fn(input_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 863, in _get_features_from_input_fn
    result = input_fn()
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 743, in _decode_batch_input_fn
  File "build/bdist.linux-x86_64/egg/tensor2tensor/data_generators/text_encoder.py", line 60, in encode
ValueError: invalid literal for int() with base 10: 'Goodbye'
cat: ./t2t_data/decode_this.txt.baseline_lstm_seq2seq.basic1.beam4.alpha0.6.decodes: No such file or directory
@rsepassi
Contributor

Looks like the issue is that training is unstable and the loss hits NaN. It probably needs some different hyperparameter settings. I'll investigate and get back to you, but in the meantime, feel free to fiddle with the learning rate and other learning settings.

@rsepassi
Contributor

You can override individual hparam settings via a flag: --hparams='learning_rate=0.1,another_hparam=blah'
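
For example, to retry training with a lower learning rate in the run script above (0.05 is just an illustrative value, not a tuned setting):

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --hparams='learning_rate=0.05' \
  --output_dir=$TRAIN_DIR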

0b01 (Author) commented Jun 18, 2017

Could you give an example of the reverse task using transformer?

Here is my run.sh. The loss goes down to 0.00001, but the decode output is [].

[g@pc:/home/g/Desktop/tensor2tensor/reverse]$ cat run.sh 
PROBLEM=algorithmic_reverse_decimal40
MODEL=transformer
HPARAMS=transformer_tiny
DATA_DIR=./t2t_data
TMP_DIR=./t2t_datagen
TRAIN_DIR=./t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

mv $TMP_DIR/tokens.vocab.32768 $DATA_DIR

# Train
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR

# Decode

DECODE_FILE=$DATA_DIR/decode_this.txt
echo "8 7 2 6 8 5 2 10 5 1 9 1 8 2 6 10 1 9 10 1 8 7 10 3 9 9 2" > $DECODE_FILE

BEAM_SIZE=4
ALPHA=0.6

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=0 \
  --eval_steps=10 \
  --beam_size=$BEAM_SIZE \
  --alpha=$ALPHA \
  --decode_from_file=$DECODE_FILE

cat $DECODE_FILE.$MODEL.$HPARAMS.beam$BEAM_SIZE.alpha$ALPHA.decodes

@lukaszkaiser
Contributor

I tried it and I believe it's a decoding problem: we use 1 to mean "end of sequence" in decoding, but the algorithmic generator only avoids 0 (padding), so the symbol 1 in the generated data collides with the EOS id. Will try to prepare a fix soon, thanks for reporting the problem!
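
In the meantime, an untested stopgap (assuming the collision is with the literal symbol 1) is to keep 1 out of the decode inputs, so that an emitted 1 can only mean end-of-sequence:

# Stopgap sketch: use a decode input without the symbol 1, since the
# decoder currently interprets id 1 as end-of-sequence.
echo "8 7 2 6 8 5 2 10 5 9 8 2 6 10 9 10 8 7 10 3 9 9 2" > $DECODE_FILE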

@lukaszkaiser
Contributor

@RickyHan: the most recent 1.0.4 version should include all the corrections needed to make the above instructions work well. I tried it and found that the transformer still has some trouble determining the end of the inputs, since it isn't marked in the algorithmic tasks, so it sometimes reverses a bit too much; otherwise it seems to work. I'm closing this, but could you please test and let me know whether it works for you? If it doesn't, please re-open. Thanks!
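
If you installed from PyPI, upgrading should be enough to pick up the fix (assuming a pip-based install; adjust for your setup):

pip install --upgrade 'tensor2tensor>=1.0.4'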
