This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

dynamic_decode raises Segmentation fault & inference.load_data filters some lines #22

Open
hxsnow10 opened this issue Jul 20, 2017 · 12 comments

Comments

@hxsnow10

hxsnow10 commented Jul 20, 2017

Hello,

I use en-zh data as in tmp.zip and put these files in /tmp/nmt_data:

python -m nmt.nmt --src=zh --tgt=en --vocab_prefix=/tmp/nmt_data/vocab --train_prefix=/tmp/nmt_data/dev2 --dev_prefix=/tmp/nmt_data/dev2 --test_prefix=/tmp/nmt_data/dev2 --out_dir=/tmp/nmt_model_zh2en --num_train_steps=12000 --steps_per_stats=100 --num_layers=2 --num_units=128 --dropout=0.2 --metrics=ble

Then I get a Segmentation fault. It should be reproducible.

After some debugging, I located the problem in dynamic_decode, at line 326 of model.py, but I can't figure out how to fix it. Can you give some suggestions? Thanks a lot.

Another small problem is that inference.load_data filters out some lines, which makes the loaded zh and en data have different lengths.

Note: tmp.zip is available at https://github.com/hxsnow10/nmt_problem

@mingfengwuye

I downloaded the tmp.zip file, but there is nothing in it?

@hxsnow10
Author

@mingfengwuye tmp.zip has been updated.

@mingfengwuye

@hxsnow10 I have downloaded the tmp.zip file and started training the model; so far it works well. The final result will come next week. One more question: how did you get the vocab.zh and vocab.en files? Previously I built the vocab following wmt16_en_de.sh, but yours seems different from that.

@hxsnow10
Author

@mingfengwuye That's strange. Please try my updated, larger data; I made sure I hit the Segmentation fault this time without any change to the code. I counted my vocab from an en-zh corpus myself.

By the way, my system is CentOS 7 with TensorFlow 1.2.1; I run on CPU and memory is sufficient.

python -m nmt.nmt --src=zh --tgt=en --vocab_prefix=/tmp/nmt_data/vocab --train_prefix=/tmp/nmt_data/dev3 --dev_prefix=/tmp/nmt_data/dev3 --test_prefix=/tmp/nmt_data/dev3 --out_dir=/tmp/nmt_model_zh2en --num_train_steps=12000 --steps_per_stats=100 --num_layers=2 --num_units=128 --dropout=0.2 --metrics=ble

 # Job id 0
# hparams:
  src=zh
  tgt=en
  train_prefix=/tmp/nmt_data/dev3
  dev_prefix=/tmp/nmt_data/dev3
  test_prefix=/tmp/nmt_data/dev3
  out_dir=/tmp/nmt_model_zh2en
# Vocab file /tmp/nmt_data/vocab.zh exists
# Vocab file /tmp/nmt_data/vocab.en exists
  saving hparams to /tmp/nmt_model_zh2en/hparams
  saving hparams to /tmp/nmt_model_zh2en/best_ble/hparams
  attention=
  attention_architecture=standard
  batch_size=128
  beam_width=0
  best_ble=0
  best_ble_dir=/tmp/nmt_model_zh2en/best_ble
  bpe_delimiter=None
  colocate_gradients_with_ops=True
  decay_factor=0.98
  decay_steps=10000
  dev_prefix=/tmp/nmt_data/dev3
  dropout=0.2
  encoder_type=uni
  eos=</s>
  epoch_step=0
  forget_bias=1.0
  infer_batch_size=32
  init_weight=0.1
  learning_rate=1.0
  length_penalty_weight=0.0
  log_device_placement=False
  max_gradient_norm=5.0
  max_train=0
  metrics=['ble']
  num_buckets=5
  num_gpus=1
  num_layers=2
  num_residual_layers=0
  num_train_steps=12000
  num_units=128
  optimizer=sgd
  out_dir=/tmp/nmt_model_zh2en
  pass_hidden_state=True
  random_seed=None
  residual=False
  share_vocab=False
  sos=<s>
  source_reverse=False
  src=zh
  src_max_len=50
  src_max_len_infer=None
  src_vocab_file=/tmp/nmt_data/vocab.zh
  src_vocab_size=459879
  start_decay_step=0
  steps_per_external_eval=None
  steps_per_stats=100
  test_prefix=/tmp/nmt_data/dev3
  tgt=en
  tgt_max_len=50
  tgt_max_len_infer=None
  tgt_vocab_file=/tmp/nmt_data/vocab.en
  tgt_vocab_size=570651
  time_major=True
  train_prefix=/tmp/nmt_data/dev3
  unit_type=lstm
  vocab_prefix=/tmp/nmt_data/vocab
# creating train graph ...
  num_layers = 2, num_residual_layers=0
  cell 0  LSTM, forget_bias=1  DropoutWrapper, dropout=0.2   DeviceWrapper, device=/gpu:0
  cell 1  LSTM, forget_bias=1  DropoutWrapper, dropout=0.2   DeviceWrapper, device=/gpu:0
  cell 0  LSTM, forget_bias=1  DropoutWrapper, dropout=0.2   DeviceWrapper, device=/gpu:0
  cell 1  LSTM, forget_bias=1  DropoutWrapper, dropout=0.2   DeviceWrapper, device=/gpu:0
  start_decay_step=0, learning_rate=1, decay_steps 10000,decay_factor 0.98
# Trainable variables
  embeddings/encoder/embedding_encoder:0, (459879, 128), 
  embeddings/decoder/embedding_decoder:0, (570651, 128), 
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (512,), /device:GPU:0
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (512,), /device:GPU:0
  dynamic_seq2seq/decoder/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
  dynamic_seq2seq/decoder/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (512,), /device:GPU:0
  dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
  dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (512,), /device:GPU:0
  dynamic_seq2seq/decoder/output_projection/kernel:0, (128, 570651), /device:GPU:0
# creating eval graph ...
  num_layers = 2, num_residual_layers=0
  cell 0  LSTM, forget_bias=1  DeviceWrapper, device=/gpu:0
  cell 1  LSTM, forget_bias=1  DeviceWrapper, device=/gpu:0
  cell 0  LSTM, forget_bias=1  DeviceWrapper, device=/gpu:0
  cell 1  LSTM, forget_bias=1  DeviceWrapper, device=/gpu:0
  start_decay_step=0, learning_rate=1, decay_steps 10000,decay_factor 0.98
# Trainable variables
  embeddings/encoder/embedding_encoder:0, (459879, 128), 
  embeddings/decoder/embedding_decoder:0, (570651, 128), 
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (512,), /device:GPU:0
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (512,), /device:GPU:0
  dynamic_seq2seq/decoder/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
  dynamic_seq2seq/decoder/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (512,), /device:GPU:0
  dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
  dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (512,), /device:GPU:0
  dynamic_seq2seq/decoder/output_projection/kernel:0, (128, 570651), /device:GPU:0
# creating infer graph ...
  num_layers = 2, num_residual_layers=0
  cell 0  LSTM, forget_bias=1  DeviceWrapper, device=/gpu:0
  cell 1  LSTM, forget_bias=1  DeviceWrapper, device=/gpu:0
  cell 0  LSTM, forget_bias=1  DeviceWrapper, device=/gpu:0
  cell 1  LSTM, forget_bias=1  DeviceWrapper, device=/gpu:0
  start_decay_step=0, learning_rate=1, decay_steps 10000,decay_factor 0.98
# Trainable variables
  embeddings/encoder/embedding_encoder:0, (459879, 128), 
  embeddings/decoder/embedding_decoder:0, (570651, 128), 
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (512,), /device:GPU:0
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (512,), /device:GPU:0
  dynamic_seq2seq/decoder/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
  dynamic_seq2seq/decoder/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (512,), /device:GPU:0
  dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
  dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (512,), /device:GPU:0
  dynamic_seq2seq/decoder/output_projection/kernel:0, (128, 570651), 
# log_file=/tmp/nmt_model_zh2en/log_1500642274
2017-07-21 21:04:34.111711: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-21 21:04:34.111760: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-21 21:04:34.111771: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-21 21:04:34.111780: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-21 21:04:34.111790: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
  created train model with fresh parameters, time 0.00s.
2017-07-21 21:04:34.636797: I tensorflow/core/common_runtime/simple_placer.cc:675] Ignoring device specification /job:localhost/replica:0/task:0/device:GPU:0 for node 'gradients/dynamic_seq2seq/decoder/decoder/while/TensorArrayWrite/TensorArrayWriteV3_grad/TensorArrayGrad/TensorArrayGradV3/Enter' because the input edge from 'dynamic_seq2seq/decoder/decoder/TensorArray' is a reference connection and already has a device field set to /job:localhost/replica:0/task:0/device:CPU:0
  created infer model with fresh parameters, time 0.01s.
  # 14506
    src: 尽管 明确 的 分辨 因果 关系 有一 些 困难 , 还是 有一 些 证据 表明 , 建立 了 财政 规则 体系 的 国家 具有 更为 合理 的 财政 状况 。
    ref: There is some evidence that countries with fiscal rules have sounder public finances , though it is tricky to separate cause from effect .
    nmt: ZigBee acti acti bonderizing party.169 superstitious superstitious superstitious 05/23/08 05/23/08 05/23/08 kvetching kvetching herbut Hamidzada Hamidzada Hamidzada Hamidzada Hamidzada Hamidzada Nudibranchs Nudibranchs present.family IMBALANCE September-1 September-1 September-1 interleaving interleaving interleaving interleaving cIients cIients Bootloader Piaohong Piaohong Piaohong discovery.But prayerWe prayerWe carpas carpas carpas 57、Hope 57、Hope 57、Hope evidenced-based highs.The yesterdayhave DVE DVE deathA.She satellite-to-ground satellite-to-ground satellite-to-ground infantsweresix died.Because died.Because died.Because insK'fiSnt
  created eval model with fresh parameters, time 0.00s.
Segmentation fault

@mingfengwuye

@hxsnow10 I trained the model over the weekend, and training completed without any error. I did not change the source code. OS is Ubuntu, CUDA 8.0 + nvidia-375, TensorFlow version 1.2.1. My script is:
python -m nmt.nmt
--src=zh --tgt=en --vocab_prefix=./temp/nmt_problem/tmp/vocab --train_prefix=./temp/nmt_problem/tmp/dev2 --dev_prefix=./temp/nmt_problem/tmp/dev2 --test_prefix=./temp/nmt_problem/tmp/dev2 --out_dir=./temp/nmt_problem/nmt_model_tmp --num_train_steps=12000 --steps_per_stats=100 --num_layers=2 --num_units=128 --dropout=0.2 --metrics=bleu

@hxsnow10
Author

@mingfengwuye I'm sorry, could you try dev3 on CPU once? Thanks.

python -m nmt.nmt --src=zh --tgt=en --vocab_prefix=./temp/nmt_problem/tmp/vocab --train_prefix=./temp/nmt_problem/tmp/dev3 --dev_prefix=./temp/nmt_problem/tmp/dev3 --test_prefix=./temp/nmt_problem/tmp/dev3 --out_dir=./temp/nmt_problem/nmt_model_tmp --num_train_steps=12000 --steps_per_stats=100 --num_layers=2 --num_units=128 --dropout=0.2 --metrics=bleu --num_gpus 0

@mingfengwuye

@hxsnow10 I will try it later.

@hxsnow10
Author

hxsnow10 commented Jul 24, 2017

@mingfengwuye I find that in my environment, on CPU, the problem lies in crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(...) in model.py; when max_time is large (e.g. 39), it raises a Segmentation fault.
When I use tf.one_hot and tf.nn.softmax_cross_entropy_with_logits instead, the error disappears.
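
A minimal sketch of that replacement (the logits / target_output names and their [max_time, batch_size(, vocab_size)] shapes are my reading of model.py, so treat them as assumptions rather than the exact repository code):

import tensorflow as tf

def dense_crossent(logits, target_output, vocab_size):
    # logits: [max_time, batch_size, vocab_size]; target_output: [max_time, batch_size]
    # Original call that segfaults on my CPU setup when max_time is large:
    #   crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    #       labels=target_output, logits=logits)
    # Workaround: expand the integer targets to one-hot and use the dense op.
    onehot_labels = tf.one_hot(target_output, depth=vocab_size, dtype=logits.dtype)
    crossent = tf.nn.softmax_cross_entropy_with_logits(
        labels=onehot_labels, logits=logits)
    return crossent  # shape [max_time, batch_size]

Note the one-hot tensor has shape [max_time, batch_size, vocab_size], so with a vocab of ~570k words this costs a lot more memory.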

@mingfengwuye

@hxsnow10 How did you build the vocab? Could you give me some guidance? Thank you very much.

@hxsnow10
Author

hxsnow10 commented Jul 26, 2017

@mingfengwuye I don't quite follow you. My vocab was built by counting and sorting words from several English-Chinese corpora from http://www.statmt.org/wmt17/translation-task.html#download after tokenization (using nltk and a Chinese tokenizer); see the sketch below.
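
A rough sketch (the corpus path is a placeholder, and the vocab file format, one token per line with <unk>, <s>, </s> at the top, is my assumption):

from collections import Counter

counts = Counter()
# Placeholder path: an already-tokenized corpus, one sentence per line.
with open("corpus.tok.zh", encoding="utf-8") as f:
    for line in f:
        counts.update(line.split())

specials = ("<unk>", "<s>", "</s>")
with open("vocab.zh", "w", encoding="utf-8") as out:
    for tok in specials:
        out.write(tok + "\n")
    # Write remaining words in descending frequency order.
    for word, _ in counts.most_common():
        if word not in specials:
            out.write(word + "\n")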

The Segmentation fault occurs because the sparse softmax does not support such a big tensor, so in the end I tried batch_size=32 with the sparse softmax and it works, for example:
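
(Assuming the batch_size hparam can be set with a --batch_size flag; that flag name is my assumption, adjust it if it differs.)

python -m nmt.nmt --src=zh --tgt=en --vocab_prefix=/tmp/nmt_data/vocab --train_prefix=/tmp/nmt_data/dev3 --dev_prefix=/tmp/nmt_data/dev3 --test_prefix=/tmp/nmt_data/dev3 --out_dir=/tmp/nmt_model_zh2en --num_train_steps=12000 --steps_per_stats=100 --num_layers=2 --num_units=128 --dropout=0.2 --metrics=bleu --batch_size=32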

Another question: what does iterator.source look like when source_reverse=False? For a sentence
word0, ..., wordk
does it look like [id0, id1, ..., idk, id_</s>, id_</s>, ..., id_</s>], or something else? Thanks!

@zhangpengGenedock

@hxsnow10 Can you give a complete and clear solution? I am facing the same problem.

@hxsnow10
Author

@zhangpengGenedock Use a small batch_size.
