This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Problem in Running Example Code for translate_enzh_wmt8k #347

@heronyang

I tried to run the example commands from the README file on translate_enzh_wmt8k, but I was not able to get Chinese words after decoding. Instead of Chinese characters, the result contains individual bytes separated by spaces.

Commands I used for data generation, training, and decoding:

sudo apt-get update
sudo apt-get install -y python-pip
pip install --upgrade pip
pip install --user tensorflow tensor2tensor

PROBLEM=translate_enzh_wmt8k
MODEL=transformer
HPARAMS=transformer_base_single_gpu

DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR

DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE

BEAM_SIZE=4
ALPHA=0.6

t2t-decoder \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
  --decode_from_file=$DECODE_FILE

The decoded result:

$ cat ~/t2t_data/decode_this.txt.transformer.transformer_base_single_gpu.translate_enzh_wmt8k.beam4.alpha0.6.decodes
� � � � � � � � � � � � � � � � � � � � � � � � \ 2 0 1 9 9
� � � � � � � � � � � � � � � � � � � � � � � � \ 2 0 1 9 9
$ od ~/t2t_data/decode_this.txt.transformer.transformer_base_single_gpu.translate_enzh_wmt8k.beam4.alpha0.6.decodes
0000000 020344 020270 020252 020344 020270 020252 020344 020273
0000020 020247 020344 020273 020247 020344 020273 020247 020344
0000040 020275 020264 020344 020275 020264 020344 020275 020264
0000060 020134 020062 020060 020061 020071 005071 020344 020270
0000100 020252 020344 020270 020252 020344 020273 020247 020344
0000120 020270 020252 020344 020273 020247 020344 020275 020264
0000140 020344 020275 020264 020344 020275 020264 020134 020062
0000160 020060 020061 020071 005071
0000170
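
To make the dump easier to read, here is a minimal Python sketch that rebuilds the bytes of the first output line from the `od` words above, drops the space that follows each byte, and decodes what is left as UTF-8. It assumes `od` ran on a little-endian machine, so the low byte of each 16-bit word comes first, and the helper names `od_words_to_bytes` and `join_spaced_bytes` are made up for this illustration; they are not part of tensor2tensor.

```python
import struct

# Octal 16-bit words copied from the od dump above (offsets 0000000 to 0000060),
# i.e. the first line of the .decodes file.
OD_WORDS = """
020344 020270 020252 020344 020270 020252 020344 020273
020247 020344 020273 020247 020344 020273 020247 020344
020275 020264 020344 020275 020264 020344 020275 020264
020134 020062 020060 020061 020071 005071
"""


def od_words_to_bytes(dump: str) -> bytes:
    """Rebuild raw bytes from od's default octal words (assumes a little-endian host)."""
    return b"".join(struct.pack("<H", int(word, 8)) for word in dump.split())


def join_spaced_bytes(raw: bytes) -> str:
    """Remove the single space after each byte on every line, then decode as UTF-8."""
    lines = []
    for line in raw.splitlines():
        # Every second byte should be a separator space; fail loudly otherwise.
        if line[1::2].strip(b" "):
            raise ValueError("line is not in 'byte space byte space ...' form")
        lines.append(line[0::2].decode("utf-8"))
    return "\n".join(lines)


decoded = join_spaced_bytes(od_words_to_bytes(OD_WORDS))
# Print code points instead of glyphs so the output is terminal-independent.
print(" ".join(f"U+{ord(ch):04X}" for ch in decoded))
```

With the spaces removed, the 24 bytes decode cleanly into eight CJK characters (U+4E2A, U+4EE7 and U+4F74, each repeated) followed by the literal text `\20199`. That suggests the bytes themselves are valid UTF-8 and only the space inserted between them breaks the characters. Note that this sketch would also remove genuine spaces in the output, so it is a diagnostic aid rather than a fix.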
