This repository was archived by the owner on Jul 7, 2023. It is now read-only.
Problem in Running Example Code for translate_enzh_wmt8k #347
Closed
Description
I tried to run the example commands from the README on translate_enzh_wmt8k, but I could not get Chinese text after decoding. Instead of Chinese characters, the result contains unreadable byte characters separated by spaces.
Commands I used for data generation, training, and decoding:
sudo apt-get update
sudo apt-get install -y python-pip
pip install --upgrade pip
pip install --user tensorflow tensor2tensor
PROBLEM=translate_enzh_wmt8k
MODEL=transformer
HPARAMS=transformer_base_single_gpu
DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS
mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR
t2t-datagen \
--data_dir=$DATA_DIR \
--tmp_dir=$TMP_DIR \
--problem=$PROBLEM
t2t-trainer \
--data_dir=$DATA_DIR \
--problems=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--output_dir=$TRAIN_DIR
DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE
BEAM_SIZE=4
ALPHA=0.6
t2t-decoder \
--data_dir=$DATA_DIR \
--problems=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--output_dir=$TRAIN_DIR \
--decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
--decode_from_file=$DECODE_FILE
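The two `echo ... >> $DECODE_FILE` lines above append to the file, so re-running them adds duplicate input sentences. A minimal sketch that overwrites the file instead, keeping it at exactly the two sentences used here. The function name `write_decode_file` is hypothetical.

```shell
#!/usr/bin/env bash
# write_decode_file FILE: overwrite FILE with one source sentence per line.
# `>` truncates the file first, so calling this repeatedly never produces
# duplicate lines the way repeated `echo ... >> FILE` calls would.
write_decode_file() {
  printf '%s\n' "Hello world" "Goodbye world" > "$1"
}
```

For example, `write_decode_file "$DATA_DIR/decode_this.txt"` before running `t2t-decoder`.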
The decoded result:
$ cat ~/t2t_data/decode_this.txt.transformer.transformer_base_single_gpu.translate_enzh_wmt8k.beam4.alpha0.6.decodes
� � � � � � � � � � � � � � � � � � � � � � � � \ 2 0 1 9 9
� � � � � � � � � � � � � � � � � � � � � � � � \ 2 0 1 9 9
$ od ~/t2t_data/decode_this.txt.transformer.transformer_base_single_gpu.translate_enzh_wmt8k.beam4.alpha0.6.decodes
0000000 020344 020270 020252 020344 020270 020252 020344 020273
0000020 020247 020344 020273 020247 020344 020273 020247 020344
0000040 020275 020264 020344 020275 020264 020344 020275 020264
0000060 020134 020062 020060 020061 020071 005071 020344 020270
0000100 020252 020344 020270 020252 020344 020273 020247 020344
0000120 020270 020252 020344 020273 020247 020344 020275 020264
0000140 020344 020275 020264 020344 020275 020264 020134 020062
0000160 020060 020061 020071 005071
0000170
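How to read the dump above: plain `od` prints 2-byte words in octal. On a little-endian machine such as x86, the low byte comes first, so `020344` (0x20E4) is the byte E4 followed by a space (0x20), and the final `005071` is `9` followed by a newline. Dumped one byte at a time, each line starts `e4 20 b8 20 aa 20`, and E4 B8 AA is the UTF-8 encoding of U+4E2A (个). This suggests the decoder did produce UTF-8 for Chinese characters but put a space after every byte. That splits each multi-byte character, so the terminal shows �. The helper names `hexdump_bytes` and `strip_byte_spaces` are hypothetical, and the sketch assumes exactly one space follows every byte. It is a diagnostic, not a fix for the decoder.

```shell
#!/usr/bin/env bash
# hexdump_bytes FILE: print every byte as two hex digits. `-t x1` avoids the
# byte-order-dependent 2-byte octal words that plain `od` prints by default.
hexdump_bytes() {
  od -An -tx1 -v "$1"
}

# strip_byte_spaces FILE: delete every space byte (0x20) and print the rest.
# Assumption: the decoder inserted one space after each byte, so removing
# spaces rejoins the UTF-8 sequences (genuine spaces are removed as well).
strip_byte_spaces() {
  tr -d ' ' < "$1"
}
```

For example, with `DECODES` set to the `.decodes` path above, `hexdump_bytes "$DECODES" | head -n 2` shows the byte pattern, and `strip_byte_spaces "$DECODES"` prints the rejoined text. Because every space is deleted, the trailing `\ 2 0 1 9 9` comes out as `\20199`.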