problem with interactive.py #5

Closed

nicolabertoldi opened this issue Jun 10, 2019 · 12 comments

@nicolabertoldi
I followed the instructions to preprocess and train an engine with your code, both with and without srclm and tgtlm, and I succeeded: I trained two models, one with srclm and tgtlm and one without.

Then I tried to translate with each of the two models, but in both cases I failed.
Here are the two commands I used:

echo "ciao ciao ciao" | python3 ../interactive.py --remove-bpe REMOVE_BPE --raw-text --path engine_nolm/checkpoint_best.pt  --src-no-lm --tgt-no-lm --load-nmt --task lm_translation  data_generated 

echo "ciao ciao ciao" | python3 ../interactive.py --remove-bpe REMOVE_BPE --raw-text --path engine_lm/checkpoint_best.pt --task lm_translation data_generated --src-no-lm --tgt-no-lm --load-srclm-file lm_sl/checkpoint_best.pt  --load-tgtlm-file lm_tl/checkpoint_best.pt --load-nmt-file engine_lm/checkpoint_best.pt

What's wrong?
What is the correct command to activate both the src and tgt LMs, and what is the command to disable them?

@teslacool
Owner

teslacool commented Jun 11, 2019

When inferring a sentence, we do not touch the LM. We first use the NMT encoder to generate the hidden states and then run the NMT decoder. You should look carefully at the fairseq decoding steps.

I have now tested a sentence and fixed a bug. You can use interactive.py just like in the original fairseq, without any additional command-line parameters.
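
For illustration, a rough sketch of that decoding path (not this repository's exact code; it assumes a fairseq version that provides checkpoint_utils, older versions used utils.load_ensemble_for_inference, plus the engine/checkpoint_best.pt and data_generated paths used later in this thread). Note that the LM checkpoints never appear here, only the NMT encoder and decoder:

# Rough sketch of inference without the LM: encoder hidden states, then decoder.
import torch
from fairseq import checkpoint_utils

# Load the trained translation model; the task and dictionaries are restored
# from the checkpoint args plus the data-bin directory.
models, args = checkpoint_utils.load_model_ensemble(
    ["engine/checkpoint_best.pt"],
    arg_overrides={"data": "data_generated"},
)
model = models[0]
model.eval()

# A dummy batch with one source sentence (token ids; 2 is </s> in fairseq dicts).
src_tokens = torch.tensor([[4, 5, 6, 2]])
src_lengths = torch.tensor([4])

with torch.no_grad():
    # 1) the NMT encoder produces the hidden states ...
    encoder_out = model.encoder(src_tokens, src_lengths=src_lengths)
    # 2) ... and the NMT decoder consumes them; here a single forced step
    #    with only </s> as the previous output token.
    prev_output_tokens = torch.tensor([[2]])
    logits, _ = model.decoder(prev_output_tokens, encoder_out=encoder_out)

print(logits.shape)  # (batch, tgt_len, vocab): scores for the next token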

@nicolabertoldi
Author

@teslacool
could you please give me an example of how to properly run the ./interactive.py command,

assuming that I have the following checkpoint models:
for srclm: ./lm_sl/checkpoint_best.pt
for tgtlm: ./lm_tl/checkpoint_best.pt
for transformer: ./engine/checkpoint_best.pt

I got these checkpoints by running the following preprocessing and training commands:

fairseq-preprocess --source-lang sl --target-lang tl --trainpref ./encoded_corpora//train --validpref ./encoded_corpora//dev --destdir ./data_generated --joined-dictionary

fairseq-train --task language_modeling --arch transformer_lm --lr-scheduler inverse_sqrt --lr-shrink 0.1 --warmup-updates 4000 --warmup-init-lr 1e-07 --min-lr 1e-09 --optimizer adam --lr 0.0001 --clip-norm 0.1 --criterion adaptive_loss --max-tokens 4096 --update-freq 8 --seed 1 --sample-break-mode none --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d --save-interval-updates 1000 --keep-interval-updates 10 --no-epoch-checkpoints --attention-dropout 0.1 --dropout 0.3 --criterion label_smoothed_cross_entropy --save-dir ./lm_sl/ ./data_generated_sl

fairseq-train --task language_modeling --arch transformer_lm --lr-scheduler inverse_sqrt --lr-shrink 0.1 --warmup-updates 4000 --warmup-init-lr 1e-07 --min-lr 1e-09 --optimizer adam --lr 0.0001 --clip-norm 0.1 --criterion adaptive_loss --max-tokens 4096 --update-freq 8 --seed 1 --sample-break-mode none --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d --save-interval-updates 1000 --keep-interval-updates 10 --no-epoch-checkpoints --attention-dropout 0.1 --dropout 0.3 --criterion label_smoothed_cross_entropy --save-dir ./lm_tl/ ./data_generated_tl

python3 ../train.py ./data_generated --task lm_translation --arch transformer_iwslt_de_en --share-decoder-input-output-embed --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 --lr 0.0009 --min-lr 1e-09 --dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-tokens 2084 --update-freq 8 --save-dir ./engine --save-interval-updates 1000 --seed 200 --tradeoff 0.15 --load-lm --load-srclm-file ./lm_sl/checkpoint_best.pt --load-tgtlm-file ./lm_tl/checkpoint_best.pt --lmdecoder-ffn-embed-dim 2048 --no-epoch-checkpoints --keep-interval-updates 10

Note that all these scripts succeeded.

@teslacool
Owner

run

echo "Danke dir ." | python interactive.py data-bin/iwslt14.tokenized.de-en     --path  checkpoints/transformer/checkpoint_best.pt --buffer-size 1024     --batch-size 128 --beam 5 --remove-bpe  | grep ^H | cut -f3-

you will get

thank you .

if your model was trained on the de-en task.

@nicolabertoldi
Author

@teslacool

after your fix, it works

echo "ciao ciao ciao" | python3 ../interactive.py --remove-bpe --raw-text --path ./engine/checkpoint_best.pt --task lm_translation --src-no-lm --tgt-no-lm --load-srclm-file ./lm_sl/checkpoint_best.pt  --load-tgtlm-file ./lm_tl/checkpoint_best.pt  ./data_generated 

thank you

@nicolabertoldi
Author

@teslacool

I have a few more questions:

  • why do I need to specify the data-bin directory? Only for the dictionary, or is there another reason?

@teslacool
Owner

Yes, you need the dictionary to map tokens to the ids that index the embedding matrix.
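
A small sketch of that mapping, using the dictionary files that fairseq-preprocess wrote into ./data_generated (the file name dict.sl.txt is assumed from the --source-lang sl setting above):

from fairseq.data import Dictionary

# load the joined dictionary produced by fairseq-preprocess
src_dict = Dictionary.load("data_generated/dict.sl.txt")

# tokens -> ids (these ids are the rows of the embedding matrix)
ids = src_dict.encode_line("ciao ciao ciao", add_if_not_exist=False)
print(ids)                   # tensor of token ids, ending with </s>
print(src_dict.string(ids))  # and back to the token strings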

@nicolabertoldi
Author

  • I specify --remove-bpe, but my output still contains the _ symbols; why does this happen?

@teslacool
Owner

--remove-bpe is meant to remove the @@ symbols.
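
A small illustration in plain Python, mirroring fairseq's default "@@ "-style post-processing: it strips the subword-nmt continuation marker and nothing else, so any other marker, such as the _ above, is left untouched:

def remove_bpe(sentence: str, bpe_symbol: str = "@@ ") -> str:
    # strip the subword-nmt continuation marker, as --remove-bpe does
    return (sentence + " ").replace(bpe_symbol, "").rstrip()

print(remove_bpe("a new@@ bie sentence"))    # -> "a newbie sentence"
print(remove_bpe("I_ am_ a_ new bie_ ._"))   # unchanged: no @@ to remove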

@nicolabertoldi
Author

so is it correct that my output looks like this:

I_ am_ a_ newbie_ ._

@nicolabertoldi
Author

actually I get

I_ am_ a_ new bie_ ._

@teslacool
Owner

That does not look correct. I do not know why your sentences consist of words followed by _.

Since "I" and "am" are not split into subwords, the _ symbol was not added by the BPE operation.

@nicolabertoldi
Author

@teslacool

Sorry, my fault.

I am closing the issue for now,
but I will probably have more questions soon.
