ValueError: invalid literal for int() with base 10: '-2.34575' #4

ArtanisTheOne · 2022-09-29T21:47:07Z

I've followed all the instructions with a corpus of around 300,000 segments (vocab size 25,000) and keep running into this issue (I have tried multiple times, with the same result). Pre-processing, model training, etc. all completed successfully, but the library errors out on a specific entry in source.vocab (below).

Do you have any idea how I can resolve my issue?

ymoslem · 2022-09-30T15:15:48Z

Hello!

It seems that you are using the vocab file created by SentencePiece. You should use the vocab file created by OpenNMT's onmt_build_vocab instead.

Alternatively, you can use the script spm_to_vocab.py to convert a SentencePiece vocab file to an OpenNMT-py compatible format. If you would rather use OpenNMT-tf, there is a command for this vocab conversion process.
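
For context, a SentencePiece .vocab file stores a floating-point log-probability after each token (for example -2.34575), whereas the loader here calls int() on that column, hence the ValueError. Below is a minimal sketch of the kind of conversion involved. It is not the actual spm_to_vocab.py script: the function name, the scale factor, and the set of skipped special tokens are assumptions chosen for illustration.

```python
import math

# Assumption: these special tokens are skipped because the toolkit adds its own.
SPECIAL_TOKENS = {"<unk>", "<s>", "</s>"}


def spm_vocab_to_counts(lines, scale=1_000_000):
    """Turn 'token<TAB>log_prob' lines into 'token<TAB>int_count' lines.

    Hypothetical helper; `scale` is an arbitrary illustrative factor.
    """
    converted = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        token, score = line.split("\t", 1)
        if token in SPECIAL_TOKENS:
            continue
        # exp() maps the log-probability back into (0, 1]; scaling and
        # adding 1 yields a positive integer that int() can parse.
        count = int(math.exp(float(score)) * scale) + 1
        converted.append(f"{token}\t{count}")
    return converted


if __name__ == "__main__":
    sample = ["<unk>\t0\n", "\u2581the\t-2.34575\n"]
    for row in spm_vocab_to_counts(sample):
        print(row)
```

To apply this to real files, you would read the lines of the SentencePiece .vocab file, pass them to the function, and write the result to a new file that the vocab options in your config point to. In practice, the provided script or onmt_build_vocab is the safer choice.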

I hope this helps.

Kind regards,
Yasmin

ymoslem closed this as completed Sep 30, 2022
