Can't load tokenizer for 'Helsinki-NLP/opus-mt-nds-en' #20

ulhaqi12 · 2021-03-10T12:08:24Z

Hi,
I was trying to translate a dataset of 19,203 sentences from German to English using the `translate_stream` method shown in the following example:
https://github.com/UKPLab/EasyNMT/blob/main/examples/translation_streaming.py

I set the chunk size to 32. After successfully translating 3 chunks and writing the output to a file, it raised an error while loading a model. Can you help me with this issue? The error output is pasted below.

```
  0%|▌                                                | 96/19203 [00:54<3:00:03,  1.77it/s]
Exception: Can't load tokenizer for 'Helsinki-NLP/opus-mt-nds-en'. Make sure that:

- 'Helsinki-NLP/opus-mt-nds-en' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'Helsinki-NLP/opus-mt-nds-en' is the correct path to a directory containing relevant tokenizer files


  1%|▋                                                 | 127/19203 [01:06<2:46:24,  1.91it/s]
Traceback (most recent call last):
  File "translate.py", line 12, in <module>
    for translation in model.translate_stream(sentences, chunk_size=32, target_lang='en'):
  File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 297, in translate_stream
    translated = self.translate(batch, show_progress_bar=False, **kwargs)
  File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 124, in translate
    translated_sentences = self.translate_sentences(splitted_sentences, target_lang=target_lang, source_lang=source_lang, show_progress_bar=show_progress_bar, beam_size=beam_size, batch_size=batch_size, **kwargs)
  File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 210, in translate_sentences
    raise e
  File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 205, in translate_sentences
    translated = self.translate_sentences(grouped_sentences, source_lang=lng, target_lang=target_lang, show_progress_bar=show_progress_bar, beam_size=beam_size, batch_size=batch_size, **kwargs)
  File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 222, in translate_sentences
    output.extend(self.translator.translate_sentences(sentences_sorted[start_idx:start_idx+batch_size], source_lang=source_lang, target_lang=target_lang, beam_size=beam_size, device=self.device, **kwargs))
  File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/models/OpusMT.py", line 46, in translate_sentences
    tokenizer, model = self.load_model(model_name)
  File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/models/OpusMT.py", line 28, in load_model
    tokenizer = MarianTokenizer.from_pretrained(model_name)
  File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1777, in from_pretrained
    raise EnvironmentError(msg)
OSError: Can't load tokenizer for 'Helsinki-NLP/opus-mt-nds-en'. Make sure that:

- 'Helsinki-NLP/opus-mt-nds-en' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'Helsinki-NLP/opus-mt-nds-en' is the correct path to a directory containing relevant tokenizer files
```

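The progress bar is consistent with chunked processing: with `chunk_size=32`, three complete chunks are 3 × 32 = 96 sentences, so the sentence that triggered the error was most likely in the fourth chunk. The sketch below models this, assuming that `translate_stream` translates one chunk at a time (the traceback shows it calling `self.translate(batch, ...)` per batch). `chunked`, `stream_translate`, and the `translate_batch` callback are hypothetical names for illustration, not EasyNMT's implementation.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `chunk_size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def stream_translate(
    sentences: Iterable[T],
    translate_batch: Callable[[List[T]], List[str]],
    chunk_size: int = 32,
) -> Iterator[str]:
    """Translate chunk by chunk, yielding one translation at a time.

    `translate_batch` is a hypothetical stand-in for the per-chunk
    translate call. If it raises for one chunk, every translation from
    the earlier chunks has already been yielded (and could already have
    been written to a file), which matches the behaviour in this issue.
    """
    for chunk in chunked(sentences, chunk_size):
        yield from translate_batch(chunk)
```

For the full dataset, 19,203 sentences split into 601 chunks: 600 full chunks of 32 plus a final chunk of 3.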

nreimers · 2021-03-10T12:50:27Z

The language detection step identified a document as language 'nds', which is Low German.

However, there is no model that can translate from nds to en, hence the error.

Unfortunately, automatic language detection does not work perfectly on noisy data. So if you know the source language, it is best to set it explicitly (`translate(..., src_lang='de', ...)`).

In that case, the language does not need to be detected: the correct model (opus-mt-de-en) is loaded directly and this error is avoided.
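
The failure mode described above can be sketched as a simplified model, not EasyNMT's actual code. `select_model`, `AVAILABLE_PAIRS`, and the `detect` callback are hypothetical, and the set of pairs is illustrative only; the `Helsinki-NLP/opus-mt-{src}-{tgt}` naming pattern is the one visible in the error message. The point is that an explicit source language bypasses detection, so a misdetected code such as 'nds' never reaches the model lookup.

```python
from typing import Callable, Optional

# Hypothetical, illustrative subset of available OPUS-MT language pairs.
AVAILABLE_PAIRS = {("de", "en"), ("fr", "en"), ("en", "de")}


def opus_mt_model_name(source_lang: str, target_lang: str) -> str:
    # Naming pattern taken from the error message in this issue.
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"


def select_model(
    sentence: str,
    target_lang: str,
    detect: Callable[[str], str],
    source_lang: Optional[str] = None,
) -> str:
    """Return the model name used for one sentence.

    If `source_lang` is given, `detect` is never called; otherwise the
    (possibly wrong) detected code is used to build the model name.
    """
    lang = source_lang if source_lang is not None else detect(sentence)
    name = opus_mt_model_name(lang, target_lang)
    if (lang, target_lang) not in AVAILABLE_PAIRS:
        # Mimics the OSError raised when the tokenizer cannot be loaded.
        raise OSError(f"Can't load tokenizer for '{name}'")
    return name
```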

ulhaqi12 · 2021-03-10T12:56:33Z

Oh, got it. It was solved when I specified the source language. By the way, the parameter is `source_lang`, not `src_lang`.
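
A minimal sketch of the streaming loop from the original script with the source language set explicitly. `translate_to_file` is a hypothetical helper; it only assumes that `model` exposes `translate_stream` and forwards `source_lang` and `target_lang` to `translate`, as the traceback and the comment above indicate.

```python
from typing import Iterable, TextIO


def translate_to_file(model, sentences: Iterable[str], out: TextIO,
                      chunk_size: int = 32) -> int:
    """Stream German sentences through `model`, writing one translation per line.

    Passing source_lang='de' skips language detection, so only the
    de->en model is needed. Returns the number of lines written.
    """
    written = 0
    for translation in model.translate_stream(
        sentences, chunk_size=chunk_size, source_lang="de", target_lang="en"
    ):
        out.write(translation + "\n")
        written += 1
    return written
```

With EasyNMT installed, `model` would typically be created with `EasyNMT('opus-mt')`, and `out` could be a file opened with `open('translations.txt', 'w', encoding='utf-8')` (the file name is illustrative).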

Thank you for your help.

ulhaqi12 closed this as completed Mar 10, 2021
