I set the chunk size to 32. After successfully translating three chunks and writing the output to a file, it raised a model error. Can you guide me with this issue? I am pasting the error message below.
0%|▌ | 96/19203 [00:54<3:00:03, 1.77it/s]
Exception: Can't load tokenizer for 'Helsinki-NLP/opus-mt-nds-en'. Make sure that:
- 'Helsinki-NLP/opus-mt-nds-en' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'Helsinki-NLP/opus-mt-nds-en' is the correct path to a directory containing relevant tokenizer files
1%|▋ | 127/19203 [01:06<2:46:24, 1.91it/s]
Traceback (most recent call last):
File "translate.py", line 12, in <module>
for translation in model.translate_stream(sentences, chunk_size=32, target_lang='en'):
File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 297, in translate_stream
translated = self.translate(batch, show_progress_bar=False, **kwargs)
File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 124, in translate
translated_sentences = self.translate_sentences(splitted_sentences, target_lang=target_lang, source_lang=source_lang, show_progress_bar=show_progress_bar, beam_size=beam_size, batch_size=batch_size, **kwargs)
File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 210, in translate_sentences
raise e
File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 205, in translate_sentences
translated = self.translate_sentences(grouped_sentences, source_lang=lng, target_lang=target_lang, show_progress_bar=show_progress_bar, beam_size=beam_size, batch_size=batch_size, **kwargs)
File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 222, in translate_sentences
output.extend(self.translator.translate_sentences(sentences_sorted[start_idx:start_idx+batch_size], source_lang=source_lang, target_lang=target_lang, beam_size=beam_size, device=self.device, **kwargs))
File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/models/OpusMT.py", line 46, in translate_sentences
tokenizer, model = self.load_model(model_name)
File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/easynmt/models/OpusMT.py", line 28, in load_model
tokenizer = MarianTokenizer.from_pretrained(model_name)
File "/home/ulhaqi12/anaconda3/envs/pytorch-env/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1777, in from_pretrained
raise EnvironmentError(msg)
OSError: Can't load tokenizer for 'Helsinki-NLP/opus-mt-nds-en'. Make sure that:
- 'Helsinki-NLP/opus-mt-nds-en' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'Helsinki-NLP/opus-mt-nds-en' is the correct path to a directory containing relevant tokenizer files
The language detection step identified a document as language 'nds', which is Low German.
However, there is no model that can translate from nds to en, hence the error.
Sadly, automatic language detection does not work perfectly on noisy data. So if you know the source language, it is best to set it explicitly (translate(..., source_lang='de', ...)).
In that case, the language does not need to be detected; the correct model (opus-mt-de-en) is loaded directly and this error is avoided.
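For the streaming case in the report, the real call would simply become model.translate_stream(sentences, chunk_size=32, source_lang='de', target_lang='en'). The self-contained sketch below (using a stubbed translator and a hypothetical set of supported model pairs, no EasyNMT install needed) illustrates why fixing source_lang sidesteps the failing detection step:

```python
# Sketch with a stubbed translator: detection only runs when no
# source_lang is given, so fixing it avoids the 'nds' misdetection
# and the resulting missing opus-mt-nds-en model error.
SUPPORTED_PAIRS = {("de", "en"), ("fr", "en")}  # hypothetical available opus-mt models

def detect_language(text):
    """Stand-in for automatic detection, which can misfire on noisy German."""
    return "nds" if "moin" in text.lower() else "de"

def translate_stream(sentences, chunk_size=32, source_lang=None, target_lang="en"):
    for start in range(0, len(sentences), chunk_size):
        for sent in sentences[start:start + chunk_size]:
            # Detection happens per document only when no source language is set.
            lang = source_lang or detect_language(sent)
            if (lang, target_lang) not in SUPPORTED_PAIRS:
                raise OSError(f"Can't load tokenizer for "
                              f"'Helsinki-NLP/opus-mt-{lang}-{target_lang}'")
            yield f"[{lang}->{target_lang}] {sent}"

sentences = ["Das ist ein Test.", "Moin, wie geht's?"]
# Without source_lang, the second sentence would be misdetected as 'nds'
# and raise OSError; with source_lang='de', every chunk translates.
out = list(translate_stream(sentences, source_lang="de"))
```

The same principle applies to the real EasyNMT API: when source_lang is passed, the per-document detection branch is never taken, so a misdetected document can no longer trigger a lookup for a model that does not exist.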
Hi,
I was trying to translate 19,203 German sentences to English using the translate_stream method explained at the following link:
https://github.com/UKPLab/EasyNMT/blob/main/examples/translation_streaming.py