dl-translate v0.2.0rc1
Pre-release
Add m2m100 as an alternative to mbart50
m2m100 supports more languages (~110) than mbart50, and its authors also report absolute BLEU scores.
Added
- dlt.lang.m2m100 module: Now has variables for over 100 languages, also auto-complete ready. Example: dlt.lang.m2m100.ENGLISH.
- dlt.utils.available_languages, dlt.utils.available_codes: Now support the argument "m2m100".
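A minimal sketch of how the additions above might be used together. It assumes dl-translate is installed (pip install dl-translate) and imported as dlt; the wrapper function describe_m2m100 is hypothetical and exists only to group the calls.

```python
def describe_m2m100():
    """Collect the new m2m100 language helpers in one place (illustrative)."""
    # Imported lazily so the sketch can be read without the package installed.
    import dl_translate as dlt

    # Language variable from the new module, usable with auto-complete.
    english = dlt.lang.m2m100.ENGLISH

    # Both utilities now accept "m2m100" as an argument.
    languages = dlt.utils.available_languages("m2m100")
    codes = dlt.utils.available_codes("m2m100")
    return english, languages, codes
```

Calling describe_m2m100() returns the language variable together with the language and code listings for the m2m100 family.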
Changed
- [BREAKING] dlt.TranslationModel: A new parameter called model_family in the initialization function. Either "mbart50" or "m2m100". By default, it will be inferred based on model_or_path. Needs to be explicitly set if model_or_path is a path.
- dlt.TranslationModel.translate: Improved docstring to be more general.
- Tests pertaining to m2m100
- scripts/generate_langs.py: Renamed, mechanism now changed to loading from json files
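The model_family rule described in this section can be sketched roughly as follows. This is not the library's implementation: the helper infer_model_family, the _FAMILY_MARKERS table, and the substring matching are assumptions made purely for illustration.

```python
import os

# Substrings used to recognize each family in a model name (illustrative only).
_FAMILY_MARKERS = {"mbart": "mbart50", "m2m100": "m2m100"}


def infer_model_family(model_or_path, model_family=None):
    """Hypothetical helper: pick "mbart50" or "m2m100" for a model name or path."""
    # An explicit value always wins, as long as it names a known family.
    if model_family is not None:
        if model_family not in _FAMILY_MARKERS.values():
            raise ValueError(f"Unknown model_family: {model_family!r}")
        return model_family

    # A local directory carries no reliable family information.
    if os.path.isdir(model_or_path):
        raise ValueError("model_family must be set explicitly for a local path")

    # Otherwise, guess from the model name.
    name = model_or_path.lower()
    for marker, family in _FAMILY_MARKERS.items():
        if marker in name:
            return family
    raise ValueError(f"Cannot infer model_family from {model_or_path!r}")
```

With this sketch, a name such as "m2m100" resolves on its own, while a local directory raises an error unless model_family is passed explicitly.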
Fixed
- dlt.TranslationModel.available_codes() was returning the languages instead of the codes. It now correctly returns the codes.
Removed
- Output type hints for TranslationModel.get_transformers_model and TranslationModel.get_tokenizer
- [BREAKING] dlt.TranslationModel.bart_model and dlt.TranslationModel.tokenizer are no longer available to be used directly. Please use dlt.TranslationModel.get_transformers_model and dlt.TranslationModel.get_tokenizer instead.
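For code that previously read the removed attributes, a migration might look like the sketch below. The helper name get_model_and_tokenizer is hypothetical; it only assumes that the object passed in exposes the two getter methods named above.

```python
def get_model_and_tokenizer(mt):
    """Fetch the underlying transformers objects through the public getters."""
    # Before this release: model = mt.bart_model; tokenizer = mt.tokenizer
    model = mt.get_transformers_model()
    tokenizer = mt.get_tokenizer()
    return model, tokenizer
```

Here mt would be a dlt.TranslationModel instance; any code that accessed the attributes directly needs the same two-line change.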