Add MarianNMT #2

Closed
4 tasks
xhluca opened this issue Feb 25, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@xhluca
Owner

xhluca commented Feb 25, 2021

See Marian: https://huggingface.co/transformers/model_doc/marian.html
See helsinki-nlp's models: https://huggingface.co/Helsinki-NLP

We'd need to:

  • Add an option to load the Marian architecture at initialization (e.g. dlt.TranslationModel("marian"))
  • Add an option to list all of the languages (and codes) available for a given variant trained with Marian, e.g. dlt.utils.available_languages("opus-en-romance")
  • Add an option to leverage autocomplete, such as dlt.lang.opus.en_romance.ENGLISH, with the choices limited to what is available for that variant (i.e. romance)
  • TBD
@xhluca xhluca added the enhancement New feature or request label Feb 25, 2021
@xhluca xhluca added this to To do in Release v0.1.0 Feb 25, 2021
@xhluca xhluca moved this from To do to In progress in Release v0.1.0 Mar 11, 2021
@xhluca xhluca moved this from In progress to To do in Release v0.1.0 Mar 11, 2021
@xhluca
Owner Author

xhluca commented Mar 11, 2021

Could we auto-generate the languages for OPUS as well? We could use it for:

dlt.lang.opus.romance.ENGLISH
dlt.lang.opus.north_eu.GERMAN

And get the available languages using:

dlt.utils.available_languages("opus", source="en", target="romance")

I wonder whether the above should return a dictionary {"source": [...], "target": [...]} or a tuple of (source, target). Or would we need to specify what to return, e.g. dlt.utils.available_languages("opus", source="en", target="romance", return_type="source")?
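To make the return-type question concrete, here is a minimal sketch of one possible shape. Everything in it is an assumption for illustration: the `_OPUS_PAIRS` table holds placeholder entries rather than the real OPUS-MT language lists, and the `"both"` default for `return_type` is not something decided in this thread.

```python
from typing import Dict, List, Union

# Placeholder data for illustration only; NOT the real OPUS-MT language lists.
_OPUS_PAIRS: Dict[tuple, Dict[str, List[str]]] = {
    ("en", "romance"): {
        "source": ["English"],
        "target": ["French", "Italian", "Spanish"],
    },
}


def available_languages(
    family: str, source: str, target: str, return_type: str = "both"
) -> Union[Dict[str, List[str]], List[str]]:
    """Hypothetical helper: list the languages of one OPUS source/target pair.

    return_type="both" gives {"source": [...], "target": [...]};
    return_type="source" or "target" gives just that list.
    """
    if family != "opus":
        raise ValueError(f"unknown model family: {family!r}")
    if (source, target) not in _OPUS_PAIRS:
        raise ValueError(f"no OPUS variant for {source!r} -> {target!r}")
    pair = _OPUS_PAIRS[(source, target)]

    if return_type == "both":
        # Copy the lists so callers cannot mutate the lookup table.
        return {side: list(names) for side, names in pair.items()}
    if return_type in ("source", "target"):
        return list(pair[return_type])
    raise ValueError(f"return_type must be 'both', 'source' or 'target', got {return_type!r}")
```

Returning the dictionary by default would keep the common call short, while `return_type` covers the case where only one side is needed. A tuple would work too, but the dictionary keys make it harder to mix up source and target.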

@xhluca xhluca removed this from To do in Release v0.1.0 Mar 12, 2021
@xhluca xhluca added this to To Do in Add MarianNMT Mar 13, 2021
@xhluca
Owner Author

xhluca commented Mar 13, 2021

Actually, this might make more sense:

dlt.lang.romance.ENGLISH

dlt.utils.available_languages("romance")

A lot more concise and logical.
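A rough sketch of how this flatter layout could work, with one namespace per variant feeding both autocomplete and the lookup helper. The class names, the `_VARIANTS` registry, and the three member languages are all placeholders assumed for illustration (not the real language list for the romance variant); the helper is called `available_languages` here to match the earlier comments.

```python
from typing import Dict, List


class _Romance:
    """Hypothetical per-variant namespace; members are placeholders."""

    ENGLISH = "English"
    FRENCH = "French"
    SPANISH = "Spanish"


class lang:
    """Stand-in for dlt.lang: editors can autocomplete lang.romance.<NAME>."""

    romance = _Romance


# Hypothetical registry mapping a variant name to its namespace.
_VARIANTS: Dict[str, type] = {"romance": _Romance}


def available_languages(variant: str) -> List[str]:
    """Return the language names defined on a variant's namespace, sorted."""
    if variant not in _VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    namespace = _VARIANTS[variant]
    # Only the upper-case attributes are language constants;
    # dunder attributes such as __doc__ are skipped by isupper().
    return sorted(value for name, value in vars(namespace).items() if name.isupper())


print(lang.romance.ENGLISH)
print(available_languages("romance"))
```

Deriving the list from the namespace keeps the two in sync, so a language added for autocomplete shows up in the lookup without a second edit.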

@xhluca
Owner Author

xhluca commented Mar 9, 2022

This is already available in EasyNMT, which has good support for multiple models.

@xhluca xhluca closed this as completed Mar 9, 2022
Add MarianNMT automation moved this from To Do to Done Mar 9, 2022