Adding a New Language and Extending Previous Languages #33

Open
mbach138 opened this issue Nov 17, 2021 · 2 comments

@mbach138

How would one add a new language to the existing set? How would one extend what is already there with further examples?

@laurahanu
Collaborator

If you have a sufficient number of labelled examples, you should be able to fine-tune the multilingual model directly (you might want to check whether XLM-RoBERTa was trained on your language). Details on how to train this model are in the README; you might need to create a data loader for your new dataset.
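
A minimal sketch of such a data loader, assuming the new-language data is a CSV with a `comment_text` text column and a binary `toxic` label column (names borrowed from the Jigsaw CSVs; the repo's own loaders may expect a different format). The class name and fields are hypothetical. It implements `__len__`/`__getitem__`, the map-style protocol that `torch.utils.data.DataLoader` accepts, so it should be wrappable in one, but check it against the README's training setup.

```python
import csv
import io
from typing import Dict, List, TextIO, Union

Sample = Dict[str, Union[str, float]]


class ToxicCommentCSVDataset:
    """Hypothetical map-style dataset over a CSV of labelled comments.

    Column names are assumptions taken from the Jigsaw CSV layout;
    change them to match your own data.
    """

    def __init__(self, stream: TextIO, text_col: str = "comment_text",
                 label_col: str = "toxic") -> None:
        self.samples: List[Sample] = []
        for row in csv.DictReader(stream):
            text = (row.get(text_col) or "").strip()
            if not text:
                # Skip rows with empty text, e.g. comments that failed to translate.
                continue
            self.samples.append({"text": text, "label": float(row[label_col])})

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Sample:
        return self.samples[idx]


# Usage with a tiny made-up Dutch CSV (not real Jigsaw data).
sample_csv = io.StringIO(
    "id,comment_text,toxic\n"
    "1,Dit is een prima opmerking,0\n"
    "2,,1\n"
    "3,Wat een idioot ben jij,1\n"
)
dataset = ToxicCommentCSVDataset(sample_csv)
print(len(dataset))  # 2: the row with empty text is skipped
print(dataset[1])    # {'text': 'Wat een idioot ben jij', 'label': 1.0}
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in place of the `StringIO`.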

If you don't have enough labelled examples, you could translate the Jigsaw datasets used for training into the new language and retrain the model, although you would probably need to create a labelled test set to check its performance.
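
For that labelled test set, one way to check performance is ROC AUC, which the Jigsaw challenges also built their evaluation on. Below is a dependency-free sketch using the Mann-Whitney formulation: the fraction of (toxic, non-toxic) pairs the model ranks correctly, with ties counting half. The inputs are hypothetical: `labels` would be your hand-labelled test comments and `scores` the model's predicted toxicity probabilities for them.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney U); O(n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # toxic comment ranked above non-toxic one
            elif p == n:
                wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Made-up labels and scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The quadratic loop is fine for a small hand-labelled set; for larger ones, `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity more efficiently.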

Hope this helps!

@SaadAhmed433

@laurahanu I am trying to extend the multilingual model to Dutch. Since I did not have any labelled examples, I have translated the Jigsaw datasets into Dutch. Now I have the following questions:

  1. Do we need to perform any preprocessing on the translated datasets?

  2. The instructions in the README only show the command for training the model with one of the config files. Is the stage 2 config not used? Please clarify whether it's used or not.

  3. Two sources are mentioned for the translated datasets, and I am confused about which one to use.

  4. Since I don't have a test dataset of Dutch comments, should I translate the entries in the test.csv files into Dutch?

Thanks a lot.
