Adding a New Language and Extending Previous Languages #33

mbach138 · 2021-11-17T18:37:10Z

How would one add a new language to the existing set? And how would one extend the existing languages with further examples?

laurahanu · 2021-11-30T18:11:43Z

If you have a sufficient number of labelled examples, you should be able to fine-tune the multilingual model directly (you might want to check whether XLM-RoBERTa was trained on your language). Details on how to train this model are in the README; you might need to create a data loader for your new dataset.
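As a rough illustration of what such a data loader might look like, here is a minimal sketch in plain Python. It assumes a CSV with a `comment_text` column and one 0/1 column per label (the label names `toxic` and `insult` below are assumptions modelled on the Jigsaw files). The class name `ToxicCommentCSV` is hypothetical and is not the repo's own loader. It follows the PyTorch map-style `Dataset` protocol (`__len__`/`__getitem__`) without importing torch, and leaves tokenization to the training code.

```python
import csv
import io
from typing import Dict, List, Sequence


class ToxicCommentCSV:
    """Hypothetical map-style dataset over a labelled comment CSV.

    Implements __len__/__getitem__ so it could be wrapped by a
    torch.utils.data.DataLoader, but does not depend on torch itself.
    """

    def __init__(self, csv_file, label_columns: Sequence[str],
                 text_column: str = "comment_text"):
        self.text_column = text_column
        self.label_columns = list(label_columns)
        self.rows: List[Dict[str, str]] = []
        for row in csv.DictReader(csv_file):
            # Skip rows with empty text; they carry no training signal.
            if row.get(text_column, "").strip():
                self.rows.append(row)

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int):
        row = self.rows[idx]
        # Labels as floats, the usual input for a multi-label BCE loss.
        labels = [float(row[c]) for c in self.label_columns]
        return row[self.text_column], labels


# Usage with an in-memory CSV (a real file opened with newline="" works the same way).
sample = io.StringIO(
    "comment_text,toxic,insult\n"
    "Dit is een vriendelijke opmerking,0,0\n"
    "Jij bent een idioot,1,1\n"
    ",1,0\n"
)
dataset = ToxicCommentCSV(sample, label_columns=["toxic", "insult"])
print(len(dataset))  # 2
print(dataset[1])    # ('Jij bent een idioot', [1.0, 1.0])
```

Keeping labels as a list of floats per row makes it straightforward to add or drop label columns for a new language without touching the rest of the class.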

If you don't have enough labelled examples, you could translate the Jigsaw datasets into the new language and retrain the model, although you would probably need to create a labelled test set to check its performance.
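A minimal sketch of both steps, under stated assumptions. `translate_rows` replaces only the text column and copies every label column unchanged, on the assumption that translation preserves the original labels. The machine-translation call is passed in as a function (`translate` is a placeholder, not a specific service or library). `roc_auc` is a plain rank-based ROC AUC (the Mann-Whitney formulation), shown as one reasonable way to score a single label on a small hand-labelled test set; its quadratic loop is only meant for small sets.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def translate_rows(rows: Iterable[Dict[str, str]],
                   translate: Callable[[str], str],
                   text_column: str = "comment_text") -> List[Dict[str, str]]:
    """Return copies of rows with only the text column translated.

    `translate` is a placeholder for whatever MT system is used; ids and
    labels are copied as-is, assuming translation does not change them.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the source rows are left untouched
        new_row[text_column] = translate(row[text_column])
        out.append(new_row)
    return out


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Usage with a toy "translator" standing in for a real MT system.
rows = [{"id": "1", "comment_text": "you are great", "toxic": "0"}]
fake_nl = {"you are great": "je bent geweldig"}
print(translate_rows(rows, lambda t: fake_nl.get(t, t)))
# [{'id': '1', 'comment_text': 'je bent geweldig', 'toxic': '0'}]

# Toy scores from a retrained model on a tiny labelled test set.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```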

Hope this helps!

SaadAhmed433 · 2022-11-15T14:22:06Z

@laurahanu I am trying to extend the multilingual model to Dutch. Since I did not have any labelled examples, I have translated the Jigsaw datasets into Dutch. Now I have the following questions:

Do we need to perform any preprocessing on the translated datasets?
The instructions in the README only show the command for training the model with one of the config files; is the stage 2 config used or not? Please clarify.
Two sources are mentioned for the translated datasets, and I am confused about which one to use.
Since I don't have a test dataset of Dutch comments, should I translate the entries in the test.csv files to include Dutch comments?

Thanks a lot.
