If you have enough labelled examples, you should be able to fine-tune the multilingual model directly (it is worth checking whether XLM-RoBERTa was trained on your language). Details on how to train this model are in the README; you might need to create a data loader for your new dataset.
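For illustration, here is a minimal sketch of what a data loader for a new labelled dataset could look like. It is not the repository's own loader: the class name `CommentDataset`, the `comment_text` column, and the label names are assumptions modelled on the Jigsaw toxic comment CSV layout, so adjust them to whatever your files and the README actually use. It only implements the map-style interface (`__len__` and `__getitem__`) that PyTorch's `DataLoader` expects.

```python
import csv
import io

# Assumed label columns, modelled on the Jigsaw toxic comment CSV layout.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


class CommentDataset:
    """Hypothetical map-style dataset over a labelled CSV of comments."""

    def __init__(self, csv_file, text_column="comment_text", labels=LABELS):
        self.rows = []
        for row in csv.DictReader(csv_file):
            text = row[text_column].strip()
            if not text:
                continue  # skip rows with an empty comment
            # One float per label, in the order given by `labels`.
            target = [float(row[name]) for name in labels]
            self.rows.append((text, target))

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        return self.rows[index]


# Tiny in-memory example standing in for a real CSV file.
sample = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "1,Wat een mooie dag,0,0,0,0,0,0\n"
    "2,,0,0,0,0,0,0\n"
    "3,Jij bent een idioot,1,0,0,0,1,0\n"
)
dataset = CommentDataset(sample)
```

With a real file you would pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.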
If you don't have enough labelled examples, you could translate the Jigsaw datasets into the new language and retrain the model, although you would probably still need to create a labelled test set to check the performance.
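As a rough sketch of the translation route: the idea is to replace only the comment text and carry the label columns over unchanged. The function `translate_csv`, the `comment_text` column name, and the `translate` callable are all assumptions for illustration; `translate` stands in for whichever translation model or service you choose and is not part of this repository.

```python
import csv
import io


def translate_csv(src, dst, translate, text_column="comment_text"):
    """Copy a labelled CSV, translating only the text column.

    `translate` is a caller-supplied function (str -> str); labels and
    all other columns are written back exactly as they were read.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        row[text_column] = translate(row[text_column])
        writer.writerow(row)
        count += 1
    return count


# Placeholder "translator" so the sketch runs without external services.
def fake_translate(text):
    return "[nl] " + text


source = io.StringIO("id,comment_text,toxic\n1,hello there,0\n2,you fool,1\n")
target = io.StringIO()
translated_rows = translate_csv(source, target, fake_translate)
```

Since machine translation can soften or distort toxic phrasing, a small hand-labelled test set in the target language is still the safer way to measure performance.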
@laurahanu I am trying to extend the multilingual model to the Dutch language. Since I did not have any labelled examples, I have translated the Jigsaw datasets to Dutch. I now have the following questions:
Do we need to perform any preprocessing on the translated datasets?
The instructions in the README only show the command for training the model with one of the config files. Is the stage 2 config not used? Please clarify whether it is used or not.
Two sources are mentioned for the translated datasets, and I am unsure which one to use.
Since I don't have a test dataset of Dutch comments, should I translate the entries in the test.csv files into Dutch?
How would one add a new language to the existing set? How would one extend what is already there with further examples?