
Create my own Model for Sentence Similarity/Automated Scoring #72

Closed
dhimasyoga16 opened this issue Aug 10, 2020 · 7 comments
@dhimasyoga16

I have a Wikipedia dump file as my corpus (it's in Indonesian; I've extracted it and converted it to .txt).
How can I fine-tune bert-multilingual-cased on my corpus with BERTScore, so I can have my own model for a specific task such as sentence similarity or automated short-answer scoring?

Or maybe I should do this with the original BERT?
Thank you so much in advance.

@Tiiiger
Owner

Tiiiger commented Aug 10, 2020

Hi @dhimasyoga16, thank you for your interest in this repo. I am not sure what the question is asking.

Are you asking how to fine-tune the bert-multilingual model? For that, check the huggingface examples to see how to continue training bert-multilingual with the masked language modeling objective. See https://github.com/huggingface/transformers/tree/master/examples/language-modeling.
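For concreteness, here is a minimal sketch of continued MLM training with the transformers Trainer API, roughly as the linked examples did it at the time (TextDataset has since been deprecated in favor of the datasets library). The corpus path, output directory, and hyperparameters are placeholders, not values from this thread:

```python
# Minimal continued-pretraining sketch with masked language modeling.
# Paths and hyperparameters below are illustrative placeholders.
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TextDataset,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Plain-text corpus, e.g. an extracted Wikipedia dump saved as .txt.
dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="id_wiki.txt",  # placeholder path to the Indonesian corpus
    block_size=128,
)
# Randomly masks 15% of tokens, the standard BERT MLM objective.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mbert-id-mlm", num_train_epochs=1),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("mbert-id-mlm")
tokenizer.save_pretrained("mbert-id-mlm")
```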

If you have already fine-tuned bert-multilingual, you can feed in the model path to --model when calling the score function.
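In the Python API the corresponding keyword is model_type, which also accepts a local path; when you pass a custom checkpoint, num_layers must be given explicitly. A minimal sketch, with placeholder sentences and the placeholder checkpoint path from the sketch above:

```python
# Scoring with a locally fine-tuned checkpoint via the Python API.
from bert_score import score

cands = ["jawaban siswa ..."]  # candidate sentences (placeholders)
refs = ["kunci jawaban ..."]   # reference sentences (placeholders)

# model_type accepts a local directory saved with save_pretrained;
# num_layers selects which hidden layer the embeddings come from.
P, R, F1 = score(
    cands,
    refs,
    model_type="mbert-id-mlm",  # placeholder path
    num_layers=9,               # illustrative value; tune on validation data
)
print(F1.mean().item())
```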

Feel free to follow up with more questions.

@dhimasyoga16
Author

Hi, thank you so much for the quick assist and the answer.
I'm also sorry my question was hard to understand, my bad.

Can I ask one more question?
I've done feature extraction by running extract_features.py, and it generated a 17.3 GB JSON file. Can I use that JSON file as my model?
I want BERTScore to analyze Indonesian sentences/texts better.

Thank you so much once again :)

@Tiiiger
Owner

Tiiiger commented Aug 11, 2020

Hi @dhimasyoga16, which extract_features.py file are you talking about? Is it in this repo?

If you have precomputed the features, you can modify the code (https://github.com/Tiiiger/bert_score/blob/master/bert_score/utils.py#L253) to load the features instead of computing them again.
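A rough illustration of that idea, not the actual utils.py code: cache embeddings on disk and reload them instead of recomputing. All names and paths here are placeholders:

```python
# Illustrative caching pattern only; this is not the bert_score API.
import torch

FEATURE_CACHE = "precomputed_features.pt"  # placeholder path

def load_or_compute_embeddings(sentences, compute_fn):
    """Return cached embeddings when available; compute and cache the rest."""
    try:
        cache = torch.load(FEATURE_CACHE)
    except FileNotFoundError:
        cache = {}
    missing = [s for s in sentences if s not in cache]
    if missing:
        for s, emb in zip(missing, compute_fn(missing)):
            cache[s] = emb
        torch.save(cache, FEATURE_CACHE)
    return [cache[s] for s in sentences]
```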

Tiiiger closed this as completed on Aug 13, 2020
@dhimasyoga16
Author

Hi, sorry for the inactivity on this issue.
Referring to this link: https://github.com/huggingface/transformers/tree/master/examples/language-modeling

How can I fine-tune the bert-multilingual model for Indonesian? Can I use a Wikipedia dump file as a corpus?

@Tiiiger
Owner

Tiiiger commented Aug 15, 2020

Hi @dhimasyoga16, this question is better posed to the huggingface repo. Hopefully they will have detailed instructions.

We are not really experts on this topic.

@dhimasyoga16
Author

Hi, I've successfully created my language model using huggingface transformers.

When I run a test (using my own model, of course), why does --num_layers affect the score?
For example, --num_layers 2 gives me a lower score than --num_layers 6.
Is there a detailed explanation of this?
And which --num_layers will give better scoring accuracy?

Sorry for so many questions, I'm new to NLP.

@Tiiiger
Owner

Tiiiger commented Aug 19, 2020

Hi @dhimasyoga16, please see our paper for the effect of using different --num_layers. Basically, this argument controls which layer of pre-trained representations you are using. It is hard to say which --num_layers would work best for your application without seeing any validation data on our side.
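One common way to choose is a small sweep: score a validation set with each layer and keep the layer whose F1 correlates best with human judgments. A sketch, where load_validation_data is a hypothetical helper and the checkpoint path is the placeholder from earlier:

```python
# Layer sweep on a validation set with human similarity labels.
from scipy.stats import pearsonr
from bert_score import score

cands, refs, human = load_validation_data()  # hypothetical helper

best = None
for layer in range(1, 13):  # bert-base models have 12 layers
    _, _, F1 = score(cands, refs, model_type="mbert-id-mlm", num_layers=layer)
    r, _ = pearsonr(F1.tolist(), human)
    print(f"layer {layer}: pearson r = {r:.3f}")
    if best is None or r > best[1]:
        best = (layer, r)
print("best layer:", best)
```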
