Training custom data not working #18
Comments
There is one mistake in the data above: the separator on each line is a space, which caused the training process to fail.
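The separator problem can be fixed mechanically. The sketch below converts space-separated training lines into tab-separated `syllable<TAB>tag` columns, which is the column layout CRF-style taggers commonly expect; the exact format the training script wants is an assumption here, so check it against the script before relying on this:

```python
def fix_separators(lines):
    """Convert space-separated 'token tag' lines to tab-separated ones.

    Assumes (hypothetically) one token and one tag per line, with a
    blank line marking a sentence boundary. The token may itself
    contain spaces; the tag never does, so we split on the LAST space.
    """
    fixed = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():              # blank line = sentence boundary
            fixed.append("")
            continue
        token, _, tag = line.rpartition(" ")
        fixed.append(f"{token}\t{tag}" if token else line)
    return fixed
```

Splitting on the last space rather than the first keeps multi-syllable tokens intact if the corpus happens to contain them.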
@vietvudanh could you provide the error log from when you call the word_tokenize function?
There was no error, but the output is not tokenized correctly at all.
My questions are:
and both generated models still do not work.
About your first question: I think we need to feed the model more data than one or two simple examples; that is not enough for the model to "learn" the pattern. If you want to build your own custom word_tokenize, feel free to integrate it with underthesea. You can simply "export" your model and wrap it in a script. I know this is not obvious right now, so I will update the documentation and pipeline in the next few days. Don't hesitate to post your progress in this issue.
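The "export your model and wrap it in a script" step can be sketched without any library dependency. Underthesea's word tokenizer is, roughly, a syllable-level sequence tagger whose labels mark where words begin and continue; the `B-W`/`I-W` tag names and the `predict` callable below are assumptions about your exported model, not a confirmed API:

```python
def merge_tags(syllables, tags):
    """Merge tagged syllables into words: B-W starts a word, I-W continues it."""
    words = []
    for syllable, tag in zip(syllables, tags):
        if tag == "I-W" and words:
            words[-1] += " " + syllable   # continue the current word
        else:
            words.append(syllable)        # start a new word
    return words


def word_tokenize(sentence, predict):
    """Wrap an exported tagging model as a tokenizer.

    `predict` stands in for your exported model's tagging function
    (hypothetical), mapping a list of syllables to a list of tags.
    """
    syllables = sentence.split()
    return merge_tags(syllables, predict(syllables))
```

With a model that tags "học sinh" as one word, `word_tokenize("học sinh đi", predict)` would return `["học sinh", "đi"]`.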
Sure, I will spend more time working on the code. Further guidelines are much appreciated. I still don't understand why the model trained on the original VLSP data is not working, though.
I am trying to train on custom input data.
Just a simple text file:
custom_train_data.txt
and I ran the training script as
which generated the
model.bin
file OK. However, I then replaced the model in underthesea at
underthesea/underthesea/word_tokenize/model_9.bin
(I set debug mode and confirmed the right model was loaded) and tried to tokenize a string using it, which did not work. So what do you think is the problem here?
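For reference, the kind of tagged-corpus layout that CRF-style tokenizer training typically expects looks like the following: one syllable and its tag per line, tab-separated, with a blank line between sentences. This is an assumption about the script's expected input, not the contents of the original custom_train_data.txt:

```text
Học	B-W
sinh	I-W
đi	B-W
học	B-W

Tôi	B-W
```

If the custom file deviates from the format the script expects (separator, tag set, or sentence boundaries), training can appear to succeed while producing a model that tags nothing usefully.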