About the `tokenizer.texts_to_sequences` method: if the train, validation, and test sets are each tokenized with their own word index, the same word can end up with a different index in each set. When predicting on an unlabeled dataset, this mismatch can cause serious errors.

I trained a Chinese comment sentiment-analysis model following your code, calling `texts_to_sequences` on each set independently, and the F1-score was only 0.7382 — poor, and lower than Naive Bayes (0.8988).

Once I found this issue, I built `tokenizer.word_index` from the entire dataset, so every word has exactly one index and that index is shared across all the sets. Otherwise, as described above, the same word gets a different index and therefore a different vector. With this change the F1-score rose to 0.9325. Incidentally, my result ranked first in the competition.

The core point is the strategy used to build `tokenizer.word_index`, and whether the different datasets share a complete word vocabulary.
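A minimal sketch of the fix described above, assuming a Keras-style pipeline; the text variables (`train_texts`, `val_texts`, `test_texts`) and `maxlen` are placeholders:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Placeholder, already word-segmented Chinese comments.
train_texts = ["这个 产品 很 好", "质量 太 差 了"]
val_texts   = ["物流 很 快"]
test_texts  = ["非常 满意"]

# Problematic approach: fitting a separate Tokenizer per split gives the same
# word different indices in train/val/test, so the embedding layer sees
# inconsistent inputs at prediction time.

# Fix: build ONE word_index from the full corpus, then reuse it for every split.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_texts + val_texts + test_texts)  # single shared vocabulary

X_train = pad_sequences(tokenizer.texts_to_sequences(train_texts), maxlen=100)
X_val   = pad_sequences(tokenizer.texts_to_sequences(val_texts),   maxlen=100)
X_test  = pad_sequences(tokenizer.texts_to_sequences(test_texts),  maxlen=100)

# Every split now maps each word to the same integer index.
print(tokenizer.word_index)
```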