Actually, does this even make sense? Keras' sample-weighting mechanism only applies to training samples, not at test time. And wouldn't scaling a word vector just turn it into a different word's vector? Need to check whether there's precedent in the literature for this.
It is troubling that in the experimental results in #20 a bag-of-words SVM does as well as a neural LSTM, because that indicates I'm not effectively making use of the sequential structure of the data. One possible issue is that the neural network models treat all words as equally informative. TF-IDF weighting is one way to address this. Would max pooling in a convnet model (#17) be another?
Optionally weight input word vectors by their TF-IDF scores. The weights should be easy to get from scikit-learn.
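A minimal sketch of what this might look like with scikit-learn's `TfidfVectorizer`, producing one TF-IDF-weighted average vector per document. The `embeddings` dict (word → vector, e.g. loaded from pretrained word2vec/GloVe) and the function name are hypothetical placeholders, not anything in the repo:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def tfidf_weighted_vectors(docs, embeddings, dim):
    """Return one vector per document: the TF-IDF-weighted mean of its word vectors.

    docs       -- list of raw text strings
    embeddings -- hypothetical dict mapping word -> np.ndarray of shape (dim,)
    dim        -- dimensionality of the word vectors
    """
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(docs)  # sparse matrix, shape (n_docs, n_vocab)
    vocab = vectorizer.get_feature_names_out()

    out = np.zeros((len(docs), dim))
    for i in range(len(docs)):
        row = tfidf.getrow(i)
        total_weight = 0.0
        # Walk the nonzero TF-IDF entries for this document and accumulate
        # the weighted word vectors, skipping out-of-vocabulary words.
        for j, weight in zip(row.indices, row.data):
            word = vocab[j]
            if word in embeddings:
                out[i] += weight * embeddings[word]
                total_weight += weight
        if total_weight > 0:
            out[i] /= total_weight
    return out
```

For the sequence models the analogous move would presumably be scaling each timestep's input vector by that word's TF-IDF weight rather than averaging, which is exactly the step whose soundness is questioned above.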