Optional TF-IDF weighting #6

Closed
wpm opened this issue Jul 9, 2017 · 3 comments

Comments


wpm commented Jul 9, 2017

Optionally weight input vectors by TF-IDF. Should be easy to get from scikit-learn.
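
A minimal sketch of what this could look like with scikit-learn (the corpus and variable names below are placeholders, not project code):

```python
# Hypothetical sketch: TF-IDF-weighted document vectors via scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer

documents = ["the cat sat on the mat", "the dog sat on the log"]  # placeholder corpus

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)  # sparse (n_documents, n_terms) matrix of TF-IDF weights
print(X.shape)
```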

wpm self-assigned this Jul 10, 2017

wpm commented Jul 17, 2017

Use the TfidfTransformer. Serialize the trained transformer as part of the model.
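
Roughly, assuming count-vector inputs and joblib for persistence (the actual serialization format is an open choice):

```python
# Sketch: fit a TfidfTransformer on term counts and persist it with the model.
import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

corpus = ["the cat sat on the mat", "the dog sat on the log"]  # placeholder corpus
counts = CountVectorizer().fit_transform(corpus)

tfidf = TfidfTransformer().fit(counts)
weighted = tfidf.transform(counts)  # TF-IDF-weighted input vectors

joblib.dump(tfidf, "tfidf_transformer.joblib")   # save alongside the model
tfidf = joblib.load("tfidf_transformer.joblib")  # reload at prediction time
```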


wpm commented Jul 20, 2017

Actually, does this even make sense? Keras' weighting mechanism only applies to training samples, not at test time. Wouldn't scaling a word vector just turn it into another word? I have to see whether there's precedent in the literature for this.
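
To make the first point concrete, a toy illustration (hypothetical model and data, not project code) of where Keras accepts sample weights:

```python
# sample_weight only enters the loss during fit(); predict() takes no weights.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(1, activation="sigmoid", input_shape=(4,))])
model.compile(optimizer="adam", loss="binary_crossentropy")

x = np.random.rand(16, 4)
y = np.random.randint(0, 2, size=16)
w = np.random.rand(16)  # per-sample weights

model.fit(x, y, sample_weight=w, epochs=1, verbose=0)  # weighting applies here
model.predict(x)                                        # but not here
```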

It is troubling that in the experimental results in #20 a bag-of-words SVM does as well as a neural LSTM, because that indicates I'm not able to make effective use of sequential data. It's possible that the issue is that I'm treating all words as equally informative in the neural network models. TF-IDF weighting is one way to address this. Would max pooling in a convnet model (#17) be another?
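
For reference, a minimal sketch of the max-pooling alternative (layer sizes are placeholders, not the configuration from #17):

```python
# Sketch: Conv1D followed by global max pooling, which keeps only the strongest
# activation per filter instead of treating every position as equally informative.
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

model = Sequential([
    Embedding(input_dim=10000, output_dim=128, input_length=200),
    Conv1D(filters=64, kernel_size=5, activation="relu"),
    GlobalMaxPooling1D(),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```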

wpm added the wontfix label Aug 3, 2017

wpm commented Aug 3, 2017

Nah. Not going to do this. Too many other avenues to explore.

wpm closed this as completed Aug 3, 2017