Topic Classification #5

Closed
TyJK opened this issue May 9, 2017 · 1 comment

Comments

@TyJK
Owner

TyJK commented May 9, 2017

Creating an Initial Topic Identification Model

We have trained vector models with both Word2Vec and Doc2Vec, and we now aim to use these vectors as features for a classifier or topic model that correctly identifies when a topic from a predefined list is being discussed in a comment. We are considering several data sources, including custom but imperfect datasets that use subreddit names as labels (generalized into broader topics), or a classic dataset such as 20 Newsgroups as a proof of concept.
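One common way to turn word vectors into per-comment features is to average the vectors of the words in the comment. The sketch below illustrates that idea only; the toy 3-dimensional embeddings, the `WORD_VECTORS` table, and the `comment_features` helper are hypothetical stand-ins, not the project's actual model or code.

```python
from typing import Dict, List, Sequence

# Toy 3-dimensional embeddings standing in for a trained Word2Vec model.
# These values are made up for illustration.
WORD_VECTORS: Dict[str, List[float]] = {
    "goal": [1.0, 0.0, 0.0],
    "match": [0.8, 0.2, 0.0],
    "election": [0.0, 1.0, 0.0],
    "vote": [0.0, 0.8, 0.2],
}


def comment_features(tokens: Sequence[str],
                     word_vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of all in-vocabulary tokens in a comment."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    if not known:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Out-of-vocabulary words ("great", "in", "the") are simply skipped.
features = comment_features(["great", "goal", "in", "the", "match"], WORD_VECTORS)
print(features)
```

The resulting fixed-length vector could then be passed to any standard classifier once labelled comments are available.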

We will be using the gensim library to create the model and hope to have it completed by the end of the week.

Any expertise or advice on topic modeling would be appreciated.

@TyJK
Owner Author

TyJK commented May 16, 2017

Our Doc2Vec model is set up with a training suite that lets us compare distantly labelled comments and return a list of related subreddits. While this is not a true classifier by any stretch, we feel it is the best we can do without a labelled dataset, so this issue is pending data collection for now. We will attempt to cluster the 'documents' (subreddits) into more cohesive, unsupervised categories to see whether that improves results, but supervised learning will most likely be the solution.
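To make the "list of related subreddits" idea concrete, here is a minimal sketch that ranks subreddit vectors by cosine similarity to a comment vector. It assumes each subreddit is represented by a single document vector; the vectors, the subreddit names, and the `related_subreddits` helper are hypothetical and written in plain Python rather than against the project's Doc2Vec model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_subreddits(comment_vec: Sequence[float],
                       subreddit_vecs: Dict[str, List[float]],
                       topn: int = 2) -> List[Tuple[str, float]]:
    """Return the topn subreddits most similar to the comment vector."""
    scored = [(name, cosine(comment_vec, vec))
              for name, vec in subreddit_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Made-up document vectors, one per subreddit (illustration only).
SUBREDDIT_VECTORS: Dict[str, List[float]] = {
    "soccer": [0.9, 0.1, 0.0],
    "politics": [0.1, 0.9, 0.1],
    "nba": [0.8, 0.0, 0.3],
}

# A made-up vector standing in for one inferred from a new comment.
ranked = related_subreddits([1.0, 0.1, 0.1], SUBREDDIT_VECTORS)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

A ranking like this returns neighbours rather than a single predicted label, which is why it falls short of a true classifier.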

@TyJK TyJK closed this as completed Mar 16, 2019