We have created vector models with both Word2Vec and Doc2Vec, and we now aim to use these vectors as features for a classification or topic model that correctly identifies when a topic from a predefined list is being discussed in a comment. We are looking at several options, including custom, though imperfect, datasets that use subreddit names as labels (generalized into broader topics), or a classic dataset such as 20 Newsgroups as a proof of concept.
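One common baseline for turning word vectors into comment features is to average the vectors of a comment's known tokens and then fit a classifier on those averages, using subreddit names mapped to broader topics as (distant) labels. The sketch below is a minimal illustration under stated assumptions: `TOY_VECTORS` is a hypothetical stand-in for a trained Word2Vec model's `wv` lookup, `SUBREDDIT_TO_TOPIC` and all helper names are hypothetical, and a simple nearest-centroid rule (cosine similarity to each topic's mean vector) is swapped in for whatever supervised model is eventually chosen.

```python
import numpy as np

# Hypothetical stand-in for a trained gensim Word2Vec model's KeyedVectors:
# with a real model you would use `model.wv[token]` and `token in model.wv`.
TOY_VECTORS = {
    "gpu": np.array([0.9, 0.1, 0.0]),
    "cpu": np.array([0.8, 0.2, 0.1]),
    "driver": np.array([0.7, 0.0, 0.2]),
    "goal": np.array([0.0, 0.9, 0.1]),
    "striker": np.array([0.1, 0.8, 0.0]),
    "league": np.array([0.0, 0.7, 0.3]),
}

# Hypothetical mapping that generalizes subreddit names into broader topics.
SUBREDDIT_TO_TOPIC = {
    "buildapc": "technology",
    "nvidia": "technology",
    "soccer": "sports",
    "PremierLeague": "sports",
}


def comment_vector(tokens, vectors):
    """Mean of the vectors of in-vocabulary tokens (zero vector if none are known)."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    return np.mean(known, axis=0) if known else np.zeros(dim)


def fit_centroids(labelled_comments, vectors):
    """labelled_comments: list of (tokens, subreddit). Returns {topic: mean feature vector}."""
    by_topic = {}
    for tokens, subreddit in labelled_comments:
        topic = SUBREDDIT_TO_TOPIC[subreddit]
        by_topic.setdefault(topic, []).append(comment_vector(tokens, vectors))
    return {topic: np.mean(feats, axis=0) for topic, feats in by_topic.items()}


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def predict_topic(tokens, centroids, vectors):
    """Return the topic whose centroid is most cosine-similar to the comment vector."""
    v = comment_vector(tokens, vectors)
    return max(centroids, key=lambda topic: cosine(v, centroids[topic]))


train = [
    (["gpu", "driver"], "nvidia"),
    (["cpu", "gpu"], "buildapc"),
    (["goal", "striker"], "soccer"),
    (["league", "goal"], "PremierLeague"),
]
centroids = fit_centroids(train, TOY_VECTORS)
print(predict_topic(["striker", "league"], centroids, TOY_VECTORS))  # sports
```

Once more labelled data is available, the centroid step could be replaced by a proper classifier (for example scikit-learn's `LogisticRegression`) fitted on the same averaged feature vectors; the feature extraction stays the same.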
We will be using the gensim library to create the model and hope to have it completed by the end of the week.
Any expertise or advice on topic modeling would be appreciated.
Our Doc2Vec model is set up with a training suite that lets us compare distantly labelled comments and return a list of related subreddits. It is by no means a true classifier, but without a labelled dataset we feel this is the best we can do, so this issue is on hold pending data collection. We will try clustering the 'documents' (subreddits) into more cohesive, unsupervised categories to see whether that gives better results, but supervised learning will most likely be the solution.
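Clustering the subreddit vectors could look roughly like the following. This is a sketch under stated assumptions: plain k-means (Lloyd's algorithm, with a deterministic farthest-point initialization) is swapped in as the clustering method, `SUBREDDIT_VECTORS` holds hypothetical 2-D toy vectors rather than real Doc2Vec output, and the helper names are hypothetical. Rows are unit-normalized first so that Euclidean distance tracks cosine similarity, which is how Doc2Vec vectors are usually compared.

```python
import numpy as np

# Hypothetical subreddit vectors; with gensim 4.x these could come from
# {tag: model.dv[tag] for tag in model.dv.index_to_key}.
SUBREDDIT_VECTORS = {
    "buildapc": [0.9, 0.1],
    "nvidia": [0.8, 0.2],
    "soccer": [0.1, 0.9],
    "PremierLeague": [0.2, 0.8],
}


def kmeans_labels(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    # First centre is the first point; each further centre is the point
    # farthest from every centre chosen so far.
    centres = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centres], axis=0)
        centres.append(points[int(np.argmax(dists))])
    centres = np.array(centres, dtype=float)

    for _ in range(iters):
        # Assignment step: index of the nearest centre for every point.
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: move each centre to the mean of its points (keep it if empty).
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels


def cluster_subreddits(vectors, k):
    """Group subreddit names into k unsupervised categories."""
    names = list(vectors)
    points = np.array([vectors[n] for n in names], dtype=float)
    # Unit-normalize rows so Euclidean distance behaves like cosine distance.
    points /= np.linalg.norm(points, axis=1, keepdims=True)
    groups = {}
    for name, label in zip(names, kmeans_labels(points, k)):
        groups.setdefault(int(label), []).append(name)
    return list(groups.values())


print(cluster_subreddits(SUBREDDIT_VECTORS, k=2))
# [['buildapc', 'nvidia'], ['soccer', 'PremierLeague']]
```

The number of clusters `k` has to be chosen by hand here; on real subreddit vectors an off-the-shelf implementation such as scikit-learn's `KMeans` would be the more practical choice, and the resulting groups would still need to be inspected before being treated as topics.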
Creating an Initial Topic Identification Model