This is the implementation of the paper:
- Tian Shi, Kyeongpil Kang, Jaegul Choo and Chandan K. Reddy, "Short-Text Topic Modeling via Non-negative Matrix Factorization Enriched with Local Word-Context Correlations", In Proceedings of the International Conference on World Wide Web (WWW), Lyon, France, April 2018.
Requirements:
- Python 3.5.2
- argparse
Pre-processing:
- Tokenize the text with NLTK, spaCy, or CoreNLP.
- Remove special characters.
- Remove stop words.
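The pre-processing steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: a regular expression plus a whitespace split stands in for NLTK, spaCy, or CoreNLP, and both the preprocess function and the tiny STOP_WORDS set are hypothetical.

```python
import re

# Hypothetical stop-word list for illustration only; a real pipeline would
# use a fuller list, such as the ones shipped with NLTK or spaCy.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "for"}


def preprocess(text):
    """Lower-case, strip special characters, tokenize, and drop stop words."""
    text = text.lower()
    # Replace every character that is not an ASCII letter, digit, or
    # whitespace with a space (note: this also drops accented letters).
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Simple whitespace tokenizer standing in for NLTK/spaCy/CoreNLP.
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The Web Conference is in Lyon, France!"))
    # ['web', 'conference', 'lyon', 'france']
```

A real tokenizer handles contractions, URLs, and non-ASCII text far better than this regex; the sketch only shows the order of the three steps.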
Usage:
- Edit the arguments of data_process.py as needed.
- Run python3 data_process.py to prepare the document-term matrix and vocabulary.
- Run python3 train.py --help to see the full list of options.
- Run python3 vis_topic.py to calculate the PMI (pointwise mutual information) score and visualize the top keywords of each topic.
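To make the data_process.py step concrete, the following is a minimal sketch of turning pre-processed documents into a vocabulary and a document-term count matrix. The function names build_vocab and doc_term_matrix, the min_count parameter, and the dense list-of-lists layout are hypothetical illustrations; the actual output format of data_process.py is not described here, and a real corpus would normally be stored as a sparse matrix.

```python
from collections import Counter


def build_vocab(docs, min_count=1):
    """Map each word occurring at least min_count times to an integer id.

    Words are sorted alphabetically so that ids are deterministic.
    """
    counts = Counter(w for doc in docs for w in doc)
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(words)}


def doc_term_matrix(docs, vocab):
    """Return a dense n_docs x n_words matrix of raw term counts."""
    matrix = [[0] * len(vocab) for _ in docs]
    for row, doc in zip(matrix, docs):
        for w in doc:
            if w in vocab:  # skip out-of-vocabulary words
                row[vocab[w]] += 1
    return matrix


if __name__ == "__main__":
    docs = [["nmf", "topic", "topic"], ["short", "text", "topic"]]
    vocab = build_vocab(docs)
    print(vocab)                         # {'nmf': 0, 'short': 1, 'text': 2, 'topic': 3}
    print(doc_term_matrix(docs, vocab))  # [[1, 0, 0, 2], [0, 1, 1, 1]]
```

Each row is one short text and each column one vocabulary word, which is the kind of non-negative matrix that NMF-based topic models factorize.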
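For intuition about what train.py optimizes, here is a minimal sketch of plain NMF using the Lee and Seung multiplicative updates for the squared Frobenius loss, which factorize a non-negative matrix A (documents x words) as W (documents x topics) times H (topics x words). This is standard NMF, not SeaNMF: per the paper title, SeaNMF additionally incorporates local word-context correlations, which this sketch omits. The names nmf, matmul, transpose, and frobenius_error are hypothetical and are not the API of train.py; pure Python lists are used only to keep the example dependency-free.

```python
import random


def transpose(X):
    """Transpose a matrix stored as a list of lists."""
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    """Multiply an (a x b) matrix by a (b x c) matrix, both lists of lists."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate non-negative A (n x m) by W (n x k) @ H (k x m).

    Uses multiplicative updates, which keep W and H non-negative and do not
    increase ||A - WH||_F; eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def frobenius_error(A, W, H):
    """Return ||A - WH||_F."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH)
               for a, b in zip(ra, rb)) ** 0.5
```

In this layout, row t of H holds the word weights of topic t, so the top keywords of a topic are the columns with the largest entries in that row. A production implementation would use NumPy or a library solver instead of nested lists.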
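The PMI step in vis_topic.py can be illustrated with a common definition of topic coherence: the average, over all pairs of a topic's top keywords, of PMI(wi, wj) = log(p(wi, wj) / (p(wi) p(wj))). In this minimal sketch, probabilities are estimated from document co-occurrence (the fraction of documents containing a word or a word pair) and a small eps avoids log(0). These estimation choices and the helper names top_keywords and pmi_score are assumptions for illustration; the exact corpus, smoothing, and number of keywords used by vis_topic.py are not stated here.

```python
import math
from itertools import combinations


def top_keywords(topic_row, id2word, n=10):
    """Return the n words with the largest weights in one topic's word vector."""
    order = sorted(range(len(topic_row)), key=lambda j: topic_row[j], reverse=True)
    return [id2word[j] for j in order[:n]]


def pmi_score(words, docs, eps=1e-12):
    """Average pairwise PMI of words, estimated from document co-occurrence.

    p(w) = df(w) / N and p(wi, wj) = df(wi, wj) / N, where df counts the
    documents containing all the given words and N is the number of documents.
    """
    doc_sets = [set(d) for d in docs]
    n_docs = len(doc_sets)

    def prob(*ws):
        return sum(all(w in d for w in ws) for d in doc_sets) / n_docs

    scores = []
    for wi, wj in combinations(words, 2):
        pi, pj = prob(wi), prob(wj)
        if pi == 0 or pj == 0:
            continue  # PMI is undefined for words absent from the corpus
        scores.append(math.log((prob(wi, wj) + eps) / (pi * pj)))
    return sum(scores) / len(scores) if scores else 0.0
```

For example, with docs = [["a", "b"], ["a", "b"], ["c"], ["c", "d"]], the words "a" and "b" each appear in half the documents and always together, so pmi_score(["a", "b"], docs) is approximately log(2). Higher average PMI indicates that a topic's keywords tend to co-occur.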