wangpengcn/Auto-Generated-Insights-of-2019-HR-Tech-Conference-Twitter

Auto Generated Insights of 2019 HR Tech Conference Twitter

I scrape tweets tagged #HRTechConf and build a Latent Dirichlet Allocation (LDA) model to automatically detect and interpret the topics in those tweets. The pipeline is:

  1. Data gathering – Twitter scraping
  2. Data pre-processing
  3. Word cloud generation
  4. LDA model training
  5. Topic visualization
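
As a rough illustration of steps 2 and 3, the sketch below cleans a tweet and counts term frequencies, which is the raw input a word cloud is drawn from. It is a minimal standard-library example, not the code in this repository: the function names, the sample tweets, and the tiny stop-word list are all hypothetical, and a real run would more likely use NLTK's stop words and lemmatizer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption for this sketch);
# NLTK ships a much larger English list.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "for", "on", "at", "with", "this"}

URL_RE = re.compile(r"https?://\S+")   # links carry no topical signal
MENTION_RE = re.compile(r"@\w+")       # user handles
TOKEN_RE = re.compile(r"[a-z]+")       # keep alphabetic runs only


def preprocess(tweet):
    """Lowercase a tweet, strip URLs and mentions, tokenize, drop stop words."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text)
    # Single letters are almost always noise after punctuation removal.
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def term_frequencies(tweets):
    """Count how often each cleaned token occurs across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(preprocess(tweet))
    return counts


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "Great keynote on AI in recruiting at #HRTechConf https://example.com/x",
    "@someone The future of recruiting is data and AI #HRTechConf",
]

if __name__ == "__main__":
    for word, count in term_frequencies(sample_tweets).most_common(5):
        print(word, count)
```

The same token lists can be fed to the LDA step, so in practice the cleaning is done once and reused for both the word cloud and the topic model.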

Install

This project requires Python 3.6+ and the following Python libraries:

  • TwitterScraper, a Python script for scraping tweets
  • NLTK (Natural Language Toolkit), an NLP package for text processing, e.g. stop words, punctuation, tokenization, lemmatization, etc.
  • Gensim, “generate similar”, a popular NLP package for topic modeling
  • Latent Dirichlet Allocation (LDA), a generative, probabilistic model for topic clustering/modeling
  • pyLDAvis, an interactive LDA visualization package, designed to help interpret topics in a topic model that is trained on a corpus of text data
  • NumPy
  • Pandas
  • matplotlib

Code

Code is provided in HRTech2019_LDA.py.

Topic Visualization

Interactive topic visualization image
