A script that automatically infers the topics discussed in a collection of documents.
LICENSE
README.md
topic-modeling-using-LDA.py
yelp_academic_dataset_review-short-version.json
yelp_academic_dataset_review.json

Overview

Topic models automatically infer the topics discussed in a collection of documents. These topics can be used to summarize and organize documents, or used for featurization and dimensionality reduction in later stages of the data analysis.

LDA (Latent Dirichlet Allocation) is a topic model. I used LDA in this project to derive ‘topics’ from the dataset provided. The code is written in Python.

Dataset

The dataset was obtained from Yelp’s website.
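As a rough sketch of how the review text could be read in, the snippet below assumes the dataset file stores one JSON object per line and that each object keeps the review body under a `text` key. Both points are assumptions about the file layout and are not confirmed here; the helper name `load_review_texts` and its `limit` parameter are hypothetical.

```python
import json


def load_review_texts(path, limit=None):
    """Collect review bodies from a JSON-lines file (hypothetical helper).

    Assumes one JSON object per line with the review body under "text".
    """
    texts = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if "text" in record:
                texts.append(record["text"])
            if limit is not None and len(texts) >= limit:
                break  # stop early so a large file is not read in full
    return texts
```

For a quick experiment, something like `load_review_texts("yelp_academic_dataset_review-short-version.json", limit=1000)` would keep the run small.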

Script Steps

  1. Prepare the data:
     • Tokenizing
     • Stopping (removing stop words)
     • Stemming
  2. Construct a document-term matrix
  3. Apply the LDA model
  4. Examine the results
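The steps above can be sketched end to end using only the Python standard library. This is an illustration under stated assumptions, not the code in `topic-modeling-using-LDA.py`. The stop-word list is a tiny placeholder, `crude_stem` is a rough stand-in for a real stemmer, the LDA step is a minimal collapsed Gibbs sampler where a real script would more likely call an existing LDA library, and the sample reviews are made up. All function names and parameter values are hypothetical.

```python
import random
import re
from collections import Counter

# Tiny placeholder stop-word list (assumption); a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "were", "to", "of", "in", "it", "i", "we"}


def crude_stem(token):
    """Strip a few common suffixes; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def prepare(text):
    """Step 1: tokenize, drop stop words, and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Step 2: return (vocab, rows); rows[d][v] counts term v in document d."""
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for term, count in Counter(doc).items():
            row[index[term]] = count
        rows.append(row)
    return vocab, rows


def lda_gibbs(rows, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Step 3: collapsed Gibbs sampling for LDA; returns topic-term counts."""
    rng = random.Random(seed)
    vocab_size = len(rows[0])
    # Expand each count row back into a list of word ids.
    docs = [[v for v, n in enumerate(row) for _ in range(n)] for row in rows]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_term = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    def bump(d, w, z, delta):
        doc_topic[d][z] += delta
        topic_term[z][w] += delta
        topic_total[z] += delta

    # Random initial topic assignment for every word occurrence.
    assign = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            bump(d, w, assign[d][i], 1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, w, assign[d][i], -1)  # remove the current assignment
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_term[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                assign[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                bump(d, w, assign[d][i], 1)  # record the resampled topic
    return topic_term


def top_terms(topic_term, vocab, n=3):
    """Step 4: list the n highest-count terms for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -counts[v])[:n]]
        for counts in topic_term
    ]


# Made-up sample reviews for illustration; not taken from the dataset.
reviews = [
    "The pizza was great and the pasta was tasty",
    "Great pasta, tasty pizza, friendly waiter",
    "The mechanic fixed my car quickly",
    "My car needed new tires and the mechanic replaced them",
]
docs = [prepare(review) for review in reviews]
vocab, rows = document_term_matrix(docs)
topic_term = lda_gibbs(rows, num_topics=2)
for k, terms in enumerate(top_terms(topic_term, vocab)):
    print(f"topic {k}: {terms}")
```

With so little text the topics are not meaningful; the point is only to show how each step feeds the next. The values chosen for `num_topics`, `alpha`, `beta`, and `iterations` are arbitrary and would need tuning on the real reviews.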

License

This project is licensed under the GNU 2.0 License - see the LICENSE file for details.