📜 Understanding Probabilistic Topic Models with Simulation in Python


By Tim Hopper: tdhopper.com


Understanding Probabilistic Topic Models By Simulation



Latent Dirichlet Allocation and related topic models are often presented in the form of complicated equations and confusing diagrams. I will present LDA as a generative model through probabilistic simulation in simple Python. Simulation will help data scientists understand the model's assumptions and limitations and use black-box LDA implementations more effectively.
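One of the model assumptions that simulation makes concrete is the Dirichlet prior on each document's topic mix. The sketch below is illustrative only and is not taken from the talk's notebook. It assumes the standard construction of a Dirichlet draw (normalize independent Gamma(alpha_i, 1) draws) and uses only Python's standard library. The helper names `sample_dirichlet` and `mean_largest_share` are hypothetical. It estimates how much weight the single largest topic tends to receive at different concentration values.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_share(concentration, n_topics=5, n_draws=1000, seed=0):
    """Average weight of the single largest topic across many simulated topic mixes.

    Uses a symmetric Dirichlet: every topic gets the same concentration value.
    """
    rng = random.Random(seed)
    alpha = [concentration] * n_topics
    return sum(max(sample_dirichlet(alpha, rng)) for _ in range(n_draws)) / n_draws


if __name__ == "__main__":
    # Small concentrations tend to put most of the mass on a few topics;
    # large concentrations tend to spread it almost evenly.
    for c in (0.1, 1.0, 100.0):
        print(c, round(mean_largest_share(c), 2))
```

Running it shows, without any inference code, why the concentration hyperparameter controls how "focused" simulated documents are.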


Without training in probabilistic graphical models and measure theory, data scientists may have a hard time understanding Latent Dirichlet Allocation and other probabilistic topic models. However, because LDA is a generative model, we can write Python code to generate data based on the model's assumptions.
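A minimal sketch of that idea using only Python's standard library. The function and parameter names (`simulate_lda`, `alpha`, `beta`, `doc_length`) are illustrative assumptions, not the talk's code. Here `beta` is the symmetric Dirichlet prior on each topic's word distribution (some references call it eta). For simplicity every document has a fixed length, whereas the original LDA formulation draws each document's length from a Poisson distribution.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_lda(vocab, n_topics, n_docs, doc_length, alpha, beta, seed=None):
    """Generate a toy corpus by following LDA's generative story.

    alpha: symmetric Dirichlet concentration for each document's topic mix.
    beta:  symmetric Dirichlet concentration for each topic's word distribution.
    Returns (topics, doc_topic_mixes, documents).
    """
    rng = random.Random(seed)
    # 1. Each topic is a probability distribution over the whole vocabulary.
    topics = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    doc_mixes, documents = [], []
    for _ in range(n_docs):
        # 2. Each document gets its own mixture over topics.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_length):
            # 3. For each word slot, pick a topic from theta,
            #    then pick a word from that topic's distribution.
            z = rng.choices(range(n_topics), weights=theta)[0]
            words.append(rng.choices(vocab, weights=topics[z])[0])
        doc_mixes.append(theta)
        documents.append(words)
    return topics, doc_mixes, documents


if __name__ == "__main__":
    vocab = ["ball", "goal", "team", "vote", "party", "law"]
    _, mixes, docs = simulate_lda(vocab, n_topics=2, n_docs=3, doc_length=8,
                                  alpha=0.5, beta=0.5, seed=42)
    for theta, doc in zip(mixes, docs):
        print([round(p, 2) for p in theta], " ".join(doc))
```

Fitting an LDA implementation to a corpus generated this way, and comparing the recovered topics to the known `topics`, is one way to check what a black-box implementation can and cannot recover.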

The talk will progress as follows:

  • Introduction to mixture models
  • Simulation of mixture models
  • Introduction to grouped data
  • Simulation of latent Dirichlet allocation
  • Fitting and visualizing LDA with Python
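The first items in this outline, mixture models and their simulation, can be sketched in the same plain-Python style. This hypothetical example (the name `simulate_mixture` and the toy word lists are assumptions, not from the talk) simulates a finite mixture of categorical distributions in which every observation draws its latent component from one shared set of mixing weights. Grouped data, as in LDA, would instead give each group (document) its own mixing weights.

```python
import random


def simulate_mixture(components, weights, n, seed=None):
    """Simulate n observations from a finite mixture model.

    components: list of (outcomes, probabilities) pairs, one per mixture component.
    weights:    mixing proportions shared by every observation.
    Returns a list of (component_index, observation) pairs.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Pick a latent component using the shared mixing weights...
        k = rng.choices(range(len(components)), weights=weights)[0]
        outcomes, probs = components[k]
        # ...then draw the observation from that component's distribution.
        x = rng.choices(outcomes, weights=probs)[0]
        samples.append((k, x))
    return samples


if __name__ == "__main__":
    sports = (["ball", "goal", "team"], [0.5, 0.3, 0.2])
    politics = (["vote", "party", "law"], [0.4, 0.4, 0.2])
    for k, word in simulate_mixture([sports, politics], [0.7, 0.3], n=5, seed=1):
        print(k, word)
```

Because the weights are shared, nothing in this model ties observations together into groups; that is the gap grouped models such as LDA fill.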

Set Up the Conda Environment and Launch the Notebook

With Conda installed, run

$ git clone https://github.com/tdhopper/pydata-nyc-2015.git understanding-lda
$ cd understanding-lda
$ make install
$ source activate understanding-lda

To view the notebook, run

$ make

To view the notebook as a slideshow, run

$ make slides