- Spelling Correction Exercise, Part 1: Error detection and an inverted index for character n-gram candidate generation.
- Spelling Correction Exercise, Part 2: Candidate generation (character n-gram and edit-based) and scoring with unigram and bigram language models.
- Spelling Correction Exercise, Part 3 (TBA): Edit distance, plus computing and using an error model.
- Topic Modeling Exercise: Compute topics using Mallet.
- Additional reading: See my post about a Gibbs sampler in R for a toy topic model. It implements the sampler and runs it on the data from the overview paper "Probabilistic Topic Models" by Steyvers and Griffiths (2007).
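
As a rough illustration of the inverted index named in Part 1 of the spelling correction exercise, the sketch below maps character bigrams to the vocabulary words that contain them and then ranks candidates for a misspelled query. It is not the exercise's reference solution: the function names, the `$` boundary padding, the Jaccard scoring, the `min_jaccard` threshold, and the toy vocabulary are all assumptions made for this example.

```python
from collections import defaultdict


def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word, padded with '$' at both ends."""
    padded = "$" + word + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(vocabulary, n=2):
    """Inverted index: each character n-gram maps to the words that contain it."""
    index = defaultdict(set)
    for word in vocabulary:
        for gram in char_ngrams(word, n):
            index[gram].add(word)
    return index


def candidates(query, index, n=2, min_jaccard=0.3):
    """Return words sharing n-grams with the query, ranked by Jaccard overlap."""
    query_grams = char_ngrams(query, n)

    # Count, for every word reachable through the index, how many n-grams it shares.
    overlap = defaultdict(int)
    for gram in query_grams:
        for word in index.get(gram, ()):
            overlap[word] += 1

    scored = []
    for word, shared in overlap.items():
        union = len(query_grams) + len(char_ngrams(word, n)) - shared
        score = shared / union
        if score >= min_jaccard:  # assumed cutoff, chosen only for this toy example
            scored.append((score, word))

    # Highest score first; ties broken alphabetically for a stable order.
    return [word for score, word in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical toy vocabulary; a real exercise would build this from a corpus.
vocabulary = ["spelling", "spilling", "selling", "topic", "model"]
index = build_index(vocabulary)
print(candidates("speling", index))
```

Only words that share at least one n-gram with the query are ever scored, which is the point of the index: the full vocabulary is never scanned. The surviving candidates would then be passed to the edit-based generation and language-model scoring steps described in Part 2.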