CS 4650 and CS 7650 will meet jointly, on Mondays and Wednesdays from 3:05-4:25pm, in Howey (Physics) L3. Office hours are listed here.
This is a provisional schedule. Check back in August for more details. But be aware that there will be graded material due no later than the second week of the course (maybe even the first week).
Readings and homeworks are final at the time of the class before they are due (e.g., Wednesday readings are final on the preceding Monday); problem sets are final on the day they are "out." Please check for updates until then.
History of NLP and modern applications. Review of probability.
- Reading: Chapter 1 of Linguistic Fundamentals for NLP. You should be able to access this PDF for free from a Georgia Tech computer.
- Optional reading: Functional programming in Python. The scaffolding code in this class will make heavy use of Python's functional programming features, such as iterators, generators, list comprehensions, and lambda expressions. If you haven't seen much of this style of programming before, it will be helpful for you to read up on it before getting started with the problem sets.
- Optional reading: Section 2.1 of Foundations of Statistical NLP. A PDF version is accessible through the GT library.
- Optional reading: these other reviews of probability.
- Why you should take notes by hand, not on a laptop
- Problem set 1 out.
Bag-of-words models, naive Bayes, and sentiment analysis.
- Reading: my notes, chapter 1 and section 3.1.
- Optional readings: Sentiment analysis and opinion mining, especially parts 1, 2, 4.1-4.3, and 7; Chapters 0-0.3, 1-1.2 of LXMLS lab guide
- Homework 1 cancelled, since waitlisted students can't yet access the course T-Square. We'll try this again, and the total number of homeworks will remain 12.
- [Demo](classes/Lec-2%20Simple%20Sentiment%20Analysis.ipynb)
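For a concrete picture of what the demo covers, here is a minimal bag-of-words naive Bayes sketch with add-one smoothing. The documents and labels are toy data invented for illustration; this is not the problem-set scaffolding.

```python
import math
from collections import Counter

# Toy training data: each document is a list of tokens plus a label.
train = [
    (["good", "great", "fun"], "pos"),
    (["bad", "boring", "awful"], "neg"),
    (["good", "fun", "fun"], "pos"),
    (["awful", "bad"], "neg"),
]

# Count words per label, and how often each label occurs.
word_counts = {"pos": Counter(), "neg": Counter()}
label_counts = Counter()
for tokens, label in train:
    word_counts[label].update(tokens)
    label_counts[label] += 1

vocab = {w for c in word_counts.values() for w in c}

def log_prob(tokens, label):
    """log P(label) + sum of log P(word | label), with add-one smoothing."""
    total = sum(word_counts[label].values())
    lp = math.log(label_counts[label] / sum(label_counts.values()))
    for w in tokens:
        lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return lp

def classify(tokens):
    return max(word_counts, key=lambda y: log_prob(tokens, y))

print(classify(["good", "fun"]))   # pos
```

The same word counts drive both training and prediction; the add-one smoothing keeps a single unseen word from zeroing out a class.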
Discriminative classifiers: perceptron and passive-aggressive learning; word-sense disambiguation.
- Reading: my notes, chapter 2-2.3.
- Optional supplementary reading: Parts 4-7 of log-linear models; survey on word sense disambiguation
- Optional advanced reading: adagrad; passive-aggressive learning
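As a rough illustration of the update rule covered in the notes, here is a multiclass perceptron sketch over bag-of-words features. The data and feature names are invented for illustration, not taken from the problem sets.

```python
from collections import defaultdict

def featurize(tokens, label):
    # One feature per (label, word) pair, with count 1.0 per occurrence.
    return [((label, w), 1.0) for w in tokens]

def score(weights, tokens, label):
    return sum(weights[f] * v for f, v in featurize(tokens, label))

def predict(weights, tokens, labels):
    return max(labels, key=lambda y: score(weights, tokens, y))

def train_perceptron(data, labels, epochs=5):
    weights = defaultdict(float)
    for _ in range(epochs):
        for tokens, gold in data:
            guess = predict(weights, tokens, labels)
            if guess != gold:
                # Perceptron update on a mistake: reward gold, penalize guess.
                for f, v in featurize(tokens, gold):
                    weights[f] += v
                for f, v in featurize(tokens, guess):
                    weights[f] -= v
    return weights

data = [(["good", "fun"], "pos"), (["bad", "awful"], "neg")]
w = train_perceptron(data, ["pos", "neg"])
print(predict(w, ["awful"], ["pos", "neg"]))   # neg
```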
Logistic regression and online learning
- Reading: my notes, chapter 2.4-2.6.
- Optional supplementary reading: Parts 4-7 of log-linear models
- Problem set 1 due at 2:55pm.
- Problem set 2 out on August 28
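A minimal sketch of logistic regression trained online by stochastic gradient ascent, under the same bag-of-words setup as the notes. The data is a toy example; this is not the problem-set code.

```python
import math
from collections import defaultdict

def dot(weights, counts):
    return sum(weights[w] * c for w, c in counts.items())

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(data, epochs=50, lr=0.1):
    weights = defaultdict(float)
    for _ in range(epochs):
        for counts, y in data:  # y is 1 (positive) or 0 (negative)
            p = sigmoid(dot(weights, counts))
            for w, c in counts.items():
                # Gradient of the log-likelihood: (y - p) * x
                weights[w] += lr * (y - p) * c
    return weights

data = [({"good": 1, "fun": 1}, 1), ({"bad": 1, "awful": 1}, 0)]
w = train_logreg(data)
print(sigmoid(dot(w, {"good": 1})) > 0.5)   # True
```

Unlike the perceptron, every example moves the weights a little, by an amount proportional to how wrong the current probability is.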
- Reading: my notes, chapter 3.2
- Homework 2 due
Learning from partially-labeled data.
No class: Labor Day is a celebration of the American Labor Movement.
N-grams, speech recognition, smoothing, recurrent neural networks.
- Reading: my notes, chapter 5.
- Homework 3 due
- Demo
- Optional advanced reading: An empirical study of smoothing techniques for language models, especially sections 2.7 and 3 on Kneser-Ney smoothing; A hierarchical Bayesian language model based on Pitman-Yor processes (requires some machine learning background)
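To make the smoothing discussion concrete, here is a bigram language model with add-one (Laplace) smoothing, the simplest method covered; Kneser-Ney, discussed in the optional readings, works much better in practice. The corpus is a toy example invented for illustration.

```python
import math
from collections import Counter

corpus = ["the cat sat", "the dog sat", "the cat ran"]

bigrams = Counter()
unigrams = Counter()
for line in corpus:
    tokens = ["<s>"] + line.split() + ["</s>"]
    unigrams.update(tokens[:-1])                   # contexts
    bigrams.update(zip(tokens[:-1], tokens[1:]))   # (prev, word) pairs

vocab = set(unigrams) | {"</s>"}

def bigram_logprob(prev, word):
    """log P(word | prev), with add-one smoothing."""
    return math.log((bigrams[(prev, word)] + 1) /
                    (unigrams[prev] + len(vocab)))

def sentence_logprob(sentence):
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(bigram_logprob(p, w) for p, w in zip(tokens[:-1], tokens[1:]))

# A grammatical order scores higher than a scrambled one.
print(sentence_logprob("the cat sat") > sentence_logprob("sat the cat"))
```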
Finding meaning inside words (morphology). We'll also probably have to catch up a little on smoothing from the previous class.
- Homework 4 due
- Reading: Bender chapter 2.
- Optional reading: Jurafsky and Martin chapter 2.
Finite-state acceptors, transducers, composition. Edit distance.
- Problem set 2 due at 2:55pm.
- Reading: my notes, chapter 7
- Optional reading: Knight and May; OpenFST slides; Weighted Finite-State Transducers in speech recognition.
- Jacob will be at the conference on Empirical Methods in Natural Language Processing, presenting research from the Computational Linguistics Lab.
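Edit distance can be computed directly as a dynamic program, which is equivalent to finding the best path through a composed weighted transducer. A sketch assuming unit costs for insertion, deletion, and substitution:

```python
def edit_distance(source, target):
    """Levenshtein distance between two strings, unit costs."""
    m, n = len(source), len(target)
    # d[i][j] = cost of transforming source[:i] into target[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i           # i deletions
    for j in range(n + 1):
        d[0][j] = j           # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete
                          d[i][j - 1] + 1,        # insert
                          d[i - 1][j - 1] + sub)  # substitute or copy
    return d[m][n]

print(edit_distance("kitten", "sitting"))   # 3
```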
September 23: Part-of-speech tagging and Hidden Markov Models
Part-of-speech tags, hidden Markov models.
- Homework 5 due
- Problem set 3 out.
- Reading: my notes, chapter 8
- Optional reading: Bender chapter 6; Tagging problems and hidden Markov models
September 28: Dynamic Programming in Hidden Markov Models
Viterbi, the forward algorithm, B-I-O encoding for named entity recognition.
- Reading: my notes, chapter 9-9.5
- Optional reading: Conditional random fields;
- Slides
- TAs will be available to answer questions on problem set 3.
- Jacob will be presenting research at DiSpoL 2015, a workshop on discourse structure.
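A sketch of the Viterbi recurrence for HMM decoding. The transition and emission log-probabilities below are made up for illustration; in a real tagger they would be estimated from tagged data.

```python
import math
from collections import defaultdict

def viterbi(tokens, tags, trans, emit):
    """trans[(prev, tag)] and emit[(tag, word)] are log-probabilities."""
    # v[t] maps each tag to (best log score, backpointer) at position t
    v = [{tag: (trans[("<s>", tag)] + emit[(tag, tokens[0])], None)
          for tag in tags}]
    for t in range(1, len(tokens)):
        v.append({})
        for tag in tags:
            best_prev = max(tags, key=lambda p: v[t - 1][p][0] + trans[(p, tag)])
            v[t][tag] = (v[t - 1][best_prev][0] + trans[(best_prev, tag)]
                         + emit[(tag, tokens[t])], best_prev)
    # Trace back from the best final tag.
    tag = max(tags, key=lambda s: v[-1][s][0])
    path = [tag]
    for t in range(len(tokens) - 1, 0, -1):
        tag = v[t][tag][1]
        path.append(tag)
    return list(reversed(path))

# Toy scores: determiners start sentences and precede nouns.
logp = math.log
trans = defaultdict(lambda: logp(0.01),
                    {("<s>", "D"): logp(0.8), ("D", "N"): logp(0.9)})
emit = defaultdict(lambda: logp(0.01),
                   {("D", "the"): logp(0.9), ("N", "dog"): logp(0.9)})
print(viterbi(["the", "dog"], ["D", "N"], trans, emit))   # ['D', 'N']
```

Swapping the `max` for a log-sum-exp gives the forward algorithm over the same trellis.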
Structured perceptron, conditional random fields, and max-margin Markov networks. More about forward-backward. Maybe a little about unsupervised POS tagging.
- Reading: my notes, chapter 9.6-9.9
- Optional reading: Discriminative training of HMMs; CRF tutorial; Two decades of unsupervised POS tagging: how far have we come?; my notes 9.10
- Problem set 3 due at 2:55pm.
Constituents, grammar design, formal language theory.
- Reading: my notes, chapter 10
- Optional reading: Bender chapter 7
- Problem set 4 out.
The CKY algorithm, the inside algorithm, Markovization, and lexicalization.
- Homework 6 due
- Reading: my notes, chapter 10.4-11.2
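Here is CKY recognition for a toy grammar in Chomsky normal form. The grammar and sentence are invented for illustration; a real parser would also store backpointers and rule scores to recover the best tree.

```python
from itertools import product

binary = {("NP", "VP"): "S", ("D", "N"): "NP"}   # children -> parent
lexicon = {"the": {"D"}, "dog": {"N"}, "barks": {"VP"}}

def cky_recognize(tokens):
    n = len(tokens)
    # chart[i][j] = set of nonterminals spanning tokens[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(tokens):
        chart[i][i + 1] = set(lexicon.get(w, ()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # split point
                for l, r in product(chart[i][k], chart[k][j]):
                    if (l, r) in binary:
                        chart[i][j].add(binary[(l, r)])
    return "S" in chart[0][n]

print(cky_recognize("the dog barks".split()))   # True
```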
You may bring a one-page sheet of notes (two sides, any font size).
Mid-term review. Parsing in probabilistic context-free grammars.
- Reading: my notes, chapter 11.3-11.4
- Optional reading: Probabilistic context-free grammars; Bender chapter 8; my notes 10.13-10.14.
Making CFG parsing work better: markovization, lexicalization, refinement grammars. Intro to dependency parsing.
- Reading: my notes, chapter 11.5-11.6
- Problem set 4 due at 2:55pm.
- Optional reading: The inside-outside algorithm; Corpus-based induction of linguistic structure
Dependency grammar, projective and non-projective dependency graphs, related algorithms, and transition-based dependency parsing. Quick tour of feature-structure grammars, unification, combinatory categorial grammar (CCG), tree-adjoining grammar (TAG). Algorithms and applications.
- Reading: my notes, chapter 12; intro to CCG
- Homework 7 due
- Optional readings on dependency parsing: my slides; Characterizing the errors of data-driven dependency parsing models; Short textbook on dependency parsing, PDF should be free from a GT computer
- Optional readings on alternative models of syntax: Much more about CCG; LTAG; Probabilistic disambiguation models for wide-coverage HPSG
- The always useful language log on non-projectivity in dependency parsing.
- Problem set 5 out.
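To illustrate the mechanics of transition-based (arc-standard) dependency parsing, here is a sketch that executes a hand-written oracle action sequence; a real parser would learn a classifier to choose each action.

```python
def parse(tokens, actions):
    """Run arc-standard actions; returns (head, dependent) index pairs."""
    stack, buffer, arcs = [], list(range(len(tokens))), []
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT":   # second-from-top becomes a dependent of top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == "RIGHT":  # top becomes a dependent of second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# "the dog barks": "dog" heads "the", "barks" heads "dog".
tokens = ["the", "dog", "barks"]
arcs = parse(tokens, ["SHIFT", "SHIFT", "LEFT", "SHIFT", "LEFT"])
print(arcs)   # [(1, 0), (2, 1)]
```

Each of the n words is shifted once and popped once, so parsing is linear time, which is the appeal of transition-based methods over graph-based ones.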
Meaning representations, compositionality, first-order logic, and the syntax-semantics interface.
- Reading: Levy and Manning: Intro to Formal Computational Semantics
- Bonus homework due. (This is an additional homework, beyond the 12 that were planned. You will still be graded on your best ten homeworks for the semester, so you can skip this one, or do this one and skip another.)
- Optional readings: Briscoe: Introduction to Formal Semantics for Natural Language; Learning to map sentences to logical form
PropBank, FrameNet, semantic role labeling, and a little Abstract Meaning Representation (AMR). Integer linear programming will also be discussed.
- Homework 8 due
- Reading: Gildea and Jurafsky sections 1-3; Banarescu et al sections 1-4
- Optional reading: SRL via ILP; Syntactic parsing in SRL; AMR parsing
- Optional video
Latent semantic analysis, word embeddings
- Reading: Vector-space models, sections 1, 2, 4-4.4, 6
- Optional: my notes, chapter 15
- Optional reading: python coding tutorial for word2vec word embeddings
- Slides
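Whatever model produces the embeddings, they are usually compared by cosine similarity. A sketch over tiny hand-made vectors (invented for illustration; real embeddings have hundreds of dimensions):

```python
import math

vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine of the angle between two vectors: dot product over norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Semantically related words should be closer in the vector space.
print(cosine(vectors["cat"], vectors["dog"]) > cosine(vectors["cat"], vectors["car"]))
```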
Coreference resolution: classification-based algorithms, graph-based algorithms, and a brief intro to Government and Binding theory.
- Problem set 5 due on November 12 at 2:55pm.
- Homework 9 due
- Reading: my notes, chapter 17
- Optional reading: Multi-pass sieve (good coverage of linguistic features that bear on coreference); Large-scale multi-document coreference, Easy victories and uphill battles (a straightforward machine learning approach to coreference)
Coherence, cohesion, centering theory, topic segmentation, speech act classification.
- Reading: Discourse structure and language technology
- Homework 10 due (extended to November 18, 3pm)
- Optional: Modeling local coherence; Sentence-level discourse parsing; Analysis of discourse structure...
- Problem set 6 out.
- Slides
- Homework 11 due
- Reading: Collins, IBM models 1 and 2
- Optional Reading: Chiang, Intro to Synchronous Grammars; Lopez, Statistical machine translation
- Slides
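A sketch of the EM updates for IBM Model 1, as described in the Collins reading, on a toy parallel corpus. The sentence pairs are hypothetical, and the model is simplified (no NULL word, uniform initialization).

```python
from collections import defaultdict

pairs = [("la maison".split(), "the house".split()),
         ("la fleur".split(), "the flower".split())]

f_vocab = {f for fs, _ in pairs for f in fs}
e_vocab = {e for _, es in pairs for e in es}
# t[f][e] = P(f | e), initialized uniformly
t = {f: {e: 1.0 / len(f_vocab) for e in e_vocab} for f in f_vocab}

for _ in range(10):
    count = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    # E-step: expected alignment counts under the current parameters.
    for fs, es in pairs:
        for f in fs:
            z = sum(t[f][e] for e in es)       # normalize over alignments
            for e in es:
                p = t[f][e] / z
                count[f][e] += p
                total[e] += p
    # M-step: renormalize the expected counts into new probabilities.
    for f in f_vocab:
        for e in e_vocab:
            t[f][e] = count[f][e] / total[e]

# "la" co-occurs with "the" in both pairs, which pushes "maison" to "house".
print(t["maison"]["house"] > t["maison"]["the"])
```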
Reading for comprehension.
- Homework 12 due
- Reading: Grishman, sections 1 and 4-6
- Slides
No class.
- Problem set 6 due at 2:55pm.
Semi-supervised learning and domain adaptation.
- Reading: my notes, chapter 20
- Optional reading: Jerry Zhu's survey; Jerry Zhu's book
- Slides
- Homework 13 due
2:50-5:40pm. You may bring a single sheet of notes, two-sided.