Clone this wiki locally
This class will provide instruction on applying algorithms in natural language processing and machine learning for experimentation and for real world tasks, including clustering, classification, part-of-speech tagging, named entity recognition, topic modeling, and more. The approach will be practical and hands-on: for example, students will program common classifiers from the ground up, use existing toolkits such as OpenNLP, Chalk, StanfordNLP, Mallet, and Breeze, construct NLP pipelines with UIMA, and get some initial experience with distributed computation with Hadoop and Spark. Guidance will also be given on software engineering, including build tools, git, and testing. It is assumed that students are already familiar with machine learning and/or computational linguistics and that they already are competent programmers. The programming language used in the course will be Scala; no explicit instruction will be given in Scala programming, but resources and assistance will be provided for those new to the language.