This repository contains the tasks created as part of the Natural Language Processing (NLP) course at the Ben-Gurion University of the Negev

yiftachsa/Natural-Language-Processing

Natural-Language-Processing

Natural language processing is the research field in which we develop, test, and analyze machine learning algorithms that automatically process large amounts of text, both to understand given texts and to generate new ones. The course makes heavy use of machine learning and also introduces concepts from linguistics and cognitive psychology. Typical active research topics and applications include spam detection, error correction, machine translation, topic modeling, document classification, and demographic attribution.


Assignments

Assignment 1 - Text Preprocessing, Language Modeling and Generation - Implement a Markovian language model and a language generator. We use the noisy channel algorithm for spell checking. Combining the noisy channel with a language model yields a simple yet powerful algorithm that demonstrates some key elements of language processing and the way statistical machine learning implicitly accounts for cognitive and technological biases.
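As a rough illustration of what a Markovian language model and generator can look like, here is a minimal sketch. It is not the assignment's actual code: the function names `train_ngram_model` and `generate`, the toy corpus, and the fixed random seed are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def train_ngram_model(tokens, n=2):
    """Count which token follows each (n-1)-token context (hypothetical helper)."""
    if n < 2:
        raise ValueError("this sketch assumes n >= 2")
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model


def generate(model, context, max_len=10, seed=0):
    """Sample tokens one at a time, each conditioned on the previous context."""
    rng = random.Random(seed)
    out = list(context)
    k = len(context)
    while len(out) < max_len:
        followers = model.get(tuple(out[-k:]))
        if not followers:
            break  # unseen context: stop instead of guessing
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights)[0])
    return out


# Toy corpus, for illustration only.
tokens = "the cat sat on the mat and the cat slept".split()
model = train_ngram_model(tokens, n=2)
text = generate(model, ("the",), max_len=6)
print(" ".join(text))
```

Sampling in proportion to the observed counts keeps the generator faithful to the training text. A fuller implementation would likely add smoothing and sentence boundary markers, which this sketch omits.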

Assignment 2 - Contextual Spell Checking - The Noisy Channel and a Probabilistic Spell Checker. Distributional semantics and Text Classification. In this assignment we build a spell checker that handles both non-word and real-word errors in a sentential context. To do so, we learn a language model and reconstruct the error distribution tables (one per error type) from lists of common errors. Finally, we combine these into a context-sensitive noisy channel model.
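The noisy channel idea can be sketched as choosing the candidate word that maximizes the product of a channel probability and a language model probability. The example below is a simplified illustration rather than the assignment's implementation. It assumes a crude two-valued channel model (`p_no_error` is a made-up constant standing in for the learned error distribution tables), an add-one smoothed bigram model, and candidates limited to one edit. The names `edits1` and `correct` are hypothetical.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutions = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | transposes | substitutions | inserts


def correct(word, prev_word, unigrams, bigrams, p_no_error=0.95):
    """Pick the candidate maximizing channel(word | cand) * P(cand | prev_word)."""
    vocab_size = len(unigrams)
    candidates = {w for w in edits1(word) if w in unigrams}
    if word in unigrams:
        candidates.add(word)  # a real word may also be left unchanged
    if not candidates:
        return word  # nothing known within one edit

    def score(cand):
        # Assumed channel model: a constant, not a learned error table.
        channel = p_no_error if cand == word else 1 - p_no_error
        # Add-one smoothed bigram probability of cand following prev_word.
        prior = (bigrams[(prev_word, cand)] + 1) / (unigrams[prev_word] + vocab_size)
        return channel * prior

    return max(candidates, key=score)


# Toy corpus, for illustration only.
tokens = "the cat sat on the mat the cat ate the rat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
print(correct("cot", "the", unigrams, bigrams))  # non-word error
```

Replacing the constant channel term with per-error-type probabilities (insertion, deletion, substitution, transposition) is what turns this toy scorer into the kind of model the assignment describes.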

Assignment 3 | Notebook | Report - Authorship Attribution - LSTM networks - Using various text classification algorithms, we perform an authorship attribution task on Donald Trump’s tweets. A comprehensive report, the accompanying code, and the classification output obtained on a test set are included in the repository.

Assignment 4 | Notebook - Part of Speech Tagging - Implement a Hidden Markov Model and a BiLSTM model for part-of-speech (POS) tagging. The assignment also covers discriminative models for POS tagging (MEMM and BiLSTM).
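For the Hidden Markov Model part, tagging is commonly done with Viterbi decoding, which finds the most probable tag sequence for a sentence. The sketch below assumes that approach and may differ from the notebook. The function name `viterbi`, the dictionary-based parameter layout, and all probabilities in the toy model are invented for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for obs, computed in log space."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at step t.
    best = [{s: log(start_p[s]) + log(emit_p[s].get(obs[0], 0.0)) for s in states}]
    back = []
    for word in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log(trans_p[r][s]))
            scores[s] = (best[-1][prev] + log(trans_p[prev][s])
                         + log(emit_p[s].get(word, 0.0)))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy model with made-up probabilities, for illustration only.
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.5, "barks": 0.1, "walk": 0.4},
    "VERB": {"barks": 0.6, "walk": 0.4},
}
tags = viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p)
print(tags)  # ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numerical underflow on long sentences. In practice the transition and emission tables would be estimated from a tagged corpus, with smoothing for unseen words.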
