I've been working on several natural language processing tasks for a long time. One day, I felt like drawing a map of the NLP field where I earn a living. I'm sure I'm not the only person who wants to see at a glance which tasks are in NLP.
I did my best to cover as many NLP tasks as possible, but admittedly this list is far from exhaustive, purely because of the limits of my own knowledge. The selected references are also biased towards recent deep learning accomplishments. I expect them to serve as a starting point when you're about to dig into a task. I'll keep updating this repo myself, but what I really hope is that you will collaborate on this work. Don't hesitate to send me a pull request!
Oct. 13, 2017.
by Kyubyong
Reviewed and updated by YJ Choe on Oct. 18, 2017.
- PAPER: Automatic Text Scoring Using Neural Networks
- PAPER: A Neural Approach to Automated Essay Scoring
- CHALLENGE: Kaggle: The Hewlett Foundation: Automated Essay Scoring
- PROJECT: EASE (Enhanced AI Scoring Engine)

- WIKI: Speech recognition
- PAPER: Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- PAPER: WaveNet: A Generative Model for Raw Audio
- PROJECT: A TensorFlow implementation of Baidu's DeepSpeech architecture
- PROJECT: Speech-to-Text-WaveNet: End-to-end sentence level English speech recognition using DeepMind's WaveNet
- CHALLENGE: The 5th CHiME Speech Separation and Recognition Challenge
- DATA: The 5th CHiME Speech Separation and Recognition Challenge
- DATA: CSTR VCTK Corpus
- DATA: LibriSpeech ASR corpus
- DATA: Switchboard-1 Telephone Speech Corpus
- DATA: TED-LIUM Corpus
- DATA: Open Speech and Language Resources
- DATA: Common Voice

- WIKI: Automatic summarization
- BOOK: Automatic Text Summarization
- PAPER: Text Summarization Using Neural Networks
- PAPER: Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization
- DATA: Text Analytics Conferences (TAC)
- DATA: Document Understanding Conferences (DUC)

- INFO: Coreference Resolution
- PAPER: Deep Reinforcement Learning for Mention-Ranking Coreference Models
- PAPER: Improving Coreference Resolution by Learning Entity-Level Distributed Representations
- CHALLENGE: CoNLL 2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes
- CHALLENGE: CoNLL 2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes
- CHALLENGE: SemEval 2018 Task 4: Character Identification on Multiparty Dialogues

- PAPER: A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction
- PAPER: Neural Network Translation Models for Grammatical Error Correction
- PAPER: Adapting Sequence Models for Sentence Correction
- CHALLENGE: CoNLL-2013 Shared Task: Grammatical Error Correction
- CHALLENGE: CoNLL-2014 Shared Task: Grammatical Error Correction
- DATA: NUS Non-commercial research/trial corpus license
- DATA: Lang-8 Learner Corpora
- DATA: Cornell Movie--Dialogs Corpus
- PROJECT: Deep Text Corrector
- PRODUCT: deep grammar

- PAPER: Grapheme-to-Phoneme Models for (Almost) Any Language
- PAPER: Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning
- PAPER: Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion
- PROJECT: Sequence-to-Sequence G2P toolkit
- PROJECT: g2p_en: A Simple Python Module for English Grapheme To Phoneme Conversion
- DATA: Multilingual Pronunciation Data

- PAPER: Automatic Sarcasm Detection: A Survey
- PAPER: Magnets for Sarcasm: Making Sarcasm Detection Timely, Contextual and Very Personal
- PAPER: Sarcasm Detection on Twitter: A Behavioral Modeling Approach
- CHALLENGE: SemEval-2017 Task 6: #HashtagWars: Learning a Sense of Humor
- CHALLENGE: SemEval-2017 Task 7: Detection and Interpretation of English Puns
- DATA: Sarcastic comments from Reddit
- DATA: Sarcasm Corpus V2
- DATA: Sarcasm Amazon Reviews Corpus

- WIKI: Symbol grounding problem
- PAPER: The Symbol Grounding Problem
- PAPER: From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning
- PAPER: Encoding of phonology in a recurrent neural model of grounded speech
- PAPER: Gated-Attention Architectures for Task-Oriented Language Grounding
- PAPER: Sound-Word2Vec: Learning Word Representations Grounded in Sounds
- COURSE: Language Grounding to Vision and Control
- WORKSHOP: Language Grounding for Robotics

- WIKI: Language identification
- PAPER: AUTOMATIC LANGUAGE IDENTIFICATION USING DEEP NEURAL NETWORKS
- PAPER: Natural Language Processing with Small Feed-Forward Networks
- CHALLENGE: 2015 Language Recognition Evaluation

- WIKI: Language model
- TOOLKIT: KenLM Language Model Toolkit
- PAPER: Distributed Representations of Words and Phrases and their Compositionality
- PAPER: Generating Sequences with Recurrent Neural Networks
- PAPER: Character-Aware Neural Language Models
- THESIS: Statistical Language Models Based on Neural Networks
- DATA: Penn Treebank
- TUTORIAL: TensorFlow Tutorial on Language Modeling with Recurrent Neural Networks

- WIKI: Lemmatisation
- PAPER: Joint Lemmatization and Morphological Tagging with LEMMING
- TOOLKIT: WordNet Lemmatizer
- DATA: Treebank-3

- WIKI: Lip reading
- PAPER: LipNet: End-to-End Sentence-level Lipreading
- PAPER: Lip Reading Sentences in the Wild
- PAPER: Large-Scale Visual Speech Recognition
- PROJECT: Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks
- PRODUCT: Liopa
- DATA: The GRID audiovisual sentence corpus
- DATA: The BBC-Oxford 'Multi-View Lip Reading Sentences' (MV-LRS) Dataset

- PAPER: Neural Machine Translation by Jointly Learning to Align and Translate
- PAPER: Neural Machine Translation in Linear Time
- PAPER: Attention Is All You Need
- PAPER: Six Challenges for Neural Machine Translation
- PAPER: Phrase-Based & Neural Unsupervised Machine Translation
- CHALLENGE: ACL 2014 NINTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION
- CHALLENGE: EMNLP 2017 SECOND CONFERENCE ON MACHINE TRANSLATION (WMT17)
- DATA: OpenSubtitles2016
- DATA: WIT3: Web Inventory of Transcribed and Translated Talks
- DATA: The QCRI Educational Domain (QED) Corpus
- PAPER: Multi-task Sequence to Sequence Learning
- PAPER: Unsupervised Pretraining for Sequence to Sequence Learning
- PAPER: Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
- TOOLKIT: Subword Neural Machine Translation with Byte Pair Encoding (BPE)
- TOOLKIT: Multi-Way Neural Machine Translation
- TOOLKIT: OpenNMT: Open-Source Toolkit for Neural Machine Translation

- WIKI: Inflection
- PAPER: Morphological Inflection Generation Using Character Sequence to Sequence Learning
- CHALLENGE: SIGMORPHON 2016 Shared Task: Morphological Reinflection
- DATA: sigmorphon2016

- WIKI: Named-entity recognition
- PAPER: Neural Architectures for Named Entity Recognition
- PROJECT: OSU Twitter NLP Tools
- CHALLENGE: Named Entity Recognition in Twitter
- CHALLENGE: CoNLL 2002 Language-Independent Named Entity Recognition
- CHALLENGE: Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
- DATA: CoNLL-2002 NER corpus
- DATA: CoNLL-2003 NER corpus
- DATA: NUT Named Entity Recognition in Twitter Shared task
- TOOLKIT: Stanford Named Entity Recognizer

- PAPER: Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
- PROJECT: Paralex: Paraphrase-Driven Learning for Open Question Answering
- CHALLENGE: SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter
- DATA: Microsoft Research Paraphrase Corpus
- DATA: Microsoft Research Video Description Corpus
- DATA: Pascal Dataset
- DATA: Flickr Dataset
- DATA: The SICK data set
- DATA: PPDB: The Paraphrase Database
- DATA: WikiAnswers Paraphrase Corpus

- PAPER: Neural Paraphrase Generation with Stacked Residual LSTM Networks
- DATA: Neural Paraphrase Generation with Stacked Residual LSTM Networks
- CODE: Neural Paraphrase Generation with Stacked Residual LSTM Networks
- PAPER: A Deep Generative Framework for Paraphrase Generation
- PAPER: Paraphrasing Revisited with Neural Machine Translation

- WIKI: Parsing
- TOOLKIT: The Stanford Parser: A statistical parser
- TOOLKIT: spaCy parser
- PAPER: Grammar as a Foreign Language
- PAPER: A fast and accurate dependency parser using neural networks
- PAPER: Universal Semantic Parsing
- CHALLENGE: CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
- CHALLENGE: CoNLL 2016 Shared Task: Multilingual Shallow Discourse Parsing
- CHALLENGE: CoNLL 2015 Shared Task: Shallow Discourse Parsing
- CHALLENGE: SemEval-2016 Task 8: The meaning representations may be abstract, but this task is concrete!

- WIKI: Part-of-speech tagging
- PAPER: Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
- PAPER: Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models
- DATA: Treebank-3
- TOOLKIT: nltk.tag package

- WIKI: Pinyin input method
- PAPER: Neural Network Language Model for Chinese Pinyin Input Method Engine
- PROJECT: Neural Chinese Transliterator

- WIKI: Question answering
- PAPER: Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
- PAPER: Dynamic Memory Networks for Visual and Textual Question Answering
- CHALLENGE: TREC Question Answering Task
- CHALLENGE: NTCIR-8: Advanced Cross-lingual Information Access (ACLIA)
- CHALLENGE: CLEF Question Answering Track
- CHALLENGE: SemEval-2017 Task 3: Community Question Answering
- CHALLENGE: SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge
- DATA: MS MARCO: Microsoft MAchine Reading COmprehension Dataset
- DATA: Maluuba NewsQA
- DATA: SQuAD: 100,000+ Questions for Machine Comprehension of Text
- DATA: GraphQuestions: A Characteristic-rich Question Answering Dataset
- DATA: Story Cloze Test and ROCStories Corpora
- DATA: Microsoft Research WikiQA Corpus
- DATA: DeepMind Q&A Dataset
- DATA: QASent
- DATA: Textbook Question Answering

- WIKI: Relationship extraction
- PAPER: A deep learning approach for relationship extraction from interaction context in social manufacturing paradigm
- CHALLENGE: SemEval-2018 task 7 Semantic Relation Extraction and Classification in Scientific Papers

- WIKI: Semantic role labeling
- BOOK: Semantic Role Labeling
- PAPER: End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks
- PAPER: Neural Semantic Role Labeling with Dependency Path Embeddings
- PAPER: Deep Semantic Role Labeling: What Works and What's Next
- CHALLENGE: CoNLL-2005 Shared Task: Semantic Role Labeling
- CHALLENGE: CoNLL-2004 Shared Task: Semantic Role Labeling
- TOOLKIT: Illinois Semantic Role Labeler (SRL)
- DATA: CoNLL-2005 Shared Task: Semantic Role Labeling

- WIKI: Sentence boundary disambiguation
- PAPER: A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain
- TOOLKIT: NLTK Tokenizers
- DATA: The British National Corpus
- DATA: Switchboard-1 Telephone Speech Corpus

- WIKI: Sentiment analysis
- INFO: Awesome Sentiment Analysis
- CHALLENGE: Kaggle: UMICH SI650 - Sentiment Classification
- CHALLENGE: SemEval-2017 Task 4: Sentiment Analysis in Twitter
- CHALLENGE: SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
- PROJECT: SenticNet
- PROJECT: Stanford NLP Group Sentiment Analysis
- DATA: Multi-Domain Sentiment Dataset (version 2.0)
- DATA: Stanford Sentiment Treebank
- DATA: Twitter Sentiment Corpus
- DATA: Twitter Sentiment Analysis Training Corpus
- DATA: AFINN: List of English words rated for valence

- PAPER: Video-based Sign Language Recognition without Temporal Segmentation
- PAPER: SubUNets: End-to-end Hand Shape and Continuous Sign Language Recognition
- DATA: RWTH-PHOENIX-Weather
- DATA: ASLLRP
- PROJECT: SignAll

- PAPER: Singing voice synthesis based on deep neural networks
- PAPER: A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs
- PRODUCT: VOCALOID: voice synthesis technology and software developed by Yamaha
- CHALLENGE: Special Session Interspeech 2016 Singing synthesis challenge "Fill-in the Gap"

- WORKSHOP: NLP+CSS: Workshops on Natural Language Processing and Computational Social Science
- TOOLKIT: Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
- TOOLKIT: Online Variational Bayes for Latent Dirichlet Allocation (LDA)
- GROUP: The University of Chicago Knowledge Lab

- WIKI: Source separation
- PAPER: From Blind to Guided Audio Source Separation
- PAPER: Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
- CHALLENGE: Signal Separation Evaluation Campaign (SiSEC)
- CHALLENGE: CHiME Speech Separation and Recognition Challenge

- WIKI: Speaker diarisation
- PAPER: DNN-based speaker clustering for speaker diarisation
- PAPER: Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach
- PAPER: Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
- CHALLENGE: Rich Transcription Evaluation

- WIKI: Speaker recognition
- PAPER: A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK
- PAPER: DEEP NEURAL NETWORKS FOR SMALL FOOTPRINT TEXT-DEPENDENT SPEAKER VERIFICATION
- PAPER: Deep Speaker: an End-to-End Neural Speaker Embedding System
- PROJECT: Voice Vector: which of the Hollywood stars is most similar to my voice?
- CHALLENGE: NIST Speaker Recognition Evaluation (SRE)
- INFO: Are there any suggestions for free databases for speaker recognition?
- DATA: VoxCeleb2: Deep Speaker Recognition

- See Lip-reading
 
- WIKI: Speech segmentation
- PAPER: Word Segmentation by 8-Month-Olds: When Speech Cues Count More Than Statistics
- PAPER: Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings
- PAPER: Unsupervised Lexicon Discovery from Acoustic Input
- PAPER: Weakly supervised spoken term discovery using cross-lingual side information
- DATA: CALLHOME Spanish Speech

- WIKI: Speech synthesis
- PAPER: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
- PAPER: WaveNet: A Generative Model for Raw Audio
- PAPER: Tacotron: Towards End-to-End Speech Synthesis
- PAPER: Deep Voice 3: 2000-Speaker Neural Text-to-Speech
- PAPER: Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
- DATA: The World English Bible
- DATA: LJ Speech Dataset
- DATA: Lessac Data
- CHALLENGE: Blizzard Challenge 2017
- PRODUCT: Lyrebird
- PROJECT: The Festvox project
- TOOLKIT: Merlin: The Neural Network (NN) based Speech Synthesis System

- WIKI: Speech enhancement
- BOOK: Speech enhancement: theory and practice
- PAPER: An Experimental Study on Speech Enhancement Based on Deep Neural Network
- PAPER: A Regression Approach to Speech Enhancement Based on Deep Neural Networks
- PAPER: Speech Enhancement Based on Deep Denoising Autoencoder

- WIKI: Stemming
- PAPER: A BACKPROPAGATION NEURAL NETWORK TO IMPROVE ARABIC STEMMING
- TOOLKIT: NLTK Stemmers

- WIKI: Terminology extraction
- PAPER: Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection

- WIKI: Semantic similarity
- PAPER: A Survey of Text Similarity Approaches
- PAPER: Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks
- PAPER: Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
- CHALLENGE: SemEval-2014 Task 3: Cross-Level Semantic Similarity
- CHALLENGE: SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
- CHALLENGE: SemEval-2017 Task 1: Semantic Textual Similarity
- WIKI: Semantic Textual Similarity Wiki

- WIKI: Text simplification
- PAPER: Aligning Sentences from Standard Wikipedia to Simple Wikipedia
- PAPER: Problems in Current Text Simplification Research: New Data Can Help
- DATA: Newsela Data

- See Speech Synthesis
 
- WIKI: Textual entailment
- PROJECT: Textual Entailment with TensorFlow
- PAPER: Textual Entailment with Structured Attentions and Composition
- CHALLENGE: SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment
- CHALLENGE: SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge

- WIKI: Transliteration
- INFO: Transliteration of Non-Latin scripts
- PAPER: A Deep Learning Approach to Machine Transliteration
- CHALLENGE: NEWS 2016 Shared Task on Transliteration of Named Entities
- PROJECT: Neural Japanese Transliteration—can you do better than SwiftKey™ Keyboard?

- PAPER: PHONETIC POSTERIORGRAMS FOR MANY-TO-ONE VOICE CONVERSION WITHOUT PARALLEL DATA TRAINING
- PROJECT: Deep neural networks for voice conversion (voice style transfer) in Tensorflow
- PROJECT: An implementation of voice conversion system utilizing phonetic posteriorgrams
- CHALLENGE: Voice Conversion Challenge 2016
- CHALLENGE: Voice Conversion Challenge 2018
- DATA: CMU_ARCTIC speech synthesis databases
- DATA: TIMIT Acoustic-Phonetic Continuous Speech Corpus

- WIKI: Word embedding
- TOOLKIT: Gensim: word2vec
- TOOLKIT: fastText
- TOOLKIT: GloVe: Global Vectors for Word Representation
- INFO: Where to get a pretrained model
- PROJECT: Pre-trained word vectors
- PROJECT: Pre-trained word vectors of 30+ languages
- PROJECT: Polyglot: Distributed word representations for multilingual NLP
- PROJECT: BPEmb: a collection of pre-trained subword embeddings in 275 languages
- CHALLENGE: SemEval 2018 Task 10 Capturing Discriminative Attributes
- PAPER: Bilingual Word Embeddings for Phrase-Based Machine Translation
- PAPER: A Survey of Cross-Lingual Embedding Models

- INFO: What is Word Prediction?
- PAPER: The prediction of character based on recurrent neural network language model
- PAPER: An Embedded Deep Learning based Word Prediction
- PAPER: Evaluating Word Prediction: Framing Keystroke Savings
- DATA: An Embedded Deep Learning based Word Prediction
- PROJECT: Word Prediction using Convolutional Neural Networks—can you do better than iPhone™ Keyboard?
- CHALLENGE: SemEval-2018 Task 2, Multilingual Emoji Prediction