I've been working on several natural language processing tasks for a long time. One day, I felt like drawing a map of the NLP field where I earn a living. I'm sure I'm not the only person who wants to see at a glance which tasks are in NLP.
I did my best to cover as many NLP tasks as possible, but admittedly this list is far from exhaustive, purely due to my lack of knowledge. The selected references are also biased towards recent deep learning accomplishments. I expect these to serve as a starting point when you're about to dig into a task. I'll keep updating this repo myself, but what I really hope is that you collaborate on this work. Don't hesitate to send me a pull request!
Oct. 13, 2017.
by Kyubyong

- PAPER: Automatic Text Scoring Using Neural Networks
- PAPER: A Neural Approach to Automated Essay Scoring
- CHALLENGE: Kaggle: The Hewlett Foundation: Automated Essay Scoring
- PROJECT: EASE (Enhanced AI Scoring Engine)

- WIKI: Speech recognition
- PAPER: Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- PAPER: WaveNet: A Generative Model for Raw Audio
- PROJECT: A TensorFlow implementation of Baidu's DeepSpeech architecture
- PROJECT: Speech-to-Text-WaveNet: End-to-end sentence level English speech recognition using DeepMind's WaveNet
- CHALLENGE: The 5th CHiME Speech Separation and Recognition Challenge
- DATA: The 5th CHiME Speech Separation and Recognition Challenge
- DATA: CSTR VCTK Corpus
- DATA: LibriSpeech ASR corpus
- DATA: Switchboard-1 Telephone Speech Corpus
- DATA: TED-LIUM Corpus

- WIKI: Automatic summarization
- BOOK: Automatic Text Summarization
- PAPER: Text Summarization Using Neural Networks
- PAPER: Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization
- DATA: Text Analysis Conference (TAC)
- DATA: Document Understanding Conferences (DUC)

- INFO: Coreference Resolution
- PAPER: Deep Reinforcement Learning for Mention-Ranking Coreference Models
- PAPER: Improving Coreference Resolution by Learning Entity-Level Distributed Representations
- CHALLENGE: CoNLL 2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes
- CHALLENGE: CoNLL 2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes

- PAPER: Neural Network Translation Models for Grammatical Error Correction
- CHALLENGE: CoNLL-2013 Shared Task: Grammatical Error Correction
- CHALLENGE: CoNLL-2014 Shared Task: Grammatical Error Correction
- DATA: NUS Non-commercial research/trial corpus license
- DATA: Lang-8 Learner Corpora
- DATA: Cornell Movie--Dialogs Corpus
- PROJECT: Deep Text Corrector
- PRODUCT: deep grammar

- PAPER: Grapheme-to-Phoneme Models for (Almost) Any Language
- PAPER: Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning
- PAPER: Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion
- PROJECT: Sequence-to-Sequence G2P toolkit
- DATA: Multilingual Pronunciation Data

- WIKI: Language identification
- PAPER: Automatic Language Identification Using Deep Neural Networks
- CHALLENGE: 2015 Language Recognition Evaluation

- WIKI: Language model
- TOOLKIT: KenLM Language Model Toolkit
- PAPER: Distributed Representations of Words and Phrases and their Compositionality
- PAPER: Character-Aware Neural Language Models
- DATA: Penn Treebank

- WIKI: Lemmatisation
- PAPER: Joint Lemmatization and Morphological Tagging with LEMMING
- TOOLKIT: WordNet Lemmatizer
- DATA: Treebank-3

- WIKI: Lip reading
- PAPER: Lip Reading Sentences in the Wild
- PAPER: 3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition
- PROJECT: Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks
- DATA: The GRID audiovisual sentence corpus

- PAPER: Neural Machine Translation by Jointly Learning to Align and Translate
- PAPER: Neural Machine Translation in Linear Time
- PAPER: Attention Is All You Need
- CHALLENGE: ACL 2014 Ninth Workshop on Statistical Machine Translation
- CHALLENGE: EMNLP 2017 Second Conference on Machine Translation (WMT17)
- DATA: OpenSubtitles2016
- DATA: WIT3: Web Inventory of Transcribed and Translated Talks
- DATA: The QCRI Educational Domain (QED) Corpus

- WIKI: Inflection
- PAPER: Morphological Inflection Generation Using Character Sequence to Sequence Learning
- CHALLENGE: SIGMORPHON 2016 Shared Task: Morphological Reinflection
- DATA: sigmorphon2016

- WIKI: Named-entity recognition
- PAPER: Neural Architectures for Named Entity Recognition
- PROJECT: OSU Twitter NLP Tools
- CHALLENGE: Named Entity Recognition in Twitter
- CHALLENGE: CoNLL 2002 Language-Independent Named Entity Recognition
- CHALLENGE: Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
- DATA: CoNLL-2002 NER corpus
- DATA: CoNLL-2003 NER corpus
- DATA: NUT Named Entity Recognition in Twitter Shared task

- PAPER: Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
- PROJECT: Paralex: Paraphrase-Driven Learning for Open Question Answering
- DATA: Microsoft Research Paraphrase Corpus
- DATA: Microsoft Research Video Description Corpus
- DATA: Pascal Dataset
- DATA: Flickr Dataset
- DATA: The SICK data set
- DATA: PPDB: The Paraphrase Database
- DATA: WikiAnswers Paraphrase Corpus

- PAPER: Neural Paraphrase Generation with Stacked Residual LSTM Networks
- PAPER: A Deep Generative Framework for Paraphrase Generation
- PAPER: Paraphrasing Revisited with Neural Machine Translation

- WIKI: Parsing
- TOOLKIT: The Stanford Parser: A statistical parser
- TOOLKIT: spaCy parser
- PAPER: A fast and accurate dependency parser using neural networks
- CHALLENGE: CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
- CHALLENGE: CoNLL 2016 Shared Task: Multilingual Shallow Discourse Parsing
- CHALLENGE: CoNLL 2015 Shared Task: Shallow Discourse Parsing
- CHALLENGE: SemEval-2016 Task 8: The meaning representations may be abstract, but this task is concrete!

- WIKI: Part-of-speech tagging
- PAPER: Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
- PAPER: Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models
- DATA: Treebank-3
- TOOLKIT: nltk.tag package

- PAPER: Neural Network Language Model for Chinese Pinyin Input Method Engine
- PROJECT: Neural Chinese Transliterator

- WIKI: Question answering
- PAPER: Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
- PAPER: Dynamic Memory Networks for Visual and Textual Question Answering
- CHALLENGE: TREC Question Answering Task
- CHALLENGE: NTCIR-8: Advanced Cross-lingual Information Access (ACLIA)
- CHALLENGE: CLEF Question Answering Track
- CHALLENGE: SemEval-2017 Task 3: Community Question Answering
- DATA: MS MARCO: Microsoft MAchine Reading COmprehension Dataset
- DATA: Maluuba NewsQA
- DATA: SQuAD: 100,000+ Questions for Machine Comprehension of Text
- DATA: GraphQuestions: A Characteristic-rich Question Answering Dataset
- DATA: Story Cloze Test and ROCStories Corpora
- DATA: Microsoft Research WikiQA Corpus
- DATA: DeepMind Q&A Dataset
- DATA: QASent

- WIKI: Relationship extraction
- PAPER: A deep learning approach for relationship extraction from interaction context in social manufacturing paradigm

- WIKI: Semantic role labeling
- BOOK: Semantic Role Labeling
- PAPER: End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks
- PAPER: Neural Semantic Role Labeling with Dependency Path Embeddings
- PAPER: Deep Semantic Role Labeling: What Works and What's Next
- CHALLENGE: CoNLL-2005 Shared Task: Semantic Role Labeling
- CHALLENGE: CoNLL-2004 Shared Task: Semantic Role Labeling
- TOOLKIT: Illinois Semantic Role Labeler (SRL)
- DATA: CoNLL-2005 Shared Task: Semantic Role Labeling

- WIKI: Sentence boundary disambiguation
- PAPER: A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain
- TOOLKIT: NLTK Tokenizers
- DATA: The British National Corpus
- DATA: Switchboard-1 Telephone Speech Corpus

- WIKI: Sentiment analysis
- INFO: Awesome Sentiment Analysis
- CHALLENGE: Kaggle: UMICH SI650 - Sentiment Classification
- CHALLENGE: SemEval-2017 Task 4: Sentiment Analysis in Twitter
- CHALLENGE: SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
- PROJECT: SenticNet
- DATA: Multi-Domain Sentiment Dataset (version 2.0)
- DATA: Stanford Sentiment Treebank
- DATA: Twitter Sentiment Corpus
- DATA: Twitter Sentiment Analysis Training Corpus
- DATA: AFINN: List of English words rated for valence

- WIKI: Source separation
- PAPER: From Blind to Guided Audio Source Separation
- PAPER: Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
- CHALLENGE: Signal Separation Evaluation Campaign (SiSEC)
- CHALLENGE: CHiME Speech Separation and Recognition Challenge

- WIKI: Speaker diarisation
- PAPER: DNN-based speaker clustering for speaker diarisation
- PAPER: Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach
- PAPER: Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
- CHALLENGE: Rich Transcription Evaluation

- WIKI: Speaker recognition
- PAPER: A Novel Scheme for Speaker Recognition Using a Phonetically-Aware Deep Neural Network
- PAPER: Deep Neural Networks for Small Footprint Text-Dependent Speaker Verification
- CHALLENGE: NIST Speaker Recognition Evaluation (SRE)
- INFO: Are there any suggestions for free databases for speaker recognition?

- See Lip-reading
 
- WIKI: Speech segmentation
- PAPER: Word Segmentation by 8-Month-Olds: When Speech Cues Count More Than Statistics
- PAPER: Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings
- PAPER: Unsupervised Lexicon Discovery from Acoustic Input
- PAPER: Weakly supervised spoken term discovery using cross-lingual side information
- DATA: CALLHOME Spanish Speech

- WIKI: Speech synthesis
- PAPER: WaveNet: A Generative Model for Raw Audio
- PAPER: Tacotron: Towards End-to-End Speech Synthesis
- PAPER: Deep Voice 2: Multi-Speaker Neural Text-to-Speech
- DATA: The World English Bible
- DATA: LJ Speech Dataset
- DATA: Lessac Data
- CHALLENGE: Blizzard Challenge 2017
- PRODUCT: Lyrebird
- PROJECT: The Festvox project
- TOOLKIT: Merlin: The Neural Network (NN) based Speech Synthesis System

- WIKI: Speech enhancement
- BOOK: Speech enhancement: theory and practice
- PAPER: An Experimental Study on Speech Enhancement Based on Deep Neural Network
- PAPER: A Regression Approach to Speech Enhancement Based on Deep Neural Networks
- PAPER: Speech Enhancement Based on Deep Denoising Autoencoder

- WIKI: Stemming
- PAPER: A Backpropagation Neural Network to Improve Arabic Stemming
- TOOLKIT: NLTK Stemmers

- WIKI: Terminology extraction
- PAPER: Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection

- WIKI: Text simplification
- PAPER: Aligning Sentences from Standard Wikipedia to Simple Wikipedia
- PAPER: Problems in Current Text Simplification Research: New Data Can Help
- DATA: Newsela Data

- See Speech Synthesis
 
- WIKI: Textual entailment
- PROJECT: Textual Entailment with TensorFlow
- PAPER: Textual Entailment with Structured Attentions and Composition
- CHALLENGE: SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment
- CHALLENGE: SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge

- PAPER: Phonetic Posteriorgrams for Many-to-One Voice Conversion without Parallel Data Training
- PROJECT: An implementation of voice conversion system utilizing phonetic posteriorgrams
- CHALLENGE: Voice Conversion Challenge 2016
- CHALLENGE: Voice Conversion Challenge 2018
- DATA: CMU_ARCTIC speech synthesis databases
- DATA: TIMIT Acoustic-Phonetic Continuous Speech Corpus

- WIKI: Word embedding
- TOOLKIT: Gensim: word2vec
- TOOLKIT: fastText
- TOOLKIT: GloVe: Global Vectors for Word Representation
- INFO: Where to get a pretrained model
- PROJECT: Pre-trained word vectors of 30+ languages
- PROJECT: Polyglot: Distributed word representations for multilingual NLP

- INFO: What is Word Prediction?
- PAPER: The prediction of character based on recurrent neural network language model
- PAPER: An Embedded Deep Learning based Word Prediction
- PAPER: Evaluating Word Prediction: Framing Keystroke Savings
- DATA: An Embedded Deep Learning based Word Prediction
- PROJECT: Word Prediction using Convolutional Neural Networks—can you do better than iPhone™ Keyboard?