A curated list of resources dedicated to Natural Language Generation (NLG)

Awesome Natural Language Generation

Natural Language Generation is a broad domain with applications in chatbots, story generation, and data descriptions. A wide spectrum of technologies addresses parts or the whole of the NLG process. This list aims to represent this diversity of NLG applications and techniques by providing links to various projects, tools, research papers, and learning materials.

Contents

Datasets

  • Alex Context NLG Dataset - A dataset for NLG in dialogue systems in the public transport information domain.
  • Box-score data - This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.
  • E2E - This shared task focuses on recent end-to-end (E2E), data-driven NLG methods, which jointly learn sentence planning and surface realisation from non-aligned data.
  • Neural-Wikipedian - The repository contains the code along with the required corpora that were used in order to build a system that "learns" how to generate English biographies for Semantic Web triples.
  • WeatherGov - Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.
  • WebNLG - An enriched version of the WebNLG corpus, a resource for evaluating common NLG tasks, including Discourse Ordering, Lexicalization and Referring Expression Generation.
  • WikiBio - Wikipedia biography dataset. This dataset gathers 728,321 biographies from Wikipedia. It aims at evaluating text generation algorithms. For each article, it provides the first paragraph and the infobox (both tokenized).
  • The Wikipedia company corpus - Company descriptions collected from Wikipedia. The dataset contains semantic representations as well as short and long descriptions for 51K companies in English.
  • YelpNLG - YelpNLG provides resources for natural language generation of restaurant reviews.

Dialog

  • Chatito - Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
  • NNDIAL - NNDial is an open source toolkit for building end-to-end trainable task-oriented dialogue models.
  • Plato - This is the Plato Research Dialogue System, a flexible platform for developing conversational AI agents.
  • RNNLG - RNNLG is an open source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains.
  • TGen - Statistical NLG for spoken dialogue systems.

Evaluation

  • NLG-eval - Evaluation code for various unsupervised automated metrics for Natural Language Generation.
  • VizSeq - A Visual Analysis Toolkit for Text Generation Tasks.

Grammar

  • OpenCCG - OpenCCG library for parsing and realization with CCG.
  • GrammaticalFramework - A programming language for multilingual grammar applications.
  • EasyCCG - CCG: All combinators, common grammar format, parsing to logical form, parameter estimation for probabilistic CCG.
  • CCG Lab - All combinators, common grammar format, parsing to logical form, parameter estimation for probabilistic CCG.
  • CCGweb - A Web platform for parsing and annotation.

Libraries

Narrative Generation

  • Random Story Generator - Using Natural Language Generation (NLG) to create a random short story.
  • Tracery - A story-grammar generation library for JavaScript.

Neural Net NLG

  • graph-2-text - Graph to sequence implemented in Pytorch combining Graph convolutional networks and opennmt-py.
  • Image Caption Generator - A Neural Network based generative model for captioning images using Tensorflow.
  • PPLM - Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models.
  • Texar - Texar is a toolkit aiming to support a broad set of machine learning tasks, especially natural language processing and text generation.
  • textgenrnn - Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.
  • Transformers - State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.
  • Summary Generation From Structured Data - For converting information present in the form of structured data into natural language text.

Papers

Products

  • Accelerated Text - Automatically generate multiple natural language descriptions of your data varying in wording and structure.
  • Twine - An open-source tool for telling interactive, nonlinear stories.

Realizers

  • GenI - Surface realiser (part of a Natural Language Generation system) using Tree Adjoining Grammar.
  • JSrealB - A JavaScript bilingual text realizer for web development.
  • SimpleNLG - Java API for Natural Language Generation.
  • SimpleNLG DE - German version of SimpleNLG 4.
  • SimpleNLG-EnFr - SimpleNLG-EnFr 1.1 is a bilingual English/French adaptation of SimpleNLG v4.2.

Templating Languages

  • calyx - A Ruby library for generating text with recursive template grammars.
  • nalgene - Natural language generation language.
  • StringTemplate - Java template engine (with ports for C#, Objective-C, JavaScript, Scala) for generating source code, web pages, emails, or any other formatted text output.

Videos

License

CC0

To the extent possible under law, TokenMill has waived all copyright and related or neighboring rights to this work.
