WebNLG Challenge 2020

This repo provides the data and scripts used in the paper Leveraging Large Pretrained Models for WebNLG 2020 by Xintong Li, Aleksandre Maskharashvili, Symon Jory Stevens-Guille, and Michael White, published at the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+ 2020).

Reference

@inproceedings{li-etal-2020-leveraging-large,
    title = "Leveraging Large Pretrained Models for {W}eb{NLG} 2020",
    author = "Li, Xintong  and
      Maskharashvili, Aleksandre  and
      Jory Stevens-Guille, Symon  and
      White, Michael",
    booktitle = "Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)",
    month = "12",
    year = "2020",
    address = "Dublin, Ireland (Virtual)",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.webnlg-1.12",
    pages = "117--124",
    abstract = "In this paper, we report experiments on finetuning large pretrained models to realize resource description framework (RDF) triples to natural language. We provide the details of how to build one of the top-ranked English generation models in WebNLG Challenge 2020. We also show that there appears to be considerable potential for reranking to improve the current state of the art both in terms of statistical metrics and model-based metrics. Our human analyses of the generated texts show that for Russian, pretrained models showed some success, both in terms of lexical and morpho-syntactic choices for generation, as well as for content aggregation. Nevertheless, in a number of cases, the model can be unpredictable, both in terms of failure or success. Omissions of the content and hallucinations, which in many cases occurred at the same time, were major problems. By contrast, the models for English showed near perfect performance on the validation set.",
}

WebNLG 2020

Data

The latest and earlier releases of the WebNLG 2020 data are tracked in the shimorina/webnlg-dataset repository, e.g. shimorina/webnlg-dataset/releases_v2.1/json. The official challenge data can be downloaded from the following links: train&dev and test.
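As a minimal sketch of fetching a release (assuming the repository layout referenced above), the dataset repo can simply be cloned:

# Hedged sketch: clone the dataset repo and inspect the release referenced above
git clone https://github.com/shimorina/webnlg-dataset.git
ls webnlg-dataset/releases_v2.1/json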

Setup

huggingface/transformers should be installed from source. The code has been tested on commit 3babef81 of huggingface/transformers.

# Install transformers from source, pinned to the tested commit
git clone https://github.com/huggingface/transformers.git
cd transformers
git checkout -b webnlg 3babef81
pip install -e .
cd ..
# Fetch the finetuning scripts this repo builds on
git clone https://github.com/znculee/finetune-transformers.git
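To confirm that the editable install is picked up (a quick sanity check, not part of the original instructions):

# Should print the version and a path inside the transformers source checkout
python -c "import transformers; print(transformers.__version__, transformers.__file__)"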

Preprocessing, Training, and Generation

# Preprocess the English WebNLG 2020 (v2) data, finetune T5-large, then decode
bash scripts/prepare.2020_v2.en.sh
bash scripts/train.2020_v2.t5_large.sh
bash scripts/generate.2020_v2.t5_large.sh
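The scripts above stop at generation; scoring is not shown in this repo's instructions. As a hedged sketch, assuming the generated texts and references end up in plain-text files (hypothetical names hyp.txt and ref.txt, one segment per line), sacrebleu can compute BLEU and chrF:

# Hedged sketch with hypothetical file names (pip install sacrebleu)
sacrebleu ref.txt -i hyp.txt -m bleu chrf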

WebNLG 2017

Data

The WebNLG 2017 (v1.5) English data comes from ThiagoCF05/webnlg/data/v1.5/en.

Preprocessing, Training, and Generation

# Copy the v1.5 English data into place (path/to is your local checkout; see the sketch below)
cp -r path/to/ThiagoCF05/webnlg/data/v1.5/en data/2017_v1_5
# Preprocess, finetune T5-small, then decode
bash scripts/prepare.2017_v1_5.sh
bash scripts/train.2017_v1_5.t5_small.sh
bash scripts/generate.2017_v1_5.t5_small.sh
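The cp above assumes a local checkout of ThiagoCF05/webnlg; as a minimal sketch of obtaining one:

# Hedged sketch: clone the repo so the copy above has a concrete source path
git clone https://github.com/ThiagoCF05/webnlg.git
cp -r webnlg/data/v1.5/en data/2017_v1_5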
