traopia/KGNarrative

Semantic Enhanced Content Planning: A Preliminary Exploration

This repo contains the code for "Semantic Enhanced Content Planning: A Preliminary Exploration", which tests different and deeper levels of semantics in a content planner for text generation. Two datasets are augmented with semantic information and tested on popular transformer models for language generation.

DATASETS

Two newly augmented datasets are introduced, based on the existing WebNLG and DWIE datasets. The enhanced versions of these datasets can be found in the Datasets folder. The augmentation was done either by mining from text or by scraping large knowledge bases. To recreate the augmentation, run the scripts in Data_Preprocessing after the original dataset has been downloaded into the main folder. For each dataset the steps are:

DWIE:

Downloading (clones the repository and downloads the full dataset):

git clone https://github.com/klimzaporojets/DWIE
python3 Data_Preprocessing/dwie_download.py

Preprocessing (GPU is required):

python Data_Preprocessing/preprocessing_DWIE.py
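
Because preprocessing expects the dataset to be in the main folder and needs a GPU, a quick pre-flight check can save a failed run. The following is a minimal sketch, not part of the repository: the helper names `missing_paths` and `nvidia_smi_on_path` are hypothetical, and looking for `nvidia-smi` on the PATH is only a rough hint that assumes an NVIDIA GPU.

```python
import shutil
from pathlib import Path


def missing_paths(root, required):
    """Return the entries of `required` that do not exist under `root`."""
    root = Path(root)
    return [name for name in required if not (root / name).exists()]


def nvidia_smi_on_path():
    """Rough GPU hint: True if the `nvidia-smi` tool is found on PATH.

    This is an assumption-laden check (NVIDIA only); it does not prove
    that a usable GPU is visible to the preprocessing script.
    """
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    # "DWIE" is the folder created by the git clone step above.
    required = ["DWIE", "Data_Preprocessing/preprocessing_DWIE.py"]
    missing = missing_paths(".", required)
    if missing:
        print("Missing before preprocessing:", ", ".join(missing))
    if not nvidia_smi_on_path():
        print("nvidia-smi not found; a GPU may not be available.")
```

Run it from the main folder of the repository; it only reports problems and does not start the preprocessing itself.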

WebNLG:

Download WebNLG from the original repo (https://gitlab.com/shimorina/webnlg-dataset/-/tree/master/release_v3.0). Release 3.0 in English is required.

Preprocessing (GPU is required; takes considerable time):

python Data_Preprocessing/preprocessing_WebNLG.py

MODELS

For the results, BART-large was used with WebNLG and Longformer (LED) with DWIE. To fine-tune a model on a specific content planner, run one of the commands below, where $element is one of 'Types_KG', 'Instances_KG', 'Subclasses_KG', 'Instances_list', 'multi_Subclasses_KG', 'entities_list', 'semantic_of_news':

#WebNLG
python3 finetuning/finetunemodel_webnlg.py Datasets/WebNLG/4experiment $element bart-large path/to/results/$element
#DWIE
python3 finetuning/finetunemodel_led.py Datasets/WebNLG/4experiment $element led path/to/results/$element
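
To cover every content planner in one go, the WebNLG command above can be generated once per element. The sketch below is illustrative and not part of the repository: the helpers `build_finetune_command` and `webnlg_commands` are hypothetical names, the argument order is simply copied from the command above, and `path/to/results` remains a placeholder to replace with a real output directory.

```python
import shlex

# Content planner names as listed in this README.
CONTENT_PLANNERS = [
    "Types_KG",
    "Instances_KG",
    "Subclasses_KG",
    "Instances_list",
    "multi_Subclasses_KG",
    "entities_list",
    "semantic_of_news",
]


def build_finetune_command(script, dataset_dir, element, model, results_root):
    """Return the argument list for one fine-tuning run.

    The order mirrors the README commands:
    <script> <dataset_dir> <element> <model> <output_dir>
    """
    output_dir = f"{results_root}/{element}"
    return ["python3", script, dataset_dir, element, model, output_dir]


def webnlg_commands(results_root="path/to/results"):
    """Build one WebNLG fine-tuning command per content planner."""
    return [
        build_finetune_command(
            "finetuning/finetunemodel_webnlg.py",
            "Datasets/WebNLG/4experiment",
            element,
            "bart-large",
            results_root,
        )
        for element in CONTENT_PLANNERS
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in webnlg_commands():
        print(shlex.join(cmd))
```

The script only prints the commands, so they can be inspected first; each list can be passed to `subprocess.run(cmd, check=True)` to launch the runs sequentially.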

RESULTS

To reproduce the results from the paper, run the scripts in the experiment_scripts folder, for example:

./experiment_scripts/webnlg_Semantic.sh

Dependencies

A working environment is provided in enviroment.yml. For both dataset generation and fine-tuning, a minimal requirements file is provided in each folder. The PARENT metric was installed from source (https://github.com/KaijuML/parent), as was BLEURT (https://github.com/google-research/bleurt).

Citations

Should you use this code/dataset for your own research, please cite:

