Simplification with Non-Aligned Data

This is the source code accompanying the paper
Learning to Simplify with Data Hopelessly Out of Alignment, published on arXiv.

Prerequisites

  • Python 3.6
  • torch 1.3.1
  • torchaudio 0.10.0+cu113
  • torchtext 0.5.0
  • torchvision 0.11.1+cu113
  • sentencepiece 0.1.9 (together with its Python library)
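
For reference, here is one way these dependencies could be installed with pip. This is a sketch, not a tested recipe: the CUDA 11.3 wheel index URL is an assumption, and pip may flag the version pairing above as incompatible, in which case adjust it to your environment.

# assumed installation commands for a CUDA 11.3 environment;
# versions follow the list above
pip install torch==1.3.1 torchtext==0.5.0 sentencepiece
pip install torchaudio==0.10.0+cu113 torchvision==0.11.1+cu113 \
    -f https://download.pytorch.org/whl/cu113/torch_stable.html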

Setting it up

Download the following archives and untar them at the project's top directory.

cd nonaligned_simple
tar jxvf asset.tar.bz2
tar jxvf js-resource.tar.bz2
tar jxvf wasser-resource.tar.bz2
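
The three extractions can equivalently be run in a loop, assuming the archives have already been downloaded into the top directory:

for f in asset.tar.bz2 js-resource.tar.bz2 wasser-resource.tar.bz2; do
    tar jxvf "$f"   # extract each bzip2-compressed archive in place
done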

Training

  • js-gan
cd js
../util/train.sh -d tsd -b 64 -g 1

  • wasser-gan
cd wasser
../util/train.sh -d tsd -b 64 -g 1

-d : dataset name
-b : batch size
-g : GPU ID
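
To train both variants in one pass, a minimal wrapper along these lines should work; the subshell keeps each cd local, and the options are exactly as documented above:

for model in js wasser; do
    # run train.sh from inside each model directory, as the README prescribes
    ( cd "$model" && ../util/train.sh -d tsd -b 64 -g 1 )
done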

Generation

Using pre-trained models

As trained models are provided as part of the package, you can bypass training and proceed directly to the generation step. Here is what you do.

  • js-gan
cd js
../util/generate.sh -d tsd -g 1

The output is found in js/data/tsd/pred.out.

  • wasser-gan
cd wasser
../util/generate.sh -d tsd -g 1

-d : dataset name
-g : GPU ID

The result is found in wasser/data/tsd/pred.out.
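
As with training, both models can be run back to back with a small loop; the head call is just a quick way to peek at the SentencePiece-tokenized predictions:

for model in js wasser; do
    ( cd "$model" && ../util/generate.sh -d tsd -g 1 )
    head -n 3 "$model/data/tsd/pred.out"   # inspect the first few outputs
done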

Detokenization

generate.sh produces its output in SentencePiece format. Running the following restores it to normal text.

cd js
../util/decode_spm.sh 

The result is found in js/data/tsd/pred_decoded.txt.
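
Conceptually, this step is a plain SentencePiece decode. A minimal stand-in using the spm_decode tool that ships with sentencepiece might look as follows; the model file name spm.model is an assumption, and the actual script may differ in detail:

cd js
# join SentencePiece pieces back into ordinary text
spm_decode --model=spm.model --input_format=piece \
    < data/tsd/pred.out > data/tsd/pred_decoded.txt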

About Data

The training data is based on sscorpus. We removed source/target pairs whose similarity exceeds 0.65. The test set is the same as the one used by Zhang and Lapata (2017).

References

@misc{nomoto-2022-learning,
  title = {Learning to Simplify with Data Hopelessly Out of Alignment},
  author = {Nomoto, Tadashi},
  year = {2022},
  publisher = {arXiv},
  doi = {10.48550/ARXIV.2204.00741},
  url = {https://arxiv.org/abs/2204.00741},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  copyright = {Creative Commons Attribution Share Alike 4.0 International}
}

@inproceedings{zhang-lapata-2017-sentence,
  title = {Sentence Simplification with Deep Reinforcement Learning},
  author = {Zhang, Xingxing and Lapata, Mirella},
  booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
  month = sep,
  year = {2017},
  address = {Copenhagen, Denmark},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/D17-1062}
}
