This repository accompanies the paper "Diamonds in the Rough: Generating Fluent Sentences from Early-Stage Drafts for Academic Writing Assistance". It includes:
- Set of Modified Incomplete TecHnical paper sentences (SMITH) - an evaluation dataset of pairs of draft sentences and their final versions
- Synthetic training dataset - a dataset of synthetic draft sentences used to train the baseline models

For details, see the paper.

The datasets are licensed as follows:
- SMITH - Licensed under the Creative Commons Attribution 4.0 (CC BY 4.0) license.
- Synthetic training dataset - Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) license.