This repository hosts the official dataset of citation functions from the paper "SDCF: Semi-automatically structured Dataset of Citation Functions". The research aims to build a new dataset of citation functions. It was motivated by the fact that existing datasets offer a limited set of labels and contain few instances. Moreover, most existing citation-function labels were developed within limited research scopes.
The contributions of the paper provided in this repository are:
- A new labeling scheme of citation functions containing five coarse labels and 21 fine-grained labels.
- Labeling guidelines for annotators.
- A development dataset, which consists of 5,668 manually labeled instances.
- A final dataset, which consists of 1,840,815 automatically labeled instances.
The proposed dataset was developed in two stages.
- In the first stage, we propose a new labeling scheme of citation functions.
- In the second stage, we develop a new dataset of citation functions using a semi-automatic approach.
- The semi-automatic approach is implemented by creating a development dataset (manually labeled) and then the final dataset (automatically labeled).
- Furthermore, we apply Active Learning (AL) to handle low-resource scenarios.
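As a rough illustration of how an AL round can pick instances for manual labeling, the sketch below uses least-confidence sampling. This query strategy is an assumption chosen for illustration; the README does not state which strategy the paper uses. The function name `least_confident_indices` and the probability values are hypothetical.

```python
from typing import List, Sequence


def least_confident_indices(
    probabilities: Sequence[Sequence[float]], k: int
) -> List[int]:
    """Return indices of the k pool instances with the lowest top probability."""
    # Confidence of an instance = probability of its most likely label.
    confidences = [max(p) for p in probabilities]
    # Rank pool indices from least to most confident and keep the first k.
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:k]


# Made-up class-probability rows for four unlabeled citing sentences.
pool_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.55, 0.30, 0.15],
    [0.34, 0.33, 0.33],
]

# Indices of the two sentences the model is least sure about.
to_annotate = least_confident_indices(pool_probs, k=2)
print(to_annotate)
```

In a loop of this kind, the selected sentences would be labeled by annotators, added to the training set, and the classifier retrained before the next round.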
All stages of the dataset-building process are shown in the following figure.
The figure below illustrates the AL approach:
The proposed scheme consists of two parts: five coarse labels and 21 fine-grained labels.
coarse labels | fine-grained labels |
---|---|
background | definition |
background | suggest |
background | judgment |
background | technical |
background | trend |
citing paper work | citing_paper_corroboration |
citing paper work | citing_paper_based_on |
citing paper work | citing_paper_use |
citing paper work | citing_paper_extend |
citing paper work | citing_paper_dominant |
citing paper work | citing_paper_future |
cited paper work | cited_paper_propose |
cited paper work | cited_paper_success |
cited paper work | cited_paper_weakness |
cited paper work | cited_paper_result |
cited paper work | cited_paper_dominant |
compare and contrast | compare |
compare and contrast | contrast |
other | other_cited_paper_comparison |
other | other_multiple_intent |
other | other_other |
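The table above can be expressed as a lookup from fine-grained to coarse labels, which may help when aggregating predictions at the coarse level. This is a minimal sketch: the label strings are copied from the table, while the names `SCHEME`, `FINE_TO_COARSE`, and `coarse_label` are hypothetical and not part of the repository.

```python
# Coarse label -> fine-grained labels, transcribed from the table above.
SCHEME = {
    "background": [
        "definition", "suggest", "judgment", "technical", "trend",
    ],
    "citing paper work": [
        "citing_paper_corroboration", "citing_paper_based_on",
        "citing_paper_use", "citing_paper_extend",
        "citing_paper_dominant", "citing_paper_future",
    ],
    "cited paper work": [
        "cited_paper_propose", "cited_paper_success",
        "cited_paper_weakness", "cited_paper_result",
        "cited_paper_dominant",
    ],
    "compare and contrast": ["compare", "contrast"],
    "other": [
        "other_cited_paper_comparison", "other_multiple_intent",
        "other_other",
    ],
}

# Inverted view: fine-grained label -> coarse label.
FINE_TO_COARSE = {
    fine: coarse for coarse, fines in SCHEME.items() for fine in fines
}


def coarse_label(fine_label: str) -> str:
    """Map a fine-grained label to its coarse label (KeyError if unknown)."""
    return FINE_TO_COARSE[fine_label]


print(coarse_label("cited_paper_weakness"))
```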
The final dataset of citation functions is organized as follows:
columns-0 | columns-1 | columns-2 | columns-3 | columns-4 | columns-5 |
---|---|---|---|---|---|
paper-id | published-date-in-ArXiv | paper-title | line-number | citing-sentence | label |
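A reader for this six-column layout might look like the sketch below. Several things here are assumptions, since the README does not specify them: that the file is comma-delimited, that it has no header row, and that the label column holds fine-grained label strings. The column names in `COLUMNS`, the function `read_instances`, and the two sample rows are placeholders for illustration only, not real records from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical names for columns 0 to 5, following the table above.
COLUMNS = [
    "paper_id", "published_date", "paper_title",
    "line_number", "citing_sentence", "label",
]


def read_instances(handle, delimiter=","):
    """Yield one dict per well-formed row; the delimiter is an assumption."""
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) != len(COLUMNS):
            continue  # skip rows that do not have exactly six fields
        record = dict(zip(COLUMNS, row))
        record["line_number"] = int(record["line_number"])
        yield record


# Placeholder rows standing in for a real file opened with open(path, newline="").
sample = io.StringIO(
    'paper-0001,2020-01-01,"A placeholder title",12,'
    '"Placeholder sentence citing [1].",cited_paper_propose\n'
    'paper-0001,2020-01-01,"A placeholder title",30,'
    '"We use the tool of [2], as before.",citing_paper_use\n'
)

instances = list(read_instances(sample))
label_counts = Counter(record["label"] for record in instances)
print(label_counts)
```

Because the final dataset has 1,840,815 instances, iterating over the generator row by row, rather than materializing a list as done here for the sample, keeps memory use low.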
If you find our datasets useful, please cite:
For further questions, contact us at: setio@is.cs.tut.ac.jp