Skip to content

yuanyuanlei-nlp/SubReddit_agreement_dataset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 

Repository files navigation

Read Me

These are two newly annotated datasets for the research of identifying (dis)agreement relations between (Quote, Response) comment pairs in social media platform.


Paper: Few-Shot (Dis)Agreement Identification in Online Discussions with Regularized and Augmented Meta-Learning
Accepted: The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
Authors: Yuanyuan Lei, Ruihong Huang
Affiliation: Department of Computer Science and Engineering, Texas A&M University, College Station, Texas, USA
Paper Link: https://aclanthology.org/2022.findings-emnlp.409/


Task Description

In online discussion forums and social media platforms, people express different opinions towards a common topic, by posting their comments or replying to another user's previous comments. We call the comments replying to another comments as Response and the comments being replied as Quote. Researching the relation between (Quote, Response) pairs will enable many opinion mining applications. Specifically, the task is identifying the relation between (Quote, Response) pair to be Agreement, Disagreement, or Neutral.


Dataset Description

We collected and annotated two datasets from Reddit (https://www.reddit.com/) on two recent new topics, mask mandate and COVID vaccine, containing 1.5k and 1.4k comment pairs respectively, in the year of 2021.

  • SubReddit-MaskMandate (SubReddit-MaskMandate.csv) is collected from a sub forum on Reddit with the topic of mask mandate (https://www.reddit.com/r/antimaskers/), discussing whether people should wear masks during the COVID pandemic. The Cohen's kappa between annotators is 0.8012.
  • SubReddit-CovidVaccine (SubReddit-CovidVaccine.csv) is collected from a sub forum on Reddit with the topic of COVID vaccine (https://www.reddit.com/r/CovidVaccinated/), discussing whether people should take the COVID vaccination. The Cohen's kappa between annotators is 0.8215.

Dataset Usage

There are three columns in the two datasets:

  • quote_text: the text of Quote comments
  • response_text: the text of Response comments
  • relation: the annotated relation between (Quote, Response) pairs, containing three different classes, where 0 represents Disagreement, 1 represents Neutral, and 2 represents Agreement

Dataset Statistics

Dataset Agree Disagree Neutral Total
SubReddit-MaskMandate 645 546 343 1534
SubReddit-CovidVaccine 272 425 703 1400

Annotation Guideline

Every (Quote, Response) pairs are annotated with one of Agreement, Disagreement, and Neutral labels:

  • Agreement: both Quote and Response express subjective opinions towards some topics instead of sharing objective experience or giving objective advice, and there is explicit or implicit evidence for agreement / support / semantically similarity.
  • Disagreement: both Quote and Response express subjective opinions, and there is explicit or implicit evidence for disagreement / attack / denial / reject / challenge / semantically opposition.
  • Neutral: Quote or Response is not expressing subjective opinions, or Quote and Response are stating quite different / unrelated things, or both Quote and Response express opinions but Response is expressing vague / not sure / weak agreement or disagreement, or partially agree partially disagree.

Citation

If you use this dataset in your research, please refer to and cite:

Yuanyuan Lei and Ruihong Huang. 2022. Few-Shot (Dis)Agreement Identification in Online Discussions with Regularized and Augmented Meta-Learning. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5581–5593, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.


@inproceedings{lei-huang-2022-shot,
    title = "Few-Shot (Dis)Agreement Identification in Online Discussions with Regularized and Augmented Meta-Learning",
    author = "Lei, Yuanyuan  and
      Huang, Ruihong",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.409",
    pages = "5581--5593",
    abstract = "Online discussions are abundant with opinions towards a common topic, and identifying (dis)agreement between a pair of comments enables many opinion mining applications. Realizing the increasing needs to analyze opinions for emergent new topics that however tend to lack annotations, we present the first meta-learning approach for few-shot (dis)agreement identification that can be quickly applied to analyze opinions for new topics with few labeled instances. Furthermore, we enhance the meta-learner{'}s domain generalization ability from two perspectives. The first is domain-invariant regularization, where we design a lexicon-based regularization loss to enable the meta-learner to learn domain-invariant cues. The second is domain-aware augmentation, where we propose domain-aware task augmentation for meta-training to learn domain-specific expressions. In addition to using an existing dataset, we also evaluate our approach on two very recent new topics, mask mandate and COVID vaccine, using our newly annotated datasets containing 1.5k and 1.4k SubReddits comment pairs respectively. Extensive experiments on three domains/topics demonstrate the effectiveness of our meta-learning approach.",
}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published