This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Sentiment Annotation Guidelines

This repository contains our guidelines for annotating sentiment in social media. There are two versions: one with Russian examples from the VKontakte social network and one with English examples from Twitter. The guidelines were prepared as part of the RuSentiment project by the Text Machine Lab for NLP.

The RuSentiment dataset presented in the paper is no longer included in this repository, at the request of VKontakte.

Project page: http://text-machine.cs.uml.edu/projects/rusentiment/

Paper: "Rogers, A., Romanov, A., Rumshisky, A., Volkova, S., Gronas, M. and Gribov, A., 2018. RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian. In Proceedings of COLING 2018 (pp. 755-763)." PDF | BibTex

Highlights of our annotation policy:

  • negative and positive sentiment classes cover both implicit and explicit sentiment, whether it expresses emotions or attitudes;
  • neutral class (unmarked for sentiment);
  • speech act class: social media posts often include formulaic greetings, thank-you posts and congratulatory posts, which may or may not express the actual sentiment of the sender;
  • "skip" class for unclear cases, noisy posts, and content that was likely not created by the users themselves (poems, lyrics, jokes, etc.);
  • cases of mixed sentiment are annotated for the dominant sentiment of the post, and the guidelines cover 6 frequent cases of mixed sentiment to improve inter-annotator agreement;
  • hashtags and smileys are not treated as automatic sentiment labels.
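
As a rough illustration of the label scheme above, the five classes could be represented as follows. This is only a sketch: the names `Label`, `AnnotatedPost`, and `parse_label`, as well as the string values, are hypothetical and are not taken from the guidelines or the dataset format.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """The five annotation classes (string values are assumed, not official)."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"        # unmarked for sentiment
    SPEECH_ACT = "speech_act"  # formulaic greetings, thanks, congratulations
    SKIP = "skip"              # unclear, noisy, or not user-created content


@dataclass(frozen=True)
class AnnotatedPost:
    """One post with exactly one label; mixed posts get the dominant sentiment."""
    text: str
    label: Label


def parse_label(raw: str) -> Label:
    """Convert an annotator's raw string to a Label; unknown values raise ValueError."""
    return Label(raw.strip().lower())


post = AnnotatedPost("Happy birthday!", parse_label("speech_act"))
```

Storing a single `Label` per post mirrors the policy that mixed-sentiment posts are annotated for their dominant sentiment rather than with several labels.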

For Russian, these guidelines yielded an annotation speed of 250-350 posts per hour, with a Fleiss' kappa of 0.654 for randomly selected posts. See the paper for details on how active learning influenced the inter-annotator agreement.
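
For readers unfamiliar with the agreement measure, here is a minimal sketch of the standard Fleiss' kappa formula. It is not the code used in the paper; the function name `fleiss_kappa`, the `LABELS` tuple, and the input layout (one list of labels per post, with the same number of annotators for every post) are assumptions made for illustration.

```python
from collections import Counter

# Assumed label strings, one per class in the annotation policy.
LABELS = ("positive", "negative", "neutral", "speech_act", "skip")


def fleiss_kappa(annotations, labels=LABELS):
    """Compute Fleiss' kappa.

    annotations: list of posts, each a list of labels given by the annotators.
    Every post must have the same number of annotators (at least two).
    """
    n_items = len(annotations)
    if n_items == 0:
        raise ValueError("no annotations given")
    n_raters = len(annotations[0])
    if n_raters < 2 or any(len(a) != n_raters for a in annotations):
        raise ValueError("each post needs the same number (>= 2) of annotators")
    if any(label not in labels for a in annotations for label in a):
        raise ValueError("unknown label in annotations")

    counts = [Counter(a) for a in annotations]

    # Observed agreement: mean over posts of the share of agreeing annotator pairs.
    p_items = [
        (sum(c[l] ** 2 for l in labels) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Expected agreement by chance, from the overall label proportions.
    p_cat = [sum(c[l] for c in counts) / (n_items * n_raters) for l in labels]
    p_exp = sum(p ** 2 for p in p_cat)

    if p_exp == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (p_bar - p_exp) / (1 - p_exp)


# Toy data: three annotators agree fully on two posts with different labels.
perfect = [["positive"] * 3, ["negative"] * 3]
kappa_perfect = fleiss_kappa(perfect)
```

A value of 1 indicates complete agreement, while values at or below 0 indicate agreement no better than chance.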