This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Sentiment Annotation Guidelines
This repository contains our guidelines for annotating sentiment in social media. There are two versions: one with Russian examples from the VKontakte social network and one with English examples from Twitter. The guidelines were prepared as part of the RuSentiment project by Text Machine Lab for NLP.
The RuSentiment dataset presented in the paper is no longer included in this repo, at the request of VKontakte.
Project page: http://text-machine.cs.uml.edu/projects/rusentiment/
Paper: "Rogers, A., Romanov, A., Rumshisky, A., Volkova, S., Gronas, M. and Gribov, A., 2018. RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian. In Proceedings of COLING 2018 (pp. 755-763)." PDF | BibTex
Highlights of our annotation policy:
- negative and positive sentiment classes cover both implicit and explicit sentiment, whether it expresses emotions or attitudes;
- neutral class (unmarked for sentiment);
- speech act class: social media posts often include formulaic greetings, thank-you posts and congratulatory posts, which may or may not express the actual sentiment of the sender;
- "skip" class for unclear cases, noisy posts, and content that was likely not created by the users themselves (poems, lyrics, jokes, etc.);
- cases of mixed sentiment are annotated for the dominant sentiment of the post, and the guidelines cover 6 frequent cases of mixed sentiment to improve inter-annotator agreement;
- hashtags and smileys are not treated as automatic sentiment labels.
For Russian, these guidelines yielded an annotation speed of 250-350 posts per hour, with a Fleiss' kappa of 0.654 for randomly selected posts. See the paper for details on how active learning influenced the inter-annotator agreement.
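For readers who want to compute agreement on their own annotations, here is a minimal sketch of the standard Fleiss' kappa formula in plain Python. It is not the script used for the paper. The function name `fleiss_kappa`, the column order of the five classes, the choice of three annotators per post, and the toy counts are all assumptions made for illustration.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned post i to class j.

    Every post must be rated by the same number of annotators (>= 2).
    """
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("no items to score")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two ratings per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_classes = len(counts[0])

    # Share of all assignments that went to each class.
    p_class = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_classes)
    ]
    # Observed agreement per item: fraction of annotator pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_class)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one class")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: 4 posts, 3 annotators each.
# Assumed column order: negative, positive, neutral, speech act, skip.
toy_counts = [
    [3, 0, 0, 0, 0],
    [0, 2, 1, 0, 0],
    [0, 0, 3, 0, 0],
    [0, 1, 0, 2, 0],
]
print(f"Fleiss' kappa on toy data: {fleiss_kappa(toy_counts):.3f}")
```

The input is a count table rather than raw per-annotator labels, so it does not matter which annotator gave which label, only how many chose each class for a post.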