# Inter-annotator Agreement Study

## 0. Introduction

- We have three complete sets of annotations, one from each of the three annotators in the group. Each annotator's label distribution matters, because Cohen's Kappa estimates chance agreement from it.

- Based on the annotations we have, we use Cohen's Kappa as the metric in our inter-annotator agreement study.

- We apply Cohen's Kappa separately to the aspect annotation and to the overall (aspect + sentiment) annotation, and we expect a higher average agreement on the aspect annotation.

- We compare the annotations in annotator pairs and average the pairwise scores at the end. The pairwise scores show which annotator agrees most with the other two, which helps us pick the final version of the annotation.
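To make the metric concrete, here is a minimal sketch of Cohen's Kappa for one pair of annotators, written in plain Python. It is an illustration only, not the NLTK implementation used later in this notebook; the function name `cohen_kappa` and the toy labels are assumptions made for the example.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    # Note: kappa is undefined (division by zero) if both annotators only ever use one identical label.
    return (p_o - p_e) / (1 - p_e)

# Toy data (hypothetical): 'TRUE' means the aspect was annotated for that sentence.
annotator_a = ['TRUE', 'TRUE', 'FALSE', 'FALSE', 'TRUE']
annotator_b = ['TRUE', 'FALSE', 'FALSE', 'FALSE', 'TRUE']
kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 3))  # 0.615
```

In the toy data the annotators agree on 4 of 5 items (0.8), but 0.48 of that agreement is expected by chance, so kappa is noticeably lower than the raw agreement rate.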

## 1. Experimenting with annotation options (Optional)

To improve annotation quality and efficiency, we did two things:

1. To improve inter-annotator agreement, we randomly drew 10 samples from the data set and discussed them together to align our understanding of the annotation guideline. Based on that discussion, we updated the guideline to reduce ambiguity.

2. To improve annotation efficiency, we extracted keywords related to the aspects we label, so that we can make a quick prediction about the sentence we are about to annotate. This is especially useful when the sentence is long and the aspect is implicit.
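The keyword-based hint from step 2 could look roughly like the sketch below. The keyword lists and the function name `suggest_aspects` are hypothetical, since the actual keywords we used are not listed in this notebook. The result is only a hint for the annotator, not a label.

```python
import re

# Hypothetical keyword lists, one per aspect; the real lists are not part of this notebook.
ASPECT_KEYWORDS = {
    "politics": {"government", "policy", "election", "senate"},
    "humanity": {"people", "community", "health", "refugee"},
    "science": {"research", "study", "temperature", "emission"},
    "economy": {"cost", "market", "tax", "job"},
}

def suggest_aspects(sentence):
    """Return the aspects whose keywords occur in the sentence (a hint, not an annotation)."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return [aspect for aspect, keywords in ASPECT_KEYWORDS.items() if tokens & keywords]

hint = suggest_aspects("The new carbon tax policy could raise the cost of fuel.")
print(hint)  # ['politics', 'economy']
```

Exact word matching keeps the sketch simple. It misses inflected forms such as "policies", so it would likely under-suggest on real text.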

## 2. Load the required packages

In [1]:
! pip install openpyxl


In [39]:
import pandas as pd
from nltk.metrics.agreement import AnnotationTask
from collections import Counter, defaultdict

## 3. Importing and Processing data

In [3]:
Andrew_annotation = pd.read_excel('raw_manual_annotation/Andrew_climate_change_edit_v3_n521_shuffled.xlsx', index_col=0, header=0)
Yuesheng_annotation = pd.read_excel('raw_manual_annotation/Yuesheng_climate_change_edit_v3_n521_shuffled.xlsx', index_col=0, header=0)
Jiang_annotation = pd.read_excel('raw_manual_annotation/Jiang_climate_change_edit_v3_n521_shuffled.xlsx', index_col=0, header=0)


In [4]:
def expand_columns(df):
    '''Add extra columns indicating whether each aspect is annotated.'''

    df.columns = ['id', 'title', 'focus_sentence', 'aspect_politics_sentiment', 'aspect_humanity_sentiment',
       'aspect_science_sentiment', 'aspect_economy_sentiment']

    for aspect in ['politics', 'humanity', 'science', 'economy']:
        # An aspect counts as annotated when its sentiment label is not 'None'.
        # (The original compared with == 'None', which inverted the flag; Kappa is
        # unaffected by the inversion because it only swaps the two label names.)
        annotated = df["aspect_" + aspect + "_sentiment"] != 'None'
        # Store the flag as a string so it can be concatenated into the annotation triple later.
        df["aspect_" + aspect] = annotated.map({True: 'TRUE', False: 'FALSE'})
    return df

In [5]:
Andrew_annotation = expand_columns(Andrew_annotation)
Yuesheng_annotation = expand_columns(Yuesheng_annotation)
Jiang_annotation = expand_columns(Jiang_annotation)

## 4. Interannotator agreement study

In [6]:
def convert_to_triples(annotator, df, col):
    '''Extract triples from a dataframe for one column name.
    Each triple contains (annotator name, data id, annotation in that column).'''
    triples = []
    # Prefix each label with the column name, e.g. "aspect_politics_TRUE".
    for id_, label in zip(df['id'], df[col]):
        triples.append((annotator, id_, col + "_" + label))
    return triples

In [7]:
def get_avg_kappa_score(annotators, df_1, df_2, cols):
    '''Calculate the average Kappa agreement score of two annotators (given as a list) over a list of columns.'''
    kappas = []
    for col in cols:
        annotator_1_triples = convert_to_triples(annotators[0], df_1, col)
        annotator_2_triples = convert_to_triples(annotators[1], df_2, col)
        annotation_task = AnnotationTask(annotator_1_triples+annotator_2_triples)
        kappa = annotation_task.kappa()
        kappas.append(kappa)
    avg_kappa = sum(kappas)/len(cols)
    return avg_kappa

In [10]:
aspect_agreement_1 = get_avg_kappa_score(["Andrew","Yuesheng"], Andrew_annotation, Yuesheng_annotation, ['aspect_politics', 'aspect_humanity',
       'aspect_science', 'aspect_economy'])
aspect_agreement_2 = get_avg_kappa_score(["Andrew","Jiang"], Andrew_annotation, Jiang_annotation, ['aspect_politics', 'aspect_humanity',
       'aspect_science', 'aspect_economy'])
aspect_agreement_3 = get_avg_kappa_score(["Yuesheng","Jiang"], Yuesheng_annotation, Jiang_annotation, ['aspect_politics', 'aspect_humanity',
       'aspect_science', 'aspect_economy'])


print('The aspect agreement between Andrew and Yuesheng:', aspect_agreement_1)
print('The aspect agreement between Andrew and Jiang:', aspect_agreement_2)
print('The aspect agreement between Jiang and Yuesheng:', aspect_agreement_3)
print("The average inter-annotation agreement score based on aspects is:", (aspect_agreement_1+aspect_agreement_2+aspect_agreement_3)/3)

The aspect agreement between Andrew and Yuesheng: 0.4998120533330417
The aspect agreement between Andrew and Jiang: 0.5220230237330821
The aspect agreement between Jiang and Yuesheng: 0.777201476231673
The average inter-annotation agreement score based on aspects is: 0.5996788510992656



In [9]:
overall_agreement_1 = get_avg_kappa_score(["Andrew","Yuesheng"], Andrew_annotation, Yuesheng_annotation, ['aspect_politics_sentiment', 'aspect_humanity_sentiment',
       'aspect_science_sentiment', 'aspect_economy_sentiment'])
overall_agreement_2 = get_avg_kappa_score(["Andrew","Jiang"], Andrew_annotation, Jiang_annotation, ['aspect_politics_sentiment', 'aspect_humanity_sentiment',
       'aspect_science_sentiment', 'aspect_economy_sentiment'])
overall_agreement_3 = get_avg_kappa_score(["Yuesheng","Jiang"], Yuesheng_annotation, Jiang_annotation, ['aspect_politics_sentiment', 'aspect_humanity_sentiment',
       'aspect_science_sentiment', 'aspect_economy_sentiment'])

print('The overall agreement between Andrew and Yuesheng:', overall_agreement_1)
print('The overall agreement between Andrew and Jiang:', overall_agreement_2)
print('The overall agreement between Yuesheng and Jiang:', overall_agreement_3)
print("The average inter-annotation agreement score is:", (overall_agreement_1+overall_agreement_2+overall_agreement_3)/3)

The overall agreement between Andrew and Yuesheng: 0.3962174797118952
The overall agreement between Andrew and Jiang: 0.4383249504905105
The overall agreement between Yuesheng and Jiang: 0.7004710021298375
The average inter-annotation agreement score is: 0.5116711441107477
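
To see which annotator agrees most with the other two, the pairwise scores can be averaged per annotator. The sketch below does this for the aspect-level scores printed earlier, rounded to four decimals. The helper name `mean_agreement_per_annotator` is an assumption introduced for this illustration.

```python
from collections import defaultdict

# Pairwise aspect-level kappa scores, rounded from the aspect agreement output above.
pairwise_kappa = {
    ("Andrew", "Yuesheng"): 0.4998,
    ("Andrew", "Jiang"): 0.5220,
    ("Yuesheng", "Jiang"): 0.7772,
}

def mean_agreement_per_annotator(pairwise):
    """For each annotator, average the kappa of every pair that annotator takes part in."""
    scores = defaultdict(list)
    for (first, second), kappa in pairwise.items():
        scores[first].append(kappa)
        scores[second].append(kappa)
    return {name: sum(values) / len(values) for name, values in scores.items()}

per_annotator = mean_agreement_per_annotator(pairwise_kappa)
best = max(per_annotator, key=per_annotator.get)
for name, score in per_annotator.items():
    print(name, round(score, 4))
print("Highest mean agreement:", best)
```

With these scores, Jiang has the highest mean agreement (about 0.65), closely followed by Yuesheng (about 0.64), with Andrew at about 0.51.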


## 5. Extract annotation by voting


To obtain higher-quality annotations, we extract the final annotation by majority voting: for each data point, we keep the label that at least two annotators agree on; otherwise the data point is annotated as None. The extracted annotation will be presented in our interface.
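
For a single data point, the voting rule can be sketched as follows. The function name `majority_vote` and the sentiment labels in the examples are assumptions for illustration; the full dataframe version follows in the cells below.

```python
from collections import Counter

def majority_vote(labels, fallback="None"):
    """Return the label chosen by at least two annotators, otherwise the fallback."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > 1 else fallback

# Hypothetical labels from three annotators for one data point.
print(majority_vote(["positive", "positive", "None"]))      # positive
print(majority_vote(["positive", "negative", "neutral"]))   # None
print(majority_vote(["None", "None", "negative"]))          # None
```

Note that the fallback and a genuine "no aspect" majority both yield `None`, so the extracted annotation does not distinguish between the two cases.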

In [85]:
#re-load the original data
Andrew_annotation = pd.read_excel('raw_manual_annotation/Andrew_climate_change_edit_v3_n521_shuffled.xlsx', index_col=0, header=0)
Yuesheng_annotation = pd.read_excel('raw_manual_annotation/Yuesheng_climate_change_edit_v3_n521_shuffled.xlsx', index_col=0, header=0)
Jiang_annotation = pd.read_excel('raw_manual_annotation/Jiang_climate_change_edit_v3_n521_shuffled.xlsx', index_col=0, header=0)

In [86]:
def extract_annotation(df_1, df_2, df_3, cols):
    '''Extract the annotation by majority voting.
    For each data point, keep the label that at least two annotators agree on; otherwise label it 'None'.
    The dataframe with the extracted annotations is returned.'''
    extracted_annotation = defaultdict(list)

    for col in cols:
        # Walk through the three annotators' labels for this column in parallel.
        for labels in zip(df_1[col], df_2[col], df_3[col]):
            anno, count = Counter(labels).most_common(1)[0]

            if count > 1:
                extracted_annotation[col].append(anno)
            else:
                extracted_annotation[col].append('None')
    return pd.DataFrame(extracted_annotation)

extracted_df = extract_annotation(Andrew_annotation, Yuesheng_annotation, Jiang_annotation, ['aspect_politics', 'aspect_humanity',
       'aspect_science', 'aspect_economy'])

In [87]:
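# Start from one annotator's dataframe to keep the non-annotation columns.
# Assumption: the original base dataframe is not shown in this notebook; we use Andrew's copy
# because these columns should be identical across the three annotators' files.
extracted_annotation = Andrew_annotation.copy()
for col in extracted_df.columns:
    # Overwrite each aspect column with the voted annotation.
    extracted_annotation[col] = list(extracted_df[col])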

In [90]:
extracted_annotation.to_excel("extracted_annotation.xlsx")
print('The extracted annotation is written to Excel File successfully.')

The extracted annotation is written to Excel File successfully.


## 6. Conclusion

1. As expected, the average inter-annotator agreement score on aspects (about 0.60) is higher than the overall score (about 0.51), because identifying the aspect alone is easier than identifying the aspect together with its sentiment.

2. Inspecting the pairwise scores, we find that Jiang has the highest agreement with the other two annotators. Jiang and Yuesheng reach the highest pairwise agreement on both the aspect annotation (about 0.78) and the overall annotation (about 0.70). One possible explanation is that these two annotators share a similar cultural background and therefore perceive the texts most similarly.