### Collecting Data - Lab 3
#### Inter-annotator agreement

In this exercise, you will calculate inter-annotator agreement using Cohen's kappa, both manually and using a ready-made Python function.

Imagine that you have two annotators labelling text sentiment, where the labels are "positive", "neutral", or "negative". The two lists below provide the annotators' labels. 

In [None]:
annotator_a = ["positive",
              "neutral",
              "negative",
              "negative",
              "positive",
              "neutral",
              "positive",
              "positive",
              "neutral",
              "neutral"]

In [None]:
annotator_b = ["neutral",
              "neutral",
              "negative",
              "negative",
              "positive",
              "neutral",
              "neutral",
              "positive",
              "neutral",
              "positive"]

You do not have to compute Cohen's kappa by hand: the Scikit-learn library provides a ready-made function, `cohen_kappa_score`, which is imported and applied below.

In [None]:
from sklearn.metrics import cohen_kappa_score

In [None]:
kappa_automatic = cohen_kappa_score(annotator_a, annotator_b)
print(kappa_automatic)

You now know the value that your manual calculation should reproduce. What can you say about the inter-annotator agreement in this case? Look at the last few slides in Lecture 3 to interpret the resulting kappa value.
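
If you do not have the slides at hand, the sketch below maps a kappa value to the agreement bands proposed by Landis and Koch (1977). This is an assumption: the scale used in Lecture 3 may differ, and the helper name `interpret_kappa` and the value `0.72` are made up for illustration and are not the lab's answer.

```python
def interpret_kappa(kappa):
    """Map a kappa value to a Landis and Koch (1977) agreement band."""
    if kappa < 0:
        return "poor"
    # Upper bound of each band, in increasing order.
    bands = [(0.20, "slight"),
             (0.40, "fair"),
             (0.60, "moderate"),
             (0.80, "substantial"),
             (1.00, "almost perfect")]
    for upper, name in bands:
        if kappa <= upper:
            return name
    raise ValueError("kappa cannot exceed 1")


# Hypothetical value, not the result for the annotations above.
print(interpret_kappa(0.72))
```

Such bands are conventions rather than strict rules, so treat the label as a rough guide.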

-----------------------------------

Next, you will calculate Cohen's kappa manually. Going through the calculations step by step shows how the value is actually computed.

Recall from the lecture that there are 4 steps to compute the value of Cohen's kappa:
1. Build a confusion matrix from the annotations.
2. Compute the raw (observed) agreement - a numeric variable `agreement_obs`.
3. Compute the expected agreement - a numeric variable `agreement_exp`.
4. Compute Cohen's kappa: $\kappa = \frac{agreement\_obs - agreement\_exp}{1 - agreement\_exp}$
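
To get a feel for the formula before working with the real annotations, here is a minimal sketch with made-up agreement values. The names `example_obs`, `example_exp`, and `example_kappa` are hypothetical and deliberately different from the variables you will define yourself.

```python
# Hypothetical agreement values, chosen only to illustrate the formula.
example_obs = 0.8   # annotators agree on 80% of the items
example_exp = 0.5   # 50% agreement would be expected by chance

# Kappa rescales the observed agreement relative to chance agreement.
example_kappa = (example_obs - example_exp) / (1 - example_exp)
print(round(example_kappa, 3))

# When observed agreement equals chance agreement, kappa is 0.
chance_kappa = (example_exp - example_exp) / (1 - example_exp)
print(chance_kappa)
```

Perfect agreement (`example_obs = 1`) would give a kappa of 1, whatever the expected agreement is (as long as it is below 1).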

Before getting to the computations, we define a couple of useful variables below.

In [None]:
# Total number of items:
n_items = len(annotator_a)

# Labels:
annotation_labels = ["positive", "neutral", "negative"]

# Number of labels:
n_labels = len(annotation_labels)

#### Step 1. Confusion matrix.
To build a confusion matrix, you can use another function from the Scikit-learn library, `confusion_matrix`. Write 2-3 lines of code to import the function, compute, and print a confusion matrix for the annotations provided in the two lists above.

**Important**. In this case, you need to provide the `labels` argument to the `confusion_matrix` function for it to work properly. If needed, consult the documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
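
To see what the `labels` argument does, here is a small sketch on toy data that is unrelated to the sentiment annotations. The names `toy_a`, `toy_b`, `toy_labels`, and `toy_matrix` are made up for illustration. Without `labels`, scikit-learn sorts the labels found in the data, so the row and column order might not match the order you intend.

```python
from sklearn.metrics import confusion_matrix

# Toy annotations with two labels (hypothetical data).
toy_a = ["yes", "no", "yes", "yes"]
toy_b = ["yes", "no", "no", "yes"]

# The order given here fixes the order of rows and columns.
toy_labels = ["yes", "no"]

# Rows correspond to the first argument, columns to the second.
toy_matrix = confusion_matrix(toy_a, toy_b, labels=toy_labels)
print(toy_matrix)
```

Here the first row and column stand for "yes" and the second for "no", as requested in `toy_labels`.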

In [None]:
# Import the function:

# Compute a confusion matrix:

# Print the matrix:


#### Step 2. Observed agreement.
Observed agreement is computed by looking only at those cases where the two annotators agree. All such cases appear on the _diagonal_ of the confusion matrix (top left to bottom right).

To extract the diagonal from a matrix, you can use the `diag` function (without any arguments except your matrix) from the `numpy` package. If needed, consult the documentation:
https://numpy.org/doc/stable/reference/generated/numpy.diag.html

Write code to import the function, extract the diagonal, sum its elements into `num_agreed_items`, and compute the observed agreement: $agreement\_obs = \frac{num\_agreed\_items}{n\_items}$.
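
The same idea can be sketched on a toy 2x2 matrix before you apply it to your own confusion matrix. All names starting with `toy_` are hypothetical, and the matrix values are made up.

```python
import numpy as np

# Toy confusion matrix for 4 items and 2 labels (hypothetical values).
toy_matrix = np.array([[2, 1],
                       [0, 1]])

# The diagonal holds the items on which both annotators agree.
toy_diagonal = np.diag(toy_matrix)

# Number of agreed items, and the total number of items in the matrix.
toy_agreed = toy_diagonal.sum()
toy_total = toy_matrix.sum()

toy_obs = toy_agreed / toy_total
print(toy_obs)
```

Note that the sum of all cells in a confusion matrix equals the number of annotated items.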

In [None]:
# Import the function:

# Extract the diagonal:

# Compute num_agreed_items, the number of items agreed upon:

# Compute the observed agreement:

# Print the observed agreement:


#### Step 3. Expected agreement.
Expected agreement is computed per label. In your confusion matrix, rows represent annotator A and columns annotator B (assuming you passed `annotator_a` as the first argument to `confusion_matrix`). Each annotator can choose 1 of the 3 labels, so you have 3 rows and 3 columns.

To compute the expected agreement for a particular label, you need to multiply two values:
1) total number of times annotator **A** chooses that label, divided by the total number of items;
2) total number of times annotator **B** chooses that label, divided by the total number of items.

To compute (1), add up the values in the row for that label, `sum(matrix[label_idx, :])`, and divide the result by `n_items`.

To compute (2), add up the values in the column for that label, `sum(matrix[:, label_idx])`, and again divide the result by `n_items`.

By multiplying these two values, you get the expected agreement for a given label. If you do that in a loop and add up the values for each label, you get the total expected agreement value, `agreement_exp`.
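
As a rough illustration of the loop, here is the calculation on a toy 2x2 matrix with made-up values. The names starting with `toy_`, as well as `share_a` and `share_b`, are hypothetical and chosen so that they do not clash with your own variables.

```python
import numpy as np

# Toy confusion matrix: rows are annotator A, columns are annotator B.
toy_matrix = np.array([[2, 1],
                       [0, 1]])
toy_n_items = toy_matrix.sum()
toy_n_labels = toy_matrix.shape[0]

toy_exp = 0.0
for idx in range(toy_n_labels):
    # Share of items that annotator A gave this label (row total).
    share_a = toy_matrix[idx, :].sum() / toy_n_items
    # Share of items that annotator B gave this label (column total).
    share_b = toy_matrix[:, idx].sum() / toy_n_items
    # Chance that both pick this label independently.
    toy_exp += share_a * share_b

print(toy_exp)
```

For this toy matrix the two per-label products are 0.75 x 0.5 and 0.25 x 0.5, which add up to 0.5.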

In [None]:
# Define an empty list agreement_exp_per_label:

# Set agreement_exp to be 0:

# Use a for-loop (label_idx in range(n_labels), i.e. from 0 to n_labels - 1):

    # Compute the expected agreement for this label and append it to agreement_exp_per_label:

    # Add this label's expected agreement to agreement_exp:

# Print agreement_exp:


#### Step 4. Cohen's kappa.

All that's left to do is to supply your computed values of `agreement_obs` and `agreement_exp` to the formula defined above.

In [None]:
# Compute kappa:

# Print kappa:


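Let's compare your kappa value to the one computed automatically using a Scikit-learn function. Both values are floating-point numbers and may differ in the last digits even when your calculation is correct, so the cell below compares them with a small tolerance rather than with `==`. If it returns True, you've got it right; otherwise, look back and find where you made a mistake.

In [None]:
abs(kappa_automatic - kappa) < 1e-9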
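
Floating-point rounding means that two correct calculations of the same quantity can differ in their last digits, so comparisons of this kind are usually made with a tolerance. A minimal illustration using `math.isclose` from the standard library, with values unrelated to the lab:

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because of rounding in binary floating point.
print(total == 0.3)              # False

# Comparison with a tolerance treats the two values as equal.
print(math.isclose(total, 0.3))  # True
```

If a manually computed kappa looks right but an exact comparison reports a mismatch, a tolerance-based check like this is worth trying before hunting for a mistake.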