# Experiment 2

#### Model Setup

Run the models in the following order, using each model's predicted labels as features for the next model:

[1.](#1) Multiclass Person Name + Occupation Sequence Classifier

[2.](#2) Multilabel Linguistic Classifier

[3.](#3) Multilabel Document Classifier

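One way to pass one model's predicted labels to the next model is to add them to each token's feature dictionary. Below is a minimal sketch. It assumes sklearn-crfsuite-style features (one `dict` per token) and a single predicted label per token. The helper `add_upstream_labels` and the `prev_model_label` key are hypothetical and are not taken from `utils1`.

```python
from typing import Dict, List

TokenFeatures = Dict[str, object]


def add_upstream_labels(
    sentence_features: List[TokenFeatures],
    upstream_labels: List[str],
    key: str = "prev_model_label",  # hypothetical feature name
) -> List[TokenFeatures]:
    """Return copies of the token feature dicts with an upstream model's
    predicted label added under `key`."""
    if len(sentence_features) != len(upstream_labels):
        raise ValueError("features and labels must align token by token")
    enriched = []
    for feats, label in zip(sentence_features, upstream_labels):
        new_feats = dict(feats)  # copy so the original features stay unchanged
        new_feats[key] = label
        enriched.append(new_feats)
    return enriched


features = [{"word.lower()": "rev"}, {"word.lower()": "tom"}]
print(add_upstream_labels(features, ["O", "B-Masculine"]))
```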
***

* Supervised learning
    * Train, Validate, and (Blind) Test Data: under directory `../data/token_clf_data/experiment_input/`
    * Prediction Data: under directory `../data/token_clf_data/model_output/experiment1/`
* Word Embeddings
    * Custom fastText (word2vec with subwords) embeddings of 100 dimensions trained on the CRC Archives catalog's descriptive metadata (harvested October 2020)

***

Load resources:

In [1]:
# For custom functions and variables
import utils, utils1, config

# For data analysis
import pandas as pd
import numpy as np
import os, re

# For creating directories
from pathlib import Path

# For preprocessing
from gensim.models import FastText
from gensim import utils as gensim_utils

# For multilabel token classification
import sklearn.metrics
from sklearn.preprocessing import MultiLabelBinarizer
from skmultilearn.problem_transform import ClassifierChain
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

# For multiclass sequence classification
import sklearn_crfsuite
from sklearn_crfsuite import scorers
from sklearn_crfsuite import metrics

Define resources for the models:

In [2]:
Path(config.experiment_input_path).mkdir(parents=True, exist_ok=True)    # For train, devtest, and blind test data
Path(config.experiment1_output_path).mkdir(parents=True, exist_ok=True)  # For predictions
Path(config.experiment1_agmt_path).mkdir(parents=True, exist_ok=True)    # For agreement metrics

In [3]:
# Linguistic labels:
ling_label_subset = ["B-Generalization", "I-Generalization", "B-Gendered-Role", "I-Gendered-Role", "B-Gendered-Pronoun", "I-Gendered-Pronoun"]
# Person Name + Occupation labels:
pers_o_label_subset = ["B-Unknown", "I-Unknown", "B-Feminine", "I-Feminine", "B-Masculine", "I-Masculine", "B-Occupation", "I-Occupation"]
# Stereotype + Omission labels:
so_label_subset = ["B-Stereotype", "I-Stereotype", "B-Omission", "I-Omission"]

In [4]:
ling_label_tags = {
    "Gendered-Pronoun": ["B-Gendered-Pronoun", "I-Gendered-Pronoun"], "Gendered-Role": ["B-Gendered-Role", "I-Gendered-Role"],"Generalization": ["B-Generalization", "I-Generalization"]
    }
pers_o_label_tags = {
    "Unknown": ["B-Unknown", "I-Unknown"], "Feminine": ["B-Feminine", "I-Feminine"], "Masculine": ["B-Masculine", "I-Masculine"],
     "Occupation": ["B-Occupation", "I-Occupation"]
    }
so_label_tags = {
    "Stereotype": ["B-Stereotype", "I-Stereotype"], "Omission": ["B-Omission", "I-Omission"]
             }

In [5]:
d = 100  # dimensions of word embeddings (should match utils1.py)

<a id="1"></a>
## 1. Person Name + Occupation Labels

Train a multiclass sequence classifier on the Person Name and Occupation labels. The classifier is a Conditional Random Field (CRF) trained with Adaptive Regularization of Weight Vectors (AROW).

Multiclass is a suitable setup for these labels because they are mutually exclusive: no token should have more than one of these labels. Past experiments compared sequence classifier algorithms on the Person Name and Occupation labels, and the CRF trained with AROW performed best.

This is the first model in the sequence, so no predicted Linguistic labels are passed in as features. The data subsets are swapped for this model: the validate data subset (`token_validate.csv`) is used for training, and the train data subset (`token_train.csv`) is used as the devtest data.

In [7]:
train_df = pd.read_csv(config.tokc_path+"experiment_input/token_validate.csv", index_col=0)
dev_df = pd.read_csv(config.tokc_path+"experiment_input/token_train.csv", index_col=0)
perso_train, perso_dev = utils.selectDataForLabels(train_df, dev_df, "tag", pers_o_label_subset)
print(perso_train.shape, perso_dev.shape)

(316721, 10) (308583, 10)


#### Preprocessing

In [8]:
train_df = perso_train.drop(columns=["description_id", "ann_id", "token_offsets", "field", "subset", "pos"])
dev_df = perso_dev.drop(columns=["description_id", "ann_id", "token_offsets", "field", "subset", "pos"])

In [9]:
df_train_token_groups = utils.implodeDataFrame(train_df, ['token_id', 'sentence_id', 'token'])
df_train_token_groups = df_train_token_groups.reset_index()
# df_train_token_groups.head()

In [11]:
df_dev_token_groups = utils.implodeDataFrame(dev_df, ['token_id', 'sentence_id', 'token'])
df_dev_token_groups = df_dev_token_groups.reset_index()
# df_dev_token_groups.head()

In [12]:
df_train_grouped = utils.implodeDataFrame(df_train_token_groups, ['sentence_id'])
df_dev_grouped = utils.implodeDataFrame(df_dev_token_groups, ['sentence_id'])
df_train_grouped = df_train_grouped.rename(columns={"token":"sentence"})
df_dev_grouped = df_dev_grouped.rename(columns={"token":"sentence"})
df_train_grouped.head()

| sentence_id | token_id | sentence | tag |
|---|---|---|---|
| 1 | [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] | [Title, :, Papers, of, The, Very, Rev, Prof, J... | [[O], [O], [O], [O], [O, B-Unknown, B-Masculin... |
| 3 | [109, 110, 111, 112, 113, 114, 115, 116, 117, ... | [Biographical, /, Historical, :, Professor, Ja... | [[O], [O], [O], [O], [B-Masculine], [I-Masculi... |
| 5 | [154, 155, 156, 157, 158, 159, 160, 161, 162, ... | [After, his, ordination, he, spent, three, yea... | [[O], [O], [O], [O], [O], [O], [O], [O], [O], ... |
| 7 | [216, 217, 218, 219, 220, 221, 222, 223, 224, ... | [His, primary, interests, were, in, liturgy, a... | [[O], [O], [O], [O], [O], [O], [O], [O], [O], ... |
| 9 | [256, 257, 258, 259, 260, 261, 262, 263, 264, ... | [The, service, was, relayed, around, the, worl... | [[O], [O], [O], [O], [O], [O], [O], [O], [O], ... |

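`utils.implodeDataFrame` is a project helper whose code is not shown here. The output above has one row per `sentence_id`, with the other columns collected into lists. From that, it appears to group rows by the given columns and collect the remaining values into lists. Here is a pure-Python sketch of that behavior under that assumption. `implode` is a hypothetical name.

```python
from typing import Any, Dict, List, Sequence, Tuple


def implode(rows: List[Dict[str, Any]], keys: Sequence[str]) -> List[Dict[str, Any]]:
    """Group row dicts by `keys`, collecting every other column into a list.

    Groups keep the order in which their first row appears.
    """
    groups: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for row in rows:
        group_key = tuple(row[k] for k in keys)
        if group_key not in groups:
            groups[group_key] = {k: row[k] for k in keys}
        group = groups[group_key]
        for col, value in row.items():
            if col not in keys:
                group.setdefault(col, []).append(value)
    return list(groups.values())


# One row per (token, tag) pair, as in the token-level CSV files
rows = [
    {"sentence_id": 1, "token_id": 7, "token": "The", "tag": "O"},
    {"sentence_id": 1, "token_id": 8, "token": "Very", "tag": "B-Unknown"},
    {"sentence_id": 1, "token_id": 8, "token": "Very", "tag": "B-Masculine"},
]
tokens = implode(rows, ["token_id", "sentence_id", "token"])  # one row per token
sentences = implode(tokens, ["sentence_id"])                  # one row per sentence
print(sentences)
```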

Zip the BIO tags together with the tokens so that each sentence item is a tuple `(TOKEN, TAG_LIST)`:

In [13]:
df_train_grouped = df_train_grouped.reset_index()
df_dev_grouped = df_dev_grouped.reset_index()
train_sentences_pers = utils1.zipDevFeaturesAndTarget(df_train_grouped, "tag")  # Dev because not using additional feature col for linguistic labels
print(train_sentences_pers[2][:3])
dev_sentences_pers = utils1.zipDevFeaturesAndTarget(df_dev_grouped, "tag")
print(dev_sentences_pers[0][:3])

[('After', ['O']), ('his', ['O']), ('ordination', ['O'])]
[('Scope', ['O']), ('and', ['O']), ('Contents', ['O'])]

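`utils1.zipDevFeaturesAndTarget` is project code not shown here. The following is a minimal sketch of the zipping step. It assumes each grouped row holds a `sentence` list of tokens and a parallel `tag` list of tag lists. `zip_sentence` is a hypothetical name.

```python
from typing import List, Tuple


def zip_sentence(tokens: List[str], tag_lists: List[List[str]]) -> List[Tuple[str, List[str]]]:
    """Pair each token with its list of BIO tags: (TOKEN, TAG_LIST)."""
    if len(tokens) != len(tag_lists):
        raise ValueError("each token needs exactly one tag list")
    return list(zip(tokens, tag_lists))


grouped = [
    {"sentence": ["After", "his", "ordination"], "tag": [["O"], ["O"], ["O"]]},
    {"sentence": ["James", "Whyte"], "tag": [["B-Masculine"], ["I-Masculine"]]},
]
sentences = [zip_sentence(row["sentence"], row["tag"]) for row in grouped]
print(sentences[1])
```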

In [14]:
train_sentences = train_sentences_pers
dev_sentences = dev_sentences_pers

In [15]:
# Features
X_train = [utils1.extractSentenceFeatures(sentence) for sentence in train_sentences]
X_dev = [utils1.extractSentenceFeatures(sentence) for sentence in dev_sentences]
# Target
y_train = [utils1.extractSentenceTargets(sentence) for sentence in train_sentences]
y_dev = [utils1.extractSentenceTargets(sentence) for sentence in dev_sentences]

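sklearn-crfsuite expects each sentence as a list of per-token feature dictionaries. String values become indicator features, and numeric or boolean values become weighted features. The features built by `utils1.extractSentenceFeatures` are not shown here. The sketch below uses a hypothetical feature set modeled on the sklearn-crfsuite tutorial, with the word-embedding dimensions added as real-valued features. A toy dictionary stands in for the fastText vectors. With a gensim `FastText` model, `model.wv[word]` would supply a vector, including for out-of-vocabulary words via subword n-grams.

```python
from typing import Dict, List, Sequence, Tuple

Sentence = List[Tuple[str, List[str]]]  # (TOKEN, TAG_LIST) pairs


def token_features(
    sentence: Sentence, i: int, embed: Dict[str, Sequence[float]], dims: int
) -> Dict[str, object]:
    """Build a feature dict for token i (hypothetical feature set)."""
    word = sentence[i][0]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower()": word.lower(),
        "word.istitle()": word.istitle(),
        "word.isdigit()": word.isdigit(),
    }
    # Embedding dimensions as real-valued features; unknown words get zeros here
    vector = embed.get(word.lower(), [0.0] * dims)
    for j in range(dims):
        feats[f"emb_{j}"] = float(vector[j])
    # Context features from the neighboring tokens
    if i > 0:
        feats["-1:word.lower()"] = sentence[i - 1][0].lower()
    else:
        feats["BOS"] = True
    if i < len(sentence) - 1:
        feats["+1:word.lower()"] = sentence[i + 1][0].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(sentence: Sentence, embed, dims: int) -> List[Dict[str, object]]:
    return [token_features(sentence, i, embed, dims) for i in range(len(sentence))]


toy_embeddings = {"rev": [0.1, -0.2], "tom": [0.3, 0.4]}  # stand-in for fastText vectors
sent = [("Rev", ["B-Masculine"]), ("Tom", ["I-Masculine"])]
X = sentence_features(sent, toy_embeddings, dims=2)
print(X[1]["-1:word.lower()"], X[1]["emb_1"])
```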
#### Train

Train a Conditional Random Field (CRF) model with the AROW training algorithm on the **Person Name** and **Occupation** tags. Set `variance=0.5`, set the maximum number of iterations to 100, and enable `all_possible_transitions`.

In [16]:
a = "arow"
clf_pers = sklearn_crfsuite.CRF(algorithm=a, variance=0.5, max_iterations=100, all_possible_transitions=True)

In [17]:
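# Workaround for an AttributeError ('CRF' object has no attribute 'keep_tempfiles')
# that sklearn_crfsuite can raise with recent scikit-learn versions; see:
# https://stackoverflow.com/questions/66059532/attributeerror-crf-object-has-no-attribute-keep-tempfiles
# The fitted model is used for prediction below, so training is assumed to finish before the error.
try:
    clf_pers.fit(X_train, y_train)
except AttributeError:
    pass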

Remove the `'O'` tag from the list of target labels. We are interested in the model's ability to apply the gendered and gender-biased language tags. `'O'` tags far outnumber those tags, so including them in the averaged scores would mask performance on the labels of interest.

In [18]:
targets = list(clf_pers.classes_)
targets.remove('O')
print(targets)

['I-Unknown', 'B-Masculine', 'I-Masculine', 'B-Occupation', 'I-Occupation', 'B-Unknown', 'I-Feminine', 'B-Feminine']


#### Predict

In [19]:
y_pred = clf_pers.predict(X_dev)

#### Evaluate: All Labels

In [20]:
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec:", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))

  - F1: 0.42825014312210974
  - Prec: 0.4496106745338277
  - Rec: 0.41771448419590135

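`metrics.flat_f1_score` and its siblings flatten the per-sentence sequences into a single list of token labels and pass it to the matching scikit-learn scorer. The stdlib sketch below shows what `average="weighted"` computes with `labels=targets` and `zero_division=0`. Per-label scores are computed over the listed labels only, so `'O'` is ignored. They are then averaged with weights equal to each label's number of true occurrences. `flat_weighted_prf` is a hypothetical name, not part of sklearn-crfsuite.

```python
from typing import List, Sequence, Tuple


def flat_weighted_prf(
    y_true: List[List[str]], y_pred: List[List[str]], labels: Sequence[str]
) -> Tuple[float, float, float]:
    """Support-weighted precision, recall, and F1 over `labels` only,
    using 0 wherever a denominator is zero (like zero_division=0)."""
    true_flat = [t for sent in y_true for t in sent]
    pred_flat = [p for sent in y_pred for p in sent]
    if len(true_flat) != len(pred_flat):
        raise ValueError("y_true and y_pred must have the same number of tokens")

    totals = [0.0, 0.0, 0.0]
    total_support = 0
    for label in labels:
        pairs = list(zip(true_flat, pred_flat))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        support = tp + fn  # number of true occurrences of this label
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        for k, score in enumerate((precision, recall, f1)):
            totals[k] += support * score
        total_support += support
    if total_support == 0:
        return 0.0, 0.0, 0.0
    return totals[0] / total_support, totals[1] / total_support, totals[2] / total_support


y_true = [["B-Masculine", "I-Masculine", "O"], ["O", "B-Feminine"]]
y_pred = [["B-Masculine", "O", "O"], ["O", "B-Masculine"]]
prec, rec, f1 = flat_weighted_prf(y_true, y_pred, ["B-Masculine", "I-Masculine", "B-Feminine"])
print(prec, rec, f1)
```

The correctly predicted `'O'` tokens contribute nothing here. Only the listed labels affect the averages.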

Save the prediction data:

In [21]:
df_dev_grouped = df_dev_grouped.rename(columns={"tag":"tag_pers_o_expected"})
df_dev_grouped.insert(len(df_dev_grouped.columns), "tag_pers_o_predicted", y_pred)
df_dev_grouped.head()

|  | sentence_id | token_id | sentence | tag_pers_o_expected | tag_pers_o_predicted |
|---|---|---|---|---|---|
| 0 | 2 | [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 2... | [Scope, and, Contents, :, Sermons, and, addres... | [[O], [O], [O], [O], [O], [O], [O], [O], [O], ... | [O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ... |
| 1 | 8 | [233, 234, 235, 236, 237, 238, 239, 240, 241, ... | [James, Whyte, was, called, upon, to, preach, ... | [[B-Masculine], [I-Masculine], [O], [O], [O], ... | [B-Masculine, O, O, O, O, O, O, O, O, O, O, O,... |
| 2 | 19 | [520, 521, 522, 523, 524, 525, 526, 527, 528, ... | [Rev, Tom, Allan, 's, first, charge, was, Nort... | [[B-Masculine], [I-Masculine], [I-Masculine], ... | [B-Masculine, O, O, O, O, O, O, O, O, O, O, O,... |
| 3 | 21 | [579, 580, 581, 582, 583, 584, 585, 586, 587, ... | [In, 1953, the, ", Tell, Scotland, ", committe... | [[O], [O], [O], [O], [O], [O], [O], [O], [O], ... | [O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ... |
| 4 | 30 | [768, 769, 770, 771, 772, 773, 774, 775, 776, ... | [Title, :, Papers, of, Rev, Prof, Alec, Campbe... | [[O], [O], [O], [O], [B-Masculine, B-Unknown],... | [O, O, O, O, O, O, B-Masculine, I-Masculine, I... |


In [22]:
df_dev_grouped = df_dev_grouped.set_index("sentence_id")
df_dev_exploded = df_dev_grouped.explode(list(df_dev_grouped.columns))
df_dev_exploded.head()

| sentence_id | token_id | sentence | tag_pers_o_expected | tag_pers_o_predicted |
|---|---|---|---|---|
| 2 | 16 | Scope | [O] | O |
| 2 | 17 | and | [O] | O |
| 2 | 18 | Contents | [O] | O |
| 2 | 19 | : | [O] | O |
| 2 | 20 | Sermons | [O] | O |


In [23]:
filename = "crf_{a}_pers_o_baseline_fastText{d}_nolingfeatures_predictions.csv".format(a=a, d=d)
df_dev_exploded.to_csv(config.experiment1_output_path+filename)

#### Evaluate: Each Label

The built-in evaluation approach is strict. If the model's predicted labels are not on text spans that exactly match those in the development data, the predicted labels are deemed incorrect.

In [24]:
a = "arow"
category = "pers_o"
filename = "crf_{a}_{c}_baseline_fastText{d}_nolingfeatures_predictions.csv".format(a=a, c=category, d=d)
pred_perso = pd.read_csv(config.experiment1_output_path+filename)
pred_perso = utils.getColumnValuesAsLists(pred_perso, "tag_{}_expected".format(category))
# pred_perso.head()

Compare each token's predicted tag with its expected tags and record the result in a `_merge` column. Then calculate performance metrics for each label:

In [25]:
pred_perso = utils.isPredictedInExpected(pred_perso, "tag_{}_expected".format(category), "tag_{}_predicted".format(category), '_merge', 'O')
pred_perso.head()

|  | sentence_id | token_id | sentence | tag_pers_o_expected | tag_pers_o_predicted | _merge |
|---|---|---|---|---|---|---|
| 0 | 2 | 16 | Scope | [O] | O | true negative |
| 1 | 2 | 17 | and | [O] | O | true negative |
| 2 | 2 | 18 | Contents | [O] | O | true negative |
| 3 | 2 | 19 | : | [O] | O | true negative |
| 4 | 2 | 20 | Sermons | [O] | O | true negative |

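`utils.isPredictedInExpected` is project code not shown here. Going by the column names and the `true negative` rows above, the sketch below gives one plausible reading of the per-token `_merge` categories. One case is an assumption: a non-`'O'` prediction that is missing from the expected tags is counted here as a false positive. The comparison is on full BIO tags, which makes it strict. `merge_category` is a hypothetical name.

```python
from typing import List

NULL_TAG = "O"


def merge_category(expected: List[str], predicted: str, null_tag: str = NULL_TAG) -> str:
    """Classify one token by comparing its predicted tag with its expected tags.

    Hypothetical rules (the project's helper may differ):
    - non-null prediction found among the expected tags -> true positive
    - non-null prediction not among the expected tags   -> false positive
    - null prediction, but a non-null tag was expected  -> false negative
    - null prediction and only the null tag expected    -> true negative
    """
    expected_non_null = [tag for tag in expected if tag != null_tag]
    if predicted != null_tag:
        return "true positive" if predicted in expected_non_null else "false positive"
    return "false negative" if expected_non_null else "true negative"


tokens = [
    (["O"], "O"),
    (["B-Masculine"], "B-Masculine"),
    (["I-Masculine"], "O"),
    (["I-Masculine"], "B-Masculine"),  # B/I mismatch counts as wrong under strict matching
    (["O"], "B-Occupation"),
    (["O", "B-Unknown", "B-Masculine"], "B-Masculine"),
]
categories = [merge_category(exp, pred) for exp, pred in tokens]
print(categories)
```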

In [26]:
pred_perso_stats = utils.getScoresByCatTags(
    pred_perso, "_merge", pers_o_label_subset[0], "tag_{}_expected".format(category), "tag_{}_predicted".format(category), "token_id"
)
for i in range(1, len(pers_o_label_subset)):
    tag_stats = utils.getScoresByCatTags(
        pred_perso, "_merge", pers_o_label_subset[i], "tag_{}_expected".format(category), "tag_{}_predicted".format(category), "token_id"
    )
    pred_perso_stats = pd.concat([pred_perso_stats, tag_stats])
pred_perso_stats

| tag(s) | false negative | false positive | true positive | precision | recall | f1 |
|---|---|---|---|---|---|---|
| B-Unknown | 998 | 1126 | 1347 | 0.544683 | 0.574414 | 0.559153 |
| I-Unknown | 1872 | 1826 | 2409 | 0.568831 | 0.562719 | 0.565759 |
| B-Feminine | 47 | 344 | 358 | 0.509972 | 0.883951 | 0.646793 |
| I-Feminine | 179 | 591 | 571 | 0.491394 | 0.761333 | 0.59728 |
| B-Masculine | 240 | 603 | 905 | 0.600133 | 0.790393 | 0.682247 |
| I-Masculine | 568 | 662 | 766 | 0.536415 | 0.574213 | 0.554671 |
| B-Occupation | 420 | 421 | 582 | 0.580259 | 0.580838 | 0.580549 |
| I-Occupation | 674 | 561 | 571 | 0.504417 | 0.458635 | 0.480438 |

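The precision, recall, and F1 columns follow from the three count columns: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The small sketch below shows that arithmetic. It is checked against the `B-Unknown` row above, which has 998 false negatives, 1,126 false positives, and 1,347 true positives. `prf_from_counts` is a hypothetical name.

```python
from typing import Tuple


def prf_from_counts(fn: int, fp: int, tp: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts (0.0 where undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Counts copied from the B-Unknown row of the table above
p, r, f = prf_from_counts(fn=998, fp=1126, tp=1347)
print(round(p, 6), round(r, 6), round(f, 6))
```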

Save the statistics:

In [27]:
pred_perso_stats.to_csv(
    config.experiment1_agmt_path+"crf_{a}_baseline_fastText{d}_{c}_nolingfeatures_strict_agmt.csv".format(a=a, c=category, d=d)
)

#### Annotation Agreement

Calculate agreement at the annotation level. If the model correctly labels any token in a manually annotated text span, that annotation is recorded as correctly labeled (`true positive`). Note whether the model's labels are an `exact_match`, `label_match`, `category_match`, or `mismatch`.

Load the annotation data:

*Note: an `ann_id` of `99999` indicates no annotation*

In [28]:
dev_df = pd.read_csv(config.tokc_path+"experiment_input/token_train.csv", usecols=["sentence_id", "ann_id", "token_id", "tag"])
# dev_df.head()

Group the annotation data by token:

In [29]:
df_ann = utils.implodeDataFrame(dev_df, ["sentence_id", "ann_id", "token_id"])
df_ann = df_ann.reset_index()
# print(df_ann.shape)
# df_ann.head()

Align the columns of the dev and prediction DataFrames:

In [31]:
# Rename the `sentence` column to `token`
pred_perso = pred_perso.rename(columns={"sentence":"token"})
# pred_perso.head()

Join the data, adding the annotation IDs (`ann_id` column) to the prediction DataFrame:

In [32]:
index_list = ["sentence_id", "token_id"]

In [33]:
pred_perso_ann = pred_perso.join(df_ann.set_index(index_list), on=index_list, how="left")
pred_perso_ann = pred_perso_ann.drop(columns=["tag"])  # duplicate of tag_expected
assert pred_perso_ann.loc[pred_perso_ann["token_id"].isna()].shape[0] == 0
assert pred_perso_ann.loc[pred_perso_ann["ann_id"].isna()].shape[0] == 0
assert pred_perso_ann.loc[pred_perso_ann["tag_pers_o_predicted"].isna()].shape[0] == 0
assert pred_perso_ann.loc[pred_perso_ann["tag_pers_o_expected"].isna()].shape[0] == 0
# pred_perso_ann.head()

Explode the DataFrame:

In [34]:
pred_perso_ann = pred_perso_ann.explode(["tag_pers_o_expected"])

Generalize the BIO tags to label names:

In [35]:
# Get the predicted labels
pred_labels = list(pred_perso_ann["tag_{}_predicted".format(category)])
pred_labels = [label if label == "O" else label[2:] for label in pred_labels]
pred_perso_ann.insert(len(pred_perso_ann.columns), "label_{}_predicted".format(category), pred_labels)
# Get the lists of expected labels
exp_labels = list(pred_perso_ann["tag_{}_expected".format(category)])
exp_labels = [label if label == "O" else label[2:] for label in exp_labels]
pred_perso_ann.insert(len(pred_perso_ann.columns), "label_{}_expected".format(category), exp_labels)
# pred_perso_ann.head()

Group the data by annotation:

In [36]:
pred_perso_ann = pred_perso_ann.drop(columns=["tag_{}_expected".format(category), "tag_{}_predicted".format(category)])
pred_perso_ann = utils.implodeDataFrame(pred_perso_ann, ["sentence_id", "ann_id"])
pred_perso_ann = pred_perso_ann.reset_index()
pred_perso_ann.head()

|  | sentence_id | ann_id | token_id | token | _merge | label_pers_o_predicted | label_pers_o_expected |
|---|---|---|---|---|---|---|---|
| 0 | 2 | 99999 | [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 2... | [Scope, and, Contents, :, Sermons, and, addres... | [true negative, true negative, true negative, ... | [O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ... | [O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ... |
| 1 | 8 | 14387 | [233, 234] | [James, Whyte] | [true positive, false negative] | [Masculine, O] | [Masculine, Masculine] |
| 2 | 8 | 99999 | [235, 236, 237, 238, 239, 240, 241, 242, 243, ... | [was, called, upon, to, preach, at, the, memor... | [true negative, true negative, true negative, ... | [O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ... | [O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ... |
| 3 | 19 | 9518 | [533] | [he] | [true negative] | [O] | [O] |
| 4 | 19 | 9519 | [539] | [he] | [true negative] | [O] | [O] |


Record the agreements and disagreements:

In [37]:
agmt_types_perso, agmt_labels_perso = utils1.getAnnotationAgreement(pred_perso_ann, "label_pers_o_predicted", "label_pers_o_expected")

In [38]:
pred_perso_ann.insert(len(pred_perso_ann.columns), "annotation_agreement", agmt_types_perso)
pred_perso_ann.insert(len(pred_perso_ann.columns), "agreement_label", agmt_labels_perso)
pred_perso_ann.head()

|  | sentence_id | ann_id | token_id | token | _merge | label_pers_o_predicted | label_pers_o_expected | annotation_agreement | agreement_label |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 99999 | [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 2... | [Scope, and, Contents, :, Sermons, and, addres... | [true negative, true negative, true negative, ... | [O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ... | [O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ... | false positive | Occupation |
| 1 | 8 | 14387 | [233, 234] | [James, Whyte] | [true positive, false negative] | [Masculine, O] | [Masculine, Masculine] | true positive | Masculine |
| 2 | 8 | 99999 | [235, 236, 237, 238, 239, 240, 241, 242, 243, ... | [was, called, upon, to, preach, at, the, memor... | [true negative, true negative, true negative, ... | [O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ... | [O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ... | true negative | O |
| 3 | 19 | 9518 | [533] | [he] | [true negative] | [O] | [O] | true negative | O |
| 4 | 19 | 9519 | [539] | [he] | [true negative] | [O] | [O] | true negative | O |

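`utils1.getAnnotationAgreement` is project code not shown here. The sketch below follows the annotation-level rule described under Annotation Agreement and reproduces rows 0, 1, and 3 of the table above. Two choices are assumptions and may differ from the project's helper: an annotated span with no correctly labeled token is counted as a false negative, and the label reported for a false positive is the first non-`'O'` prediction. `annotation_agreement` is a hypothetical name.

```python
from typing import List, Tuple

NULL_LABEL = "O"


def annotation_agreement(
    predicted: List[str], expected: List[str], null: str = NULL_LABEL
) -> Tuple[str, str]:
    """Return (agreement type, label) for the tokens of one annotation.

    Hypothetical rules:
    - any token whose predicted label equals its non-null expected label -> true positive
    - an annotated span with no correctly labeled token                  -> false negative
    - an unannotated span with any non-null prediction                   -> false positive
    - otherwise                                                          -> true negative
    """
    for pred, exp in zip(predicted, expected):
        if exp != null and pred == exp:
            return "true positive", exp
    expected_labels = [exp for exp in expected if exp != null]
    if expected_labels:
        return "false negative", expected_labels[0]
    predicted_labels = [pred for pred in predicted if pred != null]
    if predicted_labels:
        return "false positive", predicted_labels[0]
    return "true negative", null


print(annotation_agreement(["Masculine", "O"], ["Masculine", "Masculine"]))  # "James Whyte"
print(annotation_agreement(["O", "Occupation", "O"], ["O", "O", "O"]))
print(annotation_agreement(["O"], ["O"]))
```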

In [39]:
metrics_perso_all = utils1.getAnnotationAgreementMetrics(pred_perso_ann, "all")
metrics_perso_pn = utils1.getAnnotationAgreementMetrics(pred_perso_ann.loc[~(pred_perso_ann.agreement_label.isin(["Occupation","O"]))], "Person Name")
metrics_perso_unk = utils1.getAnnotationAgreementMetrics(pred_perso_ann.loc[pred_perso_ann.agreement_label == "Unknown"], "Unknown")
metrics_perso_fem = utils1.getAnnotationAgreementMetrics(pred_perso_ann.loc[pred_perso_ann.agreement_label == "Feminine"], "Feminine")
metrics_perso_mas = utils1.getAnnotationAgreementMetrics(pred_perso_ann.loc[pred_perso_ann.agreement_label == "Masculine"], "Masculine")
metrics_perso_occ = utils1.getAnnotationAgreementMetrics(pred_perso_ann.loc[pred_perso_ann.agreement_label == "Occupation"], "Occupation")
metrics_perso = pd.concat([metrics_perso_all, metrics_perso_pn, metrics_perso_unk, metrics_perso_fem, metrics_perso_mas, metrics_perso_occ])
metrics_perso

| labels | false negative | true positive | false positive | precision | recall | f_1 |
|---|---|---|---|---|---|---|
| all | 3413 | 6556 | 2578 | 0.717758 | 0.657639 | 0.686384 |
| Person Name | 2750 | 5799 | 2169 | 0.727786 | 0.678325 | 0.702186 |
| Unknown | 1915 | 2752 | 1114 | 0.711847 | 0.589672 | 0.645025 |
| Feminine | 146 | 842 | 449 | 0.652208 | 0.852227 | 0.738921 |
| Masculine | 689 | 2205 | 606 | 0.784418 | 0.761921 | 0.773006 |
| Occupation | 663 | 757 | 409 | 0.649228 | 0.533099 | 0.58546 |

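The counts in the `Person Name` row are the sums of the `Unknown`, `Feminine`, and `Masculine` rows. For example, 1,915 + 146 + 689 = 2,750 false negatives. The `all` row adds the `Occupation` counts to those. The scores for these grouped rows therefore come from pooled counts (a micro-average), not from averaging the per-label scores. The sketch below reproduces the `Person Name` row from the per-label counts and, for comparison, computes a macro-averaged F1. `micro_prf` is a hypothetical name.

```python
from typing import Dict, Iterable, Tuple

# (false negative, true positive, false positive) per label, copied from the table above
counts: Dict[str, Tuple[int, int, int]] = {
    "Unknown": (1915, 2752, 1114),
    "Feminine": (146, 842, 449),
    "Masculine": (689, 2205, 606),
}


def micro_prf(rows: Iterable[Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Pool the counts across labels, then compute precision, recall, and F1."""
    rows = list(rows)
    fn = sum(row[0] for row in rows)
    tp = sum(row[1] for row in rows)
    fp = sum(row[2] for row in rows)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


micro = micro_prf(counts.values())
# Macro-average for comparison: unweighted mean of the per-label F1 scores
macro_f1 = sum(micro_prf([row])[2] for row in counts.values()) / len(counts)
print([round(x, 6) for x in micro], round(macro_f1, 6))
```

The two averages differ because the micro-average gives more weight to labels with more annotations, such as `Unknown`.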

Save the metrics:

In [40]:
metrics_perso.to_csv(
    config.experiment1_agmt_path+"crf_{a}_baseline_fastText{d}_{c}_nolingfeatures_annot_agmt.csv".format(a=a, d=d, c=category)
)

### Loose Evaluation

As with the manual annotation evaluation, we want to evaluate the predictions more loosely, considering overlapping text spans in addition to exactly matching text spans.

#### Token Agreement

First, generalize the tokens' BIO tags to label names (for example, `B-Masculine` and `I-Masculine` both become `Masculine`). Then calculate agreement scores for each label.

In [45]:
pred_perso_labels = pred_perso.copy()
pred_perso_labels = pred_perso_labels.drop(columns=["_merge"])
tag_exp = list(pred_perso_labels["tag_{}_expected".format(category)])
tag_pred = list(pred_perso_labels["tag_{}_predicted".format(category)])
label_exp = [[tag if tag == "O" else tag[2:] for tag in tag_exp_list] for tag_exp_list in tag_exp]
label_pred = [tag if tag == "O" else tag[2:] for tag in tag_pred]
pred_perso_labels = pred_perso_labels.drop(columns=["tag_{}_expected".format(category), "tag_{}_predicted".format(category)])
pred_perso_labels.insert(len(pred_perso_labels.columns), "label_{}_expected".format(category), label_exp)
pred_perso_labels.insert(len(pred_perso_labels.columns), "label_{}_predicted".format(category), label_pred)
# pred_perso_labels.loc[pred_perso_labels.label_pers_o_predicted == "Feminine"].head()  # Looks good

Calculate the agreement metrics at the label level for each token:

In [46]:
tags = ['Unknown', 'Feminine', 'Masculine', 'Occupation']
pred_perso_labels = utils.isPredictedInExpected(pred_perso_labels, "label_{}_expected".format(category), "label_{}_predicted".format(category), '_merge', 'O')

pred_perso_stats = utils.getScoresByCatTags(
    pred_perso_labels, "_merge", tags[0], "label_{}_expected".format(category), "label_{}_predicted".format(category), "token_id"
)
for i in range(1, len(tags)):
    tag_stats = utils.getScoresByCatTags(
        pred_perso_labels, "_merge", tags[i], "label_{}_expected".format(category), "label_{}_predicted".format(category), "token_id"
    )
    pred_perso_stats = pd.concat([pred_perso_stats, tag_stats])
pred_perso_stats

| tag(s) | false negative | false positive | true positive | precision | recall | f1 |
|---|---|---|---|---|---|---|
| Unknown | 2865 | 2614 | 4094 | 0.610316 | 0.588303 | 0.599107 |
| Feminine | 226 | 849 | 1015 | 0.544528 | 0.817889 | 0.653784 |
| Masculine | 806 | 1120 | 1816 | 0.618529 | 0.692601 | 0.653472 |
| Occupation | 1094 | 901 | 1234 | 0.577986 | 0.530069 | 0.552991 |

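Generalizing BIO tags to label names is what makes this evaluation looser. Suppose a token is tagged `I-Masculine` in the development data but predicted as `B-Masculine`. The strict, tag-level comparison counts it as wrong, but it counts as correct once both tags are reduced to `Masculine`. The small sketch below shows that difference. It uses the same `tag[2:]` prefix stripping as the cells above. The helper names `to_label` and `count_matches` are hypothetical.

```python
from typing import List


def to_label(tag: str) -> str:
    """Strip the BIO prefix ("B-" or "I-"), leaving "O" unchanged."""
    return tag if tag == "O" else tag[2:]


def count_matches(expected: List[List[str]], predicted: List[str], loose: bool) -> int:
    """Count non-"O" predictions that appear among the token's expected tags."""
    matches = 0
    for exp_tags, pred in zip(expected, predicted):
        if pred == "O":
            continue
        if loose:
            exp_tags = [to_label(t) for t in exp_tags]
            pred = to_label(pred)
        if pred in exp_tags:
            matches += 1
    return matches


expected = [["B-Masculine"], ["I-Masculine"], ["I-Masculine"]]  # "Rev Tom Allan"
predicted = ["B-Masculine", "B-Masculine", "I-Masculine"]
strict = count_matches(expected, predicted, loose=False)
loose = count_matches(expected, predicted, loose=True)
print(strict, loose)
```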

Save the performance measures:

In [47]:
pred_perso_stats.to_csv(
    config.experiment1_agmt_path+"crf_{a}_baseline_fastText{d}_{c}_nolingfeatures_loose_agmt.csv".format(a=a, d=d, c=category)
)