# Baseline Gender Biased Sequence Classifiers with fastText

### Target: Labels

### Features: Word Embeddings

* Supervised learning
    * Train, Validate, and (Blind) Test Data: under directory `../data/token_clf_data/model_input/`
    * Prediction Data: under directory `../data/token_clf_data/model_output/crf_l2sgd_baseline/`
* Sequence classification
    * 3 categories, 9 labels (2 labels from the original annotation taxonomy were not applied during manual annotation):
        1. Person Name: Unknown, Feminine, Masculine (Non-binary not applied during annotation)
        2. Linguistic: Generalization, Gendered Pronoun, Gendered Role
        3. Contextual: Occupation, Omission, Stereotype (Empowering only applied by one annotator and too few times for training)
    * 1 model per category
* Word embeddings
    * Custom fastText (word2vec with subwords, trained on Archives' descriptive metadata extracted in October 2020)  

***

### Table of Contents

[0.](#0) Preprocessing

[1.](#1) Models

[2.](#2) Performance Evaluation

[Appendix A](#A) Person Name Model Optimization

[Appendix B](#B) Linguistic Model Optimization

[Appendix C](#C) Contextual Model Optimization


***

Load necessary libraries:

In [1]:
# For custom functions and variables
import utils, config

# For data analysis
import pandas as pd
import numpy as np
import os, re

# For creating directories
from pathlib import Path

# For preprocessing
import scipy.stats
from gensim.models import FastText
from gensim import utils as gensim_utils

# For classification
import sklearn_crfsuite
from sklearn_crfsuite import scorers
from sklearn_crfsuite import metrics

# For evaluation
from collections import Counter
from sklearn.metrics import classification_report, make_scorer
from sklearn.metrics import confusion_matrix, multilabel_confusion_matrix, ConfusionMatrixDisplay  # plot_confusion_matrix was removed in scikit-learn 1.2
from sklearn.metrics import precision_recall_fscore_support, f1_score
from intervaltree import Interval, IntervalTree

<a id="0"></a>
## 0. Preprocessing

Load the train and validation (dev) data:

In [2]:
df_train = pd.read_csv(config.tokc_path+"model_input/token_train.csv", index_col=0)
df_dev = pd.read_csv(config.tokc_path+"model_input/token_validate.csv", index_col=0)
print(df_train.shape, df_dev.shape)
df_train.head()

(467564, 10) (157740, 10)


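|   | description_id | sentence_id | ann_id | token_id | token | token_offsets | pos | tag | field | subset |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1 | 1 | 99999 | 3 | Title | (17, 22) | NN | O | Title | train |
| 4 | 1 | 1 | 99999 | 4 | : | (22, 23) | : | O | Title | train |
| 5 | 1 | 1 | 99999 | 5 | Papers | (24, 30) | NNS | O | Title | train |
| 6 | 1 | 1 | 99999 | 6 | of | (31, 33) | IN | O | Title | train |
| 7 | 1 | 1 | 14384 | 7 | The | (34, 37) | DT | B-Unknown | Title | train |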


Drop the annotation ID column, then drop any rows that become duplicates (rows that were identical except for their annotation ID):

In [3]:
df_train = df_train.drop(columns=["ann_id"])
df_train = df_train.drop_duplicates()
df_dev = df_dev.drop(columns=["ann_id"])
df_dev = df_dev.drop_duplicates()
print(df_train.shape, df_dev.shape)

(463441, 9) (156146, 9)
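For readers less familiar with pandas, the drop-then-dedupe step above can be sketched in plain Python. This is an illustrative equivalent, not the notebook's code: rows are assumed to be dicts, `drop_field_and_dedupe` is a hypothetical helper, and the second `ann_id` value in the toy rows is made up.

```python
def drop_field_and_dedupe(rows, field):
    """Remove `field` from every row, then keep only the first copy of each remaining row.

    Roughly mirrors df.drop(columns=[field]).drop_duplicates() for rows stored as dicts.
    """
    seen = set()
    deduped = []
    for row in rows:
        reduced = {key: value for key, value in row.items() if key != field}
        signature = tuple(sorted(reduced.items()))  # hashable, column-order-independent key
        if signature not in seen:
            seen.add(signature)
            deduped.append(reduced)
    return deduped


# Toy rows: two annotations (different ann_id) on the same token collapse into one row.
rows = [
    {"ann_id": 14384, "token_id": 7, "token": "The", "tag": "B-Unknown"},
    {"ann_id": 14385, "token_id": 7, "token": "The", "tag": "B-Unknown"},
    {"ann_id": 99999, "token_id": 6, "token": "of", "tag": "O"},
]
print(len(drop_field_and_dedupe(rows, "ann_id")))  # 2
```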


Remove the Non-binary labels. These were mistaken labels, identified early on, that were meant to be excluded; in addition, only one token has this label, which prevents the data from being split for cross-validation.

In [4]:
df_train = df_train.loc[df_train.tag != "B-Nonbinary"]
df_train = df_train.loc[df_train.tag != "I-Nonbinary"]

In [5]:
df_train.shape

(463439, 9)

Remove columns that won't be used as features for the classifiers and remove any duplicate rows that remain:

In [6]:
cols_to_keep = ["sentence_id", "token_id", "pos", "token", "tag"]

In [7]:
df_train = df_train[cols_to_keep]
df_train = df_train.drop_duplicates()
df_dev = df_dev[cols_to_keep]
df_dev = df_dev.drop_duplicates()
# df_train.head(20)

Create separate subsets of data for each category so they can be used with three separate models, replacing `NaN` tag values with `'O'`:

In [8]:
tags = (df_train.tag.unique())
tags.sort()
print(tags)

['B-Feminine' 'B-Gendered-Pronoun' 'B-Gendered-Role' 'B-Generalization'
 'B-Masculine' 'B-Occupation' 'B-Omission' 'B-Stereotype' 'B-Unknown'
 'I-Feminine' 'I-Gendered-Pronoun' 'I-Gendered-Role' 'I-Generalization'
 'I-Masculine' 'I-Occupation' 'I-Omission' 'I-Stereotype' 'I-Unknown' 'O']


In [9]:
df_dev = df_dev.drop_duplicates()
df_dev.tag.value_counts()

O                     141279
I-Unknown               3062
B-Unknown               1886
I-Omission              1502
I-Masculine             1240
B-Omission              1008
B-Masculine              968
I-Occupation             781
B-Gendered-Pronoun       744
I-Stereotype             721
I-Feminine               675
B-Occupation             654
B-Gendered-Role          575
B-Feminine               281
B-Stereotype             252
B-Generalization         240
I-Generalization         145
I-Gendered-Role          118
I-Gendered-Pronoun        15
Name: tag, dtype: int64

In [10]:
print(df_dev.shape)
print(len(df_dev.token_id.unique()))

(156146, 5)
152455


***
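**Optional** - select the subset of tags for the model being trained. In this run, the Person Name tags are combined with the Occupation tags (`category = "pers_o"`); the cells for the other categories are commented out.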

In [11]:
# ling_cat_tags = ['B-Gendered-Pronoun', 'B-Gendered-Role', 'B-Generalization', 'I-Gendered-Pronoun', 'I-Gendered-Role', 'I-Generalization']
# df_train_ling = df_train.loc[df_train.tag.isin(ling_cat_tags)]
# df_dev_ling = df_dev.loc[df_dev.tag.isin(ling_cat_tags)]
# category = "linguistic"

In [12]:
# pers_cat_tags = ['B-Feminine', 'B-Masculine', 'B-Unknown', 'I-Feminine', 'I-Masculine', 'I-Unknown']
# df_train_pers = df_train.loc[df_train.tag.isin(pers_cat_tags)]
# df_dev_pers = df_dev.loc[df_dev.tag.isin(pers_cat_tags)]

In [13]:
# cont_cat_tags = ['B-Occupation', 'B-Omission', 'B-Stereotype', 'I-Occupation', 'I-Omission', 'I-Stereotype']
# df_train_cont = df_train.loc[df_train.tag.isin(cont_cat_tags)]
# df_dev_cont = df_dev.loc[df_dev.tag.isin(cont_cat_tags)]

In [14]:
perso_cat_tags = ['B-Feminine', 'B-Masculine', 'B-Occupation', 'B-Unknown', 'I-Feminine', 'I-Masculine', 'I-Occupation', 'I-Unknown']
df_train_perso = df_train.loc[df_train.tag.isin(perso_cat_tags)]
df_dev_perso = df_dev.loc[df_dev.tag.isin(perso_cat_tags)]
category = "pers_o"

In [15]:
df_train = (df_train.drop(columns=["tag"])).drop_duplicates()
df_dev = (df_dev.drop(columns=["tag"])).drop_duplicates()

In [16]:
join_cols = ["sentence_id", "token_id", "pos", "token"]

In [17]:
# df_train_ling = df_train.join(df_train_ling.set_index(join_cols), on=join_cols, how="outer")
# df_train_ling = df_train_ling.fillna('O')
# # df_train_ling.head()
# df_dev_ling = df_dev.join(df_dev_ling.set_index(join_cols), on=join_cols, how="outer")
# df_dev_ling = df_dev_ling.fillna('O')
# df_dev_ling.head()

In [18]:
# df_train_pers = df_train.join(df_train_pers.set_index(join_cols), on=join_cols, how="outer")
# df_train_pers = df_train_pers.rename(columns={"tag":"tag_personname"})
# df_train_pers = df_train_pers.fillna('O')
# df_dev_pers = df_dev.join(df_dev_pers.set_index(join_cols), on=join_cols, how="outer")
# df_dev_pers = df_dev_pers.rename(columns={"tag":"tag_personname"})
# df_dev_pers = df_dev_pers.fillna('O')
# # df_dev_pers.head()

In [19]:
# df_train_cont = df_train.join(df_train_cont.set_index(join_cols), on=join_cols, how="outer")
# df_train_cont = df_train_cont.rename(columns={"tag":"tag_contextual"})
# df_train_cont = df_train_cont.fillna('O')
# df_dev_cont = df_dev.join(df_dev_cont.set_index(join_cols), on=join_cols, how="outer")
# df_dev_cont = df_dev_cont.rename(columns={"tag":"tag_contextual"})
# df_dev_cont = df_dev_cont.fillna('O')
# df_train_cont.head()

In [20]:
df_train_perso = df_train.join(df_train_perso.set_index(join_cols), on=join_cols, how="outer")
# df_train_perso = df_train_perso.rename(columns={"tag":"tag_personname"})
df_train_perso = df_train_perso.fillna('O')
df_dev_perso = df_dev.join(df_dev_perso.set_index(join_cols), on=join_cols, how="outer")
# df_dev_perso = df_dev_perso.rename(columns={"tag":"tag_personname"})
df_dev_perso = df_dev_perso.fillna('O')
# df_dev_pers.head()

In [21]:
# df_train = df_train_ling.drop_duplicates()
# df_dev = df_dev_ling.drop_duplicates()
df_train = df_train_perso.drop_duplicates()
df_dev = df_dev_perso.drop_duplicates()
# df_train = df_train_pers.drop_duplicates()
# df_dev = df_dev_pers.drop_duplicates()
# df_train = df_train_cont.drop_duplicates()
# df_dev = df_dev_cont.drop_duplicates()

In [22]:
df_dev.tag.value_counts()

O               144035
I-Unknown         3062
B-Unknown         1886
I-Masculine       1240
B-Masculine        968
I-Occupation       781
I-Feminine         675
B-Occupation       654
B-Feminine         281
Name: tag, dtype: int64
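The filter, outer join, and `fillna('O')` steps above amount to a simple rule: keep a token's tags only if they belong to the selected category, and give every token left without a tag a single `'O'`. A plain-Python sketch of that rule follows; `restrict_to_category` is a hypothetical name, the tag set copies `perso_cat_tags`, and the `(token_id, tag)` pairs are toy data.

```python
PERSO_TAGS = {
    "B-Feminine", "I-Feminine", "B-Masculine", "I-Masculine",
    "B-Unknown", "I-Unknown", "B-Occupation", "I-Occupation",
}


def restrict_to_category(rows, category_tags):
    """Keep only tags in `category_tags`; any token left with no tag gets one 'O' row.

    rows: iterable of (token_id, tag) pairs, possibly several per token.
    """
    rows = list(rows)
    kept = {(token_id, tag) for token_id, tag in rows if tag in category_tags}
    tagged_ids = {token_id for token_id, _ in kept}
    untagged = {(token_id, "O") for token_id, _ in rows if token_id not in tagged_ids}
    return sorted(kept | untagged)


rows = [
    (1, "O"),
    (2, "B-Masculine"), (2, "B-Occupation"),  # two in-category tags on one token
    (3, "B-Stereotype"),                       # contextual tag, outside this category
    (4, "B-Gendered-Role"),                    # linguistic tag, outside this category
]
print(restrict_to_category(rows, PERSO_TAGS))
# [(1, 'O'), (2, 'B-Masculine'), (2, 'B-Occupation'), (3, 'O'), (4, 'O')]
```

Note that token 2 keeps both of its tags at this stage; reducing each token to a single tag happens later, when the targets are extracted.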

In [23]:
# train_dfs = [df_train_ling, df_train_pers, df_train_cont]
# dev_dfs = [df_dev_ling, df_dev_pers, df_dev_cont]
# for df in train_dfs:
#     print(df.shape[0], len(df.token_id.unique()))
# print()
# for df in dev_dfs:
#     print(df.shape[0], len(df.token_id.unique()))

***

Tokens can have multiple tags, so there are more rows than unique token IDs. A CRF model needs exactly one tag per token, so we simply **take the first tag** of each token when we extract the targets (see `extractSentenceTargets` below).
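The grouping of tags per token and the first-tag rule can be sketched in plain Python as follows. The helper names and toy rows are hypothetical; the point is that "first" simply means first in row order, so whichever tag happens to come first wins and the others are dropped.

```python
def group_tags(rows):
    """Collect every tag of a token into a list, preserving row order.

    rows: iterable of (sentence_id, token_id, token, tag) tuples.
    """
    grouped = {}
    for sentence_id, token_id, token, tag in rows:
        grouped.setdefault((sentence_id, token_id, token), []).append(tag)
    return grouped


def first_tags(grouped):
    """Reduce each token's tag list to its first tag (the rule described above)."""
    return {key: tags[0] for key, tags in grouped.items()}


rows = [
    (5, 160, "Reverend", "B-Occupation"),
    (5, 160, "Reverend", "B-Masculine"),
    (5, 161, "Smith", "I-Masculine"),
]
grouped = group_tags(rows)
print(grouped[(5, 160, "Reverend")])              # ['B-Occupation', 'B-Masculine']
print(first_tags(grouped)[(5, 160, "Reverend")])  # B-Occupation
```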

#### Word Embeddings

Use the custom fastText word embeddings, trained on the entire dataset of descriptive metadata from the Archives (harvested in October 2020) using the Continuous Bag-of-Words (CBOW) algorithm.  Subword embeddings (for subwords from 2 to 6 characters long, inclusive) are used to infer the embeddings for out-of-vocabulary (OOV) words.
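To make the subword idea concrete, the sketch below lists the character n-grams (lengths 2 to 6) of a word wrapped in `<` and `>` boundary markers, which is how fastText-style models break words into subwords, and averages toy n-gram vectors to form a vector for an out-of-vocabulary word. As we understand it, gensim averages the n-gram vectors for OOV words, but it hashes n-grams into a fixed number of buckets so every n-gram maps to some vector; the toy lookup table here simply skips unseen n-grams. Function names and vectors are hypothetical.

```python
def char_ngrams(word, min_n=2, max_n=6):
    """Character n-grams of `word` wrapped in '<' and '>' boundary markers."""
    wrapped = "<" + word + ">"
    return [
        wrapped[start:start + n]
        for n in range(min_n, max_n + 1)
        for start in range(len(wrapped) - n + 1)
    ]


def oov_vector(word, ngram_vectors, dim, min_n=2, max_n=6):
    """Average the vectors of the word's n-grams (toy table; unknown n-grams are skipped)."""
    found = [ngram_vectors[g] for g in char_ngrams(word, min_n, max_n) if g in ngram_vectors]
    if not found:
        return [0.0] * dim
    return [sum(vec[d] for vec in found) / len(found) for d in range(dim)]


print(char_ngrams("she"))
# ['<s', 'sh', 'he', 'e>', '<sh', 'she', 'he>', '<she', 'she>', '<she>']
```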

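Load the word embedding model trained on lowercased text with 100 dimensions (these cells are commented out in this run, which trains the baseline without embedding features):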

In [24]:
# dimensions = ["50", "100", "200", "300"]
# d = dimensions[1]
# file_name = config.fasttext_path+"fasttext{}_lowercased.model".format(d)  #get_tmpfile()
# embedding_model = FastText.load(file_name)

In [25]:
# vocabulary = list(df_train.token.unique())
# vocabulary_lowercased = [token.lower() for token in vocabulary]
# vocabulary_lowercased = list(set(vocabulary_lowercased))
# print("Vocabulary size:", len(vocabulary))
# print("Lowercased vocabulary size:", len(vocabulary_lowercased))

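Define the feature-extraction functions for the baseline models. Each token is represented by a bias term, the token itself, and flags marking the first and last token of a sentence; the word-embedding features are commented out in this run: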

In [26]:
# # Get a vector representation of a token from a fastText word embedding model
# def extractEmbedding(token, fasttext_model=embedding_model):
#     if token.isalpha():
#         token = token.lower()
#     embedding = fasttext_model.wv[token]
#     return embedding

def extractTokenFeatures(sentence, i):
    token = sentence[i][0]
    pos = sentence[i][1]  # POS tag (extracted but not currently used as a feature)
    features = {
        'bias': 1.0,
        'token': token
    }
    
#     # Add each value in a token's word embedding as a separate feature
#     # (loop over `dim`, not `i`, so the token position used below is not overwritten)
#     embedding = extractEmbedding(token)
#     for dim, n in enumerate(embedding):
#         features['e{}'.format(dim)] = n
    
    # Record whether a token is the first or last token of a sentence
    if i == 0:
        features['START'] = True
    elif i == (len(sentence) - 1):
        features['END'] = True
    
    return features

def extractSentenceFeatures(sentence):
    return [extractTokenFeatures(sentence, i) for i in range(len(sentence))]

def extractSentenceTargets(sentence):
    return [tag_list[0] for token, pos, tag_list in sentence]

def extractSentenceTokens(sentence):
    return [token for token, pos, tag_list in sentence]

*References:*
* *https://sklearn-crfsuite.readthedocs.io/en/latest/tutorial.html*
* *https://stackoverflow.com/questions/58736548/how-to-use-word-embedding-as-features-for-crf-sklearn-crfsuite-model-training*
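When the embedding features are enabled, each embedding dimension becomes its own numeric feature, and the loop over dimensions must not reuse `i`, the token position that the START/END check relies on. A self-contained sketch of the same feature dictionary with an optional embedding lookup follows; `token_features` and `toy_embed` are hypothetical stand-ins for `extractTokenFeatures` and the fastText lookup.

```python
def token_features(sentence, i, embed=None):
    """Feature dict for token i of `sentence`, a list of (token, pos, tag_list) tuples.

    `embed` is an optional callable returning a vector for a token
    (a hypothetical stand-in for the commented-out fastText lookup).
    """
    token = sentence[i][0]
    features = {"bias": 1.0, "token": token}

    if embed is not None:
        # Lowercase alphabetic tokens, matching an embedding model trained on lowercased text.
        vector = embed(token.lower() if token.isalpha() else token)
        # Use `dim`, not `i`, so the token position is not overwritten.
        for dim, value in enumerate(vector):
            features["e{}".format(dim)] = float(value)

    if i == 0:
        features["START"] = True
    elif i == len(sentence) - 1:
        features["END"] = True
    return features


sentence = [("Papers", "NNS", ["O"]), ("of", "IN", ["O"]), ("The", "DT", ["B-Unknown"])]
toy_embed = lambda tok: [0.1, 0.2]  # hypothetical 2-dimensional embedding
print(token_features(sentence, 2, toy_embed))
# {'bias': 1.0, 'token': 'The', 'e0': 0.1, 'e1': 0.2, 'END': True}
```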

<a id="1"></a>
## 1. Models

<a id="all"></a>
## All Labels

* **Features:** token, bias, and sentence start/end flags (the custom fastText embedding features are commented out in this run)
* **Target:** IOB tags
* **Algorithm:** AROW, variance=1

#### Preprocessing

In [27]:
df_train_token_groups = utils.implodeDataFrame(df_train, ['token_id', 'sentence_id', 'pos', 'token'])
df_dev_token_groups = utils.implodeDataFrame(df_dev, ['token_id', 'sentence_id', 'pos', 'token'])
df_train_token_groups = df_train_token_groups.reset_index()
df_dev_token_groups = df_dev_token_groups.reset_index()

In [28]:
print(df_dev_token_groups.shape)
df_dev_token_groups.head()

(152455, 5)


|   | token_id | sentence_id | pos | token | tag |
|---|---|---|---|---|---|
| 0 | 154 | 5 | IN | After | [O] |
| 1 | 155 | 5 | PRP$ | his | [O] |
| 2 | 156 | 5 | NN | ordination | [O] |
| 3 | 157 | 5 | PRP | he | [O] |
| 4 | 158 | 5 | VBD | spent | [O] |


In [29]:
df_train_grouped = utils.implodeDataFrame(df_train_token_groups, ['sentence_id'])
df_dev_grouped = utils.implodeDataFrame(df_dev_token_groups, ['sentence_id'])
df_train_grouped = df_train_grouped.rename(columns={"token":"sentence"})
df_dev_grouped = df_dev_grouped.rename(columns={"token":"sentence"})
# df_dev_grouped.head()

Zip the POS and category tags together with the tokens so each sentence item is a tuple: `(TOKEN, POS-TAG, TAG_LIST)`

In [30]:
df_train_grouped = df_train_grouped.reset_index()
df_dev_grouped = df_dev_grouped.reset_index()
train_sentences = utils.zipFeaturesAndTarget(df_train_grouped, "tag")
dev_sentences = utils.zipFeaturesAndTarget(df_dev_grouped, "tag")
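`utils.zipFeaturesAndTarget` is not shown in this notebook. Assuming it pairs up a sentence's imploded token, POS, and tag columns position by position, its core can be sketched as below (`zip_sentence` is a hypothetical name). The explicit length check is a design choice: Python's `zip` silently truncates to the shortest input, which would hide a misaligned row.

```python
def zip_sentence(tokens, pos_tags, tag_lists):
    """Combine parallel per-token lists into (TOKEN, POS-TAG, TAG_LIST) tuples."""
    if not len(tokens) == len(pos_tags) == len(tag_lists):
        raise ValueError("tokens, pos_tags and tag_lists must have the same length")
    return list(zip(tokens, pos_tags, tag_lists))


sentence = zip_sentence(["After", "his", "ordination"], ["IN", "PRP$", "NN"], [["O"], ["O"], ["O"]])
print(sentence[1])  # ('his', 'PRP$', ['O'])
```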

In [31]:
# Features
X_train = [extractSentenceFeatures(sentence) for sentence in train_sentences]
X_dev = [extractSentenceFeatures(sentence) for sentence in dev_sentences]
# Target
y_train = [extractSentenceTargets(sentence) for sentence in train_sentences]
y_dev = [extractSentenceTargets(sentence) for sentence in dev_sentences]

**From optimization of the baseline model:** AROW with variance=1 was the best-performing algorithm/parameter combination.

#### Train

Train a Conditional Random Field (CRF) model with a maximum of 50 iterations.

In [32]:
clf = sklearn_crfsuite.CRF(algorithm='arow', variance=1, max_iterations=50, all_possible_transitions=True)

In [33]:
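# Some scikit-learn / sklearn-crfsuite version combinations raise
# AttributeError: 'CRF' object has no attribute 'keep_tempfiles' from fit();
# the error is ignored here because prediction still works afterwards. See:
# https://stackoverflow.com/questions/66059532/attributeerror-crf-object-has-no-attribute-keep-tempfiles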
try:
    clf.fit(X_train, y_train)
except AttributeError:
    pass

Remove the `'O'` tag from the list of target labels. We are interested in how well the model applies the tags for gendered and gender-biased language, and because `'O'` tags far outnumber those tags, including `'O'` would inflate the averaged scores.

In [34]:
targets = list(clf.classes_)
targets.remove('O')
print(targets)

['B-Unknown', 'I-Unknown', 'I-Masculine', 'B-Masculine', 'B-Occupation', 'I-Occupation', 'B-Feminine', 'I-Feminine']
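The inflation effect of `'O'` can be seen on a toy example. The sketch below flattens per-sentence label lists (as the `flat_*` metrics do) and computes macro and micro F1 over a chosen label list, with F1 taken as 0 when there are no true positives (similar to `zero_division=0`). The labels and predictions are made up for illustration; the helper names are hypothetical.

```python
def flatten(sequences):
    """Concatenate per-sentence label lists into one flat list."""
    return [label for seq in sequences for label in seq]


def counts(y_true, y_pred, label):
    """True positives, false positives and false negatives for one label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    # Equivalent to 2PR / (P + R) when tp > 0; defined as 0 otherwise.
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 scores."""
    return sum(f1_from_counts(*counts(y_true, y_pred, lab)) for lab in labels) / len(labels)


def micro_f1(y_true, y_pred, labels):
    """F1 from counts pooled over all listed labels."""
    totals = [counts(y_true, y_pred, lab) for lab in labels]
    return f1_from_counts(*(sum(col) for col in zip(*totals)))


y_true = [["O", "O", "O", "O", "B-Masculine", "I-Masculine"], ["O", "O", "O", "B-Feminine"]]
y_pred = [["O", "O", "O", "O", "B-Masculine", "O"], ["O", "O", "O", "O"]]
t, p = flatten(y_true), flatten(y_pred)
toy_targets = ["B-Masculine", "I-Masculine", "B-Feminine"]
print(macro_f1(t, p, toy_targets), macro_f1(t, p, ["O"] + toy_targets))
print(micro_f1(t, p, toy_targets), micro_f1(t, p, ["O"] + toy_targets))
```

Here the model finds only one of the three gendered tokens, yet adding `'O'` to the label list raises macro F1 from about 0.33 to about 0.47 and micro F1 from 0.5 to 0.8.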


#### Predict

In [35]:
y_pred = clf.predict(X_dev)

#### Evaluate

##### Summary (token-level, excluding the O label)

In [36]:
# Evaluate
print(clf.algorithm, clf.c1, clf.c2, clf.pa_type, clf.variance)
print("  Macro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  Micro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

arow None None None 1
  Macro:
  - F1: 0.5231534463290515
  - Prec: 0.566404777718956
  - Rec 0.48959059685433803
  Micro:
  - F1: 0.5031209362808843
  - Prec: 0.5558908045977011
  - Rec 0.45950118764845604
  Per Label:
  - F1: [0.49798387 0.50869389 0.37206879 0.36728625 0.62697023 0.55892731
 0.62753036 0.62576687]
  - Prec: [0.57087827 0.58133087 0.378579   0.3964687  0.72469636 0.61971831
 0.5984556  0.66111111]
  - Rec [0.44159714 0.45219267 0.36577869 0.34210526 0.55246914 0.50899743
 0.65957447 0.59400998]


##### Strict Evaluation

In [37]:
df_dev_grouped = df_dev_grouped.rename(columns={"tag":"tag_expected"})
df_dev_grouped.insert(len(df_dev_grouped.columns), "tag_predicted", y_pred)
# df_dev_grouped.head()

In [38]:
df_dev_grouped = df_dev_grouped.set_index(["sentence_id"])
df_dev_exploded = df_dev_grouped.explode(list(df_dev_grouped.columns))
# df_dev_exploded.head()
df_dev_exploded = df_dev_exploded.explode(["tag_expected"])
df_dev_exploded.head()

| sentence_id | token_id | pos | sentence | tag_expected | tag_predicted |
|---|---|---|---|---|---|
| 5 | 154 | IN | After | O | O |
| 5 | 155 | PRP$ | his | O | O |
| 5 | 156 | NN | ordination | O | O |
| 5 | 157 | PRP | he | O | O |
| 5 | 158 | VBD | spent | O | O |


In [39]:
df_dev_exploded = df_dev_exploded.fillna("O")

In [40]:
df_dev_exploded = df_dev_exploded.rename(columns={"sentence":"token"})

In [41]:
exp_df = df_dev_exploded.drop(columns=["tag_predicted"]).reset_index()
pred_df = df_dev_exploded.drop(columns=["tag_expected"]).reset_index()
# pred_df.head()

In [42]:
eval_df = utils.makeEvaluationDataFrame(
    exp_df, 
    pred_df, 
    ["sentence_id", "token_id", "token", "pos", "tag_expected"],   # left on
    ["sentence_id", "token_id", "token", "pos", "tag_predicted"],  # right on
    ["sentence_id", "token_id", "token", "pos", "tag_expected", "tag_predicted", "_merge"],  # final column list
    "tag_expected",
    "tag_predicted", 
    "token_id",  # ID column
    "O"          # No tag value
)
eval_df.head()

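|   | sentence_id | token_id | token | pos | tag_expected | tag_predicted | _merge |
|---|---|---|---|---|---|---|---|
| 0 | 5 | 154 | After | IN | O | O | true negative |
| 1 | 5 | 155 | his | PRP$ | O | O | true negative |
| 2 | 5 | 156 | ordination | NN | O | O | true negative |
| 3 | 5 | 157 | he | PRP | O | O | true negative |
| 4 | 5 | 158 | spent | VBD | O | O | true negative |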


In [43]:
print(eval_df.shape)
eval_df = eval_df.drop_duplicates()
print(eval_df.shape)

(155606, 7)
(154479, 7)


In [44]:
# filename = "crf_arow_var1_{c}_baseline_fastText{d}_predictions.csv".format(d=d, c=category)
filename = "crf_arow_var1_{c}_baseline_no_embeddings_predictions.csv".format(c=category)
eval_df.to_csv(config.tokc_path+"sequence_model_output/"+filename)

Calculate precision, recall, and F1 score at the token level for each tag:

In [45]:
if category == "linguistic":
    targets = ['B-Gendered-Pronoun', 'I-Gendered-Pronoun', 'B-Gendered-Role', 'I-Gendered-Role', 'B-Generalization', 'I-Generalization']
elif category == "pers_o":
    targets = ['B-Feminine', 'I-Feminine', 'B-Masculine', 'I-Masculine', 'B-Unknown', 'I-Unknown', 'B-Occupation', 'I-Occupation']
else:
    print(category+" not recognized as a category.")

In [46]:
agmt_stats = pd.DataFrame()
for tag in targets:
#     getScoresByTags(df, eval_col, tags, exp_col="expected_tag", pred_col="predicted_tag"):
    tag_agmt_stats = utils.getScoresByTags(eval_df, "_merge", [tag], exp_col="tag_expected", pred_col="tag_predicted")
    agmt_stats = pd.concat([agmt_stats, tag_agmt_stats])
agmt_stats

| tag(s) | false negative | false positive | true negative | true positive | precision | recall | f1 |
|---|---|---|---|---|---|---|---|
| B-Feminine | 52 | 51 | 0 | 358 | 0.875306 | 0.873171 | 0.874237 |
| I-Feminine | 121 | 50 | 0 | 810 | 0.94186 | 0.870032 | 0.904523 |
| B-Masculine | 349 | 155 | 0 | 646 | 0.806492 | 0.649246 | 0.719376 |
| I-Masculine | 475 | 267 | 0 | 898 | 0.770815 | 0.654042 | 0.707644 |
| B-Unknown | 375 | 219 | 0 | 1642 | 0.882321 | 0.81408 | 0.846828 |
| I-Unknown | 590 | 309 | 0 | 2746 | 0.898854 | 0.823141 | 0.859333 |
| B-Occupation | 28 | 21 | 0 | 722 | 0.971736 | 0.962667 | 0.96718 |
| I-Occupation | 34 | 21 | 0 | 792 | 0.97417 | 0.958838 | 0.966443 |
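`utils.getScoresByTags` is not shown here. Assuming it derives precision, recall, and F1 from the true-positive, false-positive, and false-negative counts in the standard way, the sketch below reproduces the B-Feminine row of the table from its counts (358, 51, 52). `tag_counts` and `precision_recall_f1` are hypothetical names.

```python
def tag_counts(pairs, tag):
    """Count TP/FP/FN for one tag over (expected, predicted) token pairs."""
    tp = fp = fn = 0
    for expected, predicted in pairs:
        if expected == tag and predicted == tag:
            tp += 1
        elif predicted == tag:
            fp += 1  # predicted the tag where it was not expected
        elif expected == tag:
            fn += 1  # missed an expected tag
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0  # harmonic mean of precision and recall
    return precision, recall, f1


# Counts taken from the B-Feminine row of the table above.
print([round(v, 6) for v in precision_recall_f1(358, 51, 52)])
# [0.875306, 0.873171, 0.874237]
```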


Save the statistics:

In [47]:
filename = "crf_arow_var1_{c}_baseline_no_embeddings_agreement.csv".format(c=category)
agmt_stats.to_csv(config.tokc_path+"sequence_model_performance/"+filename)

<a id="A"></a>
## Appendix A: Person Name Model Optimization

#### Optimization

Look for the highest-performing models (based on F1 score) by trying different algorithms and parameters. The algorithms available with sklearn_crfsuite are:
 * 'lbfgs' - Gradient descent using the L-BFGS method
 * 'l2sgd' - Stochastic Gradient Descent with L2 regularization term
 * 'ap' - Averaged Perceptron
 * 'pa' - Passive Aggressive (PA)
 * 'arow' - Adaptive Regularization Of Weight Vector (AROW)
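Since AROW with `variance=1` is the configuration used for the final model, a small sketch of what `variance` controls may help. Below is a simplified, binary, diagonal-covariance version of the AROW update as we understand it: each weight starts with variance `variance`, updates happen only when the margin is below 1, and weights that have already received a lot of evidence (low variance) move less. CRFsuite applies AROW to structured CRF training, so this illustrates the idea rather than reproducing CRFsuite; `arow_train`, `predict`, and the toy data are hypothetical, and `gamma` stands in for the trade-off parameter that sklearn_crfsuite also exposes for `'arow'`.

```python
def arow_train(data, n_features, variance=1.0, gamma=1.0, epochs=5):
    """Binary AROW with a diagonal covariance.

    data: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    variance: initial variance of every weight.
    gamma: regularization trade-off.
    """
    w = [0.0] * n_features
    sigma = [variance] * n_features  # per-weight uncertainty; shrinks as evidence accumulates
    for _ in range(epochs):
        for x, y in data:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin >= 1.0:
                continue  # correct with enough margin: no update
            confidence = sum(si * xi * xi for si, xi in zip(sigma, x))  # x^T Sigma x
            beta = 1.0 / (confidence + gamma)
            alpha = (1.0 - margin) * beta
            for j, xj in enumerate(x):
                w[j] += alpha * y * sigma[j] * xj     # larger steps for uncertain weights
                sigma[j] -= beta * (sigma[j] * xj) ** 2  # diagonal of Sigma x x^T Sigma
    return w, sigma


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Toy data: the first feature is a bias term.
data = [([1.0, 2.0], 1), ([1.0, -2.0], -1), ([1.0, 1.5], 1), ([1.0, -1.0], -1)]
w, sigma = arow_train(data, n_features=2, variance=1.0)
print([predict(w, x) for x, _ in data])  # [1, -1, 1, -1]
```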

In [116]:
algorithms = ['lbfgs', 'l2sgd', 'ap', 'pa', 'arow']
max_iters=50

In [41]:
# df_train["tag_personname"].unique()
targets = [
        'B-Unknown', 'I-Unknown', 'B-Feminine',
        'I-Feminine', 'B-Masculine', 'I-Masculine'
]
# f1_scorer = make_scorer(
#     metrics.flat_f1_score, average='None', 
#     labels=targets
# )

In [42]:
crf0A = sklearn_crfsuite.CRF(algorithm=algorithms[0], c1=0, max_iterations=max_iters, all_possible_transitions=True) # lbfgs default max_iterations: unlimited
crf0B = sklearn_crfsuite.CRF(algorithm=algorithms[0], c1=0.1, max_iterations=max_iters, all_possible_transitions=True)
crf0C = sklearn_crfsuite.CRF(algorithm=algorithms[0], c1=0.1, c2=0.2, max_iterations=max_iters, all_possible_transitions=True)

crf1A = sklearn_crfsuite.CRF(algorithm=algorithms[1], c2=1.0, max_iterations=max_iters, all_possible_transitions=True) # l2sgd default max_iterations: 1000
crf1B = sklearn_crfsuite.CRF(algorithm=algorithms[1], c2=0.2, max_iterations=max_iters, all_possible_transitions=True)

crf2 = sklearn_crfsuite.CRF(algorithm=algorithms[2], max_iterations=max_iters, all_possible_transitions=True) # ap default max_iterations: 100

crf3A = sklearn_crfsuite.CRF(algorithm=algorithms[3], pa_type=0, max_iterations=max_iters, all_possible_transitions=True) # pa default max_iterations: 100
crf3B = sklearn_crfsuite.CRF(algorithm=algorithms[3], pa_type=1, max_iterations=max_iters, all_possible_transitions=True)
crf3C = sklearn_crfsuite.CRF(algorithm=algorithms[3], pa_type=2, max_iterations=max_iters, all_possible_transitions=True)

crf4A = sklearn_crfsuite.CRF(algorithm=algorithms[4], variance=1, max_iterations=max_iters, all_possible_transitions=True) # arow default max_iterations: 100
crf4B = sklearn_crfsuite.CRF(algorithm=algorithms[4], variance=0.5, max_iterations=max_iters, all_possible_transitions=True)
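The cells that follow repeat the same fit/predict/evaluate code for each configuration. A loop over a table of keyword arguments is one way to reduce that repetition and collect the scores in one place. The sketch below assumes a caller-supplied `fit_and_score(params)` function (hypothetical) so the sweep logic can be run and tested without training; a version using the notebook's objects is shown in the comments and is not executed here.

```python
# Candidate settings from the cell above, expressed as keyword arguments.
CONFIGS = {
    "crf0A": {"algorithm": "lbfgs", "c1": 0},
    "crf0B": {"algorithm": "lbfgs", "c1": 0.1},
    "crf0C": {"algorithm": "lbfgs", "c1": 0.1, "c2": 0.2},
    "crf1A": {"algorithm": "l2sgd", "c2": 1.0},
    "crf1B": {"algorithm": "l2sgd", "c2": 0.2},
    "crf2": {"algorithm": "ap"},
    "crf3A": {"algorithm": "pa", "pa_type": 0},
    "crf3B": {"algorithm": "pa", "pa_type": 1},
    "crf3C": {"algorithm": "pa", "pa_type": 2},
    "crf4A": {"algorithm": "arow", "variance": 1},
    "crf4B": {"algorithm": "arow", "variance": 0.5},
}


def sweep(configs, fit_and_score):
    """Score every configuration; return (best_name, {name: score})."""
    scores = {name: fit_and_score(params) for name, params in configs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# With the notebook's objects in scope, fit_and_score could look like this
# (not run here; requires sklearn_crfsuite and X_train/y_train/X_dev/y_dev/targets):
#
# def fit_and_score(params):
#     crf = sklearn_crfsuite.CRF(max_iterations=50, all_possible_transitions=True, **params)
#     try:
#         crf.fit(X_train, y_train)
#     except AttributeError:
#         pass
#     y_pred = crf.predict(X_dev)
#     return metrics.flat_f1_score(y_dev, y_pred, average="weighted",
#                                  zero_division=0, labels=targets)
```

Passing the scoring function in keeps the sweep independent of any particular metric, so the same loop works for weighted, macro, or per-category F1.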

In [43]:
try:
    crf0A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf0A.predict(X_dev)

# Evaluate
print(crf0A.algorithm, crf0A.c1, crf0A.c2, crf0A.pa_type, crf0A.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

lbfgs 0 None None None
  Weighted:
  - F1: 0.25563333936374655
  - Prec: 0.48241170869152883
  - Rec 0.17545363623374768
  Unweighted:
  - F1: [0.21609604 0.24892487 0.40752351 0.28761651 0.32380952 0.23603462]
  - Prec: [0.42631579 0.42087254 0.77380952 0.72       0.51987768 0.51020408]
  - Rec [0.14472901 0.17672414 0.27659574 0.1797005  0.2351314  0.15353122]


In [44]:
try:
    crf0B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf0B.predict(X_dev)

# Evaluate
print(crf0B.algorithm, crf0B.c1, crf0B.c2, crf0B.pa_type, crf0B.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

lbfgs 0.1 None None None
  Weighted:
  - F1: 0.3952936125163706
  - Prec: 0.6164456508900145
  - Rec 0.2961851693099014
  Unweighted:
  - F1: [0.34542314 0.37176232 0.62222222 0.53304904 0.43134087 0.38205128]
  - Prec: [0.62794349 0.63431542 0.74117647 0.74183976 0.5184466  0.51114923]
  - Rec [0.23823705 0.26293103 0.53617021 0.41597338 0.36929461 0.30501535]


In [45]:
try:
    crf0C.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf0C.predict(X_dev)

# Evaluate
print(crf0C.algorithm, crf0C.c1, crf0C.c2, crf0C.pa_type, crf0C.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

lbfgs 0.1 0.2 None None
  Weighted:
  - F1: 0.4505459769939627
  - Prec: 0.6028142561062638
  - Rec 0.36133733390484357
  Unweighted:
  - F1: [0.45994065 0.48070953 0.63333333 0.5320911  0.38327526 0.30410184]
  - Prec: [0.60963618 0.62804171 0.71891892 0.70410959 0.51764706 0.49199085]
  - Rec [0.36926742 0.38936782 0.56595745 0.42762063 0.30428769 0.22006141]


In [46]:
try:
    crf1A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf1A.predict(X_dev)

# Evaluate
print(crf1A.algorithm, crf1A.c1, crf1A.c2, crf1A.pa_type, crf1A.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

l2sgd None 1.0 None None
  Weighted:
  - F1: 0.39947172781326473
  - Prec: 0.5390631041498564
  - Rec 0.31975996570938703
  Unweighted:
  - F1: [0.37020484 0.37389855 0.5990566  0.5520728  0.44057052 0.35034657]
  - Prec: [0.49403579 0.55477032 0.67195767 0.70360825 0.51576994 0.4557377 ]
  - Rec [0.29600953 0.28196839 0.54042553 0.45424293 0.38450899 0.28454452]


In [47]:
try:
    crf1B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf1B.predict(X_dev)

# Evaluate
print(crf1B.algorithm, crf1B.c1, crf1B.c2, crf1B.pa_type, crf1B.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

l2sgd None 0.2 None None
  Weighted:
  - F1: 0.28237156184801626
  - Prec: 0.6289538389122998
  - Rec 0.19402771824546364
  Unweighted:
  - F1: [0.26691042 0.26218487 0.63341646 0.54115226 0.31621349 0.09779482]
  - Prec: [0.57367387 0.59541985 0.76506024 0.70889488 0.58148148 0.77272727]
  - Rec [0.17391304 0.16810345 0.54042553 0.43760399 0.21715076 0.05220061]


In [48]:
try:
    crf2.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf2.predict(X_dev)

# Evaluate
print(crf2.algorithm, crf2.c1, crf2.c2, crf2.pa_type, crf2.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

ap None None None None
  Weighted:
  - F1: 0.43418543421107464
  - Prec: 0.6506056343043833
  - Rec 0.33190455779397054
  Unweighted:
  - F1: [0.41818182 0.42871094 0.6367713  0.60142712 0.46962233 0.29945694]
  - Prec: [0.62162162 0.66920732 0.67298578 0.77631579 0.57777778 0.61858974]
  - Rec [0.31506849 0.31537356 0.60425532 0.49084859 0.395574   0.1975435 ]


In [49]:
try:
    crf3A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf3A.predict(X_dev)

# Evaluate
print(crf3A.algorithm, crf3A.c1, crf3A.c2, crf3A.pa_type, crf3A.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

pa None None 0 None
  Weighted:
  - F1: 0.46544025425232854
  - Prec: 0.6526455542987322
  - Rec 0.36676668095442205
  Unweighted:
  - F1: [0.46613697 0.48264984 0.66350711 0.59205021 0.45128205 0.30015552]
  - Prec: [0.63900415 0.64752116 0.7486631  0.7971831  0.59060403 0.62459547]
  - Rec [0.36688505 0.38469828 0.59574468 0.47088186 0.36514523 0.1975435 ]


In [50]:
try:
    crf3B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf3B.predict(X_dev)

# Evaluate
print(crf3B.algorithm, crf3B.c1, crf3B.c2, crf3B.pa_type, crf3B.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

pa None None 1 None
  Weighted:
  - F1: 0.46124270643488524
  - Prec: 0.6487355256491798
  - Rec 0.36233747678239747
  Unweighted:
  - F1: [0.46369138 0.47971145 0.66019417 0.58029979 0.44633731 0.29434547]
  - Prec: [0.63523316 0.6440678  0.76836158 0.81381381 0.58093126 0.60509554]
  - Rec [0.36509827 0.38218391 0.5787234  0.45091514 0.36237898 0.19447288]


In [51]:
try:
    crf3C.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf3C.predict(X_dev)

# Evaluate
print(crf3C.algorithm, crf3C.c1, crf3C.c2, crf3C.pa_type, crf3C.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

pa None None 2 None
  Weighted:
  - F1: 0.4648015025167207
  - Prec: 0.6483430407678211
  - Rec 0.36748106872410347
  Unweighted:
  - F1: [0.46373544 0.48826291 0.65876777 0.58201058 0.4467354  0.29439252]
  - Prec: [0.62830957 0.64653641 0.74331551 0.7994186  0.58956916 0.61563518]
  - Rec [0.36748064 0.39224138 0.59148936 0.45757072 0.35961272 0.19344933]


In [52]:
try:
    crf4A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf4A.predict(X_dev)

# Evaluate
print(crf4A.algorithm, crf4A.c1, crf4A.c2, crf4A.pa_type, crf4A.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

arow None None None 1
  Weighted:
  - F1: 0.4815200566676591
  - Prec: 0.5175284691788858
  - Rec 0.46506643806258036
  Unweighted:
  - F1: [0.45074415 0.48698438 0.6437247  0.59171598 0.51690294 0.38585209]
  - Prec: [0.55643045 0.55022624 0.61389961 0.60137457 0.42664266 0.35      ]
  - Rec [0.3787969  0.43678161 0.67659574 0.58236273 0.65560166 0.42988741]


In [53]:
try:
    crf4B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf4B.predict(X_dev)

# Evaluate
print(crf4B.algorithm, crf4B.c1, crf4B.c2, crf4B.pa_type, crf4B.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

arow None None None 0.5
  Weighted:
  - F1: 0.4947955715164867
  - Prec: 0.5537045694622348
  - Rec 0.44920702957565367
  Unweighted:
  - F1: [0.50312809 0.51784329 0.65154639 0.62031107 0.41166937 0.36140135]
  - Prec: [0.56259205 0.56281619 0.632      0.68902439 0.49706458 0.45230769]
  - Rec [0.45503276 0.47952586 0.67234043 0.5640599  0.35131397 0.30092119]


**Best model:** arow with variance=0.5

<a id="B"></a>
## Appendix B: Linguistic Model Optimization

#### Optimization

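Find the highest-performing model (by weighted F1 on the dev set) by trying different training algorithms and parameters. Each cell prints the algorithm followed by its c1, c2, pa_type and variance settings; None means the parameter was not set, so the crfsuite default applies. The algorithms available in sklearn_crfsuite are: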
 * 'lbfgs' - Gradient descent using the L-BFGS method
 * 'l2sgd' - Stochastic Gradient Descent with L2 regularization term
 * 'ap' - Averaged Perceptron
 * 'pa' - Passive Aggressive (PA)
 * 'arow' - Adaptive Regularization Of Weight Vector (AROW)
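The "Weighted" and "Unweighted" blocks printed in each cell come from the `sklearn_crfsuite.metrics.flat_*` functions, which flatten the per-sentence label lists into one token-level list and pass it to the corresponding scikit-learn metric. The pure-Python sketch below re-implements that computation for illustration only; `flat_scores` is a hypothetical name, and it assumes `zero_division=0` as in the cells. Its per-label scores correspond to the "Unweighted" arrays, and its weighted scores average them by each label's support (the number of true tokens with that label).

```python
from collections import Counter


def flat_scores(y_true, y_pred, labels):
    """Per-label and support-weighted precision, recall and F1 over flattened sequences.

    Illustrative re-implementation assuming zero_division=0; not the library code.
    """
    # Flatten the list of sentences into one list of token labels
    true_flat = [tag for sent in y_true for tag in sent]
    pred_flat = [tag for sent in y_pred for tag in sent]

    tp = Counter(t for t, p in zip(true_flat, pred_flat) if t == p)
    n_pred = Counter(pred_flat)
    n_true = Counter(true_flat)  # support of each label

    per_label = {}
    for lab in labels:
        prec = tp[lab] / n_pred[lab] if n_pred[lab] else 0.0
        rec = tp[lab] / n_true[lab] if n_true[lab] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_label[lab] = (prec, rec, f1)

    # Weight each label's (prec, rec, f1) by its share of the total support
    total = sum(n_true[lab] for lab in labels)
    weighted = [0.0, 0.0, 0.0]
    if total:
        for lab in labels:
            for i in range(3):
                weighted[i] += per_label[lab][i] * n_true[lab] / total
    return per_label, tuple(weighted)


# Toy example with Linguistic tags
y_true = [["B-Gendered-Role", "I-Gendered-Role", "O"], ["O", "B-Gendered-Pronoun"]]
y_pred = [["B-Gendered-Role", "O", "O"], ["O", "B-Gendered-Pronoun"]]
per_label, weighted = flat_scores(
    y_true, y_pred, ["B-Gendered-Role", "I-Gendered-Role", "B-Gendered-Pronoun"]
)
print(per_label)
print(weighted)
```

Because the weighted F1 averages per-label F1 scores, it is generally not the harmonic mean of the weighted precision and recall: for pa with pa_type=0 below, the harmonic mean of 0.768 and 0.655 is about 0.707, while the reported weighted F1 is 0.661.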

In [100]:
algorithms = ['lbfgs', 'l2sgd', 'ap', 'pa', 'arow']
max_iters = 50

In [29]:
# df_train["tag_linguistic"].unique()
targets = [
        'B-Gendered-Pronoun', 'I-Gendered-Pronoun', 'B-Generalization',
        'I-Generalization', 'B-Gendered-Role', 'I-Gendered-Role'
]
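# A scorer must return one number: average=None would return a per-label array,
# and the string 'None' is not a valid average option, so use the weighted average
f1_scorer = make_scorer(
    metrics.flat_f1_score, average='weighted', zero_division=0,
    labels=targets
)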
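The `targets` list above mirrors the commented-out `df_train["tag_linguistic"].unique()`. A hedged sketch of deriving such a list straight from the label sequences, and of counting each label's support (the quantity the "Weighted" scores are weighted by), is below. It assumes the outside tag is `"O"` and that tags use the B-/I- prefixes seen in `targets`; `label_support` is a hypothetical helper, and on this notebook's data it would be called as `label_support(y_train)`.

```python
from collections import Counter


def label_support(label_sequences, outside="O"):
    """Count every non-outside tag across a list of label sequences.

    Assumes BIO-style tags such as "B-Gendered-Role" and an outside tag "O".
    """
    counts = Counter(tag for seq in label_sequences for tag in seq if tag != outside)
    # Sort by entity type, then prefix, so each B-/I- pair sits together
    ordered = sorted(counts, key=lambda tag: (tag[2:], tag[:2]))
    return ordered, counts


y_example = [
    ["B-Generalization", "I-Generalization", "O"],
    ["B-Gendered-Pronoun", "O"],
    ["B-Gendered-Pronoun"],
]
targets_example, support = label_support(y_example)
print(targets_example)
print(support)
```

Labels with very small support (as the I- tags often are) contribute little to the weighted scores, which is why their per-label values are worth reading separately.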

In [30]:
# Every model is capped at max_iters; the comments give each algorithm's crfsuite default
crf0A = sklearn_crfsuite.CRF(algorithm=algorithms[0], c1=0, max_iterations=max_iters, all_possible_transitions=True) # lbfgs default max_iterations: unlimited
crf0B = sklearn_crfsuite.CRF(algorithm=algorithms[0], c1=0.1, max_iterations=max_iters, all_possible_transitions=True)
crf0C = sklearn_crfsuite.CRF(algorithm=algorithms[0], c1=0.1, c2=0.2, max_iterations=max_iters, all_possible_transitions=True)

crf1A = sklearn_crfsuite.CRF(algorithm=algorithms[1], c2=1.0, max_iterations=max_iters, all_possible_transitions=True) # l2sgd default max_iterations: 1000
crf1B = sklearn_crfsuite.CRF(algorithm=algorithms[1], c2=0.2, max_iterations=max_iters, all_possible_transitions=True)

crf2 = sklearn_crfsuite.CRF(algorithm=algorithms[2], max_iterations=max_iters, all_possible_transitions=True) # ap default max_iterations: 100

crf3A = sklearn_crfsuite.CRF(algorithm=algorithms[3], pa_type=0, max_iterations=max_iters, all_possible_transitions=True) # pa default max_iterations: 100
crf3B = sklearn_crfsuite.CRF(algorithm=algorithms[3], pa_type=1, max_iterations=max_iters, all_possible_transitions=True)
crf3C = sklearn_crfsuite.CRF(algorithm=algorithms[3], pa_type=2, max_iterations=max_iters, all_possible_transitions=True)

crf4A = sklearn_crfsuite.CRF(algorithm=algorithms[4], variance=1, max_iterations=max_iters, all_possible_transitions=True) # arow default max_iterations: 100
crf4B = sklearn_crfsuite.CRF(algorithm=algorithms[4], variance=0.5, max_iterations=max_iters, all_possible_transitions=True)

In [31]:
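try:
    crf0A.fit(X_train, y_train)
except AttributeError:
    # sklearn_crfsuite can raise AttributeError from fit() with some newer scikit-learn
    # versions; it is ignored here, and the predictions below show the model was trained
    pass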

# Predict
y_pred = crf0A.predict(X_dev)

# Evaluate
print(crf0A.algorithm, crf0A.c1, crf0A.c2, crf0A.pa_type, crf0A.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

lbfgs 0 None None None
  Weighted:
  - F1: 0.5220463286505957
  - Prec: 0.6424500624544645
  - Rec 0.4922135706340378
  Unweighted:
  - F1: [0.85307443 0.         0.00892857 0.         0.48390942 0.29530201]
  - Prec: [0.81559406 0.         0.25       0.         0.74087591 0.6875    ]
  - Rec [0.89416554 0.         0.00454545 0.         0.35929204 0.18803419]


In [32]:
try:
    crf0B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf0B.predict(X_dev)

# Evaluate
print(crf0B.algorithm, crf0B.c1, crf0B.c2, crf0B.pa_type, crf0B.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

lbfgs 0.1 None None None
  Weighted:
  - F1: 0.6083674858352446
  - Prec: 0.7689434393136277
  - Rec 0.60734149054505
  Unweighted:
  - F1: [0.87484511 0.         0.05286344 0.05298013 0.67586207 0.40993789]
  - Prec: [0.8050171  0.         0.85714286 0.57142857 0.76222222 0.75      ]
  - Rec [0.95793758 0.         0.02727273 0.02777778 0.60707965 0.28205128]


In [33]:
try:
    crf0C.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf0C.predict(X_dev)

# Evaluate
print(crf0C.algorithm, crf0C.c1, crf0C.c2, crf0C.pa_type, crf0C.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

lbfgs 0.1 0.2 None None
  Weighted:
  - F1: 0.6427339974408373
  - Prec: 0.7651587413557269
  - Rec 0.6218020022246941
  Unweighted:
  - F1: [0.85677912 0.         0.28679245 0.15662651 0.6903164  0.41463415]
  - Prec: [0.80695444 0.         0.84444444 0.59090909 0.75313808 0.72340426]
  - Rec [0.91316147 0.         0.17272727 0.09027778 0.63716814 0.29059829]


In [34]:
try:
    crf1A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf1A.predict(X_dev)

# Evaluate
print(crf1A.algorithm, crf1A.c1, crf1A.c2, crf1A.pa_type, crf1A.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

l2sgd None 1.0 None None
  Weighted:
  - F1: 0.5811224023583894
  - Prec: 0.6272431558647711
  - Rec 0.5817575083426029
  Unweighted:
  - F1: [0.88389058 0.         0.         0.         0.62406816 0.34899329]
  - Prec: [0.80066079 0.         0.         0.         0.78342246 0.8125    ]
  - Rec [0.98643148 0.         0.         0.         0.51858407 0.22222222]


In [35]:
try:
    crf1B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf1B.predict(X_dev)

# Evaluate
print(crf1B.algorithm, crf1B.c1, crf1B.c2, crf1B.pa_type, crf1B.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

l2sgd None 0.2 None None
  Weighted:
  - F1: 0.517290080102963
  - Prec: 0.7148552332736035
  - Rec 0.514460511679644
  Unweighted:
  - F1: [0.88242424 0.         0.00904977 0.         0.41344956 0.37735849]
  - Prec: [0.7973713  0.         1.         0.         0.69747899 0.71428571]
  - Rec [0.98778833 0.         0.00454545 0.         0.29380531 0.25641026]


In [36]:
try:
    crf2.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf2.predict(X_dev)

# Evaluate
print(crf2.algorithm, crf2.c1, crf2.c2, crf2.pa_type, crf2.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

ap None None None None
  Weighted:
  - F1: 0.6014970716276341
  - Prec: 0.8224996619695852
  - Rec 0.5912124582869855
  Unweighted:
  - F1: [0.87876923 0.64       0.07017544 0.10457516 0.63731656 0.28767123]
  - Prec: [0.80405405 0.8        1.         0.88888889 0.781491   0.72413793]
  - Rec [0.9687924  0.53333333 0.03636364 0.05555556 0.5380531  0.17948718]


In [31]:
try:
    crf3A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf3A.predict(X_dev)

# Evaluate
print(crf3A.algorithm, crf3A.c1, crf3A.c2, crf3A.pa_type, crf3A.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

pa None None 0 None
  Weighted:
  - F1: 0.6608095292949409
  - Prec: 0.7676344251172861
  - Rec 0.6551724137931034
  Unweighted:
  - F1: [0.88456865 0.69230769 0.29104478 0.23255814 0.67378641 0.40697674]
  - Prec: [0.80088009 0.81818182 0.8125     0.71428571 0.74623656 0.63636364]
  - Rec [0.98778833 0.6        0.17727273 0.13888889 0.61415929 0.2991453 ]


In [32]:
try:
    crf3B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf3B.predict(X_dev)

# Evaluate
print(crf3B.algorithm, crf3B.c1, crf3B.c2, crf3B.pa_type, crf3B.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

pa None None 1 None
  Weighted:
  - F1: 0.6599815730795738
  - Prec: 0.762389078290318
  - Rec 0.6490545050055617
  Unweighted:
  - F1: [0.87945879 0.69230769 0.29927007 0.24277457 0.67249757 0.40462428]
  - Prec: [0.80427447 0.81818182 0.75925926 0.72413793 0.74568966 0.625     ]
  - Rec [0.97014925 0.6        0.18636364 0.14583333 0.61238938 0.2991453 ]


In [33]:
try:
    crf3C.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf3C.predict(X_dev)

# Evaluate
print(crf3C.algorithm, crf3C.c1, crf3C.c2, crf3C.pa_type, crf3C.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

pa None None 2 None
  Weighted:
  - F1: 0.6597046279568559
  - Prec: 0.7595230166063811
  - Rec 0.6490545050055617
  Unweighted:
  - F1: [0.87730061 0.69230769 0.30434783 0.25142857 0.67120623 0.4       ]
  - Prec: [0.80067189 0.81818182 0.75       0.70967742 0.74514039 0.64150943]
  - Rec [0.97014925 0.6        0.19090909 0.15277778 0.61061947 0.29059829]


In [42]:
try:
    crf4A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf4A.predict(X_dev)

# Evaluate
print(crf4A.algorithm, crf4A.c1, crf4A.c2, crf4A.pa_type, crf4A.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

arow None None None 1
  Weighted:
  - F1: 0.6586210838588668
  - Prec: 0.6726122299605511
  - Rec 0.664071190211346
  Unweighted:
  - F1: [0.87015385 0.54545455 0.28490028 0.27237354 0.66475645 0.48913043]
  - Prec: [0.79617117 0.5        0.38167939 0.30973451 0.7219917  0.67164179]
  - Rec [0.95929444 0.6        0.22727273 0.24305556 0.6159292  0.38461538]


In [43]:
try:
    crf4B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf4B.predict(X_dev)

# Evaluate
print(crf4B.algorithm, crf4B.c1, crf4B.c2, crf4B.pa_type, crf4B.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

arow None None None 0.5
  Weighted:
  - F1: 0.6488288150410274
  - Prec: 0.6222010099532319
  - Rec 0.6846496106785317
  Unweighted:
  - F1: [0.86294416 0.69230769 0.29045643 0.21761658 0.67236955 0.38541667]
  - Prec: [0.81048868 0.81818182 0.26717557 0.17355372 0.65066225 0.49333333]
  - Rec [0.92265943 0.6        0.31818182 0.29166667 0.69557522 0.31623932]


**Best model:** pa with pa_type=0 (weighted F1 0.661). arow with variance=0.5 also performs well (weighted F1 0.649) and is the best model for the other two categories, Person Name and Contextual.
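To make the comparison behind this choice easier to scan, the sketch below collects the weighted F1 scores printed above for the Linguistic runs (values copied from the outputs and rounded to three decimals) and ranks them. It only tabulates the reported numbers; it does not re-run any model, and the dictionary name is illustrative.

```python
# Weighted F1 on the dev set for each Linguistic configuration, copied from the outputs above
linguistic_weighted_f1 = {
    "lbfgs (c1=0)": 0.522,
    "lbfgs (c1=0.1)": 0.608,
    "lbfgs (c1=0.1, c2=0.2)": 0.643,
    "l2sgd (c2=1.0)": 0.581,
    "l2sgd (c2=0.2)": 0.517,
    "ap": 0.601,
    "pa (pa_type=0)": 0.661,
    "pa (pa_type=1)": 0.660,
    "pa (pa_type=2)": 0.660,
    "arow (variance=1)": 0.659,
    "arow (variance=0.5)": 0.649,
}

# Rank configurations from best to worst (ties keep their listed order)
ranking = sorted(linguistic_weighted_f1.items(), key=lambda kv: kv[1], reverse=True)
for name, f1 in ranking:
    print(f"{name:<24} {f1:.3f}")
```

The top four configurations (the three PA variants and AROW with variance=1) lie within about 0.002 of each other, so on this dev set the choice among them is close.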

<a id="C"></a>
## Appendix C: Contextual Model Optimization

#### Optimization

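Find the highest-performing model (by weighted F1 on the dev set) by trying different training algorithms and parameters. Each cell prints the algorithm followed by its c1, c2, pa_type and variance settings; None means the parameter was not set, so the crfsuite default applies. The algorithms available in sklearn_crfsuite are: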
 * 'lbfgs' - Gradient descent using the L-BFGS method
 * 'l2sgd' - Stochastic Gradient Descent with L2 regularization term
 * 'ap' - Averaged Perceptron
 * 'pa' - Passive Aggressive (PA)
 * 'arow' - Adaptive Regularization Of Weight Vector (AROW)
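Each algorithm takes its own hyperparameters, which is why the cells below vary c1 and c2 for lbfgs, c2 for l2sgd, pa_type for pa and variance for arow. A hedged sketch of the same eleven configurations as plain dictionaries, with a check that each tuned parameter belongs to its algorithm, is below. The `ALLOWED` mapping lists only the parameters varied in this notebook, not every crfsuite option, and `CONFIGS`/`check_config` are hypothetical names. Each dict could then be expanded as `sklearn_crfsuite.CRF(**cfg, max_iterations=max_iters, all_possible_transitions=True)`.

```python
# Hyperparameters varied per algorithm in this notebook (not the full crfsuite list)
ALLOWED = {
    "lbfgs": {"c1", "c2"},
    "l2sgd": {"c2"},
    "ap": set(),
    "pa": {"pa_type"},
    "arow": {"variance"},
}

CONFIGS = [
    {"algorithm": "lbfgs", "c1": 0},
    {"algorithm": "lbfgs", "c1": 0.1},
    {"algorithm": "lbfgs", "c1": 0.1, "c2": 0.2},
    {"algorithm": "l2sgd", "c2": 1.0},
    {"algorithm": "l2sgd", "c2": 0.2},
    {"algorithm": "ap"},
    {"algorithm": "pa", "pa_type": 0},
    {"algorithm": "pa", "pa_type": 1},
    {"algorithm": "pa", "pa_type": 2},
    {"algorithm": "arow", "variance": 1},
    {"algorithm": "arow", "variance": 0.5},
]


def check_config(cfg):
    """Raise ValueError if cfg sets a parameter not tuned for its algorithm."""
    extra = set(cfg) - {"algorithm"} - ALLOWED[cfg["algorithm"]]
    if extra:
        raise ValueError(f"{cfg['algorithm']} does not take {sorted(extra)}")
    return cfg


# Validate the whole grid up front, before any model is trained
for cfg in CONFIGS:
    check_config(cfg)
```

Keeping the grid as data makes it harder to, for example, set c1 on an l2sgd model by accident.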

In [78]:
algorithms = ['lbfgs', 'l2sgd', 'ap', 'pa', 'arow']
max_iters = 50

In [79]:
# df_train["tag_contextual"].unique()
targets = [
        'B-Occupation', 'I-Occupation', 'B-Stereotype',
        'I-Stereotype', 'B-Omission', 'I-Omission'
]
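# A scorer must return one number: average=None would return a per-label array,
# and the string 'None' is not a valid average option, so use the weighted average
f1_scorer = make_scorer(
    metrics.flat_f1_score, average='weighted', zero_division=0,
    labels=targets
)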

In [80]:
# Every model is capped at max_iters; the comments give each algorithm's crfsuite default
crf0A = sklearn_crfsuite.CRF(algorithm=algorithms[0], c1=0, max_iterations=max_iters, all_possible_transitions=True) # lbfgs default max_iterations: unlimited
crf0B = sklearn_crfsuite.CRF(algorithm=algorithms[0], c1=0.1, max_iterations=max_iters, all_possible_transitions=True)
crf0C = sklearn_crfsuite.CRF(algorithm=algorithms[0], c1=0.1, c2=0.2, max_iterations=max_iters, all_possible_transitions=True)

crf1A = sklearn_crfsuite.CRF(algorithm=algorithms[1], c2=1.0, max_iterations=max_iters, all_possible_transitions=True) # l2sgd default max_iterations: 1000
crf1B = sklearn_crfsuite.CRF(algorithm=algorithms[1], c2=0.2, max_iterations=max_iters, all_possible_transitions=True)

crf2 = sklearn_crfsuite.CRF(algorithm=algorithms[2], max_iterations=max_iters, all_possible_transitions=True) # ap default max_iterations: 100

crf3A = sklearn_crfsuite.CRF(algorithm=algorithms[3], pa_type=0, max_iterations=max_iters, all_possible_transitions=True) # pa default max_iterations: 100
crf3B = sklearn_crfsuite.CRF(algorithm=algorithms[3], pa_type=1, max_iterations=max_iters, all_possible_transitions=True)
crf3C = sklearn_crfsuite.CRF(algorithm=algorithms[3], pa_type=2, max_iterations=max_iters, all_possible_transitions=True)

crf4A = sklearn_crfsuite.CRF(algorithm=algorithms[4], variance=1, max_iterations=max_iters, all_possible_transitions=True) # arow default max_iterations: 100
crf4B = sklearn_crfsuite.CRF(algorithm=algorithms[4], variance=0.5, max_iterations=max_iters, all_possible_transitions=True)
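The cells that follow repeat the same fit, predict and evaluate code for each model. A hedged sketch of a helper that runs that loop once is below. It assumes each model exposes `fit`/`predict` like `sklearn_crfsuite.CRF` and takes the scoring function as an argument; `evaluate_models` is a hypothetical name, and the `try/except AttributeError` only mirrors what the notebook's own cells do.

```python
def evaluate_models(models, X_train, y_train, X_dev, y_dev, score_fn):
    """Fit each model, predict on the dev set and return {name: score}, best first.

    `models` maps a display name to an unfitted estimator with fit/predict methods.
    """
    scores = {}
    for name, model in models.items():
        try:
            model.fit(X_train, y_train)
        except AttributeError:
            # Mirrors the notebook cells, which ignore this error from fit()
            pass
        y_pred = model.predict(X_dev)
        scores[name] = score_fn(y_dev, y_pred)
    # Highest score first
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Possible use with this notebook's objects (not executed here):
# from functools import partial
# weighted_f1 = partial(metrics.flat_f1_score, average="weighted",
#                       zero_division=0, labels=targets)
# results = evaluate_models({"crf3A": crf3A, "crf4B": crf4B},
#                           X_train, y_train, X_dev, y_dev, weighted_f1)
```

Passing the score function in keeps the helper independent of which metric is used to pick the best model.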

In [81]:
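try:
    crf0A.fit(X_train, y_train)
except AttributeError:
    # sklearn_crfsuite can raise AttributeError from fit() with some newer scikit-learn
    # versions; it is ignored here, and the predictions below show the model was trained
    pass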

# Predict
y_pred = crf0A.predict(X_dev)

# Evaluate
print(crf0A.algorithm, crf0A.c1, crf0A.c2, crf0A.pa_type, crf0A.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

lbfgs 0 None None None
  Weighted:
  - F1: 0.07526574269976748
  - Prec: 0.3419825203723303
  - Rec 0.04234527687296417
  Unweighted:
  - F1: [0.         0.         0.         0.         0.18710263 0.11749681]
  - Prec: [0.         0.         0.         0.         0.76865672 0.58974359]
  - Rec [0.         0.         0.         0.         0.10651499 0.06524823]


In [82]:
try:
    crf0B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf0B.predict(X_dev)

# Evaluate
print(crf0B.algorithm, crf0B.c1, crf0B.c2, crf0B.pa_type, crf0B.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

lbfgs 0.1 None None None
  Weighted:
  - F1: 0.27203523227916193
  - Prec: 0.6341356146479743
  - Rec 0.1780673181324647
  Unweighted:
  - F1: [0.27612903 0.25559105 0.11299435 0.16603774 0.48888889 0.1993205 ]
  - Prec: [0.75352113 0.65934066 0.58823529 0.56410256 0.79672897 0.49438202]
  - Rec [0.16903633 0.15852048 0.0625     0.09734513 0.35263702 0.1248227 ]


In [83]:
try:
    crf0C.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf0C.predict(X_dev)

# Evaluate
print(crf0C.algorithm, crf0C.c1, crf0C.c2, crf0C.pa_type, crf0C.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

lbfgs 0.1 0.2 None None
  Weighted:
  - F1: 0.36974453222807907
  - Prec: 0.6650245193879263
  - Rec 0.2625407166123778
  Unweighted:
  - F1: [0.47764449 0.42293907 0.16304348 0.22738386 0.53013699 0.27465536]
  - Prec: [0.77112676 0.65738162 0.625      0.66428571 0.78498986 0.54411765]
  - Rec [0.34597156 0.31175694 0.09375    0.13716814 0.40020683 0.18368794]


In [84]:
try:
    crf1A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf1A.predict(X_dev)

# Evaluate
print(crf1A.algorithm, crf1A.c1, crf1A.c2, crf1A.pa_type, crf1A.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

l2sgd None 1.0 None None
  Weighted:
  - F1: 0.14238250038533273
  - Prec: 0.6949935082632666
  - Rec 0.08534201954397394
  Unweighted:
  - F1: [0.01253918 0.01308901 0.02325581 0.05890603 0.33637117 0.1907061 ]
  - Prec: [0.8        0.71428571 0.16666667 0.6        0.84583333 0.63967611]
  - Rec [0.00631912 0.00660502 0.0125     0.03097345 0.20992761 0.11205674]


In [85]:
try:
    crf1B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf1B.predict(X_dev)

# Evaluate
print(crf1B.algorithm, crf1B.c1, crf1B.c2, crf1B.pa_type, crf1B.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

l2sgd None 0.2 None None
  Weighted:
  - F1: 0.23290156335890247
  - Prec: 0.587957004746522
  - Rec 0.16547231270358306
  Unweighted:
  - F1: [0.03703704 0.06532663 0.17894737 0.16252822 0.51733333 0.25569358]
  - Prec: [0.8        0.66666667 0.56666667 0.34615385 0.72795497 0.47318008]
  - Rec [0.01895735 0.0343461  0.10625    0.10619469 0.40124095 0.1751773 ]


In [86]:
try:
    crf2.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf2.predict(X_dev)

# Evaluate
print(crf2.algorithm, crf2.c1, crf2.c2, crf2.pa_type, crf2.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

ap None None None None
  Weighted:
  - F1: 0.22686387330210564
  - Prec: 0.8443018920394695
  - Rec 0.14505971769815418
  Unweighted:
  - F1: [0.26168224 0.20303384 0.02453988 0.03478261 0.49893086 0.15275995]
  - Prec: [0.84482759 0.87       0.66666667 1.         0.80275229 0.80405405]
  - Rec [0.15481833 0.11492734 0.0125     0.01769912 0.36194416 0.08439716]


In [87]:
try:
    crf3A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf3A.predict(X_dev)

# Evaluate
print(crf3A.algorithm, crf3A.c1, crf3A.c2, crf3A.pa_type, crf3A.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Unweighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

pa None None 0 None
  Weighted:
  - F1: 0.306310049934223
  - Prec: 0.80194162836387
  - Rec 0.20781758957654722
  Unweighted:
  - F1: [0.45810056 0.32640333 0.08284024 0.04329004 0.5375603  0.22061483]
  - Prec: [0.78244275 0.76585366 0.77777778 1.         0.80578512 0.73493976]
  - Rec [0.32385466 0.20739762 0.04375    0.02212389 0.4033092  0.12978723]


In [88]:
try:
    crf3B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf3B.predict(X_dev)

# Evaluate
print(crf3B.algorithm, crf3B.c1, crf3B.c2, crf3B.pa_type, crf3B.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

pa None None 1 None
  Weighted:
  - F1: 0.30799201369715534
  - Prec: 0.8058167663636654
  - Rec 0.20868621064060802
  Per Label:
  - F1: [0.45240761 0.32432432 0.08284024 0.04892086 0.5399449  0.22543701]
  - Prec: [0.77692308 0.76097561 0.77777778 1.         0.80824742 0.75100402]
  - Rec [0.31911532 0.20607662 0.04375    0.02507375 0.40537746 0.13262411]


In [89]:
try:
    crf3C.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf3C.predict(X_dev)

# Evaluate
print(crf3C.algorithm, crf3C.c1, crf3C.c2, crf3C.pa_type, crf3C.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

pa None None 2 None
  Weighted:
  - F1: 0.30804522569488546
  - Prec: 0.809491042440604
  - Rec 0.20912052117263843
  Per Label:
  - F1: [0.46784922 0.32398754 0.08333333 0.04610951 0.53830228 0.22128174]
  - Prec: [0.78438662 0.75728155 0.875      1.         0.80912863 0.75      ]
  - Rec [0.33333333 0.20607662 0.04375    0.02359882 0.4033092  0.12978723]


In [90]:
try:
    crf4A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf4A.predict(X_dev)

# Evaluate
print(crf4A.algorithm, crf4A.c1, crf4A.c2, crf4A.pa_type, crf4A.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

arow None None None 1
  Weighted:
  - F1: 0.36669929570944
  - Prec: 0.40929601777426833
  - Rec 0.3355048859934853
  Per Label:
  - F1: [0.6036036  0.5095057  0.24512535 0.24287653 0.37733645 0.24971537]
  - Prec: [0.70230608 0.60035842 0.22110553 0.22487437 0.43355705 0.26857143]
  - Rec [0.52922591 0.44253633 0.275      0.2640118  0.33402275 0.23333333]


In [91]:
try:
    crf4B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf4B.predict(X_dev)

# Evaluate
print(crf4B.algorithm, crf4B.c1, crf4B.c2, crf4B.pa_type, crf4B.variance)
print("  Weighted:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="weighted", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

arow None None None 0.5
  Weighted:
  - F1: 0.3925447072732632
  - Prec: 0.43471962523819374
  - Rec 0.36503800217155263
  Per Label:
  - F1: [0.57769653 0.50433526 0.21782178 0.22643746 0.42546064 0.32653061]
  - Prec: [0.68546638 0.55661882 0.18032787 0.18929633 0.46237864 0.38461538]
  - Rec [0.49921011 0.46103038 0.275      0.28171091 0.39400207 0.28368794]


**Best model:** AROW with variance=0.5 (weighted F1 of 0.393 on the dev data)

<a id="A"></a>
## Appendix D: Person Name + Occupation Model Optimization

#### Optimization

To find the models with the highest F1 scores, try different training algorithms and parameters. The algorithms available in `sklearn_crfsuite` are:
 * 'lbfgs' - Gradient descent using the L-BFGS method
 * 'l2sgd' - Stochastic Gradient Descent with L2 regularization term
 * 'ap' - Averaged Perceptron
 * 'pa' - Passive Aggressive (PA)
 * 'arow' - Adaptive Regularization Of Weight Vector (AROW)
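
The configurations tried in the Training section below can be summarized as a small parameter grid. The sketch below assumes each dict would be unpacked as keyword arguments into `sklearn_crfsuite.CRF`; the helper name `build_param_grid` is hypothetical and is not part of the notebook's `utils`.

```python
# Hypothetical helper: enumerate the CRF configurations explored in this appendix.
# Each dict could be unpacked into sklearn_crfsuite.CRF(**params).
def build_param_grid(max_iters=50):
    # Settings shared by every configuration
    common = {"max_iterations": max_iters, "all_possible_transitions": True}
    # Algorithm-specific hyperparameters, mirroring crf0A ... crf4B
    variants = [
        ("lbfgs", {"c1": 0}),
        ("lbfgs", {"c1": 0.1}),
        ("lbfgs", {"c1": 0.1, "c2": 0.2}),
        ("l2sgd", {"c2": 1.0}),
        ("l2sgd", {"c2": 0.2}),
        ("ap", {}),
        ("pa", {"pa_type": 0}),
        ("pa", {"pa_type": 1}),
        ("pa", {"pa_type": 2}),
        ("arow", {"variance": 1}),
        ("arow", {"variance": 0.5}),
    ]
    return [{"algorithm": algo, **params, **common} for algo, params in variants]

grid = build_param_grid()
for params in grid:
    print(params)
```

With `sklearn_crfsuite` installed, `[sklearn_crfsuite.CRF(**p) for p in grid]` would yield the same eleven models that are defined one by one in the Training section.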

#### Preprocessing

In [41]:
df_train = df_train_perso
df_dev = df_dev_perso

Group the data by token, so that all the tags for one token are recorded in a list in that token's row:

In [42]:
df_train_token_groups = utils.implodeDataFrame(df_train, ['token_id', 'sentence_id', 'pos', 'token'])
df_dev_token_groups = utils.implodeDataFrame(df_dev, ['token_id', 'sentence_id', 'pos', 'token'])
df_train_token_groups = df_train_token_groups.reset_index()
df_dev_token_groups = df_dev_token_groups.reset_index()

Group the data by sentence, where each sentence is a list of tokens:

In [43]:
df_train_grouped = utils.implodeDataFrame(df_train_token_groups, ['sentence_id'])
df_dev_grouped = utils.implodeDataFrame(df_dev_token_groups, ['sentence_id'])
df_train_grouped = df_train_grouped.rename(columns={"token":"sentence"})
df_dev_grouped = df_dev_grouped.rename(columns={"token":"sentence"})
# df_dev_grouped.head()
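
`utils.implodeDataFrame` is a custom helper whose code is not shown in this section. The sketch below shows what the two grouping passes are assumed to do, using a pandas `groupby(...).agg(list)` idiom on toy rows; `implode_dataframe` is a hypothetical name, and folding `reset_index()` into it is a simplification of the separate calls above.

```python
import pandas as pd

# Hypothetical stand-in for utils.implodeDataFrame: group rows on `group_cols`
# and collect every other column's values into a list per group, keeping the
# groups in order of first appearance.
def implode_dataframe(df, group_cols):
    return df.groupby(group_cols, sort=False).agg(list).reset_index()

# Toy rows: "Smith" received two tags, so it appears on two rows
df = pd.DataFrame({
    "token_id": [0, 0, 1, 2, 3],
    "sentence_id": [0, 0, 0, 0, 1],
    "pos": ["NNP", "NNP", "VBD", "NNS", "NN"],
    "token": ["Smith", "Smith", "wrote", "letters", "Correspondence"],
    "tag_personname": ["B-Unknown", "B-Masculine", "O", "O", "O"],
})

# Pass 1: one row per token, with all of its tags in a list
by_token = implode_dataframe(df, ["token_id", "sentence_id", "pos", "token"])
# Pass 2: one row per sentence, with tokens, POS tags and tag lists as lists
by_sentence = implode_dataframe(by_token, ["sentence_id"])
by_sentence = by_sentence.rename(columns={"token": "sentence"})
print(by_sentence[["sentence_id", "sentence", "tag_personname"]])
```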

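Check the distribution of tags in the dev data (`tag_personname` covers the Person Name labels plus Occupation), then zip the POS and category tags together with the tokens so each sentence item is a tuple: `(TOKEN, POS-TAG, TAG_LIST)`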

In [44]:
df_dev_perso.tag_personname.value_counts()

O               144035
I-Unknown         3062
B-Unknown         1886
I-Masculine       1240
B-Masculine        968
I-Occupation       781
I-Feminine         675
B-Occupation       654
B-Feminine         281
Name: tag_personname, dtype: int64
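
The counts are heavily skewed toward `O`, which is presumably why the scores below are computed with `labels=targets`, a list that leaves `O` out. A quick calculation from the counts above (these are tag rows, counted before the tags are grouped per token):

```python
# Dev-set tag counts copied from the value_counts() output above
counts = {
    "O": 144035, "I-Unknown": 3062, "B-Unknown": 1886,
    "I-Masculine": 1240, "B-Masculine": 968, "I-Occupation": 781,
    "I-Feminine": 675, "B-Occupation": 654, "B-Feminine": 281,
}
total = sum(counts.values())
labeled = total - counts["O"]  # rows carrying any entity tag
print(f"{total} tag rows, {labeled} non-O ({labeled / total:.1%})")
```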

In [46]:
df_train_grouped = df_train_grouped.reset_index()
df_dev_grouped = df_dev_grouped.reset_index()
train_sentences = utils.zipFeaturesAndTarget(df_train_grouped, "tag_personname")
print(train_sentences[0][:3])
dev_sentences = utils.zipFeaturesAndTarget(df_dev_grouped, "tag_personname")
print(dev_sentences[0][:3])

[('Title', 'NN', ['O']), (':', ':', ['O']), ('Papers', 'NNS', ['O'])]
[('After', 'IN', ['O']), ('his', 'PRP$', ['O']), ('ordination', 'NN', ['O'])]
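
`utils.zipFeaturesAndTarget` is likewise a custom helper that is not shown here. A minimal sketch of what it is assumed to do, with plain dicts standing in for DataFrame rows; the name `zip_features_and_target` is hypothetical:

```python
# Hypothetical stand-in for utils.zipFeaturesAndTarget: pair each token with its
# POS tag and its list of target tags, giving one list of tuples per sentence.
def zip_features_and_target(rows, target_col):
    return [
        list(zip(row["sentence"], row["pos"], row[target_col]))
        for row in rows
    ]

rows = [{
    "sentence": ["Title", ":", "Papers"],
    "pos": ["NN", ":", "NNS"],
    "tag_personname": [["O"], ["O"], ["O"]],
}]
sentences = zip_features_and_target(rows, "tag_personname")
print(sentences[0])
```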


Extract the features and targets:

In [47]:
# Features
X_train = [extractSentenceFeatures(sentence) for sentence in train_sentences]
X_dev = [extractSentenceFeatures(sentence) for sentence in dev_sentences]
# Target
y_train = [extractSentenceTargets(sentence) for sentence in train_sentences]
y_dev = [extractSentenceTargets(sentence) for sentence in dev_sentences]
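
`extractSentenceFeatures` and `extractSentenceTargets` are defined earlier in the notebook and are not reproduced here. The sketch below shows one common way to build per-token feature dicts for `sklearn_crfsuite`, in the style of its tutorial. The feature names, the toy embedding dict (standing in for fastText vectors), and the rule of keeping only the first tag per token are all assumptions, not the notebook's actual choices; a linear-chain CRF does need exactly one label per token.

```python
# Hypothetical per-token feature extractor: each token becomes a dict of
# feature name -> value, which sklearn_crfsuite accepts as input.
def token_features(sentence, i, embeddings, dim=3):
    word, pos, _tags = sentence[i]
    feats = {
        "bias": 1.0,
        "word.lower()": word.lower(),
        "word.istitle()": word.istitle(),
        "pos": pos,
    }
    # Toy embedding lookup standing in for a fastText vector (assumption)
    vector = embeddings.get(word.lower(), [0.0] * dim)
    for d, value in enumerate(vector):
        feats[f"emb_{d}"] = value
    # Sentence-boundary flags and neighbouring POS tags as context features
    feats["BOS"] = i == 0
    feats["EOS"] = i == len(sentence) - 1
    if i > 0:
        feats["-1:pos"] = sentence[i - 1][1]
    if i < len(sentence) - 1:
        feats["+1:pos"] = sentence[i + 1][1]
    return feats

def sentence_features(sentence, embeddings):
    return [token_features(sentence, i, embeddings) for i in range(len(sentence))]

def sentence_targets(sentence):
    # Assumption: keep the first tag when a token carries several
    return [tags[0] for _word, _pos, tags in sentence]

toy_embeddings = {"papers": [0.1, -0.2, 0.3]}
sentence = [("Title", "NN", ["O"]), (":", ":", ["O"]), ("Papers", "NNS", ["O"])]
X = sentence_features(sentence, toy_embeddings)
y = sentence_targets(sentence)
print(X[2]["emb_1"], y)
```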

#### Training

In [48]:
algorithms = ['lbfgs', 'l2sgd', 'ap', 'pa', 'arow']
max_iters=50

In [49]:
targets = perso_cat_tags
print(targets)

['B-Feminine', 'B-Masculine', 'B-Occupation', 'B-Unknown', 'I-Feminine', 'I-Masculine', 'I-Occupation', 'I-Unknown']


Note that the per-label scores are reported in the order of `targets` shown above, which is alphabetical.
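
Since `average=None` returns a bare array, it can help to pair each score with its label explicitly. A small sketch using the per-label F1 scores reported for the AROW (variance=1) model in this appendix; taking their mean also reproduces that model's reported macro F1 (0.5133):

```python
targets = ['B-Feminine', 'B-Masculine', 'B-Occupation', 'B-Unknown',
           'I-Feminine', 'I-Masculine', 'I-Occupation', 'I-Unknown']
print(targets == sorted(targets))  # True: the list is already alphabetical

# Per-label F1 of the AROW (variance=1) model, copied from its dev-set output
f1_per_label = [0.61075269, 0.49718222, 0.60491803, 0.50335121,
                0.58477157, 0.38247012, 0.42766631, 0.49552807]

# Pair each score with its label by position
for label, score in zip(targets, f1_per_label):
    print(f"{label:<13} {score:.3f}")

# Macro F1 is the unweighted mean of the per-label scores
macro_f1 = sum(f1_per_label) / len(f1_per_label)
print(round(macro_f1, 4))  # 0.5133
```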

In [50]:
# Comments give each algorithm's default max_iterations; every model here is capped at max_iters instead
crf0A = sklearn_crfsuite.CRF(algorithm=algorithms[0], c1=0, max_iterations=max_iters, all_possible_transitions=True) # lbfgs default: unlimited
crf0B = sklearn_crfsuite.CRF(algorithm=algorithms[0], c1=0.1, max_iterations=max_iters, all_possible_transitions=True)
crf0C = sklearn_crfsuite.CRF(algorithm=algorithms[0], c1=0.1, c2=0.2, max_iterations=max_iters, all_possible_transitions=True)

crf1A = sklearn_crfsuite.CRF(algorithm=algorithms[1], c2=1.0, max_iterations=max_iters, all_possible_transitions=True) # l2sgd default: 1000
crf1B = sklearn_crfsuite.CRF(algorithm=algorithms[1], c2=0.2, max_iterations=max_iters, all_possible_transitions=True)

crf2 = sklearn_crfsuite.CRF(algorithm=algorithms[2], max_iterations=max_iters, all_possible_transitions=True) # ap default: 100

crf3A = sklearn_crfsuite.CRF(algorithm=algorithms[3], pa_type=0, max_iterations=max_iters, all_possible_transitions=True) # pa default: 100
crf3B = sklearn_crfsuite.CRF(algorithm=algorithms[3], pa_type=1, max_iterations=max_iters, all_possible_transitions=True)
crf3C = sklearn_crfsuite.CRF(algorithm=algorithms[3], pa_type=2, max_iterations=max_iters, all_possible_transitions=True)

crf4A = sklearn_crfsuite.CRF(algorithm=algorithms[4], variance=1, max_iterations=max_iters, all_possible_transitions=True) # arow default: 100
crf4B = sklearn_crfsuite.CRF(algorithm=algorithms[4], variance=0.5, max_iterations=max_iters, all_possible_transitions=True)

In [51]:
try:
    crf0A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf0A.predict(X_dev)

# Evaluate
print(crf0A.algorithm, crf0A.c1, crf0A.c2, crf0A.pa_type, crf0A.variance)
print("  Macro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  Micro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

lbfgs 0 None None None
  Macro:
  - F1: 0.11806056283047392
  - Prec: 0.4043996591038267
  - Rec 0.07129545254147415
  Micro:
  - F1: 0.1202252944188428
  - Prec: 0.43643122676579926
  - Rec 0.06971496437054632
  Per Label:
  - F1: [0.19259259 0.20689655 0.00304878 0.09503916 0.11994003 0.19811321
 0.00251256 0.12634161]
  - Prec: [0.74285714 0.46601942 0.125      0.38396624 0.60606061 0.42567568
 0.05555556 0.43006263]
  - Rec [0.1106383  0.13296399 0.00154321 0.05423123 0.06655574 0.12909836
 0.00128535 0.07404745]
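
Macro averaging gives every label equal weight, while micro averaging pools true positives, false positives and false negatives across all labels before computing a single score. The pure-Python sketch below computes flat (token-level) F1 over a label subset, which is what `metrics.flat_f1_score(..., labels=targets)` is expected to report; `flat_f1` is a hypothetical helper for illustration, not the library's implementation.

```python
from itertools import chain

def flat_f1(y_true, y_pred, labels, average="macro"):
    """Token-level F1 over `labels` for lists of tag sequences (hypothetical helper)."""
    # Flatten the per-sentence tag lists into one token-level sequence each
    true = list(chain.from_iterable(y_true))
    pred = list(chain.from_iterable(y_pred))
    pairs = list(zip(true, pred))

    # Count true positives, false positives and false negatives per label
    counts = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        counts[label] = (tp, fp, fn)

    def f1(tp, fp, fn):
        # F1 = 2TP / (2TP + FP + FN); return 0 when undefined, like zero_division=0
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    if average == "micro":
        # Pool the counts across labels, then compute a single F1
        tp = sum(c[0] for c in counts.values())
        fp = sum(c[1] for c in counts.values())
        fn = sum(c[2] for c in counts.values())
        return f1(tp, fp, fn)
    # Macro: unweighted mean of the per-label F1 scores
    return sum(f1(*c) for c in counts.values()) / len(counts)

y_true = [["B-Feminine", "I-Feminine", "O"], ["B-Unknown", "O"]]
y_pred = [["B-Feminine", "O", "O"], ["B-Unknown", "B-Unknown"]]
labels = ["B-Feminine", "I-Feminine", "B-Unknown"]
print(round(flat_f1(y_true, y_pred, labels, "macro"), 4))  # 0.5556
print(round(flat_f1(y_true, y_pred, labels, "micro"), 4))  # 0.6667
```

For most configurations in this appendix, macro F1 comes out slightly above micro F1, which suggests that some less frequent labels (such as `B-Feminine`) score better than the most frequent ones (such as `I-Unknown`).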


In [52]:
try:
    crf0B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf0B.predict(X_dev)

# Evaluate
print(crf0B.algorithm, crf0B.c1, crf0B.c2, crf0B.pa_type, crf0B.variance)
print("  Macro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  Micro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

lbfgs 0.1 None None None
  Macro:
  - F1: 0.4019058883799401
  - Prec: 0.647656187601269
  - Rec 0.30500624658036857
  Micro:
  - F1: 0.3995488964072821
  - Prec: 0.6209313970956435
  - Rec 0.29453681710213775
  Per Label:
  - F1: [0.57356608 0.40512821 0.28498728 0.38471023 0.49947313 0.35170604
 0.28218332 0.43349282]
  - Prec: [0.69277108 0.52901786 0.8115942  0.61986755 0.68103448 0.48905109
 0.70984456 0.64806867]
  - Rec [0.4893617  0.32825485 0.17283951 0.27890346 0.39434276 0.27459016
 0.17609254 0.32566499]


In [53]:
try:
    crf0C.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf0C.predict(X_dev)

# Evaluate
print(crf0C.algorithm, crf0C.c1, crf0C.c2, crf0C.pa_type, crf0C.variance)
print("  Macro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  Micro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

lbfgs 0.1 0.2 None None
  Macro:
  - F1: 0.4966810338805706
  - Prec: 0.6569351509464121
  - Rec 0.40417134541234684
  Micro:
  - F1: 0.47962591850367403
  - Prec: 0.6394221254700179
  - Rec 0.383729216152019
  Per Label:
  - F1: [0.65550239 0.41196013 0.4978903  0.45596645 0.62535748 0.34590377
 0.47733105 0.50353669]
  - Prec: [0.74863388 0.51452282 0.78666667 0.63280423 0.73214286 0.47330961
 0.71355499 0.65384615]
  - Rec [0.58297872 0.3434903  0.36419753 0.35637664 0.54575707 0.27254098
 0.35861183 0.40941769]


In [54]:
try:
    crf1A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf1A.predict(X_dev)

# Evaluate
print(crf1A.algorithm, crf1A.c1, crf1A.c2, crf1A.pa_type, crf1A.variance)
print("  Macro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  Micro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

l2sgd None 1.0 None None
  Macro:
  - F1: 0.349324614839451
  - Prec: 0.6810710834972303
  - Rec 0.2613987831635396
  Micro:
  - F1: 0.3216942329570316
  - Prec: 0.6112404389757233
  - Rec 0.2182897862232779
  Per Label:
  - F1: [0.58221024 0.45285935 0.19099591 0.246139   0.4950495  0.37765634
 0.16397229 0.28571429]
  - Prec: [0.79411765 0.51223776 0.82352941 0.64720812 0.73051948 0.46348733
 0.80681818 0.67065073]
  - Rec [0.45957447 0.40581717 0.10802469 0.15196663 0.37437604 0.31864754
 0.09125964 0.18152408]


In [55]:
try:
    crf1B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf1B.predict(X_dev)

# Evaluate
print(crf1B.algorithm, crf1B.c1, crf1B.c2, crf1B.pa_type, crf1B.variance)
print("  Macro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  Micro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

l2sgd None 0.2 None None
  Macro:
  - F1: 0.34286973415639593
  - Prec: 0.662314589957276
  - Rec 0.24484063837773637
  Micro:
  - F1: 0.3241336742791641
  - Prec: 0.6292365628209518
  - Rec 0.2182897862232779
  Per Label:
  - F1: [0.57526882 0.38923767 0.1972973  0.33727551 0.46614872 0.29356471
 0.17508418 0.30908096]
  - Prec: [0.7810219  0.55216285 0.79347826 0.63636364 0.7        0.4987715
 0.69026549 0.64645309]
  - Rec [0.45531915 0.30055402 0.11265432 0.22943981 0.34941764 0.2079918
 0.10025707 0.2030913 ]


In [56]:
try:
    crf2.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf2.predict(X_dev)

# Evaluate
print(crf2.algorithm, crf2.c1, crf2.c2, crf2.pa_type, crf2.variance)
print("  Macro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  Micro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

ap None None None None
  Macro:
  - F1: 0.42742563294382585
  - Prec: 0.6790362476324592
  - Rec 0.33416246546919387
  Micro:
  - F1: 0.41287817505258045
  - Prec: 0.6473871131405378
  - Rec 0.3030878859857482
  Per Label:
  - F1: [0.63636364 0.46992783 0.36682243 0.43791103 0.57471264 0.31905465
 0.21252796 0.40208488]
  - Prec: [0.68292683 0.55809524 0.75480769 0.62403528 0.77247191 0.57142857
 0.81896552 0.64955894]
  - Rec [0.59574468 0.40581717 0.24228395 0.33730632 0.45757072 0.22131148
 0.12210797 0.29115744]


In [57]:
try:
    crf3A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf3A.predict(X_dev)

# Evaluate
print(crf3A.algorithm, crf3A.c1, crf3A.c2, crf3A.pa_type, crf3A.variance)
print("  Macro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  Micro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

pa None None 0 None
  Macro:
  - F1: 0.4784699483902634
  - Prec: 0.6952382695543528
  - Rec 0.37839935128012814
  Micro:
  - F1: 0.4751027866605756
  - Prec: 0.6618582944420874
  - Rec 0.37054631828978624
  Per Label:
  - F1: [0.65566038 0.46666667 0.47739222 0.48789435 0.58016878 0.31363636
 0.3401222  0.50621863]
  - Prec: [0.73544974 0.58577406 0.74917492 0.63454198 0.7925072  0.60174419
 0.81862745 0.64408662]
  - Rec [0.59148936 0.38781163 0.35030864 0.39630513 0.45757072 0.21209016
 0.21465296 0.41696621]


In [58]:
try:
    crf3B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf3B.predict(X_dev)

# Evaluate
print(crf3B.algorithm, crf3B.c1, crf3B.c2, crf3B.pa_type, crf3B.variance)
print("  Macro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  Micro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

pa None None 1 None
  Macro:
  - F1: 0.4830839994843489
  - Prec: 0.6936733090977836
  - Rec 0.38455668657324876
  Micro:
  - F1: 0.47600913937547606
  - Prec: 0.6634819532908705
  - Rec 0.37114014251781474
  Per Label:
  - F1: [0.67132867 0.46422629 0.48856549 0.48803828 0.60474716 0.30335366
 0.34210526 0.50230719]
  - Prec: [0.74226804 0.58125    0.74840764 0.63811357 0.79619565 0.5922619
 0.8047619  0.64612776]
  - Rec [0.61276596 0.38642659 0.36265432 0.39511323 0.4875208  0.20389344
 0.21722365 0.4108555 ]


In [59]:
try:
    crf3C.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf3C.predict(X_dev)

# Evaluate
print(crf3C.algorithm, crf3C.c1, crf3C.c2, crf3C.pa_type, crf3C.variance)
print("  Macro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  Micro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

pa None None 2 None
  Macro:
  - F1: 0.4773530017269165
  - Prec: 0.696614533193446
  - Rec 0.3761656310189457
  Micro:
  - F1: 0.4739488117001828
  - Prec: 0.6607901444350043
  - Rec 0.3694774346793349
  Per Label:
  - F1: [0.65393795 0.45923461 0.48580442 0.49176729 0.57142857 0.31268882
 0.34046891 0.50349345]
  - Prec: [0.74456522 0.575      0.76237624 0.63696682 0.79525223 0.59482759
 0.8226601  0.64126808]
  - Rec [0.58297872 0.38227147 0.35648148 0.40047676 0.44592346 0.21209016
 0.21465296 0.41445004]


In [62]:
try:
    crf4A.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf4A.predict(X_dev)

# Evaluate
print(crf4A.algorithm, crf4A.c1, crf4A.c2, crf4A.pa_type, crf4A.variance)
print("  Macro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  Micro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

arow None None None 1
  Macro:
  - F1: 0.5133300275449454
  - Prec: 0.5378413835501588
  - Rec 0.5040420010135154
  Micro:
  - F1: 0.49219892735251103
  - Prec: 0.5055082623935904
  - Rec 0.4795724465558195
  Per Label:
  - F1: [0.61075269 0.49718222 0.60491803 0.50335121 0.58477157 0.38247012
 0.42766631 0.49552807]
  - Prec: [0.6173913  0.45371429 0.6451049  0.57503828 0.75       0.37209302
 0.36290323 0.52648605]
  - Rec [0.60425532 0.5498615  0.56944444 0.44755662 0.47920133 0.39344262
 0.52056555 0.46800863]


In [61]:
try:
    crf4B.fit(X_train, y_train)
except AttributeError:
    pass

# Predict
y_pred = crf4B.predict(X_dev)

# Evaluate
print(crf4B.algorithm, crf4B.c1, crf4B.c2, crf4B.pa_type, crf4B.variance)
print("  Macro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="macro", zero_division=0, labels=targets))
print("  Micro:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average="micro", zero_division=0, labels=targets))
print("  Per Label:")
print("  - F1:", metrics.flat_f1_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Prec:", metrics.flat_precision_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))
print("  - Rec", metrics.flat_recall_score(y_dev, y_pred, average=None, zero_division=0, labels=targets))

arow None None None 0.5
  Macro:
  - F1: 0.48948765405359684
  - Prec: 0.5048488465375063
  - Rec 0.4791669431025771
  Micro:
  - F1: 0.4660059339688151
  - Prec: 0.49737232178951624
  - Rec 0.43836104513064134
  Per Label:
  - F1: [0.62985685 0.37897469 0.6215781  0.47994697 0.49915966 0.36127637
 0.49381188 0.45129671]
  - Prec: [0.60629921 0.35653236 0.64983165 0.54070202 0.50424448 0.38258877
 0.47613365 0.52245863]
  - Rec [0.65531915 0.40443213 0.59567901 0.43146603 0.49417637 0.34221311
 0.51285347 0.39719626]


**Best model:** AROW with variance=1 (highest macro F1, 0.513, and micro F1, 0.492, on the dev data)
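
Because each configuration is evaluated in its own cell, picking the winner means scanning the outputs by eye. A small sketch that ranks the configurations by the dev-set macro F1 values reported above (rounded to four decimals); the dictionary keys are informal labels, not identifiers used elsewhere in the notebook.

```python
# Dev-set macro F1 per configuration, copied from the outputs in this appendix
macro_f1 = {
    "lbfgs c1=0": 0.1181,
    "lbfgs c1=0.1": 0.4019,
    "lbfgs c1=0.1 c2=0.2": 0.4967,
    "l2sgd c2=1.0": 0.3493,
    "l2sgd c2=0.2": 0.3429,
    "ap": 0.4274,
    "pa type 0": 0.4785,
    "pa type 1": 0.4831,
    "pa type 2": 0.4774,
    "arow variance=1": 0.5133,
    "arow variance=0.5": 0.4895,
}

# Sort configurations from best to worst score
ranked = sorted(macro_f1.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked[:3]:
    print(f"{name:<22} {score:.4f}")
```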