# Build NER Tagger from Scratch

Builds on Sarkar's source code (Ch08c), 
extended with my own additions and adjustments.  

__Named Entity Recognition (NER)__, also known as entity chunking or entity extraction, is a popular information-extraction technique that identifies and segments named entities and classifies them into predefined classes.

Various off-the-shelf solutions can perform named entity extraction. Yet there are times when the requirements go beyond what off-the-shelf classifiers can do.

In this notebook, we will go through an exercise to build our own NER tagger using Conditional Random Fields.
We will use ```sklearn_crfsuite``` to develop it.


The goal of a named entity recognition (NER) system is to identify all textual mentions of the named entities. This can be broken down into two sub-tasks: identifying the boundaries of the NE, and identifying its type.

Named entity recognition is well suited to a classifier-based approach. In particular, a tagger can be built that labels each word in a sentence using the IOB format, where chunks are labelled by their type.

The IOB tagging scheme contains tags of the form:

- B-{CHUNK_TYPE}: for the word at the Beginning of a chunk
- I-{CHUNK_TYPE}: for words Inside a chunk
- O: Outside any chunk

The chunk types fall into the following classes:

- geo = Geographical Entity
- org = Organization
- per = Person
- gpe = Geopolitical Entity
- tim = Time indicator
- art = Artifact
- eve = Event
- nat = Natural Phenomenon
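
To make the tag format concrete, here is a minimal sketch that splits a tag into its prefix and chunk type and checks whether a tag sequence is well formed. It assumes the strict reading of the scheme described above, in which every chunk starts with a B- tag; the helper names `split_tag` and `is_valid_iob` are made up for this illustration.

```python
def split_tag(tag):
    """Split an IOB tag such as 'B-geo' into ('B', 'geo'); 'O' becomes ('O', None)."""
    if tag == "O":
        return "O", None
    prefix, _, chunk_type = tag.partition("-")
    return prefix, chunk_type


def is_valid_iob(tags):
    """Return True if every I- tag continues an open chunk of the same type."""
    previous_type = None  # type of the chunk that is currently open, if any
    for tag in tags:
        prefix, chunk_type = split_tag(tag)
        if prefix == "I" and chunk_type != previous_type:
            return False  # 'I-x' without a preceding 'B-x' or 'I-x'
        previous_type = chunk_type if prefix in ("B", "I") else None
    return True


print(is_valid_iob(["O", "B-per", "I-per", "O", "B-geo"]))  # True
print(is_valid_iob(["O", "I-per", "O"]))                    # False
```

A check like this can be run over the predictions of a tagger to see how often it produces sequences that cannot be decoded into chunks.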

In [2]:
import pandas as pd

df = pd.read_csv("./ner_dataset.csv", encoding='ISO-8859-1')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1048575 entries, 0 to 1048574
Data columns (total 4 columns):
 #   Column      Non-Null Count    Dtype 
---  ------      --------------    ----- 
 0   Sentence #  47959 non-null    object
 1   Word        1048565 non-null  object
 2   POS         1048575 non-null  object
 3   Tag         1048575 non-null  object
dtypes: object(4)
memory usage: 32.0+ MB


Hmm, just under 50,000 sentences with roughly 1,000,000 words. 
What does that look like in the DataFrame?

In [3]:
df.head(20)

Unnamed: 0,Sentence #,Word,POS,Tag
0,Sentence: 1,Thousands,NNS,O
1,,of,IN,O
2,,demonstrators,NNS,O
3,,have,VBP,O
4,,marched,VBN,O
5,,through,IN,O
6,,London,NNP,B-geo
7,,to,TO,O
8,,protest,VB,O
9,,the,DT,O


In [4]:
# easier to read when transposed
df.head(20).T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
Sentence #,Sentence: 1,,,,,,,,,,,,,,,,,,,
Word,Thousands,of,demonstrators,have,marched,through,London,to,protest,the,war,in,Iraq,and,demand,the,withdrawal,of,British,troops
POS,NNS,IN,NNS,VBP,VBN,IN,NNP,TO,VB,DT,NN,IN,NNP,CC,VB,DT,NN,IN,JJ,NNS
Tag,O,O,O,O,O,O,B-geo,O,O,O,O,O,B-geo,O,O,O,O,O,B-gpe,O


In [5]:
# fill in the missing 'Sentence #' values
df = df.ffill()  # forward fill; fillna(method='ffill') is deprecated in recent pandas
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1048575 entries, 0 to 1048574
Data columns (total 4 columns):
 #   Column      Non-Null Count    Dtype 
---  ------      --------------    ----- 
 0   Sentence #  1048575 non-null  object
 1   Word        1048575 non-null  object
 2   POS         1048575 non-null  object
 3   Tag         1048575 non-null  object
dtypes: object(4)
memory usage: 32.0+ MB




In [6]:
df['Sentence #'].nunique(), df.Word.nunique(), df.POS.nunique(), df.Tag.nunique()

(47959, 35177, 42, 17)

In [7]:
df.Tag.value_counts()

Tag
O        887908
B-geo     37644
B-tim     20333
B-org     20143
I-per     17251
B-per     16990
I-org     16784
B-gpe     15870
I-geo      7414
I-tim      6528
B-art       402
B-eve       308
I-art       297
I-eve       253
B-nat       201
I-gpe       198
I-nat        51
Name: count, dtype: int64

The distribution is very uneven. What does that mean for machine learning?
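
One consequence can be checked directly from the counts above: a tagger that always predicts the most frequent tag already reaches a high token-level accuracy, so accuracy alone says little about the rare entity classes. The following sketch simply re-uses the numbers from the `value_counts()` output; the variable names are chosen for this illustration.

```python
# Tag counts copied from the value_counts() output above.
tag_counts = {
    "O": 887908, "B-geo": 37644, "B-tim": 20333, "B-org": 20143,
    "I-per": 17251, "B-per": 16990, "I-org": 16784, "B-gpe": 15870,
    "I-geo": 7414, "I-tim": 6528, "B-art": 402, "B-eve": 308,
    "I-art": 297, "I-eve": 253, "B-nat": 201, "I-gpe": 198, "I-nat": 51,
}

total = sum(tag_counts.values())
majority_tag, majority_count = max(tag_counts.items(), key=lambda kv: kv[1])

# Accuracy of a "model" that predicts the majority tag for every token.
baseline_accuracy = majority_count / total

print(f"tokens: {total}")
print(f"always predicting '{majority_tag}': accuracy {baseline_accuracy:.3f}")
```

Any accuracy reported later should be read against this baseline of roughly 85%.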

## Conditional Random Fields

HR: CRFs are particularly well suited to sequence prediction, e.g. tagging a sequence (POS, NER).

As mentioned above, NER belongs to the sequence-modeling class of problems. There are different algorithms for sequence modeling; __CRFs__, or _Conditional Random Fields_, are one example. CRFs have been shown to perform very well on NER and related tasks. In this notebook, we will attempt to develop our own NER tagger based on CRFs.

---

__Question__: What is a CRF and how does it work?

__Wikipedia__ :  CRF is an undirected graphical model whose nodes can be divided into exactly two disjoint sets $X$ and $Y$, the observed and output variables, respectively; the conditional distribution $p(Y|X)$ is then modeled.

For more details, check out the paper [__Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data__](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers)
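
For sequence labelling, the model is usually written as a linear-chain CRF. The formula below is the common textbook form and is added here only as orientation; the symbols $f_k$ (feature functions), $\lambda_k$ (learned weights), $T$ (sequence length) and $Z(X)$ (normalizer) are not defined elsewhere in this notebook.

$$
p(Y \mid X) = \frac{1}{Z(X)} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, X, t) \Big),
\qquad
Z(X) = \sum_{Y'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, X, t) \Big)
$$

Each $f_k$ looks at the current label, the previous label and the observed sequence, which is why the model can learn both word-level evidence and label transitions.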

## Prepare Data

A CRF is trained on sequences of input data and learns, among other things, the transitions from one state (label) to the next. 
To make this possible, we need features that capture the context of each word. 
The function ```word2features()``` below turns each word into a feature dictionary with the following attributes:

+ lower case of word
+ suffix containing last 3 characters
+ suffix containing last 2 characters
+ flags for upper-case, title-case and numeric tokens
+ the POS tag and its first two characters

We also add the same kind of attributes for the previous and the next word, and mark the beginning of a sentence (BOS) or the end of a sentence (EOS) when there is no such neighbour.

Note (HR): these features are better suited to English than to German.

In [8]:
def word2features(sent, i):
    word = sent[i][0]
    postag = sent[i][1]

    features = {
        # the word at the current position
        # 'bias': 1.0, HR: does not seem to make a difference
        'word.lower()': word.lower(),
        'word[-3:]': word[-3:],
        'word[-2:]': word[-2:],
        'word.isupper()': word.isupper(),
        'word.istitle()': word.istitle(),
        'word.isdigit()': word.isdigit(),
        'postag': postag,
        'postag[:2]': postag[:2],
    }
    if i > 0:
        # the previous word (if there is one)
        word1 = sent[i-1][0]
        postag1 = sent[i-1][1]
        features.update({
            '-1:word.lower()': word1.lower(),
            '-1:word.istitle()': word1.istitle(),
            '-1:word.isupper()': word1.isupper(),
            '-1:postag': postag1,
            '-1:postag[:2]': postag1[:2],
        })
    else:
        features['BOS'] = True

    if i < len(sent)-1:
        # the next word (if there is one)
        word1 = sent[i+1][0]
        postag1 = sent[i+1][1]
        features.update({
            '+1:word.lower()': word1.lower(),
            '+1:word.istitle()': word1.istitle(),
            '+1:word.isupper()': word1.isupper(),
            '+1:postag': postag1,
            '+1:postag[:2]': postag1[:2],
        })
    else:
        features['EOS'] = True

    return features

In [9]:
def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]

def sent2labels(sent):
    return [label for token, postag, label in sent]

In [10]:
# Builds one triple per word: (word, POS tag, NER tag).
# It receives the DataFrame of one sentence; zip() combines the per-column value lists into a list of triples.
agg_func = lambda s: [(w, p, t) for w, p, t in zip(s['Word'].values.tolist(), 
                                                   s['POS'].values.tolist(), 
                                                   s['Tag'].values.tolist())]

In [11]:
grouped_df = df.groupby('Sentence #').apply(agg_func)  # group by sentence number (each sentence on its own)

print(grouped_df[grouped_df.index == 'Sentence: 1'].values)

[list([('Thousands', 'NNS', 'O'), ('of', 'IN', 'O'), ('demonstrators', 'NNS', 'O'), ('have', 'VBP', 'O'), ('marched', 'VBN', 'O'), ('through', 'IN', 'O'), ('London', 'NNP', 'B-geo'), ('to', 'TO', 'O'), ('protest', 'VB', 'O'), ('the', 'DT', 'O'), ('war', 'NN', 'O'), ('in', 'IN', 'O'), ('Iraq', 'NNP', 'B-geo'), ('and', 'CC', 'O'), ('demand', 'VB', 'O'), ('the', 'DT', 'O'), ('withdrawal', 'NN', 'O'), ('of', 'IN', 'O'), ('British', 'JJ', 'B-gpe'), ('troops', 'NNS', 'O'), ('from', 'IN', 'O'), ('that', 'DT', 'O'), ('country', 'NN', 'O'), ('.', '.', 'O')])]


In [12]:
grouped_df.info()

<class 'pandas.core.series.Series'>
Index: 47959 entries, Sentence: 1 to Sentence: 9999
Series name: None
Non-Null Count  Dtype 
--------------  ----- 
47959 non-null  object
dtypes: object(1)
memory usage: 749.4+ KB


One row per sentence.

In [13]:
grouped_df.head()

Sentence #
Sentence: 1        [(Thousands, NNS, O), (of, IN, O), (demonstrat...
Sentence: 10       [(Iranian, JJ, B-gpe), (officials, NNS, O), (s...
Sentence: 100      [(Helicopter, NN, O), (gunships, NNS, O), (Sat...
Sentence: 1000     [(They, PRP, O), (left, VBD, O), (after, IN, O...
Sentence: 10000    [(U.N., NNP, B-geo), (relief, NN, O), (coordin...
dtype: object

Collect all sentences in a list:

In [14]:
sentences = [s for s in grouped_df]
sentences[0]

[('Thousands', 'NNS', 'O'),
 ('of', 'IN', 'O'),
 ('demonstrators', 'NNS', 'O'),
 ('have', 'VBP', 'O'),
 ('marched', 'VBN', 'O'),
 ('through', 'IN', 'O'),
 ('London', 'NNP', 'B-geo'),
 ('to', 'TO', 'O'),
 ('protest', 'VB', 'O'),
 ('the', 'DT', 'O'),
 ('war', 'NN', 'O'),
 ('in', 'IN', 'O'),
 ('Iraq', 'NNP', 'B-geo'),
 ('and', 'CC', 'O'),
 ('demand', 'VB', 'O'),
 ('the', 'DT', 'O'),
 ('withdrawal', 'NN', 'O'),
 ('of', 'IN', 'O'),
 ('British', 'JJ', 'B-gpe'),
 ('troops', 'NNS', 'O'),
 ('from', 'IN', 'O'),
 ('that', 'DT', 'O'),
 ('country', 'NN', 'O'),
 ('.', '.', 'O')]

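The features can now be built for the sentences. Here is an example on a two-token slice; because only the slice is passed, "through" is flagged as BOS and "London" as EOS: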

In [15]:
sent2features(sentences[0][5:7])


[{'word.lower()': 'through',
  'word[-3:]': 'ugh',
  'word[-2:]': 'gh',
  'word.isupper()': False,
  'word.istitle()': False,
  'word.isdigit()': False,
  'postag': 'IN',
  'postag[:2]': 'IN',
  'BOS': True,
  '+1:word.lower()': 'london',
  '+1:word.istitle()': True,
  '+1:word.isupper()': False,
  '+1:postag': 'NNP',
  '+1:postag[:2]': 'NN'},
 {'word.lower()': 'london',
  'word[-3:]': 'don',
  'word[-2:]': 'on',
  'word.isupper()': False,
  'word.istitle()': True,
  'word.isdigit()': False,
  'postag': 'NNP',
  'postag[:2]': 'NN',
  '-1:word.lower()': 'through',
  '-1:word.istitle()': False,
  '-1:word.isupper()': False,
  '-1:postag': 'IN',
  '-1:postag[:2]': 'IN',
  'EOS': True}]

The same for the labels:

In [16]:
print(sent2labels(sentences[0]))

['O', 'O', 'O', 'O', 'O', 'O', 'B-geo', 'O', 'O', 'O', 'O', 'O', 'B-geo', 'O', 'O', 'O', 'O', 'O', 'B-gpe', 'O', 'O', 'O', 'O', 'O']


## Prepare Train and Test Datasets

In [17]:
from sklearn.model_selection import train_test_split
import numpy as np

X = np.array([sent2features(s) for s in sentences], dtype=object)
Y = np.array([sent2labels(s) for s in sentences], dtype=object)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=42)
X_train.shape, X_test.shape

((35969,), (11990,))

In [None]:
# !pip install sklearn-crfsuite  # run only once!

# WARNING

The CRF suite (sklearn-crfsuite) is not compatible with current versions of scikit-learn. Two small changes to its source code are therefore required:

In the class definition of CRF (file name: estimator.py), two variables have to be added in the `__init__` method:

       self.keep_tempfiles = keep_tempfiles
       self.model_filename = model_filename

Simply append them at the end of the declaration. 

You can find the file among the installed libraries, or in PyCharm by clicking on "CRF" and then pressing "F4" ("Jump to Source").
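
If no IDE is at hand, the file can also be located from Python. This is a small sketch using the standard library; it assumes that the class lives in a module named `sklearn_crfsuite.estimator` (inferred from the file name estimator.py), and the helper name `find_module_file` is made up for this illustration.

```python
import importlib.util


def find_module_file(module_name):
    """Return the path of the source file for module_name, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None  # parent package is missing or the name is malformed
    return spec.origin if spec is not None else None


# Assumption: the CRF class is defined in sklearn_crfsuite/estimator.py.
print(find_module_file("sklearn_crfsuite.estimator"))
```

The printed path is the file to edit; `None` means the package is not installed in the current environment.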
  

# Train the model!

Train the model using the default configurations mentioned in the [sklearn-crfsuite API docs](https://sklearn-crfsuite.readthedocs.io/en/latest/api.html)


- __algorithm:__ the training algorithm. We use [L-BFGS](https://en.wikipedia.org/wiki/Limited-memory_BFGS), a limited-memory quasi-Newton method, to optimize the model parameters
- __c1:__ Coefficient for Lasso (L1) regularization
- __c2:__ Coefficient for Ridge (L2) regularization
- __max_iterations:__ Maximum number of optimization iterations
- __all_possible_transitions:__ Specify whether CRFsuite generates transition features that do not even occur in the training data


In [18]:
import sklearn_crfsuite

crf = sklearn_crfsuite.CRF(algorithm='lbfgs',
                           c1=0.1,
                           c2=0.1,
                           max_iterations=100,
                           all_possible_transitions=True,
                           verbose=True)

In [19]:
crf.fit(X_train, y_train)


loading training data to CRFsuite: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 35969/35969 [00:04<00:00, 7982.47it/s]



Feature generation
type: CRF1d
feature.minfreq: 0.000000
feature.possible_states: 0
feature.possible_transitions: 1
0....1....2....3....4....5....6....7....8....9....10
Number of features: 133612
Seconds required: 1.013

L-BFGS optimization
c1: 0.100000
c2: 0.100000
num_memories: 6
max_iterations: 100
epsilon: 0.000010
stop: 10
delta: 0.000010
linesearch: MoreThuente
linesearch.max_iterations: 20

Iter 1   time=1.59  loss=1452006.34 active=132620 feature_norm=1.00
Iter 2   time=2.35  loss=1060401.86 active=131946 feature_norm=5.56
Iter 3   time=0.80  loss=796426.72 active=125323 feature_norm=4.88
Iter 4   time=3.98  loss=469102.20 active=126078 feature_norm=3.95
Iter 5   time=0.79  loss=406489.87 active=131847 feature_norm=4.75
Iter 6   time=0.79  loss=344518.07 active=130981 feature_norm=5.02
Iter 7   time=0.82  loss=302282.92 active=130924 feature_norm=5.68
Iter 8   time=0.80  loss=254083.73 active=126248 feature_norm=7.16
Iter 9   time=0.79  loss=217889.18 active=115925 feature_nor

# Test the model

In [20]:
y_pred = crf.predict(X_test)
print(y_pred[0])

['O', 'O', 'O', 'O', 'B-per', 'I-per', 'O', 'B-org', 'O', 'O', 'B-gpe', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']


In [21]:
# for comparison
print(y_test[0])

['O', 'O', 'O', 'O', 'B-per', 'I-per', 'O', 'B-org', 'O', 'O', 'B-gpe', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']


In [22]:
from sklearn_crfsuite import metrics as crf_metrics

crf_metrics.flat_accuracy_score(y_test, y_pred)

0.9726367115218063

This gives decent numbers. How does it hold up in practice?

For that, we first have to annotate a text with POS tags; then the model can be applied to it.

In [23]:
import re

text = """Three more countries have joined an “international grand committee” of parliaments, adding to calls for 
Facebook’s boss, Mark Zuckerberg, to give evidence on misinformation to the coalition. Brazil, Latvia and Singapore 
bring the total to eight different parliaments across the world, with plans to send representatives to London on 27 
November with the intention of hearing from Zuckerberg. Since the Cambridge Analytica scandal broke, the Facebook chief 
has only appeared in front of two legislatures: the American Senate and House of Representatives, and the European parliament. 
Facebook has consistently rebuffed attempts from others, including the UK and Canadian parliaments, to hear from Zuckerberg. 
He added that an article in the New York Times on Thursday, in which the paper alleged a pattern of behaviour from Facebook 
to “delay, deny and deflect” negative news stories, “raises further questions about how recent data breaches were allegedly 
dealt with within Facebook.”
"""

text = re.sub(r'\n', '', text)
text

'Three more countries have joined an “international grand committee” of parliaments, adding to calls for Facebook’s boss, Mark Zuckerberg, to give evidence on misinformation to the coalition. Brazil, Latvia and Singapore bring the total to eight different parliaments across the world, with plans to send representatives to London on 27 November with the intention of hearing from Zuckerberg. Since the Cambridge Analytica scandal broke, the Facebook chief has only appeared in front of two legislatures: the American Senate and House of Representatives, and the European parliament. Facebook has consistently rebuffed attempts from others, including the UK and Canadian parliaments, to hear from Zuckerberg. He added that an article in the New York Times on Thursday, in which the paper alleged a pattern of behaviour from Facebook to “delay, deny and deflect” negative news stories, “raises further questions about how recent data breaches were allegedly dealt with within Facebook.”'

### Pipeline Step 1

- Tokenize Text
- POS Tagging

In [24]:
import nltk

text_tokens = nltk.word_tokenize(text)
text_pos = nltk.pos_tag(text_tokens)
text_pos[:10]

[('Three', 'CD'),
 ('more', 'JJR'),
 ('countries', 'NNS'),
 ('have', 'VBP'),
 ('joined', 'VBN'),
 ('an', 'DT'),
 ('“', 'NNP'),
 ('international', 'JJ'),
 ('grand', 'JJ'),
 ('committee', 'NN')]

### Pipeline Step 2
- Extract Features from the POS tagged text document
- Hint: Use `sent2features`

In [25]:
features = [sent2features(text_pos)]
features[0][0]

{'word.lower()': 'three',
 'word[-3:]': 'ree',
 'word[-2:]': 'ee',
 'word.isupper()': False,
 'word.istitle()': True,
 'word.isdigit()': False,
 'postag': 'CD',
 'postag[:2]': 'CD',
 'BOS': True,
 '+1:word.lower()': 'more',
 '+1:word.istitle()': False,
 '+1:word.isupper()': False,
 '+1:postag': 'JJR',
 '+1:postag[:2]': 'JJ'}
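
Note that the cell above passes the whole text as a single sequence, so BOS and EOS are set only once, whereas the model was trained on individual sentences. A possible refinement is to split the tagged tokens into sentences first. The sketch below assumes that sentence-final punctuation carries the POS tag `.`, as in the training data shown earlier; the helper name `split_tagged_sentences` and the toy input are made up for this illustration.

```python
def split_tagged_sentences(tagged_tokens, end_tag="."):
    """Split a flat list of (token, pos) pairs into sentences after each token tagged end_tag."""
    sentences, current = [], []
    for token, pos in tagged_tokens:
        current.append((token, pos))
        if pos == end_tag:
            sentences.append(current)
            current = []
    if current:  # keep a trailing fragment that has no final punctuation
        sentences.append(current)
    return sentences


# Toy input with illustrative POS tags.
tagged = [("Brazil", "NNP"), ("joined", "VBD"), (".", "."),
          ("Facebook", "NNP"), ("declined", "VBD"), (".", ".")]
for sentence in split_tagged_sentences(tagged):
    print(sentence)

# With the notebook's own objects this would become:
# features = [sent2features(s) for s in split_tagged_sentences(text_pos)]
```

A closing quotation mark that follows the full stop ends up at the start of the next sentence with this simple rule, so a proper sentence tokenizer may be preferable for real text.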

### Pipeline Step 3
- Use the CRF Model `crf` to predict on the features

In [26]:
labels = crf.predict(features)
doc_labels = labels[0]
doc_labels[10:20]

['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-art', 'I-art']

### Pipeline Step 4
- Combine text tokens with NER Tags
- Retrieve relevant named entities from NER Tags

In [27]:
text_ner = [(token, tag) for token, tag in zip(text_tokens, doc_labels)]
print(text_ner)

[('Three', 'O'), ('more', 'O'), ('countries', 'O'), ('have', 'O'), ('joined', 'O'), ('an', 'O'), ('“', 'O'), ('international', 'O'), ('grand', 'O'), ('committee', 'O'), ('”', 'O'), ('of', 'O'), ('parliaments', 'O'), (',', 'O'), ('adding', 'O'), ('to', 'O'), ('calls', 'O'), ('for', 'O'), ('Facebook', 'B-art'), ('’', 'I-art'), ('s', 'O'), ('boss', 'O'), (',', 'O'), ('Mark', 'B-per'), ('Zuckerberg', 'I-per'), (',', 'O'), ('to', 'O'), ('give', 'O'), ('evidence', 'O'), ('on', 'O'), ('misinformation', 'O'), ('to', 'O'), ('the', 'O'), ('coalition', 'O'), ('.', 'O'), ('Brazil', 'B-geo'), (',', 'O'), ('Latvia', 'B-org'), ('and', 'I-org'), ('Singapore', 'I-org'), ('bring', 'O'), ('the', 'O'), ('total', 'O'), ('to', 'O'), ('eight', 'O'), ('different', 'O'), ('parliaments', 'O'), ('across', 'O'), ('the', 'O'), ('world', 'O'), (',', 'O'), ('with', 'O'), ('plans', 'O'), ('to', 'O'), ('send', 'O'), ('representatives', 'O'), ('to', 'O'), ('London', 'B-geo'), ('on', 'O'), ('27', 'B-tim'), ('November', 
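
The second bullet of this step, retrieving the named entities, can be sketched as follows: consecutive B-/I- tags of the same type are merged into one entity. The function name `extract_entities` and the short sample are made up for this illustration; a stray I- tag without a matching B- tag is treated as the start of a new entity, which is one possible choice among several.

```python
def extract_entities(tagged_tokens):
    """Group (token, IOB tag) pairs into (entity text, entity type) tuples."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in tagged_tokens:
        prefix, _, chunk_type = tag.partition("-")
        if prefix == "I" and chunk_type == current_type:
            current_tokens.append(token)  # continue the open entity
            continue
        if current_tokens:  # close the entity that was open
            entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
        if prefix in ("B", "I"):
            # start a new entity (also for a stray I- tag)
            current_tokens, current_type = [token], chunk_type
    if current_tokens:  # entity that runs to the end of the text
        entities.append((" ".join(current_tokens), current_type))
    return entities


sample = [("boss", "O"), (",", "O"), ("Mark", "B-per"), ("Zuckerberg", "I-per"),
          (",", "O"), ("Brazil", "B-geo"), (",", "O")]
print(extract_entities(sample))
# [('Mark Zuckerberg', 'per'), ('Brazil', 'geo')]
```

Applied to the notebook's own result this would be `extract_entities(text_ner)`.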

Overall this is already quite usable...

For comparison and for your own experiments: spaCy

In [41]:
import spacy
from spacy import displacy

nlp = spacy.load('en_core_web_sm')
text_nlp = nlp(text)
displacy.render(text_nlp, style='ent', jupyter=True)

A trial evaluation by hand, since the metrics library is acting up...    

In [29]:
from sklearn.metrics import classification_report

classification_report(y_test, y_pred,labels=labels)

ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead - the MultiLabelBinarizer transformer can convert to this format.
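
The error most likely arises because `classification_report` expects flat label arrays, while `y_test` and `y_pred` are lists of label sequences. One way around it is to flatten both before scoring. The sketch below does this in plain Python and computes precision, recall and F1 per label; the helper names `flatten` and `per_label_scores` and the toy data are made up for this illustration.

```python
from collections import Counter
from itertools import chain


def flatten(sequences):
    """Turn a list of label sequences into one flat list of labels."""
    return list(chain.from_iterable(sequences))


def per_label_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed over all tokens."""
    true_flat, pred_flat = flatten(y_true), flatten(y_pred)
    true_count, pred_count = Counter(true_flat), Counter(pred_flat)
    hits = Counter(t for t, p in zip(true_flat, pred_flat) if t == p)
    scores = {}
    for label in sorted(set(true_flat) | set(pred_flat)):
        precision = hits[label] / pred_count[label] if pred_count[label] else 0.0
        recall = hits[label] / true_count[label] if true_count[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy data in the same shape as y_test / y_pred.
toy_true = [["O", "B-per", "I-per"], ["B-geo", "O"]]
toy_pred = [["O", "B-per", "O"], ["B-geo", "O"]]
for label, (p, r, f) in per_label_scores(toy_true, toy_pred).items():
    print(f"{label}\tprecision={p:.2f}\trecall={r:.2f}\tf1={f:.2f}")
```

The flattened lists can also be passed to `sklearn.metrics.classification_report(flatten(y_test), flatten(y_pred))` instead of the nested ones.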

In [30]:
from collections import defaultdict

def confusion_matrix(y_true, y_pred):
    """
    Confusion matrix for the data type used here: a list of lists of variable length.
    Labels are created on the fly, based on y_true.
    Returns: dictionary true_label: (dictionary predicted_label: count)
    :param y_true: list of true label sequences
    :param y_pred: list of predicted label sequences, aligned with y_true
    :return: nested defaultdict with the counts
    """
    cm = defaultdict(lambda: defaultdict(int))
    for seq1, seq2 in zip(y_true, y_pred):
        for l1, l2 in zip(seq1, seq2):
            cm[l1][l2] += 1
    return cm
            

In [31]:
conf_mat = confusion_matrix(y_test, y_pred)
print(conf_mat)

defaultdict(<function confusion_matrix.<locals>.<lambda> at 0x7f5f7a2dde40>, {'O': defaultdict(<class 'int'>, {'O': 220742, 'B-per': 105, 'B-gpe': 44, 'B-tim': 206, 'B-org': 215, 'I-org': 220, 'I-tim': 127, 'B-geo': 152, 'I-per': 80, 'I-geo': 29, 'I-eve': 3, 'B-art': 4, 'I-gpe': 1, 'B-eve': 2, 'I-art': 2, 'B-nat': 2, 'I-nat': 2}), 'B-per': defaultdict(<class 'int'>, {'B-per': 3536, 'B-geo': 186, 'I-org': 51, 'I-per': 137, 'B-org': 157, 'O': 158, 'I-nat': 1, 'B-eve': 3, 'B-art': 2, 'I-tim': 2, 'B-tim': 3, 'I-geo': 3}), 'I-per': defaultdict(<class 'int'>, {'I-per': 3860, 'I-geo': 49, 'I-org': 162, 'B-org': 16, 'B-per': 112, 'B-geo': 6, 'O': 64, 'I-eve': 1, 'I-art': 1, 'B-tim': 1, 'I-tim': 1}), 'B-org': defaultdict(<class 'int'>, {'B-org': 3761, 'B-geo': 624, 'O': 302, 'B-art': 10, 'I-per': 82, 'B-per': 224, 'B-gpe': 20, 'I-org': 59, 'I-tim': 7, 'I-geo': 8, 'B-eve': 7, 'B-tim': 10, 'I-eve': 2}), 'B-gpe': defaultdict(<class 'int'>, {'B-gpe': 3728, 'B-geo': 162, 'B-per': 7, 'B-org': 29, 'O'

In [32]:
cnf_df = pd.DataFrame(conf_mat)
cnf_df

Unnamed: 0,O,B-per,I-per,B-org,B-gpe,I-org,B-geo,B-tim,I-geo,I-tim,B-art,I-gpe,B-nat,I-nat,B-eve,I-art,I-eve
O,220742,158.0,64.0,302.0,20.0,263.0,186.0,414.0,48.0,202.0,28.0,3.0,26.0,9.0,25.0,23.0,22.0
B-per,105,3536.0,112.0,224.0,7.0,44.0,108.0,2.0,3.0,2.0,7.0,,1.0,,2.0,1.0,
B-gpe,44,,,20.0,3728.0,3.0,24.0,,3.0,,7.0,1.0,,,,,
B-tim,206,3.0,1.0,10.0,,3.0,11.0,4513.0,3.0,95.0,1.0,,,,7.0,1.0,2.0
B-org,215,157.0,16.0,3761.0,29.0,68.0,313.0,12.0,13.0,,25.0,1.0,10.0,,6.0,3.0,
I-org,220,51.0,162.0,59.0,4.0,3331.0,49.0,4.0,134.0,3.0,5.0,3.0,1.0,2.0,1.0,39.0,9.0
I-tim,127,2.0,1.0,7.0,,4.0,2.0,72.0,,1291.0,,,1.0,,5.0,,12.0
B-geo,152,186.0,6.0,624.0,162.0,65.0,8602.0,69.0,77.0,6.0,12.0,,2.0,,4.0,2.0,1.0
I-per,80,137.0,3860.0,82.0,,231.0,33.0,4.0,70.0,3.0,1.0,4.0,,,,12.0,1.0
I-geo,29,3.0,49.0,8.0,8.0,165.0,62.0,4.0,1468.0,2.0,,5.0,,,,3.0,1.0
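
In the table above, the columns are the true labels and the rows the predicted ones, because `pd.DataFrame` turns the outer dictionary keys into columns. Combinations that never occur show up as NaN, which also turns the counts into floats. As a possible alternative, the following plain-Python sketch builds a dense matrix with the true labels as rows and zeros for missing cells; the helper name `dense_matrix` and the toy input are made up for this illustration.

```python
def dense_matrix(cm, labels=None):
    """Return (labels, rows) where rows[i][j] counts true labels[i] predicted as labels[j]."""
    if labels is None:
        predicted = {p for row in cm.values() for p in row}
        labels = sorted(set(cm) | predicted)
    # .get() avoids creating new entries when cm is a defaultdict
    rows = [[cm.get(t, {}).get(p, 0) for p in labels] for t in labels]
    return labels, rows


toy_cm = {"O": {"O": 5, "B-per": 1}, "B-per": {"B-per": 3}}
labels_sorted, rows = dense_matrix(toy_cm)
print(labels_sorted)
for label, row in zip(labels_sorted, rows):
    print(label, row)
```

The result can be handed to pandas with `pd.DataFrame(rows, index=labels_sorted, columns=labels_sorted)` to get an integer table with true labels as the index.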


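Accuracy per label. Since the matrix is keyed by the true label, the value for each label is the share of its true occurrences that were predicted correctly, i.e. its recall.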

In [33]:
korrekt_gesamt = 0   # correct predictions over all labels
anzahl = 0           # number of tokens over all labels

korrekt_labels = 0   # sum of the per-label accuracies
labels_anzahl = 0    # number of labels

# conf_mat maps true label -> {predicted label: count}
for lbl, dct in conf_mat.items():
    korrekt = dct[lbl]
    gesamt = sum(dct.values())
    print(f"Label: {lbl} \t correct: {korrekt} \t total: {gesamt} \t accuracy: {korrekt/gesamt}")
    korrekt_gesamt += korrekt
    anzahl += gesamt
    korrekt_labels += korrekt/gesamt
    labels_anzahl += 1
    
print()
print(f"Accuracy over all tokens: {korrekt_gesamt/anzahl}")
print(f"Accuracy averaged over labels: {korrekt_labels/labels_anzahl}")

Label: O 	 correct: 220742 	 total: 221936 	 accuracy: 0.9946200706509984
Label: B-per 	 correct: 3536 	 total: 4239 	 accuracy: 0.8341589997640954
Label: I-per 	 correct: 3860 	 total: 4273 	 accuracy: 0.903346594898198
Label: B-org 	 correct: 3761 	 total: 5116 	 accuracy: 0.7351446442533229
Label: B-gpe 	 correct: 3728 	 total: 3961 	 accuracy: 0.9411764705882353
Label: I-org 	 correct: 3331 	 total: 4195 	 accuracy: 0.7940405244338499
Label: B-geo 	 correct: 8602 	 total: 9403 	 accuracy: 0.9148144209294906
Label: B-tim 	 correct: 4513 	 total: 5095 	 accuracy: 0.8857703631010795
Label: I-geo 	 correct: 1468 	 total: 1826 	 accuracy: 0.8039430449069004
Label: I-tim 	 correct: 1291 	 total: 1604 	 accuracy: 0.8048628428927681
Label: B-art 	 correct: 15 	 total: 102 	 accuracy: 0.14705882352941177
Label: I-gpe 	 correct: 19 	 total: 36 	 accuracy: 0.5277777777777778
Label: B-nat 	 correct: 14 	 total: 55 	 accuracy: 0.25454545454545