# Named Entity Recognition and Classification (NERC) Analysis

**Objective:** Compare two state-of-the-art NER systems on provided test data

**Systems Evaluated:**
- System 1: spaCy transformer model (`en_core_web_trf`)  
- System 2: BERT NER model (`dslim/bert-base-NER`)

**Methodology:** Apply both pre-trained models to NER-test.tsv and compare performance using standard NER evaluation metrics.

 Importing necessary libraries

In [1]:
import pandas as pd
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoTokenizer, AutoModelForTokenClassification, 
    TrainingArguments, DataCollatorForTokenClassification
)

import torch
from sklearn.metrics import classification_report
from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score
import spacy
from spacy.training import Example
import random


test_data = pd.read_csv('datasets/NER-test.tsv', sep='\t')
test_data.drop('sentence_id', axis=1, inplace=True)
print(test_data.head())

   token_id     token BIO_NER_tag
0         0        If           O
1         1    you're           O
2         2  visiting           O
3         3     Paris  B-LOCATION
4         4         ,           O


Data Set Loading and Preprocessing


In [2]:
class NERDatasetLoader:
    def __init__(self):
        self.datasets = {}
        self.label_mappings = {}
    
    def load_conll2003(self):
        """Load CoNLL-2003 dataset"""
        print("Loading CoNLL-2003 dataset...")
        dataset = load_dataset("conll2003")
        
        # Extract labels
        labels = dataset["train"].features["ner_tags"].feature.names
        self.label_mappings['conll2003'] = {i: label for i, label in enumerate(labels)}
        
        self.datasets['conll2003'] = {
            'train': dataset['train'],
            'validation': dataset['validation'],
            'test': dataset['test'],
            'labels': labels
        }
        print(f"CoNLL-2003 loaded. Labels: {labels}")
        return self.datasets['conll2003']
    
    def load_wikiann(self, language='en'):
        """Load WikiANN dataset for English"""
        print(f"Loading WikiANN ({language}) dataset...")
        dataset = load_dataset("wikiann", language)
        
        # Extract labels
        labels = dataset["train"].features["ner_tags"].feature.names
        self.label_mappings['wikiann'] = {i: label for i, label in enumerate(labels)}
        
        self.datasets['wikiann'] = {
            'train': dataset['train'],
            'validation': dataset['validation'],
            'test': dataset['test'],
            'labels': labels
        }
        print(f"WikiANN ({language}) loaded. Labels: {labels}")
        return self.datasets['wikiann']
    
    def load_wnut17(self):
        """Load WNUT-17 dataset"""
        print("Loading WNUT-17 dataset...")
        dataset = load_dataset("wnut_17")
        
        # Extract labels
        labels = dataset["train"].features["ner_tags"].feature.names
        self.label_mappings['wnut17'] = {i: label for i, label in enumerate(labels)}
        
        self.datasets['wnut17'] = {
            'train': dataset['train'],
            'validation': dataset['validation'],
            'test': dataset['test'],
            'labels': labels
        }
        print(f"WNUT-17 loaded. Labels: {labels}")
        return self.datasets['wnut17']
    
    def get_combined_labels(self):
        """Get all unique labels across datasets"""
        all_labels = set()
        for dataset_name, mapping in self.label_mappings.items():
            all_labels.update(mapping.values())
        return sorted(list(all_labels))

**Data preprocessing for training**

In [3]:
class NERDataPreprocessor:
    def __init__(self, tokenizer_name="bert-base-cased"):
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
        
    def align_labels_with_tokens(self, labels, word_ids):
        """Align word-level labels with sub-word tokens.

        Assumes the tag layout the datasets loaded above are expected to use:
        0 = O, odd ids = B- tags, even ids = the matching I- tags.
        """
        new_labels = []
        current_word = None
        for word_id in word_ids:
            if word_id is None:
                # Special tokens ([CLS], [SEP], padding) are ignored by the loss
                new_labels.append(-100)
            elif word_id != current_word:
                # First sub-word of a new word keeps the word's label
                new_labels.append(labels[word_id])
            else:
                # Continuation sub-word: a B- tag (odd id) becomes the matching I- tag
                label = labels[word_id]
                new_labels.append(label + 1 if label % 2 == 1 else label)
            current_word = word_id
        return new_labels
    
    def tokenize_and_align_labels(self, examples, label_all_tokens=True):
        """Tokenize and align labels for BERT-style models"""
        tokenized_inputs = self.tokenizer(
            examples["tokens"], 
            truncation=True, 
            is_split_into_words=True,
            padding=True,
            max_length=512
        )
        
        labels = []
        for i, label in enumerate(examples["ner_tags"]):
            word_ids = tokenized_inputs.word_ids(batch_index=i)
            aligned_labels = self.align_labels_with_tokens(label, word_ids)
            labels.append(aligned_labels)
        
        tokenized_inputs["labels"] = labels
        return tokenized_inputs
    
    def prepare_spacy_data(self, dataset):
        """Prepare data for spaCy training"""
        training_data = []
        
        for example in dataset:
            tokens = example['tokens']
            ner_tags = example['ner_tags']
            
            # Convert to spaCy format
            entities = []
            start_pos = 0
            
            for i, (token, tag) in enumerate(zip(tokens, ner_tags)):
                if tag != 0:  # Not 'O' tag
                    tag_name = dataset.features['ner_tags'].feature.names[tag]
                    if tag_name.startswith('B-'):
                        entity_start = start_pos
                        entity_label = tag_name[2:]
                        entity_end = start_pos + len(token)
                        
                        # Check for I- tags following this B- tag
                        j = i + 1
                        # Require an exact I-<label> match so that, e.g., I-PERSON does not extend a PER entity
                        while j < len(ner_tags) and dataset.features['ner_tags'].feature.names[ner_tags[j]] == f'I-{entity_label}':
                            entity_end = start_pos + len(' '.join(tokens[i:j+1]))
                            j += 1
                        
                        entities.append((entity_start, entity_end, entity_label))
                
                start_pos += len(token) + 1  # +1 for space
            
            text = ' '.join(tokens)
            training_data.append((text, {"entities": entities}))
        
        return training_data
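
To make the sub-word alignment step concrete, here is a minimal standalone sketch. It assumes the CoNLL-2003 label ids (0 = O, 1 = B-PER, 2 = I-PER, and so on) and uses a hand-written `word_ids` list in place of real tokenizer output. The helper name `align` and the sub-word split are illustrative assumptions, not part of the notebook.

```python
# Illustrative only: a hand-written word_ids list stands in for the tokenizer output.
def align(labels, word_ids):
    """Spread word-level label ids over sub-word tokens (odd id = B-, even id = I-)."""
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(-100)              # special token, ignored by the loss
        elif word_id != previous:
            aligned.append(labels[word_id])   # first piece keeps the word label
        else:
            label = labels[word_id]
            aligned.append(label + 1 if label % 2 == 1 else label)  # B- -> I-
        previous = word_id
    return aligned

# Words: ["Pharoah", "Sanders", "recorded"] with labels B-PER, I-PER, O
word_labels = [1, 2, 0]
# Assumption: "Pharoah" splits into three pieces and "Sanders" into two
word_ids = [None, 0, 0, 0, 1, 1, 2, None]
aligned_example = align(word_labels, word_ids)
print(aligned_example)  # [-100, 1, 2, 2, 2, 2, 0, -100]
```

Only the first piece of "Pharoah" keeps the B- tag, so an entity is never counted as starting more than once.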

In [4]:
# Load pre-trained spaCy model
try:
    nlp_spacy = spacy.load("en_core_web_trf")
    print("spaCy model loaded successfully")
except OSError:
    print("spaCy model not found. Please install with: python -m spacy download en_core_web_trf")
    raise  # Later cells need nlp_spacy, so stop here instead of failing later with a NameError

# Load pre-trained BERT NER pipeline
from transformers import pipeline
bert_ner = pipeline("ner", 
                   model="dslim/bert-base-NER", 
                   tokenizer="dslim/bert-base-NER",
                   aggregation_strategy="simple")
print("BERT NER pipeline loaded successfully")

# Test both models on a sample
test_text = "Apple Inc. is based in Cupertino, California. Tim Cook is the CEO."
print(f"Testing on: '{test_text}'")

# Test spaCy
doc = nlp_spacy(test_text)
print("spaCy entities:", [(ent.text, ent.label_) for ent in doc.ents])

# Test BERT
bert_entities = bert_ner(test_text)
print("BERT entities:", [(ent['word'], ent['entity_group']) for ent in bert_entities])


spaCy model loaded successfully


Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


BERT NER pipeline loaded successfully
Testing on: 'Apple Inc. is based in Cupertino, California. Tim Cook is the CEO.'
spaCy entities: [('Apple Inc.', 'ORG'), ('Cupertino', 'GPE'), ('California', 'GPE'), ('Tim Cook', 'PERSON')]
BERT entities: [('Apple Inc', 'ORG'), ('Cupertino', 'LOC'), ('California', 'LOC'), ('Tim Cook', 'PER')]
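
The sample output shows that the two systems use different label sets (`GPE`/`PERSON` for spaCy, `LOC`/`PER` for BERT), while the test file uses labels such as `LOCATION`. A direct string comparison against the gold tags therefore counts a correctly found entity as an error. One possible remedy is to normalize labels before comparing. The sketch below is a hedged illustration: `LABEL_MAP` and `normalize_tag` are hypothetical names, and the mapping choices (for example `FAC` to `LOCATION`) are assumptions that should be checked against the annotation guidelines.

```python
# Hypothetical mapping from each system's label set to the gold labels in NER-test.tsv.
# The choices below (e.g. GPE/FAC -> LOCATION) are assumptions, not part of the test data.
LABEL_MAP = {
    "PER": "PERSON",
    "LOC": "LOCATION",
    "GPE": "LOCATION",
    "FAC": "LOCATION",
}

def normalize_tag(tag, label_map=LABEL_MAP):
    """Rewrite the entity type of a BIO tag, leaving 'O' and unknown types untouched."""
    if tag == "O":
        return tag
    prefix, _, label = tag.partition("-")
    return f"{prefix}-{label_map.get(label, label)}"

normalized = [normalize_tag(t) for t in ["B-PER", "I-PER", "O", "B-GPE", "B-WORK_OF_ART"]]
print(normalized)  # ['B-PERSON', 'I-PERSON', 'O', 'B-LOCATION', 'B-WORK_OF_ART']
```

Applying such a function to each system's predicted tags before scoring would separate genuine recognition errors from differences in naming.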


In [5]:
def load_test_data():
    """Load and process the NER-test.tsv file"""
    test_df = pd.read_csv('datasets/NER-test.tsv', sep='\t')
    
    # Group by sentence_id to reconstruct sentences
    sentences = []
    sentence_data = []
    
    for sentence_id in test_df['sentence_id'].unique():
        sentence_tokens = test_df[test_df['sentence_id'] == sentence_id]
        
        tokens = sentence_tokens['token'].tolist()
        ner_tags = sentence_tokens['BIO_NER_tag'].tolist()
        sentence_text = ' '.join(tokens)
        
        sentences.append(sentence_text)
        sentence_data.append({
            'sentence_id': sentence_id,
            'tokens': tokens,
            'ner_tags': ner_tags,
            'sentence': sentence_text
        })
    
    return sentences, sentence_data

# Load the test data
test_sentences, test_data_structured = load_test_data()

print(f"Loaded {len(test_sentences)} test sentences")
print("First few examples:")

# Show first few examples
for i, data in enumerate(test_data_structured[:3]):
    print(f"Sentence {data['sentence_id']}: {data['sentence']}")
    print(f"True NER tags: {data['ner_tags']}")
    print()

Loaded 15 test sentences
First few examples:
Sentence 0: If you're visiting Paris , make sure to see the Louvre , as they exhibit the Mona Lisa !
True NER tags: ['O', 'O', 'O', 'B-LOCATION', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'O', 'O', 'O', 'O', 'O', 'B-WORK_OF_ART', 'I-WORK_OF_ART', 'O']

Sentence 1: Amazon , Google and Meta control a huge share of the technology market globally .
True NER tags: ['B-ORG', 'O', 'B-ORG', 'O', 'B-ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']

Sentence 2: Did you hear Pharoah Sanders recorded an album with Floating Points ?
True NER tags: ['O', 'O', 'O', 'B-PERSON', 'I-PERSON', 'O', 'O', 'O', 'O', 'B-PERSON', 'I-PERSON', 'O']



In [6]:
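def apply_spacy_ner(sentences):
    """Apply spaCy NER to sentences and return BIO tags.

    Note: spaCy re-tokenizes the joined sentence text, so its tokens can differ
    from the gold tokens (e.g. "you're" may be split in two), which shifts the
    tags that follow when they are later compared by position.
    """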
    results = []
    
    for sentence in sentences:
        doc = nlp_spacy(sentence)
        tokens = [token.text for token in doc]
        ner_tags = []
        
        for token in doc:
            if token.ent_type_:
                if token.ent_iob_ == 'B':
                    ner_tags.append(f"B-{token.ent_type_}")
                elif token.ent_iob_ == 'I':
                    ner_tags.append(f"I-{token.ent_type_}")
                else:
                    ner_tags.append('O')
            else:
                ner_tags.append('O')
        
        results.append({'tokens': tokens, 'ner_tags': ner_tags})
    
    return results

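def apply_bert_ner(sentences):
    """Apply BERT NER to sentences and return BIO tags.

    Note: entities are matched to whitespace tokens by substring, and only the
    first matching token is tagged (always as B-), so multi-token entities are
    cut short and short tokens can match by accident.
    """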
    results = []
    
    for sentence in sentences:
        # Get BERT predictions
        entities = bert_ner(sentence)
        
        # Simple tokenization for alignment
        tokens = sentence.split()
        ner_tags = ['O'] * len(tokens)
        
        # Map BERT entities to tokens
        for entity in entities:
            entity_text = entity['word'].replace('##', '')
            entity_label = entity['entity_group']
            
            # Find matching tokens
            for i, token in enumerate(tokens):
                if (entity_text.lower() in token.lower() or 
                    token.lower() in entity_text.lower()):
                    if ner_tags[i] == 'O':
                        ner_tags[i] = f"B-{entity_label}"
                    break
        
        results.append({'tokens': tokens, 'ner_tags': ner_tags})
    
    return results

# Apply both systems to test data
print("Applying spaCy NER...")
spacy_results = apply_spacy_ner(test_sentences)

print("Applying BERT NER...")
bert_results = apply_bert_ner(test_sentences)

print("Both NER systems applied successfully")
print(f"Processed {len(spacy_results)} sentences with spaCy")
print(f"Processed {len(bert_results)} sentences with BERT")

Applying spaCy NER...
Applying BERT NER...
Both NER systems applied successfully
Processed 15 sentences with spaCy
Processed 15 sentences with BERT
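
Matching entity text to tokens by substring is fragile. An alternative is to use the character offsets that an aggregated Hugging Face token-classification pipeline returns for each entity (`start` and `end`) and project them onto the whitespace tokens. The following is a minimal sketch of that idea, not the method used above: `entities_to_bio` is a hypothetical helper, and the entity list is hand-written to stand in for real pipeline output.

```python
def entities_to_bio(tokens, entities):
    """Project character-offset entities onto whitespace-joined tokens as BIO tags.

    `entities` is assumed to be a list of dicts with 'start', 'end' and
    'entity_group' keys, as returned by an aggregated token-classification pipeline.
    """
    # Character span of each token in ' '.join(tokens)
    spans, position = [], 0
    for token in tokens:
        spans.append((position, position + len(token)))
        position += len(token) + 1  # +1 for the joining space

    tags = ["O"] * len(tokens)
    for entity in entities:
        inside = False
        for i, (start, end) in enumerate(spans):
            # A token belongs to the entity if the two character spans overlap
            if start < entity["end"] and end > entity["start"]:
                tags[i] = ("I-" if inside else "B-") + entity["entity_group"]
                inside = True
    return tags

tokens = ["Did", "you", "hear", "Pharoah", "Sanders", "?"]
# Hand-written stand-in for pipeline output on "Did you hear Pharoah Sanders ?"
fake_entities = [{"entity_group": "PER", "start": 13, "end": 28}]
bio_tags = entities_to_bio(tokens, fake_entities)
print(bio_tags)  # ['O', 'O', 'O', 'B-PER', 'I-PER', 'O']
```

Because the sentence text is built with `' '.join(tokens)`, the token spans can be recomputed exactly, and every gold token receives one tag regardless of how the model splits words internally.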


In [7]:
def create_comparison_dataframe(test_data_structured, spacy_results, bert_results):
    """Create detailed comparison dataframe"""
    comparison_data = []
    
    for i, (test_data, spacy_result, bert_result) in enumerate(zip(test_data_structured, spacy_results, bert_results)):
        sentence_id = test_data['sentence_id']
        true_tokens = test_data['tokens']
        true_labels = test_data['ner_tags']
        
        spacy_tokens = spacy_result['tokens']
        spacy_labels = spacy_result['ner_tags']
        
        bert_tokens = bert_result['tokens']
        bert_labels = bert_result['ner_tags']
        
        # Compare by position, using the true tokens as reference.
        # This is only valid when all three token lists line up one-to-one.
        for j, token in enumerate(true_tokens):
            true_label = true_labels[j] if j < len(true_labels) else 'O'
            spacy_label = spacy_labels[j] if j < len(spacy_labels) else 'O'
            bert_label = bert_labels[j] if j < len(bert_labels) else 'O'
            
            comparison_data.append({
                'sentence_id': sentence_id,
                'token': token,
                'true_label': true_label,
                'spacy_pred': spacy_label,
                'bert_pred': bert_label,
                'spacy_correct': spacy_label == true_label,
                'bert_correct': bert_label == true_label,
                'systems_agree': spacy_label == bert_label
            })
    
    return pd.DataFrame(comparison_data)

# Create comparison dataframe
comparison_df = create_comparison_dataframe(test_data_structured, spacy_results, bert_results)

print("SYSTEM COMPARISON RESULTS:")
print("=" * 50)
print(f"Total tokens analyzed: {len(comparison_df)}")
print(f"spaCy accuracy: {comparison_df['spacy_correct'].mean():.4f}")
print(f"BERT accuracy: {comparison_df['bert_correct'].mean():.4f}")
print(f"System agreement: {comparison_df['systems_agree'].mean():.4f}")

print("\nSample Comparison (first 10 tokens):")
print(comparison_df[['token', 'true_label', 'spacy_pred', 'bert_pred', 'spacy_correct', 'bert_correct']].head(10))

# Show disagreements
disagreements = comparison_df[~comparison_df['systems_agree']]
print(f"\nSystem Disagreements ({len(disagreements)} tokens):")
if len(disagreements) > 0:
    print(disagreements[['token', 'true_label', 'spacy_pred', 'bert_pred']].head())

SYSTEM COMPARISON RESULTS:
Total tokens analyzed: 216
spaCy accuracy: 0.8380
BERT accuracy: 0.7454
System agreement: 0.7130

Sample Comparison (first 10 tokens):
      token  true_label spacy_pred bert_pred  spacy_correct  bert_correct
0        If           O          O         O           True          True
1    you're           O          O         O           True          True
2  visiting           O          O         O           True          True
3     Paris  B-LOCATION          O     B-LOC          False         False
4         ,           O      B-GPE         O          False          True
5      make           O          O         O           True          True
6      sure           O          O         O           True          True
7        to           O          O         O           True          True
8       see           O          O         O           True          True
9       the           O          O         O           True          True

System Disagreements (6
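
Token-level accuracy includes every `O` token, so it says little about how well whole entities are recovered. NER systems are more commonly compared with entity-level precision, recall and F1, where a prediction counts only if both its span and its label match exactly. The sketch below shows this for a single sentence with made-up tags; `bio_to_spans` and `entity_prf` are hypothetical helper names. The `seqeval` functions imported in the first cell should give comparable corpus-level scores.

```python
def bio_to_spans(tags):
    """Return the set of (start, end, label) entity spans encoded by a BIO sequence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

def entity_prf(gold_tags, pred_tags):
    """Exact-match precision, recall and F1 over entity spans for one sentence."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1

# Made-up example: the prediction truncates the two-token PERSON entity
gold = ["O", "B-PERSON", "I-PERSON", "O", "B-ORG"]
pred = ["O", "B-PERSON", "O", "O", "B-ORG"]
scores = entity_prf(gold, pred)
print(scores)  # (0.5, 0.5, 0.5)
```

In this example four of five tokens are tagged correctly (token accuracy 0.8), yet only one of the two gold entities is recovered, which the entity-level scores make visible.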

In [8]:
def extract_entities_from_bio(tokens, bio_tags):
    """Extract named entities from BIO-tagged tokens"""
    entities = []
    current_entity = None
    current_tokens = []
    
    for token, tag in zip(tokens, bio_tags):
        if tag.startswith('B-'):
            # Save previous entity if exists
            if current_entity and current_tokens:
                entities.append({
                    'text': ' '.join(current_tokens),
                    'label': current_entity,
                    'tokens': current_tokens.copy()
                })
            
            # Start new entity
            current_entity = tag[2:]  # Remove 'B-' prefix
            current_tokens = [token]
            
        elif tag.startswith('I-') and current_entity == tag[2:]:
            # Continue current entity
            current_tokens.append(token)
            
        else:
            # End current entity
            if current_entity and current_tokens:
                entities.append({
                    'text': ' '.join(current_tokens),
                    'label': current_entity,
                    'tokens': current_tokens.copy()
                })
            current_entity = None
            current_tokens = []
    
    # Don't forget last entity
    if current_entity and current_tokens:
        entities.append({
            'text': ' '.join(current_tokens),
            'label': current_entity,
            'tokens': current_tokens.copy()
        })
    
    return entities

# Extract entities for each system
print("🔍 NAMED ENTITY EXTRACTION RESULTS")
print("="*60)

for i, (test_data, spacy_result, bert_result) in enumerate(zip(test_data_structured, spacy_results, bert_results)):
    sentence = test_data['sentence']
    true_entities = extract_entities_from_bio(test_data['tokens'], test_data['ner_tags'])
    spacy_entities = extract_entities_from_bio(spacy_result['tokens'], spacy_result['ner_tags'])
    bert_entities = extract_entities_from_bio(bert_result['tokens'], bert_result['ner_tags'])
    
    print(f"\n📝 Sentence {test_data['sentence_id']}: {sentence}")
    print(f"True entities: {[(e['text'], e['label']) for e in true_entities]}")
    print(f"spaCy entities: {[(e['text'], e['label']) for e in spacy_entities]}")
    print(f"BERT entities: {[(e['text'], e['label']) for e in bert_entities]}")
    
    if i >= 4:  # Show first 5 sentences
        break

🔍 NAMED ENTITY EXTRACTION RESULTS

📝 Sentence 0: If you're visiting Paris , make sure to see the Louvre , as they exhibit the Mona Lisa !
True entities: [('Paris', 'LOCATION'), ('Louvre', 'ORG'), ('Mona Lisa', 'WORK_OF_ART')]
spaCy entities: [('Paris', 'GPE'), ('Louvre', 'FAC'), ('the Mona Lisa', 'WORK_OF_ART')]
BERT entities: [('Paris', 'LOC'), ('Louvre', 'ORG'), ('Mona', 'MISC')]

📝 Sentence 1: Amazon , Google and Meta control a huge share of the technology market globally .
True entities: [('Amazon', 'ORG'), ('Google', 'ORG'), ('Meta', 'ORG')]
spaCy entities: [('Amazon', 'ORG'), ('Google', 'ORG'), ('Meta', 'ORG')]
BERT entities: [('Amazon', 'ORG'), ('Google', 'ORG'), ('Meta', 'ORG')]

📝 Sentence 2: Did you hear Pharoah Sanders recorded an album with Floating Points ?
True entities: [('Pharoah Sanders', 'PERSON'), ('Floating Points', 'PERSON')]
spaCy entities: [('Pharoah Sanders', 'PERSON'), ('Floating Points', 'WORK_OF_ART')]
BERT entities: [('Pharoah', 'PER'), ('Sanders', 'PER'), (

In [10]:
# Detailed error analysis
print("FINAL ANALYSIS & ERROR PATTERNS")
print("=" * 60)

# Overall performance
spacy_accuracy = comparison_df['spacy_correct'].mean()
bert_accuracy = comparison_df['bert_correct'].mean()
agreement_rate = comparison_df['systems_agree'].mean()

print(f"\nOverall Performance:")
print(f"spaCy accuracy: {spacy_accuracy:.4f}")
print(f"BERT accuracy: {bert_accuracy:.4f}")
print(f"System agreement: {agreement_rate:.4f}")

# Performance by entity type
print(f"\nPerformance by Entity Type:")
for entity_type in ['PERSON', 'ORG', 'LOCATION', 'WORK_OF_ART', 'MISC']:
    entity_mask = comparison_df['true_label'].str.contains(entity_type, na=False)
    entity_tokens = comparison_df[entity_mask]
    
    if len(entity_tokens) > 0:
        spacy_acc = entity_tokens['spacy_correct'].mean()
        bert_acc = entity_tokens['bert_correct'].mean()
        print(f"{entity_type}: spaCy {spacy_acc:.3f}, BERT {bert_acc:.3f} ({len(entity_tokens)} tokens)")

# Error patterns
spacy_errors = comparison_df[~comparison_df['spacy_correct']]
bert_errors = comparison_df[~comparison_df['bert_correct']]

print(f"\nError Summary:")
print(f"spaCy errors: {len(spacy_errors)} tokens")
print(f"BERT errors: {len(bert_errors)} tokens")

if len(spacy_errors) > 0:
    print("\nMost common spaCy error patterns:")
    spacy_error_patterns = spacy_errors.groupby(['true_label', 'spacy_pred']).size().reset_index(name='count')
    print(spacy_error_patterns.sort_values('count', ascending=False).head())

if len(bert_errors) > 0:
    print("\nMost common BERT error patterns:")
    bert_error_patterns = bert_errors.groupby(['true_label', 'bert_pred']).size().reset_index(name='count')
    print(bert_error_patterns.sort_values('count', ascending=False).head())

# Final recommendations
better_system = "spaCy" if spacy_accuracy > bert_accuracy else "BERT"
print(f"\nConclusions:")
print(f"Best performing system: {better_system}")
print(f"Recommendations:")
print("1. Use ensemble approach combining both systems")
print("2. Add post-processing rules for specific domains")
print("3. Consider fine-tuning on domain-specific data")

# Create summary table
summary_data = {
    'System': ['spaCy', 'BERT'],
    'Accuracy': [spacy_accuracy, bert_accuracy],
    'Error_Count': [len(spacy_errors), len(bert_errors)]
}
summary_df = pd.DataFrame(summary_data)
print(f"\nSummary Table:")
print(summary_df.round(4))

FINAL ANALYSIS & ERROR PATTERNS

Overall Performance:
spaCy accuracy: 0.8380
BERT accuracy: 0.7454
System agreement: 0.7130

Performance by Entity Type:
PERSON: spaCy 0.600, BERT 0.000 (25 tokens)
ORG: spaCy 0.462, BERT 0.385 (13 tokens)
LOCATION: spaCy 0.000, BERT 0.000 (5 tokens)
WORK_OF_ART: spaCy 1.000, BERT 0.000 (14 tokens)

Error Summary:
spaCy errors: 35 tokens
BERT errors: 55 tokens

Most common spaCy error patterns:
    true_label spacy_pred  count
7     B-PERSON          O      4
11    I-PERSON   B-PERSON      3
8   I-LOCATION      I-GPE      2
21           O   I-PERSON      2
20           O     I-DATE      2

Most common BERT error patterns:
       true_label bert_pred  count
14       I-PERSON         O      9
16  I-WORK_OF_ART         O      7
5        B-PERSON     B-PER      7
12          I-ORG         O      4
13       I-PERSON     B-PER      4

Conclusions:
Best performing system: spaCy
Recommendations:
1. Use ensemble approach combining both systems
2. Add post-process