In [1]:
!pip install transformers
!pip install transformers[sentencepiece]
!pip install datasets
!pip install evaluate

[0m

# Token classification Task as follows:
**Named entity recognition (NER)**: Find the entities (such as persons, locations, or organizations) in a sentence. This can be formulated as attributing a label to each token by having one class per entity and one class for “no entity.”  
**Part-of-speech tagging (POS)**: Mark each word in a sentence as corresponding to a particular part of speech (such as noun, verb, adjective, etc.).  
**Chunking**: Find the tokens that belong to the same entity. This task (which can be combined with POS or NER) can be formulated as attributing one label (usually B-) to any tokens that are at the beginning of a chunk, another label (usually I-) to tokens that are inside a chunk, and a third label (usually O) to tokens that don’t belong to any chunk.

# we will fine-tune a model (BERT) on a NER task

In [2]:
# Preparing the data
# First things first, we need a dataset suitable for token classification. In this section we will use the CoNLL-2003 dataset, which contains news stories from Reuters.
from datasets import load_dataset

raw_datasets = load_dataset("conll2003")

  0%|          | 0/3 [00:00<?, ?it/s]

In [3]:
raw_datasets

DatasetDict({
    train: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags'],
        num_rows: 14042
    })
    validation: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags'],
        num_rows: 3251
    })
    test: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags'],
        num_rows: 3454
    })
})

In [4]:
# Let’s have a look at the first element of the training set:
raw_datasets["train"][0]["tokens"]

['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']

In [5]:
# Since we want to perform named entity recognition, we will look at the NER tags:
raw_datasets["train"][0]["ner_tags"]

[3, 0, 7, 0, 0, 0, 7, 0, 0]

Those are the labels as integers ready for training, but they’re not necessarily useful when we want to inspect the data. Like for text classification, we can access the correspondence between those integers and the label names by looking at the features attribute of our dataset:

In [6]:
ner_feature = raw_datasets["train"].features["ner_tags"]
ner_feature

Sequence(feature=ClassLabel(num_classes=9, names=['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC'], id=None), length=-1, id=None)

In [7]:
# The type of the elements of the sequence is in the feature attribute of this ner_feature
label_names = ner_feature.feature.names
label_names

['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']

O means the word doesn’t correspond to any entity.  
B-PER/I-PER means the word corresponds to the beginning of/is inside a person entity.  
B-ORG/I-ORG means the word corresponds to the beginning of/is inside an organization entity.  
B-LOC/I-LOC means the word corresponds to the beginning of/is inside a location entity.  
B-MISC/I-MISC means the word corresponds to the beginning of/is inside a miscellaneous entity.  

In [8]:
# Now decoding the labels we saw earlier gives us this:
words = raw_datasets["train"][0]["tokens"]
labels = raw_datasets["train"][0]["ner_tags"]
line1 = ""
line2 = ""
for word, label in zip(words, labels):
    full_label = label_names[label]
    max_length = max(len(word), len(full_label))
    line1 += word + " " * (max_length - len(word) + 1)
    line2 += full_label + " " * (max_length - len(full_label) + 1)

print(line1)
print(line2)

EU    rejects German call to boycott British lamb . 
B-ORG O       B-MISC O    O  O       B-MISC  O    O 


In [9]:
# let’s create our tokenizer object
from transformers import AutoTokenizer

model_checkpoint = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

In [10]:
# To tokenize a pre-tokenized input, we can use our tokenizer as usual and just add is_split_into_words=True:
inputs = tokenizer(raw_datasets["train"][0]["tokens"], is_split_into_words=True)
inputs.tokens()

['[CLS]',
 'EU',
 'rejects',
 'German',
 'call',
 'to',
 'boycott',
 'British',
 'la',
 '##mb',
 '.',
 '[SEP]']

In [11]:
# because we’re using a fast tokenizer we have access to the 🤗 Tokenizers superpowers, which means we can easily map each token to its corresponding word
inputs.word_ids()

[None, 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, None]

In [12]:
# The first rule we’ll apply is that special tokens get a label of -100. This is because by default -100 is an index that is ignored in the loss function we will use (cross entropy)
def align_labels_with_tokens(labels, word_ids):
    new_labels = []
    current_word = None
    for word_id in word_ids:
        if word_id != current_word:
            # Start of a new word!
            current_word = word_id
            label = -100 if word_id is None else labels[word_id]
            new_labels.append(label)
        elif word_id is None:
            # Special token
            new_labels.append(-100)
        else:
            # Same word as previous token
            label = labels[word_id]
            # If the label is B-XXX we change it to I-XXX
            if label % 2 == 1:
                label += 1
            new_labels.append(label)

    return new_labels

In [13]:
labels = raw_datasets["train"][0]["ner_tags"]
word_ids = inputs.word_ids()
print(labels)
print(align_labels_with_tokens(labels, word_ids))

[3, 0, 7, 0, 0, 0, 7, 0, 0]
[-100, 3, 0, 7, 0, 0, 0, 7, 0, 0, 0, -100]


To take advantage of the speed of our fast tokenizer, it’s best to tokenize lots of texts at the same time, so we’ll write a function that processes a list of examples and use the Dataset.map() method with the option batched=True. 

In [14]:
def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(
        examples["tokens"], truncation=True, is_split_into_words=True
    )
    all_labels = examples["ner_tags"]
    new_labels = []
    for i, labels in enumerate(all_labels):
        word_ids = tokenized_inputs.word_ids(i)
        new_labels.append(align_labels_with_tokens(labels, word_ids))

    tokenized_inputs["labels"] = new_labels
    return tokenized_inputs

In [15]:
tokenized_datasets = raw_datasets.map(
    tokenize_and_align_labels,
    batched=True,
    remove_columns=raw_datasets["train"].column_names,
)

  0%|          | 0/15 [00:00<?, ?ba/s]

  0%|          | 0/4 [00:00<?, ?ba/s]

  0%|          | 0/4 [00:00<?, ?ba/s]

# Fine-tuning the model with the Trainer API

In [16]:
# Data collation
from transformers import DataCollatorForTokenClassification

data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

In [17]:
# To test this on a few samples, we can just call it on a list of examples from our tokenized training set:
batch = data_collator([tokenized_datasets["train"][i] for i in range(2)])
batch["labels"]

tensor([[-100,    3,    0,    7,    0,    0,    0,    7,    0,    0,    0, -100],
        [-100,    1,    2, -100, -100, -100, -100, -100, -100, -100, -100, -100]])

In [18]:
for i in range(2):
    print(tokenized_datasets["train"][i]["labels"])

[-100, 3, 0, 7, 0, 0, 0, 7, 0, 0, 0, -100]
[-100, 1, 2, -100]


# Metrics

In [19]:
# The traditional framework used to evaluate token classification prediction is seqeval. To use this metric, we first need to install the seqeval library:
!pip install seqeval

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[0m

In [20]:
import evaluate

metric = evaluate.load("seqeval")

This metric does not behave like the standard accuracy: it will actually take the lists of labels as strings, not integers, so we will need to fully decode the predictions and labels before passing them to the metric. Let’s see how it works. First, we’ll get the labels for our first training example:

In [21]:
labels = raw_datasets["train"][0]["ner_tags"]
labels = [label_names[i] for i in labels]
labels

['B-ORG', 'O', 'B-MISC', 'O', 'O', 'O', 'B-MISC', 'O', 'O']

In [22]:
# We can then create fake predictions for those by just changing the value at index 2:
predictions = labels.copy()
predictions[2] = "O"
metric.compute(predictions=[predictions], references=[labels])

{'MISC': {'precision': 1.0,
  'recall': 0.5,
  'f1': 0.6666666666666666,
  'number': 2},
 'ORG': {'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'number': 1},
 'overall_precision': 1.0,
 'overall_recall': 0.6666666666666666,
 'overall_f1': 0.8,
 'overall_accuracy': 0.8888888888888888}

This compute_metrics() function first takes the argmax of the logits to convert them to predictions (as usual, the logits and the probabilities are in the same order, so we don’t need to apply the softmax). Then we have to convert both labels and predictions from integers to strings. We remove all the values where the label is -100, then pass the results to the metric.compute() method:

In [23]:
import numpy as np


def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)

    # Remove ignored index (special tokens) and convert to labels
    true_labels = [[label_names[l] for l in label if l != -100] for label in labels]
    true_predictions = [
        [label_names[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    all_metrics = metric.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": all_metrics["overall_precision"],
        "recall": all_metrics["overall_recall"],
        "f1": all_metrics["overall_f1"],
        "accuracy": all_metrics["overall_accuracy"],
    }

# Defining the model

In [24]:
id2label = {i: label for i, label in enumerate(label_names)}
label2id = {v: k for k, v in id2label.items()}

In [25]:
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    model_checkpoint,
    id2label=id2label,
    label2id=label2id,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForTokenClassification: ['cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-cas

In [26]:
model.config.num_labels

9

# Fine-tuning the model

In [27]:
# If you’re working in a notebook, there’s a convenience function to help you with this:
from huggingface_hub import notebook_login

notebook_login()

# If you aren’t working in a notebook, just type the following line in your terminal:
#huggingface-cli login

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [28]:
from transformers import TrainingArguments

args = TrainingArguments(
    "bert-finetuned-ner",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    num_train_epochs=3,
    weight_decay=0.01,
    push_to_hub=True,
)

In [29]:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "true"

from transformers import Trainer

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)
trainer.train()

/kaggle/working/bert-finetuned-ner is already a clone of https://huggingface.co/Arron/bert-finetuned-ner. Make sure you pull the latest changes with `repo.git_pull()`.
***** Running training *****
  Num examples = 14042
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 5268
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[34m[1mwandb[0m: Currently logged in as: [33mw-s-h-z-d[0m. Use [1m`wandb login --relogin`[0m to force relogin


Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy
1,0.0845,0.062636,0.914606,0.933692,0.924051,0.982693
2,0.0414,0.056099,0.932077,0.949175,0.940549,0.986063
3,0.0198,0.060736,0.931946,0.949512,0.940647,0.986048


***** Running Evaluation *****
  Num examples = 3251
  Batch size = 8
Saving model checkpoint to bert-finetuned-ner/checkpoint-1756
Configuration saved in bert-finetuned-ner/checkpoint-1756/config.json
Model weights saved in bert-finetuned-ner/checkpoint-1756/pytorch_model.bin
tokenizer config file saved in bert-finetuned-ner/checkpoint-1756/tokenizer_config.json
Special tokens file saved in bert-finetuned-ner/checkpoint-1756/special_tokens_map.json
tokenizer config file saved in bert-finetuned-ner/tokenizer_config.json
Special tokens file saved in bert-finetuned-ner/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 3251
  Batch size = 8
Saving model checkpoint to bert-finetuned-ner/checkpoint-3512
Configuration saved in bert-finetuned-ner/checkpoint-3512/config.json
Model weights saved in bert-finetuned-ner/checkpoint-3512/pytorch_model.bin
tokenizer config file saved in bert-finetuned-ner/checkpoint-3512/tokenizer_config.json
Special tokens file saved in bert-fi

TrainOutput(global_step=5268, training_loss=0.06633575277610901, metrics={'train_runtime': 666.9508, 'train_samples_per_second': 63.162, 'train_steps_per_second': 7.899, 'total_flos': 920831298449616.0, 'train_loss': 0.06633575277610901, 'epoch': 3.0})

In [30]:
trainer.push_to_hub(commit_message="Training complete")

Saving model checkpoint to bert-finetuned-ner
Configuration saved in bert-finetuned-ner/config.json
Model weights saved in bert-finetuned-ner/pytorch_model.bin
tokenizer config file saved in bert-finetuned-ner/tokenizer_config.json
Special tokens file saved in bert-finetuned-ner/special_tokens_map.json
Several commits (2) will be pushed upstream.
The progress bars may be unreliable.


Upload file pytorch_model.bin:   0%|          | 32.0k/411M [00:00<?, ?B/s]

Upload file runs/Feb08_10-03-47_14c0196b45b7/events.out.tfevents.1675850635.14c0196b45b7.6593.0: 100%|########…

remote: Scanning LFS files for validity...        
remote: LFS file scan complete.        
To https://huggingface.co/Arron/bert-finetuned-ner
   bb50ff3..d26a63d  main -> main

To https://huggingface.co/Arron/bert-finetuned-ner
   d26a63d..d902721  main -> main



'https://huggingface.co/Arron/bert-finetuned-ner/commit/d26a63d9b4c7cf581f535f1c287fd1430071942b'

# A custom training loop

In [31]:
from torch.utils.data import DataLoader

train_dataloader = DataLoader(
    tokenized_datasets["train"],
    shuffle=True,
    collate_fn=data_collator,
    batch_size=8,
)
eval_dataloader = DataLoader(
    tokenized_datasets["validation"], collate_fn=data_collator, batch_size=8
)

In [32]:
model = AutoModelForTokenClassification.from_pretrained(
    model_checkpoint,
    id2label=id2label,
    label2id=label2id,
)

Exception in thread SystemMonitor:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/internal/system/system_monitor.py", line 118, in _start
    asset.start()
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/internal/system/assets/cpu.py", line 166, in start
    self.metrics_monitor.start()
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/internal/system/assets/interfaces.py", line 168, in start
    logger.info(f"Started {self._process.name}")
AttributeError: 'NoneType' object has no attribute 'name'

loading configuration file https://huggingface.co/bert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/a803e0468a8fe090683bdc453f4fac622804f49de86d7cecaee92365d4a0f829.a64a22196690e0e82ead56f

In [33]:
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=2e-5)

In [34]:
from accelerate import Accelerator

accelerator = Accelerator()
model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader
)

Exception in thread SystemMonitor:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/internal/system/system_monitor.py", line 118, in _start
    asset.start()
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/internal/system/assets/cpu.py", line 166, in start
    self.metrics_monitor.start()
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/internal/system/assets/interfaces.py", line 168, in start
    logger.info(f"Started {self._process.name}")
AttributeError: 'NoneType' object has no attribute 'name'



In [35]:
from transformers import get_scheduler

num_train_epochs = 3
num_update_steps_per_epoch = len(train_dataloader)
num_training_steps = num_train_epochs * num_update_steps_per_epoch

lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)

In [36]:
from huggingface_hub import Repository, get_full_repo_name

model_name = "bert-finetuned-ner-accelerate"
repo_name = get_full_repo_name(model_name)
repo_name

'Arron/bert-finetuned-ner-accelerate'

In [37]:
output_dir = "bert-finetuned-ner-accelerate"
repo = Repository(output_dir, clone_from=repo_name)

/kaggle/working/bert-finetuned-ner-accelerate is already a clone of https://huggingface.co/Arron/bert-finetuned-ner-accelerate. Make sure you pull the latest changes with `repo.git_pull()`.


# Training loop

In [38]:
def postprocess(predictions, labels):
    predictions = predictions.detach().cpu().clone().numpy()
    labels = labels.detach().cpu().clone().numpy()

    # Remove ignored index (special tokens) and convert to labels
    true_labels = [[label_names[l] for l in label if l != -100] for label in labels]
    true_predictions = [
        [label_names[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    return true_labels, true_predictions

In [39]:
from tqdm.auto import tqdm
import torch

progress_bar = tqdm(range(num_training_steps))

for epoch in range(num_train_epochs):
    # Training
    model.train()
    for batch in train_dataloader:
        outputs = model(**batch)
        loss = outputs.loss
        accelerator.backward(loss)

        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)

    # Evaluation
    model.eval()
    for batch in eval_dataloader:
        with torch.no_grad():
            outputs = model(**batch)

        predictions = outputs.logits.argmax(dim=-1)
        labels = batch["labels"]

        # Necessary to pad predictions and labels for being gathered
        predictions = accelerator.pad_across_processes(predictions, dim=1, pad_index=-100)
        labels = accelerator.pad_across_processes(labels, dim=1, pad_index=-100)

        predictions_gathered = accelerator.gather(predictions)
        labels_gathered = accelerator.gather(labels)

        true_predictions, true_labels = postprocess(predictions_gathered, labels_gathered)
        metric.add_batch(predictions=true_predictions, references=true_labels)

    results = metric.compute()
    print(
        f"epoch {epoch}:",
        {
            key: results[f"overall_{key}"]
            for key in ["precision", "recall", "f1", "accuracy"]
        },
    )

    # Save and upload
    accelerator.wait_for_everyone()
    unwrapped_model = accelerator.unwrap_model(model)
    unwrapped_model.save_pretrained(output_dir, save_function=accelerator.save)
    if accelerator.is_main_process:
        tokenizer.save_pretrained(output_dir)
        repo.push_to_hub(
            commit_message=f"Training in progress epoch {epoch}", blocking=False
        )

Exception in thread SystemMonitor:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/internal/system/system_monitor.py", line 118, in _start
    asset.start()
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/internal/system/assets/cpu.py", line 166, in start
    self.metrics_monitor.start()
  File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/internal/system/assets/interfaces.py", line 168, in start
    logger.info(f"Started {self._process.name}")
AttributeError: 'NoneType' object has no attribute 'name'



  0%|          | 0/5268 [00:00<?, ?it/s]

Configuration saved in bert-finetuned-ner-accelerate/config.json


epoch 0: {'precision': 0.9335240659710535, 'recall': 0.9031260175838489, 'f1': 0.9180734856007944, 'accuracy': 0.9830605757343851}


Model weights saved in bert-finetuned-ner-accelerate/pytorch_model.bin
tokenizer config file saved in bert-finetuned-ner-accelerate/tokenizer_config.json
Special tokens file saved in bert-finetuned-ner-accelerate/special_tokens_map.json
Several commits (2) will be pushed upstream.
Configuration saved in bert-finetuned-ner-accelerate/config.json


epoch 1: {'precision': 0.9432850891955571, 'recall': 0.9197571381686905, 'f1': 0.9313725490196079, 'accuracy': 0.9837228468829105}


Model weights saved in bert-finetuned-ner-accelerate/pytorch_model.bin
tokenizer config file saved in bert-finetuned-ner-accelerate/tokenizer_config.json
Special tokens file saved in bert-finetuned-ner-accelerate/special_tokens_map.json
Configuration saved in bert-finetuned-ner-accelerate/config.json


epoch 2: {'precision': 0.9476607202961965, 'recall': 0.9263036683665077, 'f1': 0.9368604941352633, 'accuracy': 0.9860187201977983}


Model weights saved in bert-finetuned-ner-accelerate/pytorch_model.bin
tokenizer config file saved in bert-finetuned-ner-accelerate/tokenizer_config.json
Special tokens file saved in bert-finetuned-ner-accelerate/special_tokens_map.json


In [40]:
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(output_dir, save_function=accelerator.save)

Configuration saved in bert-finetuned-ner-accelerate/config.json
Model weights saved in bert-finetuned-ner-accelerate/pytorch_model.bin


# Using the fine-tuned model

In [41]:
from transformers import pipeline

# Replace this with your own checkpoint
model_checkpoint = "huggingface-course/bert-finetuned-ner"
token_classifier = pipeline(
    "token-classification", model=model_checkpoint, aggregation_strategy="simple"
)
token_classifier("My name is Sylvain and I work at Hugging Face in Brooklyn.")

https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpvmohmmxu


Downloading:   0%|          | 0.00/1.01k [00:00<?, ?B/s]

storing https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/89b5cbe52d0bb00707c92604aa32923af1e03e431c6dc7755afa8e12736b8611.61f3554ca83a9cd2ab0687ff37a7d0f4065bc064c4f297e5d2d42e322e4c5f26
creating metadata file for /root/.cache/huggingface/transformers/89b5cbe52d0bb00707c92604aa32923af1e03e431c6dc7755afa8e12736b8611.61f3554ca83a9cd2ab0687ff37a7d0f4065bc064c4f297e5d2d42e322e4c5f26
loading configuration file https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/89b5cbe52d0bb00707c92604aa32923af1e03e431c6dc7755afa8e12736b8611.61f3554ca83a9cd2ab0687ff37a7d0f4065bc064c4f297e5d2d42e322e4c5f26
Model config BertConfig {
  "_name_or_path": "huggingface-course/bert-finetuned-ner",
  "architectures": [
    "BertForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpoint

Downloading:   0%|          | 0.00/411M [00:00<?, ?B/s]

storing https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/53be13866b5e6a7bf270c198a504ba00a1d6c99765a5425c8d8a6f67474c59db.34b5567d0f0878a5afa5157f6e9b2fe7742287d066ae18b16a646cba224cb46f
creating metadata file for /root/.cache/huggingface/transformers/53be13866b5e6a7bf270c198a504ba00a1d6c99765a5425c8d8a6f67474c59db.34b5567d0f0878a5afa5157f6e9b2fe7742287d066ae18b16a646cba224cb46f
loading weights file https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/53be13866b5e6a7bf270c198a504ba00a1d6c99765a5425c8d8a6f67474c59db.34b5567d0f0878a5afa5157f6e9b2fe7742287d066ae18b16a646cba224cb46f
All model checkpoint weights were used when initializing BertForTokenClassification.

All the weights of BertForTokenClassification were initialized from the model checkpoint at huggingface-course/bert-finetuned-ner.
If your task 

Downloading:   0%|          | 0.00/320 [00:00<?, ?B/s]

storing https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/b0df7b2f0fed938ea1c03e3bfce55b08731d98d1eb6ca196178bfeb9203c7507.0bbe47aa0e39b09ed05a95f7d42a27299232ce8e9ef28608e8f8a1cb57a74c0a
creating metadata file for /root/.cache/huggingface/transformers/b0df7b2f0fed938ea1c03e3bfce55b08731d98d1eb6ca196178bfeb9203c7507.0bbe47aa0e39b09ed05a95f7d42a27299232ce8e9ef28608e8f8a1cb57a74c0a
https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmps1h3rivv


Downloading:   0%|          | 0.00/208k [00:00<?, ?B/s]

storing https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/dbadb9cd77a95bc4e921c9457f6c9f87f9654e89c139503e43f3c6abd4aef018.437aa611e89f6fc6675a049d2b5545390adbc617e7d655286421c191d2be2791
creating metadata file for /root/.cache/huggingface/transformers/dbadb9cd77a95bc4e921c9457f6c9f87f9654e89c139503e43f3c6abd4aef018.437aa611e89f6fc6675a049d2b5545390adbc617e7d655286421c191d2be2791
https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpn9nytcfh


Downloading:   0%|          | 0.00/426k [00:00<?, ?B/s]

storing https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/450ab56275366591009a03ebe21bfa2523a89a590ac2e4b569920025b849ecf5.1f9d100b22551a7009fb51f1fadb9158af5db04f4c188aceecaa745a1917c983
creating metadata file for /root/.cache/huggingface/transformers/450ab56275366591009a03ebe21bfa2523a89a590ac2e4b569920025b849ecf5.1f9d100b22551a7009fb51f1fadb9158af5db04f4c188aceecaa745a1917c983
https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/special_tokens_map.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmprv_4u3yk


Downloading:   0%|          | 0.00/112 [00:00<?, ?B/s]

storing https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/special_tokens_map.json in cache at /root/.cache/huggingface/transformers/1f7a04f6385a04a9c60686046244d8daaa06489d154d6523ca28c5d8430c74c0.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d
creating metadata file for /root/.cache/huggingface/transformers/1f7a04f6385a04a9c60686046244d8daaa06489d154d6523ca28c5d8430c74c0.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d
loading file https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/dbadb9cd77a95bc4e921c9457f6c9f87f9654e89c139503e43f3c6abd4aef018.437aa611e89f6fc6675a049d2b5545390adbc617e7d655286421c191d2be2791
loading file https://huggingface.co/huggingface-course/bert-finetuned-ner/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/450ab56275366591009a03ebe21bfa2523a89a590ac2e4b569920025b849ecf5.1f9d100b22551a7009fb5

[{'entity_group': 'PER',
  'score': 0.9988506,
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': 0.9647625,
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': 0.9986118,
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]