## 1. BERT (Bidirectional Encoder Representations from Transformers)
https://www.analyticsvidhya.com/blog/2022/09/fine-tuning-bert-with-masked-language-modeling/
- A deeper stack of Transformer encoder layers (12 in BERT-base, 24 in BERT-large), with no decoder.
<div>
<img src="03_images/04_bert_01.png" width="500">
</div>


## 2. MLM and NSP
- MLM (masked language modeling) is a pre-training or domain-adaptation objective: a fraction of the input tokens is masked, and the model is trained to predict the original tokens at the masked positions.
- NSP (next sentence prediction) is a binary classification objective: given two sentences, the model predicts whether the second one actually follows the first.
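
Both objectives can be illustrated without any model. The sketch below is a simplified illustration, not BERT's exact recipe: the original BERT procedure selects about 15% of tokens and replaces 80% of those with `[MASK]`, 10% with a random token, and 10% with the original token, while this version always uses `[MASK]`. The function names `mask_tokens` and `make_nsp_pairs` and the toy sentences are assumptions made for the example.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels).

    labels[i] holds the original token where a mask was applied and None elsewhere,
    so the loss would only be computed on masked positions.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


def make_nsp_pairs(sentences, seed=0):
    """Build (sentence_a, sentence_b, is_next) examples for next sentence prediction.

    Each sentence contributes one positive pair (its true successor) and, when
    possible, one negative pair (a randomly chosen other sentence).
    """
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        pairs.append((sentences[i], sentences[i + 1], 1))
        candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
        if candidates:
            pairs.append((sentences[i], rng.choice(candidates), 0))
    return pairs


tokens = "thousands of demonstrators have marched through london".split()
print(mask_tokens(tokens, mask_prob=0.3, seed=1))

docs = ["They marched to Hyde Park.", "Police counted the marchers.", "The protest was peaceful."]
for pair in make_nsp_pairs(docs):
    print(pair)
```

In practice the masking is done on subword ids by a data collator rather than on whitespace tokens.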

In [1]:
import torch
# from transformers.utils import logging
# logging.enable_progress_bar()
# from transformers import logging

# logging.set_verbosity_error()



In [2]:
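import ast

import pandas as pd

df = pd.read_csv('TalkFile_ner_2.csv.csv').iloc[:300, :]
# The Tag column stores each tag list as a string; parse it without executing arbitrary code.
df['Tag'] = df['Tag'].apply(ast.literal_eval)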

In [3]:
list_all_tag = df.Tag.to_list()

In [4]:
from itertools import chain
list_labels = ['O'] + [i for i in list(set(chain.from_iterable(list_all_tag))) if i !='O']
label2ind = {}
ind2label = {}
for ind,i in enumerate(list_labels):
    label2ind[i]=ind
    ind2label[ind]=i

In [5]:
label2ind

{'O': 0,
 'I-geo': 1,
 'I-gpe': 2,
 'I-art': 3,
 'B-geo': 4,
 'B-per': 5,
 'B-gpe': 6,
 'I-tim': 7,
 'B-eve': 8,
 'B-org': 9,
 'I-org': 10,
 'B-art': 11,
 'I-per': 12,
 'B-tim': 13,
 'I-nat': 14,
 'B-nat': 15,
 'I-eve': 16}

In [6]:
# df['Sentence'].to_list()
labels_ind_list = df['Tag'].apply(lambda x: 
                [label2ind[i] for i in x]
               ).to_list()

text_list = df['Sentence'].apply(lambda x:x.split(' ')).to_list()

data_dict = {'id':list(range(len(text_list))),'tokens':text_list,'ner_tags':labels_ind_list}


In [7]:
new_df = pd.DataFrame(data_dict)
new_df.head()

|   | id | tokens | ner_tags |
|---|----|--------|----------|
| 0 | 0 | [Thousands, of, demonstrators, have, marched, ... | [0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 4, 0, 0, ... |
| 1 | 1 | [Families, of, soldiers, killed, in, the, conf... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 2 | 2 | [They, marched, from, the, Houses, of, Parliam... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 1, 0] |
| 3 | 3 | [Police, put, the, number, of, marchers, at, 1... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] |
| 4 | 4 | [The, protest, comes, on, the, eve, of, the, a... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 9, ... |


In [8]:
from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer

model = AutoModelForTokenClassification.from_pretrained(
    "distilbert/distilbert-base-uncased", num_labels=17, id2label=ind2label, label2id=label2ind
)
# Freeze the embedding layer; only the transformer blocks and the classifier head are fine-tuned.
for name, param in model.named_parameters():
    if name.startswith("distilbert.embeddings"):
        param.requires_grad = False

Some weights of the model checkpoint at distilbert/distilbert-base-uncased were not used when initializing DistilBertForTokenClassification: ['vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_transform.bias', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_projector.weight']
- This IS expected if you are initializing DistilBertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForTokenClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You s

In [9]:
def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)

    labels = []
    for i, label in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)  # Map tokens to their respective word.
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:  # Set the special tokens to -100.
            if word_idx is None:
                label_ids.append(-100)
            elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                label_ids.append(label[word_idx])
            else:
                label_ids.append(-100)
            previous_word_idx = word_idx
        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
    return tokenized_inputs
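
The alignment rule in `tokenize_and_align_labels` can be traced by hand on a single sentence. The sketch below reproduces only the inner loop; the `word_ids` list and the subword split it represents are invented for illustration and are not actual tokenizer output.

```python
def align_labels(word_ids, word_labels, ignore_index=-100):
    """Give each word's label to its first subword token; mask everything else."""
    aligned = []
    previous = None
    for word_idx in word_ids:
        if word_idx is None:            # special tokens such as [CLS] and [SEP]
            aligned.append(ignore_index)
        elif word_idx != previous:      # first subword of a new word
            aligned.append(word_labels[word_idx])
        else:                           # later subwords of the same word
            aligned.append(ignore_index)
        previous = word_idx
    return aligned


# Hypothetical tokenization of ["marched", "to", "London"] as
# [CLS] marched to lon ##don [SEP]
word_ids = [None, 0, 1, 2, 2, None]
word_labels = [0, 0, 4]  # O, O, B-geo in the label2ind mapping above
print(align_labels(word_ids, word_labels))  # [-100, 0, 0, 4, -100, -100]
```

Positions set to -100 are ignored by the cross-entropy loss in PyTorch, so each word contributes exactly one prediction to training and evaluation.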


In [10]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

In [11]:
from sklearn.model_selection import train_test_split
train_df,test_df = train_test_split(new_df,test_size=0.2,random_state=42)

In [12]:
import datasets
dataset_dict = datasets.DatasetDict()
dataset_dict['train'] = datasets.Dataset.from_pandas(train_df)
dataset_dict['test'] = datasets.Dataset.from_pandas(test_df)



tokenized_dataset = dataset_dict.map(tokenize_and_align_labels, batched=True)

Map:   0%|                         | 0/240 [00:00<?, ? examples/s]Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
                                                                  

In [13]:
example = tokenized_dataset['train'][0]


In [14]:
import evaluate
import numpy as np
from transformers import DataCollatorForTokenClassification

data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
seqeval = evaluate.load("seqeval")

# Human-readable tags of the sample example, for a quick sanity check.
labels = [ind2label[i] for i in example["ner_tags"]]


def compute_metrics(p):
    predictions, labels = p
    predictions = np.argmax(predictions, axis=2)

    true_predictions = [
        [ind2label[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    true_labels = [
        [ind2label[l] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]

    results = seqeval.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
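
`compute_metrics` does two things: it drops positions labeled -100, and it scores whole entities rather than individual tokens. The sketch below imitates both steps in plain Python under the assumption of strict span matching. It is not seqeval's implementation and may differ from it in edge cases; the helper names and the toy ids are made up for the example.

```python
def strip_ignored(pred_ids, label_ids, id2label, ignore_index=-100):
    """Drop positions whose gold label is ignore_index and map ids to tag strings."""
    preds, golds = [], []
    for p, l in zip(pred_ids, label_ids):
        if l != ignore_index:
            preds.append(id2label[p])
            golds.append(id2label[l])
    return preds, golds


def extract_spans(tags):
    """Return the set of (entity_type, start, end) spans in a BIO sequence (end exclusive)."""
    spans = set()
    start, kind = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes the last span
        continues = tag.startswith("I-") and kind == tag[2:]
        if kind is not None and not continues:
            spans.add((kind, start, i))
            start, kind = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and kind is None):
            start, kind = i, tag[2:]
    return spans


def score(gold_tags, pred_tags):
    """Entity-level precision, recall, F1 plus token-level accuracy."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(g == p for g, p in zip(gold_tags, pred_tags)) / len(gold_tags)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


id2label = {0: "O", 1: "I-geo", 4: "B-geo", 5: "B-per"}  # subset of ind2label above
label_ids = [-100, 0, 4, 1, -100, 0, 5, -100]

# A model that predicts "O" everywhere: some tokens are right, no entity is found.
preds, golds = strip_ignored([0] * 8, label_ids, id2label)
print(score(golds, preds))

# A model that finds one of the two entities.
preds, golds = strip_ignored([0, 0, 4, 1, 0, 0, 0, 0], label_ids, id2label)
print(score(golds, preds))
```

The first case shows how a model that only predicts `O` can have non-zero token accuracy while precision, recall, and F1 are all zero.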

In [15]:
training_args = TrainingArguments(
    output_dir=".",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=7,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()

The following columns in the training set don't have a corresponding argument in `DistilBertForTokenClassification.forward` and have been ignored: ner_tags, id, __index_level_0__, tokens. If ner_tags, id, __index_level_0__, tokens are not expected by `DistilBertForTokenClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 240
  Num Epochs = 7
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 105


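| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|-------|---------------|-----------------|-----------|--------|----|----------|
| 1 | No log | 0.819988 | 0.0 | 0.0 | 0.0 | 0.851304 |
| 2 | No log | 0.658659 | 0.0 | 0.0 | 0.0 | 0.851304 |
| 3 | No log | 0.506105 | 0.0 | 0.0 | 0.0 | 0.851304 |
| 4 | No log | 0.404733 | 0.491379 | 0.398601 | 0.440154 | 0.894996 |
| 5 | No log | 0.35348 | 0.47619 | 0.48951 | 0.482759 | 0.909091 |
| 6 | No log | 0.327936 | 0.493421 | 0.524476 | 0.508475 | 0.916138 |
| 7 | No log | 0.32124 | 0.471338 | 0.517483 | 0.493333 | 0.917548 |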


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForTokenClassification.forward` and have been ignored: ner_tags, id, __index_level_0__, tokens. If ner_tags, id, __index_level_0__, tokens are not expected by `DistilBertForTokenClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 60
  Batch size = 16
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
Saving model checkpoint to ./checkpoint-15
Configuration saved in ./checkpoint-15/config.json
Model weights saved in ./checkpoint-15/pytorch_model.bin
tokenizer config file saved in ./checkpoint-15/tokenizer_config.json
Special tokens file saved in ./checkpoint-15/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForTokenClassification.forward` and have been ignored: ner_tags, id, __index_level_0__, tokens. If ner_t

TrainOutput(global_step=105, training_loss=0.6173745291573661, metrics={'train_runtime': 105.4581, 'train_samples_per_second': 15.93, 'train_steps_per_second': 0.996, 'total_flos': 18394411378944.0, 'train_loss': 0.6173745291573661, 'epoch': 7.0})

In [16]:

def tokenize_and_align_labels2(examples):
    tokenized_inputs = tokenizer2(examples["tokens"], truncation=True, is_split_into_words=True)

    labels = []
    for i, label in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)  # Map tokens to their respective word.
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:  # Set the special tokens to -100.
            if word_idx is None:
                label_ids.append(-100)
            elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                label_ids.append(label[word_idx])
            else:
                label_ids.append(-100)
            previous_word_idx = word_idx
        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
    return tokenized_inputs

model2 = AutoModelForTokenClassification.from_pretrained(
    "distilbert/distilbert-base-uncased", num_labels=17, id2label=ind2label, label2id=label2ind
)

for name, param in model2.named_parameters():
#     print(name)
    if name.startswith("distilbert.embeddings"):
        param.requires_grad = False

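tokenizer2 = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
data = new_df['tokens'].to_list()
# train_new_from_iterator returns a new tokenizer and leaves tokenizer2 unchanged.
# The return value is discarded here, so the tokenizer saved below keeps the original vocabulary.
tokenizer2.train_new_from_iterator(data, vocab_size=tokenizer.vocab_size)
# Save the tokenizer and reload it from disk
tokenizer2.save_pretrained('distilbert_new')
tokenizer2 = AutoTokenizer.from_pretrained("distilbert_new")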

import datasets
dataset_dict = datasets.DatasetDict()
dataset_dict['train'] = datasets.Dataset.from_pandas(train_df)
dataset_dict['test'] = datasets.Dataset.from_pandas(test_df)

tokenized_dataset = dataset_dict.map(tokenize_and_align_labels2, batched=True)

training_args = TrainingArguments(
    output_dir=".",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=7,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False,
)
data_collator2 = DataCollatorForTokenClassification(tokenizer=tokenizer2)
trainer = Trainer(
    model=model2,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer2,
    data_collator=data_collator2,
    compute_metrics=compute_metrics,
)

trainer.train()

loading configuration file https://huggingface.co/distilbert/distilbert-base-uncased/resolve/main/config.json from cache at /Users/tchun/.cache/huggingface/transformers/9156cd487ebc07b22755262799b39fcdc0d5ae65bb62a1c8dc21ebe3f74bbf58.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333
Model config DistilBertConfig {
  "_name_or_path": "distilbert/distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "O",
    "1": "I-geo",
    "2": "I-gpe",
    "3": "I-art",
    "4": "B-geo",
    "5": "B-per",
    "6": "B-gpe",
    "7": "I-tim",
    "8": "B-eve",
    "9": "B-org",
    "10": "I-org",
    "11": "B-art",
    "12": "I-per",
    "13": "B-tim",
    "14": "I-nat",
    "15": "B-nat",
    "16": "I-eve"
  },
  "initializer_range": 0.02,
  "label2id": {
    "B-art": 11,
    "B-eve": 8,
    "B-geo": 4,
    "B-gpe": 6,
    "B-nat": 






Map:   0%|                         | 0/240 [00:00<?, ? examples/s]Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
PyTorch: setting up devices                                       
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the training set don't have a corresponding argument in `DistilBertForTokenClassification.forward` and have been ignored: ner_tags, id, __index_level_0__, tokens. If ner_tags, id, __index_level_0__, tokens are not expected by `DistilBertForTokenClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 240
  Num Epochs = 7
  Instantaneous batch size per device = 16
  Total train ba

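| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|-------|---------------|-----------------|-----------|--------|----|----------|
| 1 | No log | 0.85656 | 0.0 | 0.0 | 0.0 | 0.851304 |
| 2 | No log | 0.683308 | 0.0 | 0.0 | 0.0 | 0.851304 |
| 3 | No log | 0.532425 | 1.0 | 0.006993 | 0.013889 | 0.852008 |
| 4 | No log | 0.41518 | 0.369565 | 0.237762 | 0.289362 | 0.881607 |
| 5 | No log | 0.349942 | 0.508333 | 0.426573 | 0.463878 | 0.902044 |
| 6 | No log | 0.319192 | 0.575758 | 0.531469 | 0.552727 | 0.917548 |
| 7 | No log | 0.311327 | 0.571429 | 0.531469 | 0.550725 | 0.918252 |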


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForTokenClassification.forward` and have been ignored: ner_tags, id, __index_level_0__, tokens. If ner_tags, id, __index_level_0__, tokens are not expected by `DistilBertForTokenClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 60
  Batch size = 16
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
Saving model checkpoint to ./checkpoint-15
Configuration saved in ./checkpoint-15/config.json
Model weights saved in ./checkpoint-15/pytorch_model.bin
tokenizer config file saved in ./checkpoint-15/tokenizer_config.json
Special tokens file saved in ./checkpoint-15/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForTokenClassification.forward` and have been ignored: ner_tags, id, __index_level_0__, tokens. If ner_t

TrainOutput(global_step=105, training_loss=0.6600258963448661, metrics={'train_runtime': 106.0596, 'train_samples_per_second': 15.84, 'train_steps_per_second': 0.99, 'total_flos': 18394411378944.0, 'train_loss': 0.6600258963448661, 'epoch': 7.0})

### M3(MLM) + Trained_Tokenizer

In [17]:
from transformers import Trainer, TrainingArguments
from transformers import DataCollatorForLanguageModeling,DataCollatorForWholeWordMask


class TokenizedSentencesDataset:
    """Tokenizes sentences on access; optionally caches the tokenized result in place."""

    def __init__(self, sentences, tokenizer, max_length, cache_tokenization=False):
        self.tokenizer = tokenizer
        self.sentences = sentences
        self.max_length = max_length
        self.cache_tokenization = cache_tokenization

    def __getitem__(self, item):
        if not self.cache_tokenization:
            return self.tokenizer(self.sentences[item], add_special_tokens=True, truncation=True, max_length=self.max_length, return_special_tokens_mask=True)

        # Tokenize once, then overwrite the raw string with the tokenized result.
        if isinstance(self.sentences[item], str):
            self.sentences[item] = self.tokenizer(self.sentences[item], add_special_tokens=True, truncation=True, max_length=self.max_length, return_special_tokens_mask=True)
        return self.sentences[item]

    def __len__(self):
        return len(self.sentences)


max_length = 100
mlm_prob = 0.15
sentences = df['Sentence'].to_list()
train_dataset = TokenizedSentencesDataset(sentences[:260], tokenizer2, max_length)
# Use the remaining sentences as the dev set, if there are any.
dev_dataset = TokenizedSentencesDataset(sentences[260:], tokenizer2, max_length, cache_tokenization=True) if len(sentences[260:]) > 0 else None


do_whole_word_mask = True
if do_whole_word_mask:
  data_collator = DataCollatorForWholeWordMask(tokenizer=tokenizer2, mlm=True, mlm_probability=mlm_prob)
else:
  data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer2, mlm=True, mlm_probability=mlm_prob)
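
The difference between the two collators is the unit that gets masked: single subword tokens, or all pieces of a word together. The sketch below shows the whole-word idea on WordPiece-style tokens, where continuation pieces start with `##`. It is a simplification and not the library's implementation: it decides per word with a fixed probability, always substitutes `[MASK]`, and does not treat special tokens separately. The function names and the example tokens are assumptions.

```python
import random


def group_word_pieces(tokens):
    """Group token indices so that '##' continuation pieces stay with their word."""
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def whole_word_mask(tokens, mlm_probability=0.15, seed=0, mask_token="[MASK]"):
    """Mask every piece of a word together, choosing words with mlm_probability."""
    rng = random.Random(seed)
    masked = list(tokens)
    for group in group_word_pieces(tokens):
        if rng.random() < mlm_probability:
            for i in group:
                masked[i] = mask_token
    return masked


tokens = ["the", "lon", "##don", "march", "was", "peace", "##ful"]
print(group_word_pieces(tokens))  # [[0], [1, 2], [3], [4], [5, 6]]
print(whole_word_mask(tokens, mlm_probability=0.5, seed=2))
```

With token-level masking, `lon` could be masked while `##don` stays visible, which makes the prediction much easier; whole-word masking removes that shortcut.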

In [18]:
from transformers import Trainer, TrainingArguments,AutoModelForMaskedLM
model3 = AutoModelForMaskedLM.from_pretrained("distilbert/distilbert-base-uncased")
training_args = TrainingArguments(
    output_dir=".",
    overwrite_output_dir=True,
    num_train_epochs=2,
    per_device_train_batch_size=16,
    save_steps=10_000,
    save_total_limit=2,
    prediction_loss_only=True,
)

trainer = Trainer(
    model=model3,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=dev_dataset
)

trainer.train()

loading configuration file https://huggingface.co/distilbert/distilbert-base-uncased/resolve/main/config.json from cache at /Users/tchun/.cache/huggingface/transformers/9156cd487ebc07b22755262799b39fcdc0d5ae65bb62a1c8dc21ebe3f74bbf58.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333
Model config DistilBertConfig {
  "_name_or_path": "distilbert/distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.21.2",
  "vocab_size": 30522
}

loading weights file https://huggingface.co/distilbert/distilbert-base-uncased/resolve/main/pytorch_model.bin from cache at /Users/tchu

Step,Training Loss




Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=34, training_loss=2.4421227399040673, metrics={'train_runtime': 51.3394, 'train_samples_per_second': 10.129, 'train_steps_per_second': 0.662, 'total_flos': 5733270001152.0, 'train_loss': 2.4421227399040673, 'epoch': 2.0})

In [19]:
model3.save_pretrained('./saved_model3')

Configuration saved in ./saved_model3/config.json
Model weights saved in ./saved_model3/pytorch_model.bin


In [20]:
model4 = AutoModelForTokenClassification.from_pretrained(
    'saved_model3', num_labels=17, id2label=ind2label, label2id=label2ind
)

loading configuration file saved_model3/config.json
Model config DistilBertConfig {
  "_name_or_path": "saved_model3",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "O",
    "1": "I-geo",
    "2": "I-gpe",
    "3": "I-art",
    "4": "B-geo",
    "5": "B-per",
    "6": "B-gpe",
    "7": "I-tim",
    "8": "B-eve",
    "9": "B-org",
    "10": "I-org",
    "11": "B-art",
    "12": "I-per",
    "13": "B-tim",
    "14": "I-nat",
    "15": "B-nat",
    "16": "I-eve"
  },
  "initializer_range": 0.02,
  "label2id": {
    "B-art": 11,
    "B-eve": 8,
    "B-geo": 4,
    "B-gpe": 6,
    "B-nat": 15,
    "B-org": 9,
    "B-per": 5,
    "B-tim": 13,
    "I-art": 3,
    "I-eve": 16,
    "I-geo": 1,
    "I-gpe": 2,
    "I-nat": 14,
    "I-org": 10,
    "I-per": 12,
    "I-tim": 7,
    "O": 0
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n

In [None]:



for name, param in model4.named_parameters():
#     print(name)
    if name.startswith("distilbert.embeddings"):
        param.requires_grad = False

import datasets
dataset_dict = datasets.DatasetDict()
dataset_dict['train'] = datasets.Dataset.from_pandas(train_df)
dataset_dict['test'] = datasets.Dataset.from_pandas(test_df)

tokenized_dataset = dataset_dict.map(tokenize_and_align_labels2, batched=True)

training_args = TrainingArguments(
    output_dir=".",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=7,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False,
)

trainer = Trainer(
    model=model4,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer2,
    data_collator=data_collator2,
    compute_metrics=compute_metrics,
)

trainer.train()

PyTorch: setting up devices                                       
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the training set don't have a corresponding argument in `DistilBertForTokenClassification.forward` and have been ignored: ner_tags, id, __index_level_0__, tokens. If ner_tags, id, __index_level_0__, tokens are not expected by `DistilBertForTokenClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 240
  Num Epochs = 7
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 105


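| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|-------|---------------|-----------------|-----------|--------|----|----------|
| 1 | No log | 0.833162 | 0.0 | 0.0 | 0.0 | 0.851304 |
| 2 | No log | 0.657969 | 0.0 | 0.0 | 0.0 | 0.851304 |
| 3 | No log | 0.49565 | 1.0 | 0.006993 | 0.013889 | 0.852008 |
| 4 | No log | 0.384723 | 0.514019 | 0.384615 | 0.44 | 0.892882 |
| 5 | No log | 0.333579 | 0.519737 | 0.552448 | 0.535593 | 0.914729 |
| 6 | No log | 0.309591 | 0.559748 | 0.622378 | 0.589404 | 0.926709 |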


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForTokenClassification.forward` and have been ignored: ner_tags, id, __index_level_0__, tokens. If ner_tags, id, __index_level_0__, tokens are not expected by `DistilBertForTokenClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 60
  Batch size = 16
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
Saving model checkpoint to ./checkpoint-15
Configuration saved in ./checkpoint-15/config.json
Model weights saved in ./checkpoint-15/pytorch_model.bin
tokenizer config file saved in ./checkpoint-15/tokenizer_config.json
Special tokens file saved in ./checkpoint-15/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForTokenClassification.forward` and have been ignored: ner_tags, id, __index_level_0__, tokens. If ner_t

## 3. GPL
- A semi-supervised learning technique that combines labeled data with pseudo-labeled data generated by a model, so that unlabeled data can also be used to improve performance.
- Like NSP, the final training step is a classification task, but here the labels for the unlabeled data are produced by the initial model: a prediction is kept as a pseudo-label only when its confidence score passes a threshold.
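
The confidence-threshold step can be sketched as follows. Everything here is hypothetical and chosen only for illustration: the `predict_proba` callable interface, the toy rule-based model standing in for the initial model, the example sentences, and the 0.9 threshold.

```python
def pseudo_label(unlabeled, predict_proba, threshold=0.9):
    """Keep (example, label) pairs whose top predicted probability reaches the threshold."""
    selected = []
    for example in unlabeled:
        probs = predict_proba(example)  # dict mapping label -> probability
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            selected.append((example, label))
    return selected


def toy_predict_proba(text):
    """Stand-in for the initial model: confident only when it sees a known place name."""
    if "London" in text:
        return {"geo": 0.95, "none": 0.05}
    return {"geo": 0.40, "none": 0.60}


labeled = [("They marched in Paris", "geo")]
unlabeled = ["Thousands marched through London", "Police counted the marchers"]

pseudo = pseudo_label(unlabeled, toy_predict_proba, threshold=0.9)
train_set = labeled + pseudo  # the model would then be retrained on this combined set
print(train_set)
```

Raising the threshold keeps fewer but cleaner pseudo-labels; lowering it adds more data at the cost of more label noise.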

## 4. Reinforcement Learning
<div>
<img src="03_images/04_rl_01.png" width="500">
</div>

Reinforcement learning (RL) involves an agent interacting with an environment, taking actions in states to maximize cumulative rewards. The agent learns from feedback in the form of rewards received after each action, adjusting its decision-making policies to improve performance over time. RL aims to find an optimal strategy (policy) for the agent to make decisions that lead to the highest long-term rewards.
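
As a minimal sketch of the loop described above, the code below runs tabular Q-learning on a toy five-state corridor. The environment, the reward of 1 for reaching the last state, and the hyperparameters are all assumptions made for this example; Q-learning is one common RL algorithm, picked here only for illustration.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal and ends the episode
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Environment: move along the corridor and pay a reward of 1 on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def discounted_return(rewards, gamma=0.9):
    """Cumulative reward in which later rewards are weighted by powers of gamma."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from (state, action, reward, next_state) feedback."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # explore at random; Q-learning is off-policy
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Policy: in each non-terminal state, pick the action with the highest value."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = q_learning()
print(greedy_policy(q_table))            # 1 means "move right"
print(discounted_return([0.0, 0.0, 1.0]))
```

The agent never sees the environment's rules; it only observes rewards, yet the learned policy heads toward the goal because states closer to it accumulate higher value.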