## 1. Classification Task
In this task you will perform Natural Language Inference on the KLUE Benchmark (https://klue-benchmark.com/tasks/68/overview/description). As described in the data section (https://klue-benchmark.com/tasks/68/data/description), the dataset covers six domains. We want to see how effective the Multilingual BERT-base model is at Domain Adaptation (https://en.wikipedia.org/wiki/Domain_adaptation). With `Airbnb` as the target domain, first train on the train split of the other domains and measure performance on the target domain's validation split. Second, train only on the target domain's train split, measure performance on the validation split, and compare the two numbers.

You may use any library, but the Hugging Face libraries are recommended.

In [None]:
!pip install -q datasets transformers

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# KLUE NLI dataset
from datasets import load_dataset
klue_nli = load_dataset('klue', 'nli')
print(klue_nli['train'][0])


{'guid': 'klue-nli-v1_train_00000', 'source': 'NSMC', 'premise': '힛걸 진심 최고다 그 어떤 히어로보다 멋지다', 'hypothesis': '힛걸 진심 최고로 멋지다.', 'label': 0}


In [None]:
# Multilingual BERT base model
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_name = 'bert-base-multilingual-cased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)  # KLUE NLI has three classes


Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model ch

# Unsupervised Domain Adaptation through Language Modeling

Borrowing part of the idea from UDALM (Karouzos et al., https://doi.org/10.48550/arXiv.2104.07078), Multilingual BERT is first domain-pretrained with MLM on Airbnb data and then fine-tuned for Natural Language Inference.

## *Pretraining*

### Prepare the Dataset

In [None]:
import pandas as pd
import numpy as np
import multiprocessing
import os
import glob
import math

from sklearn.model_selection import train_test_split
from datasets import Dataset, load_metric, ClassLabel, Sequence
from transformers import BertForMaskedLM, DataCollatorForLanguageModeling, Trainer, TrainingArguments,AutoModelForMaskedLM,AutoTokenizer

In [None]:
klue_nli.keys()

dict_keys(['train', 'validation'])

In [None]:
# Prepare the data for training on the other domains and evaluating on the validation split of the target domain (Airbnb)

## Combine the train and validation splits into one DataFrame so that every KLUE sentence is available
df= pd.concat([pd.DataFrame.from_dict(klue_nli['train']), pd.DataFrame.from_dict(klue_nli['validation'])])

len(df) == len(klue_nli['train']) + len(klue_nli['validation'])

True

In [None]:
df.head()

Unnamed: 0,guid,source,premise,hypothesis,label
0,klue-nli-v1_train_00000,NSMC,힛걸 진심 최고다 그 어떤 히어로보다 멋지다,힛걸 진심 최고로 멋지다.,0
1,klue-nli-v1_train_00001,NSMC,100분간 잘껄 그래도 소닉붐땜에 2점준다,100분간 잤다.,2
2,klue-nli-v1_train_00002,NSMC,100분간 잘껄 그래도 소닉붐땜에 2점준다,소닉붐이 정말 멋있었다.,1
3,klue-nli-v1_train_00003,NSMC,100분간 잘껄 그래도 소닉붐땜에 2점준다,100분간 자는게 더 나았을 것 같다.,1
4,klue-nli-v1_train_00004,airbnb,101빌딩 근처에 나름 즐길거리가 많습니다.,101빌딩 근처에서 즐길거리 찾기는 어렵습니다.,2


In [None]:
# Multilingual BERT is trained on source-domain data, so build a DataFrame that contains only the source domains
source_df = df.loc[df["source"] != "airbnb"]
source_df["source"].unique()

array(['NSMC', 'wikipedia', 'wikinews', 'policy', 'wikitree'],
      dtype=object)

Duplicate sentences are present, so duplicates within the source data are dropped and the premise and hypothesis columns are then merged into a single text column for domain pretraining.

In [None]:
len(source_df[source_df['premise'].duplicated()])

14997

In [None]:
len(source_df[source_df['hypothesis'].duplicated()])

5

In [None]:
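# NOTE: deduplication runs on `df` (all six domains, airbnb included), which is what the counts below reflect;
# use `source_df` here instead to restrict the pretraining text to the source domains.
df_premise = df.drop_duplicates(['premise'], keep = 'first')['premise']
df_hypothesis = df.drop_duplicates(['hypothesis'], keep = 'first')['hypothesis']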
print("Check premise duplicates : ", df_premise.duplicated().sum() ,"\nCheck data length", len(df_premise), "\n")
print("Check hypothesis duplicates : ", df_hypothesis.duplicated().sum() ,"\nCheck data length", len(df_hypothesis))

Check premise duplicates :  0 
Check data length 9387 

Check hypothesis duplicates :  0 
Check data length 27937


In [None]:
source_domain_df = pd.DataFrame(pd.concat([df_premise, df_hypothesis]), columns=['text'])
len(source_domain_df) == len(df_premise) + len(df_hypothesis)

True
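The deduplicate-then-merge step can be illustrated on a few rows without pandas. This is only a minimal sketch: `build_corpus` and the toy rows are hypothetical names introduced for illustration and are not part of the notebook.

```python
def build_corpus(rows):
    """Collect unique premises and unique hypotheses into one text list.

    Mirrors drop_duplicates(keep='first') on each column followed by concat:
    first-appearance order is preserved within each column, and the two
    columns are deduplicated independently of each other.
    """
    premises = list(dict.fromkeys(r["premise"] for r in rows))
    hypotheses = list(dict.fromkeys(r["hypothesis"] for r in rows))
    return premises + hypotheses


# Toy rows: one premise is shared by several hypotheses, as in KLUE NLI.
rows = [
    {"premise": "P1", "hypothesis": "H1"},
    {"premise": "P1", "hypothesis": "H2"},
    {"premise": "P1", "hypothesis": "H3"},
    {"premise": "P2", "hypothesis": "H1"},
]
corpus = build_corpus(rows)
print(corpus)  # ['P1', 'P2', 'H1', 'H2', 'H3']
```

Because each premise is paired with several hypotheses, most of the removed rows are repeated premises, which matches the much larger duplicate count seen for `premise` than for `hypothesis`.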

In [None]:
# HYPERPARAMS
SEED_SPLIT = 0
SEED_TRAIN = 0


In [None]:
# Convert to Dataset objects
# source: https://discuss.huggingface.co/t/from-pandas-dataframe-to-huggingface-dataset/9322


df_source_train, df_source_valid = train_test_split(
    source_domain_df, test_size=0.15, random_state = SEED_SPLIT
)

print(len(df_source_train), len(df_source_valid))

train_dataset = Dataset.from_pandas(df_source_train[['text']])
valid_dataset = Dataset.from_pandas(df_source_valid[['text']])

31725 5599


### Tokenize

In [None]:
max_len = np.max(df_source_train['text'].str.len())
print(max_len)

103


In [None]:
# The longest text in the added dataset is 103 characters
MAX_SEQ_LEN = 128
TRAIN_BATCH_SIZE = 16
EVAL_BATCH_SIZE = 16
LEARNING_RATE = 2e-5 
LR_WARMUP_STEPS = 100
WEIGHT_DECAY = 0.01


In [None]:
train_dataset = Dataset.from_pandas(df_source_train[['text']])
valid_dataset = Dataset.from_pandas(df_source_valid[['text']])

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=MAX_SEQ_LEN)  # `max_len` is the deprecated name of this argument
model = AutoModelForMaskedLM.from_pretrained(model_name)

Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [None]:
#source https://gist.github.com/March-08/1f61608d0ff8f014ecf2d1d3294c2fb3
def tokenize_function(row):
    return tokenizer(
        row['text'],
        padding='max_length',
        truncation=True,
        max_length=MAX_SEQ_LEN,
        return_special_tokens_mask=True)
  
column_names = train_dataset.column_names

train_dataset = train_dataset.map(
    tokenize_function,
    batched=True,
    # num_proc=multiprocessing.cpu_count(),
    remove_columns=column_names,
)

valid_dataset = valid_dataset.map(
    tokenize_function,
    batched=True,
    # num_proc=multiprocessing.cpu_count(),
    remove_columns=column_names,
)


### Training

In [None]:
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
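To make the `mlm_probability=0.15` setting concrete, the following is a rough pure-Python sketch of BERT-style dynamic masking. It illustrates the idea and is not the library's implementation; `mask_tokens`, the toy token ids, and the toy vocabulary size are assumptions made for this example.

```python
import random

IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def mask_tokens(token_ids, special_mask, mask_id, vocab_size, rng, mlm_probability=0.15):
    """Return (inputs, labels) after BERT-style masking of one sequence."""
    inputs, labels = list(token_ids), []
    for i, (tok, is_special) in enumerate(zip(token_ids, special_mask)):
        if is_special or rng.random() >= mlm_probability:
            labels.append(IGNORE_INDEX)  # not selected: no loss at this position
            continue
        labels.append(tok)  # selected: the model must predict the original id
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80% of selected tokens become the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10% become a random id
        # remaining 10%: the input token is left unchanged
    return inputs, labels


rng = random.Random(0)
ids = [101] + list(range(1000, 1020)) + [102]  # toy ids; 101/102 stand in for [CLS]/[SEP]
special = [1] + [0] * 20 + [1]                 # plays the role of special_tokens_mask
inputs, labels = mask_tokens(ids, special, mask_id=103, vocab_size=30000, rng=rng)
print(sum(label != IGNORE_INDEX for label in labels), "positions contribute to the loss")
```

This is also why the tokenizer call requests `return_special_tokens_mask=True`: the collator needs to know which positions must never be masked.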

In [None]:

steps_per_epoch = int(len(train_dataset) / TRAIN_BATCH_SIZE)

training_args = TrainingArguments(
    output_dir='./bert-KLUE-NLI',
    logging_dir='./LMlogs',             
    num_train_epochs=2,
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=TRAIN_BATCH_SIZE,
    per_device_eval_batch_size=EVAL_BATCH_SIZE,
    warmup_steps=LR_WARMUP_STEPS,
    save_steps=steps_per_epoch,
    save_total_limit=3,
    weight_decay=WEIGHT_DECAY,
    learning_rate=LEARNING_RATE, 
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='loss', 
    greater_is_better=False,
    seed=SEED_TRAIN
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    tokenizer=tokenizer,
)


In [None]:
trainer.train()
trainer.save_model("./model") #save your custom model

The following columns in the training set don't have a corresponding argument in `BertForMaskedLM.forward` and have been ignored: special_tokens_mask. If special_tokens_mask are not expected by `BertForMaskedLM.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 31725
  Num Epochs = 2
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 3966
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss


KeyboardInterrupt: ignored
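The `Total optimization steps = 3966` line in the log can be reproduced by hand. The sketch below assumes a single device and that the last partial batch of each epoch is kept; `total_optimization_steps` is a hypothetical helper written for this illustration.

```python
import math


def total_optimization_steps(num_examples, batch_size, epochs, grad_accum=1):
    """Optimizer steps when the last partial batch of each epoch is kept."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


print(total_optimization_steps(31725, 16, 2))  # 3966, as in the log
print(31725 // 16)                             # 1982, the floored value stored in steps_per_epoch
```

The floored `steps_per_epoch` is therefore one step short of a real epoch here. With `save_strategy='epoch'` this has no practical effect, since `save_steps` only applies when saving is step-based.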

### Perplexity 

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)  # cased checkpoint, so no lowercasing
model = AutoModelForMaskedLM.from_pretrained(model_name)

trainer = Trainer(
  model=model,
  data_collator=data_collator,
  #train_dataset=tokenized_dataset_2['train'],
  eval_dataset=valid_dataset,
  tokenizer=tokenizer,
  )

eval_results = trainer.evaluate()

print('Evaluation results: ', eval_results)
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.3f}")
print('----------------\n')

Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
The following columns in the evaluation set don't have a corresponding argument in `BertForMaskedLM.forward` and have been ignored: special_tokens_mask. If special_tokens_mask are not expected by `BertForMaskedLM.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 5599
  Batch size = 8


Evaluation results:  {'eval_loss': 2.8086740970611572, 'eval_runtime': 77.0522, 'eval_samples_per_second': 72.665, 'eval_steps_per_second': 9.085}
Perplexity: 16.588
----------------



In [None]:
path = "./model"

for modelpath in glob.iglob(path):
  print('Model: ', modelpath)
  tokenizer = AutoTokenizer.from_pretrained(modelpath, use_fast=False)  # cased checkpoint, so no lowercasing
  model = AutoModelForMaskedLM.from_pretrained(modelpath)

  trainer = Trainer(
    model=model,
    data_collator=data_collator,
    #train_dataset=tokenized_dataset_2['train'],
    eval_dataset=valid_dataset,
    tokenizer=tokenizer,
    )
  
  eval_results = trainer.evaluate()

  print('Evaluation results: ', eval_results)
  print(f"Perplexity: {math.exp(eval_results['eval_loss']):.3f}")
  print('----------------\n')

Model:  /content/drive/MyDrive/Supercoder/model


loading file vocab.txt
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading configuration file /content/drive/MyDrive/Supercoder/model/config.json
Model config BertConfig {
  "_name_or_path": "/content/drive/MyDrive/Supercoder/model",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.23.1"

Evaluation results:  {'eval_loss': 1.930310845375061, 'eval_runtime': 24.9389, 'eval_samples_per_second': 224.509, 'eval_steps_per_second': 28.069}
Perplexity: 6.892
----------------
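The two perplexities reported above are the exponential of the mean masked-token cross-entropy. A small sketch of that arithmetic follows, using the two `eval_loss` values copied from the outputs; the `perplexity` helper is introduced only for this example.

```python
import math


def perplexity(mean_cross_entropy):
    """Perplexity is the exponential of the mean cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)


base = perplexity(2.8086740970611572)    # eval_loss of the original checkpoint
adapted = perplexity(1.930310845375061)  # eval_loss of the model saved in ./model
print(f"{base:.3f} -> {adapted:.3f}")
print(f"relative reduction: {1 - adapted / base:.1%}")
```

Since the collator draws the masked positions at random, these figures are best read as approximate and may shift slightly between runs unless the seed is fixed.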



## Fine-Tuning

In [None]:
import glob
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification,AutoConfig


### HyperParameter

In [None]:
# Hyperparameters and variables
# KLUE NLI has three classes
NUM_LABELS = 3
MAX_SEQ_LEN = 128
TRAIN_BATCH_SIZE = 32
EVAL_BATCH_SIZE = 32
LEARNING_RATE = 2e-5
WEIGHT_DECAY = 0.01
SEED_TRAIN = 0
SEED_SPLIT = 0

In [None]:
path = "./model"

for modelpath in glob.iglob(path):
  print('Model: ', modelpath)
  tokenizer = AutoTokenizer.from_pretrained(modelpath)
  config = AutoConfig.from_pretrained(modelpath)
  config.num_labels = NUM_LABELS
  model = AutoModelForSequenceClassification.from_pretrained(modelpath, config=config)


In [None]:
train_df = pd.DataFrame.from_dict(klue_nli['train'])
val_df =  pd.DataFrame.from_dict(klue_nli['validation'])

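(1) With Airbnb as the target domain, train on the train split of the other domains and measure performance on the target domain's validation split.

(2) Then train only on the target domain's train split, measure performance on the validation split, and compare the two numbers.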

In [None]:
print(train_df.info(), "\n\n\n", val_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24998 entries, 0 to 24997
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   guid        24998 non-null  object
 1   source      24998 non-null  object
 2   premise     24998 non-null  object
 3   hypothesis  24998 non-null  object
 4   label       24998 non-null  int64 
dtypes: int64(1), object(4)
memory usage: 976.6+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   guid        3000 non-null   object
 1   source      3000 non-null   object
 2   premise     3000 non-null   object
 3   hypothesis  3000 non-null   object
 4   label       3000 non-null   int64 
dtypes: int64(1), object(4)
memory usage: 117.3+ KB
None 


 None


In [None]:
source_train_df = train_df.loc[train_df["source"] != "airbnb"]
source_val_df = val_df.loc[val_df["source"] != "airbnb"]
target_train_df = train_df[train_df["source"] =="airbnb"]
target_val_df = val_df.loc[val_df["source"] == "airbnb"]


### Preprocessing

In [None]:
def preprocess_tokenizer(row):
  return tokenizer(
      row['premise'],
      row['hypothesis'],
      padding='max_length',
      truncation=True,
      max_length=MAX_SEQ_LEN,
  )
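Passing `premise` and `hypothesis` as two arguments makes the tokenizer build a single sentence-pair input. The sketch below shows the resulting layout on pre-split toy tokens; `encode_pair` is a hypothetical helper, real inputs are WordPiece ids rather than strings, and truncation is deliberately not modeled.

```python
def encode_pair(premise_tokens, hypothesis_tokens, max_length, pad="[PAD]"):
    """Lay out a sentence pair the way BERT expects it, padded to max_length."""
    tokens = ["[CLS]", *premise_tokens, "[SEP]", *hypothesis_tokens, "[SEP]"]
    # Segment 0 covers [CLS] + premise + first [SEP]; segment 1 covers the rest.
    token_type_ids = [0] * (len(premise_tokens) + 2) + [1] * (len(hypothesis_tokens) + 1)
    attention_mask = [1] * len(tokens)
    n_pad = max_length - len(tokens)
    if n_pad < 0:
        raise ValueError("pair is longer than max_length; truncation is not modeled here")
    return {
        "tokens": tokens + [pad] * n_pad,
        "token_type_ids": token_type_ids + [0] * n_pad,
        "attention_mask": attention_mask + [0] * n_pad,
    }


enc = encode_pair(["a", "b"], ["c"], max_length=8)
print(enc["tokens"])
# ['[CLS]', 'a', 'b', '[SEP]', 'c', '[SEP]', '[PAD]', '[PAD]']
print(enc["token_type_ids"])  # [0, 0, 0, 0, 1, 1, 0, 0]
print(enc["attention_mask"])  # [1, 1, 1, 1, 1, 1, 0, 0]
```

Note that `MAX_SEQ_LEN` now has to cover both sentences plus three special tokens, so a length check made on single sentences does not carry over directly to pairs.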

In [None]:
source_train_dataset, source_eval_dataset = train_test_split(source_train_df, test_size=0.2, shuffle=True, random_state = SEED_SPLIT, stratify=source_train_df['label'])

#convert to Datasets
source_train_dataset = Dataset.from_pandas(source_train_dataset)
source_eval_dataset = Dataset.from_pandas(source_eval_dataset)
source_valid_dataset = Dataset.from_pandas(source_val_df)
target_train_dataset = Dataset.from_pandas(target_train_df)
target_valid_dataset = Dataset.from_pandas(target_val_df)

#Tokenize
encoded_source_train_dataset = source_train_dataset.map(preprocess_tokenizer, batched = True)
encoded_source_eval_dataset = source_eval_dataset.map(preprocess_tokenizer, batched = True)
encoded_source_valid_dataset =  source_valid_dataset.map(preprocess_tokenizer, batched = True)

encoded_target_train_dataset = target_train_dataset.map(preprocess_tokenizer, batched = True)
encoded_target_valid_dataset = target_valid_dataset.map(preprocess_tokenizer, batched = True)


### Train - Epoch 5

In [None]:
metric = load_metric("glue", "qnli")

In [None]:
# Take the highest-scoring class per example with argmax() and compare it with the reference label
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)
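For reference, the accuracy that `compute_metrics` reports can be written out in plain Python. This is a minimal sketch with made-up logits; `argmax_accuracy` is a hypothetical stand-in for the `np.argmax` plus metric call.

```python
def argmax_accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the reference label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Four toy examples with three class scores each.
logits = [[2.0, 0.1, -1.0], [0.2, 0.3, 0.1], [0.0, 0.5, 1.5], [1.0, 0.9, 0.8]]
labels = [0, 1, 2, 2]
print(argmax_accuracy(logits, labels))  # {'accuracy': 0.75}
```

The returned key is what `Trainer` exposes as `eval_accuracy`, and what `metric_for_best_model='accuracy'` refers to.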

In [None]:
steps_per_epoch = int(len(encoded_source_train_dataset) / TRAIN_BATCH_SIZE)

training_args = TrainingArguments(
    output_dir='./FineTuned_bert-KLUE-NLI',
    logging_dir='./FineTuned_LMlogs',             
    num_train_epochs=5,
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=TRAIN_BATCH_SIZE,
    per_device_eval_batch_size=EVAL_BATCH_SIZE,
    save_steps=steps_per_epoch,
    save_total_limit=10,
    weight_decay=WEIGHT_DECAY,
    learning_rate=LEARNING_RATE, 
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='accuracy', 
    greater_is_better=True,  # higher accuracy is better
    seed=SEED_TRAIN
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_source_train_dataset,
    eval_dataset=encoded_source_eval_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,

)


PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


In [None]:
trainer.train()
trainer.save_model("./FineTuned_NLI_model") #save your custom model

The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: hypothesis, guid, source, __index_level_0__, premise. If hypothesis, guid, source, __index_level_0__, premise are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 16139
  Num Epochs = 5
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 2525
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy
1,0.7677,0.654242,0.734572
2,0.5589,0.597393,0.756629
3,0.4133,0.667241,0.755886
4,0.3048,0.745804,0.763569
5,0.2267,0.813791,0.760595


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: hypothesis, guid, source, __index_level_0__, premise. If hypothesis, guid, source, __index_level_0__, premise are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 4035
  Batch size = 32
Saving model checkpoint to /content/drive/MyDrive/Supercoder/FineTuned_bert-KLUE-NLI/checkpoint-505
Configuration saved in /content/drive/MyDrive/Supercoder/FineTuned_bert-KLUE-NLI/checkpoint-505/config.json
Model weights saved in /content/drive/MyDrive/Supercoder/FineTuned_bert-KLUE-NLI/checkpoint-505/pytorch_model.bin
tokenizer config file saved in /content/drive/MyDrive/Supercoder/FineTuned_bert-KLUE-NLI/checkpoint-505/tokenizer_config.json
Special tokens file saved in /content/drive/MyDrive/Supercoder/FineTuned_bert-KLUE-NLI/checkpoint-505/special_tokens_map.json
T

In [None]:
trainer.evaluate()

### Train - Epoch 10

In [None]:
steps_per_epoch = int(len(encoded_source_train_dataset) / TRAIN_BATCH_SIZE)

training_args = TrainingArguments(
    output_dir='./FineTuned_bert-KLUE-NLI_epoch10',
    logging_dir='./FineTuned_LMlogs',             
    # Increased from 5 epochs because the difference was not clear
    num_train_epochs=10,
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=TRAIN_BATCH_SIZE,
    per_device_eval_batch_size=EVAL_BATCH_SIZE,
    save_steps=steps_per_epoch,
    save_total_limit=10,
    weight_decay=WEIGHT_DECAY,
    learning_rate=LEARNING_RATE, 
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='accuracy', 
    greater_is_better=True,  # higher accuracy is better
    seed=SEED_TRAIN
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_source_train_dataset,
    eval_dataset=encoded_source_eval_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,

)

trainer.train()
trainer.save_model("./FineTuned_NLI_model_epoch10") #save your custom model

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_0__, guid, premise, hypothesis are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 16139
  Num Epochs = 10
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 5050
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__ca

Epoch,Training Loss,Validation Loss,Accuracy
1,0.7653,0.644086,0.74052
2,0.5582,0.611307,0.757125
3,0.4238,0.673584,0.752416
4,0.3037,0.747325,0.758116
5,0.2156,0.862027,0.754647
6,0.1688,0.949517,0.760347
7,0.1199,1.12361,0.760099
8,0.0941,1.27838,0.752416
9,0.0677,1.364083,0.755638
10,0.0591,1.428429,0.759603
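In the table above, validation loss is lowest at epoch 2 and rises afterwards while training loss keeps falling, and accuracy stays flat at roughly 0.75 to 0.76. Which checkpoint `load_best_model_at_end` keeps depends on the chosen metric and on the direction given by `greater_is_better`. The sketch below replays that selection on the logged values; `best_epoch` is a hypothetical helper, not a Trainer API.

```python
# (epoch, validation loss, accuracy) copied from the 10-epoch log
history = [
    (1, 0.644086, 0.740520), (2, 0.611307, 0.757125), (3, 0.673584, 0.752416),
    (4, 0.747325, 0.758116), (5, 0.862027, 0.754647), (6, 0.949517, 0.760347),
    (7, 1.123610, 0.760099), (8, 1.278380, 0.752416), (9, 1.364083, 0.755638),
    (10, 1.428429, 0.759603),
]


def best_epoch(history, column, greater_is_better):
    """Return the epoch whose value in `column` is best under the given direction."""
    pick = max if greater_is_better else min
    return pick(history, key=lambda row: row[column])[0]


print(best_epoch(history, column=2, greater_is_better=True))   # 6  (highest accuracy)
print(best_epoch(history, column=2, greater_is_better=False))  # 1  (lowest accuracy)
print(best_epoch(history, column=1, greater_is_better=False))  # 2  (lowest validation loss)
```

Pairing an accuracy metric with `greater_is_better=False` would therefore keep the epoch-1 checkpoint, the weakest of the ten by accuracy.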


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_0__, guid, premise, hypothesis are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 4035
  Batch size = 32
Saving model checkpoint to /content/drive/MyDrive/Supercoder/FineTuned_bert-KLUE-NLI/checkpoint-505
Configuration saved in /content/drive/MyDrive/Supercoder/FineTuned_bert-KLUE-NLI/checkpoint-505/config.json
Model weights saved in /content/drive/MyDrive/Supercoder/FineTuned_bert-KLUE-NLI/checkpoint-505/pytorch_model.bin
tokenizer config file saved in /content/drive/MyDrive/Supercoder/FineTuned_bert-KLUE-NLI/checkpoint-505/tokenizer_config.json
Special tokens file saved in /content/drive/MyDrive/Supercoder/FineTuned_bert-KLUE-NLI/checkpoint-505/special_tokens_map.json
T

### Evaluate with Target Valid Dataset

In [None]:
path = "./FineTuned_NLI_model"

for modelpath in glob.iglob(path):
  print('Model: ', modelpath)
  tokenizer = AutoTokenizer.from_pretrained(modelpath)
  model = AutoModelForSequenceClassification.from_pretrained(modelpath)

  trainer = Trainer(
    model=model,
    #train_dataset=tokenized_dataset_2['train'],
    eval_dataset=encoded_target_valid_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics

    )
  
  
  eval_results = trainer.evaluate()

  print('Evaluation results: ', eval_results)
  print(f"Accuracy: {eval_results['eval_accuracy']:.3f}")
  print('----------------\n')

Model:  /content/drive/MyDrive/Supercoder/FineTuned_NLI_model


loading file vocab.txt
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading configuration file /content/drive/MyDrive/Supercoder/FineTuned_NLI_model/config.json
Model config BertConfig {
  "_name_or_path": "/content/drive/MyDrive/Supercoder/FineTuned_NLI_model",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_att

Evaluation results:  {'eval_loss': 0.8943072557449341, 'eval_accuracy': 0.6033333333333334, 'eval_runtime': 1.407, 'eval_samples_per_second': 426.448, 'eval_steps_per_second': 53.306}
Accuracy: 0.603
----------------



In [None]:
path = "./FineTuned_NLI_model_epoch10"

for modelpath in glob.iglob(path):
  print('Model: ', modelpath)
  tokenizer = AutoTokenizer.from_pretrained(modelpath)
  model = AutoModelForSequenceClassification.from_pretrained(modelpath)

  trainer = Trainer(
    model=model,
    #train_dataset=tokenized_dataset_2['train'],
    eval_dataset=encoded_target_valid_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics

    )
  
  
  eval_results = trainer.evaluate()

  print('Evaluation results: ', eval_results)
  print(f"Accuracy: {eval_results['eval_accuracy']:.3f}")
  print('----------------\n')

loading file vocab.txt
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading configuration file /content/drive/MyDrive/Supercoder/FineTuned_NLI_model_epoch10/config.json
Model config BertConfig {
  "_name_or_path": "/content/drive/MyDrive/Supercoder/FineTuned_NLI_model_epoch10",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
 

Model:  /content/drive/MyDrive/Supercoder/FineTuned_NLI_model_epoch10


All model checkpoint weights were used when initializing BertForSequenceClassification.

All the weights of BertForSequenceClassification were initialized from the model checkpoint at /content/drive/MyDrive/Supercoder/FineTuned_NLI_model_epoch10.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.
No `TrainingArguments` passed, using `output_dir=tmp_trainer`.
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_lev

Evaluation results:  {'eval_loss': 0.8465157151222229, 'eval_accuracy': 0.635, 'eval_runtime': 1.4377, 'eval_samples_per_second': 417.325, 'eval_steps_per_second': 52.166}
Accuracy: 0.635
----------------



# Train in Target Domain





## Pretraining

### Load Naive Multilingual Model

In [None]:
model_name = 'bert-base-multilingual-cased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)  # KLUE NLI has three classes
# The target-domain datasets were already tokenized in the Preprocessing step:
# encoded_target_train_dataset = target_train_dataset.map(preprocess_tokenizer, batched = True)
# encoded_target_valid_dataset = target_valid_dataset.map(preprocess_tokenizer, batched = True)

### Data Preprocessing

In [None]:
# The longest text in the added dataset is 103 characters
MAX_SEQ_LEN = 128
TRAIN_BATCH_SIZE = 16
EVAL_BATCH_SIZE = 16
LEARNING_RATE = 2e-5 
LR_WARMUP_STEPS = 100
WEIGHT_DECAY = 0.01


In [None]:
target_df = df.loc[df["source"] == "airbnb"]

# Deduplicate within the target domain only
df_premise = target_df.drop_duplicates(['premise'], keep = 'first')['premise']
df_hypothesis = target_df.drop_duplicates(['hypothesis'], keep = 'first')['hypothesis']

target_domain_df = pd.DataFrame(pd.concat([df_premise, df_hypothesis]), columns=['text'])

df_target_train, df_target_valid = train_test_split(
    target_domain_df, test_size=0.15, random_state = SEED_SPLIT
)


train_dataset = Dataset.from_pandas(df_target_train[['text']])
valid_dataset = Dataset.from_pandas(df_target_valid[['text']])

tokenizer = AutoTokenizer.from_pretrained(model_name, max_len=MAX_SEQ_LEN)
model = AutoModelForMaskedLM.from_pretrained(model_name)

#source https://gist.github.com/March-08/1f61608d0ff8f014ecf2d1d3294c2fb3
def tokenize_function(row):
    return tokenizer(
        row['text'],
        padding='max_length',
        truncation=True,
        max_length=MAX_SEQ_LEN,
        return_special_tokens_mask=True)
  
column_names = train_dataset.column_names

train_dataset = train_dataset.map(
    tokenize_function,
    batched=True,
    # num_proc=multiprocessing.cpu_count(),
    remove_columns=column_names,
)

valid_dataset = valid_dataset.map(
    tokenize_function,
    batched=True,
    # num_proc=multiprocessing.cpu_count(),
    remove_columns=column_names,
)

loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--bert-base-multilingual-cased/snapshots/cf732291d5a8eace7b973ccd13c95ec07b19e734/config.json
Model config BertConfig {
  "_name_or_path": "bert-base-multilingual-cased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "position_embedding_type": "absolute",
  "transformers_version": "4.23.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size":

  0%|          | 0/32 [00:00<?, ?ba/s]

  0%|          | 0/6 [00:00<?, ?ba/s]

### PreTrain

In [None]:
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)


steps_per_epoch = int(len(train_dataset) / TRAIN_BATCH_SIZE)

training_args = TrainingArguments(
    output_dir='./bert-KLUE-NLI',
    logging_dir='./LMlogs',             
    num_train_epochs=2,
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=TRAIN_BATCH_SIZE,
    per_device_eval_batch_size=EVAL_BATCH_SIZE,
    warmup_steps=LR_WARMUP_STEPS,
    save_steps=steps_per_epoch,
    save_total_limit=3,
    weight_decay=WEIGHT_DECAY,
    learning_rate=LEARNING_RATE, 
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='loss', 
    greater_is_better=False,
    seed=SEED_TRAIN
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    tokenizer=tokenizer,
)


PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
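
The collator configured above selects roughly 15% of the non-special tokens as prediction targets. As a rough illustration of the usual BERT-style rule (80% replaced by `[MASK]`, 10% replaced by a random token, 10% left unchanged), here is a dependency-free sketch. The function name `mask_tokens`, the token ids, and the vocabulary size are hypothetical stand-ins, not values taken from the tokenizer used in this notebook.

```python
import random

IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def mask_tokens(input_ids, special_mask, mask_id, vocab_size, rng, p=0.15):
    """Return (masked_ids, labels) for one sequence, BERT-style."""
    masked = list(input_ids)
    labels = [IGNORE_INDEX] * len(input_ids)
    for i, (tok, is_special) in enumerate(zip(input_ids, special_mask)):
        if is_special or rng.random() >= p:
            continue  # position not selected: no loss is computed here
        labels[i] = tok  # the model has to recover the original token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = mask_id  # 80%: replace with the mask token
        elif roll < 0.9:
            masked[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token in the input
    return masked, labels


rng = random.Random(0)
# Hypothetical ids: 101 and 102 stand in for [CLS] and [SEP], 103 for [MASK].
ids = [101] + list(range(1000, 1020)) + [102]
special = [1] + [0] * 20 + [1]
masked_ids, labels = mask_tokens(ids, special, mask_id=103, vocab_size=30000, rng=rng)
print(masked_ids)
print(labels)
```

Because the masking is random, the evaluation loss of a masked language model changes slightly from run to run unless the seed is fixed.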


In [None]:
trainer.train()
trainer.save_model("./target_model") #save your custom model

The following columns in the training set don't have a corresponding argument in `BertForMaskedLM.forward` and have been ignored: special_tokens_mask. If special_tokens_mask are not expected by `BertForMaskedLM.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 31725
  Num Epochs = 2
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 3966
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,2.2888,2.040955
2,2.0882,1.939319
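
The step counts in the log above can be reproduced by hand. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; `optimization_steps` is a hypothetical helper, not part of the Trainer API.

```python
import math


def optimization_steps(num_examples, batch_size, num_epochs):
    """Optimizer updates when the final partial batch of each epoch is kept."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return batches_per_epoch * num_epochs


per_epoch = optimization_steps(31725, 16, 1)
total = optimization_steps(31725, 16, 2)
floored = int(31725 / 16)  # what the steps_per_epoch expression in the cell computes

print(per_epoch, total, floored)
```

Under these assumptions the first checkpoint lands at step 1983 and training ends at step 3966, matching the log. The floored value is one step lower, which is harmless here only because `save_strategy='epoch'` makes the Trainer ignore `save_steps`.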


The following columns in the evaluation set don't have a corresponding argument in `BertForMaskedLM.forward` and have been ignored: special_tokens_mask. If special_tokens_mask are not expected by `BertForMaskedLM.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 5599
  Batch size = 16
Saving model checkpoint to ./bert-KLUE-NLI/checkpoint-1983
Configuration saved in ./bert-KLUE-NLI/checkpoint-1983/config.json
Model weights saved in ./bert-KLUE-NLI/checkpoint-1983/pytorch_model.bin
tokenizer config file saved in ./bert-KLUE-NLI/checkpoint-1983/tokenizer_config.json
Special tokens file saved in ./bert-KLUE-NLI/checkpoint-1983/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertForMaskedLM.forward` and have been ignored: special_tokens_mask. If special_tokens_mask are not expected by `BertForMaskedLM.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num exa

### Perplexity

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast = False, do_lower_case=True)
model = AutoModelForMaskedLM.from_pretrained(model_name)

trainer = Trainer(
  model=model,
  data_collator=data_collator,
  #train_dataset=tokenized_dataset_2['train'],
  eval_dataset=valid_dataset,
  tokenizer=tokenizer,
  )

eval_results = trainer.evaluate()

print('Evaluation results: ', eval_results)
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.3f}")
print('----------------\n')

Evaluation results:  {'eval_loss': 2.8086743354797363, 'eval_runtime': 18.9734, 'eval_samples_per_second': 295.097, 'eval_steps_per_second': 36.894}
Perplexity: 16.588
----------------



In [None]:
path = "./target_model"

for modelpath in glob.iglob(path):
  print('Model: ', modelpath)
  tokenizer = AutoTokenizer.from_pretrained(modelpath, use_fast = False, do_lower_case=True)
  model = AutoModelForMaskedLM.from_pretrained(modelpath)

  trainer = Trainer(
    model=model,
    data_collator=data_collator,
    #train_dataset=tokenized_dataset_2['train'],
    eval_dataset=valid_dataset,
    tokenizer=tokenizer,
    )
  
  eval_results = trainer.evaluate()

  print('Evaluation results: ', eval_results)
  print(f"Perplexity: {math.exp(eval_results['eval_loss']):.3f}")
  print('----------------\n')

loading file vocab.txt
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading configuration file /content/drive/MyDrive/Supercoder/target_model/config.json
Model config BertConfig {
  "_name_or_path": "/content/drive/MyDrive/Supercoder/target_model",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_vers

Model:  /content/drive/MyDrive/Supercoder/target_model


All model checkpoint weights were used when initializing BertForMaskedLM.

All the weights of BertForMaskedLM were initialized from the model checkpoint at /content/drive/MyDrive/Supercoder/target_model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForMaskedLM for predictions without further training.
No `TrainingArguments` passed, using `output_dir=tmp_trainer`.
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the evaluation set don't have a corresponding argument in `BertForMaskedLM.forward` and have been ignored: special_tokens_mask. If special_tokens_mask are not expected by `BertForMaskedLM.forward`,  you can safely ignore this message.
***** Running Evalu

Evaluation results:  {'eval_loss': 1.930310845375061, 'eval_runtime': 18.9927, 'eval_samples_per_second': 294.798, 'eval_steps_per_second': 36.856}
Perplexity: 6.892
----------------
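
Both perplexities reported above are the exponential of the mean masked-token cross-entropy (`eval_loss`). A minimal sketch that recomputes them from the logged losses; the helper name `perplexity` is an illustrative choice.

```python
import math


def perplexity(mean_cross_entropy):
    """Perplexity is exp of the average per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)


base_loss = 2.8086743354797363    # eval_loss logged for bert-base-multilingual-cased
adapted_loss = 1.930310845375061  # eval_loss logged for ./target_model

base_ppl = perplexity(base_loss)
adapted_ppl = perplexity(adapted_loss)

print(f"base:    {base_ppl:.3f}")
print(f"adapted: {adapted_ppl:.3f}")
print(f"relative reduction: {1 - adapted_ppl / base_ppl:.1%}")
```

The comparison is only indicative: the masked positions are sampled at evaluation time, so repeated runs give slightly different losses.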



## Fine-Tuning

In [None]:
import glob
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification,AutoConfig


### Hyperparameter

In [None]:
#HyperParameter, Variables
# KLUE NLI has three classes in total
NUM_LABELS = 3
MAX_SEQ_LEN = 128
TRAIN_BATCH_SIZE = 32
EVAL_BATCH_SIZE = 32
LEARNING_RATE = 2e-5
WEIGHT_DECAY = 0.01
SEED_TRAIN = 0
SEED_SPLIT = 0

### Preprocessing

In [None]:
path = "./target_model"

for modelpath in glob.iglob(path):
  print('Model: ', modelpath)
  tokenizer = AutoTokenizer.from_pretrained(modelpath)
  config = AutoConfig.from_pretrained(modelpath)
  config.num_labels = 3
  model = AutoModelForSequenceClassification.from_pretrained(modelpath, config=config)


In [None]:
target_train_dataset, target_eval_dataset = train_test_split(target_train_df, test_size=0.2, shuffle=True, stratify=target_train_df['label'])

target_train_dataset = Dataset.from_pandas(target_train_dataset)
target_eval_dataset = Dataset.from_pandas(target_eval_dataset)
target_valid_dataset = Dataset.from_pandas(target_val_df)

encoded_target_train_dataset = target_train_dataset.map(preprocess_tokenizer, batched = True)
encoded_target_eval_dataset = target_eval_dataset.map(preprocess_tokenizer, batched = True)
encoded_target_valid_dataset = target_valid_dataset.map(preprocess_tokenizer, batched = True)

### Train - Epoch 5

In [None]:
metric = load_metric("glue", "qnli")

# Pick the highest-scoring class per example with argmax() and compare it with the gold label
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

# Derive steps per epoch from the dataset that is actually trained on (target, not source)
steps_per_epoch = int(len(encoded_target_train_dataset) / TRAIN_BATCH_SIZE)

training_args = TrainingArguments(
    output_dir='./target_bert-KLUE-NLI',
    logging_dir='./target_bert_LMlogs',             
    num_train_epochs=5,
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=TRAIN_BATCH_SIZE,
    per_device_eval_batch_size=EVAL_BATCH_SIZE,
    save_steps=steps_per_epoch,
    save_total_limit=10,
    weight_decay=WEIGHT_DECAY,
    learning_rate=LEARNING_RATE, 
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='accuracy',
    greater_is_better=True,  # accuracy is maximized; False would restore the lowest-accuracy checkpoint
    seed=SEED_TRAIN
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_target_train_dataset,
    eval_dataset=encoded_target_eval_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,

)




In [None]:
trainer.train()
trainer.save_model("./target_NLI_model") #save your custom model

The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_0__, guid, premise, hypothesis are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 3859
  Num Epochs = 5
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 605
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.701819,0.715026
2,No log,0.665266,0.743005
3,No log,0.668233,0.74715
4,No log,0.673179,0.769948
5,0.542100,0.687417,0.773057
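
The Accuracy column comes from `compute_metrics`, which reduces the three class logits of each example to a single prediction with argmax and counts matches against the gold labels. A dependency-free sketch of the same computation; the logits and labels are made up for illustration.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(logits, labels):
    """Fraction of examples whose top-scoring class equals the gold label."""
    preds = [argmax(row) for row in logits]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Made-up logits for four premise/hypothesis pairs over three classes
logits = [
    [2.1, 0.3, -1.0],
    [0.2, 0.1, 1.5],
    [-0.5, 1.2, 0.4],
    [0.9, 1.0, 0.8],
]
labels = [0, 2, 1, 0]

acc = accuracy(logits, labels)
print(acc)  # 0.75
```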


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_0__, guid, premise, hypothesis are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 965
  Batch size = 32
Saving model checkpoint to /content/drive/MyDrive/Supercoder/target_bert-KLUE-NLI/checkpoint-121
Configuration saved in /content/drive/MyDrive/Supercoder/target_bert-KLUE-NLI/checkpoint-121/config.json
Model weights saved in /content/drive/MyDrive/Supercoder/target_bert-KLUE-NLI/checkpoint-121/pytorch_model.bin
tokenizer config file saved in /content/drive/MyDrive/Supercoder/target_bert-KLUE-NLI/checkpoint-121/tokenizer_config.json
Special tokens file saved in /content/drive/MyDrive/Supercoder/target_bert-KLUE-NLI/checkpoint-121/special_tokens_map.json
The following col

### Train - Epoch 10

In [None]:
metric = load_metric("glue", "qnli")

# Pick the highest-scoring class per example with argmax() and compare it with the gold label
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

# Derive steps per epoch from the dataset that is actually trained on (target, not source)
steps_per_epoch = int(len(encoded_target_train_dataset) / TRAIN_BATCH_SIZE)

training_args = TrainingArguments(
    output_dir='./target_bert-KLUE-NLI_epoch10',
    logging_dir='./target_bert_LMlogs_epoch10',             
    num_train_epochs=10,
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=TRAIN_BATCH_SIZE,
    per_device_eval_batch_size=EVAL_BATCH_SIZE,
    save_steps=steps_per_epoch,
    save_total_limit=10,
    weight_decay=WEIGHT_DECAY,
    learning_rate=LEARNING_RATE, 
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='accuracy',
    greater_is_better=True,  # accuracy is maximized; False would restore the lowest-accuracy checkpoint
    seed=SEED_TRAIN
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_target_train_dataset,
    eval_dataset=encoded_target_eval_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,

)
trainer.train()
trainer.save_model("./target_NLI_model_epoch10") #save your custom model

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_0__, guid, premise, hypothesis are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 3859
  Num Epochs = 10
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 1210
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__cal

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.708779,0.716062
2,No log,0.691999,0.708808
3,No log,0.680736,0.745078
4,No log,0.717397,0.75544
5,0.538500,0.762964,0.764767
6,0.538500,0.808569,0.762694
7,0.538500,0.931401,0.776166
8,0.538500,1.001543,0.773057
9,0.144600,1.043525,0.78342
10,0.144600,1.055879,0.787565
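
With `load_best_model_at_end=True`, the checkpoint that gets restored depends on the metric being compared and on the direction given by `greater_is_better`. The sketch below replays that selection on the table above. The rows are copied from the log; `best_epoch` is a hypothetical helper, not a Trainer function.

```python
# (epoch, validation loss, accuracy) copied from the 10-epoch log above
log = [
    (1, 0.708779, 0.716062),
    (2, 0.691999, 0.708808),
    (3, 0.680736, 0.745078),
    (4, 0.717397, 0.755440),
    (5, 0.762964, 0.764767),
    (6, 0.808569, 0.762694),
    (7, 0.931401, 0.776166),
    (8, 1.001543, 0.773057),
    (9, 1.043525, 0.783420),
    (10, 1.055879, 0.787565),
]

LOSS, ACCURACY = 1, 2  # column indices in each row


def best_epoch(rows, column, greater_is_better):
    """Epoch whose metric is largest (or smallest) in the given column."""
    pick = max if greater_is_better else min
    return pick(rows, key=lambda row: row[column])[0]


by_accuracy_max = best_epoch(log, ACCURACY, greater_is_better=True)
by_accuracy_min = best_epoch(log, ACCURACY, greater_is_better=False)
by_loss_min = best_epoch(log, LOSS, greater_is_better=False)

print(by_accuracy_max, by_accuracy_min, by_loss_min)  # 10 2 3
```

The three criteria pick three different epochs on this run. The validation loss also rises steadily after epoch 3 while accuracy keeps inching up, so the selection criterion has a visible effect on which model is evaluated afterwards.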


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_0__, guid, premise, hypothesis are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 965
  Batch size = 32
Saving model checkpoint to /content/drive/MyDrive/Supercoder/target_bert-KLUE-NLI/checkpoint-121
Configuration saved in /content/drive/MyDrive/Supercoder/target_bert-KLUE-NLI/checkpoint-121/config.json
Model weights saved in /content/drive/MyDrive/Supercoder/target_bert-KLUE-NLI/checkpoint-121/pytorch_model.bin
tokenizer config file saved in /content/drive/MyDrive/Supercoder/target_bert-KLUE-NLI/checkpoint-121/tokenizer_config.json
Special tokens file saved in /content/drive/MyDrive/Supercoder/target_bert-KLUE-NLI/checkpoint-121/special_tokens_map.json
The following col

### Evaluate with Target Valid Dataset

In [None]:
path = "./target_NLI_model"

for modelpath in glob.iglob(path):
  print('Model: ', modelpath)
  tokenizer = AutoTokenizer.from_pretrained(modelpath)
  model = AutoModelForSequenceClassification.from_pretrained(modelpath)

  trainer = Trainer(
    model=model,
    #train_dataset=tokenized_dataset_2['train'],
    eval_dataset=encoded_target_valid_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics

    )
  
  
  eval_results = trainer.evaluate()

  print('Evaluation results: ', eval_results)
  print(f"Accuracy: {eval_results['eval_accuracy']:.3f}")
  print('----------------\n')

loading file vocab.txt
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading configuration file /content/drive/MyDrive/Supercoder/target_NLI_model/config.json
Model config BertConfig {
  "_name_or_path": "/content/drive/MyDrive/Supercoder/target_NLI_model",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention

Model:  /content/drive/MyDrive/Supercoder/target_NLI_model


All model checkpoint weights were used when initializing BertForSequenceClassification.

All the weights of BertForSequenceClassification were initialized from the model checkpoint at /content/drive/MyDrive/Supercoder/target_NLI_model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.
No `TrainingArguments` passed, using `output_dir=tmp_trainer`.
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_0__, gui

Evaluation results:  {'eval_loss': 0.9398565888404846, 'eval_accuracy': 0.5983333333333334, 'eval_runtime': 1.4204, 'eval_samples_per_second': 422.408, 'eval_steps_per_second': 52.801}
Accuracy: 0.598
----------------



In [None]:
path = "./target_NLI_model_epoch10"

for modelpath in glob.iglob(path):
  print('Model: ', modelpath)
  tokenizer = AutoTokenizer.from_pretrained(modelpath)
  model = AutoModelForSequenceClassification.from_pretrained(modelpath)

  trainer = Trainer(
    model=model,
    #train_dataset=tokenized_dataset_2['train'],
    eval_dataset=encoded_target_valid_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics

    )
  
  
  eval_results = trainer.evaluate()

  print('Evaluation results: ', eval_results)
  print(f"Accuracy: {eval_results['eval_accuracy']:.3f}")
  print('----------------\n')

loading file vocab.txt
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading configuration file /content/drive/MyDrive/Supercoder/target_NLI_model_epoch10/config.json
Model config BertConfig {
  "_name_or_path": "/content/drive/MyDrive/Supercoder/target_NLI_model_epoch10",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pool

Model:  /content/drive/MyDrive/Supercoder/target_NLI_model_epoch10


All model checkpoint weights were used when initializing BertForSequenceClassification.

All the weights of BertForSequenceClassification were initialized from the model checkpoint at /content/drive/MyDrive/Supercoder/target_NLI_model_epoch10.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.
No `TrainingArguments` passed, using `output_dir=tmp_trainer`.
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_

Evaluation results:  {'eval_loss': 0.9618410468101501, 'eval_accuracy': 0.585, 'eval_runtime': 1.4437, 'eval_samples_per_second': 415.587, 'eval_steps_per_second': 51.948}
Accuracy: 0.585
----------------



## Train with Naive Multilingual BERT

### Model Load

In [None]:
model_name = 'bert-base-multilingual-cased'

tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
config.num_labels = 3
model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config)

### Train - Epoch 5

In [None]:
metric = load_metric("glue", "qnli")

# Pick the highest-scoring class per example with argmax() and compare it with the gold label
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

# Derive steps per epoch from the dataset that is actually trained on (target, not source)
steps_per_epoch = int(len(encoded_target_train_dataset) / TRAIN_BATCH_SIZE)

training_args = TrainingArguments(
    output_dir='./Naive_bert-KLUE-NLI',
    logging_dir='./Naive_bert_LMlogs',             
    num_train_epochs=5,
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=TRAIN_BATCH_SIZE,
    per_device_eval_batch_size=EVAL_BATCH_SIZE,
    save_steps=steps_per_epoch,
    save_total_limit=10,
    weight_decay=WEIGHT_DECAY,
    learning_rate=LEARNING_RATE, 
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='accuracy',
    greater_is_better=True,  # accuracy is maximized; False would restore the lowest-accuracy checkpoint
    seed=SEED_TRAIN
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_target_train_dataset,
    eval_dataset=encoded_target_eval_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,

)




In [None]:
trainer.train()
trainer.save_model("./Naive_NLI_model") #save your custom model

The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_0__, guid, premise, hypothesis are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 3859
  Num Epochs = 5
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 605
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.836391,0.678756
2,No log,0.685501,0.72228
3,No log,0.652994,0.748187
4,No log,0.675071,0.770984
5,0.619100,0.691889,0.768912


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_0__, guid, premise, hypothesis are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 965
  Batch size = 32
Saving model checkpoint to /content/drive/MyDrive/Supercoder/Naive_bert-KLUE-NLI/checkpoint-121
Configuration saved in /content/drive/MyDrive/Supercoder/Naive_bert-KLUE-NLI/checkpoint-121/config.json
Model weights saved in /content/drive/MyDrive/Supercoder/Naive_bert-KLUE-NLI/checkpoint-121/pytorch_model.bin
tokenizer config file saved in /content/drive/MyDrive/Supercoder/Naive_bert-KLUE-NLI/checkpoint-121/tokenizer_config.json
Special tokens file saved in /content/drive/MyDrive/Supercoder/Naive_bert-KLUE-NLI/checkpoint-121/special_tokens_map.json
The following columns 

### Train - Epoch 10

In [None]:
metric = load_metric("glue", "qnli")

# Pick the highest-scoring class per example with argmax() and compare it with the gold label
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

# Derive steps per epoch from the dataset that is actually trained on (target, not source)
steps_per_epoch = int(len(encoded_target_train_dataset) / TRAIN_BATCH_SIZE)

training_args = TrainingArguments(
    output_dir='./Naive_bert-KLUE-NLI_epoch10',
    logging_dir='./Naive_bert_LMlogs_epoch10',             
    num_train_epochs=10,
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=TRAIN_BATCH_SIZE,
    per_device_eval_batch_size=EVAL_BATCH_SIZE,
    save_steps=steps_per_epoch,
    save_total_limit=10,
    weight_decay=WEIGHT_DECAY,
    learning_rate=LEARNING_RATE, 
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='accuracy',
    greater_is_better=True,  # accuracy is maximized; False would restore the lowest-accuracy checkpoint
    seed=SEED_TRAIN
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_target_train_dataset,
    eval_dataset=encoded_target_eval_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,

)




In [None]:
trainer.train()
trainer.save_model("./Naive_NLI_model_epoch10") #save your custom model

The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_0__, guid, premise, hypothesis are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 3859
  Num Epochs = 10
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 1210
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.832497,0.674611
2,No log,0.694039,0.720207
3,No log,0.675939,0.740933
4,No log,0.671364,0.765803
5,0.611700,0.747111,0.767876
6,0.611700,0.867967,0.754404
7,0.611700,0.939724,0.754404
8,0.611700,1.012962,0.759585
9,0.170500,1.038779,0.761658
10,0.170500,1.057998,0.767876


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_0__, guid, premise, hypothesis are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 965
  Batch size = 32
Saving model checkpoint to /content/drive/MyDrive/Supercoder/Naive_bert-KLUE-NLI_epoch10/checkpoint-121
Configuration saved in /content/drive/MyDrive/Supercoder/Naive_bert-KLUE-NLI_epoch10/checkpoint-121/config.json
Model weights saved in /content/drive/MyDrive/Supercoder/Naive_bert-KLUE-NLI_epoch10/checkpoint-121/pytorch_model.bin
tokenizer config file saved in /content/drive/MyDrive/Supercoder/Naive_bert-KLUE-NLI_epoch10/checkpoint-121/tokenizer_config.json
Special tokens file saved in /content/drive/MyDrive/Supercoder/Naive_bert-KLUE-NLI_epoch10/checkpoint-121/specia

### Evaluation

In [None]:
path = "./Naive_NLI_model"

for modelpath in glob.iglob(path):
  print('Model: ', modelpath)
  tokenizer = AutoTokenizer.from_pretrained(modelpath)
  model = AutoModelForSequenceClassification.from_pretrained(modelpath)

  trainer = Trainer(
    model=model,
    #train_dataset=tokenized_dataset_2['train'],
    eval_dataset=encoded_target_valid_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics

    )
  
  
  eval_results = trainer.evaluate()

  print('Evaluation results: ', eval_results)
  print(f"Accuracy: {eval_results['eval_accuracy']:.3f}")
  print('----------------\n')

loading file vocab.txt
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading configuration file /content/drive/MyDrive/Supercoder/Naive_NLI_model/config.json
Model config BertConfig {
  "_name_or_path": "/content/drive/MyDrive/Supercoder/Naive_NLI_model",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_h

Model:  /content/drive/MyDrive/Supercoder/Naive_NLI_model


All model checkpoint weights were used when initializing BertForSequenceClassification.

All the weights of BertForSequenceClassification were initialized from the model checkpoint at /content/drive/MyDrive/Supercoder/Naive_NLI_model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.
No `TrainingArguments` passed, using `output_dir=tmp_trainer`.
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_0__, guid

Evaluation results:  {'eval_loss': 1.0084370374679565, 'eval_accuracy': 0.5583333333333333, 'eval_runtime': 1.4516, 'eval_samples_per_second': 413.345, 'eval_steps_per_second': 51.668}
Accuracy: 0.558
----------------



In [None]:
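import glob

from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer

# Naive baseline checkpoint trained for 10 epochs on the target train split only.
path = "./Naive_NLI_model_epoch10"

# iglob yields the path only if it exists; a wildcard pattern would evaluate several checkpoints.
for modelpath in glob.iglob(path):
  print('Model: ', modelpath)
  tokenizer = AutoTokenizer.from_pretrained(modelpath)
  model = AutoModelForSequenceClassification.from_pretrained(modelpath)

  # Evaluation-only Trainer: no train_dataset or TrainingArguments are passed.
  # encoded_target_valid_dataset and compute_metrics come from earlier cells.
  trainer = Trainer(
    model=model,
    eval_dataset=encoded_target_valid_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
  )

  eval_results = trainer.evaluate()

  print('Evaluation results: ', eval_results)
  print(f"Accuracy: {eval_results['eval_accuracy']:.3f}")
  print('----------------\n')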

loading file vocab.txt
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading configuration file /content/drive/MyDrive/Supercoder/Naive_NLI_model_epoch10/config.json
Model config BertConfig {
  "_name_or_path": "/content/drive/MyDrive/Supercoder/Naive_NLI_model_epoch10",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler

Model:  /content/drive/MyDrive/Supercoder/Naive_NLI_model_epoch10


All model checkpoint weights were used when initializing BertForSequenceClassification.

All the weights of BertForSequenceClassification were initialized from the model checkpoint at /content/drive/MyDrive/Supercoder/Naive_NLI_model_epoch10.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.
No `TrainingArguments` passed, using `output_dir=tmp_trainer`.
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: source, __index_level_0__, guid, premise, hypothesis. If source, __index_level_0

Evaluation results:  {'eval_loss': 1.0064477920532227, 'eval_accuracy': 0.5566666666666666, 'eval_runtime': 1.4185, 'eval_samples_per_second': 422.967, 'eval_steps_per_second': 52.871}
Accuracy: 0.557
----------------



# Test Result

## 5 Epoch
The experiments were run in three settings.

(1) With Airbnb as the target and the remaining domains as the source, run domain pretraining of Multilingual BERT on the source, fine-tune on the source, then measure accuracy on the target validation set.

(2) Run both domain pretraining and fine-tuning on the target, then measure accuracy on the target validation set.

(3) Fine-tune the off-the-shelf (naive) Multilingual BERT on the target train split only, then measure accuracy on the target validation set.
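
The three settings differ mainly in which rows are used for training. A minimal sketch of that split, assuming each example carries a `source` field (the column shows up in the Trainer logs above) and that the Airbnb domain is tagged `airbnb`. The toy rows and the helper `split_by_domain` are hypothetical and are not the notebook's own code:

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_by_domain(rows: List[Row], target: str = "airbnb") -> Tuple[List[Row], List[Row]]:
    """Return (source_rows, target_rows), partitioned on the `source` field."""
    source_rows = [r for r in rows if r["source"] != target]
    target_rows = [r for r in rows if r["source"] == target]
    return source_rows, target_rows


# Toy rows standing in for KLUE NLI examples (all values are made up).
toy_rows = [
    {"source": "airbnb", "premise": "p1", "hypothesis": "h1", "label": 0},
    {"source": "wikipedia", "premise": "p2", "hypothesis": "h2", "label": 1},
    {"source": "policy", "premise": "p3", "hypothesis": "h3", "label": 2},
    {"source": "airbnb", "premise": "p4", "hypothesis": "h4", "label": 1},
]

source_rows, target_rows = split_by_domain(toy_rows)
print(len(source_rows), len(target_rows))
```

Under these assumptions, setting (1) would train on `source_rows`, settings (2) and (3) on `target_rows`, and all three would be scored on the target validation rows.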

The results are as follows.


### Result

(1) ***Accuracy: 0.603***

*   'eval_loss': 0.8943072557449341
*   'eval_accuracy': 0.6033333333333334

(2) ***Accuracy: 0.598***

*   'eval_loss': 0.9398565888404846
*   'eval_accuracy': 0.5983333333333334

(3) ***Accuracy: 0.558***

*   'eval_loss': 1.0084370374679565
*   'eval_accuracy': 0.5583333333333333


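Compared with method (3), domain adaptation appears to help slightly: (1) and (2) reach 0.603 and 0.598 against 0.558. Because the gap is small, the experiment was repeated with the number of epochs increased to 10.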
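
To put "small gap" in concrete terms, the accuracies can be converted back to counts of correct predictions. This is a rough sketch that assumes the target validation set holds 600 examples, which is what `eval_runtime` times `eval_samples_per_second` in the evaluation log above works out to (1.4516 × 413.345 ≈ 600). The helper names are hypothetical:

```python
import math

N_VALID = 600  # assumption: inferred from eval_runtime * eval_samples_per_second

# 5-epoch accuracies copied from the results above.
acc_5_epochs = {
    "(1) source pretrain + fine-tune": 0.6033333333333334,
    "(2) target pretrain + fine-tune": 0.5983333333333334,
    "(3) naive fine-tune on target": 0.5583333333333333,
}


def correct_count(acc: float, n: int = N_VALID) -> int:
    """Number of correctly classified examples implied by an accuracy."""
    return round(acc * n)


def std_error(acc: float, n: int = N_VALID) -> float:
    """Binomial standard error of an accuracy measured on n examples."""
    return math.sqrt(acc * (1.0 - acc) / n)


for name, acc in acc_5_epochs.items():
    print(f"{name}: {correct_count(acc)}/{N_VALID} correct, std. error {std_error(acc):.3f}")
```

Under that assumption, (1) and (2) differ by only 3 examples, well inside one standard error of about 0.02, so a single run cannot really separate them.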



## 10 Epoch

### Result

(1) ***Accuracy: 0.635***

*   'eval_loss': 0.8465157151222229
*   'eval_accuracy': 0.635

(2) ***Accuracy: 0.585***

*   'eval_loss': 0.9618410468101501
*   'eval_accuracy': 0.585

(3) ***Accuracy: 0.557***

*   'eval_loss': 1.0064477920532227
*   'eval_accuracy': 0.5566666666666666

With 10 epochs, method (1) scored higher than the other two: 0.635 against 0.585 for (2) and 0.557 for (3).
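
The effect of the longer schedule on each method can be read off by differencing the two result sets. The short sketch below uses only the accuracies reported above; the dictionary layout and the helper `delta_points` are illustrative choices:

```python
# Accuracies copied from the 5-epoch and 10-epoch results above.
acc = {
    "(1)": {5: 0.6033333333333334, 10: 0.635},
    "(2)": {5: 0.5983333333333334, 10: 0.585},
    "(3)": {5: 0.5583333333333333, 10: 0.5566666666666666},
}


def delta_points(by_epoch: dict) -> float:
    """Change in accuracy from 5 to 10 epochs, in percentage points."""
    return (by_epoch[10] - by_epoch[5]) * 100.0


deltas = {name: delta_points(by_epoch) for name, by_epoch in acc.items()}
for name, d in deltas.items():
    print(f"{name}: {d:+.2f} points from 5 to 10 epochs")

# Method with the highest 10-epoch accuracy.
best = max(acc, key=lambda name: acc[name][10])
print("best at 10 epochs:", best)
```

Only method (1) improves with the extra epochs; (2) and (3) drop slightly, which is why the gap between (1) and the others widens.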