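# 24-1. Project: Building Your Own Custom Project
Building on what we did in the practice code, this time we will take on a Korean dataset.

Have you heard of the [KLUE benchmark](https://klue-benchmark.com/), the Korean counterpart of the GLUE benchmark we looked at earlier?

Like GLUE, it is a benchmark of datasets built to advance Korean natural language understanding, and it consists of eight datasets in total. In this project, however, we will not use the KLUE datasets themselves. Instead, we will use a KLUE model (klue/bert-base) to tackle the NSMC (Naver Sentiment Movie Corpus) task.

See the links below for details on the model and the data.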
- [KLUE/Bert-base](https://huggingface.co/klue/bert-base)
- [NSMC](https://github.com/e9t/nsmc)
  
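## Rubric
1. The model and data were loaded correctly and confirmed to work.  
   klue/bert-base was fine-tuned on the NSMC dataset and the model was confirmed to work correctly.  
2. Preprocessing was improved, and model performance was improved through fine-tuning.  
   Validation accuracy was raised to 90% or higher.
3. Bucketing was successfully applied to model training, and the results were compared and analyzed.  
   Bucketing was applied to check whether a trade-off arises between computation speed and model performance during fine-tuning, and the analysis was presented.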

## Libraries

In [1]:
import datasets
from datasets import load_dataset, Dataset

import transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification

import os
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments
import torch
from transformers import EarlyStoppingCallback

from datasets import load_metric

## STEP 1. Analyzing the NSMC data and building a Huggingface dataset
The dataset can be downloaded from GitHub or loaded from [Huggingface datasets](https://huggingface.co/datasets). Let's put the methods we learned earlier to use!
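For the GitHub route, the NSMC files are tab-separated text with `id`, `document`, and `label` columns. The sketch below parses that layout using only the standard library. The helper name `read_nsmc_tsv` and the two sample rows are made up for illustration, and the text is read from an in-memory string rather than from a downloaded file.

```python
import csv
import io

# Two made-up rows in the NSMC layout (tab-separated: id, document, label).
SAMPLE_TSV = "id\tdocument\tlabel\n1\t재밌어요\t1\n2\t별로네요\t0\n"


def read_nsmc_tsv(text):
    """Parse NSMC-style TSV text into a list of row dictionaries (hypothetical helper)."""
    # QUOTE_NONE keeps quotation marks inside reviews as ordinary characters.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        rows.append({
            "id": int(row["id"]),
            "document": row["document"],
            "label": int(row["label"]),
        })
    return rows


rows = read_nsmc_tsv(SAMPLE_TSV)
print(rows[0])
```

Rows parsed this way could then be regrouped into columns and passed to `datasets.Dataset.from_dict`, which is one possible way to arrive at the same kind of object that `load_dataset` returns below.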

In [2]:
# Load the NSMC dataset
huggingface_nsmc_dataset = load_dataset('Blpeng/nsmc')

# Inspect the dataset
print(huggingface_nsmc_dataset)

Using custom data configuration Blpeng___nsmc-55757a98c8abea78


Downloading and preparing dataset csv/Blpeng___nsmc to /aiffel/.cache/huggingface/datasets/csv/Blpeng___nsmc-55757a98c8abea78/0.0.0/bf68a4c4aefa545d0712b2fcbb1b327f905bbe2f6425fbc5e8c25234acb9e14a...



Dataset csv downloaded and prepared to /aiffel/.cache/huggingface/datasets/csv/Blpeng___nsmc-55757a98c8abea78/0.0.0/bf68a4c4aefa545d0712b2fcbb1b327f905bbe2f6425fbc5e8c25234acb9e14a. Subsequent calls will reuse this data.



DatasetDict({
    train: Dataset({
        features: ['Unnamed: 0', 'id', 'document', 'label'],
        num_rows: 400000
    })
})


In [3]:
train = huggingface_nsmc_dataset['train']
cols = train.column_names
cols

['Unnamed: 0', 'id', 'document', 'label']

In [4]:
for i in range(5):
    for col in cols:
        print(col, ":", train[col][i])
        if col == 'document':
            print('type:', isinstance(train[col][i], str))
    print('\n')

Unnamed: 0 : 0
id : 8112052
document : 어릴때보고 지금다시봐도 재밌어요ㅋㅋ
type: True
label : 1


Unnamed: 0 : 1
id : 8132799
document : 디자인을 배우는 학생으로, 외국디자이너와 그들이 일군 전통을 통해 발전해가는 문화산업이 부러웠는데. 사실 우리나라에서도 그 어려운시절에 끝까지 열정을 지킨 노라노 같은 전통이있어 저와 같은 사람들이 꿈을 꾸고 이뤄나갈 수 있다는 것에 감사합니다.
type: True
label : 1


Unnamed: 0 : 2
id : 4655635
document : 폴리스스토리 시리즈는 1부터 뉴까지 버릴께 하나도 없음.. 최고.
type: True
label : 1


Unnamed: 0 : 3
id : 9251303
document : 와.. 연기가 진짜 개쩔구나.. 지루할거라고 생각했는데 몰입해서 봤다.. 그래 이런게 진짜 영화지
type: True
label : 1


Unnamed: 0 : 4
id : 10067386
document : 안개 자욱한 밤하늘에 떠 있는 초승달 같은 영화.
type: True
label : 1




In [5]:
# Scan for samples whose 'document' field is not a string (e.g. None).
# not_str_indices = []
# for sample in huggingface_nsmc_dataset['train']:
#     if not isinstance(sample['document'], str):
#         not_str_indices.append(sample['Unnamed: 0'])
# #         print(sample['document'])
# print(not_str_indices)

None data Unnamed: 0: [46471, 60735, 77665, 84098, 127017, 172375, 173526, 197279, 5746, 7899, 27097, 25857, 55737, 110014, 126782, 140721]

In [3]:
def filter_none_examples(example):
    return isinstance(example['document'], str) and example['document'].strip() != ''

hf_dataset = huggingface_nsmc_dataset['train'].filter(filter_none_examples)
hf_dataset = hf_dataset.shuffle(seed = 526).select(range(200000))


In [7]:
print(hf_dataset)

Dataset({
    features: ['Unnamed: 0', 'id', 'document', 'label'],
    num_rows: 200000
})
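To make the filtering step concrete, the sketch below applies the same predicate as `filter_none_examples` to a few made-up rows. It also adds a hypothetical `drop_duplicate_ids` step: the 400,000 rows reported above are twice the 200,000 reviews in the original NSMC repository, so the Hub copy may contain repeated `id` values. That duplication is an assumption worth checking, not something the output above confirms.

```python
def is_valid_document(example):
    """Same predicate as filter_none_examples: keep non-empty strings only."""
    doc = example["document"]
    return isinstance(doc, str) and doc.strip() != ""


def drop_duplicate_ids(examples):
    """Keep the first example seen for each id (hypothetical extra step)."""
    seen = set()
    unique = []
    for example in examples:
        if example["id"] not in seen:
            seen.add(example["id"])
            unique.append(example)
    return unique


# Made-up rows: one valid, one None, one whitespace-only, one repeated id.
toy_rows = [
    {"id": 1, "document": "재밌어요", "label": 1},
    {"id": 2, "document": None, "label": 0},
    {"id": 3, "document": "   ", "label": 0},
    {"id": 1, "document": "재밌어요", "label": 1},
]

valid = [row for row in toy_rows if is_valid_document(row)]
unique = drop_duplicate_ids(valid)
print(len(toy_rows), len(valid), len(unique))
```

If duplicates do exist, removing them before splitting would keep the same review from landing in both the training and the test set.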


In [4]:
del huggingface_nsmc_dataset

## STEP 2. Loading the klue/bert-base model and tokenizer

In [5]:
model_id = 'klue/bert-base'

huggingface_tokenizer = AutoTokenizer.from_pretrained(model_id)
huggingface_model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels = 2)


Some weights of the model checkpoint at klue/bert-base were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized

## STEP 3. Preprocessing the dataset with the tokenizer loaded above and training the model

In [6]:
def transform(data):
    return huggingface_tokenizer(
        data['document'],
        truncation = True,
        padding = 'max_length',
        max_length = 20,
        return_token_type_ids = False,
        )

In [7]:
hf_dataset = hf_dataset.map(transform, batched = True)

# First split the full dataset 85% / 15% (train + validation / test).
train_test_split = hf_dataset.train_test_split(test_size=0.15)

# Split the remaining 85% again so that validation is 10% of the full dataset:
# 0.10 / 0.85 ≈ 0.1176 of the remainder, which leaves about 75% of the full dataset for training.
train_validation_split = train_test_split['train'].train_test_split(test_size=0.10 / 0.85)

hf_train_dataset = train_validation_split['train']
hf_val_dataset = train_validation_split['test']
hf_test_dataset = train_test_split['test']


In [8]:
del hf_dataset, train_test_split, train_validation_split
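The split arithmetic can be checked with a few lines of plain Python. This is a sketch that assumes the test portion is rounded up, as in scikit-learn-style splitting; the helper name `split_sizes` is hypothetical. The training logs later in the notebook report 149999, 20001, and 30000 examples for the three sets.

```python
import math


def split_sizes(n_samples, test_fraction):
    """Return (n_train, n_test), rounding the test part up (assumed rounding rule)."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


n_total = 200000
n_rest, n_test = split_sizes(n_total, 0.15)          # 85% / 15%
n_train, n_val = split_sizes(n_rest, 0.10 / 0.85)    # validation = 10% of the full set

print(n_train, n_val, n_test)
print(round(n_train / n_total, 4), round(n_val / n_total, 4), round(n_test / n_total, 4))
```

Because `0.10 / 0.85` is not exactly representable in floating point, the validation size can land one sample above the nominal 20000 after rounding up. Passing a `seed` to `train_test_split` would additionally make the split reproducible across runs.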

In [9]:
output_dir = os.getenv('HOME')+'/aiffel/transformers'

training_arguments = TrainingArguments(
    output_dir,                            # directory where outputs are saved
    evaluation_strategy="epoch",           # how often to evaluate
    learning_rate = 2e-5,                  # learning rate
    per_device_train_batch_size = 8,       # training batch size per device
    per_device_eval_batch_size = 8,        # evaluation batch size per device
    num_train_epochs = 3,                  # total number of training epochs
    weight_decay = 0.01,                   # weight decay
    group_by_length = True,                # batch samples of similar length together (bucketing)
    gradient_accumulation_steps = 16,      # accumulate gradients over 16 batches per update
)

In [10]:
# Load the metrics used to compute accuracy and F1 score
accuracy_metric = load_metric("accuracy")
f1_metric = load_metric("f1")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    
    # Compute accuracy and F1 score
    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)
    f1 = f1_metric.compute(predictions=predictions, references=labels)
    
    return {
        'accuracy': accuracy['accuracy'],
        'f1': f1['f1']
    }

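To show what the two metrics measure, here is a dependency-free sketch that computes accuracy and binary F1 directly from logits. It is not the implementation behind `load_metric`; the function names and the toy logits are made up for illustration. Note also that later releases of `datasets` deprecate `load_metric` in favor of the separate `evaluate` library, so a hand-rolled version like this can serve as a fallback.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy_and_f1(logits, labels, positive=1):
    """Accuracy and binary F1 for the positive class, computed from raw logits."""
    preds = [argmax(row) for row in logits]
    pairs = list(zip(preds, labels))
    correct = sum(p == y for p, y in pairs)
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(labels), "f1": f1}


# Made-up logits for four reviews; columns are [negative, positive].
toy_logits = [[0.2, 1.5], [2.0, -1.0], [0.1, 0.3], [1.2, 0.4]]
toy_labels = [1, 0, 0, 1]

metrics = accuracy_and_f1(toy_logits, toy_labels)
print(metrics)
```

Here the predictions are `[1, 0, 1, 0]`, so two of four are correct, and precision and recall for the positive class are both 0.5.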

In [15]:
# Free cached GPU memory before training
torch.cuda.empty_cache()

In [16]:
trainer = Trainer(
    model=huggingface_model,           # model to train
    args=training_arguments,           # arguments configured via TrainingArguments
    train_dataset=hf_train_dataset,    # training dataset
    eval_dataset=hf_val_dataset,       # evaluation dataset
    compute_metrics=compute_metrics,
)
trainer.train()

The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: Unnamed: 0, document, id.
***** Running training *****
  Num examples = 149999
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 128
  Gradient Accumulation steps = 16
  Total optimization steps = 3513


Epoch,Training Loss,Validation Loss,Accuracy,F1
0,0.2955,0.268303,0.890255,0.888187
1,0.2216,0.245865,0.903655,0.904429
2,0.1632,0.251165,0.908105,0.908838


Saving model checkpoint to /aiffel/aiffel/transformers/checkpoint-500
Configuration saved in /aiffel/aiffel/transformers/checkpoint-500/config.json
Model weights saved in /aiffel/aiffel/transformers/checkpoint-500/pytorch_model.bin
Saving model checkpoint to /aiffel/aiffel/transformers/checkpoint-1000
Configuration saved in /aiffel/aiffel/transformers/checkpoint-1000/config.json
Model weights saved in /aiffel/aiffel/transformers/checkpoint-1000/pytorch_model.bin
The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: Unnamed: 0, document, id.
***** Running Evaluation *****
  Num examples = 20001
  Batch size = 8
Saving model checkpoint to /aiffel/aiffel/transformers/checkpoint-1500
Configuration saved in /aiffel/aiffel/transformers/checkpoint-1500/config.json
Model weights saved in /aiffel/aiffel/transformers/checkpoint-1500/pytorch_model.bin
Saving model checkpoint to /aiffel/aiffel/transformers

TrainOutput(global_step=3513, training_loss=0.2347221043996922, metrics={'train_runtime': 2662.1823, 'train_samples_per_second': 169.033, 'train_steps_per_second': 1.32, 'total_flos': 4623827353581600.0, 'train_loss': 0.2347221043996922, 'epoch': 3.0})
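The "Total train batch size = 128" and "Total optimization steps = 3513" lines in the log follow from the training arguments. The sketch below reproduces that arithmetic, assuming a single device and that leftover batches at the end of an epoch do not add an extra update; the helper name `total_optimization_steps` is hypothetical.

```python
import math


def total_optimization_steps(n_examples, per_device_batch, grad_accum, epochs, n_devices=1):
    """Estimate optimizer updates: (batches per epoch // accumulation steps) * epochs."""
    batches_per_epoch = math.ceil(n_examples / (per_device_batch * n_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


effective_batch = 8 * 16  # per-device batch size x gradient accumulation steps
steps = total_optimization_steps(149999, per_device_batch=8, grad_accum=16, epochs=3)
print(effective_batch, steps)  # 128 3513
```

With 149999 training examples there are 18750 batches of 8 per epoch, which gives 1171 updates per epoch and 3513 over three epochs. A batch size of 16 with 8 accumulation steps yields the same effective batch size and the same number of updates.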

In [17]:
trainer.evaluate(hf_test_dataset)

The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: Unnamed: 0, document, id.
***** Running Evaluation *****
  Num examples = 30000
  Batch size = 8


{'eval_loss': 0.24725118279457092,
 'eval_accuracy': 0.9088666666666667,
 'eval_f1': 0.909434212269776,
 'eval_runtime': 51.8528,
 'eval_samples_per_second': 578.561,
 'eval_steps_per_second': 72.32,
 'epoch': 3.0}

In [18]:
del huggingface_model

## STEP 4. Improving model performance (accuracy) through fine-tuning
Adjust the data preprocessing, TrainingArguments, and so on to raise the model's accuracy to 90% or higher.

In [11]:
huggingface_model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels = 2)

Some weights of the model checkpoint at klue/bert-base were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized

In [12]:
# Free cached GPU memory before training
torch.cuda.empty_cache()

In [13]:
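# Set up the data collator
data_collator = DataCollatorWithPadding(tokenizer=huggingface_tokenizer)

training_arguments = TrainingArguments(
    output_dir,                            # directory where outputs are saved
    evaluation_strategy="epoch",           # how often to evaluate
    save_strategy = 'epoch',               # save a checkpoint at every evaluation
#     learning_rate = 2e-5,                # left commented out, so the default (5e-5) applies
    per_device_train_batch_size = 16,      # training batch size per device
    per_device_eval_batch_size = 16,       # evaluation batch size per device
    num_train_epochs = 3,                  # total number of training epochs
    weight_decay = 0.02,                   # weight decay
    group_by_length = True,                # batch samples of similar length together (bucketing)
    gradient_accumulation_steps = 8,       # accumulate gradients over 8 batches per update
    lr_scheduler_type = 'linear',          # linear learning-rate decay
    load_best_model_at_end = True,         # reload the best checkpoint after training
    metric_for_best_model = 'eval_loss'    # best = lowest validation loss
)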

trainer = Trainer(
    model=huggingface_model,           # model to train
    args=training_arguments,           # arguments configured via TrainingArguments
    train_dataset=hf_train_dataset,    # training dataset
    eval_dataset=hf_val_dataset,       # evaluation dataset
    compute_metrics=compute_metrics,
    data_collator=data_collator,       # add the data collator
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 5)]
)
trainer.train()

The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: document, Unnamed: 0, id.
***** Running training *****
  Num examples = 149999
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 128
  Gradient Accumulation steps = 8
  Total optimization steps = 3513


Epoch,Training Loss,Validation Loss,Accuracy,F1
0,0.2814,0.245959,0.895805,0.896913
1,0.1768,0.227906,0.918304,0.918136
2,0.0878,0.259274,0.921204,0.921475


The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: document, Unnamed: 0, id.
***** Running Evaluation *****
  Num examples = 20001
  Batch size = 16
Saving model checkpoint to /aiffel/aiffel/transformers/checkpoint-1171
Configuration saved in /aiffel/aiffel/transformers/checkpoint-1171/config.json
Model weights saved in /aiffel/aiffel/transformers/checkpoint-1171/pytorch_model.bin
The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: document, Unnamed: 0, id.
***** Running Evaluation *****
  Num examples = 20001
  Batch size = 16
Saving model checkpoint to /aiffel/aiffel/transformers/checkpoint-2342
Configuration saved in /aiffel/aiffel/transformers/checkpoint-2342/config.json
Model weights saved in /aiffel/aiffel/transformers/checkpoint-2342/pytorch_model.bin
The following columns in the evaluation 

TrainOutput(global_step=3513, training_loss=0.19134802086007857, metrics={'train_runtime': 1976.2941, 'train_samples_per_second': 227.697, 'train_steps_per_second': 1.778, 'total_flos': 4623827353581600.0, 'train_loss': 0.19134802086007857, 'epoch': 3.0})
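With `early_stopping_patience = 5` and only three evaluations (one per epoch), the early-stopping callback has no chance to fire in this run. The simulation below illustrates that using the validation losses from the table above. It is a simplified sketch of patience-based stopping, not the library's code, and `simulate_early_stopping` is a hypothetical name.

```python
def simulate_early_stopping(eval_losses, patience):
    """Return (best_index, stop_index); stop_index is None if stopping never triggers."""
    best_index, best_loss = None, float("inf")
    bad_evals = 0  # consecutive evaluations without improvement
    for index, loss in enumerate(eval_losses):
        if loss < best_loss:
            best_index, best_loss = index, loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_index, index
    return best_index, None


# Validation losses copied from the training table above.
losses = [0.245959, 0.227906, 0.259274]

print(simulate_early_stopping(losses, patience=5))  # (1, None)
print(simulate_early_stopping(losses, patience=1))  # (1, 2)
```

The best validation loss occurs at the second evaluation, which corresponds to `checkpoint-2342`. Since `load_best_model_at_end = True`, that is the checkpoint one would expect to be restored before the test evaluation that follows.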

In [14]:
trainer.evaluate(hf_test_dataset)

The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: document, Unnamed: 0, id.
***** Running Evaluation *****
  Num examples = 30000
  Batch size = 16


{'eval_loss': 0.22612252831459045,
 'eval_accuracy': 0.9190666666666667,
 'eval_f1': 0.9193194656742207,
 'eval_runtime': 38.5354,
 'eval_samples_per_second': 778.505,
 'eval_steps_per_second': 48.657,
 'epoch': 3.0}

In [20]:
del huggingface_model

## STEP 5. Training with bucketing and comparing against the STEP 4 results
Use the links below to learn what bucketing and dynamic padding are, then apply them when training the model.

- [Data Collator](https://huggingface.co/docs/transformers/v4.30.0/en/main_classes/data_collator)

- [Trainer.TrainingArguments](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments): the group_by_length argument

Compare the STEP 4 results with the results of training with bucketing, and examine what each approach gains in terms of model performance and training time.

In [26]:
huggingface_model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels = 2)

loading configuration file https://huggingface.co/klue/bert-base/resolve/main/config.json from cache at /aiffel/.cache/huggingface/transformers/fbd0b2ef898c4653902683fea8cc0dd99bf43f0e082645b913cda3b92429d1bb.99b3298ed554f2ad731c27cdb11a6215f39b90bc845ff5ce709bb4e74ba45621
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.11.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 32000
}

loading weights file https://huggingface.co/klue/bert-base/resolve/main/pytorch_model.bin from cache at /aiffel/.cache/huggingface/transform

In [27]:
# Free cached GPU memory before training
torch.cuda.empty_cache()

In [28]:
# Set up the data collator (dynamic padding)
data_collator = DataCollatorWithPadding(tokenizer=huggingface_tokenizer)

# Set up the training arguments
training_arguments = TrainingArguments(
    output_dir=output_dir,
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    group_by_length=True,  # bucket samples by length
    gradient_accumulation_steps = 16,
)

# Set up the Trainer
trainer = Trainer(
    model=huggingface_model,
    args=training_arguments,
    train_dataset=hf_train_dataset,
    eval_dataset=hf_val_dataset,
    data_collator=data_collator,  # add the data collator
    compute_metrics=compute_metrics,
)

# Train the model
trainer.train()

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: Unnamed: 0, document, id.
***** Running training *****
  Num examples = 149999
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 128
  Gradient Accumulation steps = 16
  Total optimization steps = 3513


Epoch,Training Loss,Validation Loss,Accuracy,F1
0,0.2946,0.267894,0.891105,0.889643
1,0.2208,0.244721,0.904105,0.904899
2,0.1639,0.248071,0.908055,0.908857


Saving model checkpoint to /aiffel/aiffel/transformers/checkpoint-500
Configuration saved in /aiffel/aiffel/transformers/checkpoint-500/config.json
Model weights saved in /aiffel/aiffel/transformers/checkpoint-500/pytorch_model.bin
Saving model checkpoint to /aiffel/aiffel/transformers/checkpoint-1000
Configuration saved in /aiffel/aiffel/transformers/checkpoint-1000/config.json
Model weights saved in /aiffel/aiffel/transformers/checkpoint-1000/pytorch_model.bin
The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: Unnamed: 0, document, id.
***** Running Evaluation *****
  Num examples = 20001
  Batch size = 8
Saving model checkpoint to /aiffel/aiffel/transformers/checkpoint-1500
Configuration saved in /aiffel/aiffel/transformers/checkpoint-1500/config.json
Model weights saved in /aiffel/aiffel/transformers/checkpoint-1500/pytorch_model.bin
Saving model checkpoint to /aiffel/aiffel/transformers

TrainOutput(global_step=3513, training_loss=0.23348650659484602, metrics={'train_runtime': 2679.2162, 'train_samples_per_second': 167.958, 'train_steps_per_second': 1.311, 'total_flos': 4623827353581600.0, 'train_loss': 0.23348650659484602, 'epoch': 3.0})

In [29]:
trainer.evaluate(hf_test_dataset)

The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: Unnamed: 0, document, id.
***** Running Evaluation *****
  Num examples = 30000
  Batch size = 8


{'eval_loss': 0.2473093867301941,
 'eval_accuracy': 0.9100666666666667,
 'eval_f1': 0.9103237386159675,
 'eval_runtime': 52.1918,
 'eval_samples_per_second': 574.803,
 'eval_steps_per_second': 71.85,
 'epoch': 3.0}

In [30]:
del huggingface_model

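Because bucketing (`group_by_length = True`) was already enabled from STEP 3 onward to cut computation time, this notebook has no run without bucketing to compare against. The comparison available is therefore STEP 3 (bucketing only) against STEP 5 (bucketing plus dynamic padding) under otherwise identical settings. Bucketing groups samples of similar length into the same batch, which shortens the padding, and dynamic padding pads each batch only to its longest sample, so using the two together should reduce the amount of computation. In the logs above, however, training time was essentially unchanged (about 2662 s in STEP 3 and 2679 s in STEP 5). A likely reason is that `transform` already pads or truncates every sample to `max_length = 20`, so all sequences reach the collator with the same length and neither technique has padding left to remove.  
Performance shows no large difference either, although test accuracy was slightly higher for the model trained with both bucketing and dynamic padding (0.9101 versus 0.9089).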
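The effect of the two techniques on computation can be illustrated by counting how many token positions each padding strategy processes. The sketch below uses made-up sequence lengths and a hypothetical helper `padded_tokens`. Sorting the lengths stands in for bucketing, which is a simplification: `group_by_length` groups samples of roughly similar length while keeping some randomness, rather than fully sorting the dataset.

```python
def padded_tokens(lengths, batch_size, fixed_length=None):
    """Total token positions processed when each batch is padded to one common length."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        # Fixed padding uses one global length; dynamic padding uses the batch maximum.
        target = fixed_length if fixed_length is not None else max(batch)
        total += target * len(batch)
    return total


# Made-up token counts for eight reviews.
lengths = [3, 18, 5, 20, 4, 17, 6, 19]

fixed = padded_tokens(lengths, batch_size=4, fixed_length=20)   # pad everything to 20
dynamic = padded_tokens(lengths, batch_size=4)                  # pad to the longest in each batch
bucketed = padded_tokens(sorted(lengths), batch_size=4)         # similar lengths batched together
print(fixed, dynamic, bucketed)  # 160 156 104
```

Dynamic padding alone saves little when short and long samples share a batch, while grouping by length first lets the short batch shrink to 6 tokens per sample. The saving only appears when the sequences still differ in length at collation time.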