<div class="alert alert-block" style="border: 2px solid #1976D2;background-color:#E3F2FD;padding:5px;font-size:0.9em;">
This material is protected under Article 25, Paragraph 2 of the Copyright Act. Please do not share it publicly.<br>
<b>An optimized roadmap that includes this course is also available at <a href="https://school.fun-coding.org/">잔재미코딩 (https://school.fun-coding.org/)</a>.</b></div>

### CIFAR10 dataset
- 60,000 three-channel (color) 32x32 images with labels, covering 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
- https://huggingface.co/datasets/cifar10

### Install packages

pip install -q option: prints less output

In [None]:
!pip install -q transformers==4.22.1 datasets==2.4.0

In [None]:
import warnings
warnings.filterwarnings('ignore')

## Loading the data
- datasets.load_dataset(dataset_name, split=['train[:x]', 'test[:y]'])
  - Loads only the first 5000 examples of the train split and the first 2000 examples of the test split
    - The dataset already ships with separate splits named train and test
  - train_test_split() can also be used to carve out a validation set
    - The result holds the two parts under the keys train and test
- https://huggingface.co/datasets/cifar10
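
The split sizes described above can be checked without downloading anything. The snippet below is a minimal plain-Python sketch, not the datasets implementation: split_indices is a hypothetical helper, and the shuffle-then-cut strategy and the seed are assumptions made only for illustration.

```python
import random

def split_indices(num_rows, test_size, seed=0):
    """Hypothetical helper: shuffle row indices, then cut off a test fraction."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)
    num_test = round(num_rows * test_size)  # 5000 * 0.1 -> 500
    return indices[num_test:], indices[:num_test]

# 'train[:5000]' keeps only the first 5000 rows of the train split
train_rows = 5000
train_idx, val_idx = split_indices(train_rows, test_size=0.1)
print(len(train_idx), len(val_idx))  # 4500 500
```

With test_size=0.1, the 5000 selected rows end up as 4500 training rows and 500 validation rows.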

In [None]:
from datasets import load_dataset

train_ds, test_ds = load_dataset('cifar10', split=['train[:5000]', 'test[:2000]'])
splits = train_ds.train_test_split(test_size=0.1)
train_ds = splits['train']
val_ds = splits['test']
train_ds[0].keys()




dict_keys(['img', 'label'])

## Preprocessing the data

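- A Vision Transformer performs best when inputs have the same image size and the same per-channel normalization that were used during pre-training
- ViTFeatureExtractor() exposes the preprocessing config that was applied when the pre-trained model was trained
- Put the per-channel pixel values in 'pixel_values' and the image's class in 'labels'; the pre-trained model can then be trained and used for prediction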

In [None]:
from transformers import ViTFeatureExtractor

# ViT model: https://huggingface.co/google/vit-base-patch16-224-in21k
feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")
feature_extractor

ViTFeatureExtractor {
  "do_normalize": true,
  "do_resize": true,
  "feature_extractor_type": "ViTFeatureExtractor",
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "resample": 2,
  "size": 224
}

### Augmentation
- A technique for enlarging a small dataset; it has contributed to better performance in image models
- It is also widely used to train on randomly varied versions of the data so that the model holds up on diverse test data
- PyTorch's torchvision provides several transforms that make it easy to modify the data in a dataset
- Main functions
   - torchvision.transforms.ToTensor() : converts a PIL image or ndarray into a tensor
   - torchvision.transforms.Normalize(mean, std)
      - mean and std are the per-channel mean and standard deviation (a data normalization technique), **applies to tensors only**
      - Example: for 3-channel data,
         - to set the mean and standard deviation of every channel to 0.5,
         - transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
   - torchvision.transforms.Resize(size)
      - Changes the image size
      - Example: to resize an image to 224 x 224,
         - transforms.Resize((224, 224))
   - torchvision.transforms.Compose()
      - Chains several transforms into one
   - https://pytorch.org/vision/stable/transforms.html
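
The arithmetic behind ToTensor, Normalize, and Compose can be sketched on a single pixel value. This is a plain-Python illustration under stated assumptions, not torchvision code: to_unit_range, normalize, and compose are hypothetical stand-ins, and real transforms work on whole tensors rather than scalars.

```python
def to_unit_range(pixel):
    """Scale an 8-bit pixel value (0..255) to 0.0..1.0, as ToTensor does for PIL images."""
    return pixel / 255.0

def normalize(value, mean=0.5, std=0.5):
    """Apply (value - mean) / std, the per-channel formula used by Normalize."""
    return (value - mean) / std

def compose(*steps):
    """Hypothetical stand-in for Compose: apply each step in order."""
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run

pipeline = compose(to_unit_range, normalize)
print(pipeline(0), pipeline(255))  # -1.0 1.0
```

With mean and std both set to 0.5, pixel values end up in the range -1 to 1.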

In [None]:
from torchvision import transforms

normalize = transforms.Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
transforms_for_train = transforms.Compose(
        [
            transforms.Resize(feature_extractor.size),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ]
    )

transforms_for_val = transforms.Compose(
        [
            transforms.Resize(feature_extractor.size),
            transforms.CenterCrop(feature_extractor.size),
            transforms.ToTensor(),
            normalize,
        ]
    )

def train_transforms(imagedata):
    # Python comprehension reference: https://www.fun-coding.org/PL&OOP5-2.html
    imagedata['pixel_values'] = [transforms_for_train(image.convert("RGB")) for image in imagedata['img']]
    return imagedata

def test_transforms(imagedata):
    # Python comprehension reference: https://www.fun-coding.org/PL&OOP5-2.html
    imagedata['pixel_values'] = [transforms_for_val(image.convert("RGB")) for image in imagedata['img']]
    return imagedata

# Set the transforms
train_ds.set_transform(train_transforms)
val_ds.set_transform(test_transforms)
test_ds.set_transform(test_transforms)

print(train_ds[0].keys())
print(type(train_ds[0]['img']))
print(type(train_ds[0]['label']), train_ds[0]['label'])
print(type(train_ds[0]['pixel_values']), train_ds[0]['pixel_values'].shape)

dict_keys(['img', 'label', 'pixel_values'])
<class 'PIL.PngImagePlugin.PngImageFile'>
<class 'int'> 5
<class 'torch.Tensor'> torch.Size([3, 224, 224])


### data_collator function required to use Trainer
  - When mini-batches are built from an index-based (map-style) dataset, you must implement the step that merges a list of samples into one batch

In [None]:
from torch.utils.data import DataLoader
import torch

def collate_fn(imagedata):
    # Python comprehension reference: https://www.fun-coding.org/PL&OOP5-2.html
    pixel_values = torch.stack([example["pixel_values"] for example in imagedata])
    labels = torch.tensor([example["label"] for example in imagedata])
    return {"pixel_values": pixel_values, "labels": labels}

# With a DataLoader, the same function can be used as follows
# train_dataloader = DataLoader(train_ds, collate_fn=collate_fn, batch_size=16)

## Define the model

- https://huggingface.co/google/vit-base-patch16-224-in21k

In [None]:
id2label = {idx: label for idx, label in enumerate(train_ds.features['label'].names)}
label2id = {label: idx for idx, label in id2label.items()}
id2label

{0: 'airplane',
 1: 'automobile',
 2: 'bird',
 3: 'cat',
 4: 'deer',
 5: 'dog',
 6: 'frog',
 7: 'horse',
 8: 'ship',
 9: 'truck'}

In [None]:
label2id

{'airplane': 0,
 'automobile': 1,
 'bird': 2,
 'cat': 3,
 'deer': 4,
 'dog': 5,
 'frog': 6,
 'horse': 7,
 'ship': 8,
 'truck': 9}

In [None]:
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224-in21k',
                                                  num_labels=10,
                                                  id2label=id2label,
                                                  label2id=label2id)


In [None]:
model.config

ViTConfig {
  "_name_or_path": "google/vit-base-patch16-224-in21k",
  "architectures": [
    "ViTModel"
  ],
  "attention_probs_dropout_prob": 0.0,
  "encoder_stride": 16,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "id2label": {
    "0": "airplane",
    "1": "automobile",
    "2": "bird",
    "3": "cat",
    "4": "deer",
    "5": "dog",
    "6": "frog",
    "7": "horse",
    "8": "ship",
    "9": "truck"
  },
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "airplane": 0,
    "automobile": 1,
    "bird": 2,
    "cat": 3,
    "deer": 4,
    "dog": 5,
    "frog": 6,
    "horse": 7,
    "ship": 8,
    "truck": 9
  },
  "layer_norm_eps": 1e-12,
  "model_type": "vit",
  "num_attention_heads": 12,
  "num_channels": 3,
  "num_hidden_layers": 12,
  "patch_size": 16,
  "qkv_bias": true,
  "transformers_version": "4.22.1"
}
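
The image_size, patch_size, and num_channels values in the config above determine how many tokens the model sees per image. The following is a small worked calculation, assuming the standard ViT layout in which one extra [CLS] token is prepended to the patch sequence.

```python
image_size = 224   # "image_size" in the config above
patch_size = 16    # "patch_size" in the config above
num_channels = 3   # "num_channels" in the config above

patches_per_side = image_size // patch_size                 # 14
num_patches = patches_per_side ** 2                         # 196
sequence_length = num_patches + 1                           # 197, including the [CLS] token
values_per_patch = patch_size * patch_size * num_channels   # 768 raw values per patch

print(num_patches, sequence_length, values_per_patch)  # 196 197 768
```

This is why the 32x32 CIFAR10 images are resized to 224x224 before being passed to the model.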

### Setting the arguments required to run Trainer

In [None]:
from transformers import TrainingArguments, Trainer

args = TrainingArguments(
    output_dir="test-cifar-10", # folder where model predictions and checkpoints are saved (pick any name per model)
    save_strategy="epoch", # save a checkpoint at the end of every epoch
    evaluation_strategy="epoch", # run evaluation at the end of every epoch
    learning_rate=2e-5, # learning rate
    per_device_train_batch_size=16, # mini-batch size per CPU/GPU during training
    per_device_eval_batch_size=16, # mini-batch size per CPU/GPU during evaluation
    num_train_epochs=10, # total number of epochs
    weight_decay=0.01, # weight decay passed to the optimizer
    load_best_model_at_end=True, # automatically load the best model when training ends
    metric_for_best_model="accuracy", # metric used to pick the best model (accuracy)
    logging_dir='logs', # folder where logs for TensorBoard are written
    remove_unused_columns=False, # do not drop columns the model does not use ('img' is still needed by the transforms)
    optim="adamw_torch", # use the AdamW optimizer provided by PyTorch (recent change)
    lr_scheduler_type="constant", # learning rate scheduler (default: linear)
    save_total_limit=10 # maximum number of checkpoints to keep (avoids running out of disk space)
)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
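
To see what lr_scheduler_type="constant" changes relative to the default linear schedule, the sketch below compares the two as plain functions. It is an illustration, not the transformers scheduler code: constant_lr and linear_lr are hypothetical names, zero warmup steps are assumed, and 2820 is the total step count of this run.

```python
def constant_lr(base_lr, step, total_steps):
    """Constant schedule: the learning rate never changes."""
    return base_lr

def linear_lr(base_lr, step, total_steps):
    """Linear schedule without warmup: decay from base_lr to 0 over total_steps."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)

base_lr, total_steps = 2e-5, 2820
for step in (0, 1410, 2820):
    # Print the step, then the constant and the linear learning rate at that step
    print(step, constant_lr(base_lr, step, total_steps), linear_lr(base_lr, step, total_steps))
```

The constant schedule stays at 2e-5 throughout, while the linear one reaches half that value midway and zero at the final step.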


### Metric setup

In [None]:
from datasets import load_metric
import numpy as np

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)
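
What compute_metrics calculates can be reproduced by hand on a tiny example. The sketch below uses plain Python instead of numpy and the accuracy metric; argmax and accuracy are hypothetical helpers, and the logits and labels are made-up values for illustration only.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up logits for 4 samples and 3 classes (illustrative values only)
toy_logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [0.0, 3.0, 0.5], [1.0, 0.9, 0.8]]
toy_labels = [0, 2, 1, 2]
print(accuracy(toy_logits, toy_labels))  # 0.75
```

Three of the four predictions match their labels, so the accuracy is 0.75.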

### Defining and running Trainer

In [None]:
import torch

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
    tokenizer=feature_extractor,
)

In [None]:
trainer.train()

***** Running training *****
  Num examples = 4500
  Num Epochs = 10
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 2820
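
The step counts in the log above follow directly from the dataset size and batch size. This is a short worked calculation, assuming a single device and no gradient accumulation, as the log reports.

```python
import math

num_examples = 4500   # "Num examples" in the log above
batch_size = 16       # per_device_train_batch_size, single device assumed
num_epochs = 10       # num_train_epochs

steps_per_epoch = math.ceil(num_examples / batch_size)  # the last batch is partial
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 282 2820
```

The last batch of each epoch holds only the 4 samples left over after 281 full batches.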


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.614396 | 0.954 |
| 2 | 0.905100 | 0.289129 | 0.96 |
| 3 | 0.905100 | 0.194291 | 0.972 |
| 4 | 0.148200 | 0.179638 | 0.96 |
| 5 | 0.148200 | 0.16811 | 0.962 |
| 6 | 0.052100 | 0.179907 | 0.956 |
| 7 | 0.052100 | 0.19263 | 0.952 |
| 8 | 0.027200 | 0.19103 | 0.954 |
| 9 | 0.016500 | 0.201624 | 0.954 |
| 10 | 0.016500 | 0.213412 | 0.952 |
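
Because load_best_model_at_end=True and metric_for_best_model="accuracy" are set, the checkpoint with the highest validation accuracy is the one reloaded after training. The sketch below picks it out by hand from the results above. It is an illustration of the selection rule, not Trainer code, and the 282 steps per epoch are taken from the training log (2820 total steps over 10 epochs).

```python
# (epoch, validation accuracy) pairs copied from the results above
accuracy_by_epoch = [(1, 0.954), (2, 0.96), (3, 0.972), (4, 0.96), (5, 0.962),
                     (6, 0.956), (7, 0.952), (8, 0.954), (9, 0.954), (10, 0.952)]
steps_per_epoch = 282  # 2820 total optimization steps / 10 epochs

# max() returns the first pair with the highest accuracy
best_epoch, best_accuracy = max(accuracy_by_epoch, key=lambda pair: pair[1])
best_checkpoint = f"checkpoint-{best_epoch * steps_per_epoch}"
print(best_epoch, best_accuracy, best_checkpoint)  # 3 0.972 checkpoint-846
```

Validation accuracy peaks at epoch 3 even though training loss keeps falling afterward, which is the kind of case best-model selection is meant to handle.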


***** Running Evaluation *****
  Num examples = 500
  Batch size = 16
Saving model checkpoint to test-cifar-10/checkpoint-282
Configuration saved in test-cifar-10/checkpoint-282/config.json
Model weights saved in test-cifar-10/checkpoint-282/pytorch_model.bin
Feature extractor saved in test-cifar-10/checkpoint-282/preprocessor_config.json

TrainOutput(global_step=2820, training_loss=0.20501939345758857, metrics={'train_runtime': 1913.4984, 'train_samples_per_second': 23.517, 'train_steps_per_second': 1.474, 'total_flos': 3.48738956568576e+18, 'train_loss': 0.20501939345758857, 'epoch': 10.0})

## Evaluation

In [None]:
outputs = trainer.predict(test_ds)

***** Running Prediction *****
  Num examples = 2000
  Batch size = 16


In [None]:
print(outputs.metrics)

{'test_loss': 0.19178876280784607, 'test_accuracy': 0.97, 'test_runtime': 30.8608, 'test_samples_per_second': 64.807, 'test_steps_per_second': 4.05}

