# Nano-Challenge-2
A record of an attempt, made during Nano Challenge 2 at the Apple Developer Academy, to use coremltools to convert a model built with PyTorch so that it can be deployed and used in an iOS app.



In [2]:
!pip install transformers
!pip install coremltools

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


# Caption generation model

In [3]:
import requests
import torch
from PIL import Image
from transformers import (
    VisionEncoderDecoderModel, 
    ViTFeatureExtractor, 
    PreTrainedTokenizerFast,
)

# device setting
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# load feature extractor and tokenizer
encoder_model_name_or_path = "ddobokki/vision-encoder-decoder-vit-gpt2-coco-ko"
feature_extractor = ViTFeatureExtractor.from_pretrained(encoder_model_name_or_path)
tokenizer = PreTrainedTokenizerFast.from_pretrained(encoder_model_name_or_path)

# load model
model = VisionEncoderDecoderModel.from_pretrained(encoder_model_name_or_path)
model.to(device)

# inference
# url = 'https://i.pinimg.com/736x/44/17/96/441796cc66ffefc7100ae6e13978781b.jpg'
url = 'https://t1.daumcdn.net/thumb/R720x0/?fname=http://t1.daumcdn.net/brunch/service/user/5xq2/image/0lp8RLaJ2IgctTWVl2nEa-JRCSc.jpg'
with Image.open(requests.get(url, stream=True).raw) as img:
    pixel_values = feature_extractor(images=img, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values.to(device),num_beams=5)
generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)




In [4]:
generated_text

['해 질 녘에 해변에 앉아 있는 바다']

## Model deployment

In [None]:
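# Dummy inputs for tracing: one 224x224 RGB image and some decoder token ids.
# Use the device selected earlier instead of hard-coding "cuda".
example = torch.rand(1, 3, 224, 224, device=device)
input_ids = tokenizer.encode("바람이 분다.", return_tensors="pt").to(device)

# Trace the forward pass with (pixel_values, decoder_input_ids)
model = torch.jit.trace(model, (example, input_ids))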

# Poem generation model

In [5]:
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModelForCausalLM
import coremltools as ct



In [6]:
from transformers import AutoTokenizer, AutoModelForCausalLM

# device setting
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# load model and tokenizer
model_name_or_path = "ddobokki/gpt2_poem"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
model.to(device)

keyword_start_token = "<k>"
keyword_end_token = "</k>"
text = generated_text[0]
input_text = keyword_start_token + text + keyword_end_token

input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)
gen_ids = model.generate(
    input_ids, max_length=64, num_beams=100, no_repeat_ngram_size=2
)

generated = tokenizer.decode(gen_ids[0, :].tolist(), skip_special_tokens=True)


In [7]:
caption_size = len(text)

generated[caption_size:]

' 해가 뜨고\n해가 지고\n해 질녘이 되면\n어디선가 들려오는\n파도소리\n그 파도소리 들으며\n바다가 그리워진다\n바닷가에 앉아\n바다를 그리워하는\n내 모습이 떠오른다'
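
Slicing with `len(text)` assumes the decoded output begins with exactly the caption once the special tokens are skipped. A minimal sketch of a slightly more defensive version follows; `strip_caption` is a hypothetical helper introduced here, not part of the notebook, and unlike the cell above it also drops the leading space.

```python
def strip_caption(generated: str, caption: str) -> str:
    """Return the text that follows the caption prompt.

    If the generated text does not start with the caption (for example
    because the tokenizer normalized whitespace), return it unchanged
    rather than cutting off an arbitrary number of characters.
    """
    if generated.startswith(caption):
        return generated[len(caption):].lstrip()
    return generated


# Illustrative values modelled on the outputs shown above
caption = "해 질 녘에 해변에 앉아 있는 바다"
generated = caption + " 해가 뜨고\n해가 지고"
poem = strip_caption(generated, caption)
```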

## Model deployment

In [8]:
class PeomGenerate(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.next_token_predictor = AutoModelForCausalLM.from_pretrained(model_name_or_path)

    def forward(self, x):
        # Accept token ids of any shape and treat them as one sequence: (1, seq_len)
        input_ids = x.reshape(1, -1)

        # Hard-coded ids, assumed to correspond to "<k>" and "</k>" in this tokenizer
        keyword_start_token = torch.tensor([[51200]], device=input_ids.device)
        keyword_end_token = torch.tensor([[51201]], device=input_ids.device)
        # Concatenate along the sequence dimension: <k> tokens </k>
        input_ids = torch.cat([keyword_start_token, input_ids, keyword_end_token], dim=1)

        predictions = self.next_token_predictor.generate(
            input_ids, max_length=64, num_beams=100, no_repeat_ngram_size=2
        )

        # Return token ids rather than a decoded string: a traced module can only
        # return tensors (or containers of tensors), so decoding happens outside
        return predictions[0]

In [None]:
# token_predictor = AutoModelForCausalLM.from_pretrained(model_name_or_path, torchscript=True).eval()

In [None]:
# random_tokens = torch.randint(10000, (5,))
# traced_token_predictor = torch.jit.trace(token_predictor, random_tokens)

In [None]:
model = PeomGenerate()

random_tokens = torch.randint(10000, (5,1))
traced_model = torch.jit.trace(model, random_tokens)
print(traced_model)

  # Remove the CWD from sys.path while we load stuff.
  # This is added back by InteractiveShellApp.init_path()
  if input_ids_seq_length >= max_length:
  self._done = torch.tensor([False for _ in range(batch_size)], dtype=torch.bool, device=self.device)
  if num_beams * batch_size != batch_beam_size:
  if batch_size <= 0:
  value.size(-1) ** 0.5, dtype=attn_weights.dtype, device=attn_weights.device
  value.size(-1) ** 0.5, dtype=attn_weights.dtype, device=attn_weights.device
  mask_value = torch.tensor(mask_value, dtype=attn_weights.dtype).to(attn_weights.device)
  if cur_len + 1 < ngram_size:
  gen_tokens = prev_input_ids[idx].tolist()
  ngram_idx = tuple(prev_input_ids[start_idx:cur_len].tolist())
  if not (batch_size == (input_ids.shape[0] // self.group_size)):
  if not (batch_size == (input_ids.shape[0] // self.group_size)):
  if self._done[batch_idx]:
  zip(next_tokens[batch_idx], next_scores[batch_idx], next_indices[batch_idx])
  if (eos_token_id is not None) and (next_token.ite
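
The lines above look like the source lines that PyTorch echoes with its tracer warnings; the warning text itself is missing from this export, so that reading is an assumption. Such warnings typically appear when Python control flow depends on tensor values, because `torch.jit.trace` records only the branch taken for the example input. A minimal sketch of that effect, using a hypothetical toy function `shift` rather than the poem model:

```python
import torch


def shift(x: torch.Tensor) -> torch.Tensor:
    # Data-dependent branch: the tracer cannot record the `if` itself,
    # only the operations executed for the example input.
    if x.sum() > 0:
        return x + 10
    return x - 10


positive = torch.ones(3)
negative = -torch.ones(3)

# Tracing with a positive example freezes the `x + 10` branch
# (PyTorch emits a TracerWarning here).
traced_shift = torch.jit.trace(shift, positive)

eager_result = shift(negative)          # takes the `x - 10` branch
traced_result = traced_shift(negative)  # still applies `x + 10`
```

`generate()` contains many branches and loops of this kind (length checks, beam bookkeeping, end-of-sequence tests), which is consistent with the long list of flagged lines in the output above.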

In [None]:
scripted_model = torch.jit.script(model)

In [None]:
sentence_fragment = "바람이 분다."

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
context = torch.tensor(tokenizer.encode(sentence_fragment))
context

In [None]:
torch_out = scripted_model(context)
generated_text_torch = tokenizer.decode(torch_out)
print("Fragment: {}".format(sentence_fragment))
print("Completed: {}".format(generated_text_torch))

# Rebuilding from scratch

In [None]:
import requests
import torch
from PIL import Image
from transformers import (
    VisionEncoderDecoderModel, 
    ViTFeatureExtractor, 
    PreTrainedTokenizerFast,
)

# device setting
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# load feature extractor and tokenizer
encoder_model_name_or_path = "ddobokki/vision-encoder-decoder-vit-gpt2-coco-ko"
feature_extractor = ViTFeatureExtractor.from_pretrained(encoder_model_name_or_path)
tokenizer = PreTrainedTokenizerFast.from_pretrained(encoder_model_name_or_path)

# load model
model = VisionEncoderDecoderModel.from_pretrained(encoder_model_name_or_path)
model.to(device)

# inference
url = 'https://modo-phinf.pstatic.net/20170208_281/14865453315606kNKk_JPEG/mosa7CEoze.jpeg?type=w1100'
with Image.open(requests.get(url, stream=True).raw) as img:
    pixel_values = feature_extractor(images=img, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values.to(device),num_beams=5)
generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)



In [None]:
generated_text

['두명의 소녀와 한명의 소녀가 크리스마스 트리 앞에서 포즈를 취하고 있다.']

In [None]:
!pip install coremltools

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting coremltools
  Downloading coremltools-5.2.0-cp37-none-manylinux1_x86_64.whl (1.6 MB)
     |████████████████████████████████| 1.6 MB 4.1 MB/s
Installing collected packages: coremltools
Successfully installed coremltools-5.2.0


In [None]:
import coremltools as ct

mlmodel = ct.convert(model, convert_to="mlprogram")



ValueError: ignored