
# 🚀 Hands-On: Generating Image Descriptions with OpenFlamingo

This notebook is a multimodal AI example that uses the **OpenFlamingo** model to describe an image.

**Environment**: RunPod A40 GPU, PyTorch 2.0.1



In [1]:

# Install the required libraries
!pip install torch==2.0.1 torchvision==0.15.2 transformers==4.30.0 einops sentencepiece open-flamingo open_clip_torch einops-exts


Collecting transformers==4.30.0
  Downloading transformers-4.30.0-py3-none-any.whl.metadata (113 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 113.6/113.6 kB 2.4 MB/s eta 0:00:00
Collecting einops
  Downloading einops-0.8.1-py3-none-any.whl.metadata (13 kB)
Collecting sentencepiece
  Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting open-flamingo
  Downloading open_flamingo-2.0.1-py3-none-any.whl.metadata (14 kB)
Collecting open_clip_torch
  Downloading open_clip_torch-2.32.0-py3-none-any.whl.metadata (31 kB)
Collecting einops-exts
  Downloading einops_exts-0.0.4-py3-none-any.whl.metadata (621 bytes)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers==4.30.0)
  Downloading huggingface_hub-0.33.0-py3-none-any.whl.metadata (14 kB)
Collecting regex!=2019.12.17 (from transformers==4.30.0)
  Downloading regex-2024.11.6-cp310-cp310-manylinux_2_17_x8
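
Because the install cell pins some packages (`torch`, `torchvision`, `transformers`) and leaves others unpinned, it can help to confirm what actually got installed before loading the model. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any of the libraries above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the pip install cell above.
for name, ver in installed_versions(
    ["torch", "torchvision", "transformers", "open-flamingo", "open_clip_torch"]
).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

A `None` entry means the package is missing from the active environment, which usually points to the notebook kernel using a different interpreter than the one `pip` installed into.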

In [2]:
!mkdir -p /workspace/OpenFlamingo

!wget https://huggingface.co/openflamingo/OpenFlamingo-9B-vitl-mpt7b/resolve/main/checkpoint.pt \
  -O /workspace/OpenFlamingo/checkpoint.pt

--2025-06-13 07:03:50--  https://huggingface.co/openflamingo/OpenFlamingo-9B-vitl-mpt7b/resolve/main/checkpoint.pt
Resolving huggingface.co (huggingface.co)... 3.161.213.11, 3.161.213.58, 3.161.213.25, ...
Connecting to huggingface.co (huggingface.co)|3.161.213.11|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.hf.co/repos/ab/66/ab66332de56d053921676567f6b428824815059fb070c094011b59e21d3d0852/ed5a634ff8c022cf437ec245838a00b0c05bef6963524c5d0dfabe75ce701514?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27checkpoint.pt%3B+filename%3D%22checkpoint.pt%22%3B&Expires=1749801830&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0OTgwMTgzMH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5oZi5jby9yZXBvcy9hYi82Ni9hYjY2MzMyZGU1NmQwNTM5MjE2NzY1NjdmNmI0Mjg4MjQ4MTUwNTlmYjA3MGMwOTQwMTFiNTllMjFkM2QwODUyL2VkNWE2MzRmZjhjMDIyY2Y0MzdlYzI0NTgzOGEwMGIwYzA1YmVmNjk2MzUyNGM1ZDBkZmFiZTc1Y2U3MDE1MTQ%7EcmVzcG9uc2UtY29udGV
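
A multi-gigabyte download can be cut short without `wget` making it obvious, so a quick sanity check on the saved file may be worthwhile before loading it. The sketch below reports the file size and SHA-256 digest using only the standard library. The helper names `file_sha256` and `describe_download` are hypothetical, and the digest to compare against is not given in this notebook; it would have to come from the model's file listing on Hugging Face.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_download(path):
    """Return size and digest for an existing file, or None if it is missing."""
    p = Path(path)
    if not p.is_file():
        return None
    return {"size_bytes": p.stat().st_size, "sha256": file_sha256(p)}


# Path taken from the wget cell above.
print(describe_download("/workspace/OpenFlamingo/checkpoint.pt"))
```

Hashing the whole checkpoint takes a little while, but it only needs to run once after the download.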

In [3]:

# Import the required libraries
import torch
from open_flamingo import create_model_and_transforms
from PIL import Image
import requests
from io import BytesIO

# GPU setup
device = torch.device('cuda')

# Load the model (use the matching combination: ViT-L-14 + MPT-7B)
model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path='ViT-L-14',
    clip_vision_encoder_pretrained='openai',
    lang_encoder_path='anas-awadalla/mpt-7b',
    tokenizer_path='anas-awadalla/mpt-7b',
    cross_attn_every_n_layers=4
)
    
model.to(device)

# Load the OpenFlamingo checkpoint (must be downloaded beforehand)
checkpoint_path = '/workspace/OpenFlamingo/checkpoint.pt'
model.load_state_dict(torch.load(checkpoint_path), strict=False)
model.eval()


Using pad_token, but it is not set yet.


A new version of the following files was downloaded from https://huggingface.co/anas-awadalla/mpt-7b:
- configuration_mpt.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/anas-awadalla/mpt-7b:
- modeling_mpt.py
- meta_init_context.py
- param_init_fns.py
- flash_attn_triton.py
- adapt_tokenizer.py
- attention.py
- blocks.py
- hf_prefixlm_converter.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


You are using config.init_device='cpu', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.


Flamingo model initialized with 1384781840 trainable parameters


Flamingo(
  (vision_encoder): VisionTransformer(
    (conv1): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14), bias=False)
    (patch_dropout): Identity()
    (ln_pre): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
    (transformer): Transformer(
      (resblocks): ModuleList(
        (0-23): 24 x ResidualAttentionBlock(
          (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (attn): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=1024, out_features=1024, bias=True)
          )
          (ls_1): Identity()
          (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (mlp): Sequential(
            (c_fc): Linear(in_features=1024, out_features=4096, bias=True)
            (gelu): GELU(approximate='none')
            (c_proj): Linear(in_features=4096, out_features=1024, bias=True)
          )
          (ls_2): Identity()
        )
      )
    )
    (ln_post): LayerNorm((1024,), eps=1e-0

In [4]:

# Load the image
image_url = 'https://images.unsplash.com/photo-1518791841217-8f162f1e1131'
response = requests.get(image_url, timeout=30)
response.raise_for_status()  # fail early if the download did not succeed
image = Image.open(BytesIO(response.content)).convert('RGB')

# Preprocess the image (convert it to the model's input shape)
vision_x = image_processor(image).unsqueeze(0).unsqueeze(0).unsqueeze(2).to(device)
print("Image processing complete, shape:", vision_x.shape)


Image processing complete, shape: torch.Size([1, 1, 1, 3, 224, 224])
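
The six-dimensional shape printed above is easier to read with the axes named. The sketch below assumes the layout is (batch, images per sample, frames per image, channels, height, width), which is how I understand OpenFlamingo's `vision_x` input; the `VisionShape` class and `describe_vision_shape` helper are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple, Sequence


class VisionShape(NamedTuple):
    """Assumed axis order of OpenFlamingo's vision_x tensor."""
    batch: int        # number of samples in the batch
    num_images: int   # images per sample (one per <image> token)
    num_frames: int   # frames per image (1 for still images)
    channels: int     # colour channels
    height: int       # pixels
    width: int        # pixels


def describe_vision_shape(shape: Sequence[int]) -> VisionShape:
    """Attach axis names to a 6-element shape, rejecting anything else."""
    if len(shape) != 6:
        raise ValueError(f"expected 6 dimensions, got {len(shape)}")
    return VisionShape(*shape)


print(describe_vision_shape((1, 1, 1, 3, 224, 224)))
```

In the notebook, `describe_vision_shape(vision_x.shape)` should work directly, since `torch.Size` behaves like a tuple. The three `unsqueeze` calls in the cell above add the batch, image, and frame axes in that order.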


In [5]:

# Set the prompt ('<image>' is a required input token)
prompt = "<image>Describe this image in detail:"
tokenized_prompt = tokenizer(prompt, return_tensors='pt').to(device)

# Generate the description
with torch.no_grad():
    outputs = model.generate(
        vision_x=vision_x,
        lang_x=tokenized_prompt['input_ids'],
        attention_mask=tokenized_prompt['attention_mask'],
        max_new_tokens=50,
        num_beams=3,
        do_sample=True, 
        temperature=0.7,
        top_p=0.9,
        eos_token_id=tokenizer.eos_token_id,
    )

# Print the generated result
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print('📝 Image description generated by OpenFlamingo:', generated_text)


Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


📝 Image description generated by OpenFlamingo: Describe this image in detail: A tabby cat sits on a couch
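
The decoded output above still begins with the prompt text, because the generated sequence includes the input tokens. The sketch below shows one way to build a prompt and then strip that echo from the decoded string. It is plain string handling, not part of the OpenFlamingo API: `build_prompt` and `strip_prompt` are hypothetical helpers, and the `<|endofchunk|>` separator is an assumption based on the interleaved few-shot format described in the OpenFlamingo project documentation.

```python
IMAGE_TOKEN = "<image>"           # marks where an image is attended to
END_OF_CHUNK = "<|endofchunk|>"   # assumed separator after each example's text


def build_prompt(example_captions, query):
    """One <image> token per example caption, followed by the open-ended query."""
    shots = "".join(
        f"{IMAGE_TOKEN}{caption}{END_OF_CHUNK}" for caption in example_captions
    )
    return f"{shots}{IMAGE_TOKEN}{query}"


def strip_prompt(generated, prompt):
    """Remove the echoed prompt from text decoded with skip_special_tokens=True."""
    visible = prompt.replace(IMAGE_TOKEN, "").replace(END_OF_CHUNK, "")
    if generated.startswith(visible):
        return generated[len(visible):].strip()
    return generated.strip()


prompt = build_prompt([], "Describe this image in detail:")
decoded = "Describe this image in detail: A tabby cat sits on a couch"
print(strip_prompt(decoded, prompt))
```

With an empty example list, `build_prompt` reproduces the zero-shot prompt used in this notebook. If example captions are added, `vision_x` would need a matching number of images along its second axis, one per `<image>` token.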
