# Text-to-Image Generation with the midjourney-mini Model

### Calling the Inference API

https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
<br>The usage shown on this model page (stable-diffusion-xl-base-1.0):

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="nscale",
    api_key=os.environ["HF_TOKEN"],
)

# output is a PIL.Image object
image = client.text_to_image(
    "Astronaut riding a horse",
    model="stabilityai/stable-diffusion-xl-base-1.0",
)
```

---

https://huggingface.co/openskyml/midjourney-mini
<br>From the API usage shown on this model page, we reuse `API_URL` and `headers`.

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/midjourney-community/midjourney-mini"
headers = {"Authorization": "Bearer hf_token"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.content


image_bytes = query({
    "inputs": "Astronaut riding a horse",
})
```

---

In [20]:
from dotenv import load_dotenv
load_dotenv()

True

In [21]:
import os
HF_TOKEN = os.getenv("HF_TOKEN")

In [28]:
API_URL = "https://router.huggingface.co/hf-inference/models/stabilityai/stable-diffusion-xl-base-1.0"
headers = {"Authorization": f"Bearer {HF_TOKEN}"}
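
`os.getenv("HF_TOKEN")` returns `None` when the variable is missing, in which case the header above would silently become `Bearer None`. A minimal sketch of a fail-fast check follows; `build_headers` is a hypothetical helper name, not part of any library.

```python
import os
from typing import Mapping


def build_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Build the Authorization header, failing early if HF_TOKEN is missing.

    `build_headers` is an illustrative helper, not a Hugging Face API.
    """
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; check your .env file or environment.")
    return {"Authorization": f"Bearer {token}"}


# Example with an explicit mapping so the sketch runs without a real token.
example_headers = build_headers({"HF_TOKEN": "hf_example"})
print(example_headers)
```

In the notebook, `headers = build_headers()` after `load_dotenv()` would read the real environment.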

In [36]:
import requests

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.content
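
`query` returns `response.content` regardless of the HTTP status, so a failed request would hand error text to `Image.open` later. A minimal sketch of a check that could sit between the two steps, assuming the endpoint answers with an `image/*` content type on success and a JSON or plain-text body on failure; `InferenceError` and `extract_image_bytes` are hypothetical names.

```python
import json


class InferenceError(RuntimeError):
    """Raised when the response does not contain image data (illustrative)."""


def extract_image_bytes(status_code: int, content_type: str, body: bytes) -> bytes:
    """Return the body if it looks like a successful image response, else raise."""
    if status_code == 200 and content_type.lower().startswith("image/"):
        return body
    # Try to surface the server's own message; fall back to a raw preview.
    try:
        detail = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        detail = body[:200]
    raise InferenceError(f"HTTP {status_code}: {detail}")


# Simulated responses, so the sketch runs without network access.
ok = extract_image_bytes(200, "image/jpeg", b"\xff\xd8\xff\xe0")
try:
    extract_image_bytes(503, "application/json", b'{"error": "model is loading"}')
except InferenceError as exc:
    failure_message = str(exc)
    print(failure_message)
```

With `requests`, the three arguments would be `response.status_code`, `response.headers.get("Content-Type", "")`, and `response.content`.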

In [41]:
results = query({
    "inputs": "Paparazzi style close-up shot, caught off-guard moment, woman with striking features turning toward camera, extreme close-up face and shoulders, motion blur crowd, grainy high ISO photography, harsh flash lighting, raw candid energy, wearing designer grey coat, street style fashion, shot from below angle, Paris Fashion Week vibe --ar 4:5 --s 250 -raw"
})

In [42]:
results

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x08\x06\x06\x07\x06\x05\x08\x07\x07\x07\t\t\x08\n\x0c\x14\r\x0c\x0b\x0b\x0c\x19\x12\x13\x0f\x14\x1d\x1a\x1f\x1e\x1d\x1a\x1c\x1c ... (output truncated)
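
The bytes above start with `\xff\xd8\xff`, the JPEG signature, so the endpoint returned a JPEG here even though the next cell saves it as `result.png` (Pillow picks the output format from the file extension and re-encodes on save). A small sketch for detecting the format from the leading bytes, for example to write the raw response with a matching extension; `sniff_image_format` is a hypothetical helper and covers only JPEG and PNG.

```python
from typing import Optional

JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return "jpg" or "png" based on the file signature, or None if unknown."""
    if data.startswith(JPEG_MAGIC):
        return "jpg"
    if data.startswith(PNG_MAGIC):
        return "png"
    return None


# The first bytes of the notebook output shown above.
sample = b"\xff\xd8\xff\xe0\x00\x10JFIF"
detected = sniff_image_format(sample)
print(detected)
```

A `None` result is also a cheap hint that the response body is an error message rather than an image.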

In [43]:
from PIL import Image
import io

image = Image.open(io.BytesIO(results))
image.save("result.png")

In [44]:
image.show()

In [45]:
results = query({
    "inputs": "Phone photo from inside fish tank. Woman's face reflected/distorted on glass, pensive, dark hair. Orange goldfish, green plants, air bubbles inside. Warm light, authentic, underwater perspective, slight blur.",
})
image = Image.open(io.BytesIO(results))
image.save("result1.png")
image.show()

In [46]:
results = query({
    "inputs": "A photo of a woman in a red dress, taken from behind, with a red background. The woman is wearing a red dress and has long red hair. The background is red. The photo is taken from behind. The woman is wearing a red dress and has long red hair. The background is red. The photo is taken from behind.",
})
image = Image.open(io.BytesIO(results))
image.save("result2.png")
image.show()
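
The three cells above repeat the same query, save, and show steps with a new file name each time. One way to fold them into a loop is sketched below. `save_batch` is a hypothetical helper, the query function is passed in as a parameter, and the raw bytes are written as returned, so the default `jpg` extension assumes the endpoint keeps returning JPEG data as in the earlier output.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def save_batch(
    prompts: Iterable[str],
    query_fn: Callable[[dict], bytes],
    out_dir: str = "results",
    extension: str = "jpg",
) -> List[Path]:
    """Run query_fn once per prompt and write each response to a numbered file."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, prompt in enumerate(prompts):
        data = query_fn({"inputs": prompt})
        path = target / f"result{index}.{extension}"
        path.write_bytes(data)  # raw bytes, no re-encoding
        saved.append(path)
    return saved


# Stand-in for the notebook's query(); returns fixed bytes instead of calling the API.
def fake_query(payload: dict) -> bytes:
    return b"\xff\xd8\xff" + payload["inputs"].encode("utf-8")


demo_paths = save_batch(["first prompt", "second prompt"], fake_query, out_dir=tempfile.mkdtemp())
print([p.name for p in demo_paths])
```

In the notebook, `save_batch(prompts, query)` would reuse the existing `query` function unchanged.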

### Using DiffusionPipeline

In [47]:
!pip install diffusers

Collecting diffusers
  Downloading diffusers-0.35.2-py3-none-any.whl.metadata (20 kB)
Downloading diffusers-0.35.2-py3-none-any.whl (4.1 MB)
Installing collected packages: diffusers
Successfully installed diffusers-0.35.2


https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda")

# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()

prompt = "An astronaut riding a green horse"

images = pipe(prompt=prompt).images[0]
```
---

In [48]:
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("midjourney-community/midjourney-mini")

Exception in thread Thread-98 (_readerthread):
Traceback (most recent call last):
  File "c:\Users\Playdata\anaconda3\envs\llm_env\Lib\threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "c:\Users\Playdata\anaconda3\envs\llm_env\Lib\threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "c:\Users\Playdata\anaconda3\envs\llm_env\Lib\subprocess.py", line 1599, in _readerthread
    buffer.append(fh.read())
                  ^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 6: invalid start byte


model_index.json:   0%|          | 0.00/578 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


Fetching 12 files:   0%|          | 0/12 [00:00<?, ?it/s]

text_encoder/pytorch_model.bin:   0%|          | 0.00/492M [00:00<?, ?B/s]

scheduler_config.json:   0%|          | 0.00/617 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/613 [00:00<?, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/472 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/737 [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

config.json: 0.00B [00:00, ?B/s]

unet/diffusion_pytorch_model.bin:   0%|          | 0.00/2.32G [00:00<?, ?B/s]

vae/diffusion_pytorch_model.bin:   0%|          | 0.00/335M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/659 [00:00<?, ?B/s]

Loading pipeline components...:   0%|          | 0/5 [00:00<?, ?it/s]

An error occurred while trying to fetch C:\Users\Playdata\.cache\huggingface\hub\models--midjourney-community--midjourney-mini\snapshots\87f7e660d8ca59c19f2f6e60792ce32492a0bffc\vae: Error no file named diffusion_pytorch_model.safetensors found in directory C:\Users\Playdata\.cache\huggingface\hub\models--midjourney-community--midjourney-mini\snapshots\87f7e660d8ca59c19f2f6e60792ce32492a0bffc\vae.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
An error occurred while trying to fetch C:\Users\Playdata\.cache\huggingface\hub\models--midjourney-community--midjourney-mini\snapshots\87f7e660d8ca59c19f2f6e60792ce32492a0bffc\unet: Error no file named diffusion_pytorch_model.safetensors found in directory C:\Users\Playdata\.cache\huggingface\hub\models--midjourney-community--midjourney-mini\snapshots\87f7e660d8ca59c19f2f6e60792ce32492a0bffc\unet.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.


In [51]:
import os
from uuid import uuid4

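def generate_image(prompt: str, save_dir: str = "./generated_img"):
    """Generate an image for `prompt`, save it as a PNG, and return its path (None on failure)."""
    try:
        # Create the output directory if it does not exist yet.
        os.makedirs(save_dir, exist_ok=True)

        # A random UUID keeps file names unique across runs.
        file_name = f"{uuid4()}.png"
        file_path = os.path.join(save_dir, file_name)

        # `pipeline` is the DiffusionPipeline loaded in the earlier cell.
        image = pipeline(prompt).images[0]
        image.save(file_path)

        return file_path

    except Exception as e:
        print(f"Error generating image: {e}")

        return None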


In [50]:
generate_image(input("Enter a prompt to turn into an image: "))

Token indices sequence length is longer than the specified maximum sequence length for this model (290 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ["a playful and slightly dreamy expression, as if savoring dessert. pout happily to the camera. on the table in front of her are a glass goblet filled with tiramisu dusted with cocoa powder, a cup of cappuccino with latte art on the foam placed on a white saucer, a spoon resting beside the dessert, a napkin, and her modern smartphone placed flat on the table. a small silver holder with sugar packets is also visible. the background shows a bustling european cobblestone street with several pedestrians walking, dressed casual. behind heris a historic stone building with arched wooden doors, ornate architectural details, and carved signs above the entrance, reflecting classical european style. other buildings o

  0%|          | 0/50 [00:00<?, ?it/s]
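
The warning above shows that only the first 77 tokens of the prompt reach the CLIP text encoder; the rest is dropped. A minimal sketch of trimming a prompt at a sentence boundary before generation follows. `fit_prompt` is a hypothetical helper, and the token counter is passed in so the sketch runs without the model. In the notebook the counter could be something like `lambda s: len(pipeline.tokenizer(s).input_ids)`, assuming the loaded pipeline exposes its CLIP tokenizer as `pipeline.tokenizer`.

```python
import re
from typing import Callable


def fit_prompt(prompt: str, count_tokens: Callable[[str], int], max_tokens: int = 77) -> str:
    """Keep leading sentences of `prompt` while the token count stays within max_tokens.

    Returns an empty string if even the first sentence is over the limit.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", prompt.strip()) if s]
    kept = []
    for sentence in sentences:
        candidate = " ".join(kept + [sentence])
        if count_tokens(candidate) > max_tokens:
            break
        kept.append(sentence)
    return " ".join(kept)


# Rough stand-in counter: one token per whitespace-separated word plus two
# special tokens. A real CLIP tokenizer will give different counts.
def rough_count(text: str) -> int:
    return len(text.split()) + 2


trimmed = fit_prompt(
    "A red fox. It sits in snow. The sky is pale blue at dawn.",
    rough_count,
    max_tokens=10,
)
print(trimmed)
```

Because the sketch drops whole trailing sentences, the most important details belong at the start of the prompt.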

In [None]:
prompt = """A candid street cafe photography scene in a European old town setting, during daylight. A young elegant woman is sitting at a small round outdoor cafe table covered with a crisp white tablecloth. She is wearing a simple, stylish black long-sleeve top, with minimal but chic jewelry: small hoop earrings and rings on her fingers. She is holding a small spoon on her lips with a playful and slightly dreamy expression, as if savoring dessert. Pout happily to the camera.

On the table in front of her are a glass goblet filled with tiramisu dusted with cocoa powder, a cup of cappuccino with latte art on the foam placed on a white saucer, a spoon resting beside the dessert, a napkin, and her modern smartphone placed flat on the table. A small silver holder with sugar packets is also visible.

The background shows a bustling European cobblestone street with several pedestrians walking, dressed casually. Behind her is a historic stone building with arched wooden doors, ornate architectural details, and carved signs above the entrance, reflecting classical European style. Other buildings on the street have a mix of rustic stone facades and modern shopfronts, some with signage, windows, and balconies. The atmosphere feels lively yet relaxed, with warm natural sunlight illuminating the scene.

Do not change the face and maintain the correct proportions of the character's facial features such as eyes, nose and mouth. Don't distort the face. Ultra realistic photo."""

In [55]:
image = Image.open("./generated_img/3f1268ed-3623-4aea-8ab2-8ad72cd8b84c.png")
image.show()

In [56]:
prompt = """A photorealistic frontal editorial portrait of a Gen Z boy with short buzz-cut hair and silver hoop earrings, shot on a 50 mm lens under harsh direct flash. He wears a dark-green vintage hoodie with small white logo text. The wall behind him is a solid soft beige color, evenly lit to make his facial features stand out clearly without glare. His calm expression and skin texture are sharp and realistic. Lens smudges and fine film grain give the photo an authentic analog feel."""
generate_image(prompt)

The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['and skin texture are sharp and realistic. lens smudges and fine film grain give the photo an authentic analog feel.']


  0%|          | 0/50 [00:00<?, ?it/s]

'./generated_img\\f9717956-00a8-43b6-b5c4-1b462905de91.png'

In [None]:
image = Image.open(file_path)
image.show()

NameError: name 'file_path' is not defined
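
The `NameError` arises because `file_path` exists only inside `generate_image`; the function returns the path, but the calling cell never stores it. A sketch of the pattern, using a stand-in function so it runs without the pipeline (`generate_image_stub` is hypothetical and writes placeholder bytes rather than a real image):

```python
import os
import tempfile
from typing import Optional
from uuid import uuid4


def generate_image_stub(prompt: str, save_dir: str) -> Optional[str]:
    """Stand-in for generate_image: writes placeholder bytes instead of calling the pipeline."""
    os.makedirs(save_dir, exist_ok=True)
    file_path = os.path.join(save_dir, f"{uuid4()}.png")  # local to this function
    with open(file_path, "wb") as f:
        f.write(b"placeholder")
    return file_path


# Store the return value; the function's local variable is not visible here.
saved_path = generate_image_stub("a test prompt", tempfile.mkdtemp())
if saved_path is None:
    print("generation failed")
else:
    print("saved to", saved_path)
```

In the notebook itself the equivalent would be `file_path = generate_image(prompt)` before `Image.open(file_path)`, with a `None` check because `generate_image` returns `None` on failure.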