In [None]:
!pip install openai transformers torch pillow


In [2]:
import os
from PIL import Image
from transformers import pipeline

# unimodal: 'image-to-text'
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
# print(captioner.model)
# print("===" * 20)
# remote http file
rs = captioner("https://github.com/websolutespa/ai-crash-course/blob/11b8f7fd04638c35536f8827e4dc667ed5b84ed2/02/iris.jpg?raw=true")
print(f"http: {rs[0]['generated_text']}")
# local file
image_path = os.path.join(os.getcwd(), "../02/iris.jpg")
rs = captioner(image_path)
print(f"local: {rs[0]['generated_text']}")
# load image with PIL
image = Image.open(image_path)
rs = captioner(image)
print(f"PIL: {rs[0]['generated_text']}")
#image.show()


Device set to use cpu


http: structure of a flower
local: structure of a flower
PIL: structure of a flower


In [None]:
from PIL import Image
from transformers import pipeline

# multi-modal: 'visual-question-answering'
vqa_pipeline = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
image = Image.open("../02/iris.jpg")
question = "What flower is this?"
vqa_pipeline(image, question, top_k=5)

Device set to use cpu


[{'score': 0.29455986618995667, 'answer': 'rose'},
 {'score': 0.24966226518154144, 'answer': 'lily'},
 {'score': 0.21617119014263153, 'answer': 'tulip'},
 {'score': 0.16161340475082397, 'answer': 'tulips'},
 {'score': 0.09409495443105698, 'answer': 'orange'}]
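
The top score above is only about 0.29, so the model is far from certain about the flower. A minimal sketch of how such a result list could be filtered, assuming the list-of-dicts shape shown in the output. The helper name `pick_answer` and the 0.5 default threshold are hypothetical choices made for illustration; they are not part of transformers.

```python
from typing import Optional

# Results copied from the cell output above (scores rounded).
results = [
    {"score": 0.2946, "answer": "rose"},
    {"score": 0.2497, "answer": "lily"},
    {"score": 0.2162, "answer": "tulip"},
    {"score": 0.1616, "answer": "tulips"},
    {"score": 0.0941, "answer": "orange"},
]


def pick_answer(results: list, min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring answer, or None if it scores below min_score."""
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["answer"] if best["score"] >= min_score else None


print(pick_answer(results))                  # None: 0.29 is below the 0.5 threshold
print(pick_answer(results, min_score=0.25))  # rose
```

Returning `None` instead of a low-confidence guess lets the caller decide whether to fall back to another model or ask for a better image.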

In [6]:
from transformers import pipeline

# multi-modal: 'document-question-answering'
nlp = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
nlp(
    "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
    "What is the invoice number?",
)

Device set to use cpu


[{'score': 0.9998127222061157, 'answer': 'us-001', 'start': 15, 'end': 15}]

In [None]:
from openai import OpenAI
client = OpenAI()

prompt = "Cite a Walt Whitman verse about human connection."

# unimodal: implicit role, implicit text-only
response = client.responses.create(
    model="gpt-4.1-mini",
    input=prompt
)
print(response.output_text)

print("-----"*30)

# unimodal: explicit role, implicit text-only
response = client.responses.create(
    model="gpt-4.1-mini",
    input=[{
        "role": "user",
        "content": prompt
    }],
)
print(response.output_text)

print("-----"*30)

# unimodal: explicit role, explicit text-only
response = client.responses.create(
    model="gpt-4.1-mini",
    input=[{
        "role": "user",
        # explicit content must be a list of typed parts; text parts use "input_text"
        "content": [{"type": "input_text", "text": prompt}]
    }],
)
print(response.output_text)

One well-known verse by Walt Whitman that touches on themes of human connection is from "Leaves of Grass." In the poem "Song of Myself," he writes:

"For every atom belonging to me as good belongs to you."

This line emphasizes the interconnectedness of all people, highlighting the shared essence of humanity.
------------------------------------------------------------------------------------------------------------------------------------------------------
Certainly! Here’s a verse from Walt Whitman’s *Leaves of Grass* that reflects human connection:

**“I am large, I contain multitudes.”**

This line, from the poem *Song of Myself*, emphasizes the complexity and inclusiveness of the self, suggesting an intrinsic connection between individuals through shared human experiences.
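
The calls above differ only in how much of the message structure is spelled out. A minimal sketch of a helper that expands a plain string into the fully explicit form, assuming the `input_text` part type that the image cells of this notebook use for text parts. The name `to_explicit_input` is hypothetical and not part of the OpenAI SDK.

```python
from typing import Union


def to_explicit_input(prompt: Union[str, list], role: str = "user") -> list:
    """Expand a plain string into the explicit role + typed-content form.

    A list is assumed to be in message form already and is returned unchanged.
    """
    if isinstance(prompt, str):
        return [{
            "role": role,
            "content": [{"type": "input_text", "text": prompt}],
        }]
    return prompt


messages = to_explicit_input("Cite a Walt Whitman verse about human connection.")
print(messages)
```

The resulting list can be passed as `input=messages` to `client.responses.create`.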


In [None]:
from openai import OpenAI

client = OpenAI()

# multi-modal model: different ways to send image inputs

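# text-only message containing an image URL: the URL is sent as plain text,
# so the model does not receive the image data and can only guess from the URL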
response = client.responses.create(
    model="gpt-4.1-mini",
    input=[{
        "role": "user",
        "content": "Describe the image at this URL: https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
    }],
)
print(response.output_text)

print("==="*20)

# multi-modal message with text and image
response = client.responses.create(
    model="gpt-4.1-mini",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "what's in this image?"},
            {
                "type": "input_image",
                "image_url": "https://github.com/websolutespa/ai-crash-course/blob/11b8f7fd04638c35536f8827e4dc667ed5b84ed2/02/iris.jpg?raw=true",
            },
        ],
    }],
)

print(response.output_text)

The image shows a scenic nature boardwalk in a lush, green environment. The boardwalk is made of wooden planks and stretches into the distance, curving slightly to the left. On both sides of the boardwalk, there is dense greenery including tall grasses, shrubs, and various plants, creating a natural, vibrant atmosphere. The sky above is mostly cloudy but bright, casting a soft light over the landscape. The setting appears to be peaceful and inviting, perfect for a leisurely walk to enjoy nature. This image is identified as "Gfp-wisconsin-madison-the-nature-boardwalk," suggesting it is located in Wisconsin, possibly near Madison.
This image is a labeled diagram illustrating the structure of a flower. The diagram shows the internal and external parts of a flower with labels pointing to each part. The labeled parts are:

- Anther
- Filament
- Stamen (indicated with a male symbol)
- Petal
- Sepal
- Receptacle
- Pedicel
- Stigma
- Style
- Pistil (indicated with a female symbol)
- Ovary
- Ov
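
Both calls above reference images by URL. A local file can also be sent inline as a base64 data URL in the `image_url` field. The sketch below shows one way to build such a message; `image_to_data_url` and `build_image_message` are hypothetical helper names, and falling back to `image/jpeg` when the MIME type cannot be guessed from the extension is an assumption.

```python
import base64
import mimetypes


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "image/jpeg"  # assumption: default to JPEG for unknown extensions
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_image_message(path: str, question: str) -> list:
    """Build a user message with one text part and one inline image part."""
    return [{
        "role": "user",
        "content": [
            {"type": "input_text", "text": question},
            {"type": "input_image", "image_url": image_to_data_url(path)},
        ],
    }]
```

With the client from the cell above, the message could be sent as `client.responses.create(model="gpt-4.1-mini", input=build_image_message("../02/iris.jpg", "What's in this image?"))`.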

In [None]:
from openai import OpenAI
import base64

client = OpenAI() 

# multi-modal model: output image generation

# image generation with text-only input and image output using tool call
response = client.responses.create(
    model="gpt-4.1-mini",
    input="Generate a labeled diagram illustrating the structure of a flower. The diagram shows the internal and external parts of a flower with labels pointing to each part.",
    tools=[{"type": "image_generation"}]
)

# save image
image_data = [output.result for output in response.output if output.type == "image_generation_call"]
if image_data:
    image_base64 = image_data[0]
    with open("../04.ignore/flower_diagram.png", "wb") as f:
        f.write(base64.b64decode(image_base64))
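
The cell above keeps only the first image and fails if the output folder does not exist. A minimal sketch of a more defensive variant, relying only on the `type` and `result` attributes already used above. The helper name `save_generated_images` and the numbered file naming are hypothetical, and writing the data with a `.png` extension follows the notebook's own choice rather than any inspection of the bytes.

```python
import base64
from pathlib import Path


def save_generated_images(outputs, out_dir: str, stem: str = "image") -> list:
    """Decode every image_generation_call result and write it to out_dir."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    saved = []
    for item in outputs:
        if getattr(item, "type", None) != "image_generation_call":
            continue  # skip text messages and other output items
        result = getattr(item, "result", None)
        if not result:
            continue  # skip calls that returned no image data
        path = target / f"{stem}_{len(saved)}.png"
        path.write_bytes(base64.b64decode(result))
        saved.append(path)
    return saved
```

For the response above this could be called as `save_generated_images(response.output, "../04.ignore", stem="flower_diagram")`.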


In [1]:
from openai import OpenAI
import base64

client = OpenAI()  

# multi-modal output
# text input, mixed output: the response may contain both text and image items
prompt = (
    "Produce one short caption (1 sentence) and create a small image "
    "of a red paper boat on a calm pond in flat illustration style, 256x256."
)

response = client.responses.create(
    model="gpt-4.1-mini",
    input=prompt,
    tools=[{"type": "image_generation"}]    
)

print(response)  

# text content fragments
if response.output_text:
    print("CAPTION:", response.output_text)
# image content: save image
image_data = [output.result for output in response.output if output.type == "image_generation_call"]
if image_data:
    image_base64 = image_data[0]
    with open("./tmp/out_red_boat.png", "wb") as f:
        f.write(base64.b64decode(image_base64))

Response(id='resp_059050027dfb184c0069048d89e8408196a19d469401b10319', created_at=1761906058.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4.1-mini-2025-04-14', object='response', output=[ImageGenerationCall(id='ig_059050027dfb184c0069048d8b164c81969eb440b73c02877d', result='iVBORw0KGgoAAAANSUhEUgAABAAAAAQACAIAAADwf7zU... [base64 image data truncated]

In [None]:
from openai import OpenAI

client = OpenAI() 
# multi-modal: input with text (with history) and multiple images
response = client.responses.create(
    model="gpt-4.1-mini",
    input=[
        {"role": "system", "content": "You are a helpful assistant that analyzes images."},
        {"role": "user", "content": "Hi, my name is Iris and I have some images to analyze."},
        {"role": "assistant", "content": "Sure Iris! Please provide the images you would like me to analyze."},
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Describe the similarities and differences between these images, and... do you find any affinity with my first name?"},
                {
                    "type": "input_image",
                    "image_url": "https://github.com/websolutespa/ai-crash-course/blob/11b8f7fd04638c35536f8827e4dc667ed5b84ed2/02/iris.jpg?raw=true",
                },
                {
                    "type": "input_image",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/f/f8/Iris_sanguinea_cultivar%2C_Wakehurst_Place%2C_UK_-_Diliff.jpg/800px-Iris_sanguinea_cultivar%2C_Wakehurst_Place%2C_UK_-_Diliff.jpg",
                },
            ],
        },
    ],
)

print(response.output_text)

Here is the analysis of the similarities and differences between the two images:

Similarities:
1. Both images are related to flowers.
2. Each flower has petals, stamens, and pistils.
3. They both have vibrant colors.
4. Both images display parts of a flower essential for its reproduction.

Differences:
1. The first image is a labeled diagram showing the structure of a flower, with parts such as stamen, pistil, petal, sepal, receptacle, ovary, and ovule clearly labeled. It gives a botanical and educational representation.
2. The second image is a photograph of a real flower, specifically an iris flower, showcasing its natural beauty and colors.
3. The flower in the diagram is orange with a simple shape, while the iris flower is purple with a more complex and delicate shape.
4. The diagram is symbolic and simplified to explain components, whereas the iris photograph is detailed and natural.

Affinity with your name "Iris":
Yes, there is a strong affinity with your name. The second image