In [None]:
import os
from openai import OpenAI
import base64
from google.colab import userdata
from google.colab import drive

In [None]:
api_key = userdata.get('OPENAI_API_KEY')
MODEL = "gpt-4o-mini"

openai = OpenAI(api_key=api_key)

**Sending images as input prompts and generating responses based on the contents of those images**

# Chat Completion API

https://platform.openai.com/docs/guides/images?api-mode=chat

In [None]:
response = openai.chat.completions.create(
    model=MODEL,
    messages= [
        { "role": "user", "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            },
        ]}
    ]
)

In [None]:
print(response.choices[0].message.content)

The image depicts a scenic landscape featuring a wooden pathway winding through a lush green field. The pathway extends into the distance, leading towards a horizon filled with greenery and trees under a blue sky adorned with some clouds. The scene conveys a peaceful and natural environment, ideal for walking or enjoying the outdoors.
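
The same `content` list can carry more than one image, and each `image_url` object also accepts an optional `detail` field (`"low"`, `"high"`, or `"auto"`) according to the images guide linked above. The following is a minimal sketch, assuming a hypothetical helper `build_image_message` that is not part of the OpenAI SDK; it only builds the payload and makes no API call. The `example.com` URLs are placeholders.

```python
def build_image_message(prompt, image_urls, detail="auto"):
    """Build one Chat Completions user message with text plus images.

    Hypothetical helper for illustration; not part of the OpenAI SDK.
    """
    # The text part comes first, followed by one part per image.
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({
            "type": "image_url",
            "image_url": {"url": url, "detail": detail},
        })
    return {"role": "user", "content": content}


# Placeholder URLs; replace them with real, publicly reachable images.
message = build_image_message(
    "What do these images have in common?",
    ["https://example.com/a.jpg", "https://example.com/b.jpg"],
    detail="low",
)
print(len(message["content"]))
```

The resulting dictionary can be passed as `messages=[message]` to `openai.chat.completions.create`.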


### Base64 encoded image

In [None]:
# Upload the image to your Google Drive first.
# Mount Google Drive in this Colab session.
# Colab will ask for permission to access your Drive.

drive.mount("/content/drive")
image_path = "/content/drive/MyDrive/Temp/car.jpg"

Mounted at /content/drive


In [None]:
# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# If you run this example on your local machine, you only need to provide the image file path
# Path to your image
# image_path = "./car.jpg"

# Getting the Base64 string
base64_image = encode_image(image_path)

completion = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "what's in this image?" },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)

The image shows a sleek, high-performance sports car surrounded by dramatic lighting and colored smoke effects. The car has an aerodynamic design, featuring a prominent rear wing and stylish wheels. The lighting and smoke create a dynamic, almost cinematic atmosphere.
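
The example above hard-codes `image/jpeg` in the data URL, which only matches JPEG files. A minimal sketch of one way to derive the MIME type from the file extension with the standard-library `mimetypes` module follows; the helper names `guess_image_mime`, `bytes_to_data_url`, and `image_to_data_url` are hypothetical and not part of the OpenAI SDK, and the fallback to `image/jpeg` is an assumption.

```python
import base64
import mimetypes


def guess_image_mime(filename, default="image/jpeg"):
    """Guess an image MIME type from the file extension.

    Falls back to `default` when the extension is unknown or not an image.
    """
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        return default
    return mime_type


def bytes_to_data_url(data, mime_type):
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def image_to_data_url(image_path):
    """Read an image file and return it as a data URL (hypothetical helper)."""
    with open(image_path, "rb") as image_file:
        return bytes_to_data_url(image_file.read(), guess_image_mime(image_path))
```

With this in place, `image_to_data_url(image_path)` could replace the f-string passed as `url` in the request.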


# Responses API

https://platform.openai.com/docs/guides/images?api-mode=responses

The following `type` values change in the Responses API, and `image_url` becomes a plain string rather than an object with a `url` key:

text -> input_text

image_url -> input_image

```
"content": [
    { "type": "input_text", "text": "what's in this image?" },
    {
        "type": "input_image",
        "image_url": f"data:image/jpeg;base64,{base64_image}",
    },
],
```
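
The mapping above can be sketched as a small converter from Chat Completions content parts to Responses API content parts. This is an illustration only: `chat_parts_to_responses_parts` is a hypothetical helper, not an SDK function, and it handles just the two part types used in this notebook.

```python
def chat_parts_to_responses_parts(parts):
    """Convert Chat Completions content parts to Responses API parts.

    Hypothetical helper; covers only the `text` and `image_url` types.
    """
    converted = []
    for part in parts:
        if part["type"] == "text":
            # text -> input_text, the text itself is unchanged
            converted.append({"type": "input_text", "text": part["text"]})
        elif part["type"] == "image_url":
            # image_url -> input_image, and the nested {"url": ...} object
            # is flattened into a plain string
            converted.append({
                "type": "input_image",
                "image_url": part["image_url"]["url"],
            })
        else:
            raise ValueError(f"Unsupported content type: {part['type']}")
    return converted


chat_parts = [
    {"type": "text", "text": "what's in this image?"},
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,AAAA"}},
]
responses_parts = chat_parts_to_responses_parts(chat_parts)
print(responses_parts)
```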

In [None]:
response = openai.responses.create(
    model=MODEL,
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "what's in this image?"},
            {
                "type": "input_image",
                "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
            },
        ],
    }],
)

In [None]:
print(response.output_text)

The image features a scenic natural landscape with a wooden path or boardwalk running through a lush green area. On either side of the path, there are tall grasses and some vegetation. The sky above is partly cloudy with a blue hue, indicating a bright day. This setting is likely near a wetland or grassy field, suggesting a tranquil outdoor environment.
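
The two APIs expose the generated text differently: `response.choices[0].message.content` for Chat Completions and `response.output_text` for Responses. A minimal sketch of a helper that accepts either shape is shown below; `extract_text` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for real SDK responses so that the snippet runs without an API call.

```python
from types import SimpleNamespace


def extract_text(response):
    """Return the generated text from either response shape (hypothetical helper)."""
    # Responses API objects expose the aggregated text as `output_text`.
    if hasattr(response, "output_text"):
        return response.output_text
    # Chat Completions objects carry the text on the first choice.
    return response.choices[0].message.content


# Stand-in objects that mimic only the attributes used in this notebook.
fake_responses_result = SimpleNamespace(output_text="a boardwalk")
fake_chat_result = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="a boardwalk"))]
)

print(extract_text(fake_responses_result))
print(extract_text(fake_chat_result))
```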


### Base64 encoded image

Google Drive is already mounted in the example above, so the same `image_path` is reused here.

In [None]:
# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Path to your image
# image_path = "./car.jpg"

# Getting the Base64 string
base64_image = encode_image(image_path)


response = openai.responses.create(
    model="gpt-4o",
    input=[
        {
            "role": "user",
            "content": [
                { "type": "input_text", "text": "what's in this image?" },
                {
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                },
            ],
        }
    ],
)

print(response.output_text)

The image shows a sleek, high-performance sports car enveloped in colorful smoke. The car is likely parked or moving slowly as the smoke swirls around it, creating a dramatic and dynamic effect. The lighting highlights the car's aerodynamic features and vibrant tail lights.
