# README

1. Run the notebook on an L4 GPU runtime.
2. Upload the input images to be captioned from the `images` and `img-test` folders (on GitHub).
3. Run all the cells.
4. The model used is blip2-flan-t5-xl.
5. A caption is generated for each input image.
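
Before running all the cells, it can help to confirm that the uploads from step 2 landed in the working directory. The following is a minimal sketch, assuming the images sit next to the notebook under the file names used in the example cells; `EXPECTED_IMAGES` and `find_missing_images` are illustrative names, not part of the notebook.

```python
import os

# File names used by the example cells in this notebook.
EXPECTED_IMAGES = [
    "image-1.png", "image-2.png", "image-3.png",          # from img-test
    "img1.jpeg", "img3.jpeg", "img4.jpeg", "img8.jpeg",   # from images
]

def find_missing_images(names, directory="."):
    """Return the names that are not present as files in `directory`."""
    return [n for n in names if not os.path.isfile(os.path.join(directory, n))]

missing = find_missing_images(EXPECTED_IMAGES)
if missing:
    print("Upload these files before running the examples:", ", ".join(missing))
else:
    print("All example images found.")
```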


**NOTE:**

This notebook generates captions with the blip2-flan-t5-xl model developed by Salesforce. It downloads the pre-trained model and processor and uses them to caption the uploaded input images.


**Additional:**

To test the model on an image of your own, use the commented-out code in the last cell, after the examples: uncomment it and run it.

# Packages

In [1]:
!pip install torch torchvision transformers pillow



In [2]:
from transformers import Blip2Processor, Blip2ForConditionalGeneration
from PIL import Image
import os

# Loading the pre-trained model

In [3]:
# Load BLIP-2 processor and model
processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")  # Use a suitable BLIP-2 variant
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xl").to("cuda")  # Move to the GPU (this notebook assumes a CUDA runtime)

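
The loading cell hard-codes `"cuda"`, so it fails on a CPU-only runtime. The following is a minimal sketch of a fallback, assuming slower CPU inference is acceptable; `pick_device` is a hypothetical helper, and the commented lines show where its result would replace the hard-coded string.

```python
import importlib.util

def pick_device():
    """Return "cuda" if PyTorch is installed and sees a GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is missing, so nothing can be placed on a GPU
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(f"Using device: {device}")

# Hypothetical use in the loading cell:
# model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xl").to(device)
# and inside generate_caption:
# inputs = processor(images=image, text="a picture of", return_tensors="pt").to(device)
```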

# Testing the model on the example images

In [4]:
def generate_caption(image_path):
    """
    Generate a caption for a single image using BLIP-2.

    Args:
        image_path (str): Path to the input image.

    Returns:
        str: Generated caption.
    """
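    image = Image.open(image_path).convert("RGB")  # Ensure the image is in RGB format
    # "a picture of" is a text prompt that conditions the generated caption
    inputs = processor(images=image, text="a picture of", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs)
    # Decode the first (and only) generated sequence, dropping special tokens
    caption = processor.tokenizer.decode(outputs[0], skip_special_tokens=True)
    return caption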

# Examples

## Images from img-test folder

In [5]:
# Example: Captioning a single image
image_path = "image-1.png"  # Replace with your image path
caption = generate_caption(image_path)
print(f"Generated Caption for {image_path}: {caption}")



Generated Caption for image-1.png: a cabin in the snow with trees surrounding it


In [6]:
# Example: Captioning a single image
image_path = "image-2.png"  # Replace with your image path
caption = generate_caption(image_path)
print(f"Generated Caption for {image_path}: {caption}")

Generated Caption for image-2.png: zebras in the grass at sunset


In [7]:
# Example: Captioning a single image
image_path = "image-3.png"  # Replace with your image path
caption = generate_caption(image_path)
print(f"Generated Caption for {image_path}: {caption}")

Generated Caption for image-3.png: the university of maryland with a statue in front


## Images from images folder

In [8]:
# Example: Captioning a single image
image_path = "img1.jpeg"  # Replace with your image path
caption = generate_caption(image_path)
print(f"Generated Caption for {image_path}: {caption}")

Generated Caption for img1.jpeg: a woman with glasses and a hoodie


In [9]:
# Example: Captioning a single image
image_path = "img3.jpeg"  # Replace with your image path
caption = generate_caption(image_path)
print(f"Generated Caption for {image_path}: {caption}")

Generated Caption for img3.jpeg: a woman with blue eyes


In [10]:
# Example: Captioning a single image
image_path = "img4.jpeg"  # Replace with your image path
caption = generate_caption(image_path)
print(f"Generated Caption for {image_path}: {caption}")

Generated Caption for img4.jpeg: an older woman with gray hair


In [11]:
# Example: Captioning a single image
image_path = "img8.jpeg"  # Replace with your image path
caption = generate_caption(image_path)
print(f"Generated Caption for {image_path}: {caption}")

Generated Caption for img8.jpeg: a man with a beard
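
The example cells above repeat the same three lines per image. A loop over a folder can be sketched as follows, assuming the images share one directory; `caption_folder` and `IMAGE_EXTENSIONS` are illustrative names, and the captioning function is passed in as an argument so the helper can be tried without loading the model.

```python
import os

# Extensions treated as images; an assumption, extend as needed.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")

def caption_folder(folder, caption_fn):
    """Apply caption_fn to every image file in folder.

    Returns a dict mapping file name to caption, in sorted file-name order.
    """
    captions = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip sub-folders and files without an image extension
        if os.path.isfile(path) and name.lower().endswith(IMAGE_EXTENSIONS):
            captions[name] = caption_fn(path)
    return captions

# In the notebook, pass the generate_caption function defined earlier:
# for name, caption in caption_folder(".", generate_caption).items():
#     print(f"Generated Caption for {name}: {caption}")
```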
