## Automatic Image Captioning for LoRA Dataset Generation

This notebook generates text captions for the preprocessed image dataset.
A BLIP (Bootstrapping Language-Image Pre-training) model captions each image,
but only the color words in its output are kept; the rest of each caption is
built from a fixed template with randomly chosen viewing angles and locations.
The captions are saved as individual `.txt` files next to the images and are
later used for LoRA-based parameter-efficient fine-tuning (PEFT) of the Stable
Diffusion model.

In [9]:
import random
from pathlib import Path
from tqdm import tqdm
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
import re

#### PATH CONFIG

In [10]:
IMG_DIR = Path(r"D:\work_space\projects\deep_learning\CAP6415_F25_project-Finding-and-solving-hard-to-generate-examples\data\processed\lora_ready")
DEBUG = True
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(DEVICE)

cuda


#### EXTRA CAPTIONS 

In [11]:
# SAFE VARIATION POOLS
VIEWS = [
    "a close-up of",
    "a detailed view of",
    "a wide view of",
    "a clear view of",
    "a ground-level view of"
]

LOCATIONS = [
    "on an asphalt road surface",
    "on an asphalt street",
    "on a paved roadway",
    "in a parking area on asphalt",
    "on a residential asphalt road"
]

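# NOTE: COLORS is not referenced by generate_caption(); the color phrase
# comes from extract_color_hint() instead.
COLORS = [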
    "",  # allow some captions without color
    "yellow and black",
    "black and yellow"
]

#### BLIP FOR COLOR HINTS ONLY

In [None]:
# Load the BLIP processor and captioning model (used only for color hints)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(DEVICE)

In [13]:
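def extract_color_hint(blip_text):
    """Return a color phrase found in a raw BLIP caption, or "" if none."""
    blip_text = blip_text.lower()
    # Substring checks: both colors present -> pick one ordering at random
    if "yellow" in blip_text and "black" in blip_text:
        return random.choice(["yellow and black", "black and yellow"])
    if "yellow" in blip_text:
        return "yellow"
    return ""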

#### MAIN CAPTION GENERATOR

In [14]:
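# CAPTION GENERATOR
def generate_caption(raw_blip):
    """Build a templated training caption; BLIP text contributes only the color."""
    view = random.choice(VIEWS)
    location = random.choice(LOCATIONS)
    color = extract_color_hint(raw_blip)

    if color:
        # Color phrase goes directly before "speed bump" (no article)
        caption = f"{view} {color} speed bump {location}, realistic photo"
    else:
        caption = f"{view} a speed bump {location}, realistic photo"

    # Collapse repeated whitespace and trim the ends
    caption = re.sub(r"\s+", " ", caption).strip()
    return caption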

In [None]:
# CAPTION LOOP
image_paths = sorted(IMG_DIR.glob("*.jpg"))
print(f"\nFound {len(image_paths)} images\n")

count = 0
for img_path in tqdm(image_paths, desc="Safe Varied Captioning"):

    try:
        image = Image.open(img_path).convert("RGB")
        inputs = processor(image, return_tensors="pt").to(DEVICE)
        with torch.no_grad():
            output = model.generate(**inputs, max_new_tokens=40)
        raw_caption = processor.decode(output[0], skip_special_tokens=True)
        final_caption = generate_caption(raw_caption)
        txt_path = img_path.with_suffix(".txt")
        with open(txt_path, "w", encoding="utf-8") as f:
            f.write(final_caption)
        if DEBUG:
            print(f"\nIMAGE: {img_path.name}")
            print(f"BLIP : {raw_caption}")
            print(f"FINAL: {final_caption}")

        count += 1
    except Exception as e:
        print(f"[ERROR] {img_path.name}: {e}")
print(f"\nCaptioning completed for {count} images.")


Found 61 images



Safe Varied Captioning:   2%|▏         | 1/61 [00:00<00:33,  1.77it/s]


IMAGE: speedbump_00000.jpg
BLIP : a yellow traffic light on the side of a road
FINAL: a close-up of yellow speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:   3%|▎         | 2/61 [00:00<00:27,  2.18it/s]


IMAGE: speedbump_00001.jpg
BLIP : a yellow and black traffic light on the ground
FINAL: a wide view of yellow and black speed bump in a parking area on asphalt, realistic photo


Safe Varied Captioning:   5%|▍         | 3/61 [00:01<00:23,  2.47it/s]


IMAGE: speedbump_00002.jpg
BLIP : a yellow and black traffic light on a street
FINAL: a clear view of black and yellow speed bump on an asphalt street, realistic photo


Safe Varied Captioning:   7%|▋         | 4/61 [00:01<00:20,  2.85it/s]


IMAGE: speedbump_00003.jpg
BLIP : a yellow and black rubber floor mat
FINAL: a detailed view of black and yellow speed bump on a paved roadway, realistic photo


Safe Varied Captioning:   8%|▊         | 5/61 [00:01<00:21,  2.65it/s]


IMAGE: speedbump_00004.jpg
BLIP : a yellow and black metal rail with a black and yellow stripe
FINAL: a ground-level view of black and yellow speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  10%|▉         | 6/61 [00:02<00:20,  2.64it/s]


IMAGE: speedbump_00005.jpg
BLIP : a yellow and black plastic plate with a black stripe
FINAL: a close-up of yellow and black speed bump on an asphalt street, realistic photo


Safe Varied Captioning:  11%|█▏        | 7/61 [00:02<00:19,  2.75it/s]


IMAGE: speedbump_00006.jpg
BLIP : a yellow and black object on a gray surface
FINAL: a ground-level view of black and yellow speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  13%|█▎        | 8/61 [00:02<00:18,  2.92it/s]


IMAGE: speedbump_00007.jpg
BLIP : a person using a wheel to remove a tire
FINAL: a wide view of a speed bump on a paved roadway, realistic photo


Safe Varied Captioning:  15%|█▍        | 9/61 [00:03<00:18,  2.87it/s]


IMAGE: speedbump_00008.jpg
BLIP : a pair of black and yellow rubber floor mats
FINAL: a clear view of black and yellow speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  16%|█▋        | 10/61 [00:03<00:17,  2.87it/s]


IMAGE: speedbump_00009.jpg
BLIP : a yellow and black plastic cover on a concrete floor
FINAL: a clear view of black and yellow speed bump in a parking area on asphalt, realistic photo


Safe Varied Captioning:  18%|█▊        | 11/61 [00:04<00:18,  2.68it/s]


IMAGE: speedbump_00010.jpg
BLIP : a floor with a wooden floor and a black and brown floor
FINAL: a wide view of a speed bump in a parking area on asphalt, realistic photo


Safe Varied Captioning:  20%|█▉        | 12/61 [00:04<00:21,  2.29it/s]


IMAGE: speedbump_00011.jpg
BLIP : a pair of yellow and black plastic clips for the side of a black and yellow plastic clip
FINAL: a ground-level view of black and yellow speed bump on an asphalt street, realistic photo


Safe Varied Captioning:  21%|██▏       | 13/61 [00:05<00:19,  2.47it/s]


IMAGE: speedbump_00012.jpg
BLIP : two pieces of wood sit on the floor
FINAL: a detailed view of a speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  23%|██▎       | 14/61 [00:05<00:17,  2.69it/s]


IMAGE: speedbump_00013.jpg
BLIP : a yellow and black plastic floor mat
FINAL: a detailed view of black and yellow speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  25%|██▍       | 15/61 [00:05<00:17,  2.67it/s]


IMAGE: speedbump_00014.jpg
BLIP : a yellow and black traffic cone on a wooden floor
FINAL: a detailed view of yellow and black speed bump in a parking area on asphalt, realistic photo


Safe Varied Captioning:  26%|██▌       | 16/61 [00:06<00:16,  2.66it/s]


IMAGE: speedbump_00015.jpg
BLIP : a pair of yellow and black diamond treads
FINAL: a wide view of yellow and black speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  28%|██▊       | 17/61 [00:06<00:17,  2.51it/s]


IMAGE: speedbump_00016.jpg
BLIP : a yellow and black plastic ramp with two black and yellow stripes
FINAL: a detailed view of black and yellow speed bump on an asphalt road surface, realistic photo


Safe Varied Captioning:  30%|██▉       | 18/61 [00:06<00:17,  2.50it/s]


IMAGE: speedbump_00017.jpg
BLIP : a yellow and black floor with a black and yellow floor
FINAL: a close-up of yellow and black speed bump on an asphalt street, realistic photo


Safe Varied Captioning:  31%|███       | 19/61 [00:07<00:15,  2.63it/s]


IMAGE: speedbump_00018.jpg
BLIP : a pair of yellow and black plastic blocks
FINAL: a wide view of yellow and black speed bump on a paved roadway, realistic photo


Safe Varied Captioning:  33%|███▎      | 20/61 [00:07<00:16,  2.55it/s]


IMAGE: speedbump_00019.jpg
BLIP : a yellow and black plastic ramp with a black and yellow stripe
FINAL: a ground-level view of black and yellow speed bump on an asphalt street, realistic photo


Safe Varied Captioning:  34%|███▍      | 21/61 [00:08<00:14,  2.69it/s]


IMAGE: speedbump_00020.jpg
BLIP : a car with a ramp attached to it
FINAL: a detailed view of a speed bump on a paved roadway, realistic photo


Safe Varied Captioning:  36%|███▌      | 22/61 [00:08<00:13,  2.81it/s]


IMAGE: speedbump_00021.jpg
BLIP : a yellow car parked in a parking lot
FINAL: a ground-level view of yellow speed bump on a paved roadway, realistic photo


Safe Varied Captioning:  38%|███▊      | 23/61 [00:08<00:13,  2.76it/s]


IMAGE: speedbump_00022.jpg
BLIP : a car is parked on the side of a road
FINAL: a clear view of a speed bump in a parking area on asphalt, realistic photo


Safe Varied Captioning:  39%|███▉      | 24/61 [00:09<00:13,  2.78it/s]


IMAGE: speedbump_00023.jpg
BLIP : a street with a white line painted on it
FINAL: a ground-level view of a speed bump on a paved roadway, realistic photo


Safe Varied Captioning:  41%|████      | 25/61 [00:09<00:12,  2.82it/s]


IMAGE: speedbump_00024.jpg
BLIP : a street with a tree in the middle
FINAL: a close-up of a speed bump on an asphalt road surface, realistic photo


Safe Varied Captioning:  43%|████▎     | 26/61 [00:09<00:13,  2.60it/s]


IMAGE: speedbump_00025.jpg
BLIP : a road with a sign on it that says, ' no parking '
FINAL: a wide view of a speed bump on an asphalt road surface, realistic photo


Safe Varied Captioning:  44%|████▍     | 27/61 [00:10<00:15,  2.21it/s]


IMAGE: speedbump_00026.jpg
BLIP : a road with trees and a sign that says, ' the road is empty '
FINAL: a detailed view of a speed bump in a parking area on asphalt, realistic photo


Safe Varied Captioning:  46%|████▌     | 28/61 [00:10<00:13,  2.36it/s]


IMAGE: speedbump_00027.jpg
BLIP : a street with a yellow line on the side
FINAL: a close-up of yellow speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  48%|████▊     | 29/61 [00:11<00:12,  2.58it/s]


IMAGE: speedbump_00028.jpg
BLIP : a road with a yellow line on it
FINAL: a ground-level view of yellow speed bump in a parking area on asphalt, realistic photo


Safe Varied Captioning:  49%|████▉     | 30/61 [00:11<00:12,  2.53it/s]


IMAGE: speedbump_00029.jpg
BLIP : a car is parked in front of a gas station
FINAL: a detailed view of a speed bump in a parking area on asphalt, realistic photo


Safe Varied Captioning:  51%|█████     | 31/61 [00:11<00:11,  2.64it/s]


IMAGE: speedbump_00030.jpg
BLIP : a yellow caution line on the side of a road
FINAL: a wide view of yellow speed bump on an asphalt street, realistic photo


Safe Varied Captioning:  52%|█████▏    | 32/61 [00:12<00:12,  2.25it/s]


IMAGE: speedbump_00031.jpg
BLIP : a speed limiter is seen on a street in the town of new canaan, n y
FINAL: a close-up of a speed bump on an asphalt street, realistic photo


Safe Varied Captioning:  54%|█████▍    | 33/61 [00:12<00:11,  2.39it/s]


IMAGE: speedbump_00032.jpg
BLIP : a street with a yellow and black striped line
FINAL: a detailed view of yellow and black speed bump on a paved roadway, realistic photo


Safe Varied Captioning:  56%|█████▌    | 34/61 [00:13<00:10,  2.65it/s]


IMAGE: speedbump_00033.jpg
BLIP : the new city for gtp
FINAL: a wide view of a speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  57%|█████▋    | 35/61 [00:13<00:10,  2.55it/s]


IMAGE: speedbump_00034.jpg
BLIP : a man is painting a sidewalk with a paint roller
FINAL: a clear view of a speed bump on a paved roadway, realistic photo


Safe Varied Captioning:  59%|█████▉    | 36/61 [00:13<00:09,  2.68it/s]


IMAGE: speedbump_00035.jpg
BLIP : a street with yellow tape and trash bins
FINAL: a clear view of yellow speed bump on an asphalt street, realistic photo


Safe Varied Captioning:  61%|██████    | 37/61 [00:14<00:08,  2.68it/s]


IMAGE: speedbump_00036.jpg
BLIP : a street with a bunch of flowers and a sign
FINAL: a clear view of a speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  62%|██████▏   | 38/61 [00:14<00:08,  2.76it/s]


IMAGE: speedbump_00037.jpg
BLIP : a road with a yellow and black striped line
FINAL: a wide view of yellow and black speed bump on an asphalt street, realistic photo


Safe Varied Captioning:  64%|██████▍   | 39/61 [00:14<00:07,  2.86it/s]


IMAGE: speedbump_00038.jpg
BLIP : a road with a yellow and black stripe
FINAL: a close-up of yellow and black speed bump on an asphalt street, realistic photo


Safe Varied Captioning:  66%|██████▌   | 40/61 [00:15<00:07,  2.72it/s]


IMAGE: speedbump_00039.jpg
BLIP : a yellow and black street marking strip on the road
FINAL: a close-up of yellow and black speed bump on an asphalt road surface, realistic photo


Safe Varied Captioning:  67%|██████▋   | 41/61 [00:15<00:07,  2.52it/s]


IMAGE: speedbump_00040.jpg
BLIP : a yellow and black street sign on the side of a road
FINAL: a wide view of black and yellow speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  69%|██████▉   | 42/61 [00:16<00:07,  2.44it/s]


IMAGE: speedbump_00041.jpg
BLIP : a parking lot with a yellow and black stripe
FINAL: a close-up of black and yellow speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  70%|███████   | 43/61 [00:16<00:07,  2.47it/s]


IMAGE: speedbump_00042.jpg
BLIP : a yellow and black striped ramp with a black stripe
FINAL: a detailed view of yellow and black speed bump on a paved roadway, realistic photo


Safe Varied Captioning:  72%|███████▏  | 44/61 [00:16<00:06,  2.65it/s]


IMAGE: speedbump_00043.jpg
BLIP : a yellow and black striped road marking strip
FINAL: a detailed view of yellow and black speed bump on an asphalt street, realistic photo


Safe Varied Captioning:  74%|███████▍  | 45/61 [00:17<00:06,  2.65it/s]


IMAGE: speedbump_00044.jpg
BLIP : a yellow and black striped road sign on a paved road
FINAL: a ground-level view of yellow and black speed bump on an asphalt road surface, realistic photo


Safe Varied Captioning:  75%|███████▌  | 46/61 [00:17<00:05,  2.81it/s]


IMAGE: speedbump_00045.jpg
BLIP : a yellow and black striped road marking strip
FINAL: a close-up of yellow and black speed bump on an asphalt road surface, realistic photo


Safe Varied Captioning:  77%|███████▋  | 47/61 [00:18<00:05,  2.67it/s]


IMAGE: speedbump_00046.jpg
BLIP : a person walking across a crosswalk on a street
FINAL: a clear view of a speed bump on an asphalt street, realistic photo


Safe Varied Captioning:  79%|███████▊  | 48/61 [00:18<00:04,  2.62it/s]


IMAGE: speedbump_00047.jpg
BLIP : a road with a white line painted on it
FINAL: a clear view of a speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  80%|████████  | 49/61 [00:18<00:04,  2.68it/s]


IMAGE: speedbump_00048.jpg
BLIP : a long bridge with a train going over it
FINAL: a clear view of a speed bump on a paved roadway, realistic photo


Safe Varied Captioning:  82%|████████▏ | 50/61 [00:19<00:03,  2.83it/s]


IMAGE: speedbump_00049.jpg
BLIP : a road with a white line on it
FINAL: a close-up of a speed bump on an asphalt road surface, realistic photo


Safe Varied Captioning:  84%|████████▎ | 51/61 [00:19<00:03,  2.91it/s]


IMAGE: speedbump_00050.jpg
BLIP : a man riding a motorcycle down a street
FINAL: a clear view of a speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  85%|████████▌ | 52/61 [00:19<00:03,  2.99it/s]


IMAGE: speedbump_00051.jpg
BLIP : a road with a fence and a car
FINAL: a ground-level view of a speed bump in a parking area on asphalt, realistic photo


Safe Varied Captioning:  87%|████████▋ | 53/61 [00:20<00:02,  2.99it/s]


IMAGE: speedbump_00052.jpg
BLIP : a road with a fence and a sign
FINAL: a ground-level view of a speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  89%|████████▊ | 54/61 [00:20<00:02,  2.95it/s]


IMAGE: speedbump_00053.jpg
BLIP : a yellow and black pedestrian crossing on a street
FINAL: a close-up of black and yellow speed bump on an asphalt road surface, realistic photo


Safe Varied Captioning:  90%|█████████ | 55/61 [00:20<00:02,  2.71it/s]


IMAGE: speedbump_00054.jpg
BLIP : a parking lot with a white car parked in the middle
FINAL: a ground-level view of a speed bump on an asphalt street, realistic photo


Safe Varied Captioning:  92%|█████████▏| 56/61 [00:21<00:01,  2.66it/s]


IMAGE: speedbump_00055.jpg
BLIP : a yellow and black striped road with a white line
FINAL: a clear view of yellow and black speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning:  93%|█████████▎| 57/61 [00:21<00:01,  2.57it/s]


IMAGE: speedbump_00056.jpg
BLIP : a yellow and black traffic barrier on the side of a road
FINAL: a ground-level view of yellow and black speed bump on an asphalt road surface, realistic photo


Safe Varied Captioning:  95%|█████████▌| 58/61 [00:22<00:01,  2.64it/s]


IMAGE: speedbump_00057.jpg
BLIP : a yellow and black rubber pad on the road
FINAL: a clear view of black and yellow speed bump on a paved roadway, realistic photo


Safe Varied Captioning:  97%|█████████▋| 59/61 [00:22<00:00,  2.59it/s]


IMAGE: speedbump_00058.jpg
BLIP : a man is walking down the street with a yellow tape
FINAL: a detailed view of yellow speed bump on an asphalt street, realistic photo


Safe Varied Captioning:  98%|█████████▊| 60/61 [00:22<00:00,  2.73it/s]


IMAGE: speedbump_00059.jpg
BLIP : a street with a yellow line on the side
FINAL: a close-up of yellow speed bump on a residential asphalt road, realistic photo


Safe Varied Captioning: 100%|██████████| 61/61 [00:23<00:00,  2.65it/s]


IMAGE: speedbump_00060.jpg
BLIP : a person walking down a street with a bike
FINAL: a ground-level view of a speed bump in a parking area on asphalt, realistic photo

Captioning completed for 61 images.





### Code Explanation

This script automatically generates a caption for each image in the processed
LoRA-ready dataset. A pretrained BLIP image-captioning model supplies color
cues, and a fixed template supplies the rest of the caption. The script begins
by importing the required libraries: PyTorch for GPU acceleration, PIL for
image loading, Hugging Face Transformers for the BLIP model, and the utility
libraries `random`, `re`, `pathlib`, and `tqdm`.

The input image directory is defined using a `Path` object. The script
automatically selects the computing device as CUDA if a GPU is available;
otherwise, it falls back to the CPU. Debug mode is enabled to display
intermediate caption outputs for verification.
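
The absolute path in the configuration cell ties the notebook to one machine.
As a minimal sketch, the directory could instead be read from an environment
variable with a relative fallback. The variable name `LORA_IMG_DIR`, the helper
`resolve_img_dir`, and the assumption that the notebook runs from the project
root are all illustrative and not part of the original notebook.

```python
import os
from pathlib import Path


def resolve_img_dir(default="data/processed/lora_ready"):
    """Return the image directory from LORA_IMG_DIR (hypothetical), else the default."""
    # The default mirrors the tail of the original path and assumes the
    # working directory is the project root.
    return Path(os.environ.get("LORA_IMG_DIR", default))


IMG_DIR = resolve_img_dir()
print(IMG_DIR)
```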

To increase caption diversity and reduce overfitting during training, three
controlled variation pools are defined:
- **VIEWS**: Describes camera perspectives.
- **LOCATIONS**: Describes where the speed bump is located.
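- **COLORS**: Lists optional color descriptions. This pool is defined but is
  not used by the caption generator, which takes its color phrase from the
  BLIP output instead.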
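
To get a rough sense of how much variety the template allows, the sketch below
enumerates every combination. It copies the `VIEWS` and `LOCATIONS` pools from
the notebook and assumes the four color outcomes that `extract_color_hint` can
return; `COLOR_PHRASES` and `all_captions` are illustrative names.

```python
from itertools import product

VIEWS = [
    "a close-up of",
    "a detailed view of",
    "a wide view of",
    "a clear view of",
    "a ground-level view of",
]

LOCATIONS = [
    "on an asphalt road surface",
    "on an asphalt street",
    "on a paved roadway",
    "in a parking area on asphalt",
    "on a residential asphalt road",
]

# Color phrases the caption template can receive (illustrative name)
COLOR_PHRASES = ["", "yellow", "yellow and black", "black and yellow"]


def all_captions():
    """Enumerate every caption the template can produce."""
    captions = set()
    for view, location, color in product(VIEWS, LOCATIONS, COLOR_PHRASES):
        subject = f"{color} speed bump" if color else "a speed bump"
        captions.add(f"{view} {subject} {location}, realistic photo")
    return captions


captions = all_captions()
print(len(captions))  # 5 views x 5 locations x 4 color phrases = 100
```

With 61 images and at most 100 distinct captions, some repetition across the
dataset is expected.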

A pretrained BLIP processor and model (`Salesforce/blip-image-captioning-base`)
are loaded from Hugging Face and moved to the selected device for efficient
inference.

The `extract_color_hint` function scans the lowercased raw BLIP output for
color words, so the final caption only mentions colors that BLIP reported. If
both "yellow" and "black" appear, it returns one of the two orderings at random
to increase variation; if only "yellow" appears, it returns "yellow"; otherwise
it returns an empty string.
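
The three branches can be checked on raw BLIP captions taken from the logged
output above. The function body is copied from the notebook; the `samples` and
`hints` names are illustrative.

```python
import random


def extract_color_hint(blip_text):
    blip_text = blip_text.lower()
    if "yellow" in blip_text and "black" in blip_text:
        return random.choice(["yellow and black", "black and yellow"])
    if "yellow" in blip_text:
        return "yellow"
    return ""


# Raw BLIP captions copied from the run log
samples = [
    "a yellow traffic light on the side of a road",
    "a yellow and black rubber floor mat",
    "a person using a wheel to remove a tire",
]

hints = [extract_color_hint(s) for s in samples]
for sample, hint in zip(samples, hints):
    print(f"{sample!r} -> {hint!r}")
```

Because the checks are plain substring matches, the hint reflects any mention of
the color in the caption, not necessarily the color of the speed bump itself.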

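The `generate_caption` function constructs the final training caption by
combining a randomly chosen camera view, a randomly chosen road location, and
the color hint derived from the BLIP output. Without a color hint the subject
is "a speed bump"; with one, the color phrase directly precedes "speed bump"
with no article (for example, "a close-up of yellow speed bump"). A regular
expression then collapses repeated whitespace, and the result is stripped of
leading and trailing spaces.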
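
Because the notebook draws from the module-level `random` state, rerunning it
produces different captions for the same image. If reproducibility matters, one
option is to pass in an explicitly seeded `random.Random` instance. The sketch
below is a variant rather than the notebook's function: `make_caption`, its
`rng` parameter, and the seed value are assumptions.

```python
import random
import re

VIEWS = [
    "a close-up of",
    "a detailed view of",
    "a wide view of",
    "a clear view of",
    "a ground-level view of",
]

LOCATIONS = [
    "on an asphalt road surface",
    "on an asphalt street",
    "on a paved roadway",
    "in a parking area on asphalt",
    "on a residential asphalt road",
]


def make_caption(color, rng):
    """Build a caption from a color hint using a caller-supplied RNG."""
    view = rng.choice(VIEWS)
    location = rng.choice(LOCATIONS)
    subject = f"{color} speed bump" if color else "a speed bump"
    caption = f"{view} {subject} {location}, realistic photo"
    # Collapse repeated whitespace, as the notebook does
    return re.sub(r"\s+", " ", caption).strip()


first = make_caption("yellow", random.Random(42))
second = make_caption("yellow", random.Random(42))
print(first == second)  # True: the same seed gives the same caption
```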

The script then scans the processed dataset directory for all `.jpg` images
and applies BLIP-based captioning to each image inside a progress-tracked
loop. For every image:
- The image is loaded and converted to RGB.
- The BLIP model generates a raw descriptive caption.
- A controlled final caption is created using the custom generator.
- The caption is saved as a UTF-8 `.txt` file with the same base name as the
  image, in the same directory.
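
Since the loop catches per-image errors and continues, some images may end up
without a caption file. A small check like the following could confirm that
every image has a non-empty sidecar file before training. The helper name
`find_missing_captions` is hypothetical, and the demo uses empty placeholder
files rather than real images.

```python
import tempfile
from pathlib import Path


def find_missing_captions(img_dir):
    """Return names of .jpg files that lack a non-empty .txt sidecar."""
    missing = []
    for img_path in sorted(Path(img_dir).glob("*.jpg")):
        txt_path = img_path.with_suffix(".txt")
        if not txt_path.exists() or not txt_path.read_text(encoding="utf-8").strip():
            missing.append(img_path.name)
    return missing


# Demo on a throwaway directory with placeholder files (not real images)
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "speedbump_00000.jpg").write_bytes(b"")
    (tmp_dir / "speedbump_00000.txt").write_text(
        "a close-up of a speed bump on an asphalt street, realistic photo",
        encoding="utf-8",
    )
    (tmp_dir / "speedbump_00001.jpg").write_bytes(b"")  # no caption file
    missing = find_missing_captions(tmp_dir)

print(missing)  # ['speedbump_00001.jpg']
```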

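If debug mode is active, the script prints the image name, raw BLIP caption,
and final generated caption for visual inspection. Any exception raised for an
individual image is caught and printed, and that image is excluded from the
count. At the end of the loop, a summary is printed showing the total number of
images successfully captioned (61 of 61 in the run above).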

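This automated captioning step produces consistent, varied, training-ready text
labels for LoRA-based parameter-efficient fine-tuning (PEFT) of the Stable
Diffusion model. The logged output shows why the raw BLIP captions are not used
directly: BLIP often mislabels the speed bumps (for example, as a traffic light
or a floor mat). Its color words are still used as hints, although they can
come from other objects in the scene (for example, "a yellow car").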