# Trend-to-Design AI Challenge

# Objective

This notebook presents an end-to-end pipeline that recognizes fashion trends from product images and creates new designs automatically.  

The workflow:  
- Download a fashion dataset (images + metadata)  
- Extract trends by embedding and clustering images  
- Generate descriptive prompts for each cluster  
- Create new fashion designs with a generative model  
- Score the generated designs with simple yes/no feedback  
- Refine prompts for weaker designs  
- Repeat the cycle until final designs are obtained  

Deliverable: one final design per trend cluster, together with the prompts, scores, and logs that document the process.  


# Executive Summary

This project shows how AI can identify fashion trends from thousands of product photos and automatically create new design concepts.  
By clustering image embeddings, we uncovered 9 different style groups and created short text prompts that capture each trend.  
Stable Diffusion Turbo was then used to produce fashion designs, which were scored and refined through a simple feedback loop.  
One design per cluster, prompts, scores, and logs are all included in the final output, which offers both creative visuals and clear process documentation.

# Step 0: Install the needed libraries

We installed the required libraries, including kaggle for dataset access, sentence-transformers for embeddings, and diffusers for image generation.  
Together these cover the full pipeline, from downloading data to generating and scoring fashion designs.


In [None]:
!pip -q install kaggle tqdm pillow scikit-learn sentence-transformers \
               diffusers accelerate safetensors

# Step 1: Get the Data

## Step 1.1: Kaggle authorization

We uploaded kaggle.json and restricted its file permissions to the owner so that the Kaggle API in Colab can authenticate with our token.  
This lets the notebook download datasets directly from Kaggle.

In [None]:
from google.colab import files
import os, stat

uploaded = files.upload()
assert 'kaggle.json' in uploaded, "Please upload kaggle.json"

os.makedirs('/root/.kaggle', exist_ok=True)
with open('/root/.kaggle/kaggle.json', 'wb') as f:
    f.write(uploaded['kaggle.json'])
os.chmod('/root/.kaggle/kaggle.json', stat.S_IRUSR | stat.S_IWUSR)

!kaggle --version


Saving kaggle.json to kaggle.json
Kaggle API 1.7.4.5
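
The permission step above can be sketched outside Colab as follows. This is a minimal illustration, assuming a POSIX file system; the helper name write_token, the temporary directory, and the placeholder credentials are hypothetical and not part of the notebook.

```python
import json
import os
import stat
import tempfile

def write_token(config_dir, username, key):
    """Write a kaggle.json-style token file readable and writable by the owner only."""
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, "kaggle.json")
    # Create the file with owner-only permissions from the start
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump({"username": username, "key": key}, f)
    # Enforce the mode explicitly in case the file already existed
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path

# Placeholder values for illustration only
demo_dir = tempfile.mkdtemp()
token_path = write_token(demo_dir, "example-user", "not-a-real-key")
mode = stat.S_IMODE(os.stat(token_path).st_mode)
print(oct(mode))
```

Keeping the token readable by the owner only is the same intent as the os.chmod call in the cell above.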


## Step 1.2: Download dataset files from Kaggle

We used the Kaggle API to download the *Fashion Product Images (Small)* dataset, which includes both product images and the styles.csv metadata.  

In [None]:
# Download the full dataset (CSV + all images together as one zip)
DATASET = "paramaggarwal/fashion-product-images-small"
!kaggle datasets download -d {DATASET} -p .

Dataset URL: https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-small
License(s): MIT
Downloading fashion-product-images-small.zip to .
 99% 559M/565M [00:01<00:00, 317MB/s]
100% 565M/565M [00:01<00:00, 415MB/s]


## Step 1.3: Unzip dataset

We extracted the downloaded archive into a local data/ folder, which contains the images/ directory and styles.csv.  
This organizes the dataset into a structured format ready for metadata loading and image preprocessing.


In [None]:
# Unzip into "data/" folder
import zipfile

with zipfile.ZipFile("fashion-product-images-small.zip","r") as z:
    z.extractall("data")

print("Extracted. Check data/ for images + styles.csv")

Extracted. Check data/ for images + styles.csv


# Step 2: Find Trends (Embedding + Clustering)

We identify fashion trends by embedding product images into a semantic vector space and clustering them.  
This lets us group visually similar products according to color, style, and vibe.

## Step 2.1: Load metadata, link images, and sample

We loaded the styles.csv metadata, linked each product to its image file, and drew a stratified sample by article type, base colour, and gender.  
After testing N=2000 and N=4000, we settled on N=4000 for greater variety and more stable clusters, at the expense of a longer runtime.

Result: 4,000 images were sampled from the 44,419 products with an available image, preserving diversity across categories while keeping the task computationally feasible.  


In [None]:
import pandas as pd, glob, os

CSV_PATH = "data/styles.csv"
IMAGES_ROOT = "data/images"

df = pd.read_csv(CSV_PATH, on_bad_lines="skip")

def find_image_path(item_id):
    p1 = os.path.join(IMAGES_ROOT, f"{item_id}.jpg")
    if os.path.exists(p1):
        return p1
    hits = glob.glob(os.path.join(IMAGES_ROOT, "**", f"{item_id}.jpg"), recursive=True)
    return hits[0] if hits else None

df["img_path"] = df["id"].apply(find_image_path)
df = df.dropna(subset=["img_path"]).reset_index(drop=True)
print("Rows with images available:", len(df))

# Stratified sampling

# target sample size
N = 4000

if len(df) > N:
    #choose stratification keys if present
    keys = [c for c in ["articleType", "baseColour", "gender"] if c in df.columns]
    if keys:
        frac = N / len(df)
        # proportional draw per group (at least 1 per non-empty group)
        parts = []
        for name, g in df.groupby(keys, dropna=False):
            k = max(1, int(round(len(g) * frac)))
            parts.append(g.sample(min(k, len(g)), random_state=42))
        sampled = pd.concat(parts, ignore_index=True).drop_duplicates(subset=["id"])
        # clamp to exactly N
        if len(sampled) >= N:
            df = sampled.sample(N, random_state=42).reset_index(drop=True)
        else:
            # top up from the remainder if we undershot
            remainder = df.loc[~df["id"].isin(sampled["id"])]
            need = N - len(sampled)
            topup = remainder.sample(min(need, len(remainder)), random_state=42)
            df = pd.concat([sampled, topup], ignore_index=True).reset_index(drop=True)
    else:
        # fallback: simple random sample
        df = df.sample(N, random_state=42).reset_index(drop=True)

print("Final sample size:", len(df))
df.head()


Rows with images available: 44419
Final sample size: 4000


| | id | gender | masterCategory | subCategory | articleType | baseColour | season | year | usage | productDisplayName | img_path |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15190 | Men | Apparel | Topwear | Tshirts | Navy Blue | Fall | 2011.0 | Casual | Arrow Sport Men Solid Navy Blue Polo Tshirts | data/images/15190.jpg |
| 1 | 23833 | Women | Footwear | Shoes | Flats | Silver | Fall | 2011.0 | Casual | Puma Women Silver Gladiator Sandals | data/images/23833.jpg |
| 2 | 38134 | Women | Accessories | Wallets | Wallets | Blue | Winter | 2015.0 | Casual | Wildcraft Women Blue Wallet | data/images/38134.jpg |
| 3 | 3023 | Men | Apparel | Topwear | Tshirts | Blue | Summer | 2011.0 | Casual | Lee Men's Shot Midnight Blue T-shirt | data/images/3023.jpg |
| 4 | 31317 | Women | Apparel | Topwear | Jackets | Teal | Summer | 2012.0 | Casual | W Women Teal Jacket | data/images/31317.jpg |
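
The per-group quota used in the sampling cell (a proportional share, but at least one item per group) can be illustrated with a small sketch. The group names and sizes below are hypothetical and are not taken from styles.csv; proportional_quota is an illustrative helper, not a function from the notebook.

```python
def proportional_quota(group_sizes, target):
    """Per-group sample counts: proportional share, at least 1, capped at the group size."""
    total = sum(group_sizes.values())
    frac = target / total
    return {
        name: min(size, max(1, int(round(size * frac))))
        for name, size in group_sizes.items()
    }

# Hypothetical (articleType, baseColour, gender) groups
sizes = {"Tshirts/Blue/Men": 500, "Heels/Black/Women": 120, "Ties/Red/Men": 3}
quota = proportional_quota(sizes, target=62)
print(quota, sum(quota.values()))
```

Because every non-empty group receives at least one item, the quotas can add up to more than the target (63 versus 62 here), which is why the cell afterwards clamps the sample to exactly N or tops it up from the remainder.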


## Step 2.2: CLIP setup + image preprocessing helpers

We selected the CLIP model (ViT-B/32), targeting the GPU when one is available, and defined a preprocessing function that resizes each image so its shorter side is 224 pixels and then center-crops it to 224×224.  
This ensures that all sampled images are standardized for embedding extraction with CLIP.


In [None]:
import torch
from PIL import Image

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
CLIP_MODEL = "clip-ViT-B-32"

def open_and_preprocess(path, size=224):
    img = Image.open(path).convert("RGB")
    w,h = img.size
    scale = size / min(w,h)
    img = img.resize((int(round(w*scale)), int(round(h*scale))), Image.BICUBIC)
    w,h = img.size
    left = (w - size)//2; top = (h - size)//2
    return img.crop((left, top, left+size, top+size))
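
The geometry of open_and_preprocess can be checked without loading an image. The sketch below repeats only the arithmetic (scale the shorter side to the target size, then center-crop); the 60×80 input size is an assumed example, and resize_and_crop_box is an illustrative name.

```python
def resize_and_crop_box(width, height, size=224):
    """Return the resized dimensions and the center-crop box (left, top, right, bottom)."""
    scale = size / min(width, height)           # shorter side becomes `size`
    new_w = int(round(width * scale))
    new_h = int(round(height * scale))
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return (new_w, new_h), (left, top, left + size, top + size)

resized, box = resize_and_crop_box(60, 80)      # assumed portrait thumbnail
print(resized, box)
```

For a portrait input the crop discards equal strips from the top and bottom, so content near the vertical edges of a product photo does not reach the encoder.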


## Step 2.3: Compute image embeddings (CLIP)

We encoded the 4,000 sampled product images into 512-dimensional CLIP embeddings using the ViT-B/32 model (on GPU when available).  
The embeddings are L2-normalized and capture semantic information about style, color, and vibe, serving as the foundation for clustering.

The resulting matrix has shape (4000, 512): one 512-dimensional feature vector per sampled image.  


In [None]:
from sentence_transformers import SentenceTransformer
import numpy as np
from tqdm import tqdm

def compute_image_embeddings(paths, batch=32):
    model = SentenceTransformer(CLIP_MODEL, device=DEVICE)
    embs = []
    buf = []
    for p in tqdm(paths, desc="Embedding images"):
        buf.append(open_and_preprocess(p))
        if len(buf) == batch:
            e = model.encode(buf, convert_to_tensor=True, device=DEVICE, normalize_embeddings=True, show_progress_bar=False)
            embs.append(e.cpu()); buf=[]
    if buf:
        e = model.encode(buf, convert_to_tensor=True, device=DEVICE, normalize_embeddings=True, show_progress_bar=False)
        embs.append(e.cpu())
    return torch.cat(embs, dim=0).numpy()

X = compute_image_embeddings(df["img_path"].tolist(), batch=24)
X.shape

Embedding images: 100%|██████████| 4000/4000 [00:22<00:00, 175.94it/s]


(4000, 512)
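
Because the cell passes normalize_embeddings=True, every row of the matrix has unit length, and the cosine similarity between two rows reduces to their dot product. A minimal sketch with two toy 2-D vectors (not real CLIP embeddings):

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = l2_normalize([3.0, 4.0])   # -> [0.6, 0.8]
b = l2_normalize([4.0, 3.0])   # -> [0.8, 0.6]
cos = dot(a, b)                # cosine similarity of the original vectors
print(round(cos, 2))
```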

## Step 2.4: Auto-select K and cluster

To choose the number of clusters, we ran KMeans for each K from 6 to 10 and kept the K with the highest silhouette score; the function then lowers K for as long as any cluster has fewer than 50 items.  
**K=9 clusters** was selected, with sizes ranging from 194 to 911 items.

No cluster falls below the 50-item minimum, so none of the trend groups at N=4000 rests on a handful of images.  



In [None]:
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import pandas as pd
import numpy as np

def pick_k_auto(X, k_min=6, k_max=10, min_cluster_size=40, sample=4000):
    idx = np.random.choice(len(X), min(sample, len(X)), replace=False)
    Xs = X[idx]
    best_k, best_sc = None, -1
    for k in range(k_min, k_max+1):
        km = KMeans(n_clusters=k, n_init="auto", random_state=42)
        labels = km.fit_predict(Xs)
        if len(set(labels)) < 2:
            continue
        sc = silhouette_score(Xs, labels)
        if sc > best_sc:
            best_k, best_sc = k, sc
    km_full = KMeans(n_clusters=best_k, n_init="auto", random_state=42).fit(X)
    labels = km_full.labels_
    while True:
        sizes = pd.Series(labels).value_counts()
        if (sizes < min_cluster_size).any() and best_k > max(k_min,2):
            best_k -= 1
            km_full = KMeans(n_clusters=best_k, n_init="auto", random_state=42).fit(X)
            labels = km_full.labels_
        else:
            break
    return km_full, labels, best_k

kmeans, labels, K = pick_k_auto(X, k_min=6, k_max=10, min_cluster_size=50)
df["cluster"] = labels
print(f"Chosen K={K}\n", df["cluster"].value_counts().sort_index())


Chosen K=9
 cluster
0    194
1    640
2    396
3    402
4    274
5    911
6    327
7    491
8    365
Name: count, dtype: int64
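
For readers unfamiliar with the silhouette score used to pick K, the sketch below computes it by hand on toy 1-D data. It is a simplified illustration (absolute difference as the distance, pure Python), not the scikit-learn implementation the notebook calls.

```python
def silhouette(points, labels):
    """Mean silhouette coefficient for 1-D points, using |p - q| as the distance."""
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        same = [abs(p - q) for j, (q, lj) in enumerate(zip(points, labels))
                if lj == li and j != i]
        if not same:                      # singleton cluster: score defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)         # mean distance within the own cluster
        b = min(                          # mean distance to the nearest other cluster
            sum(abs(p - q) for q, lj in zip(points, labels) if lj == other)
            / labels.count(other)
            for other in set(labels) if other != li
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
good = silhouette(points, [0, 0, 0, 1, 1, 1])   # labels follow the two groups
bad = silhouette(points, [0, 1, 0, 1, 0, 1])    # labels ignore the structure
print(round(good, 3), round(bad, 3))
```

A labeling that matches the structure of the data scores close to 1, while an arbitrary labeling scores near or below 0; the K search keeps the K whose labeling scores highest.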


# Step 3: Make Prompts

We transform each cluster’s “vibe” into a short, descriptive text prompt that a generative model can use.  
This bridges unsupervised trend discovery with controllable design generation.


## Step 3.1: Prompt building helpers (tags + color)

This step defines the tag vocabularies (styles, garments, textures, adjectives) and a helper that names an image's dominant color; the zero-shot tag ranking against CLIP text embeddings is applied in Step 3.2.  
Result: compact building blocks (e.g., “bohemian”, “dress”, “linen”, “white/beige”) that capture each cluster’s identity.


In [None]:
from collections import Counter
import math, json
from sentence_transformers import util as s_util

STYLE_TAGS = ["streetwear","minimalist","bohemian","vintage","formal","casual",
              "sporty","elegant","grunge","preppy","retro","chic","y2k",
              "avant-garde","classic","smart casual","business casual"]
GARMENT_TAGS = ["dress","blazer","jacket","t-shirt","shirt","jeans","trousers",
                "skirt","sneakers","heels","boots","hoodie","sweater","coat",
                "handbag","scarf","sunglasses"]
TEXTURE_TAGS = ["denim","leather","silk","satin","linen","wool","knit","cotton",
                "velvet","lace","sequin","mesh","pleated"]
ADJ = ["clean","sleek","tailored","oversized","fitted","flowy",
       "layered","structured","monochrome","patterned","bold","subtle"]

_BASIC = {
    "red":(220,20,60),"orange":(255,140,0),"yellow":(255,215,0),
    "green":(34,139,34),"blue":(30,144,255),"purple":(138,43,226),
    "pink":(255,105,180),"brown":(139,69,19),"black":(10,10,10),
    "white":(245,245,245),"gray":(128,128,128),"beige":(222,203,180)
}

def _dist(a,b): return math.sqrt(sum((x-y)**2 for x,y in zip(a,b)))
def nearest_color_name(rgb):
    return min(_BASIC.keys(), key=lambda n: _dist(rgb, _BASIC[n]))

def dominant_color_name(img, k=3):
    import numpy as np
    from sklearn.cluster import KMeans
    small = img.resize((64,64))
    arr = np.array(small).reshape(-1,3).astype(np.float32)
    km = KMeans(n_clusters=k, n_init="auto", random_state=42).fit(arr)
    counts = Counter(km.labels_)
    dom_idx = counts.most_common(1)[0][0]
    rgb = km.cluster_centers_[dom_idx]
    return nearest_color_name(tuple(rgb))


## Step 3.2: Create prompts per cluster

We composed one concise prompt per cluster (two styles + two garments + up to two colors + one texture + two adjectives) and saved them to outputs/prompts/base_prompts.json.  
As a result, nine prompts, one for each cluster from the 4k run, are ready to drive image generation.


In [None]:
from sentence_transformers import SentenceTransformer
import numpy as np, os

clip_text_model = SentenceTransformer(CLIP_MODEL, device=DEVICE)

def rank_tags(image_emb, tag_list, topk=3):
    tag_embs = clip_text_model.encode(tag_list, convert_to_tensor=True, device=DEVICE, normalize_embeddings=True)
    sims = s_util.cos_sim(torch.tensor(image_emb).to(DEVICE), tag_embs).squeeze(0)
    idx = torch.topk(sims, k=min(topk, len(tag_list))).indices.tolist()
    return [tag_list[i] for i in idx]

cluster_meta = {}
for cid, g in df.groupby("cluster"):
    idx = g.index
    centroid = X[idx].mean(0, keepdims=True)

    colors = []
    for p in g["img_path"].sample(min(12, len(g)), random_state=42):
        colors.append(dominant_color_name(open_and_preprocess(p)))
    top_colors = [c for c,_ in Counter(colors).most_common(2)] or ["neutral tones"]

    styles   = rank_tags(centroid[0], STYLE_TAGS, topk=3)
    garments = rank_tags(centroid[0], GARMENT_TAGS, topk=3)
    textures = rank_tags(centroid[0], TEXTURE_TAGS, topk=2)
    adjectives = np.random.choice(ADJ, size=2, replace=False).tolist()

    color_txt = ", ".join(top_colors)
    style_txt = ", ".join(styles[:2]) if styles else "casual"
    garment_txt = ", ".join(garments[:2]) if garments else "outfit"
    texture_txt = textures[0] if textures else "fabric"
    adj_txt = ", ".join(adjectives)

    prompt = f"{style_txt} {garment_txt} in {color_txt}, {texture_txt} texture, {adj_txt}, fashion design sheet, clean studio background"

    cluster_meta[int(cid)] = {
        "centroid": centroid[0].tolist(),
        "colors": top_colors,
        "styles": styles,
        "garments": garments,
        "textures": textures,
        "adjectives": adjectives,
        "prompt": prompt
    }

os.makedirs("outputs/prompts", exist_ok=True)
with open("outputs/prompts/base_prompts.json","w") as f:
    json.dump({k:v["prompt"] for k,v in cluster_meta.items()}, f, indent=2)

print("Built prompts for clusters.")


Built prompts for clusters.
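
The zero-shot ranking in rank_tags amounts to sorting candidate tags by cosine similarity to the cluster centroid and keeping the top few. The sketch below shows this with hand-made 3-D vectors standing in for CLIP embeddings; the vectors and the name rank_tags_toy are illustrative assumptions.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_tags_toy(query, tag_vectors, topk=2):
    """Return the topk tag names most similar to the query vector."""
    ranked = sorted(tag_vectors, key=lambda t: cosine(query, tag_vectors[t]), reverse=True)
    return ranked[:topk]

# Toy stand-ins for text embeddings of three texture tags
toy_tags = {"denim": [1.0, 0.0, 0.0], "silk": [0.0, 1.0, 0.0], "wool": [0.7, 0.7, 0.0]}
toy_centroid = [0.9, 0.3, 0.1]
top = rank_tags_toy(toy_centroid, toy_tags)
print(top)
```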


# Step 4: Create Designs

To keep generation fast on a Colab T4 GPU, we loaded Stable Diffusion Turbo and generated **one v1 image per cluster** from the cluster-specific prompts (384×512 pixels, width × height; 4 inference steps; guidance scale 0.0).  
This produced 9 initial designs (clusters 0–8), saved in outputs/gens/ as c{cluster}_v1.png.


## Load SD-Turbo and generate 1 image per cluster (v1)

In [None]:
import os
from diffusers import AutoPipelineForText2Image

os.makedirs("outputs/gens", exist_ok=True)

dtype = torch.float16 if torch.cuda.is_available() else torch.float32
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sd-turbo", torch_dtype=dtype).to(DEVICE)

def generate_one(prompt, seed=42, height=512, width=384, steps=4, guidance_scale=0.0):
    g = torch.Generator(device=DEVICE).manual_seed(seed)
    img = pipe(prompt=prompt, height=height, width=width,
               num_inference_steps=steps, guidance_scale=guidance_scale,
               generator=g).images[0]
    return img

gen_paths = {}
for cid, meta in cluster_meta.items():
    img = generate_one(meta["prompt"], seed=42+cid)
    out = f"outputs/gens/c{cid}_v1.png"
    img.save(out)
    gen_paths[cid] = out

print("Generated v1 images (one per cluster).")


You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .


Generated v1 images (one per cluster).


# Step 5: Score Designs

We used cosine similarity to compare each generated image with its cluster centroid and with its prompt.  
The two similarities are blended into a single scalar score, mapped to [0,1], and used for simple yes/no labeling.

## Step 5.1: Scoring helpers (image/text embeddings)

We set up compact helpers that embed images and prompts with CLIP and compute cosine similarity on tensors sharing the same dtype and device.  
The result is a set of scoring functions that avoid common CUDA/dtype mismatches.

In [None]:
from sentence_transformers import SentenceTransformer
import torch
import torch.nn.functional as F

# for image embeddings
img_model = SentenceTransformer(CLIP_MODEL, device=DEVICE)

def embed_image_t(path):
    e = img_model.encode(
        [open_and_preprocess(path)],
        convert_to_tensor=True, device=DEVICE,
        normalize_embeddings=True, show_progress_bar=False
    )
    return e.to(dtype=torch.float32)  # [1, D]

def embed_text_t(text):
    e = clip_text_model.encode(
        [text],
        convert_to_tensor=True, device=DEVICE,
        normalize_embeddings=True
    )
    return e.to(dtype=torch.float32)  # [1, D]

def score_image_for_cluster(path, cid):
    t_img = embed_image_t(path)  # [1, D]
    # centroid from cluster_meta
    t_cen = torch.tensor(cluster_meta[cid]["centroid"], dtype=torch.float32, device=t_img.device).unsqueeze(0)
    t_pr  = embed_text_t(cluster_meta[cid]["prompt"]).to(t_img.device)

    s1 = float(F.cosine_similarity(t_img, t_cen, dim=1).item())  # img - centroid
    s2 = float(F.cosine_similarity(t_img, t_pr,  dim=1).item())  # img - prompt
    return (0.7*s1 + 0.3*s2 + 1) / 2  # mapping [-1,1] into [0,1]
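
A worked example of the blending formula may help when reading the scores later on. The similarity values below are hypothetical; only the weights (0.7 and 0.3) and the mapping from [-1,1] to [0,1] come from score_image_for_cluster.

```python
def blended_score(sim_centroid, sim_prompt, w_centroid=0.7, w_prompt=0.3):
    """Weighted mix of two cosine similarities, rescaled from [-1, 1] to [0, 1]."""
    raw = w_centroid * sim_centroid + w_prompt * sim_prompt
    return (raw + 1) / 2

# Hypothetical similarities: image vs. centroid 0.65, image vs. prompt 0.30
example = blended_score(0.65, 0.30)   # (0.455 + 0.09 + 1) / 2
print(round(example, 4))
```

Because of the rescaling, a blended similarity of 0 maps to a score of 0.5, so scores in the 0.7 to 0.8 range correspond to blended similarities of roughly 0.4 to 0.6.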


## Step 5.2: Score v1 and Label yes/no

Using an initial fixed threshold of 0.70, we scored every v1 image and assigned a yes/no label.  
The first-pass scores are kept in memory in the results list, with one entry per (cluster, version); they are written to outputs/logs/scores.csv in Step 7.2.


In [None]:
import pandas as pd

THRESH = 0.70
results = []

for cid, path in gen_paths.items():
    sc = score_image_for_cluster(path, cid)
    label = "yes" if sc >= THRESH else "no"
    results.append({"cluster": cid, "version": "v1", "path": path, "score": sc, "label": label})

pd.DataFrame(results).head()
print("Scored v1 images.")


Scored v1 images.


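## Step 5.3: Re-label v1 scores with adaptive threshold


To make sure the weaker designs move on to refinement, we re-labeled the v1 scores with a percentile threshold: the threshold is set to the 85th percentile of the scores, so only the top-scoring designs stay "yes" and everything below becomes "no".  
With N=4000 and K=9 this produced 7 "no" and 2 "yes", so most clusters enter the refinement loop.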

In [None]:
import numpy as np, pandas as pd

assert len(results) > 0, "No results yet. Run the 'Score v1 images' cell first."

scores = np.array([r["score"] for r in results])
# Scores below the 85th percentile become "no"; only the top ~15% stay "yes"
THRESH = float(np.percentile(scores, 85))

for r in results:
    r["label"] = "yes" if r["score"] >= THRESH else "no"

print("yes/no counts:", pd.Series([r["label"] for r in results]).value_counts().to_dict())


yes/no counts: {'no': 7, 'yes': 2}


Result: Out of 9 initial designs, 7 were flagged as "no" and 2 as "yes".  
This ensures that most clusters will go through the refinement loop (v2 generation).  
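
The effect of the percentile choice is easy to see on a small example. The sketch below re-implements linear-interpolation percentiles (the default method of numpy.percentile) in plain Python and applies them to nine hypothetical scores; the values are illustrative and are not the notebook's v1 scores.

```python
def percentile(values, q):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def label_counts(scores, q):
    thresh = percentile(scores, q)
    labels = ["yes" if s >= thresh else "no" for s in scores]
    return thresh, labels.count("yes"), labels.count("no")

# Nine hypothetical scores, one per cluster
toy_scores = [0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.80, 0.83]
print(label_counts(toy_scores, 85))   # threshold at the 85th percentile
print(label_counts(toy_scores, 15))   # threshold at the 15th percentile
```

With nine distinct scores, a threshold at the 85th percentile leaves 2 "yes" and 7 "no", matching the counts reported above; a threshold at the 15th percentile would instead flag only the 2 lowest scores as "no".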

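# Step 6: Improve Prompts

To strengthen weak prompts automatically, we built a refine_prompt function that appends cues such as a color emphasis, garment tailoring, or a lighting/textile quality phrase.  
The function is banded on the [0,1] score: below 0.50 it appends a color emphasis, a tailoring phrase, and one randomly chosen quality booster; from 0.50 up to 0.62 it appends "refined details"; at 0.62 or above it returns the prompt unchanged. A prompt scoring in the range reported in Step 8 (0.75 to 0.83) would therefore pass through unchanged, and its v2 image would differ from v1 only through the new seed.  
The function is applied in Step 7 to the 7 clusters labeled "no", ahead of a second image generation cycle (v2).

## Define refine_prompt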

In [None]:
import random

QUALITY = [
    "high fashion editorial lighting","sharp textile detail","well-structured silhouette",
    "professional lookbook style","studio lighting"
]

def refine_prompt(prompt, meta, avg_score):
    if avg_score < 0.50:
        color = meta["colors"][0] if meta["colors"] else "neutral tones"
        garment = meta["garments"][0] if meta["garments"] else "outfit"
        booster = random.choice(QUALITY)
        return f"{prompt}, {color} emphasis, refined {garment} tailoring, {booster}"
    elif avg_score < 0.62:
        return f"{prompt}, refined details"
    return prompt
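
The score bands in refine_prompt can be summarized with a small helper. This is an illustrative sketch: refinement_band is a hypothetical name, and only the cut-offs 0.50 and 0.62 are taken from the function above.

```python
def refinement_band(score, strong_below=0.50, light_below=0.62):
    """Name the branch of refine_prompt that a given [0, 1] score would take."""
    if score < strong_below:
        return "strong"       # color emphasis + tailoring + random quality booster
    if score < light_below:
        return "light"        # appends "refined details"
    return "unchanged"        # prompt is returned as-is

for s in (0.45, 0.60, 0.78):
    print(s, refinement_band(s))
```

If scores cluster well above 0.62, one option is to define the bands relative to the labeling threshold rather than as fixed constants.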

# Step 7: Repeat (Generate v2 + Re-Score)

We passed the prompts of the 7 "no" clusters through refine_prompt, generated new images (v2) with new seeds, and scored them with the same threshold as v1.  
Result: no v2 image reached the threshold, so every final selection stayed at v1. The scores from both rounds are stored in outputs/logs/scores.csv, and the final summary in final_summary.csv.

## Step 7.1: Regenerate v2 for “no” clusters

We generated v2 images for the 7 clusters flagged as “no,” using the prompts returned by refine_prompt.  
Result: New designs were created and stored in outputs/gens/ with version tags.

In [None]:
avg_by_cluster = {r["cluster"]: r["score"] for r in results}

gen_paths_v2 = {}
for r in results:
    cid = r["cluster"]
    if r["label"] == "no":
        new_p = refine_prompt(cluster_meta[cid]["prompt"], cluster_meta[cid], avg_by_cluster[cid])
        cluster_meta[cid]["prompt"] = new_p
        img2 = generate_one(new_p, seed=1024+cid)
        out2 = f"outputs/gens/c{cid}_v2.png"
        img2.save(out2)
        gen_paths_v2[cid] = out2

print(f"Regenerated v2 for {len(gen_paths_v2)} clusters.")

Regenerated v2 for 7 clusters.


## Step 7.2: Re-score v2, log, and pick finals

We scored the v2 images and selected a final image per cluster: v2 replaces v1 only when the v2 score reaches the threshold (label "yes"); otherwise v1 is kept.  
As a result, all final picks remained v1, with logs saved in scores.csv and final selections stored in final_picks.json and final_prompts.json.

In [None]:
# add v2 scores
for cid, path in gen_paths_v2.items():
    sc = score_image_for_cluster(path, cid)
    label = "yes" if sc >= THRESH else "no"
    results.append({"cluster":cid, "version":"v2", "path":path, "score":sc, "label":label})

res_df = pd.DataFrame(results).sort_values(["cluster","version"])
os.makedirs("outputs/logs", exist_ok=True)
res_df.to_csv("outputs/logs/scores.csv", index=False)

# choose final per cluster
final_paths = {}
for cid in sorted(cluster_meta.keys()):
    v2 = res_df[(res_df.cluster==cid) & (res_df.version=="v2")]
    v1 = res_df[(res_df.cluster==cid) & (res_df.version=="v1")]
    if len(v2) and v2.iloc[0]["label"]=="yes":
        final_paths[cid] = v2.iloc[0]["path"]
    else:
        final_paths[cid] = v1.iloc[0]["path"]

# save prompts and picks
os.makedirs("outputs/prompts", exist_ok=True)
with open("outputs/prompts/final_prompts.json","w") as f:
    json.dump({cid:cluster_meta[cid]["prompt"] for cid in sorted(cluster_meta.keys())}, f, indent=2)

with open("outputs/logs/final_picks.json","w") as f:
    json.dump(final_paths, f, indent=2)

print("Saved final picks and logs.")
list(final_paths.items())[:10]


Saved final picks and logs.


[(0, 'outputs/gens/c0_v1.png'),
 (1, 'outputs/gens/c1_v1.png'),
 (2, 'outputs/gens/c2_v1.png'),
 (3, 'outputs/gens/c3_v1.png'),
 (4, 'outputs/gens/c4_v1.png'),
 (5, 'outputs/gens/c5_v1.png'),
 (6, 'outputs/gens/c6_v1.png'),
 (7, 'outputs/gens/c7_v1.png'),
 (8, 'outputs/gens/c8_v1.png')]
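
The selection rule in the cell above depends on the v2 label, not on a direct comparison of scores. The sketch below contrasts it with a hypothetical alternative that keeps whichever version scored higher; the records and the function names are illustrative.

```python
def pick_by_label(v1, v2):
    """Rule used above: take v2 only if it exists and is labeled 'yes'."""
    return v2 if v2 is not None and v2["label"] == "yes" else v1

def pick_by_score(v1, v2):
    """Hypothetical alternative: take whichever version scored higher."""
    return v2 if v2 is not None and v2["score"] > v1["score"] else v1

# Hypothetical records: v2 scores slightly higher but stays below the threshold
v1 = {"version": "v1", "score": 0.760, "label": "no"}
v2 = {"version": "v2", "score": 0.775, "label": "no"}

print(pick_by_label(v1, v2)["version"])   # v1
print(pick_by_score(v1, v2)["version"])   # v2
```

Under the label rule, a v2 image that improves on v1 without reaching the threshold is still discarded; the score rule would keep it.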

# Step 8: Compact final table

We built a compact summary table combining each cluster’s final image path, prompt, score, and label.  

In [None]:
import json, pandas as pd

# Load logs and prompts
scores = pd.read_csv("outputs/logs/scores.csv")
with open("outputs/logs/final_picks.json") as f:
    final = {int(k):v for k,v in json.load(f).items()}
with open("outputs/prompts/final_prompts.json") as f:
    prompts = {int(k):v for k,v in json.load(f).items()}

# Build compact summary
final_df = (
    pd.DataFrame({"cluster": list(final.keys()), "final_path": list(final.values())})
      .merge(pd.DataFrame({"cluster": list(prompts.keys()), "prompt": list(prompts.values())}),
             on="cluster", how="left")
)

# Attach the final score/label for each cluster
agg = (scores.sort_values(["cluster","version"])
              .groupby("cluster").tail(1)[["cluster","score","label"]])
final_df = final_df.merge(agg, on="cluster", how="left").sort_values("cluster")

# Show table
display(final_df)

# Export to CSV inside outputs/logs/
final_df.to_csv("outputs/logs/final_summary.csv", index=False)
print("Exported compact final table to outputs/logs/final_summary.csv")


| | cluster | final_path | prompt | score | label |
|---|---|---|---|---|---|
| 0 | 0 | outputs/gens/c0_v1.png | chic, vintage blazer, handbag in white, black,... | 0.78066 | no |
| 1 | 1 | outputs/gens/c1_v1.png | chic, bohemian dress, t-shirt in white, beige,... | 0.780346 | no |
| 2 | 2 | outputs/gens/c2_v1.png | smart casual, preppy t-shirt, shirt in white, ... | 0.794144 | no |
| 3 | 3 | outputs/gens/c3_v1.png | smart casual, chic sneakers, boots in white, b... | 0.808409 | yes |
| 4 | 4 | outputs/gens/c4_v1.png | chic, bohemian handbag, blazer in white, pleat... | 0.750057 | no |
| 5 | 5 | outputs/gens/c5_v1.png | chic, smart casual handbag, blazer in white, p... | 0.77212 | no |
| 6 | 6 | outputs/gens/c6_v1.png | chic, smart casual heels, sneakers in white, p... | 0.791913 | no |
| 7 | 7 | outputs/gens/c7_v1.png | smart casual, business casual trousers, t-shir... | 0.78008 | no |
| 8 | 8 | outputs/gens/c8_v1.png | streetwear, smart casual t-shirt, shirt in whi... | 0.830858 | yes |


Exported compact final table to outputs/logs/final_summary.csv


Result (N=4000, K=9):  
This table shows one selected design per cluster, with its prompt, score, and yes/no label. Out of 9 clusters, 2 designs (Clusters 3 and 8) were labeled "yes" and the other 7 "no"; for those 7 clusters the v2 image did not reach the threshold, so the v1 image was kept. Note that the score and label columns come from the last logged version per cluster (v2 where one exists), which is not necessarily the version shown in final_path.  
Scores ranged from 0.75 to 0.83, a narrow band across clusters, with Cluster 8 achieving the highest blended score (similarity to its centroid and to its prompt).
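
The summary cell attaches the last logged row per cluster after sorting by version, which is the v2 row whenever one exists. A lookup keyed on the image path ties each score to the image that was actually selected. The sketch below shows the idea with plain dictionaries; the two score values are hypothetical.

```python
# Hypothetical log rows for one cluster (same columns as scores.csv)
rows = [
    {"cluster": 0, "version": "v1", "path": "outputs/gens/c0_v1.png", "score": 0.79, "label": "no"},
    {"cluster": 0, "version": "v2", "path": "outputs/gens/c0_v2.png", "score": 0.78, "label": "no"},
]
final_picks = {0: "outputs/gens/c0_v1.png"}

# Index the log by path so each final pick retrieves its own row
by_path = {r["path"]: r for r in rows}
summary = [
    {"cluster": cid, "final_path": path,
     "score": by_path[path]["score"], "label": by_path[path]["label"]}
    for cid, path in sorted(final_picks.items())
]
print(summary)
```

In pandas, the equivalent would be a merge of the final picks with the score log on the path column rather than on the cluster column alone.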


# Step 9: Zip everything

We zipped the full outputs/ folder, including images, prompts, logs, and the final summary, into trend_to_design_outputs.zip.  

In [None]:
import shutil

# Zip the entire outputs folder (includes gens, prompts, logs, summary)
shutil.make_archive("trend_to_design_outputs", "zip", "outputs")
print("Created trend_to_design_outputs.zip")

Created trend_to_design_outputs.zip


# Limitations

- Sample Size vs. Full Dataset: We sampled 4,000 images out of approximately 44k for runtime feasibility on Colab. While this gave stable clusters, the full dataset may yield richer trends.  
- No Fine-Tuning: Both CLIP and Stable Diffusion Turbo were used in zero-shot mode without model fine-tuning, which limits design fidelity.  
- Binary Feedback: The yes/no scoring is simple; more nuanced evaluation could better capture design quality.  
- Generative Output Quality: No v2 image reached the acceptance threshold, so every final pick stayed at v1, showing the limits of simple prompt engineering.

# Future Work




- Larger Samples or Full Dataset: Scale up to more images for more diverse and representative clusters.  
- Model Fine-Tuning: Fine-tune generative or embedding models for fashion-specific domains to improve alignment.  
- Rich Feedback Loops: Move beyond binary scoring to include style similarity metrics, human-in-the-loop ratings, or multi-label evaluation.  
- Multiple Candidates per Cluster: Generate several designs per trend and keep top-k instead of just one.  
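
As a sketch of the last item, keeping the top-k candidates per cluster could look like the following. The candidate records, the paths, and the helper name top_k are hypothetical; the notebook itself generates one image per cluster and version.

```python
import heapq

def top_k(candidates, k=2):
    """Return the k highest-scoring candidates, best first."""
    return heapq.nlargest(k, candidates, key=lambda c: c["score"])

# Hypothetical candidates for a single cluster
candidates = [
    {"path": "c0_seed1.png", "score": 0.76},
    {"path": "c0_seed2.png", "score": 0.81},
    {"path": "c0_seed3.png", "score": 0.79},
    {"path": "c0_seed4.png", "score": 0.74},
]
best = top_k(candidates, k=2)
print([c["path"] for c in best])
```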


# Conclusion


- We built a complete Trend-to-Design pipeline covering every stage: raw fashion images, embeddings, clustering, prompt generation, design creation, scoring, refinement, and final picks.  
- Using N=4k images, the system identified 9 trend clusters, generated candidate designs, and refined weak prompts in an iterative loop.  
- Although all final selections remained at v1, the process demonstrated an automatic feedback-driven refinement loop, with outputs logged and summarized in a compact table.  
- The project shows that embedding-driven clustering combined with generative AI can uncover fashion trends and translate them into new designs in an automated and interpretable way.