<a href="https://colab.research.google.com/github/MehediAhamed/vlmrun-cookbook/blob/mehedi/notebooks/06_fashion_images_hybrid_search.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div align="center">
<p align="center" style="width: 100%;">
    <img src="https://raw.githubusercontent.com/vlm-run/.github/refs/heads/main/profile/assets/vlm-black.svg" alt="VLM Run Logo" width="80" style="margin-bottom: -5px; color: #2e3138; vertical-align: middle; padding-right: 5px;"><br>
</p>
<p align="center"><a href="https://docs.vlm.run"><b>Website</b></a> | <a href="https://docs.vlm.run/"><b>API Docs</b></a> | <a href="https://docs.vlm.run/blog"><b>Blog</b></a> | <a href="https://discord.gg/AMApC2UzVY"><b>Discord</b></a>
</p>
<p align="center">
<a href="https://discord.gg/AMApC2UzVY"><img alt="Discord" src="https://img.shields.io/badge/discord-chat-purple?color=%235765F2&label=discord&logo=discord"></a>
<a href="https://twitter.com/vlmrun"><img alt="Twitter Follow" src="https://img.shields.io/twitter/follow/vlmrun.svg?style=social&logo=twitter"></a>
</p>
</div>

Welcome to **[VLM Run Cookbooks](https://github.com/vlm-run/vlmrun-cookbook)**, a comprehensive collection of examples and notebooks demonstrating the power of structured visual understanding using the [VLM Run Platform](https://app.vlm.run).

## Case Study: Fashion Product Catalog with Hybrid Search
This notebook demonstrates how to build a search system for a fashion product catalog using:
- VLM Run for structured image understanding
- LanceDB for hybrid vector + text search
- CLIP embeddings for visual similarity

### Environment Setup

To get started, install the VLM Run Python SDK and sign up for an API key on the [VLM Run App](https://app.vlm.run).
- Store the VLM Run API key under the `VLMRUN_API_KEY` environment variable.

### Prerequisites

* Python 3.9+
* VLM Run API key (get one at [app.vlm.run](https://app.vlm.run))
* Basic understanding of vector databases and embeddings

## Setup

First, let's install the required packages:

In [1]:
! pip install lancedb --quiet
! pip install open_clip_torch --quiet
! pip install vlmrun --upgrade --quiet
! pip install vlmrun-hub --upgrade --quiet
! pip install datasets --upgrade --quiet
! pip install tantivy --quiet  # For full-text search
! pip install pylance --quiet

## Set up CLIP Embeddings

In [2]:
from lancedb.embeddings import EmbeddingFunctionRegistry

registry = EmbeddingFunctionRegistry.get_instance()
clip = registry.get("open-clip").create()

In [3]:
clip

OpenClipEmbeddings(max_retries=7, name='ViT-B-32', pretrained='laion2b_s34b_b79k', device='cpu', batch_size=64, normalize=True)
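
The `normalize=True` setting in the output above suggests the embeddings are scaled to unit length, in which case a plain dot product equals cosine similarity. A minimal pure-Python sketch with made-up 3-dimensional vectors illustrates this (real `ViT-B-32` embeddings are much longer, and the helper names here are illustrative, not part of LanceDB):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper, not a LanceDB API)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy stand-ins for an image embedding and a text-query embedding
image_vec = l2_normalize([3.0, 4.0, 0.0])
query_vec = l2_normalize([4.0, 3.0, 0.0])

# For unit vectors, the dot product is the cosine of the angle between them
similarity = dot(image_vec, query_vec)
print(round(similarity, 2))
```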

## Define Data Schema

In [4]:
from PIL import Image
from lancedb.pydantic import LanceModel, Vector

class FashionImages(LanceModel):
    vector: Vector(clip.ndims()) = clip.VectorField()
    image_uri: str = clip.SourceField()
    description: str
    category: str
    season: str
    gender: str

    @property
    def image(self):
        return Image.open(self.image_uri)

## Configure VLM Run

In [5]:
import os
import getpass

VLMRUN_BASE_URL = os.getenv("VLMRUN_BASE_URL", "https://api.vlm.run/v1")
VLMRUN_API_KEY = os.getenv("VLMRUN_API_KEY", None)
if VLMRUN_API_KEY is None:
    VLMRUN_API_KEY = getpass.getpass()

In [6]:
from vlmrun.client import VLMRun

vlm_client = VLMRun(base_url=VLMRUN_BASE_URL, api_key=VLMRUN_API_KEY)

In [7]:
import lancedb

db = lancedb.connect("fashion_imagesdb")

## Load and Process Dataset

In [8]:
from datasets import load_dataset
import logging

def load_fashion_dataset(sample_size="1%"):
    try:
        print(f"Loading {sample_size} of fashion dataset...")
        ds = load_dataset("ashraq/fashion-product-images-small",
                         split=f"train[:{sample_size}]")
        print(f"Loaded {len(ds)} images successfully")
        return ds
    except Exception as e:
        logging.error(f"Failed to load dataset: {str(e)}")
        raise

ds = load_fashion_dataset("2%")

Loading 2% of fashion dataset...


Loaded 881 images successfully


## Understanding the retail.product-catalog Domain

The [`retail.product-catalog`](https://github.com/vlm-run/vlmrun-hub/blob/main/vlmrun/hub/schemas/retail/product_catalog.py) domain in VLM Run is specifically designed for analyzing fashion and retail product images. When processing an image, it extracts the following structured information:

- `description`: A two-sentence visual description of the product
- `category`: One- or two-word product category (e.g., apparel, accessories, footwear)
- `season`: The intended season (fall, spring, summer, or winter)
- `gender`: Target audience (men, women, boys, or girls)

This structured output helps create rich, searchable product catalogs with consistent metadata across your entire inventory.

Because the domain schema is pre-built, there is no need to write prompts or train a custom model for this task: the `retail.product-catalog` domain analyzes each fashion image and returns the same four fields every time, which removes the need for manual annotation.
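
To make the field list above concrete, here is a minimal sketch of checking one extracted record before indexing it. The allowed values come from the bullet list; the `validate_metadata` helper and the sample record are hypothetical and are not part of the VLM Run SDK.

```python
ALLOWED_SEASONS = {"fall", "spring", "summer", "winter"}
ALLOWED_GENDERS = {"men", "women", "boys", "girls"}
REQUIRED_FIELDS = ("description", "category", "season", "gender")

def validate_metadata(record: dict) -> dict:
    """Return a cleaned copy of `record`, raising ValueError on bad input."""
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    cleaned = {f: str(record[f]).strip() for f in REQUIRED_FIELDS}
    # Lowercase the categorical fields so exact-match filters behave predictably
    for field in ("category", "season", "gender"):
        cleaned[field] = cleaned[field].lower()
    if cleaned["season"] not in ALLOWED_SEASONS:
        raise ValueError(f"unexpected season: {cleaned['season']!r}")
    if cleaned["gender"] not in ALLOWED_GENDERS:
        raise ValueError(f"unexpected gender: {cleaned['gender']!r}")
    return cleaned

# Made-up record shaped like the fields listed above
sample = {
    "description": "A red cotton shirt with long sleeves. It has a button-down collar.",
    "category": "Apparel",
    "season": "Fall",
    "gender": "men",
}
cleaned = validate_metadata(sample)
print(cleaned["category"], cleaned["season"], cleaned["gender"])
```

Lowercasing the categorical fields is a design choice for this sketch: it keeps later equality filters from missing records that differ only in capitalization.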

In [9]:
def get_image_metadata(image):
    """Run the retail.product-catalog domain on one image and return the parsed response."""
    response = vlm_client.image.generate(
        images=[image],
        domain="retail.product-catalog",
    )
    return response.response

In [10]:
import pandas as pd
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor
from tqdm.auto import tqdm

# Local directory where dataset images are saved so records can reference them by path
image_dir = Path("~/fashion_images").expanduser()
image_dir.mkdir(parents=True, exist_ok=True)

def process_batch(batch_data):
    """Save each (index, image) pair to disk and collect its VLM Run metadata."""
    batch_records = []
    for idx, img in batch_data:
        try:
            image_path = str(image_dir / f"image_{idx}.jpg")
            img.save(image_path)

            metadata = get_image_metadata(img)

            batch_records.append({
                "image_uri": image_path,
                "description": metadata["description"],
                "category": metadata["category"],
                "season": metadata["season"],
                "gender": metadata["gender"]
            })
        except Exception as e:
            print(f"Error processing image {idx}: {e}")
    return batch_records

In [11]:
if "fashion_images" in db:
    table = db["fashion_images"]
else:
    # Create batches of images
    BATCH_SIZE = 32
    MAX_WORKERS = 4

    # Prepare batches
    all_images = list(enumerate(ds["image"]))
    batches = [all_images[i:i + BATCH_SIZE]
              for i in range(0, len(all_images), BATCH_SIZE)]

    # Process batches in parallel
    records = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # executor.map yields results in batch order; tqdm tracks completed batches
        batch_results = list(tqdm(
            executor.map(process_batch, batches),
            total=len(batches),
            desc="Processing images"
        ))

        # Combine results
        for batch_records in batch_results:
            records.extend(batch_records)

    # Create table and add all records at once
    table = db.create_table("fashion_images", schema=FashionImages)
    df = pd.DataFrame(records)
    table.add(df)


The cell above splits the 881 images into 28 batches of up to 32 and processes them with four worker threads, so several VLM Run requests are in flight at once. Every image is described with the same structured schema, which makes the results directly usable for search indexing and product management systems. An image that fails is reported and skipped rather than aborting the whole run.
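
The batching pattern itself is independent of VLM Run and can be sketched with a stub in place of the API call. Here `fake_process_batch` is a hypothetical stand-in for `process_batch`, and the integer items stand in for images:

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 32
MAX_WORKERS = 4

def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def fake_process_batch(batch):
    """Stand-in worker: returns one record per (index, item) pair."""
    return [{"id": idx} for idx, _ in batch]

items = list(enumerate(range(881)))  # same count as the loaded dataset
batches = chunk(items, BATCH_SIZE)

records = []
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
    # executor.map returns results in input order, even if workers finish out of order
    for batch_records in executor.map(fake_process_batch, batches):
        records.extend(batch_records)

print(len(batches), len(batches[-1]), len(records))
```

Threads spend most of their time waiting on network responses, so a thread pool is usually adequate here. A suitable `MAX_WORKERS` depends on API rate limits, which this sketch does not model.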

In [12]:
# Create full-text search index on text fields
table.create_fts_index(["description", "category", "season", "gender"], use_tantivy=True, replace=True)

## Implement Hybrid Search

In [13]:
def search_fashion(query: str, limit: int = 3):
    """
    Hybrid search combining vector similarity with text search
    """
    return (
        table.search(query, query_type="hybrid")
        .limit(limit)
        .to_pydantic(FashionImages)
    )
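
Hybrid search has to merge two ranked lists: one from vector similarity and one from full-text matching. Reciprocal rank fusion (RRF) is one common way to do that. Whether LanceDB applies it depends on the version and the reranker configured, so treat this as an illustration of the idea rather than the library's exact behavior. The document IDs and the constant `k=60` are assumptions for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per item."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))

vector_hits = ["img_7", "img_2", "img_9"]   # hypothetical vector-search order
text_hits = ["img_2", "img_5", "img_7"]     # hypothetical full-text order

fused = reciprocal_rank_fusion([vector_hits, text_hits])
print(fused)
```

Items that appear in both lists, such as `img_2` and `img_7` here, accumulate score from each list and so tend to rise above items found by only one method.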

## Display Results

In [14]:
from vlmrun.common.viz import show_results

# Example usage
results = search_fashion("red shirt men", limit=3)
show_results(results, [r.image for r in results], image_width=150)

| | description | category | season | gender |
|---|---|---|---|---|
| 0 | A man is wearing a long-sleeved, solid red dress shirt with a matching red tie. He is also wearing dark pants, presenting a formal look. | apparel | fall | men |
| 1 | A man wears a long-sleeved, collared olive green shirt with the sleeves rolled up. The shirt features a casual design. | Shirt | spring | men |
| 2 | A man is shown wearing a long-sleeved maroon or reddish-brown collared shirt with a subtle pattern. He is also wearing dark trousers and a dark belt, presenting a semi-formal or business casual look. | apparel | fall | men |


## Advanced Search with Metadata Filtering

In [15]:
def search_fashion_with_filters(
    query: str,
    gender: str = None,
    season: str = None,
    category: str = None,
    limit: int = 3
):
    """
    Vector similarity search with metadata filtering

    Args:
        query: Search query text (will be converted to vector using CLIP)
        gender: Filter by gender (men, women, boys, girls)
        season: Filter by season (fall, spring, summer, winter)
        category: Filter by product category
        limit: Maximum number of results to return
    """
    # Vector search using CLIP embeddings
    search = table.search(query)

    # Add metadata filters if provided
    conditions = []
    if gender:
        conditions.append(f"gender = '{gender}'")
    if season:
        conditions.append(f"season = '{season}'")
    if category:
        conditions.append(f"category = '{category}'")

    if conditions:
        search = search.where(" AND ".join(conditions))

    return search.limit(limit).to_pydantic(FashionImages)
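
The filter string above is built by interpolating values directly, so a value containing a single quote (for example `men's wear`) would produce an invalid expression. A minimal sketch of a safer builder follows, assuming SQL-style string literals where a quote is escaped by doubling it; `sql_quote` and `build_filter` are hypothetical helpers, not LanceDB functions:

```python
def sql_quote(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def build_filter(**equals):
    """Join column = value conditions with AND, skipping values that are None."""
    conditions = [
        f"{column} = {sql_quote(value)}"
        for column, value in equals.items()
        if value is not None
    ]
    return " AND ".join(conditions) if conditions else None

print(build_filter(gender="women", season="summer", category=None))
print(build_filter(category="men's wear"))
```

When the result is not `None`, it could be passed to `search.where(...)` in place of the joined `conditions` list.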

In [16]:
print("Searching for summer dresses for women...")
results = search_fashion_with_filters(
    query="floral dress",
    gender="women",
    season="summer",
    limit=3
)
show_results(results, [r.image for r in results], image_width=150)

Searching for summer dresses for women...


| | description | category | season | gender |
|---|---|---|---|---|
| 0 | A woman models a sleeveless top with a light background adorned with vibrant pink and red floral patterns. The top features a relaxed fit with a gathered waist, suitable for casual wear. | Apparel | summer | women |
| 1 | A woman models a short-sleeved, round-neck top featuring a delicate floral pattern. The casual top appears to be tunic-length and is styled with blue jeans. | apparel | summer | women |
| 2 | A sleeveless, loose-fitting tunic top with an intricate paisley-like pattern. It features a V-neckline with decorative trim and a contrasting border along the hem, primarily in shades of blue, orange, and brown. | Apparel | summer | women |


## Conclusion

This notebook demonstrates how VLM Run transforms the challenge of building intelligent fashion search into a straightforward process. Key benefits include:

1. **Automated Metadata Extraction**: VLM Run's `retail.product-catalog` domain eliminates manual tagging
2. **Consistent Structure**: Every image receives metadata with the same four fields, although values such as `category` can still vary in casing (`apparel` vs. `Apparel` in the results above)
3. **Hybrid Search Capabilities**: Combine CLIP vector similarity with full-text search and metadata filters
4. **Scalable Pipeline**: Process images in parallel batches using a thread pool

## Additional Resources
- [VLM Run Documentation](https://docs.vlm.run)
- [API Reference](https://docs.vlm.run/)
- [More Examples](https://github.com/vlm-run/vlmrun-cookbook)
- [LanceDB Hybrid Search](https://lancedb.github.io/lancedb/hybrid_search/hybrid_search/)
- [Fashion Dataset](https://huggingface.co/datasets/ashraq/fashion-product-images-small)