<div align="center">
<p align="center" style="width: 100%;">
    <img src="https://raw.githubusercontent.com/vlm-run/.github/refs/heads/main/profile/assets/vlm-black.svg" alt="VLM Run Logo" width="80" style="margin-bottom: -5px; color: #2e3138; vertical-align: middle; padding-right: 5px;"><br>
</p>
<p align="center"><a href="https://docs.vlm.run"><b>Website</b></a> | <a href="https://docs.vlm.run/"><b>API Docs</b></a> | <a href="https://docs.vlm.run/blog"><b>Blog</b></a> | <a href="https://discord.gg/AMApC2UzVY"><b>Discord</b></a>
</p>
<p align="center">
<a href="https://discord.gg/AMApC2UzVY"><img alt="Discord" src="https://img.shields.io/badge/discord-chat-purple?color=%235765F2&label=discord&logo=discord"></a>
<a href="https://twitter.com/vlmrun"><img alt="Twitter Follow" src="https://img.shields.io/twitter/follow/vlmrun.svg?style=social&logo=twitter"></a>
</p>
</div>

Welcome to **[VLM Run Cookbooks](https://github.com/vlm-run/vlmrun-cookbook)**, a comprehensive collection of examples and notebooks demonstrating the power of structured visual understanding using the [VLM Run Platform](https://app.vlm.run). 

## Case Study: Fashion Product Catalog with Hybrid Search
This notebook demonstrates how to build hybrid search over a fashion product catalog using:
- VLM Run for structured image understanding
- LanceDB for hybrid vector + text search
- CLIP embeddings for visual similarity

### Environment Setup

To get started, install the VLM Run Python SDK and sign up for an API key on the [VLM Run App](https://app.vlm.run).
- Store the VLM Run API key under the `VLMRUN_API_KEY` environment variable.

### Prerequisites

* Python 3.9+
* VLM Run API key (get one at [app.vlm.run](https://app.vlm.run))
* Basic understanding of vector databases and embeddings

## Setup

First, let's install the required packages:

In [26]:
! pip install lancedb --quiet
! pip install open_clip_torch --quiet
! pip install vlmrun --upgrade --quiet
! pip install vlmrun-hub --upgrade --quiet
! pip install -U datasets --quiet
! pip install tantivy --quiet  # For full-text search
! pip install pylance --quiet

## Set up CLIP Embeddings

In [3]:
from lancedb.embeddings import EmbeddingFunctionRegistry

registry = EmbeddingFunctionRegistry.get_instance()
clip = registry.get("open-clip").create()

In [4]:
clip

OpenClipEmbeddings(max_retries=7, name='ViT-B-32', pretrained='laion2b_s34b_b79k', device='cpu', batch_size=64, normalize=True)
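
The `normalize=True` setting in the output above suggests the embeddings are scaled to unit length. For unit vectors, cosine similarity reduces to a plain dot product, which is why normalized embeddings are convenient for similarity search. Below is a minimal pure-Python sketch with made-up 3-dimensional vectors (real CLIP ViT-B-32 embeddings are 512-dimensional). The helper names are illustrative and are not part of LanceDB.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (illustrative helper, not a LanceDB API)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity computed from the raw, unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for a query embedding and an image embedding
query = [3.0, 4.0, 0.0]
item = [4.0, 3.0, 0.0]

q_unit, i_unit = l2_normalize(query), l2_normalize(item)

# After normalization, the dot product matches the cosine similarity
print(dot(q_unit, i_unit), cosine_similarity(query, item))
```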

## Define Data Schema

In [5]:
from PIL import Image
from lancedb.pydantic import LanceModel, Vector

class FashionImages(LanceModel):
    vector: Vector(clip.ndims()) = clip.VectorField()
    image_uri: str = clip.SourceField()
    description: str
    category: str
    season: str
    gender: str

    @property
    def image(self):
        return Image.open(self.image_uri)

## Configure VLM Run

In [6]:
import os
import getpass

VLMRUN_BASE_URL = os.getenv("VLMRUN_BASE_URL", "https://api.vlm.run/v1")
VLMRUN_API_KEY = os.getenv("VLMRUN_API_KEY", None)
if VLMRUN_API_KEY is None:
    VLMRUN_API_KEY = getpass.getpass()

In [7]:
from vlmrun.client import VLMRun

vlm_client = VLMRun(base_url=VLMRUN_BASE_URL, api_key=VLMRUN_API_KEY)

In [12]:
import lancedb

db = lancedb.connect("fashion_imagesdb")

## Load and Process Dataset

In [9]:
from datasets import load_dataset
import logging

def load_fashion_dataset(sample_size="1%"):
    try:
        print(f"Loading {sample_size} of fashion dataset...")
        ds = load_dataset("ashraq/fashion-product-images-small", 
                         split=f"train[:{sample_size}]")
        print(f"Loaded {len(ds)} images successfully")
        return ds
    except Exception as e:
        logging.error(f"Failed to load dataset: {str(e)}")
        raise

ds = load_fashion_dataset("2%")

Loading 2% of fashion dataset...
Loaded 881 images successfully


## Understanding the retail.product-catalog Domain

The [`retail.product-catalog`](https://github.com/vlm-run/vlmrun-hub/blob/main/vlmrun/hub/schemas/retail/product_catalog.py) domain in VLM Run is specifically designed for analyzing fashion and retail product images. When processing an image, it extracts the following structured information:

- `description`: A two-sentence visual description of the product
- `category`: One or two-word product category (e.g., apparel, accessories, footwear)
- `season`: The intended season (fall, spring, summer, or winter)
- `gender`: Target audience (men, women, boys, or girls)

This structured output helps create rich, searchable product catalogs with consistent metadata across your entire inventory.

VLM Run simplifies the process of extracting structured metadata from fashion images through its pre-built domain schemas. Instead of writing complex prompts or training custom models, VLM Run's `retail.product-catalog` domain automatically analyzes fashion images and returns consistent, structured data. This eliminates the need for manual annotation and ensures standardized metadata across your entire product catalog.
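
Model output can still vary in casing (the search results later in this notebook show `Apparel`, `apparel`, and `shirt` as categories), which matters for exact-match filters. The following is a minimal sketch of a normalization step you could apply before indexing. It assumes the response behaves like a dictionary with the four fields listed above, and that the season and gender values are limited to the ones listed; the real schema may allow others. `normalize_metadata` is a hypothetical helper, not part of the VLM Run SDK.

```python
# Allowed values taken from the field descriptions above (assumption: the
# actual schema may permit additional values).
ALLOWED_SEASONS = {"fall", "spring", "summer", "winter"}
ALLOWED_GENDERS = {"men", "women", "boys", "girls"}
TEXT_FIELDS = ("description", "category", "season", "gender")


def normalize_metadata(metadata: dict) -> dict:
    """Strip whitespace, lowercase the label fields, and validate enum-like values."""
    missing = [field for field in TEXT_FIELDS if field not in metadata]
    if missing:
        raise KeyError(f"missing fields: {missing}")

    cleaned = {field: str(metadata[field]).strip() for field in TEXT_FIELDS}

    # Lowercase the short label fields so exact-match filters behave consistently
    for field in ("category", "season", "gender"):
        cleaned[field] = cleaned[field].lower()

    if cleaned["season"] not in ALLOWED_SEASONS:
        raise ValueError(f"unexpected season: {cleaned['season']!r}")
    if cleaned["gender"] not in ALLOWED_GENDERS:
        raise ValueError(f"unexpected gender: {cleaned['gender']!r}")
    return cleaned


raw = {
    "description": " A deep red button-up shirt. ",
    "category": "Apparel",
    "season": "Fall",
    "gender": "men",
}
print(normalize_metadata(raw))
```

Whether to raise on an unexpected value or map it to a fallback such as `"unknown"` is a design choice that depends on how strict the catalog needs to be.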

In [10]:
def get_image_metadata(image):
    """Run the retail.product-catalog domain on one image and return the structured response."""
    response = vlm_client.image.generate(
        images=[image],
        domain="retail.product-catalog",
    )
    return response.response
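
Network calls can fail transiently, and in the batch loop below a failed image is simply skipped. One option is to wrap the metadata call in a retry with exponential backoff. The sketch below is generic Python and does not rely on any VLM Run SDK retry feature; `with_retries` and `flaky_metadata` are hypothetical names, and the flaky function only simulates a call that fails twice before succeeding.

```python
import time


def with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Wrap fn so it is retried with exponential backoff (1s, 2s, 4s, ...)."""
    def wrapper(*args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return fn(*args, **kwargs)
            except Exception:
                # Re-raise once the final attempt has failed
                if attempt == max_attempts:
                    raise
                sleep(base_delay * 2 ** (attempt - 1))
    return wrapper


# Stand-in for get_image_metadata that fails on the first two calls
calls = {"count": 0}


def flaky_metadata(image):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return {"category": "apparel"}


# Record the delays instead of actually sleeping, to keep the demo fast
delays = []
safe_metadata = with_retries(flaky_metadata, sleep=delays.append)
result = safe_metadata("image_0.jpg")
print(result, delays)
```

In the notebook, the same wrapper could be applied as `with_retries(get_image_metadata)`, assuming failures surface as exceptions.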

In [15]:
import pandas as pd
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor
from tqdm.auto import tqdm

# Directory where dataset images are saved so they can be referenced by path
image_dir = Path("~/fashion_images").expanduser()
image_dir.mkdir(parents=True, exist_ok=True)

def process_batch(batch_data):
    batch_records = []
    for idx, img in batch_data:
        try:
            image_path = str(image_dir / f"image_{idx}.jpg")
            img.save(image_path)
            
            metadata = get_image_metadata(img)
            
            batch_records.append({
                "image_uri": image_path,
                "description": metadata["description"],
                "category": metadata["category"],
                "season": metadata["season"],
                "gender": metadata["gender"]
            })
        except Exception as e:
            print(f"Error processing image {idx}: {e}")
    return batch_records

In [16]:
if "fashion_images" in db:
    table = db["fashion_images"]
else:
    # Create batches of images
    BATCH_SIZE = 32
    MAX_WORKERS = 4
    
    # Prepare batches
    all_images = list(enumerate(ds["image"]))
    batches = [all_images[i:i + BATCH_SIZE] 
              for i in range(0, len(all_images), BATCH_SIZE)]
    
    # Process batches in parallel
    records = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # Map batches to workers and show progress; executor.map returns results in input order
        batch_results = list(tqdm(
            executor.map(process_batch, batches),
            total=len(batches),
            desc="Processing images"
        ))

        # Combine results
        for batch_records in batch_results:
            records.extend(batch_records)
    
    # Create table and add all records at once
    table = db.create_table("fashion_images", schema=FashionImages)
    df = pd.DataFrame(records)
    table.add(df)

Processing images: 100%|█████████████████████████████████████████████████████████| 28/28 [46:05<00:00, 98.76s/it]
100%|████████████████████████████████████████████████████████████████████████████| 64/64 [00:02<00:00, 22.80it/s]

The images are processed in parallel batches: 4 worker threads each handle batches of 32 images, and every image is tagged with the same structured schema, so the records can be indexed for search as soon as they are written. In the run above, the 28 batches (881 images) took about 46 minutes, so throughput on larger catalogs will depend on API latency and on how many workers you run.
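
The batch count in the progress bar above follows directly from the slicing logic: 881 images split into batches of 32 gives 28 batches, with a smaller final batch. A minimal sketch using integers as stand-ins for images (`make_batches` is an illustrative helper that mirrors the list comprehension in the cell above):

```python
import math


def make_batches(items, batch_size):
    """Split a list into consecutive slices of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Integers stand in for the 881 dataset images
items = list(range(881))
batches = make_batches(items, 32)

# Number of batches is ceil(881 / 32); the last batch holds the remainder
print(len(batches), math.ceil(len(items) / 32), len(batches[-1]))
```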

In [22]:
# Create full-text search index on text fields
table.create_fts_index(["description", "category", "season", "gender"], replace=True)

## Implement Hybrid Search

In [23]:
def search_fashion(query: str, limit: int = 3):
    """
    Hybrid search combining vector similarity with text search
    """
    return (
        table.search(query, query_type="hybrid")
        .limit(limit)
        .to_pydantic(FashionImages)
    )
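
Hybrid search has to merge two ranked lists: one from vector similarity and one from full-text search. One common fusion method is reciprocal rank fusion (RRF), sketched below in plain Python. This is only an illustration of the idea; which fusion method or reranker LanceDB applies depends on the installed version and configuration, and the document IDs and `k=60` constant here are assumptions for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of IDs: each appearance adds 1 / (k + rank) to an ID's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result IDs from the two retrieval paths
vector_hits = ["img_7", "img_2", "img_9"]
text_hits = ["img_2", "img_5", "img_7"]

fused = reciprocal_rank_fusion([vector_hits, text_hits])
print(fused)
```

Items that appear in both lists (`img_2` and `img_7` here) accumulate two contributions and therefore rank above items found by only one retrieval path.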

## Display Results

In [24]:
from vlmrun.common.viz import show_results

# Example usage
results = search_fashion("red shirt men", limit=3)
show_results(results, [r.image for r in results], image_width=150)

| # | description | category | season | gender |
|---|---|---|---|---|
| 0 | A man wears a long-sleeved, deep red button-up shirt with a matching tie. The shirt has a smooth, possibly satin or silk texture. | Apparel | fall | men |
| 1 | A woman wears a blue patterned shirt with rolled-up short sleeves and a collared neckline. The shirt has buttons down the front and is paired with blue jeans. | shirt | summer | women |
| 2 | A man models a long-sleeved reddish-brown dress shirt. He is also wearing a black belt and dark trousers. | apparel | fall | men |


## Advanced Search with Metadata Filtering

In [20]:
def search_fashion_with_filters(
    query: str, 
    gender: str = None,
    season: str = None,
    category: str = None,
    limit: int = 3
):
    """
    Vector similarity search with metadata filtering
    
    Args:
        query: Search query text (will be converted to vector using CLIP)
        gender: Filter by gender (men, women, boys, girls)
        season: Filter by season (fall, spring, summer, winter)
        category: Filter by product category
        limit: Maximum number of results to return
    """
    # Vector search using CLIP embeddings
    search = table.search(query)
    
    # Add metadata filters if provided
    conditions = []
    if gender:
        conditions.append(f"gender = '{gender}'")
    if season:
        conditions.append(f"season = '{season}'")
    if category:
        conditions.append(f"category = '{category}'")
        
    if conditions:
        search = search.where(" AND ".join(conditions))
    
    return search.limit(limit).to_pydantic(FashionImages)
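
The filter string above is built with f-strings, so a value containing a single quote (for example `men's wear`) would produce an invalid filter. A minimal sketch of a safer builder follows. It doubles single quotes, which is the standard SQL way to escape a quote inside a string literal; that LanceDB's filter parser accepts this form is an assumption here. `sql_quote` and `build_where` are hypothetical helpers, not LanceDB functions.

```python
# Only these column names may appear in a filter (taken from the schema above)
FILTERABLE_COLUMNS = ("gender", "season", "category")


def sql_quote(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def build_where(**filters):
    """Build an AND-joined filter string, skipping filters that are None or empty."""
    conditions = []
    for column, value in filters.items():
        if column not in FILTERABLE_COLUMNS:
            raise ValueError(f"unsupported filter column: {column}")
        if value:
            conditions.append(f"{column} = {sql_quote(value)}")
    return " AND ".join(conditions) or None


print(build_where(gender="women", season="summer", category=None))
print(build_where(category="men's wear"))
```

The returned string could be passed to `search.where(...)` when it is not `None`.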

In [21]:
print("Searching for summer dresses for women...")
results = search_fashion_with_filters(
    query="floral dress",
    gender="women",
    season="summer",
    limit=3
)
show_results(results, [r.image for r in results], image_width=150) 

Searching for summer dresses for women...


| # | description | category | season | gender |
|---|---|---|---|---|
| 0 | A short-sleeved dress featuring a colorful floral pattern on a light background. The dress reaches knee length and has a V-neckline. | Dress | summer | women |
| 1 | A woman models a blue and white patterned short-sleeved tunic or top. The garment has a round neckline and a buttoned placket, with a repeating geometric or floral design. | Apparel | summer | women |
| 2 | The top is sleeveless with a vibrant pink and white floral pattern. It appears to have a gathered waist or a layered design. | Apparel | summer | women |


### Conclusion

This notebook demonstrates how VLM Run transforms the challenge of building intelligent fashion search into a straightforward process. Key benefits include:

1. **Automated Metadata Extraction**: VLM Run's retail.product-catalog domain eliminates manual tagging
2. **Consistent Categorization**: Every image receives structured metadata in the same format
3. **Hybrid Search Capabilities**: Combine CLIP vector similarity with full-text search and metadata filters
4. **Scalable Pipeline**: Process thousands of images efficiently with batch operations

## Additional Resources
- [VLM Run Documentation](https://docs.vlm.run)
- [API Reference](https://docs.vlm.run/)
- [More Examples](https://github.com/vlm-run/vlmrun-cookbook)
- [LanceDB Hybrid Search](https://lancedb.github.io/lancedb/hybrid_search/hybrid_search/)
- [Fashion Dataset](https://huggingface.co/datasets/ashraq/fashion-product-images-small)