# VLM-Embeddings API Examples with MLX Server

This notebook demonstrates how to use the embeddings endpoint of MLX Server through the OpenAI-compatible API. You'll learn how to generate embeddings, work with batches, compare similarity between texts, and use embeddings for practical applications.

## 1. Setup and Connection

- Shows how to connect to the MLX Server using the OpenAI Python client.
- Uses a local server endpoint (http://localhost:8000/v1) and a placeholder API key.

In [24]:
# Import the OpenAI client for API communication
from openai import OpenAI

# Connect to the local MLX Server with OpenAI-compatible API
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="fake-api-key",
)

## Single Text-Image Embedding


In [25]:
from PIL import Image
from io import BytesIO
import base64

In [26]:
# To send images to the API, we need to convert them to base64-encoded strings in a data URI format.

def image_to_base64(image: Image.Image) -> str:
    """
    Convert a PIL Image to a base64-encoded data URI string that can be sent to the API.
    
    Args:
        image: A PIL Image object
        
    Returns:
        A data URI string with the base64-encoded image
    """
    # Serialize the image to PNG bytes in memory
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    image_data = buffer.getvalue()  # getvalue() returns the whole buffer regardless of position
    
    # Encode as base64
    image_base64 = base64.b64encode(image_data).decode('utf-8')
    
    # Create the data URI format required by the API
    mime_type = "image/png"  
    image_uri = f"data:{mime_type};base64,{image_base64}"
    
    return image_uri

In [27]:
image = Image.open("images/attention.png")
image_uri = image_to_base64(image)
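
Before sending the data URI to the server, it can help to confirm that it decodes back to PNG bytes. The following is a minimal sketch using only the standard library; `parse_data_uri` and `looks_like_png` are hypothetical helper names introduced here for illustration, not part of MLX Server or the OpenAI client.

```python
import base64
from typing import Tuple

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def parse_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a base64 data URI into (mime_type, decoded_bytes).

    Hypothetical helper for sanity-checking the output of image_to_base64.
    """
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a URI of the form data:<mime>;base64,<payload>")
    mime_type = header[len("data:"):-len(";base64")]
    # validate=True rejects characters outside the base64 alphabet
    return mime_type, base64.b64decode(payload, validate=True)


def looks_like_png(data: bytes) -> bool:
    """Return True if the bytes begin with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)
```

Assuming `image_uri` from the cell above, `mime, data = parse_data_uri(image_uri)` should yield `"image/png"` and bytes for which `looks_like_png(data)` is true.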

In [28]:
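# Generate a single embedding for a text prompt paired with an image
single_text = "Describe the image in detail"
response = client.embeddings.create(
    input=[single_text],
    model="mlx-community/Qwen2.5-VL-3B-Instruct-4bit",
    # extra_body adds fields to the request JSON that the OpenAI client
    # does not define itself; here it carries the image as a data URI
    extra_body={
        "image_url": image_uri
    },
)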

In [29]:
# Print the dimensionality of the returned embedding vector
print(len(response.data[0].embedding))

2048
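
Once two embeddings are available, a common way to compare them is cosine similarity. The sketch below is a plain-Python illustration, not a function provided by MLX Server or the OpenAI client; `cosine_similarity` is a hypothetical helper name, and it assumes both vectors have the same length (for example, two 2048-dimensional vectors from the same model).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors.

    Hypothetical helper: 1.0 means same direction, 0.0 orthogonal,
    -1.0 opposite direction.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

With a second response obtained the same way as `response` (called `other_response` here purely for illustration), the comparison would be `cosine_similarity(response.data[0].embedding, other_response.data[0].embedding)`.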
