### 🏪 Example: Indexing and Querying Restaurant Data with Milvus

This notebook demonstrates how to index and search restaurant data using the Milvus Vector Database.

You'll learn how to:

* Prepare restaurant data with `title` and `types` fields.
* Embed text using the **LaBSE** sentence transformer.
* Insert the embedded data into Milvus.
* Perform vector similarity searches to find related restaurants.

#### 🛠 Requirements
Make sure you have the following Python libraries installed:
* `pymilvus`
* `sentence-transformers`
* `pandas`

You can use either:
* A **local Milvus** instance (e.g. via Docker)
* Or a **managed Milvus** service such as [Zilliz Cloud](https://cloud.zilliz.com)

📖 For more context, see the full blog post at: [wiphoo.dev](https://wiphoo.dev)

In [1]:
%pip install --upgrade "pymilvus[model]" sentence-transformers pandas


### 🔗 Connect to Milvus (Local or Managed Cloud)

In [2]:
# create a connection to Milvus: either a local instance or Zilliz Cloud
from pymilvus import connections

# local Milvus
connections.connect(uri='http://localhost:19530')

# # Zilliz cloud
# connections.connect(uri="https://YOUR_URI.cloud.zilliz.com", 
#                     token='YOUR_TOKEN',
#                     )

#### 🔤 Create an Embedding Function with LaBSE

To convert restaurant names, descriptions, or other text fields into dense vector embeddings, we use a *Sentence Transformer model*.
In this example, we use **LaBSE (Language-agnostic BERT Sentence Embedding)** — a multilingual model suitable for both English and Thai.

In [3]:
from pymilvus.model.dense import SentenceTransformerEmbeddingFunction

# create an embedding function (LaBSE on CPU, with normalized output vectors)
embedding_func = SentenceTransformerEmbeddingFunction(
    model_name = "sentence-transformers/LaBSE",
    batch_size = 32,
    device = "cpu",
    normalize_embeddings = True,
)

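With `normalize_embeddings = True`, the embeddings are expected to have unit length, in which case the dot product of two embeddings equals their cosine similarity. The sketch below illustrates that property in plain Python. The 3-dimensional vectors and helper names are made up for illustration; real LaBSE embeddings have far more dimensions.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length (assumed effect of normalize_embeddings=True)."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity computed from un-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors, invented for this example.
a = [3.0, 4.0, 0.0]
b = [4.0, 3.0, 0.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# Both values agree: normalizing first turns cosine similarity into a dot product.
print(round(cosine(a, b), 6))
print(round(dot(a_unit, b_unit), 6))
```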


#### 🗂️ Define a Schema and Create a Collection with Indexing & Partitioning

#### 📄 Step 1: Define the Schema for Restaurant Data

We’ll define a schema that includes key information for each restaurant:
* `id`: Unique identifier (string), filled from the dataset's `place_id` column
* `title`: Restaurant name (string)
* `dense_vector`: Embedding of the restaurant text (the title combined with its type fields), created using the embedding function
* `lat`, `lng`: Geographic coordinates (float)
* `h3_r8`: H3 index at resolution 8, used as the partition key (string)

In [4]:
from pymilvus import (
    FieldSchema,
    DataType,
)


# define fields
fields = [
    FieldSchema(name="id", 
                dtype=DataType.VARCHAR, 
                is_primary=True, 
                auto_id=False, 
                max_length=128
            ),
    
    # store the original restaurant title so it can be returned with semantically similar matches
    FieldSchema(name="title", 
                dtype=DataType.VARCHAR, 
                max_length=512
            ),

    FieldSchema(name="lat", dtype=DataType.FLOAT),
    FieldSchema(name="lng", dtype=DataType.FLOAT),
    
    # store the embedded text as a dense vector
    FieldSchema(name="dense_vector", dtype=DataType.FLOAT_VECTOR, dim=embedding_func.dim),

    # store the H3 resolution-8 cell as the partition key
    FieldSchema(name="h3_r8", 
                dtype=DataType.VARCHAR, 
                max_length=32, 
                is_partition_key=True,
            ),
]
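The `max_length` values in the schema cap how long each string field may be. The sketch below checks a record against those caps before insertion. It assumes the limit is counted in UTF-8 bytes rather than characters, which is worth verifying against the Milvus documentation for your version. The `MAX_BYTES` table and `oversized_fields` helper are hypothetical names, not part of pymilvus.

```python
# Hypothetical limits mirroring the max_length values used in the schema.
MAX_BYTES = {"id": 128, "title": 512, "h3_r8": 32}


def oversized_fields(record, limits=MAX_BYTES):
    """Return the names of string fields whose UTF-8 encoding exceeds its limit."""
    return [
        name
        for name, limit in limits.items()
        if len(str(record.get(name, "")).encode("utf-8")) > limit
    ]


record = {
    "id": "ChIJ89mnRNGj4jARXFw7F8kdlu8",
    "title": "ไข่หวานบ้านซูชิ สาขาประชาอุทิศ",
    "h3_r8": "8864a4b223fffff",
}

# Thai characters take several bytes each in UTF-8, so the byte count
# is noticeably larger than the character count.
print(len(record["title"]), len(record["title"].encode("utf-8")))
print(oversized_fields(record))
```

If the byte-based assumption holds, a Thai title reaches the 512 limit with far fewer characters than an English one, so a check like this can catch rejected rows early.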


#### 📄 Step 2: Create the Schema

In [5]:
from pymilvus import CollectionSchema

schema = CollectionSchema(fields=fields, description="Schema for restaurant data")

#### 📄 Step 3: Create the Collection

In [6]:
from pymilvus import Collection, utility

collection_name = "restaurants"

# drop the existing collection, if any, so the notebook starts from a clean state
if utility.has_collection(collection_name):
    Collection(collection_name).drop()

# create a new collection
collection = Collection(collection_name, schema)


#### 📄 Step 4: Create the Index

In [7]:
# to make vector search efficient, create an index on the vector field
dense_index = {
    "index_type": "AUTOINDEX",  # alternatives include IVF_FLAT, HNSW, etc.
    "metric_type": "COSINE"
}

collection.create_index(field_name="dense_vector", index_params=dense_index)
# load the collection into memory so it can be searched
collection.load()
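An index speeds up a nearest-neighbour lookup that would otherwise compare the query with every stored vector. For reference, the sketch below performs that exhaustive comparison in plain Python over a small in-memory dictionary. The vectors, labels, and `top_k` helper are invented for illustration and are not part of Milvus.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query, vectors, k=2):
    """Exhaustive search: score every stored vector and keep the k most similar."""
    scored = ((cosine(query, vec), name) for name, vec in vectors.items())
    return heapq.nlargest(k, scored)


# Toy 3-dimensional "embeddings", invented for this example.
toy_vectors = {
    "sushi bar": [0.9, 0.1, 0.0],
    "steak house": [0.1, 0.9, 0.1],
    "omakase": [0.8, 0.2, 0.1],
}

for score, name in top_k([1.0, 0.0, 0.0], toy_vectors):
    print(f"{name}: {score:.3f}")
```

The cost of this approach grows linearly with the number of stored vectors, which is the work an index is meant to avoid.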

#### 🍽️ Step 5: Prepare, Embed, and Ingest Restaurant Data

In [8]:
# download the sample restaurant data
!wget https://go.wiphoo.dev/gYNwXN -O './sample_restaurants.csv'


In [9]:
import pandas as pd

# read sample restaurant data
df = pd.read_csv("./sample_restaurants.csv")

In [10]:
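# combine multiple text fields into a single string per row, then embed it
# (fillna/astype guard against missing or non-string values before joining)
text_columns = df[["title", "types", "type_ids"]].fillna("").astype(str)
df["combined_text"] = text_columns.agg(" ".join, axis=1)
embedded_text = embedding_func.encode_documents(df["combined_text"].tolist())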
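Joining text fields works only when every value is a string, which real datasets do not always guarantee. The following plain-Python sketch shows one defensive way to build the combined text. The sample rows, their `types` and `type_ids` values, and the `combine_fields` helper are all invented for illustration.

```python
def combine_fields(row, fields=("title", "types", "type_ids")):
    """Join the non-empty values of the given fields with single spaces."""
    parts = []
    for field in fields:
        value = row.get(field)
        # Skip missing values; NaN is the only value that is not equal to itself.
        if value is None or value != value:
            continue
        text = str(value).strip()
        if text:
            parts.append(text)
    return " ".join(parts)


# Hypothetical rows: the second one has missing and empty fields.
rows = [
    {"title": "Sushi Sora", "types": "Sushi restaurant", "type_ids": "sushi_restaurant"},
    {"title": "Sasa Restaurant", "types": None, "type_ids": ""},
]

print([combine_fields(row) for row in rows])
```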

In [11]:
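# column-oriented data: one list per field, in the same order as the schema
# (id, title, lat, lng, dense_vector, h3_r8)
entities = [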
    df["place_id"].tolist(),
    df["title"].tolist(),
    df["latitude"].tolist(),
    df["longitude"].tolist(),
    embedded_text,
    df["h3_r8"].tolist(),
]

collection.insert(entities)
collection.flush()

print(f"Inserted {len(df)} records with embeddings.")

Inserted 268 records with embeddings.
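The insert above passes one list per field, so the lists have to line up with the schema's field order and have equal lengths. The sketch below converts row dictionaries into that column layout and fails early when a field is missing. `FIELD_ORDER` and `rows_to_columns` are hypothetical helpers, and the ids and two-element vectors are placeholders rather than real data.

```python
# Field order assumed to match the schema defined earlier in the notebook.
FIELD_ORDER = ["id", "title", "lat", "lng", "dense_vector", "h3_r8"]


def rows_to_columns(rows, field_order=FIELD_ORDER):
    """Turn a list of row dicts into one list per field, in schema order."""
    missing = [
        (index, field)
        for index, row in enumerate(rows)
        for field in field_order
        if field not in row
    ]
    if missing:
        raise ValueError(f"missing (row, field) pairs: {missing}")
    return [[row[field] for row in rows] for field in field_order]


# Placeholder rows: ids and vectors are shortened stand-ins.
rows = [
    {"id": "a", "title": "Sushi Sora", "lat": 13.726238, "lng": 100.543182,
     "dense_vector": [0.1, 0.2], "h3_r8": "8864a4b14dfffff"},
    {"id": "b", "title": "Sasa Restaurant", "lat": 13.741302, "lng": 100.527039,
     "dense_vector": [0.3, 0.4], "h3_r8": "8864a4b10dfffff"},
]

columns = rows_to_columns(rows)
print(len(columns), [len(column) for column in columns])
```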


#### ⚙️ Step 6: Create Helper Functions

In [12]:
def milvus_result_to_dataframe(results):
    """
    Convert Milvus search results to a pandas DataFrame.

    Parameters:
        results (list): Milvus search results in the format:
            [
                [  # query 1 result
                    Hit(id=..., distance=..., entity=...), 
                    ...
                ],
                ...
            ]

    Returns:
        pd.DataFrame: Flattened DataFrame with distance and entity fields.
    """
    flat_results = []

    for query_results in results:
        for match in query_results:
            entity = match.get("entity", {})
            flat_result = {
                "id": match.get("id"),
                "distance": match.get("distance"),
                **{k: v for k, v in entity.items()}
            }
            flat_results.append(flat_result)

    return pd.DataFrame(flat_results)

In [13]:
def search(query):
    """
    Perform a semantic search on the collection using a given text query.

    Args:
        query (str): The text query to search for.

    Returns:
        The Milvus search results: one list of hits per query, where each hit
        carries its 'distance' score and the requested output fields
        ('id', 'title', 'lat', 'lng', and 'h3_r8').
    """
    # convert the query to an embedding vector using the provided embedding function
    query_vector = embedding_func.encode_queries(queries=[query])

    # execute the search on the vector database
    results = collection.search(
        data=query_vector,
        anns_field="dense_vector",
        param={
            "metric_type": "COSINE",
            "params": {"nprobe": 15}
        },
        output_fields=["id", "title", "lat", "lng", "h3_r8"],
        limit=10,
    )

    return results
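The `search` helper above queries the whole collection. Because `h3_r8` is the partition key, a search could plausibly be narrowed to a few H3 cells with a boolean filter expression. The sketch below only builds such an expression string; `h3_filter` is a hypothetical helper. Passing its result to `collection.search` through the `expr` argument is an assumption to verify against the pymilvus documentation for your version.

```python
import json


def h3_filter(cells):
    """Build a Milvus-style boolean expression matching any of the given H3 cells."""
    if not cells:
        raise ValueError("at least one H3 cell is required")
    # json.dumps produces a double-quoted, escaped list literal.
    return f"h3_r8 in {json.dumps(sorted(set(cells)))}"


# Cell ids taken from the sample data; duplicates are removed and the list is sorted.
print(h3_filter(["8864a4b327fffff", "8864a4b14dfffff", "8864a4b327fffff"]))
```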

#### 🔍 Step 7: Test Queries 

In [14]:
# Thai query: a misspelling of "ซูชิ" (sushi)
query = 'ซูซิ'
results = search(query)
milvus_result_to_dataframe(results)

| | id | distance | lat | lng | h3_r8 | title |
|---|---|---|---|---|---|---|
| 0 | ChIJG5cMlYmf4jARjI14Tvhzj5I | 0.504464 | 13.726238 | 100.543182 | 8864a4b14dfffff | Sushi Sora |
| 1 | ChIJC_IG4WGZ4jARJil71QyJnJQ | 0.445954 | 13.740359 | 100.525108 | 8864a4b10dfffff | Min Sushi by Sushi Cottage ずしコテージ |
| 2 | ChIJ89mnRNGj4jARXFw7F8kdlu8 | 0.428978 | 13.660226 | 100.501335 | 8864a4b223fffff | ไข่หวานบ้านซูชิ สาขาประชาอุทิศ |
| 3 | ChIJFepMlimf4jARW2MqZCN7GMQ | 0.424719 | 13.721445 | 100.54673 | 8864a4b327fffff | sushimai ซูชิมั้ย ศรีบำเพ็ญ |
| 4 | ChIJ-eRZ7dif4jARwl4RGXfAbuI | 0.353874 | 13.722355 | 100.546768 | 8864a4b327fffff | OJI Omakase at Sathorn |
| 5 | ChIJT-MmY-mj4jARBxgvjap0hf0 | 0.339274 | 13.65134 | 100.488991 | 8864a4b231fffff | Suki Teenoi Susco Phuttha Bucha |
| 6 | ChIJQRMf7wOZ4jARQz7dlVrlE48 | 0.328779 | 13.721587 | 100.516533 | 8864a4b15dfffff | Cozii Steak and Restaurant โคซี่ สเต๊ก |
| 7 | ChIJcdXwlUif4jAR2xee0t6QEs0 | 0.317556 | 13.74421 | 100.53511 | 8864a4b16bfffff | Sindosegi Thailand (ซินโดเซกิ) |
| 8 | ChIJfR02toKZ4jARBdP-FsD5LHw | 0.311415 | 13.744181 | 100.533394 | 8864a4b16bfffff | Yuzu Curry Siam Square Soi.9 |
| 9 | ChIJRffK4yuf4jARIcEK2GMqhEc | 0.303736 | 13.729627 | 100.535095 | 8864a4b141fffff | Xin Tian Di (ซิน เทียน ตี้) |


In [15]:
# Thai query: another misspelling of "ซูชิ" (sushi)
query = 'ซูสิ'
results = search(query)
milvus_result_to_dataframe(results)

| | id | distance | lng | h3_r8 | title | lat |
|---|---|---|---|---|---|---|
| 0 | ChIJG5cMlYmf4jARjI14Tvhzj5I | 0.49179 | 100.543182 | 8864a4b14dfffff | Sushi Sora | 13.726238 |
| 1 | ChIJC_IG4WGZ4jARJil71QyJnJQ | 0.437201 | 100.525108 | 8864a4b10dfffff | Min Sushi by Sushi Cottage ずしコテージ | 13.740359 |
| 2 | ChIJ89mnRNGj4jARXFw7F8kdlu8 | 0.41677 | 100.501335 | 8864a4b223fffff | ไข่หวานบ้านซูชิ สาขาประชาอุทิศ | 13.660226 |
| 3 | ChIJFepMlimf4jARW2MqZCN7GMQ | 0.401952 | 100.54673 | 8864a4b327fffff | sushimai ซูชิมั้ย ศรีบำเพ็ญ | 13.721445 |
| 4 | ChIJ-eRZ7dif4jARwl4RGXfAbuI | 0.346747 | 100.546768 | 8864a4b327fffff | OJI Omakase at Sathorn | 13.722355 |
| 5 | ChIJT-MmY-mj4jARBxgvjap0hf0 | 0.322483 | 100.488991 | 8864a4b231fffff | Suki Teenoi Susco Phuttha Bucha | 13.65134 |
| 6 | ChIJQRMf7wOZ4jARQz7dlVrlE48 | 0.313915 | 100.516533 | 8864a4b15dfffff | Cozii Steak and Restaurant โคซี่ สเต๊ก | 13.721587 |
| 7 | ChIJcdXwlUif4jAR2xee0t6QEs0 | 0.296838 | 100.53511 | 8864a4b16bfffff | Sindosegi Thailand (ซินโดเซกิ) | 13.74421 |
| 8 | ChIJJVKaiCyZ4jARNtm2xPOh4zA | 0.29353 | 100.527039 | 8864a4b10dfffff | Sasa Restaurant | 13.741302 |
| 9 | ChIJfR02toKZ4jARBdP-FsD5LHw | 0.292948 | 100.533394 | 8864a4b16bfffff | Yuzu Curry Siam Square Soi.9 | 13.744181 |


In [16]:
# English query: a truncated form of "sushi"
query = 'sush'
results = search(query)
milvus_result_to_dataframe(results)

| | id | distance | title | lat | lng | h3_r8 |
|---|---|---|---|---|---|---|
| 0 | ChIJG5cMlYmf4jARjI14Tvhzj5I | 0.369419 | Sushi Sora | 13.726238 | 100.543182 | 8864a4b14dfffff |
| 1 | ChIJ89mnRNGj4jARXFw7F8kdlu8 | 0.341008 | ไข่หวานบ้านซูชิ สาขาประชาอุทิศ | 13.660226 | 100.501335 | 8864a4b223fffff |
| 2 | ChIJFepMlimf4jARW2MqZCN7GMQ | 0.33789 | sushimai ซูชิมั้ย ศรีบำเพ็ญ | 13.721445 | 100.54673 | 8864a4b327fffff |
| 3 | ChIJZ2mDRy6j4jAR3lgsc2hfQUw | 0.310574 | สเต็กปากมันส์ | 13.642928 | 100.49353 | 8864a4b239fffff |
| 4 | ChIJC_IG4WGZ4jARJil71QyJnJQ | 0.307294 | Min Sushi by Sushi Cottage ずしコテージ | 13.740359 | 100.525108 | 8864a4b10dfffff |
| 5 | ChIJd-0f4IaZ4jARzU5kkvVX-io | 0.289997 | หมูจิ้มเปรี้ยว (โรงอาหารเซนต์โยฯ) | 13.725265 | 100.530655 | 8864a4b141fffff |
| 6 | ChIJT-MmY-mj4jARBxgvjap0hf0 | 0.280936 | Suki Teenoi Susco Phuttha Bucha | 13.65134 | 100.488991 | 8864a4b231fffff |
| 7 | ChIJm5ZRHxqj4jARVgsfB1L8ziY | 0.280684 | ครัวกันเอง | 13.651203 | 100.484299 | 8864a4b233fffff |
| 8 | ChIJ2zWuDkWj4jARW4erWTVj2mk | 0.280051 | After You Dessert Cafe - Susco Phutthabucha | 13.65135 | 100.488884 | 8864a4b231fffff |
| 9 | ChIJJVKaiCyZ4jARNtm2xPOh4zA | 0.268876 | Sasa Restaurant | 13.741302 | 100.527039 | 8864a4b10dfffff |
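With the `COSINE` metric, the `distance` column behaves as a similarity score: higher values mean closer matches, which is why the rows appear in descending order. In all three result sets the lower-ranked rows drift away from sushi restaurants, so a minimum-score cutoff may help. The sketch below applies one to the top six rows of the first query (`ซูซิ`). The `0.4` threshold and the `keep_confident` helper are illustrative assumptions, not tuned values.

```python
# (title, score) pairs copied from the top six rows of the first query's output.
hits = [
    ("Sushi Sora", 0.504464),
    ("Min Sushi by Sushi Cottage ずしコテージ", 0.445954),
    ("ไข่หวานบ้านซูชิ สาขาประชาอุทิศ", 0.428978),
    ("sushimai ซูชิมั้ย ศรีบำเพ็ญ", 0.424719),
    ("OJI Omakase at Sathorn", 0.353874),
    ("Suki Teenoi Susco Phuttha Bucha", 0.339274),
]


def keep_confident(hits, min_score=0.4):
    """Keep only hits whose similarity score reaches the (assumed) threshold."""
    return [(title, score) for title, score in hits if score >= min_score]


for title, score in keep_confident(hits):
    print(f"{score:.3f}  {title}")
```

A suitable threshold depends on the embedding model and the data, so it is best chosen by inspecting results for a range of queries.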


In [17]:
# disconnect Milvus connection
connections.disconnect('default')