## Using InterSystems IRIS Vector Search

### InterSystems IRIS Vector Search: An Overview
InterSystems IRIS Vector Search brings powerful AI and machine learning capabilities directly into your SQL workflows by enabling the storage and querying of high-dimensional vector embeddings within a relational database. Vector search works by comparing embedding vectors—numerical representations of unstructured data like text—to determine semantic similarity, making it ideal for tasks like intelligent search and information retrieval. With InterSystems IRIS, you can store these embeddings using the optimized VECTOR and EMBEDDING data types. The EMBEDDING type streamlines the process by converting text into vectors directly through SQL, without requiring direct interaction with an embedding model. By integrating these capabilities into standard SQL operations, IRIS transforms your relational database into a high-performance hybrid vector database—ready to support next-generation AI applications.
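
To make "comparing embedding vectors to determine semantic similarity" concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and their labels are made up for illustration; real embedding models produce hundreds of dimensions, and IRIS performs the comparison inside SQL rather than in Python.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; the values are invented for this example.
headache = [0.9, 0.1, 0.0]
migraine = [0.8, 0.2, 0.1]
commute = [0.0, 0.2, 0.9]

# Related concepts point in similar directions and score higher.
print(cosine_similarity(headache, migraine))
print(cosine_similarity(headache, commute))
```

The first score should come out much closer to 1 than the second, which is the property a vector search relies on when it ranks rows.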

Watch the video below to get an overview of how vector search can power generative AI applications in InterSystems IRIS.

<iframe width="560" height="315" src="https://www.youtube.com/embed/-4SAkjqCpCI?si=_5x94XRFQvnok_U8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

### Running a Simple Vector Search
The repository for this workshop includes a set of medical data to experiment with. The data set contains roughly 1,500 patient encounters, each with structured and coded medical data. Each encounter also has a generated clinical summary note that provides more context about the patient, such as their commuting situation, their mood during the encounter, or other information that does not fit easily into a structured encounter record.

Run the block of code below to initiate a connection to InterSystems IRIS and view a snippet of this data set.

In [None]:
import os, pandas as pd
from sentence_transformers import SentenceTransformer
from sqlalchemy import create_engine, text

from dotenv import load_dotenv
load_dotenv(override=True)

username = 'SuperUser'
password = 'SYS'
hostname = 'localhost'
port = 1972
namespace = 'IRISAPP'
CONNECTION_STRING = f"iris://{username}:{password}@{hostname}:{port}/{namespace}"
engine = create_engine(CONNECTION_STRING)

df = pd.read_sql("SELECT * FROM GenAI.encounters", engine)
df.head()

Notice that in addition to structured data—such as codes, costs, and standardized descriptions of the encounters—there are also columns with unstructured observations and notes, and accompanying vector embeddings. These vector embeddings will help a generative AI application retrieve relevant chunks of data from this set of patient encounters.
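
A side note on the connection cell above: it calls `load_dotenv` but then hard-codes the credentials. If you would rather read them from your `.env` file, something like the sketch below might work. The environment variable names (`IRIS_USERNAME`, `IRIS_PASSWORD`, and so on) are hypothetical and not defined by this workshop; the defaults mirror the values used in the cell.

```python
import os
from urllib.parse import quote_plus

def build_connection_string(env=os.environ):
    """Build the iris:// URL from environment variables, falling back to the workshop defaults."""
    # Variable names below are assumptions made for this example.
    username = env.get("IRIS_USERNAME", "SuperUser")
    password = env.get("IRIS_PASSWORD", "SYS")
    hostname = env.get("IRIS_HOSTNAME", "localhost")
    port = int(env.get("IRIS_PORT", "1972"))
    namespace = env.get("IRIS_NAMESPACE", "IRISAPP")
    # quote_plus escapes characters such as '@' or ':' that would otherwise break the URL.
    return f"iris://{quote_plus(username)}:{quote_plus(password)}@{hostname}:{port}/{namespace}"

print(build_connection_string({}))  # no variables set, so the defaults are used
```

The returned string can be passed to `create_engine` in place of `CONNECTION_STRING`.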

Let's try running a vector search. First, run the following line of code to select the sentence transformer model that will be used to create an embedding from your search term. The embedding model you use to embed your search queries should be compatible with the model used to create embeddings in your data set.

In [None]:
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
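
One quick, necessary (but not sufficient) compatibility check is that the query embedding has the same number of dimensions as the stored vectors. The sketch below assumes the stored vectors are 384-dimensional, which is the output size of `all-MiniLM-L6-v2`; the helper name `check_query_dimension` is hypothetical. Matching dimensions do not guarantee that two models are interchangeable, so ideally use the same model for both.

```python
# Assumption: the stored vectors were produced by a 384-dimensional model
# such as all-MiniLM-L6-v2. Adjust if your data set differs.
EXPECTED_DIM = 384

def check_query_dimension(query_vector, expected_dim=EXPECTED_DIM):
    """Raise ValueError if the query embedding cannot be compared with the stored vectors."""
    actual = len(query_vector)
    if actual != expected_dim:
        raise ValueError(
            f"query embedding has {actual} dimensions, "
            f"but the stored vectors have {expected_dim}"
        )
    return actual

# Stand-in vector for illustration; in the notebook you would pass the
# list returned by model.encode(...).tolist().
check_query_dimension([0.0] * 384)
```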

Run the next cell, optionally replacing "Tylenol usage" with a search term of your choice. This cell creates and prints an embedding for the search term you entered.

In [None]:
note_search = "Tylenol usage"
search_vector = model.encode(note_search, normalize_embeddings=True).tolist() # Convert search phrase into a vector
print(search_vector)

Now let's run a vector search against the CLINICAL_NOTES field using your search term. The code below retrieves the 10 rows whose CLINICAL_NOTES embeddings are most similar to the search term you provided, ranked by dot product. The results are displayed in a Pandas DataFrame for easy viewing.

In [None]:
from sqlalchemy import text

vector_str = ",".join(str(x) for x in search_vector)
print(vector_str)

with engine.connect() as conn:
    with conn.begin():
        sql = text("""
            SELECT TOP 10 ENCOUNTER_ID, CLINICAL_NOTES
            FROM GenAI.encounters
            ORDER BY VECTOR_DOT_PRODUCT(CLINICAL_NOTES_Vector, TO_VECTOR(:search_vector)) DESC
        """)
        results = conn.execute(sql, {"search_vector": vector_str}).fetchall()

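# Display results with explicit column names
df = pd.DataFrame(results, columns=["ENCOUNTER_ID", "CLINICAL_NOTES"])
pd.set_option("display.max_colwidth", None)  # show the full note text instead of truncating it
df.head(10)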
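
Conceptually, the query above scores every row by its dot product with the search vector and keeps the highest scores. The plain-Python sketch below mimics that ranking on made-up two-dimensional vectors. It assumes the stored vectors are unit length, like the query encoded with `normalize_embeddings=True`; under that assumption the dot product equals cosine similarity. The function names and encounter IDs are hypothetical, and the database may evaluate the query quite differently internally.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query, rows, k):
    """rows is a list of (row_id, vector); return the k best (row_id, score) pairs."""
    scored = [(row_id, dot(query, vec)) for row_id, vec in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Invented data standing in for (ENCOUNTER_ID, CLINICAL_NOTES_Vector) pairs.
rows = [
    ("enc-1", normalize([1.0, 0.0])),
    ("enc-2", normalize([1.0, 1.0])),
    ("enc-3", normalize([0.0, 1.0])),
]
query = normalize([2.0, 0.5])

ranked = top_k(query, rows, k=2)
print(ranked)
```

Here `enc-1` ranks first because it points in nearly the same direction as the query, and `enc-3` is dropped by the cutoff.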

### Searching Across Multiple Vectorized Fields
You may want to search across more than just the CLINICAL_NOTES field. The block below calculates the similarity between your search term and all five vectorized fields in the data set, then orders the rows by each row's single highest similarity score (the GREATEST of the five).

In the result set that follows, explore the similarity scores provided. Sometimes one field provides a particularly good match, while others do not.

Enter whatever search term you would like in the note_search variable. Feel free to play around with multiple searches.

In [None]:
note_search = "Pregnancy complications"
search_vector = model.encode(note_search, normalize_embeddings=True).tolist() # Convert search phrase into a vector
print(search_vector)

vector_str = ",".join(str(x) for x in search_vector) 

with engine.connect() as conn:
    with conn.begin():
        sql = text("""
            SELECT TOP 5
                ENCOUNTER_ID,
                CLINICAL_NOTES,
                DESCRIPTION_OBSERVATIONS,
                DESCRIPTION_CONDITIONS,
                DESCRIPTION_PROCEDURES,
                DESCRIPTION_MEDICATIONS,
                VECTOR_DOT_PRODUCT(CLINICAL_NOTES_Vector, TO_VECTOR(:search_vector))
                    AS sim_notes,
                VECTOR_DOT_PRODUCT(DESCRIPTION_OBSERVATIONS_Vector, TO_VECTOR(:search_vector))
                    AS sim_obs,
                VECTOR_DOT_PRODUCT(DESCRIPTION_CONDITIONS_Vector,   TO_VECTOR(:search_vector))
                    AS sim_cond,
                VECTOR_DOT_PRODUCT(DESCRIPTION_PROCEDURES_Vector,   TO_VECTOR(:search_vector))
                    AS sim_proc,
                VECTOR_DOT_PRODUCT(DESCRIPTION_MEDICATIONS_Vector,  TO_VECTOR(:search_vector))
                    AS sim_med
            FROM GenAI.encounters
            ORDER BY GREATEST(
                VECTOR_DOT_PRODUCT(CLINICAL_NOTES_Vector,           TO_VECTOR(:search_vector)),
                VECTOR_DOT_PRODUCT(DESCRIPTION_OBSERVATIONS_Vector, TO_VECTOR(:search_vector)),
                VECTOR_DOT_PRODUCT(DESCRIPTION_CONDITIONS_Vector,   TO_VECTOR(:search_vector)),
                VECTOR_DOT_PRODUCT(DESCRIPTION_PROCEDURES_Vector,   TO_VECTOR(:search_vector)),
                VECTOR_DOT_PRODUCT(DESCRIPTION_MEDICATIONS_Vector,  TO_VECTOR(:search_vector))
            ) DESC
        """)
        results = conn.execute(sql, {"search_vector": vector_str}).fetchall()
df = pd.DataFrame(results, columns=[
    "ENCOUNTER_ID",
    "CLINICAL_NOTES", "DESCRIPTION_OBSERVATIONS", "DESCRIPTION_CONDITIONS",
    "DESCRIPTION_PROCEDURES", "DESCRIPTION_MEDICATIONS",
    "sim_notes",
    "sim_obs",
    "sim_cond",
    "sim_proc",
    "sim_med"
])
df["DESCRIPTION_OBSERVATIONS"] = df["DESCRIPTION_OBSERVATIONS"].str[:250]
df.head(5)
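
To see which field produced each row's best match, a small helper like the one below can pick the highest of the five similarity columns. This is a sketch: `best_match` is a hypothetical name, the example row is invented, and the helper skips missing scores, which may differ from how the database treats NULL values inside GREATEST.

```python
SIM_COLUMNS = ["sim_notes", "sim_obs", "sim_cond", "sim_proc", "sim_med"]

def best_match(row):
    """Return (column_name, score) for the highest similarity in one result row."""
    scores = {col: row[col] for col in SIM_COLUMNS if row.get(col) is not None}
    if not scores:
        return (None, None)
    best = max(scores, key=scores.get)
    return best, scores[best]

# Made-up row for illustration; real rows come from the query above.
example = {
    "ENCOUNTER_ID": "demo",
    "sim_notes": 0.31,
    "sim_obs": 0.12,
    "sim_cond": 0.58,
    "sim_proc": 0.22,
    "sim_med": 0.05,
}
print(best_match(example))
```

In the notebook, you could apply it to the results with `[best_match(r) for r in df.to_dict("records")]`.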


### Summary
Feel free to continue playing around with simple vector searches. When you are finished with this notebook, return to the workshop exercise document and proceed to task 2.3.