## Using InterSystems IRIS Vector Search

### InterSystems IRIS Vector Search: An Overview
InterSystems IRIS Vector Search brings AI and machine learning capabilities directly into your SQL workflows by letting you store and query high-dimensional vector embeddings inside a relational database. Vector search compares embedding vectors (numerical representations of unstructured data such as text) to measure semantic similarity, which makes it well suited to tasks like intelligent search and information retrieval. In InterSystems IRIS, you can store these embeddings using the VECTOR and EMBEDDING data types. The EMBEDDING type streamlines the process by converting text into vectors directly through SQL, so you do not have to call an embedding model yourself. Because these capabilities are part of standard SQL operations, IRIS can serve as a hybrid relational and vector database, ready to support next-generation AI applications.
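
To make "comparing embedding vectors" concrete, the sketch below computes two common similarity measures, the dot product and cosine similarity, on toy three-dimensional vectors in plain Python. IRIS SQL offers VECTOR_DOT_PRODUCT and VECTOR_COSINE for the same purpose. The function names and vectors here are illustrative assumptions. Real sentence embeddings have hundreds of dimensions.

```python
import math

def dot_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products; a larger value means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths (range -1 to 1)."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

# Hypothetical toy "embeddings": the first two point in similar directions
headache = [0.9, 0.1, 0.0]
migraine = [0.8, 0.2, 0.1]
sprained_ankle = [0.0, 0.2, 0.9]

print(round(cosine_similarity(headache, migraine), 3))
print(round(cosine_similarity(headache, sprained_ankle), 3))
```

The first score is close to 1 and the second is close to 0. This is how a search engine decides that "migraine" is semantically nearer to "headache" than "sprained ankle" is.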

Watch the video below to get an overview of how vector search can power generative AI applications in InterSystems IRIS.

<iframe width="560" height="315" src="https://www.youtube.com/embed/-4SAkjqCpCI?si=_5x94XRFQvnok_U8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

### Running a Simple Vector Search
The repository for this workshop includes a set of medical data that you will use for experimentation. It contains roughly 1,500 patient encounters, each with structured, coded medical data. Each encounter also has a generated clinical summary note that provides more context about the patient. The note might cover their commuting situation, their mood during the encounter, or other information that does not fit neatly into a structured encounter record.

Run the cell below to connect to InterSystems IRIS and preview this data set.

```python
import os, pandas as pd
from sentence_transformers import SentenceTransformer
from sqlalchemy import create_engine, text

from dotenv import load_dotenv
load_dotenv(override=True)  # Load any settings defined in a local .env file

# Connection details for the workshop's local IRIS instance
username = 'SuperUser'
password = 'SYS'
hostname = 'localhost'
port = 1972
namespace = 'IRISAPP'
# The iris:// URL scheme is provided by the sqlalchemy-iris dialect
CONNECTION_STRING = f"iris://{username}:{password}@{hostname}:{port}/{namespace}"
engine = create_engine(CONNECTION_STRING)

# Load the encounters table and preview the first five rows
df = pd.read_sql("SELECT * FROM GenAI.encounters_vectorized", engine)
df.head()
```

| Unnamed: 0 | ENCOUNTER_ID | START | STOP | PATIENT_ID | ENCOUNTERCLASS | CODE | DESCRIPTION | BASE_ENCOUNTER_COST | TOTAL_CLAIM_COST | PAYER_COVERAGE | REASONCODE | REASONDESCRIPTION | DESCRIPTION_OBSERVATIONS | DESCRIPTION_CONDITIONS | DESCRIPTION_MEDICATIONS | DESCRIPTION_PROCEDURES | DESCRIPTION_OBSERVATIONS_Vector | DESCRIPTION_PROCEDURES_Vector | DESCRIPTION_MEDICATIONS_Vector | DESCRIPTION_CONDITIONS_Vector |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2014-05-23 | 2014-05-23 | 0 | wellness | 410620009 | Well child visit (procedure) | 136.8 | 704.2 | 0.0 |  |  |  | CONDITIONS: Medication review due (situation) |  |  |  |  | -.055821496993303298951,.036406360566616058349... | -.055821496993303298951,.036406360566616058349... |
| 1 | 1 | 2015-05-29 | 2015-05-29 | 0 | wellness | 410620009 | Well child visit (procedure) | 136.8 | 953.11 | 0.0 |  |  | OBSERVATIONS: Body Height 130.3 cm, OBSERVATIO... | CONDITIONS: Gingivitis (disorder) |  | PROCEDURES: Medication reconciliation (procedu... | .083452835679054260253,.0047196964733302593231... | -.063727788627147674561,.046419896185398101806... | -.038138713687658309936,-.03412200883030891418... | -.038138713687658309936,-.03412200883030891418... |
| 2 | 2 | 2015-06-05 | 2015-06-06 | 0 | ambulatory | 185349003 | Encounter for check up (procedure) | 85.55 | 3105.35 | 0.0 | 66383009.0 | Gingivitis (disorder) |  |  | MEDICATIONS: sodium fluoride 0.0272 MG/MG Oral... | PROCEDURES: Dental consultation and report (pr... |  | -.029604965820908546447,.039720412343740463256... | .026072388514876365661,-.028673971071839332581... | .026072388514876365661,-.028673971071839332581... |
| 3 | 3 | 2016-06-03 | 2016-06-03 | 0 | wellness | 410620009 | Well child visit (procedure) | 136.8 | 1152.67 | 0.0 |  |  | OBSERVATIONS: Body Height 135.2 cm, OBSERVATIO... | CONDITIONS: Medication review due (situation) |  | PROCEDURES: Medication reconciliation (procedu... | .085358038544654846191,.0010697043035179376602... | -.063727788627147674561,.046419896185398101806... | -.055821496993303298951,.036406360566616058349... | -.055821496993303298951,.036406360566616058349... |
| 4 | 4 | 2016-06-10 | 2016-06-11 | 0 | ambulatory | 185349003 | Encounter for check up (procedure) | 85.55 | 3105.35 | 0.0 | 103697008.0 | Patient referral for dental care (procedure) |  |  | MEDICATIONS: sodium fluoride 0.0272 MG/MG Oral... | PROCEDURES: Dental consultation and report (pr... |  | -.029604965820908546447,.039720412343740463256... | .026072388514876365661,-.028673971071839332581... | .026072388514876365661,-.028673971071839332581... |


Notice that the table contains more than structured data such as codes, costs, and standardized descriptions of each encounter. It also has unstructured text columns (DESCRIPTION_OBSERVATIONS, DESCRIPTION_CONDITIONS, DESCRIPTION_MEDICATIONS, and DESCRIPTION_PROCEDURES), and each one has an accompanying vector embedding column ending in _Vector. These vector embeddings help a generative AI application retrieve relevant chunks of data from this set of patient encounters.
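
In the preview, each _Vector value appears as a comma-separated list of floats, and some rows have no vector for a given column. The sketch below assumes the values come back from pd.read_sql as strings in that form. This depends on the driver and may differ in your environment. Under that assumption, a helper like the hypothetical parse_vector turns a value into a list of floats, and the hypothetical format_vector produces the same comma-separated form that the search cell later passes to TO_VECTOR.

```python
import math
from typing import Optional

def parse_vector(value) -> Optional[list[float]]:
    """Convert a stored vector string such as "-.0558,.0364" into floats.

    Returns None for missing values (None or NaN, which pandas uses for empty
    cells). Surrounding square brackets, if present, are ignored.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    cleaned = str(value).strip().strip("[]")
    return [float(part) for part in cleaned.split(",") if part.strip()]

def format_vector(vector: list[float]) -> str:
    """Serialize floats as a comma-separated string, like the search cell does."""
    return ",".join(str(x) for x in vector)

vec = parse_vector("-.055821496993303298951,.036406360566616058349")
print(vec)
print(format_vector(vec))
```

The preview truncates values with "...", so apply the parser to the full strings in the DataFrame rather than to the displayed text.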

Let's try running a vector search. First, run the following cell to load the sentence transformer model that will turn your search term into an embedding. Embed your search queries with the same model that created the embeddings in your data set. Vectors from a different model may have a different dimension or live in a different embedding space, which makes the similarity scores meaningless.
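
A quick sanity check for model compatibility is to compare the length of a query embedding with the length of a stored one before searching. all-MiniLM-L6-v2 produces 384-dimensional vectors. In the notebook, model.get_sentence_embedding_dimension() reports this size. Matching dimensions are necessary but not sufficient, because two different models can share a dimension and still produce incompatible vectors. The check_dimensions helper below is a hypothetical sketch, not part of the workshop code.

```python
def check_dimensions(query_vector: list[float], stored_vector: list[float]) -> int:
    """Raise ValueError if the embeddings differ in length; return the shared dimension."""
    if len(query_vector) != len(stored_vector):
        raise ValueError(
            f"Query embedding has {len(query_vector)} dimensions, "
            f"but stored embeddings have {len(stored_vector)}; "
            "embed the query with the model used for the data set."
        )
    return len(query_vector)

# Toy example: a 384-dimensional query checked against a 384-dimensional stored vector
print(check_dimensions([0.0] * 384, [0.0] * 384))
```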

```python
# Load the embedding model that will encode the search query
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
```

Run the next cell, optionally replacing "Headache" with a search term of your choice. This cell converts the search term into an embedding and prints it.

```python
note_search = "Headache"
# Convert the search phrase into a unit-length (L2-normalized) vector
search_vector = model.encode(note_search, normalize_embeddings=True).tolist()
print(search_vector)
```
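
Passing normalize_embeddings=True scales the embedding to unit length, meaning its Euclidean (L2) norm is 1. For unit vectors, the dot product equals cosine similarity. Ranking by VECTOR_DOT_PRODUCT therefore behaves like a cosine ranking, assuming the stored embeddings were normalized too. The data set does not state this, so treat it as an assumption. The hypothetical l2_normalize below is a plain-Python sketch of the normalization step.

```python
import math

def l2_normalize(vector: list[float]) -> list[float]:
    """Divide each component by the vector's Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]

a = l2_normalize([3.0, 4.0])  # length 5 before normalization
b = l2_normalize([4.0, 3.0])

# The dot product of two unit vectors equals their cosine similarity
dot = sum(x * y for x, y in zip(a, b))
print(a, b, dot)
```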

Now let's run a vector search using your search term. The code below scores each encounter by the dot product of its DESCRIPTION_OBSERVATIONS_Vector embedding with your search vector. It then returns the three highest-scoring encounters, which are the ones deemed most similar to your search term. The results are displayed in a pandas DataFrame for easy viewing.

```python
from sqlalchemy import text

# Serialize the embedding as a comma-separated string for TO_VECTOR
vector_str = ",".join(str(x) for x in search_vector)

sql = text("""
    SELECT TOP 3 *
    FROM GenAI.encounters_vectorized
    ORDER BY VECTOR_DOT_PRODUCT(DESCRIPTION_OBSERVATIONS_Vector, TO_VECTOR(:search_vector)) DESC
""")

with engine.connect() as conn:
    result = conn.execute(sql, {"search_vector": vector_str})
    # Pass the column names so the DataFrame has labeled columns
    df = pd.DataFrame(result.fetchall(), columns=list(result.keys()))

# Display results
df.head()
```
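
Conceptually, the query scores every row with VECTOR_DOT_PRODUCT, sorts by that score in descending order, and keeps the first three rows. The plain-Python sketch below mirrors that logic on a few made-up rows, so you can see the ranking without a database. The row IDs and vectors are hypothetical. The sketch skips rows without an embedding, which is a choice made here and not a statement about how IRIS orders NULL scores.

```python
import heapq
from typing import Optional

def top_k_by_dot_product(
    query: list[float],
    rows: dict[str, Optional[list[float]]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k (row_id, score) pairs with the highest dot product, best first."""
    scored = [
        (row_id, sum(q * v for q, v in zip(query, vector)))
        for row_id, vector in rows.items()
        if vector is not None  # skip rows that have no embedding
    ]
    # Equivalent in spirit to ORDER BY score DESC combined with TOP k
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

query = [0.6, 0.8]
encounters = {
    "enc-1": [0.6, 0.8],
    "enc-2": [0.8, 0.6],
    "enc-3": [1.0, 0.0],
    "enc-4": None,
    "enc-5": [0.0, 1.0],
}
for row_id, score in top_k_by_dot_product(query, encounters):
    print(row_id, round(score, 2))
```

heapq.nlargest avoids sorting every row when only k results are needed. Inside IRIS, the ORDER BY and TOP clauses do this work for you.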