## AI-SQL for Healthcare
## Transforming Healthcare Through AI-Powered Clinical Analytics
Snowflake's AI-SQL capabilities, listed below, simplify the use of AI on multimodal healthcare data for SQL-savvy users:
1) FILE DATA TYPE
2) AI_FILTER
3) AI_TRANSCRIBE
4) AI_CLASSIFY
5) AI_AGG

*This roadmap outlines the transformation of basic healthcare multimodal analytics into a comprehensive **Healthcare Intelligence Platform** that revolutionizes clinical decision-making, patient safety, and healthcare operations.*

## 📋 Overview
- **Advanced Medical Image Intelligence**: Transform basic image classification into comprehensive diagnostic support with treatment recommendations.
- **AI-Powered Diagnostic Analysis**: Beyond basic classification to detailed abnormality detection
- **Clinical Decision Support**: Evidence-based recommendations integrated with patient context

## Results:

- **40% improvement** in workflow efficiency due to automatic classification and processing of multimodal data
- Develop **intelligent clinical decision support systems**
- Actionable **health intervention insights** aggregated at population level across multiple medical transcripts
- **Risk Stratification**: Complication risk and readmission prediction

In [None]:
import streamlit as st
import pandas as pd
from snowflake.snowpark.context import get_active_session

session=get_active_session()

session.query_tag = {"origin":"sf_sit-is", "name":"AISQL-HOL", "version":{"major":1, "minor":0}, "attributes":{"is_quickstart":3, "source":"notebook"}}
print(session)

In [None]:
-- Set the correct schema context for accessing data
-- Data assets are in the default schema, not the notebooks schema
USE SCHEMA DEFAULT_SCHEMA;

## Leverage AISQL to process images for Healthcare

Part 1: Load medical images directly from a stage using the newly introduced FILE data type.

This is useful for multiple use cases, e.g. routing medical images to the appropriate data product pipelines, such as sending 'Retinal Scan' images into a Vision Health Data Product.

In [None]:
-- uploaded medical images(jpegs) into Snowflake stage 

create or replace  table medical_images as
select TO_FILE(file_url) img_file, * 
FROM directory(@MEDICAL_IMAGES);


select * from medical_images;



Leverage the AI_COMPLETE function to classify each image in the medical_images table. It passes a prompt to llama4-maverick to classify each image and return the result as a new column named image_classification. 

In [None]:
SELECT 
IMG_FILE,
AI_COMPLETE('llama4-maverick',
PROMPT('Classify the input image {0} in no more than 2 words', img_file)) AS image_classification
FROM medical_images;

Leverage the AI_FILTER function that takes a prompt and an input, and returns True or False. In this case, the prompt is 'This medical image {0} is a Chest X-ray', and the input is the img_file from each row. The query will only return the IMG_FILE for the images that the AI model determines are Chest X-rays.

In [None]:
SELECT 
IMG_FILE
FROM medical_images
WHERE AI_FILTER(PROMPT('This medical image {0} is a Chest X-ray', img_file)) ;
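
The same filter can be reused for other image types by generating the SQL from Python. The sketch below is only an illustration: the helper name `build_image_filter_query` is hypothetical, and it assumes the table exposes an `img_file` column as in the cell above. It builds the statement as a string and does not call Snowflake.

```python
def build_image_filter_query(table: str, description: str) -> str:
    """Return a SQL string that keeps only images matching `description`."""
    # Double any single quotes so the description stays inside the SQL literal.
    safe_description = description.replace("'", "''")
    # `{{0}}` renders as the literal `{0}` placeholder that PROMPT fills with img_file.
    return (
        f"SELECT img_file FROM {table} "
        f"WHERE AI_FILTER(PROMPT('This medical image {{0}} is a {safe_description}', img_file))"
    )


chest_xray_sql = build_image_filter_query("medical_images", "Chest X-ray")
print(chest_xray_sql)
```

In the notebook, the resulting string could be passed to `session.sql(chest_xray_sql).to_pandas()`, using the session created in the first cell.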

Leverage the AI_CLASSIFY function to automatically categorize the medical images stored in the medical_images table. For each image, it assigns one of 'X-ray', 'CT', 'MRI', 'Ultrasound', 'PET', or 'Other' and returns the result in a new column called classification.

In [None]:
SELECT 
    img_file,
    AI_CLASSIFY(PROMPT('please help me classify this medical scan image {0}', img_file),
        ['X-ray', 'CT', 'MRI', 'Ultrasound','PET','Other']) as classification
from medical_images;
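
The call-recording query later in this notebook reads the result with `:labels[0]`, which suggests that AI_CLASSIFY returns an object containing a `labels` array. Assuming that shape, a minimal sketch for pulling the first label out on the Python side could look like this. The helper name `first_label` and the sample payloads are hypothetical.

```python
import json


def first_label(classification_json: str, default: str = "Other") -> str:
    """Return the first label from an AI_CLASSIFY-style JSON string."""
    payload = json.loads(classification_json)
    # Treat a missing or empty "labels" array as "no label assigned".
    labels = payload.get("labels") or []
    return labels[0] if labels else default


print(first_label('{"labels": ["X-ray"]}'))
print(first_label('{"labels": []}'))
```

Falling back to a default keeps downstream code from failing on rows where no label comes back.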


## Working with audio files using AI_TRANSCRIBE

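The AI_TRANSCRIBE function transcribes audio files to text, with optional timestamps and speaker labels. It supports numerous languages, and a single audio file can contain more than one language. The cell below transcribes up to five call recordings from the CALL_RECORDINGS stage and uses AI_CLASSIFY to label each call's reason.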

In [None]:
query='''(SELECT  
        A.RELATIVE_PATH ,
        A.CALL_TRANSCRIPT,
        AI_CLASSIFY(CALL_TRANSCRIPT, ['claims related', 'coverage related', 'billing related','other']):labels[0]::text AI_CALL_REASON
FROM 
(SELECT RELATIVE_PATH, AI_TRANSCRIBE(audio_file):text::STRING CALL_TRANSCRIPT FROM
    (SELECT 
        relative_path RELATIVE_PATH,
        TO_FILE('@CALL_RECORDINGS', relative_path) AS audio_file 
    FROM directory(@CALL_RECORDINGS)
    -- WHERE RELATIVE_PATH LIKE  '%CALL_RECORDINGS%'
    LIMIT 5
    )
) A)'''
data=session.sql(query).to_pandas()

data.head()

In [None]:
import streamlit as st
import pandas as pd


st.title("Call Recordings Dashboard")

# Color mapping
color_map = {
    "claims related": "🔵",
    "coverage related": "🟢",
    "billing related": "🟡",
    "other": "⚪"
}

data["Reason"] = data["AI_CALL_REASON"].map(color_map) + " " + data["AI_CALL_REASON"]

st.dataframe(data[["RELATIVE_PATH", "Reason", "CALL_TRANSCRIPT"]])
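
If the model ever returns a reason that is not in the color mapping, a plain dictionary lookup has no icon to offer. The sketch below shows one way to guard against that with a fallback icon. The helper name `decorate_reason` and the fallback icon are hypothetical choices, not part of the original dashboard.

```python
from typing import Optional

# Same reasons as the AI_CLASSIFY label list used in the query above.
REASON_ICONS = {
    "claims related": "🔵",
    "coverage related": "🟢",
    "billing related": "🟡",
    "other": "⚪",
}


def decorate_reason(reason: Optional[str], fallback_icon: str = "❔") -> str:
    """Prefix a call reason with its icon, tolerating missing or unknown values."""
    if reason is None:
        return f"{fallback_icon} unknown"
    # Unknown reasons keep their text but get the fallback icon.
    return f"{REASON_ICONS.get(reason, fallback_icon)} {reason}"


print(decorate_reason("billing related"))
print(decorate_reason("appointment related"))
```

With pandas, this could be applied as `data["AI_CALL_REASON"].apply(decorate_reason)`.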


## Working with PDFs via AISQL Functions

Use AI_FILTER to:
 1) Filter for medical transcripts that are related to a specific ICD-10 value.


In [None]:
CREATE OR REPLACE TABLE POP_HEALTH_MEDICAL_TRANSCRIPTS_RAW AS SELECT
    relative_path as TRANSCRIPT_NAME, 
    GET_PRESIGNED_URL(@Medical_Transcripts, relative_path) as scoped_file_url, 
    TO_VARIANT(SNOWFLAKE.CORTEX.PARSE_DOCUMENT(@Medical_Transcripts, relative_path , {'mode': 'ocr'})) as raw_text_dict,
    raw_text_dict:content as FULL_TEXT
FROM DIRECTORY(@Medical_Transcripts);


The cell above creates a new Snowflake table named POP_HEALTH_MEDICAL_TRANSCRIPTS_RAW from the files stored in the Medical_Transcripts stage. It uses the SNOWFLAKE.CORTEX.PARSE_DOCUMENT function in OCR mode to extract the text from each file, and stores the file name, a presigned URL, the raw parse result, and the extracted text in the new table.

In [None]:
--MEDICAL TRANSCRIPTS information is preloaded into a Snowflake STRING column using Parse_Document function.
SELECT TRANSCRIPT_NAME, FULL_TEXT FROM POP_HEALTH_MEDICAL_TRANSCRIPTS_RAW;

Next, select the transcript name and full text from the POP_HEALTH_MEDICAL_TRANSCRIPTS_RAW table, keeping only the rows where AI_FILTER determines that the medical transcript is about a motor vehicle accident. AI_FILTER evaluates the content of the FULL_TEXT column against the provided prompt and returns a boolean (true or false) that includes or excludes each row.

In [None]:
-- Filter for medical transcripts that refer to a motor vehicle accident

SELECT TRANSCRIPT_NAME, FULL_TEXT FROM POP_HEALTH_MEDICAL_TRANSCRIPTS_RAW
-- The trailing ': ' keeps the instruction from running into the transcript text
WHERE AI_FILTER('This medical transcript has references to a motor vehicle accident faced by the member: ' || FULL_TEXT);

## Get aggregated insights directly via SQL

Use AI_AGG to:

Generate Actionable health intervention insights aggregated at population level across multiple medical transcripts

In [None]:
SELECT AI_AGG(FULL_TEXT, '

You are an AI agent helping the physician assigned to a population of patients. These are the medical transcripts associated with those patients.
Evaluate these medical transcripts and recommend the following.

1) Key insights on the overall health of the population
2) Specific actions at the population level that the physician can take to improve health
3) The patient who needs the highest level of attention, and the proposed next steps for engaging with that patient

') ACTIONABLE_INSIGHTS
FROM POP_HEALTH_MEDICAL_TRANSCRIPTS_RAW

Retrieve the results of the previous cell (referenced here by its cell name, ai_agg_patient_level_sdoh_insights_1), convert them into a pandas DataFrame, and iterate through the ACTIONABLE_INSIGHTS column, writing each generated insight to the Streamlit UI with st.write.

In [None]:
agg_patient_level_sdoh_insights_df = ai_agg_patient_level_sdoh_insights_1.to_df().to_pandas()
for insight in agg_patient_level_sdoh_insights_df['ACTIONABLE_INSIGHTS']:
    st.write(insight)
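
The AI_AGG instruction is a long free-text string, so any apostrophe in it would end the SQL literal early. When the instruction is maintained in Python instead, a small helper can escape it before building the statement. This is a sketch only: `build_ai_agg_query` is a hypothetical name, the sample instruction is illustrative, and the function only produces a string.

```python
def build_ai_agg_query(table: str, text_column: str, instruction: str) -> str:
    """Build an AI_AGG statement with the instruction embedded as a SQL literal."""
    # Trim surrounding blank lines and double single quotes for SQL.
    safe_instruction = instruction.strip().replace("'", "''")
    return (
        f"SELECT AI_AGG({text_column}, '{safe_instruction}') AS ACTIONABLE_INSIGHTS\n"
        f"FROM {table}"
    )


instruction = """
Summarize the population's overall health and list the physician's next steps.
"""

agg_sql = build_ai_agg_query(
    "POP_HEALTH_MEDICAL_TRANSCRIPTS_RAW", "FULL_TEXT", instruction
)
print(agg_sql)
```

The string could then be run with `session.sql(agg_sql).to_pandas()` in place of a SQL cell.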

## Clinical decision support system
Bring together demographics and clinical notes and apply Snowflake Cortex AI to predict diagnostic confidence, complication risk, length of stay, and readmission likelihood, all in real time.

- Join patient demographics with clinical notes.
- Construct a case_text summary for potential future embedding or NLP use.
- Apply AI classifications:
  - Diagnostic confidence
  - Complication risk
  - Length of stay prediction
  - Readmission risk

In [None]:
CREATE OR REPLACE VIEW CLINICAL_DECISION_SUPPORT_BASE AS
WITH comprehensive_patient_analysis AS (
    SELECT 
        p.patient_id,
        p.age,
        p.gender,
        p.race_ethnicity,
        p.insurance_type,
        p.chronic_conditions,
        p.risk_factors,
        p.primary_diagnosis,
        p.secondary_diagnoses,
        p.admission_type,
        cn.note_text,
        cn.chief_complaint,
        cn.assessment,
        cn.plan,
        cn.department,
        cn.provider_id,

        -- Precomputed string for AI_EMBED
        CONCAT(
            p.primary_diagnosis, ' ',
            cn.chief_complaint, ' ',
            cn.assessment, ' ',
            p.chronic_conditions, ' ',
            p.risk_factors
        ) AS case_text,

        -- Diagnostic confidence
        AI_CLASSIFY(
            CONCAT('Clinical presentation: ', cn.chief_complaint, '. Assessment: ', cn.assessment,
                   '. Patient profile: ', p.age, ' year old ', p.gender, ' with history of ',
                   p.chronic_conditions, '. Current plan: ', cn.plan),
            ['high_confidence_diagnosis', 'moderate_confidence', 'low_confidence', 'requires_additional_workup']
        ) AS diagnostic_confidence,

        -- Complication risk
        AI_CLASSIFY(
            CONCAT('Patient risk assessment: Age ', p.age, ', diagnosis: ', p.primary_diagnosis,
                   ', chronic conditions: ', p.chronic_conditions,
                   ', risk factors: ', p.risk_factors,
                   ', admission type: ', p.admission_type),
            ['low_complication_risk', 'moderate_complication_risk', 'high_complication_risk', 'critical_risk']
        ) AS complication_risk,

        -- Length of stay prediction
        AI_CLASSIFY(
            CONCAT('Hospital stay prediction for: ', p.primary_diagnosis, ' in ', p.age, ' year old ',
                   p.gender, ' with comorbidities: ', p.chronic_conditions,
                   ', admission type: ', p.admission_type),
            ['1-2_days', '3-5_days', '6-10_days', 'extended_stay']
        ) AS predicted_loss,

        -- Readmission risk
        AI_CLASSIFY(
            CONCAT('Readmission risk for: ', p.primary_diagnosis, ', patient age: ', p.age,
                   ', chronic conditions: ', p.chronic_conditions,
                   ', insurance: ', p.insurance_type, ', risk factors: ', p.risk_factors),
            ['low_readmission_risk', 'moderate_readmission_risk', 'high_readmission_risk', 'very_high_risk']
        ) AS readmission_risk
    FROM patient_demographics p
    LEFT JOIN clinical_notes cn ON p.patient_id = cn.patient_id
    WHERE cn.note_id IS NOT NULL
)

SELECT * FROM comprehensive_patient_analysis;
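
Each AI_CLASSIFY call in the view receives one prompt string assembled with CONCAT. To preview what the model actually sees, the complication-risk prompt can be mirrored in plain Python. This is a sketch under assumptions: the function name and the sample patient are hypothetical, and the "unknown" placeholder is an added choice. It rests on the assumption that Snowflake's CONCAT yields NULL when any argument is NULL, so a missing field would otherwise blank the whole prompt.

```python
def complication_risk_prompt(age, primary_diagnosis, chronic_conditions,
                             risk_factors, admission_type) -> str:
    """Mirror the view's complication-risk CONCAT, replacing missing values."""

    def show(value) -> str:
        # Stand-in for wrapping each column in COALESCE(..., 'unknown') in SQL.
        return "unknown" if value is None else str(value)

    return (
        "Patient risk assessment: Age " + show(age)
        + ", diagnosis: " + show(primary_diagnosis)
        + ", chronic conditions: " + show(chronic_conditions)
        + ", risk factors: " + show(risk_factors)
        + ", admission type: " + show(admission_type)
    )


# Hypothetical patient with no recorded chronic conditions.
prompt = complication_risk_prompt(67, "pneumonia", None, "smoker", "emergency")
print(prompt)
```

Printing a few prompts this way makes it easier to spot fields that arrive empty before running the classification.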

In [None]:
import altair as alt
query = "SELECT * FROM CLINICAL_DECISION_SUPPORT_BASE"

df=session.sql(query).to_pandas()

df.columns = [col.lower() for col in df.columns]

st.subheader("🗃️ Data Snapshot")
st.dataframe(df.head())

# -- Chart 1: Diagnostic Confidence Distribution --
st.subheader("📊 Diagnostic Confidence Distribution")
conf_chart = alt.Chart(df).mark_bar().encode(
    x=alt.X("diagnostic_confidence", title="Confidence Level"),
    y=alt.Y("count()", title="Patient Count"),
    color="diagnostic_confidence"
).properties(height=300)
st.altair_chart(conf_chart, use_container_width=True)

# -- Chart 2: Complication Risk by Admission Type --
st.subheader("🩺 Complication Risk by Admission Type")
risk_chart = alt.Chart(df).mark_bar().encode(
    x=alt.X("admission_type:N", title="Admission Type"),
    y=alt.Y("count()", title="Patient Count"),
    color="complication_risk",
    tooltip=["admission_type", "complication_risk", "count()"]
).properties(height=300)
st.altair_chart(risk_chart, use_container_width=True)

# -- Chart 3: Readmission Risk by Insurance Type --
st.subheader("🔁 Readmission Risk by Insurance Type")
readmission_chart = alt.Chart(df).mark_bar().encode(
    x=alt.X("insurance_type:N", title="Insurance Type"),
    y=alt.Y("count()", title="Patient Count"),
    color="readmission_risk",
    tooltip=["insurance_type", "readmission_risk", "count()"]
).properties(height=300).interactive()
st.altair_chart(readmission_chart, use_container_width=True)

# -- Chart 4: Predicted Length of Stay by Diagnosis --
st.subheader("📅 Predicted Length of Stay by Diagnosis")
los_chart = alt.Chart(df).mark_bar().encode(
    x=alt.X("primary_diagnosis:N", sort='-y', title="Primary Diagnosis"),
    y=alt.Y("count()", title="Count"),
    color="predicted_loss",
    tooltip=["primary_diagnosis", "predicted_loss", "count()"]
).properties(height=400).interactive()
st.altair_chart(los_chart, use_container_width=True)

# -- Chart 5: Patient Age vs Risk (Scatter Plot) --
st.subheader("🎯 Age vs Complication Risk")
age_risk_chart = alt.Chart(df).mark_circle(size=60).encode(
    x=alt.X("age:Q"),
    y=alt.Y("complication_risk:N"),
    color="complication_risk",
    tooltip=["age", "gender", "primary_diagnosis", "complication_risk"]
).interactive().properties(height=300)
st.altair_chart(age_risk_chart, use_container_width=True)
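
The risk labels are ordinal, but sorting them as plain strings puts "critical" before "low". For risk stratification it helps to define the order explicitly. The sketch below uses the complication-risk labels from the view. The `risk_rank` helper and the sample list are hypothetical.

```python
# Label order taken from the AI_CLASSIFY call for complication_risk.
COMPLICATION_RISK_ORDER = [
    "low_complication_risk",
    "moderate_complication_risk",
    "high_complication_risk",
    "critical_risk",
]


def risk_rank(label: str) -> int:
    """Return the label's position in the order; unknown labels sort last."""
    try:
        return COMPLICATION_RISK_ORDER.index(label)
    except ValueError:
        return len(COMPLICATION_RISK_ORDER)


sample_labels = [
    "high_complication_risk",
    "critical_risk",
    "low_complication_risk",
    "moderate_complication_risk",
]
ordered = sorted(sample_labels, key=risk_rank)
print(ordered)
```

The same list can be passed to an Altair encoding, for example `alt.Y("complication_risk:N", sort=COMPLICATION_RISK_ORDER)`, so the axis runs from low to critical.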

## End of notebook
