# Azure AI Search integrated vectorization sample

This Python notebook demonstrates the [integrated vectorization](https://learn.microsoft.com/azure/search/vector-search-integrated-vectorization) features of Azure AI Search that are currently in public preview. 

Integrated vectorization takes a dependency on indexers and skillsets, using the Text Split skill for data chunking, and the AzureOpenAIEmbedding skill and your Azure OpenAI resource for embedding.

This example uses PDFs from the `data/documents` folder for chunking, embedding, indexing, and queries.

### Prerequisites

+ An Azure subscription, with [access to Azure OpenAI](https://aka.ms/oai/access).
 
+ Azure AI Search, any tier, but we recommend Basic or higher for this workload. [Enable semantic ranker](https://learn.microsoft.com/azure/search/semantic-how-to-enable-disable) if you want to run a hybrid query with semantic ranking.

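+ A deployment of an embedding model on Azure OpenAI. The index and skillset in this notebook are configured for `text-embedding-3-large` (3072 dimensions); if you deploy a different model, such as `text-embedding-ada-002`, adjust `model_name` and `vector_search_dimensions` to match.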

+ Azure Blob Storage. This notebook connects to your storage account and loads a container with the sample PDFs.
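
The setup cell reads its settings with `os.environ[...]`, which raises `KeyError` when a name is missing from the `.env` file. A minimal sketch of a pre-flight check, assuming the variable names used in that cell (the two key variables are left out because the notebook falls back to `DefaultAzureCredential` without them). `missing_settings` is a hypothetical helper, not part of any Azure SDK:

```python
import os

# Names read by the setup cell in this notebook.
REQUIRED_SETTINGS = [
    "AZURE_SEARCH_SERVICE_ENDPOINT",
    "AZURE_SEARCH_INDEX",
    "BLOB_CONNECTION_STRING",
    "BLOB_CONTAINER_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "EMBEDDING_MODEL_NAME",
]


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the required names that are absent or blank in `env`."""
    return [name for name in required if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

Running a check like this before the setup cell turns a bare `KeyError` into a list of everything that still needs a value.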


In [None]:
from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential
from azure.core.credentials import AzureKeyCredential
import os

load_dotenv(override=True) # take environment variables from .env.

# Variables not used here do not need to be updated in your .env file
endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
# Fall back to DefaultAzureCredential when no admin key is set (missing or empty)
search_admin_key = os.getenv("AZURE_SEARCH_ADMIN_KEY", "")
credential = AzureKeyCredential(search_admin_key) if search_admin_key else DefaultAzureCredential()
index_name = os.environ["AZURE_SEARCH_INDEX"]
blob_connection_string = os.environ["BLOB_CONNECTION_STRING"]
blob_container_name = os.environ["BLOB_CONTAINER_NAME"]
azure_openai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
azure_openai_key = os.getenv("AZURE_OPENAI_KEY") or None  # None when missing or empty
azure_openai_embedding_deployment = os.environ["EMBEDDING_MODEL_NAME"]

# Extract storage account name from connection string for Azure AD authentication
storage_account_name = None
if "AccountName=" in blob_connection_string:
    storage_account_name = blob_connection_string.split("AccountName=")[1].split(";")[0]
else:
    # Alternative: set AZURE_STORAGE_ACCOUNT_NAME in your .env file
    storage_account_name = os.environ.get("AZURE_STORAGE_ACCOUNT_NAME", "")

In [None]:
# Debug Azure OpenAI configuration
print("=== Azure OpenAI Configuration Debug ===")
print(f"Azure OpenAI endpoint: {azure_openai_endpoint}")
print(f"Azure OpenAI key: {'***' + azure_openai_key[-4:] if azure_openai_key and len(azure_openai_key) > 4 else 'Not set'}")
print(f"Embedding deployment: {azure_openai_embedding_deployment}")
print(f"Search endpoint: {endpoint}")
print(f"Search index: {index_name}")
print("===========================================")

In [None]:
# Fix Azure OpenAI endpoint URL (remove trailing slash if present)
if azure_openai_endpoint.endswith('/'):
    azure_openai_endpoint = azure_openai_endpoint.rstrip('/')
    print(f"Fixed Azure OpenAI endpoint: {azure_openai_endpoint}")

# Test Azure OpenAI connection
try:
    from openai import AzureOpenAI
    client = AzureOpenAI(
        api_key=azure_openai_key,
        api_version="2024-02-01",
        azure_endpoint=azure_openai_endpoint
    )
    
    # Test embedding
    response = client.embeddings.create(
        input="test",
        model=azure_openai_embedding_deployment
    )
    print("✅ Azure OpenAI connection test successful")
    print(f"Embedding dimensions: {len(response.data[0].embedding)}")
except Exception as e:
    print(f"❌ Azure OpenAI connection test failed: {e}")
    print("This might explain why the indexer is failing.")

In [None]:
# Let's check all Azure OpenAI related environment variables
print("=== All Azure OpenAI Environment Variables ===")
for key, value in os.environ.items():
    if any(keyword in key.upper() for keyword in ['OPENAI', 'EMBEDDING', 'MODEL', 'DEPLOYMENT']):
        if 'KEY' in key.upper() or 'SECRET' in key.upper():
            print(f"{key}: {'***' + value[-4:] if value and len(value) > 4 else 'Not set'}")
        else:
            print(f"{key}: {value}")

print("\n=== Testing different possible deployment names ===")
# Common embedding deployment names to try
possible_names = [
    azure_openai_embedding_deployment,
    "text-embedding-ada-002", 
    "embedding",
    "embeddings",
    "text-embedding-3-small",
    "text-embedding-3-large"
]

client = AzureOpenAI(
    api_key=azure_openai_key,
    api_version="2024-02-01",
    azure_endpoint=azure_openai_endpoint
)

for deployment_name in possible_names:
    if deployment_name:
        try:
            response = client.embeddings.create(
                input="test",
                model=deployment_name
            )
            print(f"✅ {deployment_name} - WORKS! (dimensions: {len(response.data[0].embedding)})")
            break
        except Exception as e:
            print(f"❌ {deployment_name} - {str(e)[:100]}...")

In [None]:
print(f"Storage account name: {storage_account_name}")
print(f"Blob container name: {blob_container_name}")

## Connect to Blob Storage and load documents

Retrieve documents from Blob Storage. You can use the sample documents in the data/documents folder.  

### Azure Storage Authentication

If you're getting a "Key based authentication is not permitted" error, your storage account requires Azure AD authentication. You have two options:

1. **Azure AD Authentication (Recommended)**: Use your Azure credentials
2. **Connection String**: If your storage account allows it

For Azure AD authentication, make sure you're logged in to Azure:
- Run `az login` in terminal, or
- Use VS Code Azure extension to sign in, or  
- Set up service principal credentials

You may also need to add your user/service principal to the storage account's access control (IAM) with "Storage Blob Data Contributor" role.
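
The setup cell derives the storage account name by splitting the connection string on `AccountName=`. A slightly more defensive sketch parses every `key=value` pair first and then builds the blob endpoint used for Azure AD authentication. `parse_connection_string` and `blob_account_url` are hypothetical helper names, the sample connection string is made up, and the endpoint suffix assumes the public Azure cloud:

```python
def parse_connection_string(connection_string):
    """Split 'Key=Value;Key=Value' into a dict; values may contain '='."""
    pairs = {}
    for segment in connection_string.split(";"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            pairs[key.strip()] = value.strip()
    return pairs


def blob_account_url(connection_string, fallback_account=""):
    """Build the blob endpoint URL, or return None if no account is known."""
    account = parse_connection_string(connection_string).get("AccountName") or fallback_account
    return f"https://{account}.blob.core.windows.net" if account else None


# Made-up connection string for illustration only.
example = "DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=abc==;EndpointSuffix=core.windows.net"
print(blob_account_url(example))
```

Returning `None` when no account name is available lets the caller fail with a clear message before any network call is made.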

### ⚠️ Permission Required: Storage Blob Data Contributor

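If authentication succeeds but uploads fail with a permission error, your identity needs the Storage Blob Data Contributor role on the storage account. Assign it with one of the following options: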

#### Option 1: Azure Portal (Recommended)
1. Go to [Azure Portal](https://portal.azure.com)
2. Navigate to **Storage accounts** → **<your-storage-account-name>**
3. Click **Access Control (IAM)** in the left menu
4. Click **+ Add** → **Add role assignment**
5. Select **Storage Blob Data Contributor** role
6. Click **Next**
7. Select **User, group, or service principal**
8. Click **+ Select members**
9. Search for and select your own user account (or the service principal that runs this notebook)
10. Click **Review + assign**

#### Option 2: Azure CLI (if user lookup works)
```bash
# Get your object ID first
USER_ID=$(az ad user show --id "<your-user-principal-name>" --query id -o tsv)

# Assign the role
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee $USER_ID \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>"
```

After assigning permissions, re-run the cell below. The upload should work! 🚀

In [None]:
from azure.storage.blob import BlobServiceClient  
from azure.identity import DefaultAzureCredential
from azure.core.exceptions import HttpResponseError

# Connect to Blob Storage using Azure AD authentication
try:
    if not storage_account_name:
        raise ValueError("Storage account name not found. Please check your BLOB_CONNECTION_STRING or set AZURE_STORAGE_ACCOUNT_NAME.")
    
    # Use Azure AD authentication
    credential_ad = DefaultAzureCredential()
    account_url = f"https://{storage_account_name}.blob.core.windows.net"
    blob_service_client = BlobServiceClient(account_url=account_url, credential=credential_ad)
    
    print(f"Using Azure AD authentication for storage account: {storage_account_name}")
    
    # Test authentication by listing containers (requires minimal permissions)
    try:
        containers = list(blob_service_client.list_containers())
        print(f"Successfully authenticated. Found {len(containers)} containers.")
    except HttpResponseError as auth_error:
        if "AuthorizationPermissionMismatch" in str(auth_error):
            print("❌ Permission Error: You need 'Storage Blob Data Contributor' role on this storage account.")
            print(f"Please run this command in Azure CLI:")
            print(f"az role assignment create --role 'Storage Blob Data Contributor' --assignee $(az account show --query user.name -o tsv) --resource-group aisw --scope '/subscriptions/0ac2cbd1-7fd2-435c-a854-3a3335dcb184/resourceGroups/aisw/providers/Microsoft.Storage/storageAccounts/{storage_account_name}'")
            print("\nAlternatively, you can:")
            print("1. Go to Azure Portal")
            print(f"2. Navigate to Storage Account: {storage_account_name}")
            print("3. Go to Access Control (IAM)")
            print("4. Add role assignment: 'Storage Blob Data Contributor'")
            print("5. Add your user account")
            raise
        else:
            raise

except Exception as e:
    print(f"Azure AD authentication failed: {e}")
    print("Make sure you're logged in to Azure (run 'az login') and have proper permissions.")
    raise

# Only proceed if authentication was successful
container_client = blob_service_client.get_container_client(blob_container_name)
if not container_client.exists():
    try:
        container_client.create_container()
        print(f"Created container: {blob_container_name}")
    except HttpResponseError as e:
        if "AuthorizationPermissionMismatch" in str(e):
            print(f"❌ Cannot create container. Need 'Storage Blob Data Contributor' role.")
            raise
        else:
            raise

documents_directory = os.path.join("data", "documents1")
uploaded_count = 0
for file in os.listdir(documents_directory):
    try:
        with open(os.path.join(documents_directory, file), "rb") as data:
            name = os.path.basename(file)
            blob_client = container_client.get_blob_client(name)
            
            # Check if blob exists
            try:
                if not blob_client.exists():
                    blob_client.upload_blob(data=data, overwrite=True)
                    uploaded_count += 1
                    print(f"✅ Uploaded: {name}")
                else:
                    print(f"⏭️  Already exists: {name}")
            except HttpResponseError as upload_error:
                if "AuthorizationPermissionMismatch" in str(upload_error):
                    print(f"❌ Cannot upload {name}. Need proper permissions.")
                    raise
                else:
                    raise
    except Exception as file_error:
        print(f"❌ Error processing {file}: {file_error}")
        continue

print(f"\n🎉 Setup complete! Uploaded {uploaded_count} new files to container '{blob_container_name}'")

## Create a blob data source connector on Azure AI Search

In [None]:
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection
)

# Create a data source using Azure AD authentication with ResourceId format
indexer_client = SearchIndexerClient(endpoint, credential)
container = SearchIndexerDataContainer(name=blob_container_name)

# For Azure AD authentication, use ResourceId format (replace with your actual values)
# Format: ResourceId=/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.Storage/storageAccounts/{storage-account-name};
resource_id_connection_string = f"ResourceId=/subscriptions/0ac2cbd1-7fd2-435c-a854-3a3335dcb184/resourceGroups/aisw/providers/Microsoft.Storage/storageAccounts/{storage_account_name};"

data_source_connection = SearchIndexerDataSourceConnection(
    name=f"{index_name}-blob",
    type="azureblob",
    connection_string=resource_id_connection_string,
    container=container
)
data_source = indexer_client.create_or_update_data_source_connection(data_source_connection)

print(f"Data source '{data_source.name}' created or updated using Azure AD authentication with ResourceId")

## Create a search index

Vector and nonvector content is stored in a search index.
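
The `vector_search_dimensions` value on the vector field has to equal the length of the vectors the embedding deployment returns; a mismatch typically surfaces as an indexing error. A small sketch of a lookup for the default output sizes of the Azure OpenAI embedding models mentioned in this notebook. `dimensions_for` is a hypothetical helper, and the table only covers default sizes (the `text-embedding-3-*` models can also be asked for shorter vectors):

```python
# Default output sizes of the embedding models named in this notebook.
DEFAULT_EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def dimensions_for(model_name):
    """Return the default vector length for a known embedding model."""
    try:
        return DEFAULT_EMBEDDING_DIMENSIONS[model_name]
    except KeyError:
        raise ValueError(f"Unknown embedding model: {model_name!r}") from None


print(dimensions_for("text-embedding-3-large"))  # 3072
```

Deriving the field size from the model name keeps the index definition and the embedding skill from drifting apart when the deployment changes.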

In [None]:
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    HnswParameters,
    VectorSearchAlgorithmMetric,
    ExhaustiveKnnAlgorithmConfiguration,
    ExhaustiveKnnParameters,
    VectorSearchProfile,
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters,
    SemanticConfiguration,
    SemanticSearch,
    SemanticPrioritizedFields,
    SemanticField,
    SearchIndex
)

# Create a search index  
index_client = SearchIndexClient(endpoint=endpoint, credential=credential)  
fields = [  
    SearchField(name="parent_id", type=SearchFieldDataType.String, sortable=True, filterable=True, facetable=True),  
    SearchField(name="title", type=SearchFieldDataType.String),  
    SearchField(name="chunk_id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True, analyzer_name="keyword"),  
    SearchField(name="chunk", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),  
    SearchField(name="vector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), vector_search_dimensions=3072, vector_search_profile_name="myHnswProfile"),  
]  
  
# Configure the vector search configuration  
vector_search = VectorSearch(  
    algorithms=[  
        HnswAlgorithmConfiguration(  
            name="myHnsw",  
            parameters=HnswParameters(  
                m=8,  
                ef_construction=256,  
                ef_search=256,  
                metric=VectorSearchAlgorithmMetric.dot_product,  
            ),  
        ),  
        ExhaustiveKnnAlgorithmConfiguration(  
            name="myExhaustiveKnn",  
            parameters=ExhaustiveKnnParameters(  
                metric=VectorSearchAlgorithmMetric.dot_product,  
            ),  
        ),  
    ],  
    profiles=[  
        VectorSearchProfile(  
            name="myHnswProfile",  
            algorithm_configuration_name="myHnsw",  
            vectorizer_name="myOpenAI",  
        ),  
        VectorSearchProfile(  
            name="myExhaustiveKnnProfile",  
            algorithm_configuration_name="myExhaustiveKnn",  
            vectorizer_name="myOpenAI",  
        ),  
    ],  
    vectorizers=[  
        AzureOpenAIVectorizer(  
            vectorizer_name="myOpenAI",  
            parameters=AzureOpenAIVectorizerParameters(  
                resource_url=azure_openai_endpoint,  
                deployment_name=azure_openai_embedding_deployment,  
                api_key=azure_openai_key,  
                model_name="text-embedding-3-large",  
            ),  
        ),  
    ],  
)  
  
semantic_config = SemanticConfiguration(  
    name="my-semantic-config",  
    prioritized_fields=SemanticPrioritizedFields(  
        content_fields=[SemanticField(field_name="chunk")]  
    ),  
)  
  
# Create the semantic search with the configuration  
semantic_search = SemanticSearch(configurations=[semantic_config])  
  
# Create the search index
index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search)  
result = index_client.create_or_update_index(index)  
print(f"{result.name} created")  

## Create a skillset

Skills drive integrated vectorization. [Text Split](https://learn.microsoft.com/azure/search/cognitive-search-skill-textsplit) provides data chunking. [AzureOpenAIEmbedding](https://learn.microsoft.com/azure/search/cognitive-search-skill-azure-openai-embedding) handles calls to Azure OpenAI, using the connection information you provide in the environment variables. An [index projection](https://learn.microsoft.com/azure/search/index-projections-concept-intro) specifies secondary indexes used for chunked data.
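
To make the `maximum_page_length` and `page_overlap_length` settings in the next cell concrete, here is a simplified sketch of overlapping chunking. It slices on fixed character offsets, whereas the Text Split skill also tries to respect sentence boundaries, so treat it as an illustration of the overlap idea rather than a reproduction of the skill. `chunk_text` is a hypothetical function:

```python
def chunk_text(text, max_length=1000, overlap=400):
    """Split text into windows of up to max_length characters.

    Each window starts `max_length - overlap` characters after the previous
    one, so consecutive windows share `overlap` characters.
    """
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be non-negative and smaller than max_length")
    step = max_length - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_length])
        if start + max_length >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


sample = "x" * 2200
pieces = chunk_text(sample)
print([len(piece) for piece in pieces])
```

A larger overlap lowers the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of more chunks to embed and store.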

In [None]:
from azure.search.documents.indexes.models import (
    SplitSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    AzureOpenAIEmbeddingSkill,
    SearchIndexerIndexProjection,
    SearchIndexerIndexProjectionSelector,
    SearchIndexerIndexProjectionsParameters,
    IndexProjectionMode,
    SearchIndexerSkillset
)

# Create a skillset  
skillset_name = f"{index_name}-skillset"  
  
split_skill = SplitSkill(  
    description="Split skill to chunk documents",  
    text_split_mode="pages",  
    context="/document",  
    maximum_page_length=1000,  
    page_overlap_length=400,  
    inputs=[  
        InputFieldMappingEntry(name="text", source="/document/content"),  
    ],  
    outputs=[  
        OutputFieldMappingEntry(name="textItems", target_name="pages")  
    ],  
)  
  
embedding_skill = AzureOpenAIEmbeddingSkill(  
    description="Skill to generate embeddings via Azure OpenAI",  
    context="/document/pages/*",  
    resource_url=azure_openai_endpoint,  
    deployment_name=azure_openai_embedding_deployment,  
    model_name="text-embedding-3-large",  
    api_key=azure_openai_key,  
    inputs=[  
        InputFieldMappingEntry(name="text", source="/document/pages/*"),  
    ],  
    outputs=[  
        OutputFieldMappingEntry(name="embedding", target_name="vector")  
    ],  
)  
  
index_projections = SearchIndexerIndexProjection(  
    selectors=[  
        SearchIndexerIndexProjectionSelector(  
            target_index_name=index_name,  
            parent_key_field_name="parent_id",  
            source_context="/document/pages/*",  
            mappings=[  
                InputFieldMappingEntry(name="chunk", source="/document/pages/*"),  
                InputFieldMappingEntry(name="vector", source="/document/pages/*/vector"),  
                InputFieldMappingEntry(name="title", source="/document/metadata_storage_name"),  
            ],  
        ),  
    ],  
    parameters=SearchIndexerIndexProjectionsParameters(  
        projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS  
    ),  
)  
  
skillset = SearchIndexerSkillset(  
    name=skillset_name,  
    description="Skillset to chunk documents and generate embeddings",  
    skills=[split_skill, embedding_skill],  
    index_projection=index_projections,  
)  
  
client = SearchIndexerClient(endpoint, credential)  
client.create_or_update_skillset(skillset)  
print(f"{skillset.name} created")

## Create an indexer
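
The indexer created in this section runs asynchronously, so a status check made right after `run_indexer` may still report a run in progress. A sketch of a generic polling loop, assuming a caller-supplied `get_status` callable that returns a status string (for example, one that wraps `indexer_client.get_indexer_status(...)` and reads `last_result.status`). `wait_for_status` and its parameters are hypothetical, and the stand-in status source below replaces a real service call:

```python
import time


def wait_for_status(get_status, in_progress=("inProgress",), timeout=300.0, interval=5.0, sleep=time.sleep):
    """Poll get_status() until it leaves the in-progress states or time runs out.

    Returns the last status seen; the caller decides whether it means success.
    """
    deadline = time.monotonic() + timeout
    status = get_status()
    while status in in_progress and time.monotonic() < deadline:
        sleep(interval)
        status = get_status()
    return status


# Usage with a stand-in status source that finishes on the third check.
fake_statuses = iter(["inProgress", "inProgress", "success"])
print(wait_for_status(lambda: next(fake_statuses), sleep=lambda seconds: None))
```

Polling with a deadline avoids both a fixed sleep that is too short for large document sets and a loop that never ends if the run stalls.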

In [None]:
from azure.search.documents.indexes.models import (
    SearchIndexer,
    FieldMapping
)

# Create an indexer  
indexer_name = f"{index_name}-indexer"  
  
indexer = SearchIndexer(  
    name=indexer_name,  
    description="Indexer to index documents and generate embeddings",  
    skillset_name=skillset_name,  
    target_index_name=index_name,  
    data_source_name=data_source.name,  
    # Map the metadata_storage_name field to the title field in the index to display the PDF title in the search results  
    field_mappings=[FieldMapping(source_field_name="metadata_storage_name", target_field_name="title")]  
)  
  
indexer_client = SearchIndexerClient(endpoint, credential)  
indexer_result = indexer_client.create_or_update_indexer(indexer)  
  
# Run the indexer  
indexer_client.run_indexer(indexer_name)  
print(f"{indexer_name} is created and running. If queries return no results, wait a bit and try again.")  


In [None]:
indexer_status = indexer_client.get_indexer_status(indexer_name)
last_result = indexer_status.last_result  # can be None if no run has been recorded yet
if last_result is not None and last_result.status == "success":
    print("Indexer completed successfully")
else:
    print("❌ Indexer has issues or is still running")

In [None]:
# Get detailed indexer status and error information
indexer_status = indexer_client.get_indexer_status(indexer_name)
print(f"Indexer Status: {indexer_status.last_result.status}")
print(f"Items processed: {indexer_status.last_result.item_count}")
print(f"Items failed: {indexer_status.last_result.failed_item_count}")

if indexer_status.last_result.errors:
    print("\n=== INDEXER ERRORS ===")
    for error in indexer_status.last_result.errors:
        print(f"Error: {error.error_message}")
        print(f"Key: {error.key}")
        print(f"Name: {error.name}")
        print("---")

if indexer_status.last_result.warnings:
    print("\n=== INDEXER WARNINGS ===")
    for warning in indexer_status.last_result.warnings:
        print(f"Warning: {warning.message}")
        print(f"Key: {warning.key}")
        print(f"Name: {warning.name}")
        print("---")

In [None]:
# Wait a bit for indexing to complete
import time
print("Waiting for indexing to complete...")
time.sleep(10)

# Check status again
indexer_status = indexer_client.get_indexer_status(indexer_name)
print(f"Indexer Status: {indexer_status.last_result.status}")
print(f"Items processed: {indexer_status.last_result.item_count}")
print(f"Items failed: {indexer_status.last_result.failed_item_count}")

# Check if there are any documents in the index
from azure.search.documents import SearchClient
search_client = SearchClient(endpoint, index_name, credential=credential)
results = search_client.search(search_text="*", top=1)
doc_count = 0
for result in results:
    doc_count += 1
    break
    
print(f"Documents in index: {doc_count} (sample check)")

# Get actual count
try:
    from azure.search.documents.indexes import SearchIndexClient
    index_client = SearchIndexClient(endpoint, credential)
    stats = index_client.get_index_statistics(index_name)
    print(f"Total documents in index: {stats.document_count}")
except Exception as e:
    print(f"Could not get index statistics: {e}")

In [None]:
# Check what's currently in the index
print("=== Current Index Contents ===")
results = search_client.search(search_text="*", top=10)
doc_count = 0
for result in results:
    doc_count += 1
    print(f"Doc {doc_count}: {result.get('title', 'No title')} (ID: {result.get('chunk_id', 'No ID')})")
    print(f"  Content preview: {str(result.get('chunk', ''))[:100]}...")
    print()

print(f"Total documents found: {doc_count}")

# Now try the original vector search to see if it works
print("\n=== Testing Vector Search ===")
try:
    from azure.search.documents.models import VectorizableTextQuery
    
    # Pure Vector Search
    query = "Which languages are supported for the account management confirmation?"  
    
    vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=3, fields="vector", exhaustive=True)
    
    results = search_client.search(  
        search_text=None,  
        vector_queries=[vector_query],
        select=["parent_id", "chunk_id", "chunk"],
        top=3
    )  
    
    result_count = 0
    for result in results:  
        result_count += 1
        print(f"Result {result_count}:")
        print(f"  parent_id: {result['parent_id']}")  
        print(f"  chunk_id: {result['chunk_id']}")  
        print(f"  Score: {result['@search.score']}")  
        print(f"  Content: {result['chunk'][:200]}...")
        print()
        
    if result_count == 0:
        print("No results found. The index might still be processing.")
        
except Exception as e:
    print(f"Vector search failed: {e}")
    print("This might be because the vectorizer configuration is still not fully set up.")

## Perform a vector similarity search

This example shows a pure vector search using a vectorizable text query. You pass in text, and the vectorizer defined in the index handles query vectorization.

If you indexed the health plan PDF file, send queries that ask plan-related questions.
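
Conceptually, an exhaustive k-nearest-neighbors query (the `exhaustive=True` setting used in the next cell) compares the query vector with every stored vector and keeps the best `k`. A toy sketch with plain Python lists and the dot-product metric configured in the index. The vectors and the `top_k_by_dot_product` helper are made up for illustration; real embeddings have thousands of dimensions, and the `@search.score` the service reports is not necessarily the raw dot product:

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_k_by_dot_product(query, documents, k):
    """Return the k (doc_id, score) pairs with the highest dot product."""
    scored = [(doc_id, dot(query, vector)) for doc_id, vector in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up two-dimensional "embeddings".
documents = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.6, 0.8],
    "chunk-c": [0.0, 1.0],
}
print(top_k_by_dot_product([0.8, 0.6], documents, k=2))
```

The HNSW algorithm configured as `myHnsw` trades some of this exactness for speed by searching a graph instead of scanning every vector.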

In [None]:
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

# Pure Vector Search
query = "Which languages are supported for the account management confirmation?"  
  
search_client = SearchClient(endpoint, index_name, credential=credential)
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=5, fields="vector", exhaustive=True)
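# To pass a precomputed query vector instead of using integrated query vectorization, use VectorizedQuery
# (import it from azure.search.documents.models; generate_embeddings is a placeholder for your own embedding call):
# vector_query = VectorizedQuery(vector=generate_embeddings(query), k_nearest_neighbors=3, fields="vector")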
  
results = search_client.search(  
    search_text=None,  
    vector_queries= [vector_query],
    select=["parent_id", "chunk_id", "chunk"],
    top=1
)  
for result in results:  
    print(f"parent_id: {result['parent_id']}")  
    print(f"chunk_id: {result['chunk_id']}")  
    print(f"Score: {result['@search.score']}")  
    print(f"Content: {result['chunk']}")   


## Perform a hybrid search
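
A hybrid query runs the keyword search and the vector search separately and then merges the two ranked lists. Azure AI Search documents Reciprocal Rank Fusion (RRF) as the merging method; the sketch below shows the general RRF formula with the commonly used constant 60, applied to made-up result lists. `reciprocal_rank_fusion` is a hypothetical helper, not an SDK call:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids: score(id) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up rankings from the two halves of a hybrid query.
keyword_ranking = ["chunk-a", "chunk-b", "chunk-c"]
vector_ranking = ["chunk-b", "chunk-d", "chunk-a"]
for doc_id, score in reciprocal_rank_fusion([keyword_ranking, vector_ranking]):
    print(f"{doc_id}: {score:.5f}")
```

Because RRF works on ranks rather than raw scores, a chunk that appears near the top of both lists outranks one that leads only a single list.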

In [None]:
# Hybrid Search
query = "Which is more comprehensive, Northwind Health Plus vs Northwind Standard?"  
  
search_client = SearchClient(endpoint, index_name, credential=credential)
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=1, fields="vector", exhaustive=True)
  
results = search_client.search(  
    search_text=query,  
    vector_queries= [vector_query],
    select=["parent_id", "chunk_id", "chunk"],
    top=1
)  
  
for result in results:  
    print(f"parent_id: {result['parent_id']}")  
    print(f"chunk_id: {result['chunk_id']}")  
    print(f"Score: {result['@search.score']}")  
    print(f"Content: {result['chunk']}")  


## Perform a hybrid search + semantic reranking

In [None]:
from azure.search.documents.models import (
    QueryType,
    QueryCaptionType,
    QueryAnswerType,
    VectorizableTextQuery
)
# Semantic Hybrid Search
query = "Which is more comprehensive, Northwind Health Plus vs Northwind Standard?"

search_client = SearchClient(endpoint, index_name, credential)
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=1, fields="vector", exhaustive=True)

results = search_client.search(  
    search_text=query,
    vector_queries=[vector_query],
    select=["parent_id", "chunk_id", "chunk"],
    query_type=QueryType.SEMANTIC,
    semantic_configuration_name='my-semantic-config',
    query_caption=QueryCaptionType.EXTRACTIVE,
    query_answer=QueryAnswerType.EXTRACTIVE,
    top=1
)
semantic_answers = results.get_answers()
if semantic_answers:
    for answer in semantic_answers:
        if answer.highlights:
            print(f"Semantic Answer: {answer.highlights}")
        else:
            print(f"Semantic Answer: {answer.text}")
        print(f"Semantic Answer Score: {answer.score}\n")

for result in results:
    print(f"parent_id: {result['parent_id']}")  
    print(f"chunk_id: {result['chunk_id']}")  
    print(f"Reranker Score: {result['@search.reranker_score']}")
    print(f"Content: {result['chunk']}")  

    captions = result["@search.captions"]
    if captions:
        caption = captions[0]
        if caption.highlights:
            print(f"Caption: {caption.highlights}\n")
        else:
            print(f"Caption: {caption.text}\n")


## Generate a comprehensive answer using Azure OpenAI

Now let's use Azure OpenAI to generate a comprehensive answer based on the retrieved context from our similarity search.
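
Since the system prompt in the next cell asks the model to cite the context, it can help to number the chunks before joining them and to cap the total size. A minimal sketch under those assumptions: `build_context`, its limits, and the sample results are hypothetical, and the character cap is a rough stand-in for a real token budget:

```python
def build_context(results, max_chunks=3, max_chars=6000):
    """Join up to max_chunks chunk texts, labelled [1], [2], ...

    Stops before the labelled text would exceed max_chars
    (separators between chunks are not counted).
    """
    parts = []
    used = 0
    for number, result in enumerate(results[:max_chunks], start=1):
        text = (result.get("chunk") or "").strip()
        if not text:
            continue  # skip empty chunks but keep the original numbering
        entry = f"[{number}] {text}"
        if used + len(entry) > max_chars:
            break
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)


# Made-up search results shaped like the dictionaries the search client yields.
sample_results = [
    {"chunk_id": "1", "chunk": "Plan A covers vision."},
    {"chunk_id": "2", "chunk": ""},
    {"chunk_id": "3", "chunk": "Plan B does not cover vision."},
]
print(build_context(sample_results))
```

The numeric labels give the model something short to cite, and they map back to positions in the search results list.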

In [None]:
from openai import AzureOpenAI
import os

# Set up Azure OpenAI client
client = AzureOpenAI(
    azure_endpoint=azure_openai_endpoint,
    api_key=azure_openai_key,
    api_version="2024-10-21"
)

# Get the deployment name for chat completions
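deployment_name = os.getenv("AZURE_OPENAI_MODEL")
if not deployment_name:
    raise ValueError("Set AZURE_OPENAI_MODEL to the name of your chat model deployment.")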

def generate_answer_with_context(user_query, search_results):
    """Generate a comprehensive answer using Azure OpenAI based on search results."""
    
    # Collect context from search results
    context_chunks = []
    for result in search_results:
        context_chunks.append(result.get('chunk', ''))
    
    # Combine all context
    combined_context = "\n\n".join(context_chunks[:3])  # Use top 3 results
    
    # Create the system message
    system_message = """You are a helpful assistant that answers questions based on the provided context. 
    Use the context to provide accurate, comprehensive answers. If the context doesn't contain enough 
    information to fully answer the question, indicate what information is available and what might be missing.
    
    Always cite specific information from the context when possible."""
    
    # Create the user message with context
    user_message = f"""Based on the following context, please answer this question: {user_query}
    
    Context:
    {combined_context}
    
    Please provide a comprehensive answer based on the context above."""
    
    try:
        response = client.chat.completions.create(
            model=deployment_name,
            messages=[
                {"role": "system", "content": system_message},
                {"role": "user", "content": user_message}
            ],
            max_tokens=1000,
            temperature=0.3
        )
        
        return response.choices[0].message.content
    except Exception as e:
        return f"Error generating answer: {str(e)}"

# Use the same query from the search above
user_query = "Which is more comprehensive, Northwind Health Plus vs Northwind Standard?"

# Collect the search results (re-run the search to get fresh results)
search_client = SearchClient(endpoint, index_name, credential)
vector_query = VectorizableTextQuery(text=user_query, k_nearest_neighbors=3, fields="vector", exhaustive=True)

# Use a hybrid query without semantic ranking, so this cell also works when semantic ranker is not enabled
search_results = search_client.search(  
    search_text=user_query,
    vector_queries=[vector_query],
    select=["parent_id", "chunk_id", "chunk"],
    top=3
)

# Convert search results to list for reuse
search_results_list = list(search_results)

# Generate the AI-powered answer
print("=" * 80)
print("🤖 AI-GENERATED COMPREHENSIVE ANSWER")
print("=" * 80)
print()

ai_answer = generate_answer_with_context(user_query, search_results_list)
print(ai_answer)

print("\n" + "=" * 80)
print("📄 SUPPORTING CONTEXT FROM SEARCH")
print("=" * 80)

# Show the supporting evidence
for i, result in enumerate(search_results_list, 1):
    print(f"\n--- Context Chunk {i} ---")
    print(f"Document: {result['parent_id']}")
    print(f"Chunk ID: {result['chunk_id']}")
    print(f"Search Score: {result.get('@search.score', 'N/A')}")
    print(f"Content: {result['chunk'][:500]}...")  # Truncate for readability

In [None]:
# Clean up: delete the search index created by this notebook
try:
    index_client.delete_index(index_name)
    print(f"Index '{index_name}' deleted")
except Exception as ex:
    print(ex)