# Search Index Knowledge Source

Use **SearchIndexKnowledgeSource** to wrap an **existing Azure AI Search index** as a Knowledge Source without re-indexing data.

## 🔑 Key Features

| Feature | Description |
|---------|-------------|
| **Data Source** | Existing Azure AI Search index |
| **Auto-create Resources** | ❌ Does not create any new resources |
| **Deletion Behavior** | ✅ Deletes only the Knowledge Source; the original index is not affected |
| **Vector Search** | ✅ Supported (if the original index has vector fields) |
| **Use Cases** | Reuse indexes created traditionally, gradual migration |

## 📋 Table of Contents

| Step | Description | Jump to |
|------|-------------|---------|
| 0️⃣ Environment Setup | Install dependencies and initialize | [View](#env-config) |
| 🔍 View Index | View existing index structure | [View](#view-index) |
| 1️⃣ Create Knowledge Source | Configure field mappings | [View](#create-ks) |
| 2️⃣ Create Knowledge Base | Create knowledge base | [View](#create-kb) |
| 3️⃣ Query Knowledge Base | Execute queries and multi-turn conversations | [View](#query-kb) |
| 📋 View Resources | List created resources | [View](#list-resources) |
| 🧹 Delete Resources | Cleanup (optional) | [View](#cleanup) |

---

## 📌 Use Cases

| Scenario | Description |
|----------|-------------|
| **Existing Index** | Reuse indexes created traditionally (Indexer, Push API) |
| **Custom Schema** | Index contains custom field structure |
| **Gradual Migration** | Gradually migrate from traditional retrieval to Agentic Retrieval |
| **Multi-Index Integration** | Integrate multiple independent indexes into one Knowledge Base |

## ⚠️ Prerequisites

1. ✅ The Azure AI Search index **already exists** and contains data
2. ✅ You know the index field structure (content fields, vector fields, etc.)
3. ✅ If the index has vector fields, a matching embedding model is configured

---

<a id="env-config"></a>
## 0️⃣ Environment Setup

In [None]:
# Install necessary Python packages
# azure-search-documents: Azure AI Search SDK (requires version 11.7.0b2+ for Knowledge Source support)
# azure-identity: Azure authentication
# python-dotenv: Environment variable management

%pip install azure-search-documents==11.7.0b2 azure-identity python-dotenv -qU

<a id="init-config"></a>
## Initialize Configuration

Configure Azure AI Search and Azure OpenAI connection information.
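
Before running the configuration cell, it can help to confirm that the expected environment variables are actually set. The sketch below is a hypothetical helper (`missing_settings` is not part of any SDK); it assumes the three endpoint and key variables read by this notebook are mandatory and that the deployment names may fall back to defaults.

```python
import os

# Variable names read by the configuration cell in this notebook.
# Treating exactly these three as mandatory is an assumption of this sketch.
REQUIRED_SETTINGS = (
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
)


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the required setting names that are unset or blank in env."""
    return [name for name in required if not (env.get(name) or "").strip()]


# Check the current process environment (the result depends on your machine).
missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present.")
```

Call it after `load_dotenv()` so that values from a `.env` file are included in the check.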

In [None]:
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient

load_dotenv()

# Azure AI Search configuration
search_endpoint = os.getenv("AZURE_SEARCH_ENDPOINT")
search_api_key = os.getenv("AZURE_SEARCH_API_KEY")

# Azure OpenAI configuration
azure_openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
embedding_model = os.getenv("AZURE_OPENAI_EMBEDDING_MODEL", "text-embedding-3-large")  # default matches the default deployment below
embedding_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-3-large")
gpt_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-mini")

# Create SearchIndexClient
index_client = SearchIndexClient(
    endpoint=search_endpoint,
    credential=AzureKeyCredential(search_api_key)
)

print(f"✅ Azure AI Search: {search_endpoint}")
print("\n🔧 Azure OpenAI:")
print(f"   Endpoint: {azure_openai_endpoint}")
print(f"   Embedding: {embedding_deployment} ({embedding_model})")
print(f"   GPT Model: {gpt_deployment}")

<a id="create-ks"></a>
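## 1️⃣ Create Search Index Knowledge Source

Wrap an existing index as a Knowledge Source. The key step is configuring **field mappings** that match the index schema, so inspect the index first (next section) and then run the creation cell that follows it.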

---

<a id="view-index"></a>
## 🔍 View Existing Index

Inspect the target index: its fields, semantic configuration, and vector search profiles.

In [None]:
# Configure the index name to view
existing_index_name = "index-azure-ml"  # Use existing Azure ML index

# Get and display index structure
try:
    index = index_client.get_index(existing_index_name)
    print(f"📋 Index: {index.name}")
    print("=" * 50)
    print("\nField list:")
    for field in index.fields:
        field_type = field.type
        attrs = []
        if field.key:
            attrs.append("KEY")
        if field.searchable:
            attrs.append("searchable")
        if field.filterable:
            attrs.append("filterable")
        if field.sortable:
            attrs.append("sortable")
        if hasattr(field, 'vector_search_dimensions') and field.vector_search_dimensions:
            attrs.append(f"vector({field.vector_search_dimensions})")
        
        attrs_str = f" [{', '.join(attrs)}]" if attrs else ""
        print(f"  - {field.name}: {field_type}{attrs_str}")
    
    # Check semantic configuration
    if index.semantic_search:
        print("\n✅ Semantic search configured")
        # `configurations` can be None, so fall back to an empty list
        for config in index.semantic_search.configurations or []:
            print(f"  - Config name: {config.name}")
    else:
        print("\n⚠️ Semantic search not configured")
    
    # Check vector search
    if index.vector_search:
        print("\n✅ Vector search configured")
        # `profiles` can be None, so fall back to an empty list
        for profile in index.vector_search.profiles or []:
            print(f"  - Profile: {profile.name}")
    else:
        print("\n⚠️ Vector search not configured")
        
except Exception as e:
    print(f"❌ Error: {e}")
    print(f"\nPlease ensure index '{existing_index_name}' exists.")

In [None]:
from azure.search.documents.indexes.models import (
    SearchIndexKnowledgeSource,
    SearchIndexKnowledgeSourceParameters,
    SearchIndexKnowledgeSourceFieldMappings,
    KnowledgeSourceAzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters
)

# Knowledge Source name
search_index_ks_name = "azure-ml-knowledge-source"

# Existing index name
# existing_index_name = "index-azure-ml"  # Already defined above

# Embedding model configuration (for query vectorization)
aoai_embedding_params = AzureOpenAIVectorizerParameters(
    resource_url=azure_openai_endpoint,
    deployment_name=embedding_deployment,
    model_name=embedding_model
)

# Create Search Index Knowledge Source
search_index_knowledge_source = SearchIndexKnowledgeSource(
    name=search_index_ks_name,
    description="Knowledge Source based on existing index",
    search_index_parameters=SearchIndexKnowledgeSourceParameters(
        # Point to existing index
        index_name=existing_index_name,
        
        # Field mappings: Map index fields to Knowledge Source standard fields
        field_mappings=SearchIndexKnowledgeSourceFieldMappings(
            content_fields=["content", "description"],  # Text content fields (multiple allowed)
            title_field="title",                        # Title field
            url_field="url",                            # URL field
            filepath_field="filepath",                  # File path field
            vector_fields=["content_vector"]            # Vector fields
        ),
        
        # Embedding model (for query vectorization)
        embedding_model=KnowledgeSourceAzureOpenAIVectorizer(
            azure_open_ai_parameters=aoai_embedding_params
        )
    )
)

# ⚠️ Uncomment to create
# try:
#     index_client.create_or_update_knowledge_source(knowledge_source=search_index_knowledge_source)
#     print(f"✅ Search Index Knowledge Source '{search_index_ks_name}' created successfully!")
#     print(f"   Associated index: {existing_index_name}")
# except Exception as e:
#     print(f"❌ Creation failed: {e}")

print("💡 Configuration is ready, uncomment the code above to create")
print("\n📋 Configuration summary:")
print(f"   - Knowledge Source: {search_index_ks_name}")
print(f"   - Associated index: {existing_index_name}")
print("   - Content fields: content, description")
print("   - Vector fields: content_vector")

### 🔧 Field Mapping Description

Map index fields to Knowledge Source standard fields:

| Standard Field | Description | Required | Example |
|----------------|-------------|----------|---------|
| `content_fields` | Main content fields, supports multiple | ✅ Required | `["content", "chunk"]` |
| `title_field` | Document title | Optional | `"title"` |
| `url_field` | Document URL (for citations) | Optional | `"url"` |
| `filepath_field` | File path | Optional | `"filepath"` |
| `vector_fields` | Vector fields for vector search | Recommended | `["content_vector"]` |

### 💡 Notes

- **Does not create new resources**: The Knowledge Source is only a wrapper around the existing index; it does not create an Indexer or any other resource
- **Field names must match**: Field names in the mappings must exactly match the index schema
- **Vector dimensions must match**: The embedding model's output dimensions must equal the dimensions of the index's vector fields
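
The last two notes can be checked locally before creating the Knowledge Source. The following is a minimal sketch, not SDK functionality: `check_field_mappings` is a hypothetical helper that works on plain dictionaries (field name to vector dimensions, `None` for non-vector fields), and the schema and dimension values are illustrative only.

```python
def check_field_mappings(index_fields, mappings, embedding_dimensions=None):
    """Return a list of problems; an empty list means the mappings look consistent.

    index_fields: dict of field name -> vector dimensions (None for non-vector fields)
    mappings: dict using the mapping names from the table above
    """
    problems = []

    # Collect every field name referenced by the mappings.
    names = list(mappings.get("content_fields", []))
    names += [mappings.get(key) for key in ("title_field", "url_field", "filepath_field")]
    names += list(mappings.get("vector_fields", []))

    for name in names:
        if name is not None and name not in index_fields:
            problems.append(f"field '{name}' is not in the index schema")

    # Vector fields must be vector-typed and match the embedding model's dimensions.
    for name in mappings.get("vector_fields", []):
        dims = index_fields.get(name)
        if name in index_fields and dims is None:
            problems.append(f"field '{name}' is not a vector field")
        elif dims is not None and embedding_dimensions is not None and dims != embedding_dimensions:
            problems.append(
                f"field '{name}' has {dims} dimensions, embedding model produces {embedding_dimensions}"
            )
    return problems


# Illustrative schema: no 'description' field, and a 3072-dimension vector field.
schema = {"id": None, "title": None, "content": None, "content_vector": 3072}
example_mappings = {
    "content_fields": ["content", "description"],
    "title_field": "title",
    "vector_fields": ["content_vector"],
}
for problem in check_field_mappings(schema, example_mappings, embedding_dimensions=1536):
    print(problem)
```

In the notebook, `index_fields` could be built from `index.fields` in the View Existing Index cell, using each field's `name` and `vector_search_dimensions`.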

---

<a id="create-kb"></a>
## 2Ô∏è‚É£ Create Knowledge Base

### 📖 How Agentic Retrieval Uses Fields

Agentic Retrieval uses different fields at different stages. Understanding this is crucial for correct configuration:

#### 1. Search Phase (`searchFields`)

| Configuration | Behavior |
|--------------|----------|
| `searchFields=[]` (default) | Searches **all `searchable` fields** in the index |
| `searchFields=["title", "content"]` | Only searches specified fields |

> In the activity log, `searchFields: []` means no restriction: all searchable fields are searched.
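
The rule in the table can be modeled in a few lines. This is only an illustration of the described behavior, not the service's implementation; `effective_search_fields` and the sample schema are hypothetical.

```python
def effective_search_fields(index_fields, search_fields=()):
    """Model which fields a query searches.

    index_fields: dict of field name -> True if the field is searchable
    search_fields: explicit restriction; empty means "no restriction"
    """
    if not search_fields:
        # Default: every searchable field in the index.
        return [name for name, searchable in index_fields.items() if searchable]
    # Explicit list: only the specified fields.
    return list(search_fields)


fields = {"id": False, "title": True, "content": True, "content_vector": False}
print(effective_search_fields(fields))
print(effective_search_fields(fields, ["title"]))
```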

#### 2. Semantic Ranking Phase

Uses fields configured in the index's **Semantic Configuration**:
- `titleField` → For title semantic understanding
- `contentField` → For content semantic understanding
- `keywordsField` → For keyword matching

These fields appear in the activity log's `source_data_fields` (derived automatically by the system).

#### 3. Output Phase (`sourceDataFields`)

Documentation definition:
> "Used to request **additional fields** for referenced source data."

This configuration **only affects the output content of `references[].sourceData`**, not search scope!

`sourceData` ultimately contains:
- Semantic config fields (`title`, `content`, `terms`)
- Your configured `sourceDataFields` (e.g., `source`, `category`)
- System auto-added `id`
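
The merge described in the list above can be sketched as an ordered, de-duplicated union. The helper name and the field order are assumptions made for illustration; the order the service actually returns is not stated here.

```python
def merged_source_data_fields(semantic_fields, source_data_fields, key_field="id"):
    """Combine semantic config fields, configured sourceDataFields, and the key field."""
    merged = []
    for name in [*semantic_fields, *source_data_fields, key_field]:
        if name not in merged:  # keep first occurrence, drop duplicates
            merged.append(name)
    return merged


# One configured field still yields several fields in sourceData.
print(merged_source_data_fields(["title", "content", "terms"], ["source"]))
```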

#### Retrieval Flow Diagram

```mermaid
flowchart TD
    Q[🔍 User Query] --> S
    
    subgraph S[1️⃣ Search Phase]
        S1[searchFields configuration]
        S2[Default: all searchable fields]
    end
    
    S --> R
    
    subgraph R[2️⃣ Semantic Ranking Phase]
        R1[Semantic Config fields]
        R2[titleField / contentField / keywordsField]
    end
    
    R --> O
    
    subgraph O[3️⃣ Output Phase]
        O1[references.sourceData]
        O2[semantic fields + sourceDataFields + id]
    end
    
    O --> A[📝 Answer + References]
```

#### ⚠️ Common Misconceptions

**Activity shows 4 fields in `source_data_fields`, but I only configured 1?**

This is normal. The activity log displays the merged set of fields, which includes:
- Fields auto-derived from the semantic config
- Your manually configured `sourceDataFields`
- The system-added `id`

Your configuration is in effect; the system simply merges all related fields automatically.

In [None]:
from azure.search.documents.indexes.models import (
    KnowledgeBase,
    KnowledgeSourceReference,
    KnowledgeBaseAzureOpenAIModel,
    AzureOpenAIVectorizerParameters,
    KnowledgeRetrievalOutputMode,
    KnowledgeRetrievalLowReasoningEffort
)

# Knowledge Base name
kb_name = "azure-ml-knowledge-base"

# Azure OpenAI parameters
aoai_params = AzureOpenAIVectorizerParameters(
    resource_url=azure_openai_endpoint,
    deployment_name=gpt_deployment,
    model_name=gpt_deployment
)

# Create Knowledge Base
kb = KnowledgeBase(
    name=kb_name,
    description="Knowledge base based on existing index",
    
    knowledge_sources=[
        KnowledgeSourceReference(name=search_index_ks_name)
    ],
    
    retrieval_instructions="Use this knowledge source to query indexed documents.",
    answer_instructions="Provide accurate answers based on document content and cite sources.",
    
    output_mode=KnowledgeRetrievalOutputMode.ANSWER_SYNTHESIS,
    
    models=[
        KnowledgeBaseAzureOpenAIModel(azure_open_ai_parameters=aoai_params)
    ],
    
    retrieval_reasoning_effort=KnowledgeRetrievalLowReasoningEffort()
)

# ⚠️ Uncomment to create
# index_client.create_or_update_knowledge_base(knowledge_base=kb)
# print(f"✅ Knowledge Base '{kb_name}' created successfully!")

print("💡 Configuration is ready, uncomment the code above to create")

<a id="query-kb"></a>
## 3️⃣ Query Knowledge Base

In [None]:
# Test query: modify the question to match your index content.
# Note: ask_question() is defined in a later cell of this section; run that cell first.
result = ask_question("What information does this knowledge base contain?", show_activity=True)

In [None]:
# Follow-up question (using conversation history)
result = ask_question("Please provide more details")

In [None]:
import json
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient
from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseRetrievalRequest,
    KnowledgeBaseMessage,
    KnowledgeBaseMessageTextContent,
    SearchIndexKnowledgeSourceParams,
    KnowledgeRetrievalMediumReasoningEffort,
)

# Create Knowledge Base retrieval client
kb_client = KnowledgeBaseRetrievalClient(
    endpoint=search_endpoint,
    knowledge_base_name=kb_name,
    credential=AzureKeyCredential(search_api_key)
)

# Conversation history
messages = []


def ask_question(question: str, show_activity: bool = False, debug: bool = False, max_refs: int = None):
    """
    Send a question to the Knowledge Base and display the response.
    
    Args:
        question: The question to ask
        show_activity: If True, print detailed activity information
        debug: If True, print full reference structure
        max_refs: Maximum number of references to display (None = all)
    """
    messages.append({"role": "user", "content": question})

    req = KnowledgeBaseRetrievalRequest(
        messages=[
            KnowledgeBaseMessage(
                role=m["role"],
                content=[KnowledgeBaseMessageTextContent(text=m["content"])]
            ) for m in messages
        ],
        knowledge_source_params=[
            SearchIndexKnowledgeSourceParams(
                knowledge_source_name=search_index_ks_name,
                include_references=True,
                include_reference_source_data=True,
                always_query_source=True
            )
        ],
        include_activity=True,
        retrieval_reasoning_effort=KnowledgeRetrievalMediumReasoningEffort()  # pass an instance, not the class
    )

    result = kb_client.retrieve(retrieval_request=req)

    # Extract response
    response_parts = []
    for resp in result.response:
        for content in resp.content:
            response_parts.append(content.text)

    response_content = "\n\n".join(response_parts) if response_parts else "No response"

    # Add to conversation history
    messages.append({"role": "assistant", "content": response_content})

    # Display results
    print("=" * 60)
    print(f"Question: {question}")
    print("=" * 60)
    print(f"\n📝 Answer:\n{response_content}\n")

    # Display activity log
    if show_activity and result.activity:
        print("\n" + "=" * 60)
        print("📋 Activity Log (Query Planning)")
        print("=" * 60)
        for i, activity in enumerate(result.activity):
            activity_dict = activity.as_dict()
            print(f"\n[Activity {i}]:")
            print(json.dumps(activity_dict, indent=2, ensure_ascii=False, default=str))
        print("=" * 60)

    # Display references
    if result.references:
        refs_to_show = result.references if max_refs is None else result.references[:max_refs]
        print(f"\n📚 References (showing {len(refs_to_show)}/{len(result.references)}):")
        for i, ref in enumerate(refs_to_show):
            ref_dict = ref.as_dict()
            ref_id = ref_dict.get("id", i)
            if debug:
                print(f"\n  [Ref {ref_id}] Full structure:")
                print(json.dumps(ref_dict, indent=4, ensure_ascii=False, default=str))
            else:
                # Accept either key style and tolerate a missing or None value
                source_data = ref_dict.get("sourceData") or ref_dict.get("source_data") or {}
                title = source_data.get("title", "N/A")
                source = source_data.get("source", "N/A")
                print(f"  [{ref_id}] {title}")
                print(f"        Source: {source}")

    return result


print("✅ Query function is ready!")
print(f"   Knowledge Base: {kb_name}")
print(f"   Knowledge Source: {search_index_ks_name}")
print("\nUsage:")
print("  ask_question('your question')              - Basic query")
print("  ask_question('...', show_activity=True)    - Show activity log")
print("  ask_question('...', debug=True)            - Show full reference JSON")
print("  ask_question('...', max_refs=5)            - Limit to 5 references")

<a id="cleanup"></a>
## 🧹 Delete Resources (Optional)

> ✅ **Safe**: Deleting a Search Index Knowledge Source **does not affect** the original index.

---

<a id="list-resources"></a>
## 📋 View Created Resources

List Knowledge Sources and Knowledge Bases in the current Search service.

In [None]:
# List all Knowledge Sources
print("Knowledge Sources:")
print("=" * 50)
for ks in index_client.list_knowledge_sources():
    ks_dict = ks.as_dict()
    print(f"  - {ks_dict.get('name', 'N/A')}")
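    # as_dict() key style (camelCase vs snake_case) may differ between SDK builds, so check both
    params = ks_dict.get('searchIndexParameters') or ks_dict.get('search_index_parameters')
    if params:
        index_name = params.get('searchIndexName') or params.get('search_index_name') or 'N/A'
        print(f"    Associated index: {index_name}")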

print()

# List all Knowledge Bases
print("Knowledge Bases:")
print("=" * 50)
for kb in index_client.list_knowledge_bases():
    kb_dict = kb.as_dict()
    print(f"  - {kb_dict.get('name', 'N/A')}")
    # Same key-style caveat as above
    kb_sources = kb_dict.get('knowledgeSources') or kb_dict.get('knowledge_sources') or []
    if kb_sources:
        sources = [s.get('name', 'N/A') for s in kb_sources]
        print(f"    Knowledge Sources: {', '.join(sources)}")

In [None]:
# Uncomment to execute deletion
# index_client.delete_knowledge_base(kb_name)
# print(f"✅ Knowledge Base '{kb_name}' deleted")

# index_client.delete_knowledge_source(search_index_ks_name)
# print(f"✅ Knowledge Source '{search_index_ks_name}' deleted")
# print("   ✅ Original index is not affected")

print("💡 To delete resources, uncomment the code above and run")

---

## 📊 Search Index KS vs Other Types Comparison

| Feature | Search Index KS | Blob KS / OneLake KS |
|---------|-----------------|----------------------|
| **Data Source** | Existing index | Raw files |
| **Auto-create Resources** | ❌ No | ✅ Auto-creates Index + Indexer + Skillset |
| **Deletion Behavior** | ✅ Only deletes KS | ⚠️ Cascade deletes all resources |
| **Control Level** | Full control | Managed approach |
| **Use Cases** | Existing index, gradual migration | Quick start, fully managed |

### When to use Search Index KS?

- ✅ You already have indexes created traditionally
- ✅ You need a custom index schema
- ✅ You need to share the same index with other systems
- ✅ You don't want deleting the KS to affect the index

### When to use Blob / OneLake KS?

- ✅ You are starting from scratch with no existing index
- ✅ You want a fully managed pipeline with no Indexer to maintain
- ✅ You need built-in image semanticization and chunking

---

## 🔗 Related Notebooks

| Notebook | Description |
|----------|-------------|
| [03_knowledge_source_overview.ipynb](03_knowledge_source_overview.ipynb) | Knowledge Source Overview |
| [03b_blob_knowledge_source.ipynb](03b_blob_knowledge_source.ipynb) | Blob Knowledge Source (auto pipeline) |
| [03c_web_knowledge_source.ipynb](03c_web_knowledge_source.ipynb) | Web Knowledge Source (real-time search) |
| [03d_sharepoint_knowledge_source.ipynb](03d_sharepoint_knowledge_source.ipynb) | SharePoint Knowledge Source Overview |