# Azure Blob Knowledge Source

Use **Azure Blob Knowledge Source** to automatically create a complete indexer pipeline that ingests, chunks, and vectorizes documents from Blob Storage.

## 🔄 Workflow

```mermaid
flowchart LR
    subgraph Input["📦 Input"]
        Blob["Blob Storage<br/>PDF/Word/PPT..."]
    end
    
    subgraph Auto["⚙️ Auto-created Pipeline"]
        DS["Data Source"]
        SK["Skillset<br/>Chunking + Image Semanticization"]
        IDX["Index<br/>Vector + Text"]
        IXR["Indexer"]
        
        DS --> IXR
        SK --> IXR
        IXR --> IDX
    end
    
    subgraph Query["🔍 Query"]
        KB["Knowledge Base"]
        API["Agentic Retrieval API"]
        KB --> API
    end
    
    Blob --> DS
    IDX --> KB
```

> 💡 Creating a Blob Knowledge Source **automatically creates** four resources: a Data Source, a Skillset, an Index, and an Indexer.

## 📋 Table of Contents

| Step | Description | Jump to |
|------|-------------|---------|
| Install Dependencies | Install the required Python packages | [View](#install-deps) |
| 0️⃣ Initialize Configuration | Configure Azure AI Search, Storage, Azure OpenAI | [View](#init-config) |
| 1️⃣ Create Knowledge Source | Auto-create Data Source + Skillset + Index + Indexer | [View](#step1) |
| 2️⃣ Create Knowledge Base | Create the knowledge base used for queries | [View](#step2) |
| 3️⃣ Check Ingestion Status | Monitor indexing progress | [View](#step3) |
| 4️⃣ Execute Query | Agentic Retrieval query | [View](#step4) |
| 5️⃣ View Activity Log and Reference Details | Inspect the query activity log and references | [View](#step5) |
| 6️⃣ View Auto-created Resources | Check auto-created resources | [View](#step6) |
| 🧹 Cleanup Resources | Delete resources (optional) | [View](#step7) |

---

## 📊 Feature Support

| Feature | Configuration | Description |
|---------|--------------|-------------|
| 📄 Text Extraction | ✅ Enabled by default | Supports PDF, Word, PPT, etc. |
| 🔢 Embedding | ✅ Configure `embedding_model` | For vector search |
| 🖼️ Image Semanticization | ✅ Configure `chat_completion_model` + `disable_image_verbalization=False` | The configured chat model (for example, GPT-4o) generates image descriptions |
| 🔄 Incremental Update | ✅ Auto-supported | Only processes changed documents |
| 🗑️ Soft Delete Detection | ⚙️ Requires configuration | Enable Native Blob Soft Delete |

## 🔐 Authentication Methods

| Method | Connection String Format | Description |
|--------|--------------------------|-------------|
| **RBAC (Recommended)** | `ResourceId=/subscriptions/.../storageAccounts/xxx;` | Use Managed Identity |
| **Access Key** | `DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...` | Traditional key method |
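
The two formats in the table can be produced by small helpers. This is a minimal sketch: the function names are hypothetical and not part of any SDK, and the resource ID in the demo is a made-up placeholder.

```python
def rbac_connection_string(resource_id: str) -> str:
    """Build the ResourceId-style connection string used with a managed identity."""
    if not resource_id.startswith("/subscriptions/"):
        raise ValueError("expected a full resource ID starting with /subscriptions/")
    # The trailing semicolon is part of the format shown in the table.
    return f"ResourceId={resource_id.rstrip(';')};"


def access_key_connection_string(account_name: str, account_key: str) -> str:
    """Build the key-based connection string from the table."""
    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={account_key}"
    )


# Placeholder resource ID for illustration only
example_id = (
    "/subscriptions/0000/resourceGroups/demo-rg"
    "/providers/Microsoft.Storage/storageAccounts/demostorage"
)
print(rbac_connection_string(example_id))
```

Validating the prefix early turns a malformed resource ID into a local error instead of a failed ingestion run later.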

## ⚠️ Permission Requirements

When using RBAC, the Azure AI Search service's managed identity needs these roles on the Storage Account:
- `Storage Blob Data Reader` - Read Blob content
- `Storage Blob Data Contributor` - Only if writing to a Knowledge Store is required

---

<a id="install-deps"></a>

In [None]:
# Install necessary Python packages
# azure-search-documents: Azure AI Search SDK (requires version 11.7.0b2+ for Knowledge Source support)
# azure-identity: Azure authentication
# python-dotenv: Environment variable management

%pip install azure-search-documents==11.7.0b2 azure-identity python-dotenv -qU

<a id="init-config"></a>
## 0️⃣ Initialize Configuration

Configure Azure AI Search, Storage Account, and Azure OpenAI connection information.

In [None]:
import os
from dotenv import load_dotenv
from azure.identity import AzureCliCredential, get_bearer_token_provider
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient

load_dotenv()

# Azure AI Search configuration
search_endpoint = os.getenv("AZURE_SEARCH_ENDPOINT")
search_api_key = os.getenv("AZURE_SEARCH_API_KEY")

# Azure OpenAI configuration
azure_openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
# The model name and the deployment name should refer to the same embedding model
embedding_model = os.getenv("AZURE_OPENAI_EMBEDDING_MODEL", "text-embedding-3-large")
embedding_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-3-large")
gpt_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-mini")

# Create SearchIndexClient
index_client = SearchIndexClient(
    endpoint=search_endpoint,
    credential=AzureKeyCredential(search_api_key)
)

print(f"✅ Azure AI Search: {search_endpoint}")
print("\n🔧 Azure OpenAI:")
print(f"   Endpoint: {azure_openai_endpoint}")
print(f"   Embedding: {embedding_deployment} ({embedding_model})")
print(f"   GPT Model: {gpt_deployment}")
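
Because `os.getenv` silently returns `None` for unset variables, it can help to fail fast before creating any client. The sketch below checks the three variables the cell above reads without a default; the helper name `missing_settings` is hypothetical.

```python
import os

# Variables the configuration cell reads without a fallback value
REQUIRED_SETTINGS = (
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
)


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are unset or blank."""
    return [name for name in required if not (env.get(name) or "").strip()]


missing = missing_settings(os.environ)
if missing:
    print("Missing settings: " + ", ".join(missing))
else:
    print("All required settings are present")
```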

---

<a id="step1"></a>
## 1️⃣ Create Blob Knowledge Source

This automatically creates Data Source + Skillset + Index + Indexer, completing the entire pipeline configuration with one operation.

### Configuration Description

| Parameter | Description |
|-----------|-------------|
| `connection_string` | Storage Account connection string (RBAC or Access Key) |
| `container_name` | Blob container name |
| `folder_path` | Optional, specify subfolder |
| `is_adls_gen2` | Whether using ADLS Gen2 (supports ACL) |
| `content_extraction_mode` | `MINIMAL` (default) or `STANDARD` |
| `disable_image_verbalization` | `False` enables image semanticization, `True` disables |

In [None]:
from azure.search.documents.indexes.models import (
    AzureBlobKnowledgeSource,
    AzureBlobKnowledgeSourceParameters,
    KnowledgeSourceIngestionParameters,
    KnowledgeSourceContentExtractionMode,
    KnowledgeBaseAzureOpenAIModel,
    KnowledgeSourceAzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters
)

# Knowledge Source name
blob_ks_name = "demo2-blob-knowledge-source"

# Storage Account configuration (using RBAC)
# ⚠️ Important: RBAC authentication must use the "ResourceId=...;" format
# Read from environment variables, or fall back to a placeholder resource ID that must be replaced
storage_account_resource_id = os.getenv(
    "STORAGE_ACCOUNT_RESOURCE_ID",
    "/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.Storage/storageAccounts/{storage-account-name}"
)
storage_connection_string = f"ResourceId={storage_account_resource_id};"
blob_container_name = os.getenv("BLOB_CONTAINER_NAME", "indexfromblob")

# Azure OpenAI parameters - GPT model (for image semanticization)
aoai_chat_params = AzureOpenAIVectorizerParameters(
    resource_url=azure_openai_endpoint,
    deployment_name=gpt_deployment,
    model_name=gpt_deployment
)

# Azure OpenAI parameters - Embedding model
aoai_embedding_params = AzureOpenAIVectorizerParameters(
    resource_url=azure_openai_endpoint,
    deployment_name=embedding_deployment,
    model_name=embedding_model
)

# Create Blob Knowledge Source
blob_knowledge_source = AzureBlobKnowledgeSource(
    name=blob_ks_name,
    description="Knowledge source that auto-ingests, chunks, and vectorizes from Blob Storage",
    azure_blob_parameters=AzureBlobKnowledgeSourceParameters(
        # Blob Storage connection
        connection_string=storage_connection_string,
        container_name=blob_container_name,
        folder_path=None,  # Optional: specify subfolder
        is_adls_gen2=False,  # Set to True if using ADLS Gen2
        
        # Data import parameters
        ingestion_parameters=KnowledgeSourceIngestionParameters(
            # Content extraction mode
            content_extraction_mode=KnowledgeSourceContentExtractionMode.MINIMAL,
            
            # ✅ Enable image semanticization (the chat model generates image descriptions)
            disable_image_verbalization=False,
            chat_completion_model=KnowledgeBaseAzureOpenAIModel(
                azure_open_ai_parameters=aoai_chat_params
            ),
            
            # ✅ Embedding model for vector search
            embedding_model=KnowledgeSourceAzureOpenAIVectorizer(
                azure_open_ai_parameters=aoai_embedding_params
            )
        )
    )
)

try:
    result = index_client.create_or_update_knowledge_source(knowledge_source=blob_knowledge_source)
    print(f"✅ Blob Knowledge Source '{blob_ks_name}' created successfully!")
    print("   Type: azureBlob")
    
    print("\n🔧 Enabled features:")
    print("   - Image semanticization: ✅ Enabled")
    print(f"   - Embedding: ✅ {embedding_deployment}")
    
    print("\n📋 Auto-created resources:")
    params = getattr(result, 'azure_blob_parameters', None)
    cr = getattr(params, 'created_resources', None) if params else None
    if cr:
        labels = {'datasource': 'Data Source', 'index': 'Index', 'skillset': 'Skillset', 'indexer': 'Indexer'}
        for key, label in labels.items():
            # created_resources may be returned as a dict or as a model object
            name = cr.get(key, 'N/A') if isinstance(cr, dict) else getattr(cr, key, 'N/A')
            print(f"   - {label}: {name}")
except Exception as e:
    print(f"❌ Creation failed: {e}")

---

<a id="step2"></a>
## 2️⃣ Create Knowledge Base

A Knowledge Base is the entry point for queries and can be associated with one or more Knowledge Sources.

In [None]:
from azure.search.documents.indexes.models import (
    KnowledgeBase,
    KnowledgeSourceReference,
    KnowledgeBaseAzureOpenAIModel,
    AzureOpenAIVectorizerParameters,
    KnowledgeRetrievalOutputMode,
    KnowledgeRetrievalLowReasoningEffort
)

# Knowledge Base name
blob_kb_name = "demo2-blob-knowledge-base"

# Azure OpenAI parameters (for answer generation)
aoai_kb_params = AzureOpenAIVectorizerParameters(
    resource_url=azure_openai_endpoint,
    deployment_name=gpt_deployment,
    model_name=gpt_deployment
)

# Create Knowledge Base
blob_knowledge_base = KnowledgeBase(
    name=blob_kb_name,
    description="Knowledge base based on Blob Storage documents",
    
    # Associate Knowledge Source
    knowledge_sources=[KnowledgeSourceReference(name=blob_ks_name)],
    
    # Retrieval and answer instructions
    retrieval_instructions="Use this knowledge source to answer questions about stored documents",
    answer_instructions="Based on retrieved document content, provide accurate and detailed answers with citations",
    
    # Output mode: answer synthesis
    output_mode=KnowledgeRetrievalOutputMode.ANSWER_SYNTHESIS,
    
    # GPT model configuration
    models=[KnowledgeBaseAzureOpenAIModel(azure_open_ai_parameters=aoai_kb_params)],
    
    # Reasoning effort level
    retrieval_reasoning_effort=KnowledgeRetrievalLowReasoningEffort()
)

try:
    index_client.create_or_update_knowledge_base(knowledge_base=blob_knowledge_base)
    print(f"✅ Knowledge Base '{blob_kb_name}' created successfully!")
    print(f"   Associated Knowledge Source: {blob_ks_name}")
except Exception as e:
    print(f"❌ Creation failed: {e}")

---

<a id="step3"></a>
## 3️⃣ Check Ingestion Status

> ⚠️ Initial ingestion can take from several minutes to tens of minutes, depending on document count and size.

### Indexer Status Description

The values below are the per-run status reported in `lastResult.status`:

| Status | Description |
|--------|-------------|
| `inProgress` | Running |
| `success` | Completed successfully |
| `transientFailure` | Temporary failure (will auto-retry) |
| `persistentFailure` | Permanent failure (check configuration) |

In [None]:
import requests

def check_indexer_status(search_endpoint, api_key, indexer_name):
    """Return the indexer status payload from the REST API as a dict."""
    endpoint = f"{search_endpoint}/indexers/{indexer_name}/status"
    params = {"api-version": "2025-11-01-preview"}
    headers = {"api-key": api_key}
    response = requests.get(endpoint, params=params, headers=headers, timeout=30)
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
    return response.json()

# Indexer name = Knowledge Source name + "-indexer"
indexer_name = f"{blob_ks_name}-indexer"
indexer_status = check_indexer_status(search_endpoint, search_api_key, indexer_name)

print(f"🔄 Indexer '{indexer_name}' status:")
print(f"   Overall status: {indexer_status.get('status', 'N/A')}")

# lastResult can be missing or null when the indexer has not run yet
last_result = indexer_status.get("lastResult")
if last_result:
    print("\n📋 Last execution:")
    print(f"   Status: {last_result.get('status', 'N/A')}")
    print(f"   Start time: {last_result.get('startTime', 'N/A')}")
    print(f"   End time: {last_result.get('endTime', 'N/A')}")
    print(f"   ✅ Successfully processed: {last_result.get('itemsProcessed', 0)} documents")
    print(f"   ❌ Failed to process: {last_result.get('itemsFailed', 0)} documents")
    
    # Display at most the first three errors, truncated to 100 characters each
    if last_result.get('errors'):
        print(f"\n⚠️ Errors ({len(last_result['errors'])} items):")
        for err in last_result['errors'][:3]:
            print(f"   - {err.get('message', 'Unknown')[:100]}")
else:
    print("   Not yet run")
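
Since ingestion can take a while, the one-off check above can be wrapped in a polling loop. The sketch below is one possible shape, not an SDK feature. `wait_for_indexer` is a hypothetical helper that takes any zero-argument callable returning the status payload. Treating `success` and `persistentFailure` as terminal is an assumption based on the status table. The demo runs offline against canned payloads.

```python
import time

# Assumption: these two per-run statuses are final; the others may still change
TERMINAL_STATUSES = {"success", "persistentFailure"}


def wait_for_indexer(fetch_status, poll_seconds=30.0, max_polls=60, sleep=time.sleep):
    """Poll until the last run reaches a terminal status or max_polls is reached."""
    status = None
    for attempt in range(max_polls):
        payload = fetch_status()
        status = (payload.get("lastResult") or {}).get("status")
        if status in TERMINAL_STATUSES:
            return status
        # inProgress, transientFailure, or no run yet: wait and poll again
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return status  # still non-terminal after max_polls attempts


# Offline demonstration with canned payloads instead of real HTTP calls
canned = iter([
    {"status": "running"},
    {"status": "running", "lastResult": {"status": "inProgress"}},
    {"status": "running", "lastResult": {"status": "success"}},
])
final_status = wait_for_indexer(lambda: next(canned), sleep=lambda _s: None)
print(final_status)
```

In the notebook, the callable could be `lambda: check_indexer_status(search_endpoint, search_api_key, indexer_name)`.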

---

<a id="step4"></a>
## 4️⃣ Execute Query

Use the Agentic Retrieval API to query the knowledge base and get a synthesized answer with its citation sources.

In [None]:
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient
from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseRetrievalRequest,
    KnowledgeBaseMessage,
    KnowledgeBaseMessageTextContent,
    AzureBlobKnowledgeSourceParams
)

# Create Knowledge Base client
blob_kb_client = KnowledgeBaseRetrievalClient(
    endpoint=search_endpoint,
    knowledge_base_name=blob_kb_name,
    credential=AzureKeyCredential(search_api_key)
)

# Query question
question = "Show me eval metrics of TabFM"

# Build query request
request = KnowledgeBaseRetrievalRequest(
    include_activity=True,  # Include activity log
    messages=[
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text=question)]
        )
    ],
    knowledge_source_params=[
        AzureBlobKnowledgeSourceParams(
            knowledge_source_name=blob_ks_name,
            include_references=True,        # Include references
            include_reference_source_data=True  # Include reference source data
        )
    ]
)

print(f"🔍 Query: {question}")
print("=" * 60)

result = blob_kb_client.retrieve(retrieval_request=request)

print("\n📝 Answer:")
print("-" * 40)
for resp in result.response:
    for content in resp.content:
        print(content.text)

# Display references
if result.references:
    print("\n📚 Reference sources:")
    for i, ref in enumerate(result.references, 1):
        ref_dict = ref.as_dict()
        blob_url = ref_dict.get('blob_url', 'N/A')
        print(f"  [{i}] {blob_url}")
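
Because documents are indexed as chunks, several references may point at the same source document. This is an assumption; the response format is not documented here. Under that assumption, the sketch below collapses references to distinct `blob_url` values in first-seen order. `unique_blob_urls` is a hypothetical helper, and the sample dictionaries stand in for `ref.as_dict()` output.

```python
def unique_blob_urls(references):
    """Return distinct blob_url values in first-seen order."""
    seen = set()
    urls = []
    for ref in references:
        url = ref.get("blob_url")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Stand-in data; real entries would come from ref.as_dict()
sample_refs = [
    {"blob_url": "https://example.invalid/docs/a.pdf"},
    {"blob_url": "https://example.invalid/docs/b.pdf"},
    {"blob_url": "https://example.invalid/docs/a.pdf"},
    {"id": "no-url"},
]
print(unique_blob_urls(sample_refs))
```

With the query result above, a call might look like `unique_blob_urls(ref.as_dict() for ref in result.references)`.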

---

<a id="step5"></a>
## 5️⃣ View Activity Log and Reference Details

The activity log shows what happened during the query, including retrieval, re-ranking, and other steps.

In [None]:
if result.activity:
    print("📊 Activity log:")
    print("=" * 60)
    for i, activity in enumerate(result.activity, 1):
        act = activity.as_dict()
        print(f"\n🔹 Step {i}: {act.get('type', 'N/A')}")
        # Print every populated field except the type, which is shown above
        for key, value in act.items():
            if key != 'type' and value is not None:
                print(f"   {key}: {value}")

if result.references:
    print("\n" + "=" * 60)
    print("🔗 Reference sources:")
    print("-" * 60)
    for i, ref in enumerate(result.references, 1):
        ref_dict = ref.as_dict()
        print(f"\n  [{i}] {ref_dict.get('blob_url', 'N/A')}")
        for key, value in ref_dict.items():
            if key != 'blob_url' and value is not None:
                # Truncate long string values to keep the output readable
                if isinstance(value, str) and len(value) > 300:
                    print(f"      {key}: {value[:300]}...")
                else:
                    print(f"      {key}: {value}")

---

<a id="step6"></a>
## 6️⃣ View Auto-created Resources

Knowledge Source automatically creates the following resources:

| Resource | Naming Convention | Description |
|----------|-------------------|-------------|
| Data Source | `{ks_name}-datasource` | Connects to Blob Storage |
| Index | `{ks_name}-index` | Stores document chunks and vectors |
| Skillset | `{ks_name}-skillset` | Document processing pipeline |
| Indexer | `{ks_name}-indexer` | Executes data import |
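
The naming convention in the table can be written down as a small helper and compared with what the service reports in `createdResources`. This is a sketch under the assumption that the convention holds exactly. The function names are hypothetical, and the sample `reported` dictionary is made up to show a mismatch.

```python
# Suffixes taken from the naming-convention table
RESOURCE_SUFFIXES = {
    "datasource": "-datasource",
    "index": "-index",
    "skillset": "-skillset",
    "indexer": "-indexer",
}


def expected_resource_names(ks_name):
    """Derive the conventional resource names for a knowledge source."""
    return {kind: ks_name + suffix for kind, suffix in RESOURCE_SUFFIXES.items()}


def naming_mismatches(ks_name, created_resources):
    """Return {kind: (expected, reported)} for every name that differs."""
    expected = expected_resource_names(ks_name)
    return {
        kind: (name, created_resources.get(kind))
        for kind, name in expected.items()
        if created_resources.get(kind) != name
    }


reported = {
    "datasource": "demo-ks-datasource",
    "index": "demo-ks-index",
    "skillset": "demo-ks-skillset",
    "indexer": "demo-ks-indexer-v2",  # deliberately different for the demo
}
print(naming_mismatches("demo-ks", reported))
```

Reading the names from `createdResources` is the safer option; deriving them from the convention is only a fallback.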

In [None]:
import requests

def get_knowledge_source_definition(search_endpoint, api_key, ks_name):
    """Return the Knowledge Source definition from the REST API as a dict."""
    endpoint = f"{search_endpoint}/knowledgesources/{ks_name}"
    params = {"api-version": "2025-11-01-preview"}
    headers = {"api-key": api_key}
    response = requests.get(endpoint, params=params, headers=headers, timeout=30)
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
    return response.json()

ks_definition = get_knowledge_source_definition(search_endpoint, search_api_key, blob_ks_name)

if "azureBlobParameters" in ks_definition:
    blob_params = ks_definition["azureBlobParameters"]
    if "createdResources" in blob_params:
        created = blob_params["createdResources"]
        print("🔧 Auto-created resources:")
        print(f"   Data Source: {created.get('datasource', 'N/A')}")
        print(f"   Indexer: {created.get('indexer', 'N/A')}")
        print(f"   Skillset: {created.get('skillset', 'N/A')}")
        print(f"   Index: {created.get('index', 'N/A')}")

### 6a. View Skillset (Skills Configuration)

Skillset defines the document processing pipeline, including text chunking, image semanticization, embedding, etc.

In [None]:
def get_skillset_definition(search_endpoint, api_key, skillset_name):
    """Return the Skillset definition from the REST API as a dict."""
    endpoint = f"{search_endpoint}/skillsets/{skillset_name}"
    params = {"api-version": "2025-11-01-preview"}
    headers = {"api-key": api_key}
    response = requests.get(endpoint, params=params, headers=headers, timeout=30)
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
    return response.json()

skillset_name = f"{blob_ks_name}-skillset"
skillset = get_skillset_definition(search_endpoint, search_api_key, skillset_name)

if "skills" in skillset:
    print(f"🔧 Skillset contains {len(skillset['skills'])} skills:")
    for i, skill in enumerate(skillset.get("skills", []), 1):
        skill_type = skill.get("@odata.type", "Unknown")
        skill_name = skill.get("name", "N/A")
        print(f"  [{i}] {skill_name} ({skill_type})")

### 6b. Incremental Processing and Soft Delete Strategy

#### 🔄 Incremental Processing Mechanism

| Scenario | Indexer Behavior |
|----------|------------------|
| File unchanged | ⏭️ Skip |
| File content modified | 🔄 Re-process (Upsert) |
| New file | ➕ Process and add |
| File deleted | ❓ Depends on soft delete strategy |
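
The four scenarios in the table can be illustrated with a small simulation. This is a conceptual model only, not the indexer's implementation. It assumes change detection compares each blob's last-modified timestamp between two runs. `plan_incremental_run` and the sample file names are hypothetical.

```python
from datetime import datetime, timezone


def plan_incremental_run(previous, current):
    """Classify blobs by comparing two {blob_name: last_modified} snapshots."""
    plan = {"skip": [], "reprocess": [], "add": [], "deleted": []}
    for name, modified in current.items():
        if name not in previous:
            plan["add"].append(name)          # new file
        elif modified > previous[name]:
            plan["reprocess"].append(name)    # content changed since last run
        else:
            plan["skip"].append(name)         # unchanged
    # Blobs that disappeared; handling depends on the deletion strategy
    plan["deleted"] = [name for name in previous if name not in current]
    return plan


t1 = datetime(2025, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2025, 1, 2, tzinfo=timezone.utc)
previous = {"a.pdf": t1, "b.docx": t1, "c.pptx": t1}
current = {"a.pdf": t1, "b.docx": t2, "d.pdf": t2}
plan = plan_incremental_run(previous, current)
print(plan)
```

The `deleted` bucket is the case the table marks as dependent on the soft delete strategy. Without a deletion detection policy, those documents may remain in the index.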

#### 🗑️ Recommendation: Enable Native Blob Soft Delete

After you enable soft delete on the Storage Account and the data source has a matching deletion detection policy, the Indexer can detect deleted blobs and remove the corresponding documents from the index.

In [None]:
def get_datasource_definition(search_endpoint, api_key, datasource_name):
    """Return the Data Source definition from the REST API as a dict."""
    endpoint = f"{search_endpoint}/datasources/{datasource_name}"
    params = {"api-version": "2025-11-01-preview"}
    headers = {"api-key": api_key}
    response = requests.get(endpoint, params=params, headers=headers, timeout=30)
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
    return response.json()

datasource_name = f"{blob_ks_name}-datasource"
datasource = get_datasource_definition(search_endpoint, search_api_key, datasource_name)

print("🗑️ Deletion detection strategy:")
if "dataDeletionDetectionPolicy" in datasource and datasource["dataDeletionDetectionPolicy"]:
    print(f"   ✅ Configured: {datasource['dataDeletionDetectionPolicy'].get('@odata.type')}")
else:
    print("   ⚠️ Not configured! Recommend enabling Native Blob Soft Delete")
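
If the check reports no policy, a data source definition with the native soft delete policy could be prepared as sketched below. The `@odata.type` value is the native blob soft delete policy type used by Azure AI Search data sources. `with_native_soft_delete` is a hypothetical helper and the sample definition is made up. Whether a data source auto-created by a Knowledge Source can safely be modified is not established here, so sending the result back to the service is deliberately left out.

```python
import copy

NATIVE_SOFT_DELETE_POLICY = {
    "@odata.type": "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"
}


def with_native_soft_delete(datasource):
    """Return a copy of a data source definition with the policy set."""
    updated = copy.deepcopy(datasource)  # leave the fetched definition untouched
    updated["dataDeletionDetectionPolicy"] = dict(NATIVE_SOFT_DELETE_POLICY)
    return updated


# Made-up definition standing in for the JSON returned by the REST API
sample = {"name": "demo-ks-datasource", "type": "azureblob", "dataDeletionDetectionPolicy": None}
patched = with_native_soft_delete(sample)
print(patched["dataDeletionDetectionPolicy"]["@odata.type"])
```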

---

<a id="step7"></a>
## 🧹 Cleanup Resources (Optional)

> ⚠️ Deleting the Knowledge Source also deletes all auto-created resources (Data Source, Index, Skillset, Indexer)!

In [None]:
# ⚠️ Deleting the Knowledge Source also deletes all auto-created resources!

# Uncomment to execute deletion
# index_client.delete_knowledge_base(blob_kb_name)
# print(f"✅ Knowledge Base '{blob_kb_name}' deleted")

# index_client.delete_knowledge_source(blob_ks_name)
# print(f"✅ Knowledge Source '{blob_ks_name}' deleted")
# print("   Auto-created Data Source, Index, Skillset, Indexer are also deleted")

print("💡 To delete resources, uncomment the code above and run")
print("\n📋 Resources to be deleted:")
print(f"   - Knowledge Base: {blob_kb_name}")
print(f"   - Knowledge Source: {blob_ks_name}")
print("   - And its auto-created Data Source, Index, Skillset, Indexer")

---

## 📷 Appendix: Image Processing Mechanism Explained

### 🔄 Processing Flow

```
Blob Storage → Document Cracking → normalized_images → GenAI Prompt Skill → Text Description → Embedding → Index
```

### 📦 normalized_images Structure

| Field | Description |
|-------|-------------|
| `data` | BASE64 encoded JPEG image |
| `pageNumber` | PDF page number (starting from 1) |
| `boundingPolygon` | Image bounding box coordinates on the page |
| `width` / `height` | Normalized image dimensions |
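
To make the structure concrete, the sketch below decodes the `data` field of one item and checks whether the bytes start with the JPEG marker. The field names follow the table. `decode_normalized_image` is a hypothetical helper, and the sample item is synthetic (a fake header, not a real image).

```python
import base64

JPEG_MAGIC = b"\xff\xd8"  # first two bytes of every JPEG file


def decode_normalized_image(item):
    """Decode the `data` field and report basic facts about the image bytes."""
    raw = base64.b64decode(item["data"])
    return {
        "page": item.get("pageNumber"),
        "size_bytes": len(raw),
        "looks_like_jpeg": raw.startswith(JPEG_MAGIC),
        "bytes": raw,
    }


# Synthetic item: a fake JPEG header padded with zero bytes
fake_jpeg = JPEG_MAGIC + b"\xff\xe0" + b"\x00" * 8
sample_item = {
    "data": base64.b64encode(fake_jpeg).decode("ascii"),
    "pageNumber": 1,
    "width": 640,
    "height": 480,
}
info = decode_normalized_image(sample_item)
print(info["page"], info["size_bytes"], info["looks_like_jpeg"])
```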

### 🤖 Image Description Generation (Image Verbalization)

The **GenAI Prompt Skill** generates image descriptions by calling the configured chat completion model (for example, GPT-4o):

1. Extract images from PDFs and other documents
2. Send the images to the chat model
3. The model returns a text description of each image
4. Descriptions are stored as independent chunks in the index

### 📍 Key Conclusions

> 💡 Image descriptions are indexed as **complete semantic units** and are not split by Text Split.
> 
> Image and text chunks are **stored side by side** as independent chunks.

### ⚠️ Rate Limit Note

If documents contain many images, calls to the chat model may hit Azure OpenAI rate limits. Options:
- Set `disable_image_verbalization=True` to disable image semanticization
- Increase the Azure OpenAI TPM quota

---

## ⚠️ Appendix: Knowledge Source API Index Schema Limitations

**The index auto-created by the Knowledge Source API has only 6 fixed fields**:

| Field | Type | Description |
|-------|------|-------------|
| `uid` | String (Key) | Unique identifier |
| `snippet_parent_id` | String | Parent document ID for text chunk |
| `blob_url` | String | Source document URL (**not** image URL) |
| `snippet` | String | Text content or image description |
| `image_snippet_parent_id` | String | Parent document ID for image chunk |
| `snippet_vector` | Collection(Single) | Embedding vector (1536 dimensions for models such as `text-embedding-ada-002`; the size depends on the embedding model) |
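
One way to tell text chunks from image chunks is to check which parent ID field is populated. This sketch infers that rule from the field descriptions in the table; it is an assumption rather than documented behavior. `chunk_kind` and the sample documents are hypothetical.

```python
def chunk_kind(doc):
    """Classify a document from the auto-created index as 'image', 'text', or 'unknown'."""
    if doc.get("image_snippet_parent_id"):
        return "image"
    if doc.get("snippet_parent_id"):
        return "text"
    return "unknown"


# Made-up documents using the field names from the table
docs = [
    {"uid": "1", "snippet_parent_id": "doc-a", "snippet": "Section text"},
    {"uid": "2", "image_snippet_parent_id": "doc-a", "snippet": "A bar chart of results"},
    {"uid": "3", "snippet": "Orphan chunk"},
]
kinds = [chunk_kind(d) for d in docs]
print(kinds)
```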

### ✅ Can Retrieve vs ❌ Cannot Retrieve

```
✅ Can Retrieve                       ❌ Cannot Retrieve
┌───────────────────────────┐       ┌───────────────────────────┐
│ snippet (image text       │       │ Original image Base64/URL │
│   description)            │       │ Page number in PDF        │
│ blob_url (source doc URL) │       └───────────────────────────┘
└───────────────────────────┘
```

---

## 🔧 Appendix: What If You Need More Control?

### Solution Comparison

| Aspect | Knowledge Source API | Traditional Indexer + Skillset |
|--------|---------------------|-------------------------------|
| **Code Amount** | One API call creates the pipeline | Need to configure 4+ components |
| **Index Schema** | Fixed 6 fields | Fully customizable |
| **Location Metadata** | ❌ Not saved | ✅ Configurable |
| **Original Images** | ❌ Not saved | ✅ Can export |
| **Use Cases** | Rapid prototyping, simple apps | Production systems, complex requirements |

### If You Need Original Images or Location Information

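**Solution 1: Trace Back via blob_url (Recommended)**
```python
# Re-parse the source document behind a search result with Document Intelligence.
from azure.ai.documentintelligence import DocumentIntelligenceClient
blob_url = search_result["blob_url"]  # search_result, di_endpoint, credential: defined by you
client = DocumentIntelligenceClient(endpoint=di_endpoint, credential=credential)
result = client.begin_analyze_document("prebuilt-layout", {"urlSource": blob_url}).result()
```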

**Solution 2: Use Traditional Indexer + Custom Index Schema**

Manually create Index/Skillset for full field control. Reference `03_multimodal_search.ipynb`.

**Solution 3: Use Knowledge Store to Export Original Images**

Configure Knowledge Store projection to Blob Storage in Skillset.
