# Knowledge Source Overview

A **Knowledge Source** in Azure AI Search is a core component of Agentic Retrieval: it defines where the data lives and how it is ingested or queried.

## 📦 Knowledge Source Types

```mermaid
flowchart TB
    subgraph KS["Knowledge Source Types"]
        direction TB
        
        subgraph Indexed["📥 Indexed Type (Index Storage)"]
            SI["SearchIndexKnowledgeSource<br/>Reference existing index"]
            Blob["AzureBlobKnowledgeSource<br/>Auto-ingest from Blob"]
            SP_I["IndexedSharePointKnowledgeSource<br/>Auto-ingest from SharePoint"]
            OL["OneLakeKnowledgeSource<br/>Auto-ingest from OneLake"]
        end
        
        subgraph Remote["🌐 Remote Type (Real-time Query)"]
            SP_R["RemoteSharePointKnowledgeSource<br/>Real-time SharePoint query"]
            Web["WebKnowledgeSource<br/>Real-time Bing query"]
        end
    end
```

| Type | Description | Vector Search | User Permissions | Latency |
|------|-------------|:-------------:|:----------------:|:-------:|
| `SearchIndexKnowledgeSource` | Reference existing Search Index | ✅ | ❌ | ⚡ Low |
| `AzureBlobKnowledgeSource` | Auto-ingest and index from Blob | ✅ | ❌ | ⚡ Low |
| `IndexedSharePointKnowledgeSource` | Auto-ingest and index from SharePoint | ✅ | ⚠️ Optional | ⚡ Low |
| `OneLakeKnowledgeSource` | Auto-ingest and index from OneLake | ✅ | ❌ | ⚡ Low |
| `RemoteSharePointKnowledgeSource` | Real-time SharePoint query | ❌ | ✅ Real-time | 🐢 High |
| `WebKnowledgeSource` | Real-time Bing search query | ❌ | ❌ | 🐢 High |
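
The trade-offs in the table can be encoded as plain data to shortlist candidate types for a given requirement. This is a minimal sketch: `SourceTraits`, `TRAITS`, and `shortlist` are hypothetical names for illustration and are not part of the Azure SDK; the values simply mirror the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceTraits:
    """Hypothetical summary of one row of the comparison table."""
    vector_search: bool
    user_permissions: str  # "none", "optional", or "real-time"
    low_latency: bool


# Values copied from the comparison table (not read from the service).
TRAITS = {
    "SearchIndexKnowledgeSource": SourceTraits(True, "none", True),
    "AzureBlobKnowledgeSource": SourceTraits(True, "none", True),
    "IndexedSharePointKnowledgeSource": SourceTraits(True, "optional", True),
    "OneLakeKnowledgeSource": SourceTraits(True, "none", True),
    "RemoteSharePointKnowledgeSource": SourceTraits(False, "real-time", False),
    "WebKnowledgeSource": SourceTraits(False, "none", False),
}


def shortlist(need_vector_search=False, need_user_permissions=False, need_low_latency=False):
    """Return the type names whose traits satisfy every requested requirement."""
    return [
        name
        for name, traits in TRAITS.items()
        if (not need_vector_search or traits.vector_search)
        and (not need_user_permissions or traits.user_permissions != "none")
        and (not need_low_latency or traits.low_latency)
    ]


# Types that can honor user permissions at all (optional or real-time).
print(shortlist(need_user_permissions=True))
```

Adding `need_low_latency=True` to the same call narrows the list to the indexed SharePoint type, which is the trade-off the next section discusses.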

### 🔐 IndexedSharePointKnowledgeSource ACL Configuration

`IndexedSharePointKnowledgeSource` **supports ACL indexing**, but it is **not enabled by default** (Preview feature):

```python
# Enable ACL indexing
ingestion_parameters = KnowledgeSourceIngestionParameters(
    ingestion_permission_options=["user_ids", "group_ids"],  # Enable ACL
    # ... other parameters
)
```

**Configuration requirements**:
- App permissions require `Files.Read.All` + `Sites.FullControl.All` (or `Sites.Selected`)
- Delegated permissions are not supported
- ACLs are captured only during initial indexing; later permission changes require a manual resync

**Limitations**:
- "Anyone" links and "People in your organization" links are not supported
- SharePoint site groups must be resolvable to Entra Group IDs
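
Conceptually, ACL indexing stores the user and group IDs allowed to read each document, and results are trimmed to documents whose stored IDs match the caller. The following is a plain-Python sketch of that idea only: the field names `user_ids` and `group_ids` follow the option names above, while `IndexedDoc` and `visible_docs` are hypothetical, and the actual filtering is performed by the service rather than by client code.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDoc:
    """Hypothetical stand-in for an indexed document with captured ACLs."""
    doc_id: str
    user_ids: set = field(default_factory=set)
    group_ids: set = field(default_factory=set)


def visible_docs(docs, caller_user_id, caller_group_ids):
    """Return IDs of documents the caller may see, based on the ACLs stored at index time."""
    groups = set(caller_group_ids)
    return [
        doc.doc_id
        for doc in docs
        if caller_user_id in doc.user_ids or groups & doc.group_ids
    ]


docs = [
    IndexedDoc("handbook", group_ids={"group-all-staff"}),
    IndexedDoc("salary-bands", user_ids={"user-hr-1"}),
]

# The stored ACLs are a snapshot: a user added to a document in SharePoint
# after indexing stays invisible here until the content is resynced.
print(visible_docs(docs, "user-eng-7", ["group-all-staff"]))
```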

> 💡 **Recommendation**: For the complete SharePoint permission model and sensitivity label support, use `RemoteSharePointKnowledgeSource` (real-time query via Copilot Retrieval API).

---

## 🔑 Three Modes

| Mode | Type | Auto-create Resources | Deletion Behavior | Use Case |
|------|------|:---------------------:|-------------------|----------|
| **Indexed Mode** | `AzureBlobKnowledgeSource`<br/>`IndexedSharePointKnowledgeSource`<br/>`OneLakeKnowledgeSource` | ✅ Auto-creates<br/>Index + Indexer + Skillset + DataSource | ⚠️ **Cascade deletes all resources** | Quick start, fully managed |
| **Native Mode** | `SearchIndexKnowledgeSource` | ❌ References existing Index | ✅ **Only deletes KS itself** | Full control, manual management |
| **Remote Mode** | `RemoteSharePointKnowledgeSource`<br/>`WebKnowledgeSource` | ❌ No index, real-time query | ✅ **Only deletes KS itself** | Real-time data, user permissions |

### ⚠️ Important Note

- **Indexed Mode**: Deleting the Knowledge Source **also deletes** all auto-created resources (Index, Indexer, Skillset, and DataSource).
- **Native / Remote Mode**: Deleting the Knowledge Source **does not affect** other resources.
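
Because the cascade is easy to trigger by accident, a small guard in front of any delete call can make the consequence explicit. This is a sketch under the assumptions stated in the table above; `resources_removed_on_delete` and `confirm_delete` are hypothetical helpers, not SDK functions, and they do not contact the service.

```python
# Resource kinds that Indexed Mode auto-creates, per the modes table.
AUTO_CREATED = ("Index", "Indexer", "Skillset", "DataSource")

# Knowledge source types that run in Indexed Mode.
INDEXED_TYPES = {
    "AzureBlobKnowledgeSource",
    "IndexedSharePointKnowledgeSource",
    "OneLakeKnowledgeSource",
}


def resources_removed_on_delete(type_name):
    """Return the auto-created resource kinds that deleting this type would also remove."""
    return AUTO_CREATED if type_name in INDEXED_TYPES else ()


def confirm_delete(type_name, acknowledged_cascade=False):
    """Raise unless the caller has acknowledged a cascading delete; return what is removed."""
    removed = resources_removed_on_delete(type_name)
    if removed and not acknowledged_cascade:
        raise RuntimeError(
            f"Deleting a {type_name} also deletes: {', '.join(removed)}. "
            "Pass acknowledged_cascade=True to proceed."
        )
    return removed


print(confirm_delete("SearchIndexKnowledgeSource"))
print(confirm_delete("AzureBlobKnowledgeSource", acknowledged_cascade=True))
```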

---

## 📑 Table of Contents

| Notebook | Type | Description |
|----------|------|-------------|
| [03a_search_index_knowledge_sources.ipynb](./03a_search_index_knowledge_sources.ipynb) | Native | Search Index Knowledge Source |
| [03b_blob_knowledge_source.ipynb](./03b_blob_knowledge_source.ipynb) | Indexed | Azure Blob Knowledge Source |
| [03c_web_knowledge_source.ipynb](./03c_web_knowledge_source.ipynb) | Remote | Web/Bing Knowledge Source |
| [03d_sharepoint_knowledge_source.ipynb](./03d_sharepoint_knowledge_source.ipynb) | - | SharePoint Knowledge Source Overview |

---

## 📚 Reference Documentation

- [Knowledge Source Overview](https://learn.microsoft.com/en-us/azure/search/agentic-knowledge-source-overview)
- [Search Index Knowledge Source](https://learn.microsoft.com/en-us/azure/search/agentic-knowledge-source-how-to-search-index)
- [Blob Knowledge Source](https://learn.microsoft.com/en-us/azure/search/agentic-knowledge-source-how-to-blob)
- [SharePoint Knowledge Sources](https://learn.microsoft.com/en-us/azure/search/agentic-knowledge-source-how-to-sharepoint-indexed)
- [Web Knowledge Source](https://learn.microsoft.com/en-us/azure/search/agentic-knowledge-source-how-to-web)

---

## Install Dependencies

```python
# Install required packages
%pip install azure-search-documents==11.7.0b2 azure-identity python-dotenv -qU
```

## Initialize Configuration

```python
import os
from dotenv import load_dotenv
from azure.identity import AzureCliCredential, get_bearer_token_provider
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient

# Load environment variables from a local .env file, if present
load_dotenv()

# Configuration
search_endpoint = os.getenv("AZURE_SEARCH_ENDPOINT")
search_api_key = os.getenv("AZURE_SEARCH_API_KEY")
azure_openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
# Model and deployment defaults are kept consistent with each other
embedding_model = os.getenv("AZURE_OPENAI_EMBEDDING_MODEL", "text-embedding-3-large")
embedding_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-3-large")
gpt_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-mini")

# Azure AI Search uses API Key authentication
index_client = SearchIndexClient(
    endpoint=search_endpoint,
    credential=AzureKeyCredential(search_api_key)
)

# Azure OpenAI uses Token Provider authentication (API Key is disabled)
credential = AzureCliCredential()
token_provider = get_bearer_token_provider(
    credential,
    "https://cognitiveservices.azure.com/.default"
)

print("✅ Initialization successful")
print(f"   Search Endpoint: {search_endpoint}")
print(f"   OpenAI Endpoint: {azure_openai_endpoint}")
```
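
If a required variable is missing, `os.getenv` returns `None` and the failure only surfaces later inside a client call. A minimal sketch of an early check follows; `REQUIRED_VARS` and `missing_settings` are hypothetical names, and the list of required variables is an assumption based on the settings read above without a default.

```python
import os

# Variables the configuration cell reads without a default value.
REQUIRED_VARS = (
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
)


def missing_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_settings()
if missing:
    print("Missing settings: " + ", ".join(missing))
else:
    print("All required settings are present")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.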

---

## 📦 Create a Knowledge Base with Multiple Knowledge Sources

A Knowledge Base can contain multiple Knowledge Sources of different types.

Azure AI Search automatically merges results from different sources and provides unified ranking.

```python
from azure.search.documents.indexes.models import (
    KnowledgeBase,
    KnowledgeSourceReference,
    KnowledgeBaseAzureOpenAIModel,
    AzureOpenAIVectorizerParameters,
    KnowledgeRetrievalOutputMode,
    KnowledgeRetrievalLowReasoningEffort
)

# Create a Knowledge Base with multiple sources
multi_source_kb = KnowledgeBase(
    name="multi-source-knowledge-base",
    description="Knowledge Base containing multiple data sources",
    
    # Reference multiple Knowledge Sources (each must already exist on the service)
    knowledge_sources=[
        KnowledgeSourceReference(name="my-search-index-ks"),      # Existing index
        KnowledgeSourceReference(name="my-blob-ks"),              # Blob storage
        KnowledgeSourceReference(name="my-remote-sharepoint-ks"), # SharePoint (real-time)
        KnowledgeSourceReference(name="my-web-ks")                # Web/Bing (real-time)
    ],
    
    # Retrieval instructions: tell the LLM how to select data sources
    retrieval_instructions="""
    - For internal company documents, use SharePoint Knowledge Source
    - For product documentation, use Blob Storage Knowledge Source
    - For latest technical information, use Web Knowledge Source
    - For historical data, use Search Index Knowledge Source
    """,
    
    # Answer instructions
    answer_instructions="Provide concise, accurate answers and cite information sources",
    
    # Output mode
    output_mode=KnowledgeRetrievalOutputMode.ANSWER_SYNTHESIS,
    
    # Configure LLM
    models=[
        KnowledgeBaseAzureOpenAIModel(
            azure_open_ai_parameters=AzureOpenAIVectorizerParameters(
                resource_url=azure_openai_endpoint,
                deployment_name=gpt_deployment,
                # Assumes the deployment is named after the model it serves
                model_name=gpt_deployment
            )
        )
    ],
    
    # Reasoning Effort
    retrieval_reasoning_effort=KnowledgeRetrievalLowReasoningEffort()
)

# Uncomment to create
# index_client.create_or_update_knowledge_base(knowledge_base=multi_source_kb)

print("📝 Multi-source Knowledge Base configuration example is ready")
```
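
Since the Knowledge Base only references sources by name, a misspelled or not-yet-created name is a likely cause of a failed create call. The sketch below checks the referenced names against a list of known names before creating anything; `missing_sources` is a hypothetical helper, and the list of existing names is assumed to come from wherever you track your knowledge sources (for example, a listing call on the index client, if your SDK version provides one).

```python
def missing_sources(referenced_names, existing_names):
    """Return referenced knowledge source names that are not in the existing set, in order."""
    existing = set(existing_names)
    return [name for name in referenced_names if name not in existing]


# Names referenced by the Knowledge Base configuration above.
referenced = [
    "my-search-index-ks",
    "my-blob-ks",
    "my-remote-sharepoint-ks",
    "my-web-ks",
]

# Illustrative placeholder: names assumed to exist on the service already.
existing = ["my-search-index-ks", "my-blob-ks"]

print(missing_sources(referenced, existing))
```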

---

## 🔗 Related Resources

- [Knowledge Source Overview](https://learn.microsoft.com/en-us/azure/search/agentic-knowledge-source-overview)
- [Create Search Index Knowledge Source](https://learn.microsoft.com/en-us/azure/search/agentic-knowledge-source-how-to-search-index)
- [Create Blob Knowledge Source](https://learn.microsoft.com/en-us/azure/search/agentic-knowledge-source-how-to-blob)
- [Create OneLake Knowledge Source](https://learn.microsoft.com/en-us/azure/search/agentic-knowledge-source-how-to-onelake)
- [Create Indexed SharePoint Knowledge Source](https://learn.microsoft.com/en-us/azure/search/agentic-knowledge-source-how-to-sharepoint-indexed)
- [Create Remote SharePoint Knowledge Source](https://learn.microsoft.com/en-us/azure/search/agentic-knowledge-source-how-to-sharepoint-remote)
- [Create Web Knowledge Source](https://learn.microsoft.com/en-us/azure/search/agentic-knowledge-source-how-to-web)