# SharePoint Indexer Setup Manual

This document explains how to manually create a SharePoint Indexer for Azure AI Search.

> This is the **manual configuration method**. For the automated pipeline solution, see `03f_indexed_sharepoint_ks.ipynb`.

## Prerequisites ✅

**Before starting, make sure the following steps have been completed:**

| Requirement | Done? |
|-------------|-------|
| Created App Registration in Entra ID | ⬜ |
| Added `Sites.Read.All` permission (Application type) | ⬜ |
| Global Admin granted Admin Consent | ⬜ |
| Saved Client ID and Client Secret | ⬜ |

> 💡 For permission configuration details, see `03d_sharepoint_knowledge_source.ipynb`.

## Main Steps 📋

1. Create Data Source (connect SharePoint)
2. Create Index (define fields)
3. Create Indexer (orchestrate synchronization)
4. Verify results

## 🔧 Environment Configuration

In [None]:
%load_ext dotenv
%dotenv

import os
import requests
import json

# Azure AI Search Configuration
search_endpoint = os.environ.get("AZURE_SEARCH_ENDPOINT")
search_api_key = os.environ.get("AZURE_SEARCH_API_KEY")

# SharePoint App Registration
sp_app_id = os.environ.get("SP_APP_ID")
sp_app_secret = os.environ.get("SP_APP_SECRET")
sp_tenant_id = os.environ.get("SP_TENANT_ID")

# SharePoint Site
sharepoint_site = "https://your-tenant.sharepoint.com/sites/your-site"

# Azure OpenAI Configuration
azure_openai_endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT")
gpt_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-mini")

print(f"✅ Azure AI Search: {search_endpoint}")
print(f"✅ SharePoint Site: {sharepoint_site}")
print(f"✅ App ID: {sp_app_id}")
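
Because `os.environ.get` returns `None` for an unset variable, a missing setting would only surface later as a failed request. The following is a minimal sketch of an early check, not part of the original notebook: the helper name `missing_settings` is hypothetical, and the variable names are the ones read in the cell above.

```python
import os

# Names read by the configuration cell above.
REQUIRED_SETTINGS = [
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_API_KEY",
    "SP_APP_ID",
    "SP_APP_SECRET",
    "SP_TENANT_ID",
]


def missing_settings(env, names=REQUIRED_SETTINGS):
    """Return the names that are unset or blank in the mapping `env`."""
    return [name for name in names if not (env.get(name) or "").strip()]


missing = missing_settings(os.environ)
if missing:
    print(f"Missing settings: {', '.join(missing)}")
else:
    print("All required settings are present.")
```

Running this right after loading the `.env` file makes a typo in a variable name visible before any request is sent.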

## 1️⃣ Create Data Source

Create a Data Source connected to SharePoint. The Data Source stores the SharePoint connection information, including:
- SharePoint tenant and site URL
- App Registration Client ID and Secret

In [None]:
# Data Source name
datasource_name = "sharepoint-ds-demo"

# Build connection string
# Format: SharePointOnlineEndpoint=<site-url>;ApplicationId=<app-id>;ApplicationSecret=<secret>;TenantId=<tenant-id>
connection_string = f"SharePointOnlineEndpoint={sharepoint_site};ApplicationId={sp_app_id};ApplicationSecret={sp_app_secret};TenantId={sp_tenant_id}"

# Data Source definition
datasource_payload = {
    "name": datasource_name,
    "type": "sharepoint",
    "credentials": {
        "connectionString": connection_string
    },
    "container": {
        "name": "defaultSiteLibrary"  # Default document library
    }
}

# Create Data Source
headers = {
    "Content-Type": "application/json",
    "api-key": search_api_key
}

response = requests.put(
    f"{search_endpoint}/datasources/{datasource_name}?api-version=2024-11-01-preview",
    headers=headers,
    json=datasource_payload
)

if response.status_code in [200, 201]:
    print(f"✅ Data Source '{datasource_name}' created successfully!")
else:
    print(f"❌ Creation failed: {response.status_code}")
    print(response.text)
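
The connection string embeds the client secret, so it is worth assembling it in one place and never printing it verbatim. The sketch below is an illustration under that assumption: `build_connection_string` and `mask_secret` are hypothetical helper names, while the four keys are the ones used in the cell above.

```python
def build_connection_string(site_url, app_id, app_secret, tenant_id):
    """Assemble the connection string with the same keys as the cell above."""
    parts = {
        "SharePointOnlineEndpoint": site_url,
        "ApplicationId": app_id,
        "ApplicationSecret": app_secret,
        "TenantId": tenant_id,
    }
    # Fail early instead of sending a string that contains "None" or "".
    empty = [key for key, value in parts.items() if not value]
    if empty:
        raise ValueError(f"Missing connection string values: {', '.join(empty)}")
    return ";".join(f"{key}={value}" for key, value in parts.items())


def mask_secret(connection_string):
    """Return a printable copy with the ApplicationSecret value replaced by ***."""
    masked = []
    for part in connection_string.split(";"):
        key, _, _ = part.partition("=")
        masked.append(f"{key}=***" if key == "ApplicationSecret" else part)
    return ";".join(masked)
```

With these helpers, `print(mask_secret(connection_string))` can be used for debugging. The masking splits on `;`, so it assumes the secret itself contains no semicolon.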

## 2️⃣ Create Index

Create the Index that stores the indexed content. After the Indexer finishes running, you can query this Index.

In [None]:
# Index name
index_name = "sharepoint-index-demo"

# Index definition
index_payload = {
    "name": index_name,
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
        {"name": "metadata_spo_item_name", "type": "Edm.String", "searchable": True, "filterable": True},
        {"name": "metadata_spo_item_path", "type": "Edm.String", "searchable": False, "filterable": True},
        {"name": "metadata_spo_item_content_type", "type": "Edm.String", "searchable": False, "filterable": True},
        {"name": "metadata_spo_item_last_modified", "type": "Edm.DateTimeOffset", "searchable": False, "filterable": True, "sortable": True},
        {"name": "metadata_spo_item_size", "type": "Edm.Int64", "searchable": False, "filterable": True},
        {"name": "content", "type": "Edm.String", "searchable": True, "analyzer": "standard.lucene"}
    ]
}

# Create Index
response = requests.put(
    f"{search_endpoint}/indexes/{index_name}?api-version=2024-11-01-preview",
    headers=headers,
    json=index_payload
)

if response.status_code in [200, 201]:
    print(f"✅ Index '{index_name}' created successfully!")
else:
    print(f"❌ Creation failed: {response.status_code}")
    print(response.text)
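
An index definition needs exactly one key field, and that field must be of type `Edm.String`. A small local check can catch a malformed `fields` list before the request is sent. This is a sketch only: `check_index_fields` is a hypothetical helper, and it covers just these rules plus duplicate names, not the full schema validation the service performs.

```python
def check_index_fields(fields):
    """Return a list of problems found in an index `fields` list (empty if none)."""
    problems = []

    # Field names must be unique within an index.
    names = [field.get("name") for field in fields]
    duplicates = sorted({n for n in names if n is not None and names.count(n) > 1})
    if duplicates:
        problems.append(f"duplicate field names: {', '.join(duplicates)}")

    # Exactly one field carries "key": True, and it must be a string.
    keys = [field for field in fields if field.get("key")]
    if len(keys) != 1:
        problems.append(f"expected exactly one key field, found {len(keys)}")
    for field in keys:
        if field.get("type") != "Edm.String":
            problems.append(f"key field '{field.get('name')}' must be Edm.String")

    return problems
```

Calling `check_index_fields(index_payload["fields"])` on the payload above should return an empty list.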

## 3️⃣ Create Indexer

Create an Indexer that links the Data Source to the Index and controls the synchronization schedule.

In [None]:
# Indexer name
indexer_name = "sharepoint-index-demoer"

# Indexer definition (with Schedule auto-update configuration)
indexer_payload = {
    "name": indexer_name,
    "dataSourceName": datasource_name,
    "targetIndexName": index_name,
    "parameters": {
        "configuration": {
            "indexedFileNameExtensions": ".pdf,.docx,.pptx,.xlsx,.txt,.md",
            "excludedFileNameExtensions": ".png,.jpg,.jpeg,.gif,.bmp",
            "dataToExtract": "contentAndMetadata"
        }
    },
    # 🔄 Auto-update Schedule - runs every 2 hours
    # Format: PT{n}H = n hours, PT{n}M = n minutes (minimum 5 minutes)
    "schedule": {
        "interval": "PT2H",  # Every 2 hours
        # "startTime": "2025-01-01T00:00:00Z"  # Optional: specify start time
    },
    "fieldMappings": [
        {"sourceFieldName": "metadata_spo_item_name", "targetFieldName": "metadata_spo_item_name"},
        {"sourceFieldName": "metadata_spo_item_path", "targetFieldName": "metadata_spo_item_path"},
        {"sourceFieldName": "metadata_spo_item_content_type", "targetFieldName": "metadata_spo_item_content_type"},
        {"sourceFieldName": "metadata_spo_item_last_modified", "targetFieldName": "metadata_spo_item_last_modified"},
        {"sourceFieldName": "metadata_spo_item_size", "targetFieldName": "metadata_spo_item_size"}
    ]
}

# Create Indexer
response = requests.put(
    f"{search_endpoint}/indexers/{indexer_name}?api-version=2024-11-01-preview",
    headers=headers,
    json=indexer_payload
)

if response.status_code in [200, 201]:
    print(f"✅ Indexer '{indexer_name}' created successfully!")
    print("\n📅 Schedule configuration:")
    print("   Run interval: every 2 hours")
    print("   Incremental indexing: SharePoint built-in change detection (based on LastModified)")
    print("\n🔄 Indexer will automatically start running, fetching documents from SharePoint...")
else:
    print(f"❌ Creation failed: {response.status_code}")
    print(response.text)
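
The schedule interval uses the duration notation described in the payload comments (`PT{n}H`, `PT{n}M`, minimum 5 minutes). The sketch below shows one way to validate such a value locally before creating the Indexer. `parse_interval` is a hypothetical helper that only understands the hour and minute forms used in this notebook; other duration forms are rejected rather than interpreted.

```python
import re
from datetime import timedelta

# Matches PT{n}H, PT{n}M, and the combined PT{n}H{m}M form.
_INTERVAL = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?$")
MIN_INTERVAL = timedelta(minutes=5)


def parse_interval(value):
    """Convert an interval such as 'PT2H' to a timedelta, enforcing the minimum."""
    match = _INTERVAL.match(value)
    if not match or not any(match.groups()):
        raise ValueError(f"Unsupported interval: {value!r}")
    hours, minutes = (int(group) if group else 0 for group in match.groups())
    interval = timedelta(hours=hours, minutes=minutes)
    if interval < MIN_INTERVAL:
        raise ValueError(f"Interval {value!r} is below the 5-minute minimum")
    return interval
```

For the payload above, `parse_interval("PT2H")` yields a two-hour `timedelta`, which also makes it easy to print the schedule instead of hard-coding "every 2 hours".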

## 4️⃣ Check Indexer Status

Check whether the Indexer ran successfully.

In [None]:
import time

# Wait a few seconds for Indexer to start running
print("⏳ Waiting for Indexer to run...")
time.sleep(5)

# Get Indexer status
response = requests.get(
    f"{search_endpoint}/indexers/{indexer_name}/status?api-version=2024-11-01-preview",
    headers=headers
)

if response.status_code == 200:
    status = response.json()
    # lastResult may be null if the Indexer has not completed a run yet
    last_result = status.get("lastResult") or {}
    print("📊 Indexer status:")
    print(f"   Status: {status.get('status', 'N/A')}")
    print(f"   Last run status: {last_result.get('status', 'N/A')}")
    print(f"   Documents indexed: {last_result.get('itemsProcessed', 0)}")
    print(f"   Documents failed: {last_result.get('itemsFailed', 0)}")

    if last_result.get("errors"):
        print("\n⚠️ Error messages:")
        for err in last_result["errors"][:3]:
            # Item-level errors carry their text in 'errorMessage'; fall back to 'message'
            print(f"   - {err.get('errorMessage') or err.get('message', 'Unknown error')}")
else:
    print(f"❌ Failed to get status: {response.status_code}")
    print(response.text)
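
A fixed five-second sleep is often too short for a first run against a real document library. As an alternative, the status endpoint can be polled until the last run is no longer `inProgress`. The sketch below is illustrative: `wait_for_indexer` is a hypothetical helper, and it takes the fetch and sleep functions as parameters so it can be exercised without a live service.

```python
import time


def wait_for_indexer(fetch_status, timeout=300, poll_interval=10, sleep=time.sleep):
    """Poll until the last run leaves 'inProgress' or `timeout` seconds pass.

    `fetch_status` is any zero-argument callable that returns the parsed JSON
    of the indexer status endpoint.
    """
    waited = 0
    while True:
        status = fetch_status()
        last_result = status.get("lastResult") or {}
        state = last_result.get("status")
        # Any state other than "no result yet" or "inProgress" is treated as final.
        if state is not None and state != "inProgress":
            return last_result
        if waited >= timeout:
            raise TimeoutError(f"Indexer still running after {waited} seconds")
        sleep(poll_interval)
        waited += poll_interval
```

In this notebook the helper could be called as `wait_for_indexer(lambda: requests.get(status_url, headers=headers).json())`, where `status_url` stands for the status URL built in the cell above.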

### Full Status Details
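
The status response contains more than the last result, including an `executionHistory` list of recent runs. A minimal sketch for inspecting it follows; `summarize_history` and `dump_status` are hypothetical helpers that operate on the parsed `status` dictionary from the previous cell.

```python
import json


def summarize_history(status, limit=5):
    """Return one line per recent run from the `executionHistory` list."""
    lines = []
    for run in (status.get("executionHistory") or [])[:limit]:
        lines.append(
            f"{run.get('startTime', 'N/A')}  {run.get('status', 'N/A')}  "
            f"processed={run.get('itemsProcessed', 0)} failed={run.get('itemsFailed', 0)}"
        )
    return lines


def dump_status(status):
    """Pretty-print the whole status document for inspection."""
    return json.dumps(status, indent=2, ensure_ascii=False)
```

Printing `dump_status(status)` shows every field the service returned, while `summarize_history(status)` gives a compact view of the most recent runs.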

## 5️⃣ Next Steps

After the Indexer runs successfully, you can:

1. **Create a Search Index Knowledge Source**: Connect this Index to a Knowledge Base (see `03a_search_index_knowledge_sources.ipynb`)
2. **Use it directly with an Agent**: Integrate directly using the Agent SDK

---

📖 **Reference**:
- [SharePoint Indexer Documentation](https://learn.microsoft.com/en-us/azure/search/search-howto-index-sharepoint-online)
- [Configure App Registration for SharePoint](https://learn.microsoft.com/en-us/azure/search/search-howto-index-sharepoint-online#step-1-create-a-microsoft-entra-application)

## 🧹 Cleanup

In [None]:
# Manually trigger Indexer run
response = requests.post(
    f"{search_endpoint}/indexers/{indexer_name}/run?api-version=2024-11-01-preview",
    headers=headers
)

if response.status_code == 202:
    print(f"✅ Indexer '{indexer_name}' triggered!")
    print("\n⏳ Please wait a few minutes and then check the status...")
else:
    print(f"Status: {response.status_code}")
    print(response.text)

### Delete Indexer
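
A minimal sketch of the cleanup, assuming the same endpoint, API key, and API version used in the cells above. `delete_search_resources` is a hypothetical helper; the HTTP function is passed in as the `delete` argument (for example `requests.delete`) so the logic can be tried without a live service.

```python
def delete_search_resources(delete, endpoint, api_key, names,
                            api_version="2024-11-01-preview"):
    """Delete the indexer, index, and data source; return {url: status_code}.

    `delete` is an HTTP callable with the signature of `requests.delete`.
    `names` maps a collection ("indexers", "indexes", "datasources") to the
    resource name to remove; collections without a name are skipped.
    """
    headers = {"api-key": api_key}
    results = {}
    # Remove the indexer first so nothing is still running against the others.
    for collection in ("indexers", "indexes", "datasources"):
        name = names.get(collection)
        if not name:
            continue
        url = f"{endpoint}/{collection}/{name}?api-version={api_version}"
        response = delete(url, headers=headers)
        # 204 means deleted; 404 means the resource was already gone.
        results[url] = response.status_code
    return results
```

In this notebook the call would look like `delete_search_resources(requests.delete, search_endpoint, search_api_key, {"indexers": indexer_name, "indexes": index_name, "datasources": datasource_name})`. Deleting the index removes all indexed content, so pass only `"indexers"` to keep the Index and Data Source.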
