# Remote SharePoint Knowledge Source

Use **Remote SharePoint Knowledge Source** to query SharePoint in real time. Results are limited to the documents the querying user has permission to access.

## 🔑 Key Features

| Feature | Description |
|---------|-------------|
| **Query Method** | Real-time query, no pre-indexing |
| **Permission Model** | Uses user token, respects user permissions |
| **Vector Search** | ❌ Not supported (uses SharePoint Search API) |
| **Latency** | Higher (real-time SharePoint calls) |
| **Cost** | ⚠️ Requires M365 Copilot License ($30/user/month) |

## 📋 Table of Contents

| Step | Description | Jump |
|------|-------------|------|
| 0️⃣ Environment Config | Configure Azure AI Search, Azure OpenAI | [View](#env-config) |
| 1️⃣ Create Knowledge Source | Define filters and return fields | [View](#create-ks) |
| 2️⃣ Create Knowledge Base | Attach the knowledge source and an Azure OpenAI model | [View](#create-kb) |
| 3️⃣ Query Knowledge Base | Pass user token for query | [View](#query-kb) |
| 4️⃣ View References | View returned document references | [View](#references) |
| 🧹 Delete Resources | Cleanup resources (optional) | [View](#cleanup) |

---

## Comparison with Other SharePoint Solutions

| Solution | Notebook | Description | Permission Requirements |
|----------|----------|-------------|------------------------|
| **Remote SP KS** | `03g_sharepoint_remote_ks.ipynb` (this file) | Real-time query, user permissions | M365 Copilot License |
| **Indexed SP KS** | `03f_sharepoint_indexed_ks.ipynb` | Pre-indexed, auto pipeline | Global Admin + Sites.Read.All |
| **Manual Indexer** | `03e_sharepoint_indexer.ipynb` | Pre-indexed, full control | Global Admin + Sites.Read.All |

## ⚠️ Prerequisites

- The querying user must have an **M365 Copilot License**
- The querying user must have **permission to access the target SharePoint documents**

---

<a id="env-config"></a>
## 0️⃣ Environment Configuration

In [None]:
%load_ext dotenv
%dotenv

import os
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents.indexes import SearchIndexClient
from azure.core.credentials import AzureKeyCredential

# Azure AI Search Configuration
search_endpoint = os.environ.get("AZURE_SEARCH_ENDPOINT")
search_api_key = os.environ.get("AZURE_SEARCH_API_KEY")

# Azure OpenAI Configuration
azure_openai_endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT")
gpt_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-mini")

# Authentication
credential = DefaultAzureCredential()
index_client = SearchIndexClient(endpoint=search_endpoint, credential=AzureKeyCredential(search_api_key))

print(f"✅ Azure AI Search: {search_endpoint}")
print(f"✅ Azure OpenAI: {azure_openai_endpoint}")
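
The cell above reads its settings with `os.environ.get`, which returns `None` for anything that is unset, so a missing value only surfaces later when a client is constructed. A minimal sketch of an up-front check follows. The variable names are the ones used above; the `missing_settings` helper is hypothetical and not part of any SDK.

```python
import os

# Variable names match the configuration cell; the helper itself is an assumption.
REQUIRED_VARS = ("AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_API_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_settings(env=os.environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank."""
    return [name for name in required if not (env.get(name) or "").strip()]


missing = missing_settings()
if missing:
    print(f"Missing settings: {', '.join(missing)}")
else:
    print("All required settings are present")
```

Running a check like this right after `%dotenv` makes a misnamed or empty `.env` entry visible before any Azure call is attempted.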

<a id="create-ks"></a>
## 1️⃣ Create Remote SharePoint Knowledge Source

Remote mode **does not require a pre-configured SharePoint connection**. You only define the filters and the metadata fields to return.

SharePoint access is determined by the **user token** passed at query time.

In [None]:
from azure.search.documents.indexes.models import (
    RemoteSharePointKnowledgeSource,
    RemoteSharePointKnowledgeSourceParameters
)

# Knowledge Source name
ks_name = "demo-sharepoint-remote-ks"

# Create Remote SharePoint Knowledge Source
remote_sp_ks = RemoteSharePointKnowledgeSource(
    name=ks_name,
    description="SharePoint Remote mode - real-time query using user permissions",
    remote_share_point_parameters=RemoteSharePointKnowledgeSourceParameters(
        # Filter expression (SharePoint Search syntax)
        # Example: only search specific file types
        filter_expression="filetype:docx OR filetype:pdf OR filetype:pptx",
        
        # Metadata fields to return
        resource_metadata=["Author", "Title", "ModifiedBy", "LastModifiedTime"],
        
        # Container type ID (optional, for SharePoint Embedded)
        container_type_id=None
    )
)

# Create Knowledge Source
index_client.create_or_update_knowledge_source(knowledge_source=remote_sp_ks)
print(f"✅ Knowledge Source '{ks_name}' created successfully!")
print(f"\n📋 Configuration:")
print(f"   - Filter: docx, pdf, pptx files")
print(f"   - Return fields: Author, Title, ModifiedBy, LastModifiedTime")
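
The `filter_expression` above is written by hand as a chain of `filetype:` terms joined with `OR`. A small sketch of building that string from a list of extensions is shown here. `build_filetype_filter` is a hypothetical helper, and it only covers the `filetype:` pattern used in this notebook, not SharePoint Search syntax in general.

```python
def build_filetype_filter(extensions):
    """Join file extensions into an expression like 'filetype:docx OR filetype:pdf'."""
    cleaned = []
    for ext in extensions:
        ext = ext.strip().lstrip(".").lower()  # accept ".PDF", " pdf ", "pdf"
        if ext and ext not in cleaned:         # drop blanks and duplicates
            cleaned.append(ext)
    if not cleaned:
        raise ValueError("at least one file extension is required")
    return " OR ".join(f"filetype:{ext}" for ext in cleaned)


print(build_filetype_filter(["docx", ".PDF", "pptx"]))
# prints: filetype:docx OR filetype:pdf OR filetype:pptx
```

The result can be passed as `filter_expression` in place of the literal string, which keeps the list of file types in one place if it changes later.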

<a id="create-kb"></a>
## 2️⃣ Create Knowledge Base

In [None]:
from azure.search.documents.indexes.models import (
    KnowledgeBase,
    KnowledgeSourceReference,
    KnowledgeBaseAzureOpenAIModel,
    AzureOpenAIVectorizerParameters,
    KnowledgeRetrievalOutputMode,
    KnowledgeRetrievalLowReasoningEffort
)

# Knowledge Base name
kb_name = "demo-sharepoint-remote-kb"

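# Azure OpenAI parameters
# Note: model_name expects the model's name (for example "gpt-4o-mini"), not the
# deployment name. Reusing gpt_deployment here works only when the deployment is
# named after its model.
aoai_params = AzureOpenAIVectorizerParameters(
    resource_url=azure_openai_endpoint,
    deployment_name=gpt_deployment,
    model_name=gpt_deployment
)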

# Create Knowledge Base
kb = KnowledgeBase(
    name=kb_name,
    description="SharePoint Knowledge Base - Remote mode (real-time query user-accessible documents)",
    
    knowledge_sources=[
        KnowledgeSourceReference(name=ks_name)
    ],
    
    retrieval_instructions="Use this knowledge source to query SharePoint documents the user has permission to access.",
    answer_instructions="Provide accurate answers based on SharePoint documents. If the user doesn't have access to relevant documents, please explain.",
    
    output_mode=KnowledgeRetrievalOutputMode.ANSWER_SYNTHESIS,
    
    models=[
        KnowledgeBaseAzureOpenAIModel(azure_open_ai_parameters=aoai_params)
    ],
    
    retrieval_reasoning_effort=KnowledgeRetrievalLowReasoningEffort()
)

index_client.create_or_update_knowledge_base(knowledge_base=kb)
print(f"✅ Knowledge Base '{kb_name}' created successfully!")

<a id="query-kb"></a>
## 3️⃣ Query SharePoint Documents

**⚠️ Key Point**: When querying Remote SharePoint, you must pass the user's access token in the `x_ms_query_source_authorization` parameter. Azure AI Search then queries SharePoint on behalf of that user (On-Behalf-Of).

Users can only retrieve documents **they have permission to access**.
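
Because the query depends entirely on the user token, it can help to confirm the token's audience and expiry before sending the request. The sketch below assumes the token is a JWT (Microsoft Entra access tokens usually are, though clients are generally meant to treat them as opaque) and reads the standard `aud` and `exp` claims without verifying the signature. Both helper functions are hypothetical, and the demo token is built locally rather than obtained from a real credential.

```python
import base64
import json
import time


def decode_jwt_claims(token):
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(claims, now=None):
    """Return seconds until the 'exp' claim; negative if already expired."""
    now = time.time() if now is None else now
    return claims["exp"] - now


def _segment(obj):
    """Encode a dict the way JWT segments are encoded (demo use only)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Locally built, unsigned demo token with made-up claims.
demo_token = ".".join([
    _segment({"alg": "none"}),
    _segment({"aud": "https://search.azure.com", "exp": 2000}),
    "sig",
])
claims = decode_jwt_claims(demo_token)
print(claims["aud"], seconds_until_expiry(claims, now=1000))
# prints: https://search.azure.com 1000
```

This is for debugging only. It does not validate the token, and decoded claims should not be logged in shared environments.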

In [None]:
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient
from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseRetrievalRequest,
    KnowledgeBaseMessage,
    KnowledgeBaseMessageTextContent,
    RemoteSharePointKnowledgeSourceParams
)

# 🔑 Get user's access token
# This token will be passed to SharePoint to verify user permissions
user_token_provider = get_bearer_token_provider(credential, "https://search.azure.com/.default")
user_token = user_token_provider()

print(f"✅ Got user access token")
print(f"   Token prefix: {user_token[:50]}...")

In [None]:
# Create Knowledge Base retrieval client
kb_client = KnowledgeBaseRetrievalClient(
    endpoint=search_endpoint,
    knowledge_base_name=kb_name,
    credential=credential
)

# Define query question
# 💡 Tip: Remote SharePoint uses SharePoint Search API
#    - Searching file names/titles works well
#    - Searching PDF content requires SharePoint to have indexed that content first
question = "project plan"  # Modify to what you want to search

# Create retrieval request
request = KnowledgeBaseRetrievalRequest(
    include_activity=True,
    messages=[
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text=question)]
        )
    ],
    knowledge_source_params=[
        RemoteSharePointKnowledgeSourceParams(
            knowledge_source_name=ks_name,
            include_references=True,
            include_reference_source_data=True
        )
    ]
)

print(f"🔍 Querying: {question}")
print("=" * 60)

# Execute query - 🔑 Key: pass user token!
result = kb_client.retrieve(
    retrieval_request=request,
    x_ms_query_source_authorization=user_token  # ⚠️ Must pass user token
)

# Display answer
print("\n📝 Answer:")
print("-" * 40)
for resp in result.response:
    for content in resp.content:
        print(content.text)

# Display activity log
if result.activity:
    print("\n📊 Activity log:")
    for act in result.activity:
        print(f"   {act.as_dict()}")

<a id="references"></a>
## 4️⃣ View Reference Sources

In [None]:
# Display reference sources
if result.references:
    print("📚 Referenced SharePoint documents:")
    print("-" * 40)
    for i, ref in enumerate(result.references, 1):
        ref_dict = ref.as_dict()
        print(f"\n  [{i}] {ref_dict.get('title', 'N/A')}")
        if 'url' in ref_dict:
            print(f"      URL: {ref_dict['url']}")
        if 'Author' in ref_dict:
            print(f"      Author: {ref_dict['Author']}")
        if 'LastModifiedTime' in ref_dict:
            print(f"      Modified: {ref_dict['LastModifiedTime']}")
else:
    print("⚠️ No references returned")
    print("   Possible reasons:")
    print("   1. No matching documents in the SharePoint site")
    print("   2. The current user doesn't have access permission")
    print("   3. The current user lacks an M365 Copilot License")
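
The display loop above checks each optional key separately. A compact variant is sketched here as a hypothetical `format_reference` helper that works on a plain dict such as the one returned by `ref.as_dict()`. The keys (`title`, `url`, `Author`, `LastModifiedTime`) are taken from the cell above and are assumptions about the response shape, so they may need adjusting for the SDK version in use. The sample data is made up.

```python
def format_reference(ref, index):
    """Render one reference dict as indented text, skipping absent fields."""
    lines = [f"[{index}] {ref.get('title', 'N/A')}"]
    for key, label in (("url", "URL"), ("Author", "Author"), ("LastModifiedTime", "Modified")):
        value = ref.get(key)
        if value:  # only show fields that are present and non-empty
            lines.append(f"    {label}: {value}")
    return "\n".join(lines)


sample = {"title": "Project Plan", "url": "https://contoso.example/plan.docx"}  # made-up data
print(format_reference(sample, 1))
```

Printing `ref.as_dict()` once for a real result is a quick way to confirm which keys are actually returned.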

<a id="cleanup"></a>
## 🧹 Delete Resources (Optional)

> ℹ️ A Remote SharePoint Knowledge Source **does not create an index**, so deleting the knowledge source removes only the knowledge source itself.

In [None]:
# Uncomment to execute deletion
# index_client.delete_knowledge_base(kb_name)
# print(f"✅ Knowledge Base '{kb_name}' deleted")

# index_client.delete_knowledge_source(ks_name)
# print(f"✅ Knowledge Source '{ks_name}' deleted")

print("💡 To delete resources, uncomment the code above and run")
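
The commented-out calls delete the knowledge base before the knowledge source it references. A minimal sketch that keeps that order and reports failures instead of stopping is shown below. `delete_kb_then_ks` is a hypothetical helper, the two method names are the ones used in the cell above, and `FakeClient` is a stand-in so the example runs without an Azure connection.

```python
def delete_kb_then_ks(client, kb_name, ks_name):
    """Delete the knowledge base first, then the knowledge source it references."""
    deleted = []
    steps = (
        ("knowledge base", client.delete_knowledge_base, kb_name),
        ("knowledge source", client.delete_knowledge_source, ks_name),
    )
    for label, delete, name in steps:
        try:
            delete(name)
            deleted.append(label)
        except Exception as exc:  # broad on purpose: this is only a sketch
            print(f"Could not delete {label} '{name}': {exc}")
    return deleted


class FakeClient:
    """Stand-in client that records calls instead of contacting Azure."""

    def __init__(self):
        self.calls = []

    def delete_knowledge_base(self, name):
        self.calls.append(("kb", name))

    def delete_knowledge_source(self, name):
        self.calls.append(("ks", name))


fake = FakeClient()
print(delete_kb_then_ks(fake, "demo-sharepoint-remote-kb", "demo-sharepoint-remote-ks"))
# prints: ['knowledge base', 'knowledge source']
```

With the real `index_client` from the configuration cell, the call would be `delete_kb_then_ks(index_client, kb_name, ks_name)`.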

---

## 📊 Remote vs Indexed SharePoint Comparison

| Feature | Remote SharePoint | Indexed SharePoint |
|---------|-------------------|--------------------|
| **Pre-indexing** | ❌ Real-time query | ✅ Pre-indexed |
| **Vector Search** | ❌ Not supported | ✅ Supported |
| **User Permissions** | ✅ Respects user permissions | ❌ All users see same content |
| **Latency** | Higher | Low |
| **Cost** | M365 Copilot License | Global Admin authorization |
| **Data Freshness** | Real-time | Depends on Indexer run frequency |

### Selection Recommendations

- **Need user-level permission control** → Remote SharePoint
- **Need vector search/semantic understanding** → Indexed SharePoint
- **Data updates frequently** → Remote SharePoint
- **Need low latency** → Indexed SharePoint
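
The four recommendations can pull in opposite directions, for example when both user-level permissions and low latency are required. The sketch below simply groups the stated requirements by the mode each one points to, so that conflicts are visible. `recommend_mode` and its parameter names are hypothetical and encode only the four rules listed above.

```python
def recommend_mode(user_permissions=False, vector_search=False,
                   fresh_data=False, low_latency=False):
    """Group the requested requirements by the mode each one points to."""
    remote_rules = (("user-level permissions", user_permissions),
                    ("frequently updated data", fresh_data))
    indexed_rules = (("vector search", vector_search),
                     ("low latency", low_latency))
    return {
        "Remote SharePoint": [name for name, wanted in remote_rules if wanted],
        "Indexed SharePoint": [name for name, wanted in indexed_rules if wanted],
    }


print(recommend_mode(user_permissions=True, low_latency=True))
# prints: {'Remote SharePoint': ['user-level permissions'], 'Indexed SharePoint': ['low latency']}
```

When both lists are non-empty, the choice comes down to which requirement is harder to give up.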