# Create Azure Search Index with Vector Search

This notebook creates an Azure AI Search index with vector search capabilities.

## Steps:
1. Load configuration from the `.env` file
2. Check whether the index already exists
3. (Optional) Delete the existing index
4. Define the index schema with vector search
5. Create the index
6. Verify that the index was created successfully

## 1. Setup - Import utilities and load configuration

In [None]:
import sys
from pathlib import Path

# Add parent directory to path
sys.path.insert(0, str(Path().absolute().parent))

from utils import config, create_index, delete_index, get_index_info, get_index_stats

print("✅ Configuration loaded")
print(f"   Service: {config.azure_search_service}")
print(f"   Index: {config.azure_search_index_name}")

## 2. Check Current Index Status

Check whether the index already exists and, if it does, view its statistics.

In [None]:
# Try to get index information
print("📊 Checking if index exists...\n")
index_info = get_index_info()

if index_info:
    print("\n📈 Index statistics:")
    get_index_stats()
else:
    print("\nℹ️  Index does not exist yet")
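
The `get_index_info` helper lives in `utils`, and its implementation is not shown in this notebook. As a rough sketch of what an existence check can look like, assuming it goes through the Azure AI Search REST API (`GET /indexes/{name}` returns 200 for an existing index and 404 for a missing one), the code below builds the request URL and interprets the status code. The function names, the `2023-11-01` API version, and the example service and index names are illustrative assumptions, not the notebook's actual code.

```python
from urllib.parse import quote

API_VERSION = "2023-11-01"  # assumed; use whichever version your service supports


def index_url(service: str, index_name: str, api_version: str = API_VERSION) -> str:
    """Build the REST URL for a single index definition."""
    return (
        f"https://{service}.search.windows.net/indexes/"
        f"{quote(index_name, safe='')}?api-version={api_version}"
    )


def index_exists_from_status(status_code: int) -> bool:
    """Map the HTTP status of GET /indexes/{name} to an existence flag."""
    if status_code == 200:
        return True
    if status_code == 404:
        return False
    # Anything else (401, 403, 5xx, ...) is a real error, not "index missing".
    raise RuntimeError(f"Unexpected status {status_code} while checking index")


# Hypothetical service and index names, for illustration only.
print(index_url("my-search-service", "contract-items"))
```

Treating only 404 as "does not exist" keeps authentication or network problems from being mistaken for a missing index.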

## 3. Delete Existing Index (Optional)

⚠️ **Warning:** This will permanently delete the index and all its data!

Uncomment the code below to delete the existing index.

In [None]:
# Uncomment to delete:
# delete_index()
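
Because deletion is irreversible, one option is to require the index name to be typed out before the call goes through. This is a minimal sketch: `guarded_delete` is a hypothetical wrapper, and `delete_fn` stands in for the notebook's `delete_index` so the example runs without touching a real service.

```python
from typing import Callable


def guarded_delete(index_name: str, typed_name: str, delete_fn: Callable[[], object]) -> bool:
    """Call delete_fn only if typed_name matches index_name exactly."""
    if typed_name != index_name:
        print(f"Confirmation did not match '{index_name}'; nothing deleted.")
        return False
    delete_fn()
    return True


# Stand-in for delete_index: it only records that it was called.
calls = []
guarded_delete("contract-items", "contract-item", lambda: calls.append("deleted"))   # typo: skipped
guarded_delete("contract-items", "contract-items", lambda: calls.append("deleted"))  # match: runs
print(calls)
```

In the notebook this might be called as `guarded_delete(config.azure_search_index_name, input("Type the index name: "), delete_index)`.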

## 4. Define Index Schema

**Customize this schema for your project:**
- Modify field names and types
- Change vector dimensions if using a different embedding model
- Adjust HNSW parameters for performance tuning
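
When customizing the schema, the names have to line up: each vector field's `vectorSearchProfile` must name a defined profile, and each profile's `algorithm` must name a defined algorithm. The sketch below checks those cross-references locally before anything is sent to the service. `check_vector_config` is a hypothetical helper, not part of `utils`, and the small schema at the bottom is a made-up example with a deliberate typo.

```python
from __future__ import annotations


def check_vector_config(schema: dict) -> list[str]:
    """Return a list of problems found in the vector search cross-references."""
    problems = []
    vector_search = schema.get("vectorSearch", {})
    algorithms = {a["name"] for a in vector_search.get("algorithms", [])}
    profiles = {p["name"]: p["algorithm"] for p in vector_search.get("profiles", [])}

    # Every profile must point at a defined algorithm.
    for profile, algorithm in profiles.items():
        if algorithm not in algorithms:
            problems.append(f"profile '{profile}' references unknown algorithm '{algorithm}'")

    # Every vector field needs a known profile and a positive dimension count.
    for field in schema.get("fields", []):
        profile = field.get("vectorSearchProfile")
        dims = field.get("dimensions")
        if profile is None and dims is None:
            continue  # not a vector field
        if profile not in profiles:
            problems.append(f"field '{field['name']}' references unknown profile '{profile}'")
        if not isinstance(dims, int) or dims <= 0:
            problems.append(f"field '{field['name']}' needs a positive integer 'dimensions'")
    return problems


# Made-up schema: the field's profile name does not match the defined profile.
example = {
    "fields": [
        {"name": "embedding", "type": "Collection(Edm.Single)",
         "dimensions": 1536, "vectorSearchProfile": "myHnswProfle"},
    ],
    "vectorSearch": {
        "algorithms": [{"name": "myHnsw", "kind": "hnsw"}],
        "profiles": [{"name": "myHnswProfile", "algorithm": "myHnsw"}],
    },
}
for problem in check_vector_config(example):
    print(problem)
```

An empty list means the cross-references are consistent. It does not guarantee that the service will accept the schema.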

In [None]:
# Index schema definition
# This schema is for contract items with vector embeddings
INDEX_SCHEMA = {
    "name": config.azure_search_index_name,
    "fields": [
        {
            "name": "contract_item_id",
            "type": "Edm.String",
            "key": True,  # Primary key
            "filterable": True
        },
        {
            "name": "contract_number",
            "type": "Edm.String",
            "filterable": True,
            "searchable": True  # Enable full-text search
        },
        {
            "name": "contract_item_number",
            "type": "Edm.Int32",
            "filterable": True
        },
        {
            "name": "item_text",
            "type": "Edm.String",
            "searchable": True  # Main text field for BM25 search
        },
        {
            "name": "vendor_name",
            "type": "Edm.String",
            "filterable": True,
            "searchable": True
        },
        {
            "name": "unit_price",
            "type": "Edm.Double",
            "filterable": True,
            "sortable": True
        },
        {
            "name": "currency",
            "type": "Edm.String",
            "filterable": True
        },
        {
            "name": "contract_start",
            "type": "Edm.DateTimeOffset",
            "filterable": True,
            "sortable": True
        },
        {
            "name": "contract_end",
            "type": "Edm.DateTimeOffset",
            "filterable": True,
            "sortable": True
        },
        {
            "name": "embedding",
            "type": "Collection(Edm.Single)",
            "dimensions": 1536,  # text-embedding-ada-002 produces 1536-dim vectors
            "searchable": True,
            "retrievable": True,
            "vectorSearchProfile": "myHnswProfile"
        }
    ],
    "vectorSearch": {
        "algorithms": [
            {
                "name": "myHnsw",
                "kind": "hnsw",
                "hnswParameters": {
                    "metric": "cosine",  # Cosine similarity for embeddings
                    "m": 10,  # Bi-directional links per node (Azure AI Search accepts 4-10)
                    "efConstruction": 400,  # Size of dynamic candidate list for construction
                    "efSearch": 100  # Size of dynamic candidate list for search
                }
            },
            {
                "name": "myExhaustive",
                "kind": "exhaustiveKnn",
                "exhaustiveKnnParameters": {
                    "metric": "cosine"
                }
            }
        ],
        "profiles": [
            {
                "name": "myHnswProfile",
                "algorithm": "myHnsw"  # Use HNSW for fast approximate search
            },
            {
                "name": "myExhaustiveProfile",
                "algorithm": "myExhaustive"  # Use exhaustive for exact search
            }
        ]
    }
}

print("✅ Index schema defined")

## 5. Create the Index

In [None]:
# Create the index
success = create_index(INDEX_SCHEMA)

if success:
    print("\n🎉 Index created successfully!")
else:
    print("\n❌ Failed to create index. Check error messages above.")

## 6. Verify Index Creation

In [None]:
# Verify the index was created
print("✅ Verifying index...\n")
get_index_info()

print("\n📈 Index statistics:")
get_index_stats()

## Next Steps

Now that the index is created, you can:
1. Open `02_embed_and_upload.ipynb` to load data and generate embeddings
2. Or use the command-line script: `python scripts/embed_and_upload.py`
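
Documents uploaded in the next step need to match the schema defined in this notebook, in particular the string key and the embedding length. Below is a small sketch of a pre-upload check, assuming documents are plain dicts keyed by the field names above. `check_document` is a hypothetical helper, and the sample values are made up for illustration.

```python
EMBEDDING_DIMS = 1536  # must equal "dimensions" on the embedding field


def check_document(doc: dict, dims: int = EMBEDDING_DIMS) -> list:
    """Return a list of problems that would make a document mismatch the schema."""
    problems = []

    # The key field is Edm.String, so it must be a non-empty string.
    key = doc.get("contract_item_id")
    if not isinstance(key, str) or not key:
        problems.append("contract_item_id must be a non-empty string")

    # The vector length must match the dimensions declared in the index.
    embedding = doc.get("embedding", [])
    if len(embedding) != dims:
        problems.append(f"embedding has {len(embedding)} values, expected {dims}")
    elif not all(isinstance(x, (int, float)) for x in embedding):
        problems.append("embedding must contain only numbers")

    return problems


# Made-up sample document with a placeholder embedding of the right length.
sample = {
    "contract_item_id": "C-1001-10",
    "item_text": "Stainless steel bolts, M8",
    "embedding": [0.0] * EMBEDDING_DIMS,
}
print(check_document(sample))
```

If you change the embedding model, update `EMBEDDING_DIMS` together with the `dimensions` value in the index schema.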