# Document Classification with Azure OpenAI and Azure Cognitive Search Vector Search
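This notebook shows how to classify text with Azure Cognitive Search, Azure OpenAI embeddings, and the Azure SDK for Python. Each search reports the category of the best-matching indexed document, and that category is taken as the category of the input query.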

## Prerequisites
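1. Install the following packages. Pin the pre-release version of `azure-search-documents` shown below. The vector search arguments used in this notebook (`vector`, `top_k`, `vector_fields`) are preview-era arguments, and the generally available 11.4.0 release replaced them with `vector_queries`.
   - `! pip install azure-search-documents==11.4.0b6`
   - `! pip install openai`

2. Run the [100.0 CreateKnowledgeBaseWithCogSearch notebook](100.0%20CreateKnowledgeBaseWithCogSearch.ipynb) first. It builds the search index that this notebook queries.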

## Load the Azure OpenAI (AOAI) API key and deployment settings

In [1]:
import aoai

MY_AOAI_ENDPOINT = 'https://tr-non-prod-gpt4.openai.azure.com/'
MY_AOAI_VERSION = '2023-07-01-preview'
MY_GPT_ENGINE = 'tr-gpt4'
MY_AOAI_EMBEDDING_ENGINE = 'tr-embedding-ada'

status = aoai.setupOpenai(aoai_endpoint=MY_AOAI_ENDPOINT, 
                 aoai_version=MY_AOAI_VERSION)
if status > 0:
    print("AOAI setup succeeded")
else:
    print("AOAI setup failed")


Got OPENAI API Key from environment variable
AOAI setup succeeded


## Authenticate to Azure Cognitive Search and connect

In [2]:
import cog_search

cogSearchCredential = cog_search.getCogSearchCredential()


Got Azure Cognitive Search ADMIN API Key from environment variable
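Both helper modules report that they read their keys from environment variables. `aoai` and `cog_search` are local modules that are not shown here, so the following is only a minimal hypothetical sketch of such a lookup. The function name and any variable names passed to it are assumptions.

```python
import os


def get_key_from_env(var_name: str) -> str:
    """Return the secret stored in environment variable `var_name`.

    Hypothetical helper: the real `aoai` and `cog_search` modules are not
    shown in this notebook, and the variable names are assumptions.
    """
    value = os.environ.get(var_name)
    if not value:
        # Fail early rather than send an empty key to the service
        raise KeyError(f"environment variable {var_name!r} is not set")
    print(f"Got {var_name} from environment variable")
    return value
```

For example, `get_key_from_env("AZURE_SEARCH_ADMIN_KEY")` (a hypothetical variable name) would return the admin key. The Azure SDK expects that key wrapped in `azure.core.credentials.AzureKeyCredential` before it is passed as `credential=` to `SearchClient`.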


In [7]:
from azure.search.documents.indexes import SearchIndexClient

# Endpoint and name of the existing index (built by the 100.0 notebook)
MY_COG_SEARCH_ENDPOINT = 'https://tr-docai-cog-search.search.windows.net'
MY_COG_SEEARCH_INDEX_NAME = 'sample-azure-service-docs-index'

## Import the Cognitive Search SDK classes

In [8]:
from azure.search.documents import SearchClient
from azure.search.documents.models import Vector
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    SemanticConfiguration,
    PrioritizedFields,
    SemanticField,
    SemanticSettings,
    VectorSearch,
    VectorSearchAlgorithmConfiguration,
)

# Find the category of an input query using different search types against Azure Cognitive Search
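The search cells below classify a query by reporting the `category` of the best-scoring document. The following sketch makes that rule explicit. It assumes hits shaped like the result dicts printed in this notebook (`category` and `@search.score` keys). The function name is hypothetical. The score-weighted vote for `k > 1` is an extension that the notebook does not use.

```python
from collections import defaultdict
from typing import Dict, List


def classify_from_hits(hits: List[Dict], k: int = 1) -> str:
    """Return a category for the query from its search hits.

    Each hit is a dict with 'category' and '@search.score' keys, like the
    results printed in this notebook. With k=1 this is the category of the
    best hit; with k > 1 the scores of the top k hits are summed per category
    (a hypothetical extension, not something the notebook does).
    """
    if not hits:
        raise ValueError("no search hits to classify")
    top_hits = sorted(hits, key=lambda hit: hit["@search.score"], reverse=True)[:k]
    totals: Dict[str, float] = defaultdict(float)
    for hit in top_hits:
        totals[hit["category"]] += hit["@search.score"]
    # Category with the largest summed score wins
    return max(totals, key=lambda category: totals[category])
```

Scores from different search types are on different scales (compare the pure vector and hybrid outputs below). A vote should therefore only combine hits from a single query.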

## Perform a Vector Similarity Search

In [9]:
# Pure vector search, multilingual: the query is Dutch for 'tools for software development'
query = "tools voor softwareontwikkeling"  
  
search_client = SearchClient(
                    endpoint=MY_COG_SEARCH_ENDPOINT, 
                    index_name=MY_COG_SEEARCH_INDEX_NAME, 
                    credential=cogSearchCredential)

results = search_client.search(  
    search_text=None,  
    vector=aoai.generate_embedding(
                    the_engine=MY_AOAI_EMBEDDING_ENGINE,
                    the_text=query), 
    top_k=1,  
    vector_fields="contentVector",
    select=["title", "content", "category"],
)  
  
for result in results:  
    print(f"Query: {query}")
    print(f"Category: {result['category']}")  
    print(f"Score: {result['@search.score']}")  

Query: tools voor softwareontwikkeling
Category: Developer Tools
Score: 0.80402017
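The Dutch query matches an English `Developer Tools` document because vector search compares embeddings rather than words. The embedding model places texts with similar meaning close together, largely regardless of language. A minimal sketch of the underlying idea is an exhaustive cosine-similarity scan that keeps the `top_k` closest documents. The toy 3-dimensional vectors stand in for real embeddings. The service itself relies on an approximate nearest-neighbour index such as HNSW rather than a full scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(
    query_vec: Sequence[float],
    docs: List[Dict],
    field: str = "contentVector",
    top_k: int = 1,
) -> List[Tuple[float, Dict]]:
    """Score every document's `field` vector and return the top_k matches."""
    scored = [(cosine_similarity(query_vec, doc[field]), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy documents with made-up 3-dimensional vectors
docs = [
    {"category": "Developer Tools", "contentVector": [0.9, 0.1, 0.0]},
    {"category": "Storage", "contentVector": [0.1, 0.9, 0.2]},
]
for score, doc in vector_search([1.0, 0.0, 0.0], docs, top_k=1):
    print(doc["category"], round(score, 3))
```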


## Perform a Semantic Hybrid Search

<font color=red>Make sure Semantic Search is enabled for this Azure Cognitive Search instance and that the index defines the semantic configuration 'my-semantic-config' used below.</font>

In [10]:
# Semantic Hybrid Search
# Note the deliberate misspelling 'sarch' (for 'search') in the query
query = "what is azure sarch?"

search_client = SearchClient(
                    endpoint=MY_COG_SEARCH_ENDPOINT, 
                    index_name=MY_COG_SEEARCH_INDEX_NAME, 
                    credential=cogSearchCredential)

results = search_client.search(
    search_text=query,
    vector=aoai.generate_embedding(
                    the_engine=MY_AOAI_EMBEDDING_ENGINE,
                    the_text=query), 
    top_k=1,  # nearest neighbours returned by the vector query
    vector_fields="contentVector",
    select=["title", "content", "category"],
    query_type="semantic",
    query_language="en-us",
    semantic_configuration_name='my-semantic-config',
    query_caption="extractive",
    query_answer="extractive",
    top=1  # number of results returned
)

# TODO: answer.key is the key of the matching document, not its category name; look up the document to get the category
semantic_answers = results.get_answers()
for answer in semantic_answers:
    print(f"Query: {query}")
    print(f"Category: {answer.key}")
    print(f"Semantic Answer Score: {answer.score}")

Query: what is azure sarch?
Category: 40
Semantic Answer Score: 0.9462890625
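As the TODO in the cell notes, `answer.key` is the key of the document that produced the answer, not its category. The printed `40` is therefore a document key. One way to resolve it is to fetch that document with `SearchClient.get_document`, requesting only the `category` field. The helper below is a hypothetical sketch. It takes the client as a parameter so it can be exercised with a stub.

```python
from typing import Iterable, List, Optional, Tuple


def categories_for_answers(client, answers: Optional[Iterable]) -> List[Tuple[str, str, float]]:
    """Return (document key, category, answer score) for each semantic answer.

    `client` is expected to be an azure.search.documents.SearchClient (or any
    object with a compatible get_document method); `answers` is the value
    returned by results.get_answers().
    """
    resolved = []
    for answer in answers or []:  # get_answers() may be None when there are no answers
        # One extra request per answer, fetching only the field we need
        doc = client.get_document(key=answer.key, selected_fields=["category"])
        resolved.append((answer.key, doc["category"], answer.score))
    return resolved
```

In the cell above this would be called as `categories_for_answers(search_client, results.get_answers())`. The cost is one extra round trip per answer, which is negligible with `top=1`.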


## Perform a Cross-Field Vector Search

In [11]:
# Cross-Field Vector Search
query = "tools for software development"  
  
search_client = SearchClient(
                    endpoint=MY_COG_SEARCH_ENDPOINT, 
                    index_name=MY_COG_SEEARCH_INDEX_NAME, 
                    credential=cogSearchCredential)
  
results = search_client.search(  
    search_text=None,  
    vector=aoai.generate_embedding(
                    the_engine=MY_AOAI_EMBEDDING_ENGINE,
                    the_text=query), 
    top_k=1,  
    vector_fields="titleVector, contentVector",
    select=["title", "content", "category"],
)  
  
for result in results:  
    print(f"Query: {query}")
    print(f"Category: {result['category']}")  
    print(f"Score: {result['@search.score']}")  

Query: tools for software development
Category: Developer Tools
Score: 0.03333333507180214
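The cross-field score (about 0.033) is on a very different scale from the pure vector scores (about 0.8). A query over several vector fields produces one ranked list per field, and Azure Cognitive Search merges those lists with Reciprocal Rank Fusion (RRF). Each list adds 1 / (k + rank) for a document, with k = 60 according to the service documentation. The reported 0.0333 is consistent with both fields ranking the document first if ranks are counted from 0, since 1/60 + 1/60 ≈ 0.0333. That reading is an inference from the output, not something the notebook states. A minimal sketch, with hypothetical document keys:

```python
from collections import defaultdict
from typing import Dict, Sequence


def reciprocal_rank_fusion(
    ranked_lists: Sequence[Sequence[str]], k: int = 60
) -> Dict[str, float]:
    """Fuse ranked lists of document keys into one score per document.

    Each list contributes 1 / (k + rank) for every document it contains.
    Ranks are counted from 0 here, an assumption chosen because it is
    consistent with the scores printed in this notebook.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_key in enumerate(ranking):
            scores[doc_key] += 1.0 / (k + rank)
    # Highest fused score first
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical per-field rankings: titleVector and contentVector both rank "doc-a" first
fused = reciprocal_rank_fusion([["doc-a", "doc-b"], ["doc-a", "doc-c"]])
print(fused)
```

Fused scores depend only on ranks. They are useful for ordering results within one query but say little about how similar the top document actually is to the query.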


## Perform a Pure Vector Search with a filter

In [12]:
# Pure Vector Search with Filter
query = "tools for software development"  
  
search_client = SearchClient(
                    endpoint=MY_COG_SEARCH_ENDPOINT, 
                    index_name=MY_COG_SEEARCH_INDEX_NAME, 
                    credential=cogSearchCredential)
  

results = search_client.search(  
    search_text=None,  
    vector=aoai.generate_embedding(
                    the_engine=MY_AOAI_EMBEDDING_ENGINE,
                    the_text=query), 
    top_k=1,  
    vector_fields="contentVector",
    filter="category eq 'Developer Tools'",
    select=["title", "content", "category"]
)  
  
for result in results:  
    print(f"Query: {query}")
    print(f"Category: {result['category']}")  
    print(f"Score: {result['@search.score']}")  

Query: tools for software development
Category: Developer Tools
Score: 0.82971567
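The OData filter `category eq 'Developer Tools'` restricts which documents may be returned. The sketch below expresses the same restriction as a Python predicate applied before ranking (pre-filtering), using toy vectors. Whether the service applies a filter before or after the vector search depends on the API version and settings. Treat this only as an illustration of why the order matters: with `top_k=1`, filtering after the search would return nothing whenever the single nearest neighbour falls outside the filter. The function names and data are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_vector_search(
    query_vec: Sequence[float],
    docs: List[Dict],
    predicate: Callable[[Dict], bool],
    field: str = "contentVector",
    top_k: int = 1,
) -> List[Tuple[float, Dict]]:
    """Pre-filter: drop non-matching documents first, then rank the rest."""
    candidates = [doc for doc in docs if predicate(doc)]
    scored = [(cosine_similarity(query_vec, doc[field]), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


docs = [
    {"category": "Storage", "contentVector": [1.0, 0.0]},
    {"category": "Developer Tools", "contentVector": [0.6, 0.8]},
]


def is_dev_tools(doc: Dict) -> bool:
    # Python equivalent of the OData filter: category eq 'Developer Tools'
    return doc["category"] == "Developer Tools"


print(filtered_vector_search([1.0, 0.0], docs, is_dev_tools))
```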


## Perform a Hybrid Search

In [13]:
# Hybrid Search
query = "scalable storage solution"  
  
search_client = SearchClient(
                    endpoint=MY_COG_SEARCH_ENDPOINT, 
                    index_name=MY_COG_SEEARCH_INDEX_NAME, 
                    credential=cogSearchCredential)
  
results = search_client.search(  
    search_text=query,  
    vector=aoai.generate_embedding(
                    the_engine=MY_AOAI_EMBEDDING_ENGINE,
                    the_text=query), 
    top_k=1,  # nearest neighbours returned by the vector query
    vector_fields="contentVector",
    select=["title", "content", "category"],
    top=1  # number of results returned
)  
  
for result in results:  
    print(f"Query: {query}")
    print(f"Category: {result['category']}")  
    print(f"Score: {result['@search.score']}")  

Query: scalable storage solution
Category: Storage
Score: 0.03205128386616707
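A hybrid query runs a full-text search and a vector search side by side and fuses the two ranked lists with Reciprocal Rank Fusion. This is why its score (about 0.032) is far smaller than a pure vector score. The reported value is consistent with 1/60 + 1/65: the document ranked first in one list and sixth in the other, with 0-based ranks and k = 60. That is an inference from the output. In the sketch below, a simple query-term overlap count stands in for the service's BM25-based full-text ranking. The toy texts, vectors, and function names are hypothetical.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple


def keyword_rank(query: str, texts: Dict[str, str]) -> List[str]:
    """Rank document keys by how many distinct query terms their text contains.

    A crude stand-in for BM25; documents sharing no terms are left out.
    """
    terms = set(re.findall(r"\w+", query.lower()))
    overlap = {
        key: len(terms & set(re.findall(r"\w+", text.lower())))
        for key, text in texts.items()
    }
    ranked = sorted(overlap, key=lambda key: overlap[key], reverse=True)
    return [key for key in ranked if overlap[key] > 0]


def vector_rank(query_vec: Sequence[float], vectors: Dict[str, Sequence[float]]) -> List[str]:
    """Rank document keys by cosine similarity to the query vector."""
    def cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    return sorted(vectors, key=lambda key: cosine(query_vec, vectors[key]), reverse=True)


def hybrid_search(query, query_vec, texts, vectors, k=60, top=1) -> List[Tuple[str, float]]:
    """Fuse the keyword and vector rankings with RRF (0-based ranks) and keep `top`."""
    fused: Dict[str, float] = {}
    for ranking in (keyword_rank(query, texts), vector_rank(query_vec, vectors)):
        for rank, key in enumerate(ranking):
            fused[key] = fused.get(key, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top]


texts = {
    "storage-doc": "A scalable storage solution for unstructured data",
    "devtools-doc": "Tools for software development and testing",
}
vectors = {"storage-doc": [0.9, 0.1], "devtools-doc": [0.1, 0.9]}
print(hybrid_search("scalable storage solution", [1.0, 0.0], texts, vectors))
```

Fusing ranks rather than raw scores sidesteps the fact that BM25 scores and vector similarities are not on comparable scales.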
