## Getting started with the Azure AI Language service

<p>Azure AI Language is designed to help you extract information from text. It provides functionality that you can use for:</p>

<ul>
<li>Language detection - determining the language in which text is written.</li>
<li>Key phrase extraction - identifying important words and phrases in the text that indicate the main points.</li>
<li>Sentiment analysis - quantifying how positive or negative the text is.</li>
<li>Named entity recognition - detecting references to entities, including people, locations, time periods, organizations, and more.</li>
<li>Entity linking - identifying specific entities by providing reference links to Wikipedia articles.</li>
</ul>

## Provision an Azure AI Language resource
[1] https://learn.microsoft.com/en-us/training/modules/analyze-text-ai-language/2-provision-resource

[2] https://learn.microsoft.com/en-us/azure/ai-services/multi-service-resource?tabs=multiservice%2Cwindows&pivots=azportal


1. Create a resource group
2. Create an Azure AI services resource (multi-service resource)

## Consume the service 


All the services work in a similar manner. A Language resource is created first and is then
accessed via an endpoint and a key.
From the client application, a TextAnalyticsClient is instantiated.
The client provides an API with methods such as detect_language, analyze_sentiment and recognize_entities. The methods all take the same main argument (a list of documents), but the structure of the returned result (JSON from the service, wrapped in result objects by the SDK) differs
depending on the action.
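
The shared calling convention can be sketched with a small dispatcher. This is only an illustration under stated assumptions: `run_action` and `FakeClient` are hypothetical names that are not part of the SDK, and the fake client stands in for a real TextAnalyticsClient so the snippet runs without credentials.

```python
from typing import Any, List


def run_action(client: Any, action: str, documents: List[str]) -> list:
    """Call one text-analysis method on `client` by name.

    Every method receives the same `documents` argument; only the
    shape of the returned items differs between actions.
    """
    method = getattr(client, action, None)
    if not callable(method):
        raise ValueError(f"client has no action named {action!r}")
    return list(method(documents=documents))


class FakeClient:
    """Hypothetical stand-in for TextAnalyticsClient (no network calls)."""

    def detect_language(self, documents):
        return [{"kind": "language", "text": d} for d in documents]

    def analyze_sentiment(self, documents):
        return [{"kind": "sentiment", "text": d} for d in documents]


results = run_action(FakeClient(), "analyze_sentiment", ["Bonjour tout le monde"])
print(results[0]["kind"])  # sentiment
```

With a real client, the same helper could presumably be called as `run_action(text_analytics_client, "detect_language", documents)`.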

Some example code here:

https://microsoftlearning.github.io/mslearn-ai-language/Instructions/Exercises/01-analyze-text.html

As an alternative, Azure provides an SDK client that wraps the service. The client is instantiated with the required credentials and a resource location. It then provides an API to detect languages and translate, among other functionality.


## Example services  


### Detect language service 

https://learn.microsoft.com/en-us/azure/ai-services/language-service/language-detection/overview



Client: 

In [130]:
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient
from pprint import pprint


In [131]:

endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
key = os.environ["AZURE_LANGUAGE_KEY"]
text_analytics_client = TextAnalyticsClient(endpoint, AzureKeyCredential(key))
 

Some example data 

In [132]:
text = "This capability is useful for content stores that \
collect arbitrary text, where language is unknown. Another \
scenario could involve a chat bot. If a user starts a session \
with the chat bot, language detection can be used to determine \
which language they are using and allow you to configure your \
bot responses in the appropriate language."

documents =  [text, "Bonjour tout le monde", 'Es un dia precioso', 'voy a matarlos a todos']  

Use the client for language detection  

In [133]:
from pprint import pprint 
detectedLanguage = text_analytics_client.detect_language(documents = documents )
pprint( detectedLanguage ) 
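
The raw result list can be condensed into one row per document. The sketch below assumes each successful result exposes `primary_language` with `name`, `iso6391_name` and `confidence_score`, and that failed documents carry `is_error`, as described in the SDK reference. `summarize_languages` is a hypothetical helper, and the stand-in objects and their values are invented so the example runs offline.

```python
from types import SimpleNamespace


def summarize_languages(results):
    """Return one (name, iso_code, confidence) tuple per document.

    Documents the service could not process are reported as
    ("error", None, None) instead of raising.
    """
    summary = []
    for result in results:
        if getattr(result, "is_error", False):
            summary.append(("error", None, None))
            continue
        lang = result.primary_language
        summary.append((lang.name, lang.iso6391_name, lang.confidence_score))
    return summary


# Hypothetical stand-ins shaped like the SDK's result objects.
fake_results = [
    SimpleNamespace(
        is_error=False,
        primary_language=SimpleNamespace(
            name="French", iso6391_name="fr", confidence_score=1.0
        ),
    ),
    SimpleNamespace(is_error=True),
]
print(summarize_languages(fake_results))
```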




### Detect sentiment 

In [134]:
detectedSentiment = text_analytics_client.analyze_sentiment(documents = documents )

pprint( [ (documents[n][0:40], 'sentiment', r['sentiment']) for n,r in enumerate(detectedSentiment) ] )
pprint([ r['confidence_scores'] for r in detectedSentiment ])


[('This capability is useful for content st', 'sentiment', 'neutral'),
 ('Bonjour tout le monde', 'sentiment', 'positive'),
 ('Es un dia precioso', 'sentiment', 'positive'),
 ('voy a matarlos a todos', 'sentiment', 'negative')]
[SentimentConfidenceScores(positive=0.03, neutral=0.97, negative=0.0),
 SentimentConfidenceScores(positive=0.77, neutral=0.21, negative=0.01),
 SentimentConfidenceScores(positive=0.98, neutral=0.02, negative=0.01),
 SentimentConfidenceScores(positive=0.09, neutral=0.3, negative=0.61)]
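
The last document above is labeled negative with a confidence of only 0.61, so it can help to flag low-confidence labels before acting on them. The following is a minimal sketch: `top_sentiment` and the 0.7 threshold are assumptions chosen for illustration, and the helper simply takes the highest of the three scores, which does not reproduce the service's own label in every case (the service can also report a mixed sentiment).

```python
def top_sentiment(positive: float, neutral: float, negative: float,
                  threshold: float = 0.7):
    """Return (label, score, is_confident) for one set of confidence scores.

    `threshold` is an arbitrary cut-off, not a value recommended by the service.
    """
    scores = {"positive": positive, "neutral": neutral, "negative": negative}
    label = max(scores, key=scores.get)
    return label, scores[label], scores[label] >= threshold


# Scores copied from the first and last documents in the output above.
print(top_sentiment(0.03, 0.97, 0.0))   # ('neutral', 0.97, True)
print(top_sentiment(0.09, 0.3, 0.61))   # ('negative', 0.61, False)
```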


### Key phrases 

In [135]:
phrases = text_analytics_client.extract_key_phrases(documents=documents)
phrases
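
To read the key phrases next to the text they came from, the results can be paired with the input documents. This sketch assumes each successful result exposes a `key_phrases` list and that failed documents carry `is_error`. `key_phrases_per_document` is a hypothetical helper, and the phrases in the stand-in data are invented.

```python
from types import SimpleNamespace


def key_phrases_per_document(documents, results, preview=40):
    """Pair a short preview of each document with its key phrases.

    Results flagged with is_error yield an empty list of phrases.
    """
    paired = []
    for doc, result in zip(documents, results):
        if getattr(result, "is_error", False):
            phrases = []
        else:
            phrases = list(result.key_phrases)
        paired.append((doc[:preview], phrases))
    return paired


# Hypothetical results; the phrases below are invented for illustration.
docs = ["Bonjour tout le monde", ""]
fake = [
    SimpleNamespace(is_error=False, key_phrases=["monde"]),
    SimpleNamespace(is_error=True),
]
print(key_phrases_per_document(docs, fake))
```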



### (Draft) Consume multiple services in one call to the endpoint

Using the SDK 

Reference: 
https://learn.microsoft.com/en-us/python/api/overview/azure/ai-textanalytics-readme?view=azure-python#detect-language


In [137]:
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import (
    TextAnalyticsClient,
    RecognizeEntitiesAction,
    RecognizeLinkedEntitiesAction,
    RecognizePiiEntitiesAction,
    ExtractKeyPhrasesAction,
    AnalyzeSentimentAction,
)

endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
key = os.environ["AZURE_LANGUAGE_KEY"]

text_analytics_client = TextAnalyticsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
)

documents = [
    'We went to Contoso Steakhouse located at midtown NYC last week for a dinner party, and we adore the spot! '
    'They provide marvelous food and they have a great menu. The chief cook happens to be the owner (I think his name is John Doe) '
    'and he is super nice, coming out of the kitchen and greeted us all.'
    ,

    'We enjoyed very much dining in the place! '
    'The Sirloin steak I ordered was tender and juicy, and the place was impeccably clean. You can even pre-order from their '
    'online menu at www.contososteakhouse.com, call 312-555-0176 or send email to order@contososteakhouse.com! '
    'The only complaint I have is the food didn\'t come fast enough. Overall I highly recommend it!'
]

poller = text_analytics_client.begin_analyze_actions(
    documents,
    display_name="Sample Text Analysis",
    actions=[
        RecognizeEntitiesAction(),
        RecognizePiiEntitiesAction(),
        ExtractKeyPhrasesAction(),
        RecognizeLinkedEntitiesAction(),
        AnalyzeSentimentAction(),
    ],
)

document_results = poller.result()
for doc, action_results in zip(documents, document_results):
    print(f"\nDocument text: {doc}")
    for result in action_results:
        if result.kind == "EntityRecognition":
            print("...Results of Recognize Entities Action:")
            for entity in result.entities:
                print(f"......Entity: {entity.text}")
                print(f".........Category: {entity.category}")
                print(f".........Confidence Score: {entity.confidence_score}")
                print(f".........Offset: {entity.offset}")

        elif result.kind == "PiiEntityRecognition":
            print("...Results of Recognize PII Entities action:")
            for pii_entity in result.entities:
                print(f"......Entity: {pii_entity.text}")
                print(f".........Category: {pii_entity.category}")
                print(f".........Confidence Score: {pii_entity.confidence_score}")

        elif result.kind == "KeyPhraseExtraction":
            print("...Results of Extract Key Phrases action:")
            print(f"......Key Phrases: {result.key_phrases}")

        elif result.kind == "EntityLinking":
            print("...Results of Recognize Linked Entities action:")
            for linked_entity in result.entities:
                print(f"......Entity name: {linked_entity.name}")
                print(f".........Data source: {linked_entity.data_source}")
                print(f".........Data source language: {linked_entity.language}")
                print(
                    f".........Data source entity ID: {linked_entity.data_source_entity_id}"
                )
                print(f".........Data source URL: {linked_entity.url}")
                print(".........Document matches:")
                for match in linked_entity.matches:
                    print(f"............Match text: {match.text}")
                    print(f"............Confidence Score: {match.confidence_score}")
                    print(f"............Offset: {match.offset}")
                    print(f"............Length: {match.length}")

        elif result.kind == "SentimentAnalysis":
            print("...Results of Analyze Sentiment action:")
            print(f"......Overall sentiment: {result.sentiment}")
            print(
                f"......Scores: positive={result.confidence_scores.positive}; "
                f"neutral={result.confidence_scores.neutral}; "
                f"negative={result.confidence_scores.negative} \n"
            )

        elif result.is_error is True:
            print(
                f"...Is an error with code '{result.error.code}' and message '{result.error.message}'"
            )

    print("------------------------------------------")


Document text: We went to Contoso Steakhouse located at midtown NYC last week for a dinner party, and we adore the spot! They provide marvelous food and they have a great menu. The chief cook happens to be the owner (I think his name is John Doe) and he is super nice, coming out of the kitchen and greeted us all.
...Results of Recognize Entities Action:
......Entity: Contoso Steakhouse
.........Category: Location
.........Confidence Score: 0.99
.........Offset: 11
......Entity: midtown
.........Category: Location
.........Confidence Score: 0.52
.........Offset: 41
......Entity: NYC
.........Category: Location
.........Confidence Score: 1.0
.........Offset: 49
......Entity: last week
.........Category: DateTime
.........Confidence Score: 1.0
.........Offset: 53
......Entity: dinner party
.........Category: Event
.........Confidence Score: 0.78
.........Offset: 69
......Entity: food
.........Category: Product
.........Confidence Score: 0.57
.........Offset: 129
......Entity: chief cook


# Azure translation service 

Azure AI Translator is a cloud-based machine translation service that you can use to translate text and documents with a simple REST API call. Alternatively, Azure provides SDKs for several programming languages that interact with the cloud service via a TextTranslationClient.

Limits:
https://learn.microsoft.com/en-us/azure/ai-services/Translator/service-limits
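
The SDK cells in this notebook note that each translate request is limited to 50,000 characters across all target languages. One way to stay under such a budget is to split the input into batches before calling the service. This is a rough sketch: `batch_by_characters` is a hypothetical helper, it counts characters with Python's `len`, which may not match how the service counts them, and it ignores any other limits listed on the page above.

```python
def batch_by_characters(texts, to_languages=1, max_chars=50_000):
    """Greedily group texts so each batch stays within the character budget.

    The cost of a text is its length multiplied by the number of target
    languages, because the budget is shared across all targets.
    """
    batches, current, used = [], [], 0
    for text in texts:
        cost = len(text) * to_languages
        if cost > max_chars:
            raise ValueError("a single text exceeds the request budget")
        if current and used + cost > max_chars:
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        batches.append(current)
    return batches


# Tiny budget so the splitting is visible.
print(batch_by_characters(["aaaa", "bbbb", "cc"], max_chars=8))
# [['aaaa', 'bbbb'], ['cc']]
```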


Using the REST API 

In [140]:
import os
import uuid

import requests

TRANSLATOR_ENPOINT  = os.environ["TRANSLATOR_ENPOINT"]
TRANSLATOR_LOCATION = os.environ["TRANSLATOR_LOCATION"]
TRANSLATOR_KEY      = os.environ["TRANSLATOR_KEY"]


#this is a custom wrapper around the service 
class TranslatorClient:
    
    def __init__(self, key, endpoint, location):
        self.enpoint=  endpoint
        self.key = key 
        self.location = location 

    def detect_language( self, txt ):
        path = '/detect?api-version=3.0'
        constructed_url = self.enpoint + path

        body = [{
        'text': txt
        }]
        request = requests.post(constructed_url, headers=self._get_header(), json=body)
        response = request.json()

        return response[0]['language'] 

    def _get_header( self ):
        headers = {
            'Ocp-Apim-Subscription-Key': self.key,
            # location required if you're using a multi-service or regional (not global) resource.
            'Ocp-Apim-Subscription-Region': self.location,
            'Content-type': 'application/json',
            'X-ClientTraceId': str(uuid.uuid4())
        }
          
        return headers 

    def translate( self,  to_language,  documents, from_language = None ):

        if from_language is None:
            # Assume all documents share one language and detect it from the first document.
            from_language = self.detect_language( documents[0]['text'] )

        params = {
            'api-version': '3.0',
            'from': from_language,
            'to': [ to_language]
        }

      

        # You can pass more than one object in body.
        body = documents

        # Use the endpoint stored on the instance rather than a global variable.
        response = requests.post(self.enpoint + '/translate', params=params, headers=self._get_header(), json=body)
        return response



# Add your key and endpoint
key      = TRANSLATOR_KEY
endpoint = TRANSLATOR_ENPOINT 
location = TRANSLATOR_LOCATION

documents = [{'text': 'quiero manejar tu carro'},
             {'text': 'Hace mucho calor en la playa'}, 
             {'text':'estoy ladillado'},
             {'text':'estoy aburrido'}
             ]


client=  TranslatorClient( key, endpoint, location )

response = client.translate( to_language = 'en', documents = documents )

translations = [ (documents[n]['text'], item['translations'][0]['text']) for n,item in  enumerate(response.json()) ]

translations

[('quiero manejar tu carro', 'I want to drive your car'),
 ('Hace mucho calor en la playa', "It's very hot on the beach"),
 ('estoy ladillado', "I'm"),
 ('estoy aburrido', 'I am bored')]
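
The list comprehension above assumes the request succeeded. When the Translator REST API rejects a request, the JSON body is an object with an `error` entry (containing `code` and `message`) instead of a list, so the pairing step can check for that first. `pair_translations` is a hypothetical helper, and the error values in the test data are invented.

```python
def pair_translations(documents, payload):
    """Pair each input text with its first translation.

    Raises RuntimeError when the payload is an error object rather
    than the usual list with one entry per document.
    """
    if isinstance(payload, dict) and "error" in payload:
        err = payload["error"]
        raise RuntimeError(
            f"Translator error {err.get('code')}: {err.get('message')}"
        )
    return [
        (doc["text"], item["translations"][0]["text"])
        for doc, item in zip(documents, payload)
    ]


sample_docs = [{"text": "estoy aburrido"}]
sample_payload = [{"translations": [{"text": "I am bored", "to": "en"}]}]
print(pair_translations(sample_docs, sample_payload))
# [('estoy aburrido', 'I am bored')]
```

With the wrapper class defined above, this could be used as `pair_translations(documents, response.json())`.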

Using the SDK 

The SDK is a wrapper around the same API service, quite similar to the one created above.

https://learn.microsoft.com/en-us/azure/ai-services/translator/quickstart-text-sdk?pivots=programming-language-python

In [143]:
from azure.ai.translation.text import TextTranslationClient
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import HttpResponseError

# See the TextTranslationClient.translate reference:
# https://learn.microsoft.com/en-us/python/api/azure-ai-translation-text/azure.ai.translation.text.texttranslationclient?view=azure-python#azure-ai-translation-text-texttranslationclient-translate


In [145]:
def get_text_translation_multiple_inputs( text_translator, input_text_elements:list[str], to_language: list[str] ):
    # [START get_text_translation_multiple_inputs]

    results = []
    if isinstance( to_language,str): to_language = [ to_language ]
    try:
        translations = text_translator.translate(body=input_text_elements, to_language=to_language)
          


        for n,translation in enumerate(translations):
            #print( translation )

            language_score = translation.detected_language.score if translation.detected_language else None
            language_translation = translation.translations[0].text if translation.translations else None

            results.append( { input_text_elements[n] : { 'translation': language_translation, 'laguage_score': language_score}} ) 
            
            print(
                f"Detected languages of the input text: {translation.detected_language.language if translation.detected_language else None} with score: {translation.detected_language.score if translation.detected_language else None}."
            )
            print(
                f"Text was translated to: '{translation.translations[0].to if translation.translations else None}' and the result is: '{translation.translations[0].text if translation.translations else None}'."
            )

    except HttpResponseError as exception:
        if exception.error is not None:
            print(f"Error Code: {exception.error.code}")
            print(f"Message: {exception.error.message}")

        return None 
    # [END get_text_translation_multiple_inputs]

    return results 
  
 
key      = os.environ["TRANSLATOR_KEY"]
endpoint = os.environ["TRANSLATOR_ENPOINT"]
location = os.environ["TRANSLATOR_LOCATION"]

text_translator = TextTranslationClient(credential=AzureKeyCredential( key ), region = location )

#Each translate request is limited to 50,000 characters, across all the target languages.
input_text_elements = ['This is a test.', '--*******-----******-------', 'This is text number 5']

translation_results = get_text_translation_multiple_inputs(text_translator, input_text_elements,'en' )  
print( translation_results )

Detected languages of the input text: en with score: 1.0.
Text was translated to: 'en' and the result is: 'This is a test.'.
Detected languages of the input text: en with score: 0.0.
Text was translated to: 'en' and the result is: '--*******-----******-------'.
Detected languages of the input text: en with score: 1.0.
Text was translated to: 'en' and the result is: 'This is text number 5'.
[{'This is a test.': {'translation': 'This is a test.', 'laguage_score': 1.0}}, {'--*******-----******-------': {'translation': '--*******-----******-------', 'laguage_score': 0.0}}, {'This is text number 5': {'translation': 'This is text number 5', 'laguage_score': 1.0}}]


In [146]:
to_language = ["en"]
input_text_elements = [
            "This is a test.",
            "Esto es una prueba.",
            "Dies ist ein Test.",
            "--*******-----******-------",
            "Este es el texto numero 5",
            "Yo soy un emoji \U0001f600 de carita feliz"     
        ]
key      = TRANSLATOR_KEY
endpoint = TRANSLATOR_ENPOINT 
location = TRANSLATOR_LOCATION

text_translator = TextTranslationClient(credential=AzureKeyCredential( key ), region = location )

#Each translate request is limited to 50,000 characters, across all the target languages.
translation_results = get_text_translation_multiple_inputs(text_translator, input_text_elements,to_language )  
print( translation_results )


Detected languages of the input text: en with score: 1.0.
Text was translated to: 'en' and the result is: 'This is a test.'.
Detected languages of the input text: es with score: 1.0.
Text was translated to: 'en' and the result is: 'This is a test.'.
Detected languages of the input text: de with score: 1.0.
Text was translated to: 'en' and the result is: 'This is a test.'.
Detected languages of the input text: en with score: 0.0.
Text was translated to: 'en' and the result is: '--*******-----******-------'.
Detected languages of the input text: es with score: 1.0.
Text was translated to: 'en' and the result is: 'This is text number 5'.
Detected languages of the input text: es with score: 1.0.
Text was translated to: 'en' and the result is: 'I'm a happy face emoji 😀'.
[{'This is a test.': {'translation': 'This is a test.', 'laguage_score': 1.0}}, {'Esto es una prueba.': {'translation': 'This is a test.', 'laguage_score': 1.0}}, {'Dies ist ein Test.': {'translation': 'This is a test.', 'l

# Translation, language detection and sentiment analysis

The first two can be done directly with the function defined above. For the sentiment analysis part, we can use the
text analytics client described at the beginning of the notebook and pass the translated documents to it.


In [148]:
english_documents = [ value['translation'] for x in translation_results  for key, value in x.items()  ]
english_documents

['This is a test.',
 'This is a test.',
 'This is a test.',
 '--*******-----******-------',
 'This is text number 5',
 "I'm a happy face emoji 😀"]

In [149]:
text_analytics_endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
text_analytics_key = os.environ["AZURE_LANGUAGE_KEY"]

text_analytics_client = TextAnalyticsClient(endpoint=text_analytics_endpoint,credential=AzureKeyCredential(text_analytics_key))
english_documents = [ value['translation'] for x in translation_results  for key, value in x.items()  ]


sentiment = text_analytics_client.analyze_sentiment(documents = english_documents )
print( [ (english_documents[n][0:40], 'sentiment', r['sentiment']) for n,r in enumerate(sentiment) ] )
print([ r['confidence_scores'] for r in sentiment ])



[('This is a test.', 'sentiment', 'neutral'), ('This is a test.', 'sentiment', 'neutral'), ('This is a test.', 'sentiment', 'neutral'), ('--*******-----******-------', 'sentiment', 'neutral'), ('This is text number 5', 'sentiment', 'neutral'), ("I'm a happy face emoji 😀", 'sentiment', 'positive')]
[SentimentConfidenceScores(positive=0.0, neutral=0.99, negative=0.0), SentimentConfidenceScores(positive=0.0, neutral=0.99, negative=0.0), SentimentConfidenceScores(positive=0.0, neutral=0.99, negative=0.0), SentimentConfidenceScores(positive=0.01, neutral=0.94, negative=0.05), SentimentConfidenceScores(positive=0.08, neutral=0.91, negative=0.01), SentimentConfidenceScores(positive=0.99, neutral=0.01, negative=0.0)]


In [151]:
for n,s in enumerate(sentiment):
    value = s['sentiment']
    scores = s['confidence_scores']
    # Keep the highest of the positive/neutral/negative confidence scores.
    score = max(scores.values()) 
    translation_results[n]['sentiment'] = value 
    translation_results[n]['sentiment_score'] = score

pprint( translation_results ) 
     

[{'This is a test.': {'laguage_score': 1.0, 'translation': 'This is a test.'},
  'sentiment': 'neutral',
  'sentiment_score': 0.99},
 {'Esto es una prueba.': {'laguage_score': 1.0,
                          'translation': 'This is a test.'},
  'sentiment': 'neutral',
  'sentiment_score': 0.99},
 {'Dies ist ein Test.': {'laguage_score': 1.0,
                         'translation': 'This is a test.'},
  'sentiment': 'neutral',
  'sentiment_score': 0.99},
 {'--*******-----******-------': {'laguage_score': 0.0,
                                  'translation': '--*******-----******-------'},
  'sentiment': 'neutral',
  'sentiment_score': 0.94},
 {'Este es el texto numero 5': {'laguage_score': 1.0,
                                'translation': 'This is text number 5'},
  'sentiment': 'neutral',
  'sentiment_score': 0.91},
 {'Yo soy un emoji 😀 de carita feliz': {'laguage_score': 1.0,
                                        'translation': "I'm a happy face emoji "
                            

# Complete application 
Using the Python SDK 

The application takes a list of strings. The language of each string is detected, and the string is then translated to English.
The sentiment of each translated document is then analyzed. The output is a list with one result per document.


In [1]:
import os

from azure.ai.textanalytics import TextAnalyticsClient
from azure.ai.translation.text import TextTranslationClient
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import HttpResponseError

In [4]:
def get_text_translation_multiple_inputs( text_translator, input_text_elements:list[str], to_language: list[str] ):
    # [START get_text_translation_multiple_inputs]

    results = []
    if isinstance( to_language,str): to_language = [ to_language ]
    try:
        translations = text_translator.translate(body=input_text_elements, to_language=to_language)

        for n,translation in enumerate(translations):
            
            language_score = translation.detected_language.score if translation.detected_language else None
            language_translation = translation.translations[0].text if translation.translations else None

            results.append( { input_text_elements[n] : { 'translation': language_translation, 'laguage_score': language_score}} ) 
            
            print(
                f"Detected languages of the input text: {translation.detected_language.language if translation.detected_language else None} with score: {translation.detected_language.score if translation.detected_language else None}."
            )
            print(
                f"Text was translated to: '{translation.translations[0].to if translation.translations else None}' and the result is: '{translation.translations[0].text if translation.translations else None}'."
            )

    except HttpResponseError as exception:
        if exception.error is not None:
            print(f"Error Code: {exception.error.code}")
            print(f"Message: {exception.error.message}")

        return None 
    # [END get_text_translation_multiple_inputs]

    return results 
 
 
def translation_and_sentiment( translator_client, text_analytics_client, documents, to_language = None ):
    
    if to_language is None:
        to_language = ['en']

    translation_results = get_text_translation_multiple_inputs( translator_client, documents, to_language )
    if translation_results is None:
        # The translation request failed, so there is nothing to analyze.
        return None
    english_documents = [ value['translation'] for x in translation_results  for key, value in x.items()  ]

    sentiment = text_analytics_client.analyze_sentiment(documents = english_documents )
    for n,s in enumerate(sentiment):
       
        if not  s.is_error:
            value = s['sentiment']
            scores = s['confidence_scores']
            # Keep the highest of the positive/neutral/negative confidence scores.
            score = max(scores.values()) 
            translation_results[n]['sentiment'] = value 
            translation_results[n]['sentiment_score'] = score
        else: 
            # Documents the service rejects (for example, empty strings) get a neutral placeholder.
            translation_results[n]['sentiment'] = 'neutral' 
            translation_results[n]['sentiment_score'] = 0.0

    return  translation_results


key_translator      = os.environ["TRANSLATOR_KEY"]
location_translator = os.environ["TRANSLATOR_LOCATION"]
text_translator = TextTranslationClient(credential=AzureKeyCredential( key_translator ), region = location_translator )


text_analytics_endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
text_analytics_key = os.environ["AZURE_LANGUAGE_KEY"]
text_analytics_client = TextAnalyticsClient(endpoint=text_analytics_endpoint,credential=AzureKeyCredential(text_analytics_key))



documents = ['This is a test.',
 '--*******-----******-------',
 "",
 # No comma after "44": Python joins it with the next literal into '44This is text number 5'.
 "44"
 'This is text number 5']

translation_results = translation_and_sentiment( text_translator, text_analytics_client, documents  )
translation_results

Detected languages of the input text: en with score: 1.0.
Text was translated to: 'en' and the result is: 'This is a test.'.
Detected languages of the input text: en with score: 0.0.
Text was translated to: 'en' and the result is: '--*******-----******-------'.
Detected languages of the input text: en with score: 0.0.
Text was translated to: 'en' and the result is: ''.
Detected languages of the input text: en with score: 1.0.
Text was translated to: 'en' and the result is: '44This is text number 5'.


[{'This is a test.': {'translation': 'This is a test.', 'laguage_score': 1.0},
  'sentiment': 'neutral',
  'sentiment_score': 0.99},
 {'--*******-----******-------': {'translation': '--*******-----******-------',
   'laguage_score': 0.0},
  'sentiment': 'neutral',
  'sentiment_score': 0.94},
 {'': {'translation': '', 'laguage_score': 0.0},
  'sentiment': 'neutral',
  'sentiment_score': 0.0},
 {'44This is text number 5': {'translation': '44This is text number 5',
   'laguage_score': 1.0},
  'sentiment': 'neutral',
  'sentiment_score': 0.99}]
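
Each entry in the result list nests the translation details under the source text, which is awkward to filter or tabulate. The following sketch flattens the entries into plain rows. `flatten_results` is a hypothetical helper, and it reads the `laguage_score` key exactly as the notebook spells it.

```python
def flatten_results(results):
    """Turn the nested result entries into flat dictionaries.

    Each entry has one key holding a dict (the source text mapped to its
    translation details) plus top-level sentiment fields.
    """
    rows = []
    for entry in results:
        row = {
            "sentiment": entry.get("sentiment"),
            "sentiment_score": entry.get("sentiment_score"),
        }
        for key, value in entry.items():
            if isinstance(value, dict):
                row["source_text"] = key
                row["translation"] = value.get("translation")
                row["language_score"] = value.get("laguage_score")
        rows.append(row)
    return rows


# First entry copied from the output above.
sample = [
    {
        "This is a test.": {"translation": "This is a test.", "laguage_score": 1.0},
        "sentiment": "neutral",
        "sentiment_score": 0.99,
    }
]
print(flatten_results(sample))
```

The resulting rows are a list of dictionaries, so they can be passed to `pd.DataFrame(rows)` if a table is preferred.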