## Text Analytics and Translator

**Text Analytics** is an Azure AI service that enables you to perform text mining and text analysis with Natural Language Processing (NLP) features.

This tutorial demonstrates using text analytics in Fabric with SynapseML to:

- Detect sentiment labels at the sentence or document level.
- Identify the language for a given text input.
- Extract key phrases from a text.
- Identify different entities in text and categorize them into predefined classes or types.


In [None]:
import synapse.ml.core
from synapse.ml.cognitive.language import AnalyzeText
from pyspark.sql.functions import col

## Sentiment analysis

**Sentiment Analysis** provides a way to detect sentiment labels (such as **"negative"**, **"neutral"**, and **"positive"**) and confidence scores at both the sentence and document levels.

This feature returns confidence scores between **0 and 1** for each document and for each sentence within it, indicating the strength of the **positive**, **neutral**, and **negative** sentiment.
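
To make the shape of these scores concrete, here is a minimal sketch in plain Python. The `scores` dictionary is hand-written for illustration (it is not real service output), and `strongest_sentiment` is a hypothetical helper. The service assigns its own labels, so this only shows one way to read the three scores.

```python
# Hypothetical confidence scores for one sentence; the values are
# illustrative and were not returned by the service.
scores = {"positive": 0.91, "neutral": 0.07, "negative": 0.02}


def strongest_sentiment(confidence_scores):
    """Return the (label, score) pair with the highest confidence."""
    label = max(confidence_scores, key=confidence_scores.get)
    return label, confidence_scores[label]


# Each score is expected to lie between 0 and 1.
assert all(0.0 <= s <= 1.0 for s in scores.values())

label, score = strongest_sentiment(scores)
print(label, score)
```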


In [None]:
df = spark.createDataFrame([
    ("Great atmosphere. Close to plenty of restaurants, hotels, and transit! Staff are friendly and helpful.",),
    ("What a sad story!",)
], ["text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("SentimentAnalysis")
        .setOutputCol("response"))



In [None]:
result = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("sentiment", col("documents.sentiment"))

result_df = result.select("text", "sentiment")

**Your output should look similar to the following:**

| text                                                                                                   | sentiment   |
|:-------------------------------------------------------------------------------------------------------|:------------|
| Great atmosphere. Close to plenty of restaurants, hotels, and transit! Staff are friendly and helpful. | positive    |
| What a sad story!                                                                                      | negative    |


In [None]:
display(result_df)

## Language detector

In [None]:
df = spark.createDataFrame([
    (["Hello world"],),
    (["Bonjour tout le monde", "Hola mundo", "Tumhara naam kya hai?"],),
    (["你好"],),
    (["日本国（にほんこく、にっぽんこく、英"],)
], ["text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("LanguageDetection")
        .setOutputCol("response"))

result_ld = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("detectedLanguage", col("documents.detectedLanguage.name"))

result_ld_df = result_ld.select("text", "detectedLanguage")

**Your output should look similar to the following:**

| text                                                                                                    | detectedLanguage        |
|:--------------------------------------------------------------------------------------------------------|:-------------------------|
| ["Hello world"]                                                                                       | ["English"]             |
| ["Bonjour tout le monde","Hola mundo","Tumhara naam kya hai?"]                                       | ["French","Spanish","Hindi"] |
| ["你好"]                                                                                               | ["Chinese_Simplified"]  |
| ["日本国（にほんこく、にっぽんこく、にっぽん）"]                                                            | ["Japanese"]            |


In [None]:
display(result_ld_df)
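
Because each row holds an array of inputs and an array of detected language names, it can help to pair them up. The sketch below does this in plain Python for one row of the sample output, assuming the two arrays are in the same order (as the sample suggests).

```python
# One row of the sample output: the input texts and the parallel
# array of detected language names.
texts = ["Bonjour tout le monde", "Hola mundo", "Tumhara naam kya hai?"]
detected = ["French", "Spanish", "Hindi"]

# Pair each input with its detected language by position.
pairs = list(zip(texts, detected))

for text, language in pairs:
    print(f"{language}: {text}")
```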

## Key Phrase Extractor

**Key Phrase Extraction** evaluates unstructured text and returns a list of key phrases. This capability is useful if you need to quickly identify the main points in a collection of documents.

For information on the supported languages for key phrase extraction, refer to the list of enabled languages.


In [None]:
df = spark.createDataFrame([
    ("en", "Microsoft was founded by Bill Gates and Paul Allen."),
    ("en", "Text Analytics is one of the Azure Cognitive Services."),
    ("en", "My cat might need to see a veterinarian.")
], ["language", "text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("KeyPhraseExtraction")
        .setOutputCol("response"))

result_kpe = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("keyPhrases", col("documents.keyPhrases"))



In [None]:
result_kpe_df = result_kpe.select("text", "keyPhrases")

**Your output should look similar to the following:**

| text                                                                                             | keyPhrases                            |
|:-------------------------------------------------------------------------------------------------|:--------------------------------------|
| Microsoft was founded by Bill Gates and Paul Allen.                                            | [Bill Gates, Paul Allen, Microsoft]  |
| Text Analytics is one of the Azure Cognitive Services.                                         | [Azure Cognitive Services, Text Analytics] |
| My cat might need to see a veterinarian.                                                         | [cat, veterinarian]                  |


In [None]:
display(result_kpe_df)
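
To see how key phrases can summarize a collection of documents, the following sketch tallies the phrases from the sample output in plain Python. It is only an illustration on the three sample rows. On a real Spark DataFrame, a similar tally could be built with `explode` and `groupBy`.

```python
from collections import Counter

# Key phrases per document, copied from the sample output table.
key_phrases_per_doc = [
    ["Bill Gates", "Paul Allen", "Microsoft"],
    ["Azure Cognitive Services", "Text Analytics"],
    ["cat", "veterinarian"],
]

# Count how often each phrase appears across all documents.
phrase_counts = Counter(
    phrase for doc in key_phrases_per_doc for phrase in doc
)

print(phrase_counts.most_common(3))
```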

## Named Entity Recognition (NER)

**Named Entity Recognition (NER)** is the process of identifying and categorizing entities in text into predefined classes or types. These classes include:

- **Person**
- **Location**
- **Event**
- **Product**
- **Organization**

For information on supported languages for NER, refer to the NER language support documentation.


In [None]:
df = spark.createDataFrame([
    ("en", "Microsoft was founded by Bill Gates and Paul Allen."),
    ("en", "Pike place market is my favorite Seattle attraction.")
], ["language", "text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("EntityRecognition")
        .setOutputCol("response"))

result_ner = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("entityNames", col("documents.entities.text"))

result_ner_df = result_ner.select("text", "entityNames")

**Your output should look similar to the following:**

| text                                                                                                    | entityNames                      |
|:--------------------------------------------------------------------------------------------------------|:---------------------------------|
| Microsoft was founded by Bill Gates and Paul Allen.                                                   | [Microsoft, Bill Gates, Paul Allen] |
| Pike place market is my favorite Seattle attraction.                                                   | [Pike Place Market, Seattle]      |


In [None]:
display(result_ner_df)
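
The example above keeps only the entity text. As a rough sketch of how the categories might be used, the snippet below filters a hand-written list of entity records by class. The `text` and `category` field names and the category assigned to each entity are assumptions made for illustration, not captured service output, and `entities_in_category` is a hypothetical helper.

```python
# Hand-written mock of the entity records for one document. Field names
# and category assignments are assumptions for illustration only.
entities = [
    {"text": "Microsoft", "category": "Organization"},
    {"text": "Bill Gates", "category": "Person"},
    {"text": "Paul Allen", "category": "Person"},
]


def entities_in_category(records, category):
    """Return the text of every entity whose category matches."""
    return [r["text"] for r in records if r["category"] == category]


people = entities_in_category(entities, "Person")
print(people)
```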

## Entity Linking

**Entity Linking** identifies and disambiguates the entities mentioned in text. It links these entities to additional information, such as Wikipedia entries. For example, in the sentence:

*“We went to Seattle last week.”*

the word **"Seattle"** would be recognized and linked to more detailed information on Wikipedia.

For information on supported languages for entity linking, refer to the documentation on supported languages.


In [None]:
df = spark.createDataFrame([
    ("en", "Microsoft was founded by Bill Gates and Paul Allen."),
    ("en", "Pike place market is my favorite Seattle attraction.")
], ["language", "text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("EntityLinking")
        .setOutputCol("response"))

result_el_df = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("entityNames", col("documents.entities.name"))



**Your output should look similar to the following:**

| language | text                                                                                                    | AnalyzeText_57d8fa54bd4a_error | response                                                                                                                                                                                                                                                                                | documents                                                                                                                                                                                                                                                                            | entityNames                     |
|:---------|:--------------------------------------------------------------------------------------------------------|:-------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------|
| en       | Microsoft was founded by Bill Gates and Paul Allen.                                                   |                               | {"documents":{"id":"0","entities":[{"name":"Microsoft","url":"https://en.wikipedia.org/wiki/Microsoft","dataSource":"Wikipedia","bingId":"a093e9b9-90f5-a3d5-c4b8-5855e1b01f85","id":"Microsoft","language":"en","matches":[{"confidenceScore":0.49,"length":9,"offset":0,"text":"Microsoft"}]},{"name":"Bill Gates","url":"https://en.wikipedia.org/wiki/Bill_Gates","dataSource":"Wikipedia","bingId":"0d47c987-0042-5576-15e8-97af601614fa","id":"Bill Gates","language":"en","matches":[{"confidenceScore":0.52,"length":10,"offset":25,"text":"Bill Gates"}]},{"name":"Paul Allen","url":"https://en.wikipedia.org/wiki/Paul_Allen","dataSource":"Wikipedia","bingId":"df2c4376-9923-6a54-893f-2ee5a5badbc7","id":"Paul Allen","language":"en","matches":[{"confidenceScore":0.54,"length":10,"offset":40,"text":"Paul Allen"}]}],"warnings":[]},"modelVersion":"2021-06-01"} | {"id":"0","entities":[{"name":"Microsoft","url":"https://en.wikipedia.org/wiki/Microsoft","dataSource":"Wikipedia","bingId":"a093e9b9-90f5-a3d5-c4b8-5855e1b01f85","id":"Microsoft","language":"en","matches":[{"confidenceScore":0.49,"length":9,"offset":0,"text":"Microsoft"}]},{"name":"Bill Gates","url":"https://en.wikipedia.org/wiki/Bill_Gates","dataSource":"Wikipedia","bingId":"0d47c987-0042-5576-15e8-97af601614fa","id":"Bill Gates","language":"en","matches":[{"confidenceScore":0.52,"length":10,"offset":25,"text":"Bill Gates"}]},{"name":"Paul Allen","url":"https://en.wikipedia.org/wiki/Paul_Allen","dataSource":"Wikipedia","bingId":"df2c4376-9923-6a54-893f-2ee5a5badbc7","id":"Paul Allen","language":"en","matches":[{"confidenceScore":0.54,"length":10,"offset":40,"text":"Paul Allen"}]}],"warnings":[]} | ["Microsoft","Bill Gates","Paul Allen"] |
| en       | Pike Place Market is my favorite Seattle attraction.                                                  |                               | {"documents":{"id":"0","entities":[{"name":"Pike Place Market","url":"https://en.wikipedia.org/wiki/Pike_Place_Market","dataSource":"Wikipedia","bingId":"38b9431e-cf91-93be-0584-c42a3ecbfdc7","id":"Pike Place Market","language":"en","matches":[{"confidenceScore":0.86,"length":17,"offset":0,"text":"Pike place market"}]},{"name":"Seattle","url":"https://en.wikipedia.org/wiki/Seattle","dataSource":"Wikipedia","bingId":"5fbba6b8-85e1-4d41-9444-d9055436e473","id":"Seattle","language":"en","matches":[{"confidenceScore":0.27,"length":7,"offset":33,"text":"Seattle"}]}],"warnings":[]},"modelVersion":"2021-06-01"} | {"id":"0","entities":[{"name":"Pike Place Market","url":"https://en.wikipedia.org/wiki/Pike_Place_Market","dataSource":"Wikipedia","bingId":"38b9431e-cf91-93be-0584-c42a3ecbfdc7","id":"Pike Place Market","language":"en","matches":[{"confidenceScore":0.86,"length":17,"offset":0,"text":"Pike place market"}]},{"name":"Seattle","url":"https://en.wikipedia.org/wiki/Seattle","dataSource":"Wikipedia","bingId":"5fbba6b8-85e1-4d41-9444-d9055436e473","id":"Seattle","language":"en","matches":[{"confidenceS


In [None]:
display(result_el_df)
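
Each linked entity in the response carries a `url` alongside its `name`. The following sketch, in plain Python, builds a name-to-URL lookup from a trimmed copy of the `documents` value for the first sample row. Only the `name` and `url` fields are kept. The other fields shown in the sample output are omitted for brevity.

```python
import json

# Trimmed copy of the "documents" value for the first row of the sample
# output; only the "name" and "url" fields are kept.
document_json = """
{"id": "0", "entities": [
  {"name": "Microsoft", "url": "https://en.wikipedia.org/wiki/Microsoft"},
  {"name": "Bill Gates", "url": "https://en.wikipedia.org/wiki/Bill_Gates"},
  {"name": "Paul Allen", "url": "https://en.wikipedia.org/wiki/Paul_Allen"}
]}
"""

document = json.loads(document_json)

# Map each entity name to its linked Wikipedia URL.
links = {entity["name"]: entity["url"] for entity in document["entities"]}

for name, url in links.items():
    print(f"{name} -> {url}")
```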

## Azure AI Translator

This sample demonstrates the use of the prebuilt **Azure AI Translator** in Fabric through RESTful APIs to:

- **Translate Text**: Convert text from one language to another.
- **Transliterate Text**: Convert text from one script to another while maintaining the pronunciation.
- **Get Supported Languages**: Retrieve a list of languages supported by the translator.

For detailed usage and API documentation, refer to the Azure AI Translator documentation.
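
For orientation, here is a minimal sketch of what a translate request looks like at the REST level. It assumes the public Translator v3 endpoint and request shape. The transformer used in this sample builds and sends the request for you, and the prebuilt service in Fabric may call a different endpoint. `build_translate_request` is a hypothetical helper, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

# Public Translator v3 endpoint. This is an assumption for illustration;
# the Fabric transformer may use a different endpoint internally.
BASE_URL = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(texts, to_languages):
    """Return the URL and JSON body for a translate call (not sent)."""
    # doseq=True repeats the "to" parameter once per target language.
    query = urlencode({"api-version": "3.0", "to": to_languages}, doseq=True)
    body = json.dumps([{"Text": t} for t in texts], ensure_ascii=False)
    return f"{BASE_URL}?{query}", body


url, body = build_translate_request(
    ["Hello, what is your name?", "Bye"], ["zh-Hans", "fr"]
)
print(url)
print(body)
```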


In [None]:
import synapse.ml.core
from synapse.ml.cognitive.translate import *
from pyspark.sql.functions import col, flatten

## Translation Setup

The **Translate** transformer is configured to translate text into:

- **Simplified Chinese** (`zh-Hans`)
- **French** (`fr`)

This setup allows you to convert text from the source language into these target languages, leveraging Azure AI Translator's capabilities.


In [None]:
df = spark.createDataFrame([
  (["Hello, what is your name?", "Bye"],)
], ["text",])

translate = (Translate()
    .setTextCol("text")
    .setToLanguage(["zh-Hans", "fr"])
    .setOutputCol("translation")
    .setConcurrency(5))

result_tr_df = translate.transform(df)\
        .withColumn("translation", flatten(col("translation.translations")))\
        .withColumn("translation", col("translation.text"))



**Your output should look similar to the following:**

| text                                           | Translate_314fe58ab89c_error | translation                                   |
|:-----------------------------------------------|:------------------------------|:----------------------------------------------|
| [Hello, what is your name?, Bye]              | None                         | [你好，你叫什么名字？, Bonjour, comment vous appelez-vous ?] |


In [None]:
display(result_tr_df)
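
The `flatten` step above merges the per-input lists of translations into a single list. The sketch below mirrors that in plain Python on a hand-written mock response. The strings for the first input come from the sample output, while the strings for "Bye" are illustrative placeholders, not captured service output.

```python
from itertools import chain

# Hand-written mock of the response for two input texts and two target
# languages. The "Bye" translations are illustrative placeholders.
response = [
    {"translations": [
        {"to": "zh-Hans", "text": "你好，你叫什么名字？"},
        {"to": "fr", "text": "Bonjour, comment vous appelez-vous ?"},
    ]},
    {"translations": [
        {"to": "zh-Hans", "text": "再见"},
        {"to": "fr", "text": "Au revoir"},
    ]},
]

# One list of translations per input text (a list of lists).
nested = [item["translations"] for item in response]

# Flatten into a single list, then keep only the translated text.
flat = list(chain.from_iterable(nested))
texts = [t["text"] for t in flat]
print(texts)
```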