# Transform and enrich data seamlessly with AI functions (PySpark)

[Transform and enrich data seamlessly with AI functions - Microsoft Fabric | Microsoft Learn](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/overview?tabs=pandas#getting-started-with-ai-functions)

Using the AI functions library in a Fabric notebook currently requires certain custom packages. The following cells configure the Spark session to load those packages and then import them. Afterward, you can use AI functions with pandas or PySpark, depending on your preference. This notebook uses PySpark.

In [1]:
%%configure -f
{
    "name": "synapseml",
    "conf": {
        "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.10-spark3.5,com.microsoft.azure:synapseml-internal_2.12:1.0.10.0-spark3.5",
        "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
        "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
        "spark.yarn.user.classpath.first": "true",
        "spark.sql.parquet.enableVectorizedReader": "false"
    }
}

* AI functions are supported in the Fabric 1.3 runtime and higher.  
* By default, AI functions are currently powered by the gpt-3.5-turbo (0125) model. 
* Although the underlying model can handle several languages, most of the AI functions are optimized for use on English-language texts.  
* During the initial rollout of AI functions, users are temporarily limited to 1,000 requests per minute with Fabric's built-in AI endpoint.  
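
The request limit in the last bullet gives a rough lower bound on how long a large job can take. The sketch below assumes that each DataFrame row results in exactly one request to the endpoint, which the notes above do not state, and that the rate limit is the only bottleneck. `min_minutes_for_rows` is a hypothetical helper for illustration, not part of the AI functions library.

```python
import math

# Limit stated above for Fabric's built-in AI endpoint during the initial rollout.
REQUESTS_PER_MINUTE = 1_000


def min_minutes_for_rows(row_count: int, requests_per_row: int = 1) -> int:
    """Return a lower bound, in whole minutes, for processing `row_count` rows.

    Assumption (not from the documentation): every row costs a fixed number
    of requests, and nothing other than the rate limit slows the job down.
    """
    if row_count < 0 or requests_per_row < 1:
        raise ValueError("row_count must be >= 0 and requests_per_row >= 1")
    total_requests = row_count * requests_per_row
    return math.ceil(total_requests / REQUESTS_PER_MINUTE)


print(min_minutes_for_rows(25_000))  # 25
```

Under these assumptions, a 25,000-row DataFrame needs at least 25 minutes per AI function call, so it may be worth filtering or sampling rows first.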

In [2]:
from synapse.ml.spark.aifunc.DataFrameExtensions import AIFunctions
from synapse.ml.services.openai import OpenAIDefaults
defaults = OpenAIDefaults()
defaults.set_deployment_name("gpt-35-turbo-0125")

### Calculate similarity with `ai.similarity`

The `ai.similarity` function invokes AI to compare input text values with a single common text value, or with pairwise text values in another column. The output similarity scores are relative, and they can range from **-1** (opposites) to **1** (identical). A score of **0** indicates that the values are completely unrelated in meaning. For more detailed instructions about the use of `ai.similarity`, visit [this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/similarity).
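
To build intuition for the score range, the sketch below computes cosine similarity, a common measure that also runs from -1 to 1. This is an illustration only: the article does not say how `ai.similarity` derives its scores, and the vectors here are hypothetical stand-ins rather than real text embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical vectors chosen to hit the three reference points of the scale.
same = cosine_similarity([3.0, 4.0], [3.0, 4.0])         # same direction
opposite = cosine_similarity([3.0, 4.0], [-3.0, -4.0])   # opposite direction
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])    # orthogonal

print(same, opposite, unrelated)
```

The three results correspond to the reference points described above: identical (1), opposite (-1), and unrelated (0).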

In [3]:
# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("Bill Gates", "Microsoft"), 
        ("Satya Nadella", "Toyota"), 
        ("Joan of Arc", "Nike")
    ], ["names", "companies"])

similarity = df.ai.similarity(input_col="names", other_col="companies", output_col="similarity")
display(similarity)

### Categorize text with `ai.classify`

The `ai.classify` function invokes AI to categorize input text according to custom labels you choose. For more information about the use of `ai.classify`, visit [this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/classify).

In [4]:
# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",),
        ("Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",),
        ("Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!",)
    ], ["descriptions"])
    
categories = df.ai.classify(labels=["kitchen", "bedroom", "garage", "other"], input_col="descriptions", output_col="categories")
display(categories)

### Detect sentiment with `ai.analyze_sentiment`

The `ai.analyze_sentiment` function invokes AI to identify whether the emotional state expressed by input text is positive, negative, mixed, or neutral. If AI can't make this determination, the output is left blank. For more detailed instructions about the use of `ai.analyze_sentiment`, visit [this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/analyze-sentiment).
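
Because the output can be blank, downstream code should handle missing values explicitly. The following is a minimal sketch that tallies sentiment values after they have been collected into a plain Python list. It assumes a blank result arrives as `None` or an empty string, which the description above does not specify; `tally_sentiments` and the sample values are hypothetical and are not actual model output.

```python
from collections import Counter
from typing import Iterable, Optional

# The four outcomes named in the description above.
EXPECTED_LABELS = {"positive", "negative", "mixed", "neutral"}


def tally_sentiments(values: Iterable[Optional[str]]) -> Counter:
    """Count sentiment labels, tracking blank and unexpected values separately."""
    counts: Counter = Counter()
    for value in values:
        if value is None or not value.strip():
            # Assumed representation of "AI couldn't make a determination".
            counts["undetermined"] += 1
            continue
        label = value.strip().lower()
        if label in EXPECTED_LABELS:
            counts[label] += 1
        else:
            # Anything outside the four expected labels is flagged for review.
            counts["unexpected"] += 1
    return counts


# Hypothetical values for illustration only.
sample = ["negative", "positive", "mixed", None, "neutral", ""]
counts = tally_sentiments(sample)
print(dict(counts))
```

Keeping "undetermined" and "unexpected" as separate buckets makes it easier to spot rows that need manual review.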

In [5]:
# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("The cleaning spray permanently stained my beautiful kitchen counter. Never again!",),
        ("I used this sunscreen on my vacation to Florida, and I didn't get burned at all. Would recommend.",),
        ("I'm torn about this speaker system. The sound was high quality, though it didn't connect to my roommate's phone.",),
        ("The umbrella is OK, I guess.",)
    ], ["reviews"])

sentiment = df.ai.analyze_sentiment(input_col="reviews", output_col="sentiment")
display(sentiment)

### Extract entities with `ai.extract`

The `ai.extract` function invokes AI to scan input text and extract specific types of information designated by labels you choose—for example, locations or names. For more detailed instructions about the use of `ai.extract`, visit [this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/extract).

In [6]:
# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("MJ Lee lives in Tucson, AZ, and works as a software engineer for Microsoft.",),
        ("Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey.",)
    ], ["descriptions"])

df_entities = df.ai.extract(labels=["name", "profession", "city"], input_col="descriptions")
display(df_entities)

### Fix grammar with `ai.fix_grammar`

The `ai.fix_grammar` function invokes AI to correct the spelling, grammar, and punctuation of input text. For more detailed instructions about the use of `ai.fix_grammar`, visit [this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/fix-grammar).

In [7]:
# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("There are an error here.",),
        ("She and me go weigh back. We used to hang out every weeks.",),
        ("The big picture are right, but you're details is all wrong.",)
    ], ["text"])

corrections = df.ai.fix_grammar(input_col="text", output_col="corrections")
display(corrections)

### Summarize text with `ai.summarize`

The `ai.summarize` function invokes AI to generate summaries of input text, either the values from a single column of a DataFrame or the row values across all of its columns. For more detailed instructions about the use of `ai.summarize`, visit [this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/summarize).

In [8]:
# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """,),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """,)
    ], ["product", "release_year", "description"])

summaries = df.ai.summarize(input_col="description", output_col="summary")
display(summaries)

### Translate text with `ai.translate`

The `ai.translate` function invokes AI to translate input text to a new language of your choice. For more detailed instructions about the use of `ai.translate`, visit [this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/translate).

In [9]:
# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("Hello! How are you doing today?",),
        ("Tell me what you'd like to know, and I'll do my best to help.",),
        ("The only thing we have to fear is fear itself.",),
    ], ["text"])

translations = df.ai.translate(to_lang="spanish", input_col="text", output_col="translations")
display(translations)

### Answer custom user prompts with `ai.generate_response`

The `ai.generate_response` function invokes AI to generate custom text based on your own instructions. For more detailed instructions about the use of `ai.generate_response`, visit [this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/generate-response).

In [10]:
# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("Scarves",),
        ("Snow pants",),
        ("Ski goggles",)
    ], ["product"])

responses = df.ai.generate_response(prompt="Write a short, punchy email subject line for a winter sale.", output_col="response")
display(responses)
