Assignment: 1

# Speech - Provide code and final result in a Jupyter Notebook

- [x] Transcribe voice to text (Speech) - show the transcribed text 

- [x] Summarize the call - show the summary

## Task1: Speech to text

* Speech SDK: pip install azure-cognitiveservices-speech

* Azure AI | Speech Studio: https://speech.microsoft.com/portal


In [2]:
import os
import time

# Azure Cognitive Services
import azure.cognitiveservices.speech as speechsdk

# Get key and region from system env
subscription = os.getenv("SPEECH_KEY")
region = os.getenv("SPEECH_REGION")
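
`os.getenv` returns `None` when a variable is missing, so a forgotten `SPEECH_KEY` only shows up later as a cancellation error from the service. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper name, not part of the Speech SDK.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "define it before starting Jupyter."
        )
    return value
```

With this helper, the two assignments above could be written as `subscription = require_env("SPEECH_KEY")` and `region = require_env("SPEECH_REGION")`.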

### Real-time speech to text from microphone (for fun!)

In [None]:
"""
# Real-time speech to text from microphone
"""
def recognize_from_microphone():
    # speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
    speech_config = speechsdk.SpeechConfig(subscription, region)
    speech_config.speech_recognition_language="en-US"

    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    print("Speak into your microphone:")
    speech_recognition_result = speech_recognizer.recognize_once_async().get()

    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(speech_recognition_result.text))
    elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
    elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_recognition_result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")

# recognize_from_microphone()

### Single utterance (Tried, but not an acceptable option for this assignment)

In [None]:
"""
# Single utterance -
#   With this speech recognizer, if the audio file starts with noise, music, or other non-language sounds,
#   the opening seems to be treated as silence: after ~500 ms a 'NoMatchReason.InitialSilenceTimeout' result
#   is reported and the recognizer ends.
# TODO: Try raising the timeout with SpeechConfig.set_property and
#   PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs (not verified here).
#   Even then, the catch remains: how long should the timeout be?
# This is a no-go for now.
"""
def recognize_from_audio_file_single_utterance():
    speech_config = speechsdk.SpeechConfig(subscription, region)
    speech_config.speech_recognition_language="en-US"

    # TODO (untested): the initial-silence timeout can be set with set_property; the value is milliseconds as a string.
    # speech_config.set_property(speechsdk.PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "45000")

    audio_config = speechsdk.AudioConfig(filename="BillGates_2010.wav")
    # audio_config = speechsdk.AudioConfig(filename="yearwiththesaints_00_anonymous.wav")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    print("recognize_from_audio_file!")
    speech_recognition_result = speech_recognizer.recognize_once_async().get()
    # speech_recognition_result = speech_recognizer.start_continuous_recognition_async()

    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(speech_recognition_result.text))
    elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
    elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_recognition_result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")

# recognize_from_audio_file_single_utterance()

### Continuous recognition

In [4]:
"""
# Continuous recognition -
#   To improve on the single-utterance approach, this one requires us to connect to EventSignal objects to get the recognition results.
#   To stop recognition, we must call stop_continuous_recognition() or stop_continuous_recognition_async().
"""
def recognize_from_audio_file_continuous_recognition():
    speech_config = speechsdk.SpeechConfig(subscription, region)
    speech_config.speech_recognition_language="en-US"

    debug = False  # set to True to print session and cancellation events

    # Audio sample files  
    audio_config = speechsdk.AudioConfig(filename="BillGates_2010.wav")
    # audio_config = speechsdk.AudioConfig(filename="time.wav")
    # audio_config = speechsdk.AudioConfig(filename="yearwiththesaints_00_anonymous.wav")
    # filename_from_input = input()
    # audio_config = speechsdk.AudioConfig(filename=filename_from_input)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    done = False
    #  A callback to stop continuous recognition
    def stop_cb(evt):
        if debug:
            print('CLOSING on {}'.format(evt))
        # Stop SpeechRecognizer!
        speech_recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    all_results = []
    # Collect recognized text from SpeechRecognizer's recognized events
    def handle_final_result(evt):
        all_results.append(evt.result.text)
        # if debug:
        print(evt.result.text)

    """
    # The calls below connect callbacks to events sent from SpeechRecognizer.
    # The events are:
    #   recognizing: Signal for events that contain intermediate recognition results.
    #   recognized: Signal for events that contain final recognition results, which indicate a successful recognition attempt.
    #   session_started: Signal for events that indicate the start of a recognition session (operation).
    #   session_stopped: Signal for events that indicate the end of a recognition session (operation).
    #   canceled: Signal for events that contain canceled recognition results. 
    #       These results indicate a recognition attempt that was canceled as a result of a direct cancelation request. 
    #       Alternatively, they indicate a transport or protocol failure.
    """
    # 1. Session started
    if debug:
        speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    # 2. Recognizing and recognized
    # speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    # speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.recognized.connect(handle_final_result)
    # 3. Session stopped
    if debug:
        speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
        speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    
    # Handle Stop SpeechRecognizer conditions
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start SpeechRecognizer
    speech_recognizer.start_continuous_recognition()

    # Block till recognition is done
    while not done:
        time.sleep(.5)

    if debug:
        print("all_results: ", all_results)
    return all_results

speech_text = recognize_from_audio_file_continuous_recognition()
# print("speech_text:", speech_text)


I'm going to talk today about energy and climate, and that might seem a bit surprising because my full time work at the Foundation is mostly about vaccines and seeds, about the things that we need to invent and deliver to help the poorest 2 billion live better lives.
But energy and climate are extremely important to these people, in fact more important than to anyone else on the planet. The climate getting worse means that many years their crops won't grow, there'll be too much rain, not enough rain. Things will change in ways that their fragile environment simply can't support. And that leads to starvation, that leads to uncertainty, it leads to unrest. So the the climate changes will be terrible for them.
Also, the price of energy is very important to them. In fact, if you could pick just one thing to lower the price of to reduce poverty by far, you would pick energy now. The price of energy has come down over time.
Really advanced civilization is based on advances in in energy. The
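
The continuous-recognition cell above waits by polling a `done` flag every half second. One possible alternative is to block on a `threading.Event`, which wakes as soon as the stop callback fires and can also enforce an overall timeout. The sketch below runs offline: `FakeRecognizer` and its `connect_recognized`, `connect_stopped`, and `start` methods are hypothetical stand-ins for illustration, not the Speech SDK API.

```python
import threading


class FakeRecognizer:
    """Hypothetical stand-in for a recognizer so the sketch runs without Azure."""

    def __init__(self, segments):
        self._segments = segments
        self._on_recognized = []
        self._on_stopped = []

    def connect_recognized(self, callback):
        self._on_recognized.append(callback)

    def connect_stopped(self, callback):
        self._on_stopped.append(callback)

    def start(self):
        # Deliver events from a worker thread, the way an event-driven SDK would.
        def run():
            for text in self._segments:
                for callback in self._on_recognized:
                    callback(text)
            for callback in self._on_stopped:
                callback()

        threading.Thread(target=run, daemon=True).start()


def transcribe(recognizer, timeout=None):
    """Collect recognized segments and block until the session stops."""
    done = threading.Event()
    results = []
    recognizer.connect_recognized(results.append)
    recognizer.connect_stopped(done.set)
    recognizer.start()
    # Event.wait() returns False if the timeout expires before set() is called.
    if not done.wait(timeout):
        raise TimeoutError("recognition did not finish in time")
    return results
```

With the real SDK, the callbacks receive an event argument, so the stop handler would be something like `lambda evt: done.set()` connected to `session_stopped` and `canceled`, as in the cell above.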

## Task2: Summarize the speech

* Language SDK | Document summarization: pip install azure-ai-textanalytics==5.3.0

* Language SDK | Conversation summarization: pip install azure-ai-language-conversations==1.1.0

* Azure AI | Language Studio: https://language.cognitive.azure.com/home

### Document summarization - Extractive Summary

In [5]:
"""
# Document summarization - Extractive Summary

"""
# Get key from environment variables named "LANGUAGE_KEY"
key = os.environ.get('LANGUAGE_KEY')
# endpoint = os.environ.get('LANGUAGE_ENDPOINT')  # the endpoint could be read from the environment as well
endpoint = 'https://yongailanguage.cognitiveservices.azure.com/'

# from azure.ai.textanalytics import TextAnalyticsClient
from azure.ai.textanalytics import (
        TextAnalyticsClient,
        ExtractiveSummaryAction
    ) 
from azure.core.credentials import AzureKeyCredential

# Authenticate the client using key and endpoint 
def authenticate_client():
    ta_credential = AzureKeyCredential(key)
    text_analytics_client = TextAnalyticsClient(
            endpoint=endpoint, 
            credential=ta_credential)
    return text_analytics_client

client = authenticate_client()

# Summarizing text in a document
def sample_extractive_summarization(client, document_to_summarize, int_max_sentence_count):
    # Join a list of strings into a single space-separated string
    def listToString(s):
        # separator placed between the list items
        str1 = " "
        # return the joined string
        return (str1.join(s))

    # print("document.len(): ", len(document_to_summarize))
    document = []
    document.append(listToString(document_to_summarize))
    # print("document: ", document)

    # document = [
    #     "The extractive summarization feature uses natural language processing techniques to locate key sentences in an unstructured text document. "
    #     "These sentences collectively convey the main idea of the document. This feature is provided as an API for developers. " 
    #     "They can use it to build intelligent solutions based on the relevant information extracted to support various use cases. "
    #     "Extractive summarization supports several languages. It is based on pretrained multilingual transformer models, part of our quest for holistic representations. "
    #     "It draws its strength from transfer learning across monolingual and harness the shared nature of languages to produce models of improved quality and efficiency. "
    # ]
    # print("document: ", document)

    poller = client.begin_analyze_actions(
        document,
        actions=[
            ExtractiveSummaryAction(max_sentence_count=int_max_sentence_count)
        ],
    )

    document_results = poller.result()
    for result in document_results:
        extract_summary_result = result[0]  # first document, first result
        if extract_summary_result.is_error:
            print("...Is an error with code '{}' and message '{}'".format(
                extract_summary_result.code, extract_summary_result.message
            ))
        else:
            print("Summary extracted: \n{}".format(
                " ".join([sentence.text for sentence in extract_summary_result.sentences]))
            )

# Maximum number of sentences to return. Defaults to 3.
max_sentence_count = 3
sample_extractive_summarization(client, speech_text, max_sentence_count)

Summary extracted: 
But energy and climate are extremely important to these people, in fact more important than to anyone else on the planet. It's a stupid waste of the earth's resources to put money towards that when there are better things we can do. California. My name is Sonia Nacho from the Evans Own Foundation.
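
Both summarization cells turn the list of recognized segments into a one-element document list with `listToString`. A minimal sketch of a shared helper that does the same thing in one step follows. `segments_to_documents` is a hypothetical name, and the filtering of empty segments is an assumption: whether the recognizer emits empty strings depends on the audio.

```python
def segments_to_documents(segments):
    """Join recognized segments into a one-element document list.

    Empty or whitespace-only segments are dropped. An empty list is
    returned when nothing usable remains, so the caller can skip the
    summarization request.
    """
    cleaned = [s.strip() for s in segments if s and s.strip()]
    return [" ".join(cleaned)] if cleaned else []
```

The result could be passed as the document list in either summarization cell, for example `segments_to_documents(speech_text)`.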


### Document summarization - Abstractive Summary

In [7]:
"""
# Document summarization - Abstractive Summary

"""
import document_abstract_summary as das

# Join a list of strings into a single space-separated string
def listToString(s):
    # separator placed between the list items
    str1 = " "
    # return the joined string
    return (str1.join(s))

# print("document.len(): ", len(speech_text))
document = []
document.append(listToString(speech_text))
# print("document: ", document)

das.sample_abstractive_summarization(document)

Summaries abstracted:
Sonia Nacho, a researcher at the Evans Own Foundation, discusses the importance of energy and climate to the world's poorest people. She believes that energy and climate are extremely important to these people, and that the price of energy is very important to them. She discusses the need for energy miracles, and she suggests five solutions that can achieve the big numbers: burning fossil fuels, nuclear, renewable sources, and a breakthrough in nuclear technology. She also discusses climate skeptics and how to persuade them that solving the CO2 problem is not a waste of resources.

