Copy your Speech Service Key (`cognitive_key`) and Service Region (`cognitive_service_region`) to the variables listed below.

In [6]:
cognitive_key = '<YourKey>'
cognitive_service_region = '<YourRegion>'

print('Cognitive services ready at {}'.format(cognitive_service_region))

Cognitive services ready at centralus
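
If you would rather not paste the key into the notebook itself, one option is to read both values from environment variables. The sketch below is only an illustration: the variable names `SPEECH_KEY` and `SPEECH_REGION` and the helper `load_speech_settings` are hypothetical names chosen for this example, not something the Speech SDK defines.

```python
import os


def load_speech_settings(env=os.environ):
    """Return (key, region) read from environment variables.

    SPEECH_KEY and SPEECH_REGION are hypothetical names used only in this sketch.
    """
    key = env.get("SPEECH_KEY", "").strip()
    region = env.get("SPEECH_REGION", "").strip()
    # Collect the names of any variables that are unset or blank
    missing = [name for name, value in (("SPEECH_KEY", key), ("SPEECH_REGION", region))
               if not value]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return key, region


# Usage, once both variables are set in the kernel's environment:
# cognitive_key, cognitive_service_region = load_speech_settings()
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the service later on.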


Let's install the Azure Speech SDK.

In [7]:
pip install azure.cognitiveservices.speech

Collecting azure.cognitiveservices.speech
  Using cached azure_cognitiveservices_speech-1.15.0-cp36-cp36m-manylinux1_x86_64.whl (3.1 MB)
Installing collected packages: azure.cognitiveservices.speech
Successfully installed azure.cognitiveservices.speech
Note: you may need to restart the kernel to use updated packages.


### Speech Recognition

The first step is to set up an `audio_config` that points to our audio (WAV) file on the file system, and a `speech_config` that holds our Azure Cognitive Services key and region.
Once the config objects are ready, we create a `SpeechRecognizer` from the Speech SDK and pass in both config objects along with the recording's language. Finally, we call the recognizer's `recognize_once` method, which recognizes a single utterance. It returns a result object whose `text` property holds the transcription.

In [8]:
import IPython
import azure.cognitiveservices.speech as speechsdk

# Path to the WAV recording we want to transcribe
audio_file = 'speech-sample.wav'

# Credentials for the service and the audio input for the recognizer
speech_config = speechsdk.SpeechConfig(subscription=cognitive_key, region=cognitive_service_region)
audio_config = speechsdk.audio.AudioConfig(filename=audio_file)
speech_recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config, language="en-US", audio_config=audio_config)

result = speech_recognizer.recognize_once()

print(result)

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

SpeechRecognitionResult(result_id=52d3d3e4da3446f685133aad9e92e592, text="This is me talking to see how Azure Cognitive Services does speech recognition.", reason=ResultReason.RecognizedSpeech)
Recognized: This is me talking to see how Azure Cognitive Services does speech recognition.
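
If recognition comes back as `NoMatch` or `Canceled`, it can help to check the recording's format first. The sketch below uses only Python's standard `wave` module. The helper names `describe_wav` and `looks_like_default_input` are hypothetical. The check for 16-bit mono PCM at 8 kHz or 16 kHz reflects what the Speech service documentation describes as the default input format, so treat it as an assumption to verify against the SDK version you use.

```python
import wave


def describe_wav(path):
    """Return basic PCM properties of a WAV file using only the standard library."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": wav.getframerate(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / float(wav.getframerate()),
        }


def looks_like_default_input(info):
    """True for 16-bit mono PCM at 8 or 16 kHz (assumed default input format)."""
    return (info["channels"] == 1
            and info["bits_per_sample"] == 16
            and info["sample_rate_hz"] in (8000, 16000))


# Usage with the file from the cell above:
# info = describe_wav('speech-sample.wav')
# print(info, looks_like_default_input(info))
```

A recording that fails this check is not necessarily unusable, but converting it to the assumed format is a cheap thing to try before looking elsewhere.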


### Speech synthesis

For speech synthesis, we first set up a `speech_config`, assigning a voice to its `speech_synthesis_voice_name` property, and a `file_config` that points to our output file. We use the `AudioOutputConfig` class to create the file configuration.
Once the configuration objects are ready, we create a `SpeechSynthesizer`, passing in both the speech and audio configuration objects. Finally, we call `speak_text_async` on our `speech_synthesizer` to run the operation. Once it completes, we will have a WAV audio file named `outputaudio.wav` in our storage.

In [22]:
speech_config = speechsdk.SpeechConfig(subscription=cognitive_key, region=cognitive_service_region)
voice = "Microsoft Server Speech Text to Speech Voice (en-US, BenjaminRUS)"
text = "Using Azure Speech to Text Capability"

file_name = "outputaudio.wav"
speech_config.speech_synthesis_voice_name = voice
file_config = speechsdk.audio.AudioOutputConfig(filename=file_name)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=file_config)
result = speech_synthesizer.speak_text_async(text).get()

IPython.display.display(IPython.display.Audio(file_name, autoplay=True))
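
The synthesis cell stores `result` but never inspects it, so a failed request (for example, a wrong key) would pass silently. Here is a minimal sketch of a check, mirroring the pattern used for recognition. The helper `summarize_synthesis` is a hypothetical name. It assumes the result exposes `reason` and, when canceled, `cancellation_details` with `reason` and `error_details`. It compares enum member names instead of importing the SDK, which is a simplification made for this example.

```python
def summarize_synthesis(result):
    """Return a one-line status string for a speech synthesis result.

    Works on any object with a `reason` attribute (and `cancellation_details`
    when canceled); enum members are compared by name.
    """
    reason = getattr(result.reason, "name", str(result.reason))
    if reason == "SynthesizingAudioCompleted":
        return "Synthesis completed"
    if reason == "Canceled":
        details = result.cancellation_details
        message = "Synthesis canceled: {}".format(
            getattr(details.reason, "name", details.reason))
        error = getattr(details, "error_details", "")
        if error:
            message += " ({})".format(error)
        return message
    return "Unexpected result reason: {}".format(reason)


# Usage after the synthesis cell:
# print(summarize_synthesis(result))
```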

Feel free to delete the output file once you are done with the exercise.

In [21]:
import os

# Remove the synthesized audio file if it exists
if os.path.exists("outputaudio.wav"):
    os.remove("outputaudio.wav")