# Welcome to the Speech Example with Websockets

This example covers both Text to Speech (tts), which converts text into audio, and Speech to Text (stt), which converts audio into text. You'll use tts to create an audio file from a block of text
and then convert that audio back into text using stt.

This example uses the websockets stt api, unlike the 'normal' [example](https://github.com/watson-developer-cloud/python-sdk/blob/master/examples/speech_to_text_v1.py), which uses the batch interface. The websockets api
lets you get interim results back and queue several audio files to transcribe them all.

You'll need a service instance and a set of credentials for both services.
If you don't already have them, [this tutorial](https://www.ibm.com/watson/developercloud/doc/common/getting-started-credentials.html) explains how to get service credentials.




In [None]:
# this is only relevant to those running python < 3
from __future__ import (absolute_import, division,
                        print_function, unicode_literals)

This next block is just the imports. 

The `StreamingSpeechToTextProtocol` is the base class used
for the websockets version of the Speech To Text api. Websockets allow
the service to send back interim results as it transcribes the audio.

In [1]:
from watson_developer_cloud import TextToSpeechV1, SpeechToTextV1, StreamingSpeechToTextProtocol
import json
from os.path import join, dirname
import IPython

These are the [Text to Speech](https://console.ng.bluemix.net/registration/?target=/catalog/services/text-to-speech/) credentials.

In [2]:
TEXT_TO_SPEECH_USERNAME="YOUR SERVICE USERNAME"
TEXT_TO_SPEECH_PASSWORD="YOUR SERVICE PASSWORD"
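
If you would rather not paste credentials into the notebook, one option is to read them from environment variables. The following is a minimal sketch, not part of the SDK: `load_credentials` is a hypothetical helper, and the `<PREFIX>_USERNAME` / `<PREFIX>_PASSWORD` variable names are an assumption chosen to match the constants used in this notebook.

```python
import os


def load_credentials(prefix, environ=None):
    """Return (username, password) read from <PREFIX>_USERNAME and <PREFIX>_PASSWORD.

    The variable naming scheme is an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    try:
        return environ[prefix + "_USERNAME"], environ[prefix + "_PASSWORD"]
    except KeyError as missing:
        # Report which variable was not set instead of a bare KeyError.
        raise RuntimeError(
            "Missing environment variable: {}".format(missing.args[0]))


# Possible usage, assuming the variables were exported before starting Jupyter:
# TEXT_TO_SPEECH_USERNAME, TEXT_TO_SPEECH_PASSWORD = load_credentials("TEXT_TO_SPEECH")
```

The same helper could be called with a different prefix for any other service whose credentials follow the same naming scheme.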

And these are the [Speech to Text](https://console.ng.bluemix.net/registration/?target=/catalog/services/speech-to-text/)
credentials.

In [None]:
SPEECH_TO_TEXT_USERNAME='YOUR SERVICE USERNAME'
SPEECH_TO_TEXT_PASSWORD='YOUR SERVICE PASSWORD'

The next cells construct an instance of the tts service object and convert a block of text to audio. The
generated audio is saved to a file (we'll be using that file later in this notebook).

In [3]:
tts = TextToSpeechV1(username=TEXT_TO_SPEECH_USERNAME,
                     password=TEXT_TO_SPEECH_PASSWORD)

In [4]:
to_say = """Here we have a segment taken from a random wikipedia page:
In the 1880s, technologies emerged that would ultimately form the core of what would become International
Business Machines (IBM). Julius E. Pitrat patented the computing scale in 1885;
Alexander Dey invented the dial recorder (1888);
Herman Hollerith patented the Electric Tabulating Machine;
and Willard Bundy invented a time clock to record a worker's arrival
and departure time on a paper tape in 1889. On June 16, 1911,
their four companies were amalgamated in New York State by Charles Ranlett
Flint forming a fifth company, the Computing-Tabulating-Recording
Company (CTR) based in Endicott, New York.
"""
with open('output.wav','wb') as audio_file:
    audio_file.write(
        tts.synthesize(to_say,
                       accept='audio/wav',
                       voice="en-US_LisaVoice"))

IPython conveniently provides a way to play audio files. This example uses the 'Lisa' voice; the [API](https://www.ibm.com/watson/developercloud/speech-to-text/api/v1/) has a number of other voices you can experiment with.

In [5]:
IPython.display.Audio("output.wav")
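
Besides listening to the file, it can be useful to check what was actually written to disk before transcribing it. Here is a minimal sketch using the standard library `wave` module. `describe_wav` is a hypothetical helper, and the sketch assumes the file has a standard PCM WAV header that `wave` can parse.

```python
import contextlib
import wave


def describe_wav(path):
    """Return channel count, sample width in bytes, and sample rate in Hz."""
    # contextlib.closing keeps this working on Python 2, where the object
    # returned by wave.open is not a context manager.
    with contextlib.closing(wave.open(path, "rb")) as wav_file:
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": wav_file.getframerate(),
        }


# Possible usage once output.wav exists:
# print(describe_wav("output.wav"))
```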

This is the subclass of `StreamingSpeechToTextProtocol` that does all the work. You can override the `onMessage`, `onOpen`, `onConnect`, `onError`, and `onClose` methods (be sure to call `super` when you do). These
methods get called for each websocket.

In [6]:
class MySSTP(StreamingSpeechToTextProtocol):
    def onMessage(self, payload, isBinary):
        super(MySSTP, self).onMessage(payload, isBinary)
        parsed = json.loads(payload)
        try:
            print("File Name: {}".format(self.metadata['fromfile']))
        except (AttributeError, KeyError, TypeError):
            # no 'fromfile' metadata was attached to this audio file
            pass
        
        if 'error' in parsed:
            print("Error {}".format(parsed))
        if 'state' in parsed:
            print("State: {}".format(parsed['state']))
            self.addState(parsed['state'])
            # we know that if we've seen two listening messages we're not
            # going to get any more data, so we should close the connection
            if self.countState('listening') == 2:
                self.sendClose(1000)
        if 'results' in parsed:
            for result in parsed['results']:
                if result['final']:
                    print("Final Result")
                
                for alternative in result['alternatives']:
                    if 'confidence' in alternative:
                        print("Confidence {}".format(alternative['confidence']))
                    if 'transcript' in alternative:
                        print("Transcript:")
                        print(alternative['transcript'])
                    print("--------------------------------------------------------")
                    
                if result['final']:
                    print("--------------------------------------------------------")                
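
The message handling above can also be tried offline, without a websocket. The sketch below assumes only the message shape that `onMessage` already relies on (`results`, `final`, `alternatives`, `transcript`, `confidence`). `final_transcripts` is a hypothetical helper, and the sample payload is hand-written for illustration, not captured service output.

```python
import json


def final_transcripts(payload):
    """Return (transcript, confidence) pairs for results marked final."""
    parsed = json.loads(payload)
    pairs = []
    for result in parsed.get("results", []):
        if not result.get("final"):
            continue  # skip interim results
        for alternative in result.get("alternatives", []):
            pairs.append((alternative.get("transcript"),
                          alternative.get("confidence")))
    return pairs


# Hand-written sample message: one interim result and one final result.
sample = json.dumps({
    "results": [
        {"final": False, "alternatives": [{"transcript": "here we "}]},
        {"final": True,
         "alternatives": [{"transcript": "here we have ", "confidence": 0.9}]},
    ]
})
print(final_transcripts(sample))
```

Keeping the parsing in a plain function like this makes it easier to test separately from the protocol class.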

Now, take the audio file produced earlier and transcribe it back into text. 

This uses the `en-US_BroadbandModel` as the audio model and requests interim results as the audio is transcribed. The subclass `MySSTP` is passed in here as well.

Then the file is added. Any number of files can be added here. The sdk uses Twisted to manage the connections, so this code is concurrent but NOT thread safe. After the audio files are added, you can call the `runUntilDone` method, which
processes the audio files and shuts down the Twisted reactor when they are all done. Otherwise, you have to `start` and `stop` the reactor manually. The program will not exit until the reactor is stopped, and once it is stopped it cannot be restarted.
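
To queue several files, one approach is to build the arguments for each file up front. The sketch below is an illustration only: `wav_jobs` is a hypothetical helper, and the dictionary keys simply mirror the keyword arguments of the `addAudioFile` call used in this notebook.

```python
import os


def wav_jobs(directory):
    """Build one keyword-argument dict per .wav file in `directory`."""
    jobs = []
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith(".wav"):
            jobs.append({
                "filename": os.path.join(directory, name),
                "content_type": "audio/wav",
                "metadata": {"fromfile": name},
            })
    return jobs


# Queuing the files would then look like this (requires the MySSTP class
# defined in this notebook and must happen before runUntilDone is called):
#     for job in wav_jobs("."):
#         MySSTP.addAudioFile(**job)
```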

In [7]:
stt = SpeechToTextV1(username=SPEECH_TO_TEXT_USERNAME,
                     password=SPEECH_TO_TEXT_PASSWORD)

params = stt.create_recognize_stream(model='en-US_BroadbandModel',
                                     interim_results=True,
                                     protocol_class=MySSTP)

MySSTP.addAudioFile(filename="output.wav",
                    content_type="audio/wav",
                    metadata={'fromfile': 'output.wav'})
                
MySSTP.runUntilDone()

{'action': 'start', 'continuous': True, 'interim_results': False, 'content-type': 'audio/wav'}
File Name: output.wav
State: listening
File Name: output.wav
Final Result
Confidence 0.953
Transcript:
here we have a segment taken from a random wikipedia page in the eighteen eighties technologies emerged that would ultimately for the core of what would become international business machines IBM 
--------------------------------------------------------
--------------------------------------------------------
Final Result
Confidence 0.915
Transcript:
Julia see Patrick patented the computing scale in eighteen eighty five Alexander day invented the dial recorder eighteen eighty eight Herman Hollerith patented the electric tabulating machine and Willard Bundy invented a time clock to record a worker's arrival and departure time on a paper tape in eighteen eighty nine on June sixteenth nineteen eleven therefore companies were amalgamated in New York state by Charles Randlett flint forming a fift