# Google Cloud Example 


### William Fallas. williamfallas@gmail.com

#### Speech recognition and translation with the Google Cloud APIs

This project shows how to extract the audio from a video, transcribe it, and translate the transcript from English to Spanish.

Local video file: voice-recognition.mp4

Main steps

- Extract the audio track as a WAV file
- Convert the audio from stereo to mono
- Upload the WAV file to a Google Cloud Storage bucket
- Use the `SpeechClient` API's `long_running_recognize` method to get the English transcript
- Use the `translate_v2` API to translate the transcript from English to Spanish
- Convert the Spanish transcript to speech with the Text-to-Speech API
- Save the result to an MP3 file and play it

Necessary packages

In [123]:
# Packages ---------------------------------------
# pip install moviepy
# pip install pydub
# pip install --upgrade google-cloud-speech
# pip install --upgrade google-cloud-storage
# pip install google-cloud-translate==2.0.1
# pip install --upgrade google-cloud-texttospeech
# pip install pygame
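Before running the rest of the notebook, it can help to confirm that the packages above are actually importable. A minimal sketch using only the standard library; the module names in `REQUIRED_MODULES` are my assumption of the import names that correspond to the pip packages (for example `google.cloud.speech` for `google-cloud-speech`), and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the pip packages listed above
REQUIRED_MODULES = [
    "moviepy",
    "pydub",
    "google.cloud.speech",
    "google.cloud.storage",
    "google.cloud.translate_v2",
    "google.cloud.texttospeech",
    "pygame",
]


def missing_modules(names):
    """Return the subset of module names that cannot be found."""
    missing = []
    for name in names:
        try:
            # find_spec imports the parents of dotted names, so a missing
            # parent package raises ModuleNotFoundError instead of returning None
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            missing.append(name)
    return missing


print(missing_modules(REQUIRED_MODULES))
```

An empty list means every module was found; anything listed still needs its `pip install`.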

Set the Google credentials

In [10]:
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""  # set the path to your service-account JSON key file here
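If `GOOGLE_APPLICATION_CREDENTIALS` is left empty, as above, the Google client libraries cannot authenticate and the later cells fail. A small sketch of a hypothetical `check_credentials` helper that fails early with a clearer message. It assumes the key is a JSON file with a `"type"` field, as service-account keys have; this is a sanity check, not full validation.

```python
import json
import os


def check_credentials(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Raise early if the credentials variable is unset or not a JSON key file."""
    path = os.environ.get(env_var, "")
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"{env_var} points to a missing file: {path}")
    with open(path, encoding="utf-8") as f:
        key = json.load(f)  # raises json.JSONDecodeError if the file is not JSON
    # Assumption: a usable key file carries a "type" field (e.g. "service_account")
    if "type" not in key:
        raise RuntimeError(f"{path} does not look like a credentials file")
    return path
```

Calling `check_credentials()` right after setting the variable surfaces a missing or mistyped path before any API call is made.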

In [122]:
import moviepy
from pydub import AudioSegment
import io
import os
import wave
from google.cloud import storage
from IPython.display import Video

# Sample video from the local drive

In [121]:
Video("voice-recognition.mp4")

# Get audio from video 

In [89]:
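import moviepy.editor  # MoviePy 1.x; in MoviePy 2.x use `from moviepy import VideoFileClip`

# Load the video and write its audio track to a WAV file
video = moviepy.editor.VideoFileClip("voice-recognition.mp4")
audio = video.audio
audio.write_audiofile("speech.wav")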


MoviePy - Writing audio in speech.wav

MoviePy - Done.

In [119]:


def stereo_to_mono(audio_file_name):
    """Downmix a WAV file to a single channel, overwriting it in place."""
    sound = AudioSegment.from_wav(audio_file_name)
    sound = sound.set_channels(1)
    sound.export(audio_file_name, format="wav")


def frame_rate_channel(audio_file_name):
    """Return (frame_rate, channels) read from the WAV header."""
    with wave.open(audio_file_name, "rb") as wave_file:
        frame_rate = wave_file.getframerate()
        channels = wave_file.getnchannels()
        return frame_rate, channels


def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Upload a local file to the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)


def delete_blob(bucket_name, blob_name):
    """Delete a blob from the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.delete()


def write_transcripts(transcript_filename, transcript):
    """Write a transcript to a text file as UTF-8."""
    with open(transcript_filename, "w", encoding="utf-8") as f:
        f.write(transcript)

In [92]:
file_name = "speech.wav"

# Speech-to-Text expects mono audio by default, so downmix if needed
frame_rate, channels = frame_rate_channel(file_name)

if channels > 1:
    stereo_to_mono(file_name)
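`AudioSegment.set_channels(1)` mixes the two channels down into one. To make that step concrete, here is a standard-library sketch of the same kind of downmix for a 16-bit PCM WAV: it averages the left and right sample of each frame. The function name `downmix_to_mono` is hypothetical, it assumes 16-bit samples (`sampwidth == 2`, which is what MoviePy writes by default), and pydub's exact mixing may differ in rounding.

```python
import array
import sys
import wave


def downmix_to_mono(src_path, dst_path):
    """Average the two channels of a 16-bit stereo WAV into a mono WAV."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo input")
        frame_rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))

    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Interleaved layout is L0, R0, L1, R1, ...; average each (L, R) pair
    left, right = samples[0::2], samples[1::2]
    mono = array.array(
        "h", ((left_s + right_s) // 2 for left_s, right_s in zip(left, right))
    )

    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)  # keep the original sample rate
        dst.writeframes(mono.tobytes())
```

Unlike `stereo_to_mono`, this writes to a separate file, e.g. `downmix_to_mono("speech.wav", "speech_mono.wav")`, so the original audio is kept.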

Once we have the audio file, it needs to be uploaded to a Google Cloud Storage bucket: long audio files (roughly over one minute) cannot be sent inline in the request and must be read by the Speech-to-Text API from a bucket.

In [96]:
upload_blob('wfallas_testbucket',"speech.wav","speech.wav")
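Whether the bucket is needed depends on the audio length, which can be read straight from the WAV header. A minimal sketch with hypothetical helpers: `wav_duration_seconds` computes frames divided by frame rate, and `gcs_uri` builds the `gs://` address the recognizer reads from. The 60-second threshold is an assumption taken from the limit described above; check the current Speech-to-Text quotas before relying on it.

```python
import wave

# Assumed threshold: audio longer than about a minute must come from Cloud Storage
INLINE_LIMIT_SECONDS = 60


def wav_duration_seconds(path):
    """Duration of a WAV file from its header: number of frames / frames per second."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def gcs_uri(bucket_name, blob_name):
    """Build a Cloud Storage URI of the form gs://bucket/object."""
    return f"gs://{bucket_name}/{blob_name}"


def needs_bucket(path, limit=INLINE_LIMIT_SECONDS):
    """True if the audio is longer than the assumed inline limit."""
    return wav_duration_seconds(path) > limit
```

`gcs_uri("wfallas_testbucket", "speech.wav")` returns `"gs://wfallas_testbucket/speech.wav"`, the same string the recognition cell builds by concatenation.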

Use the `long_running_recognize` method for long audio files, and wait up to 10,000 seconds (`timeout=10000`) for the operation to complete.

In [110]:
from google.cloud import speech

gcs_uri = "gs://" + "wfallas_testbucket" + "/" + "speech.wav"
transcript = ""

client = speech.SpeechClient()
audio = speech.RecognitionAudio(uri=gcs_uri)

config = speech.RecognitionConfig(
    language_code="en-US",
    sample_rate_hertz=frame_rate,
)

# Start asynchronous recognition and block until it finishes (or the timeout expires)
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=10000)

# Each result covers a consecutive part of the audio; take its top alternative
for result in response.results:
    transcript += result.alternatives[0].transcript
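Each entry in `response.results` covers a consecutive stretch of the audio, and `alternatives[0]` is the most probable hypothesis for it; the loop above concatenates those pieces directly. A small sketch of the same logic as a function that joins the pieces with a space, so words from adjacent segments cannot run together. The `fake_result` stubs built from `types.SimpleNamespace` are hypothetical stand-ins that only mimic the `results[i].alternatives[0].transcript` shape of a real response.

```python
from types import SimpleNamespace


def join_transcript(results):
    """Concatenate the top alternative of each sequential result, space-separated."""
    pieces = []
    for result in results:
        if result.alternatives:  # skip results that carry no hypothesis
            pieces.append(result.alternatives[0].transcript.strip())
    return " ".join(piece for piece in pieces if piece)


def fake_result(text):
    # Hypothetical stub mimicking one result with a single alternative
    return SimpleNamespace(alternatives=[SimpleNamespace(transcript=text)])


print(join_transcript([fake_result("web accessibility perspectives"),
                       fake_result(" voice recognition")]))
# prints: web accessibility perspectives voice recognition
```

With a real response, the call would be `transcript = join_transcript(response.results)`.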


# English transcript from the video

In [111]:
transcript

"web accessibility perspectives voice recognition sometimes it's just easier to speak one of the advances of technology is voice recognition whether it's searching the web 19th century architecture send email hard or controlling your navigation app many people with physical disabilities reline voice recognition to use the computer order for that to happen websites and apps need to be property coded cancel voice recognition can help with some other people with temporary limitations to light an injured arm injury over people simply prefer invoice with accessibility essential to some useful. You AI perspectives the more information on voice recognition"

In [114]:
delete_blob("wfallas_testbucket", "speech.wav")

In [120]:
write_transcripts("EnglishTranscript.txt",transcript)

# Translate the English transcript to Spanish

Get the Spanish translation with the `translate_v2` API

In [199]:


from google.cloud import translate_v2 as translate


def translate_text(target, text):
    """Translate text into the target language and return the translated string."""
    translate_client = translate.Client()

    # The client expects str; decode raw bytes first
    if isinstance(text, bytes):
        text = text.decode("utf-8")

    # Passing a list of strings instead returns a list of result dicts,
    # which this helper does not handle; here text is a single string.
    result = translate_client.translate(text, target_language=target)

    return result["translatedText"]
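The Translation API v2 treats input as HTML by default, so `translatedText` can come back with HTML entities (for example an apostrophe as `&#39;`). One option is to request plain text through the `format_="text"` argument of `translate_client.translate`; another is to unescape the result afterwards. A standard-library sketch of the second option, using a hypothetical `clean_translation` helper and a hard-coded sample string rather than a real API response:

```python
import html


def clean_translation(translated_text):
    """Turn HTML entities in a translation back into plain characters."""
    return html.unescape(translated_text)


sample = "reconocimiento de voz &quot;web&quot; &amp; m&#225;s"
print(clean_translation(sample))
# prints: reconocimiento de voz "web" & más
```

Applied here, it would wrap the helper's result: `esp = clean_translation(translate_text("es", transcript))`.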


In [200]:
esp = translate_text("es", transcript)


In [202]:
esp

'perspectivas de accesibilidad web reconocimiento de voz a veces es más fácil hablar uno de los avances de la tecnología es el reconocimiento de voz, ya sea que busque en la web arquitectura del siglo XIX envíe correo electrónico con fuerza o controle su aplicación de navegación muchas personas con discapacidades físicas reconectan el reconocimiento de voz para usar el orden de la computadora para que para que suceda, los sitios web y las aplicaciones deben tener un código de propiedad, cancelar el reconocimiento de voz puede ayudar a otras personas con limitaciones temporales a encender una lesión en el brazo lesionado en lugar de que las personas simplemente prefieran la factura con accesibilidad esencial para algunos útiles. Sus perspectivas AI cuanta más información sobre el reconocimiento de voz'

# Transform the Spanish transcript to audio

In [203]:
from google.cloud import texttospeech

# Instantiate a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(text=esp)

# Build the voice request: Spanish (Spain) language code ("es-ES")
# and a neutral SSML voice gender
voice = texttospeech.VoiceSelectionParams(
    language_code="es-ES", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)

# Request MP3-encoded audio
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Perform the text-to-speech request with the selected voice and audio type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config, timeout=10000
)

# The response's audio_content is binary, so write it in binary mode
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
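A single `synthesize_speech` request accepts a limited amount of input text (5,000 bytes per request in the documented quota at the time of writing), so the transcript of a longer video would have to be split. A minimal sketch of a hypothetical `chunk_text` helper that packs whole sentences into chunks under a byte limit, measured in UTF-8 because accented Spanish characters take two bytes each. A sentence that is too long on its own is cut on word boundaries.

```python
import re

MAX_BYTES = 5000  # assumed per-request input limit; check the current Text-to-Speech quotas


def chunk_text(text, max_bytes=MAX_BYTES):
    """Split text into sentence-aligned chunks whose UTF-8 size is <= max_bytes."""
    # Split after sentence-ending punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Sentence does not fit next to the previous text: rebuild word by word
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate.encode("utf-8")) <= max_bytes:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = word  # assumes no single word exceeds max_bytes
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then go into its own request, e.g. `texttospeech.SynthesisInput(text=part)` for every `part` in `chunk_text(esp)`.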


# Play the final result

In [206]:
from pygame import mixer  # pygame's mixer module plays audio files

mixer.init()
mixer.music.load("output.mp3")
mixer.music.play()  # non-blocking: playback continues in the background