# Captions Translation

Purpose: Captions Translation (from English to Portuguese)

Importing libraries:

In [30]:
# textblob is used to translate sentences
from textblob import TextBlob
# os provides operating-system-dependent functionality, such as manipulating file paths
# glob finds all the path names that match a specific pattern
import os, glob

First, I will describe how to translate one specific file:

In [51]:
# open the file that we want to translate. Then, specify "r" since we just want to read the file 
to_translate = open("moveautogenerated.vtt", "r")

In [52]:
# create the file where the translation will be stored. Specify "w" since we want to write a file
translated_file = open("translated_file.vtt", "w")

This is what the file to be translated looks like:

In [53]:
cat moveautogenerated.vtt

WEBVTT

0
00:01.410 --> 00:11.550
Lesson 3.2 Move to move objects type "M" on the command bar, select the objects that you want

1
00:11.550 --> 00:15.270
to move from and hit enter.

2
00:15.410 --> 00:20.080
Select the moving reference point then select the destination point

Any attempt to simply translate the whole file line by line raises the following error:

In [35]:
# looping through the lines in the file and translating as it goes:
for line in to_translate:    
    blob = TextBlob(line)
    translated_file.write(str(blob.translate(from_lang="en",to="pt"))+"\n")

NotTranslated: Translation API returned and empty response.

This error happens because of three kinds of lines:
1) the first line, which is "WEBVTT"
2) empty lines, which are stored as "\n"
3) lines that store the caption timing, such as "00:00.710 --> 00:06.010"

These lines should not be translated. Any attempt to translate them returns an empty response or an unchanged line, which raises an error in the code.
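The cases above can be sketched as a small line classifier. This is only an illustration and not part of the original notebook: the helper name `classify_vtt_line` and the timing pattern are assumptions, and the pattern only covers timestamps written as `MM:SS.mmm` or `HH:MM:SS.mmm`.

```python
import re

# Assumed pattern for cue timing lines such as "00:01.410 --> 00:11.550";
# the hour field is optional.
TIMING_RE = re.compile(
    r"^(\d{2}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(\d{2}:)?\d{2}:\d{2}\.\d{3}"
)


def classify_vtt_line(line):
    """Return a label describing the kind of line (hypothetical helper)."""
    stripped = line.strip()
    if stripped == "":
        return "blank"
    if stripped.startswith("WEBVTT"):
        return "header"
    if TIMING_RE.match(stripped):
        return "timing"
    if stripped.isdigit():
        return "cue_id"  # the numbers "0", "1", "2" that precede each cue
    return "text"  # the only kind of line that should be translated


sample = [
    "WEBVTT\n",
    "\n",
    "0\n",
    "00:01.410 --> 00:11.550\n",
    'Lesson 3.2 Move to move objects type "M" on the command bar\n',
]
labels = [classify_vtt_line(line) for line in sample]
print(labels)  # ['header', 'blank', 'cue_id', 'timing', 'text']
```

With this rule, a caption that happens to start with a number, such as "3 objects are selected", is still labelled as text, because only lines made up entirely of digits count as cue numbers.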

To fix that, I decided to handle those 3 cases in the following way:
if a line falls into one of them, the code writes the original line from the source file instead of trying to translate it.


The fixed code will be:

In [54]:
# loop through the lines in the file and translate as it goes:
for line in to_translate:
    # lines starting with a digit (cue numbers and timings), blank lines and the
    # "WEBVTT" header are copied unchanged
    if line[0].isdigit() or line == '\n' or line.startswith("WEBVTT"):
        translated_file.write(line)
    else:
        blob = TextBlob(line)
        translated_file.write(str(blob.translate(from_lang="en",to="pt"))+"\n")

# closing both files 
translated_file.close()
to_translate.close()

Notes:

1) The "\n" at the end of the write statement adds a line break after each translated line.

2) I used the parameter to="pt" because "pt" is the language code for Portuguese.
The two-letter codes for each language can be found at: https://www.loc.gov/standards/iso639-2/php/code_list.php
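The loop above can also be wrapped in a function that uses `with` blocks, so both files are closed even if something fails partway through. This is a sketch under assumptions: `translate_captions` and its `translate` argument are hypothetical names, `encoding="utf-8"` is my choice to keep Portuguese accents intact, and `fake_translate` stands in for the real TextBlob call so the example runs without network access.

```python
import os
import tempfile


def translate_captions(src_path, dst_path, translate):
    """Copy a .vtt file, passing only caption text through `translate`."""
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # same skip rule as the notebook: digits, blank lines, header
            if line[0].isdigit() or line == "\n" or line.startswith("WEBVTT"):
                dst.write(line)
            else:
                dst.write(translate(line.rstrip("\n")) + "\n")


def fake_translate(text):
    # Stand-in for str(TextBlob(text).translate(from_lang="en", to="pt"))
    return "[pt] " + text


# Build a tiny caption file in a temporary folder and translate it
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "sample.vtt")
dst_path = os.path.join(workdir, "translated_sample.vtt")
with open(src_path, "w", encoding="utf-8") as f:
    f.write("WEBVTT\n\n0\n00:01.410 --> 00:11.550\nto move from and hit enter.\n")

translate_captions(src_path, dst_path, fake_translate)
with open(dst_path, encoding="utf-8") as f:
    result = f.read()
print(result)
```

Passing the translation step in as a function keeps the file handling separate from the translation service, which makes it easy to test the skip rule on its own.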

Checking the output of the translation:

In [55]:
# open the translated file in read mode ("r") and print its contents
translated_file = open("translated_file.vtt", "r").read()
print(translated_file)

WEBVTT

0
00:01.410 --> 00:11.550
Lição 3.2 Mover para mover objetos tipo "M" na barra de comandos, selecione os objetos que você deseja

1
00:11.550 --> 00:15.270
para mover e aperte enter.

2
00:15.410 --> 00:20.080
Selecione o ponto de referência em movimento e selecione o ponto de destino



Now it's clear that the file translator works well and handles the special cases the way I expected.
However, to handle other exceptions that might occur and that I haven't thought about, I will adjust the code with "try" and "except" statements, which ensure that the line is written unchanged in case of any unexpected exception.

In [50]:
for line in to_translate:
    if line[0].isdigit() or line == '\n' or line.startswith("WEBVTT"):
        translated_file.write(line)
    else:
        try:
            blob = TextBlob(line)
            translated_file.write(str(blob.translate(to="pt"))+"\n")
        except Exception:
            # if anything goes wrong, keep the original line
            translated_file.write(line)
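The fallback behaviour can be isolated in a small wrapper, which makes it easy to check without calling a translation service. This is a minimal sketch, not the notebook's own code: `safe_translate` is a hypothetical helper, and `flaky_translate` is a made-up stand-in that fails on purpose for text without letters.

```python
def safe_translate(line, translate):
    """Return the translated line, or the original line if translation fails."""
    try:
        return translate(line.rstrip("\n")) + "\n"
    except Exception:
        # Exception (rather than a bare except) still lets Ctrl+C stop the loop
        return line


def flaky_translate(text):
    # Hypothetical stand-in for the real translation call:
    # it fails on text that has no letters and succeeds otherwise.
    if not any(ch.isalpha() for ch in text):
        raise ValueError("nothing to translate")
    return text.upper()


ok = safe_translate("hit enter.\n", flaky_translate)
fallback = safe_translate("--> 123\n", flaky_translate)
print(repr(ok))
print(repr(fallback))
```

The first call returns the "translated" text, while the second returns its input untouched because the stand-in raised an exception.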

Those were all the steps I took to translate a caption file from English to Portuguese.

As I was asked to translate multiple files, I updated the code to run through all the files that I had saved in a folder, as follows:

In [None]:
# specifying the input directory (raw string so the backslashes are kept as-is)
input_dir = r'\Users\filepath'
# creating a new folder for the translations
# os.path.join builds the file path for the new folder called "translation"
# by combining the file path of the input directory + "translation"
os.mkdir(os.path.join(input_dir,'translation'))


# runs through all .vtt files inside the main folder
for file in glob.glob(os.path.join(input_dir,"*.vtt")):
    # build the file path of the file that will be translated
    from_path = os.path.join(input_dir,os.path.basename(file))
    # build the file path of the file that will store the translation by joining the
    # path of the input directory + "translation" folder + the name of the file being
    # translated, prefixed with "translated_"
    # it will look like the following: "input_dir/translation/translated_filename"
    to_path = os.path.join(input_dir,"translation","translated_"+os.path.basename(file))
    to_translate_file = open(from_path, "r")
    translated_file = open(to_path, "w")

    # iterate over the file opened for this pass of the outer loop
    for line in to_translate_file:
        if line[0].isdigit() or line == '\n' or line.startswith("WEBVTT"):
            translated_file.write(line)
        else:
            try:
                blob = TextBlob(line)
                translated_file.write(str(blob.translate(to="pt"))+"\n")
            except Exception:
                translated_file.write(line)

    translated_file.close()
    to_translate_file.close()
    

Using the code above, I successfully translated all files inside the same folder.
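The path handling is the part that is easiest to get wrong, so here is a sketch that only builds the source and destination paths and can be tried on a temporary folder first. The helper name `plan_translations` is hypothetical, and `os.makedirs(..., exist_ok=True)` is my substitution for `os.mkdir` so the code can be re-run when the "translation" folder already exists.

```python
import glob
import os
import tempfile


def plan_translations(input_dir):
    """Return (source, destination) path pairs for every .vtt file in input_dir."""
    out_dir = os.path.join(input_dir, "translation")
    os.makedirs(out_dir, exist_ok=True)  # no error if the folder already exists
    pairs = []
    for src in sorted(glob.glob(os.path.join(input_dir, "*.vtt"))):
        dst = os.path.join(out_dir, "translated_" + os.path.basename(src))
        pairs.append((src, dst))
    return pairs


# Demo on a temporary folder with two empty caption files and one other file
demo_dir = tempfile.mkdtemp()
for name in ("lesson1.vtt", "lesson2.vtt", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()

pairs = plan_translations(demo_dir)
for src, dst in pairs:
    print(os.path.basename(src), "->", os.path.relpath(dst, demo_dir))
```

Only the two `.vtt` files are picked up, and each destination sits inside the "translation" folder with the "translated_" prefix.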
If you try to replicate this code and need any help, please let me know and I will do my best to help however I can. In addition, if you see any room for improvement, please let me know!