# Translation notebook

Google Translate
* https://pypi.org/project/googletrans/ (raises an error; open issue)
* https://pypi.org/project/pygoogletranslation/ (raises an error)
* https://pypi.org/project/google-trans-new/ (raises an error)
* https://pypi.org/project/pyGoogleTranslate/ (raises an error and is inefficient)
* https://github.com/Animenosekai/translate (seems to work well)
* Google's own API (costs money)

Other translators
* https://blog.api.rakuten.net/top-10-best-translation-apis-google-translate-microsoft-translator-and-others/

Neural models
* https://github.com/UKPLab/EasyNMT (only one of the models has been tried)

Error messages:
* https://github.com/ssut/py-googletrans/issues/234
* https://github.com/ssut/py-googletrans/pull/237

## Imports 

In [None]:
#!poetry add <package>

In [None]:
import sys
import os
import pandas as pd

from googletrans import Translator  # does not work
import pyGoogleTranslate as pgt 
import translatepy
from easynmt import EasyNMT

import warnings
warnings.filterwarnings("ignore")

## Load data

In [None]:
path = '/Users/vildearntzen/Desktop/master_kode/master_kode/data/'
df = pd.read_csv(path + 'dk.csv')
df.head(2)

In [None]:
for i in range(15,20):
    print(df["opus-mt"][i])
    print()

In [None]:
for i in range(15,20):
    print(df["cleaned"][i])
    print()

## Translation functions

The functions below wrap different translation libraries and models. The implementations should be correct, but at the time of writing some of the libraries have open issues that cause the corresponding functions to fail. These functions are kept in case the libraries are fixed later; the error message each one raises is given in the comment above it.
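
Because some backends fail at call time, it can help to try them in order and fall back to the untranslated text if none succeeds. The following is a minimal sketch of that idea, not part of the notebook's pipeline: `translate_with_fallback`, `broken_backend` and `working_backend` are hypothetical names, and the two backends are stand-ins for real translation functions.

```python
from typing import Callable, Iterable, Optional


def translate_with_fallback(
    text: str,
    backends: Iterable[Callable[[str], str]],
    on_error: Optional[Callable[[str, Exception], None]] = None,
) -> str:
    """Return the result of the first backend that succeeds, else the input text."""
    for backend in backends:
        try:
            return backend(text)
        except Exception as exc:  # the libraries raise different error types
            if on_error is not None:
                on_error(getattr(backend, "__name__", repr(backend)), exc)
    return text


# Hypothetical stand-ins for real translation functions.
def broken_backend(text: str) -> str:
    raise AttributeError("'NoneType' object has no attribute 'group'")


def working_backend(text: str) -> str:
    return text.upper()


failures = []
result = translate_with_fallback(
    "hej verden",
    [broken_backend, working_backend],
    on_error=lambda name, exc: failures.append(name),
)
```

In a dataframe setting, a function like this could be passed to `df[col].apply`, with the real single-text translation functions in `backends`.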

In [None]:
# AttributeError: 'NoneType' object has no attribute 'group'
def google_trans(df, col):
    '''
    df: dataframe
    col: text column to translate
    '''
    translator = Translator()
    # Translate each text in the column, then keep only the translated string
    df["no"] = df[col].apply(translator.translate, src="da", dest="no").apply(getattr, args=("text",))
    return df


# 'An error occured while translating: translation not found.'
def pgt_trans(df, col):
    '''
    df: dataframe
    col: text column to translate
    '''
    pgt.browser("chrome", executable_path = '/usr/local/bin/chromedriver')
    df["no"] = df[col].apply(pgt.translate, destination_language = "no", source_language = "da")
    return df


def _translatepy(text):
    '''
    Helper function: translate text to Norwegian using translatepy
    '''
    translator = translatepy.Translator()
    return translator.translate(text, destination_language = "Norwegian").result


def translatepy_translate(df, col):
    '''
    df: dataframe
    col: text column to translate
    '''
    df["translatepy_no"] = df[col].apply(_translatepy)
    return df


def _translateeasynmt(text, model):
    '''
    Helper function: translate text to Norwegian using easynmt.
    Returns the original text if translation fails.
    '''
    try:
        res = model.translate(text, source_lang="da", target_lang = "no")
        return res
    except Exception:
        print("\n.....................\n")
        print(text, "was not translated")
        print("\n.....................\n")
    return text
    
    

def easynmt_translate(df, col, model_name):
    '''
    df: dataframe
    col: text column to translate
    model_name: model used for translation ['opus-mt', 'mbart50_m2m', 'm2m_100_418M', 'm2m_100_1.2B']
    notes:
    opus-mt does not translate very well for da-no
    mbart50_m2m does not support da-no
    '''
    model = EasyNMT(model_name)
    df["easynmt_no" + "_" + model_name] = df[col].apply(_translateeasynmt, model = model)
    return df
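

`easynmt_translate` calls the model once per row. EasyNMT's `translate` also accepts a list of documents, so one alternative is to pass all texts in a single call and translate each distinct text only once. The sketch below illustrates this under stated assumptions: `translate_unique` and `EchoModel` are hypothetical names, and `EchoModel` is a stub with the same call shape as the model so the example runs without downloading anything.

```python
from typing import Any, List


def translate_unique(
    texts: List[str],
    model: Any,
    source_lang: str = "da",
    target_lang: str = "no",
) -> List[str]:
    """Translate each distinct text once, then map results back to input order."""
    unique = list(dict.fromkeys(texts))  # de-duplicate, keep first-seen order
    translated = model.translate(
        unique, source_lang=source_lang, target_lang=target_lang
    )
    lookup = dict(zip(unique, translated))
    return [lookup[t] for t in texts]


class EchoModel:
    """Stub standing in for a translation model; counts the texts it receives."""

    def __init__(self):
        self.seen = 0

    def translate(self, documents, target_lang, source_lang=None):
        self.seen += len(documents)
        return [f"[{target_lang}] {d}" for d in documents]


stub = EchoModel()
out = translate_unique(["hej", "tak", "hej"], stub)
```

With a real model, the call could look like `df["easynmt_no"] = translate_unique(df[col].tolist(), model)`. The trade-off is that a single failing text makes the whole call fail, whereas the row-by-row version keeps the original text for that row only.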
    

In [None]:
df_trans = translatepy_translate(df, "cleaned")

In [None]:
df_trans = df_trans[["Text", "Translated Text", "uid", "Source", "Sub-Task A", "Sub-Task B", "Sub-Task C", "cleaned", "translatepy_no"]]
df_trans.head(3)

In [None]:
df_trans = easynmt_translate(df_trans, "cleaned", "m2m_100_418M")

In [None]:
df_trans.head(3)

In [None]:
#df_trans = df_trans[["Text", "Translated Text", "uid", "Source", "Sub-Task A", "Sub-Task B", "Sub-Task C", "cleaned", "translatepy_no"]]
#df_trans

In [None]:
df_trans.to_csv(path + "dk_preprocessed_translations.csv")

In [None]:
pd.read_csv(path +  "dk_preprocessed_translations.csv", index_col = 0)