# Machine Translation

Two example scenarios where MT may be required:
- Our client’s products are used by people around the world who leave reviews on social media in multiple languages. Our client wants to know the general sentiment of those reviews. For this, instead of looking for sentiment analysis tools in multiple languages, one option is to use an MT system, translate all the reviews into one language, and run sentiment analysis for that language.
- We work with a lot of social media data (e.g., tweets) on a regular basis and notice that it’s unlike the text we encounter in typical documents. For example, consider the sentence “am gud,” which in formal, well-formed English is “I am good.” (Chapter 8 covers in more detail how social media text differs from well-formed text.) MT can map between the two by treating the conversion from “am gud” to “I am good” as an informal-to-grammatical English translation problem.

# Practical Advice

First, as we explained earlier, don’t build your own MT system if you don’t have to. It’s more practical to make use of translation APIs. When using such APIs, it’s important to pay close attention to pricing policies. Considering the costs involved, it might be a good idea to store the translations of frequently used text (called a translation memory or a translation cache).

> Maintain a translation memory, which can be used for translations that repeat frequently.
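
The translation memory described above can be sketched as a small cache wrapped around whatever function calls the translation API. This is a minimal illustration, not a production design: the `TranslationMemory` class, the `api_calls` counter, and the `fake_translate` stand-in are all hypothetical names introduced here, and the cache lives only in process memory.

```python
from typing import Callable, Dict, Tuple


class TranslationMemory:
    """In-memory cache keyed on (source text, target language)."""

    def __init__(self, translate_fn: Callable[[str, str], str]) -> None:
        # translate_fn is whatever function performs the (paid) API call.
        self._translate_fn = translate_fn
        self._cache: Dict[Tuple[str, str], str] = {}
        self.api_calls = 0  # how many times the underlying function ran

    def translate(self, text: str, to_lang: str) -> str:
        key = (text.strip(), to_lang)
        if key not in self._cache:
            # Cache miss: call the underlying translator once and remember it.
            self._cache[key] = self._translate_fn(key[0], to_lang)
            self.api_calls += 1
        return self._cache[key]


def fake_translate(text: str, to_lang: str) -> str:
    # Hypothetical stand-in for a real API call, so the sketch runs offline.
    return f"[{to_lang}] {text}"


memory = TranslationMemory(fake_translate)
first = memory.translate("Add to cart", "de")
second = memory.translate("Add to cart", "de")  # served from the cache
print(first, "| underlying calls:", memory.api_calls)
```

In practice the dictionary would likely be replaced by a persistent store so that cached translations survive restarts; that choice is left open here.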

> Data augmentation is a useful way to generate additional training data when building an MT system.

In [1]:
import json, uuid  # standard library
import requests    # third-party HTTP client

In [2]:
# You will need a subscription key; a trial subscription is enough.
subscription_key = "XXXX"
endpoint = "https://api-nam.cognitive.microsofttranslator.com"
path = '/translate?api-version=3.0'
params = '&to=de'  # Translate from English to German (de)
constructed_url = endpoint + path + params

In [3]:
headers = {
    'Ocp-Apim-Subscription-Key': subscription_key,
    'Content-type': 'application/json',
    'X-ClientTraceId': str(uuid.uuid4())
}

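body = [{'text': 'How good is Machine Translation?'}]
# requests.post returns a Response object; parse its JSON payload separately.
response = requests.post(constructed_url, headers=headers, json=body)
result = response.json()

# With the placeholder key above, the service returns the 401 error shown below.
print(json.dumps(result, sort_keys=True, indent=4, separators=(',', ': ')))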

{
    "error": {
        "code": 401000,
        "message": "The request is not authorized because credentials are missing or invalid."
    }
}
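
Because the API can return either translations or an error payload like the one above, it may help to separate the two cases before using the result. The sketch below assumes a successful response is a list of objects, each holding a `translations` list whose entries have a `text` field. The `extract_translations` helper and the `sample_ok` payload are hypothetical and hand-written for illustration; they are not output captured from the service.

```python
from typing import Any, List


def extract_translations(response: Any) -> List[str]:
    """Return translated strings from a parsed response, or raise on an error payload."""
    if isinstance(response, dict) and "error" in response:
        err = response["error"]
        raise RuntimeError(
            f"Translation failed ({err.get('code')}): {err.get('message')}"
        )
    # Assumed success shape: [{"translations": [{"text": ..., "to": ...}]}]
    return [t["text"] for item in response for t in item.get("translations", [])]


# Hand-written sample payloads (assumptions, not real API output).
sample_ok = [{"translations": [{"text": "Hallo Welt", "to": "de"}]}]
sample_error = {
    "error": {
        "code": 401000,
        "message": "The request is not authorized because credentials are missing or invalid.",
    }
}

print(extract_translations(sample_ok))
try:
    extract_translations(sample_error)
except RuntimeError as exc:
    print(exc)
```

Raising on the error payload keeps a missing or invalid key from being mistaken for an empty translation further down the pipeline.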
