## Quick start
Use the API directly with hardcoded endpoints. A future update might break these.

### In Python with requests

In [1]:
import requests
# src & tgt are 2 letter iso 639 codes
api_url = 'https://lindat.cz/translation/api/v2/languages/?src=cs&tgt=uk'
text_to_translate = 'Test'
response = requests.post(api_url, {'input_text': text_to_translate})
print(response.content.decode('utf-8')) # decoding might not be necessary, but my jupyter on windows has some issues...

Тест



The response can also be JSON: an array of individual sentences. The `src` and `tgt` parameters can also be sent as part of the `POST` data.

In [2]:
url = 'https://lindat.cz/translation/api/v2/languages'
source_language = 'cs'
target_language = 'uk'
multi_line_text = 'Test na prvním řádku.\nTest na druhém řádku. Další věta na druhém řádku.'
response = requests.post(url, {'src': source_language, 'tgt': target_language, 'input_text': multi_line_text}, headers={'Accept': 'application/json'})
print(response.json())

['Тест на першій лінії.\n', 'Тест на другому рядку.', 'Наступне речення на другому рядку.\n']
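
If you need the translation back as a single string, one option is to concatenate the array. This is a minimal sketch, assuming (from the output above) that a sentence closing an input line keeps its trailing `\n` while other sentences carry no separator. `join_sentences` is a hypothetical helper, not part of the API.

```python
def join_sentences(sentences):
    """Rebuild one string from the JSON array of translated sentences.

    Assumption: a sentence that ends an input line keeps its trailing
    newline; sentences sharing a line have no separator between them.
    """
    parts = []
    for sentence in sentences:
        # Put a single space between sentences that share a line.
        if parts and not parts[-1].endswith('\n'):
            parts.append(' ')
        parts.append(sentence)
    return ''.join(parts)


# The array printed by the cell above.
translated = ['Тест на першій лінії.\n', 'Тест на другому рядку.',
              'Наступне речення на другому рядку.\n']
print(join_sentences(translated))
```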


### Sending files
The only supported content type at the moment is `text/plain`.

In [3]:
# cat quick_start_file.txt
with open('quick_start_file.txt', 'r') as f:
    print(f.read())

"Pokus v souboru." 



In [4]:
# the tuple is filename, content, content type
# the with block closes the file once the request has been sent
with open('quick_start_file.txt', 'rb') as f:
    files = {"input_text": ('quick_start_file.txt', f, 'text/plain')}
    print(requests.post(api_url, files=files).content.decode('utf-8'))

"Спроба в файлі".




### Translate 'Test' with json response from command line with curl

In [5]:
!curl -F "input_text=Test" -H "Accept: application/json" -s "https://lindat.cz/translation/api/v2/languages/?src=cs&tgt=uk"

["\u0422\u0435\u0441\u0442\n"]


You can also send files via curl. Be careful when passing stdin, because curl doesn't autodetect the content type:
```
cat quick_start_file.txt | curl -F "input_text=@-;type=text/plain" "https://lindat.cz/services/translation/api/v2/languages/?src=cs&tgt=en"
```
Note that the type is explicitly defined.

### FUP (fair usage policy)
There is a global timeout of 10 minutes; you'll most likely get `504 Gateway Timeout` at the 10-minute mark. If you don't receive the translation in this time, the service might be overloaded. You can try again later and/or send smaller portions of the text (e.g. one paragraph) at a time.
The number of requests and connections per second per IP address is limited. You'll usually receive `429 Too Many Requests` if you exceed the request rate.
Do check the status codes of the responses, and let the service rest for a while if you see any of the above codes or `503`.


Also see the [errors](#Errors) part of this document.
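
One way to "let the service rest" is to retry with an increasing delay when one of these status codes comes back. The sketch below is only an illustration: `post_with_backoff`, the number of attempts and the delays are assumptions, not values recommended by the service.

```python
import time

# Status codes named above as signs of rate limiting or overload.
RETRY_STATUSES = {429, 503, 504}


def post_with_backoff(send, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call send() until the response status is not in RETRY_STATUSES.

    send is any zero-argument callable returning an object with a
    status_code attribute, eg. lambda: requests.post(url, data).
    The delay doubles after each failed attempt (2s, 4s, 8s by default).
    The last response is returned even if it is still an error.
    """
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRY_STATUSES:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response


# Usage (not executed here), with the quick start URL:
# response = post_with_backoff(
#     lambda: requests.post(api_url, {'input_text': 'Test'}))
```

The caller still has to check the status of the returned response, since the helper gives up after `max_attempts`.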



### The API at https://lindat.cz/translation/ also supports the following fields/parameters
- `logInput=true` - with this you give us permission to keep your inputs and use them to further improve the system.
- `author=NAME` - if `logInput=true`, this keeps the `NAME` string with the input; it's meant to identify the individuals/organizations providing the inputs.
- `frontend=XXX` - if `logInput=true`, this keeps the `XXX` string with the input; a way of distinguishing between alternative frontends (e.g. "web" or "android app").
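
A short sketch of how these fields could be combined with `input_text`. It assumes they are sent as ordinary form fields in the `POST` data; `build_form_data` and the example values are hypothetical.

```python
def build_form_data(text, log_input=False, author=None, frontend=None):
    """Build the POST form data, adding the logging fields only on opt-in."""
    data = {'input_text': text}
    if log_input:
        data['logInput'] = 'true'
        # author and frontend are only stored when logInput=true,
        # so they are left out otherwise.
        if author:
            data['author'] = author
        if frontend:
            data['frontend'] = frontend
    return data


print(build_form_data('Test', log_input=True, author='NAME', frontend='web'))

# Usage (not executed here):
# requests.post('https://lindat.cz/translation/api/v2/languages/?src=cs&tgt=uk',
#               build_form_data('Test', log_input=True, frontend='web'))
```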

## Advanced usage

Advanced usage with discovery. Note that this uses a different endpoint than the one used in the quick start. In essence the API is the same (except for the additional parameters mentioned above); the two endpoints are configured with different models/languages.

Also see https://lindat.cz/services/translation/docs and/or https://lindat.cz/services/translation/api/v2/doc (and/or see swagger.json at https://lindat.cz/services/translation/api/v2/swagger.json which the interactive api doc is using)

### The responses
The `GET` `application/json` responses are based on the `hal+json` format, and you can inspect them interactively through https://lindat.cz/services/translation/static/hal-browser/browser.html#/services/translation/api/v2/

In [6]:
import json # for pretty printing

host_url = 'https://lindat.cz'
api_url = '/services/translation/'
host_and_api = host_url + api_url
headers = {'Accept': 'application/json'}
r = requests.get(host_and_api, headers=headers)
print(json.dumps(r.json(), indent=2))

{
  "_links": {
    "self": {
      "href": "/services/translation/api/v2/"
    },
    "models": {
      "href": "/services/translation/api/v2/models/",
      "name": "models"
    },
    "languages": {
      "href": "/services/translation/api/v2/languages/",
      "name": "languages"
    }
  }
}


The response provides a self link and two additional endpoints. The `models` endpoint lets you interact with the API on a "lower level", meaning you might need some extra knowledge about a particular model and why you'd want to use it; e.g. at one point there was a multilingual model for the medical domain available. The translation quality for texts outside of that domain was not particularly interesting, so it was left out of the "easier" `languages` endpoint.
The `languages` endpoint lets you pick only the source and target language. You have no say in which models are used (currently we have multiple versions for cs<->en), and you don't actually know whether it's a direct translation or a pivoted one (e.g. cs->hi might actually be cs->en->hi).
Also note there are no guarantees that translation works in both directions, e.g. you can translate to Hindi but not from Hindi.

In [7]:
languages_url = host_url + r.json()['_links']['languages']['href'] 
languages = requests.get(languages_url, headers=headers)
print(json.dumps(languages.json(), indent=2))

{
  "_links": {
    "item": [
      {
        "href": "/services/translation/api/v2/languages/cs",
        "name": "cs",
        "title": "Czech"
      },
      {
        "href": "/services/translation/api/v2/languages/en",
        "name": "en",
        "title": "English"
      },
      {
        "href": "/services/translation/api/v2/languages/fr",
        "name": "fr",
        "title": "French"
      },
      {
        "href": "/services/translation/api/v2/languages/de",
        "name": "de",
        "title": "German"
      },
      {
        "href": "/services/translation/api/v2/languages/hi",
        "name": "hi",
        "title": "Hindi"
      },
      {
        "href": "/services/translation/api/v2/languages/pl",
        "name": "pl",
        "title": "Polish"
      },
      {
        "href": "/services/translation/api/v2/languages/ru",
        "name": "ru",
        "title": "Russian"
      }
    ],
    "self": {
      "href": "/services/translation/api/v2/languages/"
    },
    "

The above is the `languages` endpoint response. The `[_links][item]` array lists the languages (either source or target). To save a request, `[_embedded]` contains the responses you'd get by querying the individual languages; e.g. compare:

In [8]:
# print the embedded object for 'cs'
print(json.dumps(list(filter(lambda x: x['name'] == 'cs', languages.json()['_embedded']['item'])), indent=2))

[
  {
    "_links": {
      "translate": {
        "href": "/services/translation/api/v2/languages{?src,tgt}",
        "templated": true
      },
      "sources": [
        {
          "href": "/services/translation/api/v2/languages/ru",
          "name": "ru",
          "title": "Russian"
        },
        {
          "href": "/services/translation/api/v2/languages/en",
          "name": "en",
          "title": "English"
        },
        {
          "href": "/services/translation/api/v2/languages/de",
          "name": "de",
          "title": "German"
        },
        {
          "href": "/services/translation/api/v2/languages/fr",
          "name": "fr",
          "title": "French"
        },
        {
          "href": "/services/translation/api/v2/languages/pl",
          "name": "pl",
          "title": "Polish"
        }
      ],
      "targets": [
        {
          "href": "/services/translation/api/v2/languages/ru",
          "name": "ru",
          "title": "Russian"


and

In [9]:
# get /services/translation/api/v2/languages/cs
cs_href = list(filter(lambda x: x['name'] == 'cs', languages.json()['_links']['item']))[0]['href']
cs = requests.get(host_url + cs_href, headers=headers)
print(json.dumps(cs.json(), indent=2))

{
  "_links": {
    "translate": {
      "href": "/services/translation/api/v2/languages{?src,tgt}",
      "templated": true
    },
    "sources": [
      {
        "href": "/services/translation/api/v2/languages/ru",
        "name": "ru",
        "title": "Russian"
      },
      {
        "href": "/services/translation/api/v2/languages/en",
        "name": "en",
        "title": "English"
      },
      {
        "href": "/services/translation/api/v2/languages/de",
        "name": "de",
        "title": "German"
      },
      {
        "href": "/services/translation/api/v2/languages/fr",
        "name": "fr",
        "title": "French"
      },
      {
        "href": "/services/translation/api/v2/languages/pl",
        "name": "pl",
        "title": "Polish"
      }
    ],
    "targets": [
      {
        "href": "/services/translation/api/v2/languages/ru",
        "name": "ru",
        "title": "Russian"
      },
      {
        "href": "/services/translation/api/v2/languages/en",


The `sources` array lists the languages you can translate from into the selected language ('cs' in the example), and the `targets` array lists the languages you can translate to from it. E.g. you can translate from pl to cs (pl is in sources) and you can also translate from cs to pl (pl is in targets). Also note that Hindi is only in the targets array, i.e. you can translate from Czech to Hindi but not the other way around.
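
The check described above can be sketched as a lookup in those two arrays. The helper names are hypothetical, and `cs_resource` is a hand-trimmed sample modelled on the responses shown; in practice you would pass `cs.json()`.

```python
def link_names(language_resource, relation):
    """Return the set of language codes listed under _links[relation]."""
    links = language_resource['_links'].get(relation, [])
    return {link['name'] for link in links}


def can_translate_from(language_resource, src):
    """True if src can be translated into the resource's language."""
    return src in link_names(language_resource, 'sources')


def can_translate_to(language_resource, tgt):
    """True if the resource's language can be translated into tgt."""
    return tgt in link_names(language_resource, 'targets')


# Trimmed, hand-written sample in the shape of the 'cs' resource.
cs_resource = {
    '_links': {
        'sources': [{'name': 'ru'}, {'name': 'en'}, {'name': 'pl'}],
        'targets': [{'name': 'ru'}, {'name': 'en'}, {'name': 'hi'}],
    }
}

print(can_translate_to(cs_resource, 'hi'))    # cs -> hi
print(can_translate_from(cs_resource, 'hi'))  # hi -> cs
```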

### Translation
Let's say you have chosen valid source (cs) and target (en) language codes. Use them to expand the URI template in `_links.translate.href` (`_links.translate.templated` being `true` tells you the href is a template). The result is the URL to which you `POST` your input text.

In [10]:
from uritemplate import expand  # third-party package: uritemplate

src = 'cs'
tgt = 'en'
text_to_translate = 'Pokus.'
translate_url = host_url + languages.json()["_links"]["translate"]["href"]
interpolated_translate = expand(translate_url, src=src, tgt=tgt)
print(requests.post(interpolated_translate, {'input_text': text_to_translate}).content.decode('UTF-8'))

Attempt.
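
If you'd rather not depend on `uritemplate`, the one template form that appears in these responses, `{?src,tgt}`, can be expanded with the standard library. This is a sketch limited to that form, not a general URI template implementation; `expand_query_template` is a hypothetical helper.

```python
import re
from urllib.parse import urlencode


def expand_query_template(template, **values):
    """Expand only the form-style query expression, eg. `{?src,tgt}`.

    Variables without a supplied value are skipped; if none are supplied,
    the expression is removed entirely.
    """
    def replace(match):
        names = match.group(1).split(',')
        pairs = [(name, values[name]) for name in names if name in values]
        return '?' + urlencode(pairs) if pairs else ''

    return re.sub(r'\{\?([^}]+)\}', replace, template)


template = '/services/translation/api/v2/languages{?src,tgt}'
print(expand_query_template(template, src='cs', tgt='en'))
```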



### Models endpoint
Just for completeness: the `models` endpoint response is quite similar. The most important part for translation is the `supports` object on individual models. It is a mapping whose keys are source language codes and whose values are lists of the languages the model can translate to from that source.
```
        "supports": {
          "en": [
            "cs"
          ]
        }
```
means the model supports translation from en to cs. At one point there were multilingual models, including one that supported translation from English to seven other languages and from each of those languages back to English.
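
Given such `supports` mappings, picking the models for a language pair is a simple lookup. In this sketch, `models_for_pair` and the sample data (including the `multi` model) are made up for illustration; how you collect the mappings from the `models` response is left out.

```python
def models_for_pair(supports_by_model, src, tgt):
    """Return names of models whose supports mapping lists tgt under src."""
    return [
        name
        for name, supports in supports_by_model.items()
        if tgt in supports.get(src, [])
    ]


# Hypothetical sample: model name -> its supports object.
sample = {
    'en-cs': {'en': ['cs']},
    'cs-en': {'cs': ['en']},
    'multi': {'en': ['cs', 'fr'], 'cs': ['en'], 'fr': ['en']},
}

print(models_for_pair(sample, 'en', 'cs'))
```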

In [11]:
models_url = host_url + r.json()['_links']['models']['href'] 
models = requests.get(models_url, headers=headers)
print(json.dumps(models.json(), indent=2))

{
  "_links": {
    "item": [
      {
        "href": "/services/translation/api/v2/models/en-cs",
        "name": "en-cs",
        "title": "en-cs (English->Czech (CUBBITT))"
      },
      {
        "href": "/services/translation/api/v2/models/cs-en",
        "name": "cs-en",
        "title": "cs-en (Czech->English (CUBBITT))"
      },
      {
        "href": "/services/translation/api/v2/models/doc-en-cs",
        "name": "doc-en-cs",
        "title": "doc-en-cs (English->Czech (CUBBITT document level))"
      },
      {
        "href": "/services/translation/api/v2/models/doc-cs-en",
        "name": "doc-cs-en",
        "title": "doc-cs-en (Czech->English (CUBBITT document level))"
      },
      {
        "href": "/services/translation/api/v2/models/en-hi",
        "name": "en-hi",
        "title": "en-hi (English->Hindi)"
      },
      {
        "href": "/services/translation/api/v2/models/en-fr",
        "name": "en-fr",
        "title": "en-fr (English->French (CUBBITT))"
    

If you don't specify the `src`/`tgt` params for a model, some are chosen for you. This does the right thing if the model supports only one source and target pair.

In [12]:
requests.post(host_url + '/services/translation/api/v2/models/cs-en', {'input_text': 'Pokus.'}).text

'An attempt.\n'

### Errors
Errors are communicated mostly via HTTP status codes; as always, 4xx are client-side errors and 5xx are server-side errors.
Most often you'll see 404 if you try to translate with an unsupported language pair (or model). You can also encounter 429, which means you are sending too many requests per second.
If you are seeing 504 (Gateway Timeout) or any other sort of timeout, one thing you might try is sending smaller chunks (one paragraph, perhaps) of the input text.
503 generally means the service is overloaded.
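
Sending smaller chunks could look roughly like the sketch below, which groups paragraphs into pieces below a size limit. It assumes paragraphs are separated by blank lines; `split_into_chunks` is a hypothetical helper and `max_chars=2000` is an arbitrary value, not a documented limit of the service.

```python
def split_into_chunks(text, max_chars=2000):
    """Group blank-line separated paragraphs into chunks of <= max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut
    mid-sentence, so such a chunk can exceed the limit.
    """
    chunks = []
    current = ''
    for paragraph in text.split('\n\n'):
        candidate = paragraph if not current else current + '\n\n' + paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow; close the current chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = 'First paragraph.\n\nSecond paragraph.\n\nThird paragraph.'
for chunk in split_into_chunks(text, max_chars=40):
    print(repr(chunk))
```

Each chunk can then be posted separately and the translations joined in the original order.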

In [13]:
print("no input")
rsp = requests.post(host_url + '/services/translation/api/v2/languages?src=cs&tgt=cs')
print(rsp.status_code)
print(rsp.content)
print("--------------")
print("now input is there; but trying to translate cs->cs")
rsp = requests.post(host_url + '/services/translation/api/v2/languages?src=cs&tgt=cs', {'input_text': 'Test'})
print(rsp.status_code)
print(rsp.content)
print("--------------")
print("bad content type")
files = {"input_text": ('quick_start_file.txt', open('quick_start_file.txt', 'rb'), 'text/html')}
rsp = requests.post(host_url + '/services/translation/api/v2/languages?src=cs&tgt=en', files=files)
print(rsp.status_code)
print(rsp.content)

no input
400
b'{"message": "No text found in the input_text form/field or in request files"}\n'
--------------
now input is there; but trying to translate cs->cs
404
b'{"message": "Can\'t translate from cs to cs"}\n'
--------------
bad content type
415
b'{"message": "Can only handle text/plain files."}\n'
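
The error bodies above are small JSON objects with a `message` key, so the reason can be extracted for logging. A minimal sketch, assuming that shape holds for the errors you care about; `error_message` is a hypothetical helper, and gateway-level errors (eg. 504) may well return a non-JSON body.

```python
import json


def error_message(status_code, body):
    """Return the `message` of a JSON error body, or None.

    None is returned for non-error statuses, for bodies that are not
    valid JSON, and for JSON that is not an object.
    """
    if status_code < 400:
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        return None
    if isinstance(payload, dict):
        return payload.get('message')
    return None


# The 404 body printed above.
print(error_message(404, b'{"message": "Can\'t translate from cs to cs"}\n'))

# Usage (not executed here):
# print(error_message(rsp.status_code, rsp.content))
```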
