
Tatoeba API specification 2

hjpotter92 edited this page Mar 13, 2014 · 95 revisions

Introduction

The Tatoeba API uses JSON-RPC 2.0 as its protocol: http://www.jsonrpc.org/specification

API and protocol versioning

Each request and response must contain a version field that indicates which version of the given method is wanted: ... , "params":{"version":1, ...}, ... Versioning the methods gives more flexibility in developing the API and extending its functionality. Version numbers are simply 1, 2, 3, etc., and they qualify the method name. The server and the client may each implement method versioning (and naming) in their own way.

Pagination

Whenever a query produces a long list of results, not all of them are sent back to the client. The client has to specify the range of results it needs; if no range is specified, an error is returned.

The range is specified by a start index and a count, "page" : [startIndex, count]. Indexes start at 0, and the item at startIndex is included in the returned array.

"page" : [0, 1]    // the first item of the search results
"page" : [30, 10]  // results 30 to 39
"page" : [25, 3]   // results 25, 26 and 27

The client is solely responsible for implementing the pagination logic; the server only knows how to return a specified range of results.
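The following Python sketch illustrates the [startIndex, count] convention: it turns a zero-based page number and a page size into a "page" value, and lists the result positions that value covers. The helper names page_param and covered_indexes are hypothetical and not part of the API.

```python
def page_param(page_number: int, page_size: int) -> list[int]:
    """Build a "page" value ([startIndex, count]) for a zero-based page number."""
    if page_number < 0 or page_size <= 0:
        raise ValueError("page_number must be >= 0 and page_size must be > 0")
    return [page_number * page_size, page_size]


def covered_indexes(page: list[int]) -> list[int]:
    """Return the result positions covered by [startIndex, count].

    The item at startIndex is included, so the range runs from
    startIndex to startIndex + count - 1.
    """
    start, count = page
    return list(range(start, start + count))


# Page 3 (zero-based) with 10 results per page.
print(page_param(3, 10))          # [30, 10]
print(covered_indexes([25, 3]))   # [25, 26, 27]
```

Keeping this arithmetic on the client matches the split of responsibilities described above: the server never needs to know what a "page number" is.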

Language codes

Languages are identified by ISO 639-3 codes, which consist of 3 lowercase ASCII letters. The full list is long, so it was put on a separate page; here is a sample:

  • "ara" => "Arabic"
  • "eng" => "English"
  • "jpn" => "Japanese"
  • "fra" => "French"
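A client may want to sanity-check language codes before sending them in "from" or "to". The Python sketch below only checks the shape of a code (three lowercase ASCII letters); it cannot tell whether the code actually exists, so the server remains the authority. The function name looks_like_iso_639_3 is hypothetical.

```python
import re

# Sample of ISO 639-3 codes from the list above (not the full list).
SAMPLE_LANGUAGES = {
    "ara": "Arabic",
    "eng": "English",
    "jpn": "Japanese",
    "fra": "French",
}

# ISO 639-3 codes are exactly three lowercase ASCII letters.
_ISO_639_3 = re.compile(r"[a-z]{3}")


def looks_like_iso_639_3(code: str) -> bool:
    """Check only the shape of a code; this does not prove the code exists."""
    return _ISO_639_3.fullmatch(code) is not None


print(looks_like_iso_639_3("nld"))  # True
print(looks_like_iso_639_3("en"))   # False: two-letter ISO 639-1 code
print(looks_like_iso_639_3("ENG"))  # False: must be lowercase
```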

JSON-RPC header

A complete JSON-RPC 2.0 request object includes several mandatory fields:

{
    "jsonrpc" : "2.0",
    "id" : 123,                // int
    "method" : "search",
    "params" : {
        "version" : 1,
        "query" : "honger",
        "from" : "nld",
        "to" : "eng",
        "page" : [0, 15],
        "options" : 5          // 0x1 | 0x4 (JSON has no hexadecimal literals)
    }
}

The "jsonrpc", "id", and "method" fields form the mandatory part, the "header". The "id" field correlates a request with its response. The "params" field carries the data, the "message". For the rest of this document, the header fields are omitted and only the contents of the "params" field are shown.
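As a minimal sketch, assuming a Python client, the header can be wrapped around any "params" message by a small helper. The helper names make_request and matches are hypothetical, and the transport (how the object reaches the server) is not specified in this document, so only the request object is built here.

```python
import itertools
import json

# Counter that gives each request a unique "id".
_next_id = itertools.count(1)


def make_request(method: str, params: dict) -> dict:
    """Wrap a "params" message in the mandatory JSON-RPC 2.0 header."""
    return {
        "jsonrpc": "2.0",
        "id": next(_next_id),
        "method": method,
        "params": params,
    }


def matches(request: dict, response: dict) -> bool:
    """The "id" field correlates a response with the request that caused it."""
    return response.get("jsonrpc") == "2.0" and response.get("id") == request["id"]


req = make_request("search", {
    "version": 1,
    "query": "honger",
    "from": "nld",
    "to": "eng",
    "page": [0, 15],
    "options": 0x1 | 0x4,  # serialized as 5
})
print(json.dumps(req))
```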

Minification

For transmission the messages will have to be minified. Each method handles its own minification.

The following message:

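    "query" : "honger",
    "from" : "nld",
    "to" : "eng",
    "page" : [0, 15],
    "options" : 0x1,
    "version" : 1

can be minified to this (JSON has no hexadecimal literals, so 0x1 is written as 1):

"q":"honger","f":"nld","t":"eng","p":[0,15],"o":1,"v":1

Minification takes two steps:

  1. Replace each field name with its short alias.
  2. Remove all whitespace outside string values (whitespace inside a string, such as the space in "bob smith", is part of the data and must be kept).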
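The two minification steps map directly onto Python's json module. The alias table below is copied from the example above and assumes "version" is the long form of "v"; since each method handles its own minification, it should be read as the search() table only, not a complete list. The minify helper is hypothetical.

```python
import json

# Short aliases from the example above (search() only; an assumption
# for illustration, not a complete alias list).
SEARCH_ALIASES = {
    "query": "q",
    "from": "f",
    "to": "t",
    "page": "p",
    "options": "o",
    "version": "v",
}


def minify(params: dict, aliases: dict) -> str:
    # Step 1: replace each field name with its short alias.
    short = {aliases.get(key, key): value for key, value in params.items()}
    # Step 2: serialize without insignificant whitespace. json.dumps never
    # alters whitespace inside string values, so "bob smith" stays intact.
    return json.dumps(short, separators=(",", ":"))


params = {
    "query": "honger",
    "from": "nld",
    "to": "eng",
    "page": [0, 15],
    "options": 0x1,
    "version": 1,
}
print(minify(params, SEARCH_ALIASES))
# {"q":"honger","f":"nld","t":"eng","p":[0,15],"o":1,"v":1}
```

The output includes the enclosing braces because it is a complete JSON object; the example above shows only its contents.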

Method Reference

Unless otherwise noted, all "id" fields here exactly match the ids currently registered in the Tatoeba database. "user_id" and "username" are returned with each sentence, comment, and message, so that the client has just enough information to display the username (the "owner") and construct a link to the user's profile.

Please pay special attention to what options are available.

1) search()

Searches for sentences, optionally including their translations. This is essentially the same as the search bar at the top of the page on tatoeba.org.

request

    "version" : 1,
    "query" : "honger",       // string  The query
    "from" : "nld",           // string  The source language
    "to" : "eng",             // string  The target language
    "page" : [0, 15],
    "options" : 0x1 | 0x2 | 0x4

The "options" field is a bit mask indicating which extra data is requested (sentence meta, direct translations, indirect translations):

0x1 = include sentence meta
0x2 = direct translations (limited to 5)
0x4 = indirect translations (limited to 5)

The default value of "options" is 0x1. The number of direct translations, indirect translations, and comments returned is limited by default (translations are not paginated); the user has to explicitly request more (e.g. by clicking a "see more translations" button). Sentence meta consists of the fields "tags", "audio", "user_id", "username", and "comments".
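Since each option is a single bit, a client or a server can model them with Python's enum.IntFlag. This sketch covers only the three search() flags listed above; the class name SearchOptions is hypothetical.

```python
from enum import IntFlag


class SearchOptions(IntFlag):
    """Bit flags for the "options" field of search()."""
    META = 0x1      # include sentence meta
    DIRECT = 0x2    # direct translations (limited to 5)
    INDIRECT = 0x4  # indirect translations (limited to 5)


# Request everything: 0x1 | 0x2 | 0x4 is sent as 7.
everything = SearchOptions.META | SearchOptions.DIRECT | SearchOptions.INDIRECT
print(int(everything))  # 7

# The receiving side tests individual bits of the integer it gets.
received = SearchOptions(5)  # 0x1 | 0x4
print(SearchOptions.DIRECT in received)    # False
print(SearchOptions.INDIRECT in received)  # True
```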

This method can be used in the following way:

  1. Call the method with all options and display the results
  2. Display a "view more direct translations" widget
  3. This widget calls getSentenceDetails() with option 0x1 and passes the ids of the translations you want
  4. Use these results to display more translations
  5. Do likewise for comments, using getComments()
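Steps 2 to 4 could look like the following Python sketch, which builds the params for a getSentenceDetails() call from one sentence of a search() response. The function name more_translations_request and the shown_ids argument (ids the client has already displayed) are hypothetical.

```python
def more_translations_request(sentence: dict, shown_ids: set, kind: str = "direct") -> dict:
    """Build getSentenceDetails() params for translations not displayed yet.

    `sentence` is one entry of a search() response; `kind` is "direct" or
    "indirect". Option 0x1 asks for plain sentences, without translations
    or comments of their own.
    """
    missing = [sid for sid in sentence.get(kind, []) if sid not in shown_ids]
    return {"version": 1, "id": missing, "options": 0x1}


result = {"id": 123, "direct": [234, 345, 43], "indirect": [678, 343, 5]}
# Suppose 234 was already included in the first search() response.
print(more_translations_request(result, shown_ids={234}))
# {'version': 1, 'id': [345, 43], 'options': 1}
```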

response

Here is a sample response with all options set. A separate request must be made to retrieve the comments belonging to a sentence (using the comment ids returned in the meta).

    "version" : 1,
    "total" : 214,
    "sentences" : [
        {
            "id" : 123,
            "text" : "Ik heb honger.",
            "lang" : "nld",
            "tags" : [45, 24, 234, 2434],
            "audio" : 0,
            "user_id" : 789,
            "username" : "snape",         // current owner of the sentence (not necessarily the author)
            "comments" : [342, 352, 2213],
            "direct": [234, 345, 43],     // these fields can get up to 30 entries long
            "indirect" : [678, 343, 5]
        },
        {
            "id" : 234,                  // first direct
            "text" : "I am hungry.",
            "lang" : "eng",
            "tags" : [45, 24, 234, 2434],    
            "audio" : 1,
            "user_id" : 123,
            "username" : "george"
        },
        ...
        {
            "id" : 678,                 // first indirect
            "text" : "I want to eat.",
            "lang": "eng",
            "tags" : [45, 24, 234, 2434],
            "audio" : 1,
            "user_id" : 345,
            "username" : "ballface69"
        },
        ...,
        {
            // next sentence
        }
    ]
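Because the response is a flat list, the client has to link each sentence to its translations through the ids in "direct" and "indirect". A minimal Python sketch, assuming ids without a full entry in "sentences" (those beyond the default limit) are simply skipped; the helper names index_sentences and translations_of are hypothetical:

```python
def index_sentences(response: dict) -> dict:
    """Map sentence id -> sentence object for a flat search() response."""
    return {s["id"]: s for s in response["sentences"]}


def translations_of(sentence: dict, by_id: dict, kind: str) -> list:
    """Resolve the ids in "direct" or "indirect" to the returned sentences.

    Ids that have no full entry in the response are skipped.
    """
    return [by_id[i] for i in sentence.get(kind, []) if i in by_id]


response = {
    "version": 1,
    "total": 214,
    "sentences": [
        {"id": 123, "text": "Ik heb honger.", "lang": "nld",
         "direct": [234, 345, 43], "indirect": [678, 343, 5]},
        {"id": 234, "text": "I am hungry.", "lang": "eng"},
        {"id": 678, "text": "I want to eat.", "lang": "eng"},
    ],
}
by_id = index_sentences(response)
first = response["sentences"][0]
print([t["text"] for t in translations_of(first, by_id, "direct")])    # ['I am hungry.']
print([t["text"] for t in translations_of(first, by_id, "indirect")])  # ['I want to eat.']
```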

2) getSentenceDetails()

Requests details for one or more sentences, by id.

request

    "version" : 1,
    "id" : [8341, 342, 5252],
    "options" : 0x1 | 0x2 | 0x4

The following options are supported:

0x1 = no translations or comments
0x2 = direct translations (limited to 5)
0x4 = indirect translations (limited to 5)
0x8 = comments (limited to 8)

By default, "options" includes direct and indirect translations (0x2 | 0x4). Use 0x1 when retrieving sentences as translations.

response

    "version" : 1,
    "sentence" : [
        {
            "id" : 8341,
            "text" : "Ik heb honger.",
            "lang" : "nld",
            "tags" : [34, 56],
            "audio" : 1,
            "user_id" : 123,
            "username" : "ronnal42",
            "created" : "2013-04-15 01:12:34",
            "modified" : "2013-06-01 07:14:01",
            "direct" : [985, 34232, 34224],
            "indirect" : [278, 8676, 3242]
        },
        {
            "id" : 985,
            "text" : "I am hungry.",
            "lang" : "eng",
            "tags" : [34, 56],
            "audio" : 0,
            "user_id" : 123,
            "username" : "hermonie granger"
        },
        ...
        { 
            "id" : 278,
            "text" : "I am hungry",
            "lang": "eng",
            "tags" : [34, 56],
            "audio" : 0,
            "user_id" : 123,
            "username" : "Harry Potter"
        },
        ...
    ],
    "comments" : [
        {
            "id" : 134,
            "sentence_id" : 5352,
            "user_id" : 1,
            "username" : "albus dumbledor",
            "created" : "2011-12-20 13:01:43",
            "modified" : "2011-12-20 13:01:43",
            "text" : "Yeah. Indeed!"
        },
        ...
    ]

3) getComments()

Fetches a list of comments specified by their ids.

request

    "version" : 1,
    "id" : [32, 5426, 21],
    "options" : 0x1 | 0x2

response

    "version" : 1,
    "comments" : [
        {
            "id" : 32,
            "sentence_id" : 5352,
            "lang" : "eng",
            "text" : "Skyrim belongs to the Nords!",
            "user_id" : 1,
            "username" : "lydia",
            "created" : "2011-12-20 13:01:43",
            "modified" : "2011-12-20 13:01:43"
        },
        ...
    ]

4) getUserProfile()

Gets a user profile by id.

request

    "version" : 1,
    "id": 809

response

    "version" : 1,
    "user" : {
        "id" : 809,
        "group_id" : 2,
        "username" : "bob69",
        "name" : "Bob Smith",
        "lang" : "jpn",
        "country" : "Japan",
        "since" : "2011-12-20 13:01:43",
        "last_active" : "2013-01-01 10:09:04",
        "desc" : "My name is Bob Smith. I work at Microsoft.",
        "birthday" : "1988-06-15",
        "homepage" : "http://www.celebpics.com",
        "img" : "http://www.tatoeba.org/img/usrs/465465.jpg",
        "send_notifications" : 0,
        "level" : 1
    }

5) getUsers()

Retrieves a list of users by their ids. Call getUserProfile() if you want to display a user's profile. This method is mostly used for the members view; see the options below.

request

    "version" : 1,
    "id" :  34 | [34,542,123],   // a single int or an array
    "page" : [0, 10],
    "options" : 0x1 | 0x2

The default option is 0x1, which is intended for the members view, where the members need to be ordered by group_id (admin, corpus maintainer, advanced contributor, etc.; see http://www.tatoeba.org/users/all). Set it to 0x2 if you want to avoid this ordering and get users by their ids.
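The "id" field and the two ordering options could be handled as in the following Python sketch. It assumes that ascending group_id gives the order listed above (admin first) and that, with 0x2, users should follow the order of the requested ids; the helper names normalize_ids and order_users are hypothetical.

```python
def normalize_ids(value) -> list:
    """Accept the "id" field either as a single int or as an array of ints."""
    if isinstance(value, int) and not isinstance(value, bool):
        return [value]
    if isinstance(value, list) and all(
        isinstance(v, int) and not isinstance(v, bool) for v in value
    ):
        return list(value)
    raise TypeError('"id" must be an int or an array of ints')


def order_users(users: list, requested_ids: list, options: int) -> list:
    """Order users according to the getUsers() options.

    0x1: members view, ordered by group_id (assumed ascending, admins first);
         sorted() is stable, so users in the same group keep their order.
    0x2: no group ordering; users follow the order of the requested ids.
    """
    if options & 0x1:
        return sorted(users, key=lambda u: u["group_id"])
    position = {uid: i for i, uid in enumerate(requested_ids)}
    return sorted(users, key=lambda u: position.get(u["id"], len(position)))


users = [
    {"id": 542, "group_id": 3, "username": "big_chuck"},
    {"id": 34, "group_id": 2, "username": "bob69"},
]
print(normalize_ids(34))                                      # [34]
print([u["id"] for u in order_users(users, [34, 542], 0x1)])  # [34, 542]
print([u["id"] for u in order_users(users, [542, 34], 0x2)])  # [542, 34]
```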

response

    "version" : 1,
    "users" : [
        {
            "id" : 34,
            "group_id" : 2,
            "username" : "bob69",
            "since" : "2011-12-20 13:01:43",
            "img" : "http://www.tatoeba.org/img/usrs/465465.jpg"
        },
        {
            "id" : 542,
            "group_id" : 2,
            "username" : "big_chuck",
            "since" : "2011-12-20 13:01:43",
            "img" : "http://www.tatoeba.org/img/usrs/232.jpg"
        },
        ...
    ]

6) searchUsers()

Searches for users by name. This method returns a list of users with similar names, without enough information to display a user profile. Once you have found the right user, call getUserProfile() to display the profile.

request

    "version" : 1,
    "query" : "bob smith",   // username
    "page" : [0, 10]

response

The response has the same format as that of getUsers():

    "users" : [
        {
            "id" : 34,
            "group_id" : 2,
            "username" : "bob smith",
            "since" : "2011-12-20 13:01:43",
            "img" : "http://www.tatoeba.org/img/usrs/465465.jpg"
        },
        {
            "id" : 542,
            "group_id" : 2,
            "username" : "bobby smith",
            "since" : "2011-12-20 13:01:43",
            "img" : "http://www.tatoeba.org/img/usrs/232.jpg"
        },
        ...
    ]

7) fetchWall()

Gets the most recent messages from the wall. Use this method to display the most recent messages in the wall view. To display a specific thread (a wall message and all its replies), use fetchWallThread().

request

    "version" : 1

This method is not paginated. Because it serves a single view, it always returns 8 wall posts with up to 5 replies each. If a wall post has more than 5 replies, call fetchWallReplies(), which paginates the replies. Replies to replies are not returned.

response

Note that each post can be a reply and can itself have replies; the flat message structure shown below does not reflect the reply tree. The length of the "replies" field indicates how many replies a post has.

    "wallPosts" : [
        {
            "id" : 1,
            "user_id" : 1,
            "username" : "sarah",
            "created" : "2011-12-21 15:06:21",
            "modified" : "2013-01-15 09:04:30",
            "text" : "So how is the work on the API going?",
            "replies" : [42342, 3324, 42422, 543534]
        },
        {
            "id" : 42342,
            "user_id" : 1,
            "username" : "sarah",
            "created" : "2011-12-21 15:06:21",
            "modified" : "2013-01-15 09:04:30",
            "text" : "Good.",
            "replies" : [342, 531]
        },
        ...
    ]
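Since the length of "replies" is the number of replies a post has, a client can find the posts whose replies were cut off by comparing that length with the limit of 5. A Python sketch; the function name posts_needing_more_replies is hypothetical, and the third post in the sample data is made up for illustration:

```python
WALL_REPLY_LIMIT = 5  # fetchWall() includes at most 5 replies per post


def posts_needing_more_replies(wall_posts: list) -> list:
    """Ids of wall posts that have more replies than fetchWall() includes.

    Each id returned here needs a follow-up fetchWallReplies() call.
    """
    return [
        post["id"]
        for post in wall_posts
        if len(post.get("replies", [])) > WALL_REPLY_LIMIT
    ]


wall_posts = [
    {"id": 1, "replies": [42342, 3324, 42422, 543534]},
    {"id": 42342, "replies": [342, 531]},
    {"id": 77, "replies": [11, 12, 13, 14, 15, 16]},  # hypothetical post, 6 replies
]
print(posts_needing_more_replies(wall_posts))  # [77]
```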

8) fetchWallThread()

Gets a wall message by id, with up to 10 replies. If there are more than 10 replies, call fetchWallReplies().

request

    "version" : 1,
    "id" : 3432

response

    "version" : 1,
    "wallPosts" : [
        {
            "id" : 3432,
            "user_id" : 1,
            "username" : "sarah",
            "created" : "2011-12-21 15:06:21",
            "modified" : "2013-01-15 09:04:30",
            "text" : "The US government spies on the entire world.",
            "replies" : [3324, 42422, 543534]
        },
        {
            "id" : 3324,
            "user_id" : 1,
            "username" : "sarah",
            "created" : "2011-12-21 15:06:21",
            "modified" : "2013-01-15 09:04:30",
            "text" : "Yes, but massive unprecedented surveillance keeps people safe.",
            "replies" : [342, 531]
        },
        ...
    ]

9) fetchWallReplies()

Gets the replies to a wall post. This method is paginated, so several consecutive calls may be required to get all the replies to a given wall post. It returns only direct replies, not replies to replies; if a reply has replies of its own, the client needs to retrieve them separately.

request

    "version" : 1,
    "wallPost_id" : 34234,  // id of wall post
    "page" : [0, 43]        // paginate the replies

response

A paginated list of wall replies will be returned.

    "version" : 1,
    "wallPosts" : [
        {
            "id" : 3432,
            "user_id" : 1,
            "username" : "sarah",
            "created" : "2011-12-21 15:06:21",
            "modified" : "2013-01-15 09:04:30",
            "text" : "The US government spies on the entire world.",
            "replies" : [3324, 42422, 543534]
        },
        {
            "id" : 3324,
            "user_id" : 1,
            "username" : "sarah",
            "created" : "2011-12-21 15:06:21",
            "modified" : "2013-01-15 09:04:30",
            "text" : "Yes, but massive unprecedented surveillance keeps people safe.",
            "replies" : [342, 531]
        },
        ...
    ]
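Collecting every reply takes a loop over "page". In the Python sketch below, call(method, params) stands for a hypothetical transport that sends a JSON-RPC request and returns its "result" object. Paging stops at the first page shorter than page_size; this is an assumption, since this document does not say whether the response carries a total count. The fake_call stub only simulates a post with 45 replies.

```python
from typing import Callable


def fetch_all_replies(call: Callable[[str, dict], dict], wall_post_id: int,
                      page_size: int = 20) -> list:
    """Collect every direct reply to a wall post, one page at a time."""
    replies = []
    start = 0
    while True:
        result = call("fetchWallReplies", {
            "version": 1,
            "wallPost_id": wall_post_id,
            "page": [start, page_size],
        })
        batch = result["wallPosts"]
        replies.extend(batch)
        if len(batch) < page_size:  # a short (or empty) page is the last one
            return replies
        start += page_size


def fake_call(method: str, params: dict) -> dict:
    """Stand-in server: the post has 45 replies with ids 1 to 45."""
    start, count = params["page"]
    ids = list(range(1, 46))[start:start + count]
    return {"version": 1, "wallPosts": [{"id": i} for i in ids]}


print(len(fetch_all_replies(fake_call, 34234)))  # 45
```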

Error codes and messages

A typical JSON-RPC error packet looks like this:

    "id": "1",
    "jsonrpc": "2.0", 
    "error": {
       "code": -32601, 
       "message": "Method not found"
    }

The "jsonrpc", "error", and "id" fields are considered to be the "header". For simplicity, the error descriptions below show only the contents of the "error" field.

Hence:

    "code": -32601, 
    "message": "Method not found"

The proposed range of error codes for the Tatoeba project is -1000 to -5000. For details on error representation, see section 5.1, "Error object", of the JSON-RPC 2.0 specification.

Sentence not found

The sentence with the given ID could not be found.

    "code": -1010, 
    "message": "Sentence not found"

Incorrect method version

The requested version of the method was not found. The "incorrect_ver" field echoes the invalid version requested by the client.

    "code": -1020, 
    "message": "Incorrect method version",
    "incorrect_ver": 914

Incorrect language

The provided language code was not recognized.

    "code": -1030, 
    "message": "Incorrect language"

Results range error

There are two possible causes for this error:

  1. No results range has been specified (see "Pagination").
  2. An invalid range has been specified.

    "code": -1040, 
    "message": "No range or wrong range was requested."
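A client could turn error responses into exceptions. The Python sketch below collects the codes listed in this section and checks the proposed -1000 to -5000 range; on success it reads the "result" member, as JSON-RPC 2.0 specifies. The names TatoebaApiError, KNOWN_ERRORS, is_tatoeba_code, and unwrap are hypothetical.

```python
class TatoebaApiError(Exception):
    """Raised when a response carries a JSON-RPC "error" object."""

    def __init__(self, error: dict):
        self.code = error["code"]
        self.message = error["message"]
        self.error = error  # keeps extra fields such as "incorrect_ver"
        super().__init__(f"{self.code}: {self.message}")


# Error codes defined in this section.
KNOWN_ERRORS = {
    -1010: "Sentence not found",
    -1020: "Incorrect method version",
    -1030: "Incorrect language",
    -1040: "No range or wrong range was requested.",
}


def is_tatoeba_code(code: int) -> bool:
    """True if the code lies in the proposed Tatoeba range, -1000 to -5000."""
    return -5000 <= code <= -1000


def unwrap(response: dict):
    """Return the "result" of a response, or raise TatoebaApiError."""
    if "error" in response:
        raise TatoebaApiError(response["error"])
    return response["result"]


try:
    unwrap({
        "jsonrpc": "2.0",
        "id": "1",
        "error": {"code": -1020, "message": "Incorrect method version", "incorrect_ver": 914},
    })
except TatoebaApiError as exc:
    print(exc.code, exc.error.get("incorrect_ver"))  # -1020 914
```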

Optimization

This section is meant for notes on optimization of the protocol.