Tatoeba API specification 2
The Tatoeba API uses JSON-RPC 2.0 as its protocol: http://www.jsonrpc.org/specification
Each request and response must contain a version field which indicates which version of the given method is wanted:
... , "params":{"ver":1, ...},...
Versioning the methods gives greater flexibility to the development and overall functionality of the API. Versioning strings are simply "1", "2", "3", etc., and they qualify the method name. The server and the client each implement method versioning (and naming) in their own way.
Whenever a processed query generates a long list of results, not all the results are sent back to the client. The client has to specify the required range of the results. If no range is specified, an error is returned.
The range is specified by providing the startIndex and count:
"page" : [startIndex, count];
The item at startIndex is included in the returned array.
"page" : [0, 1]; // this corresponds to the first 2 items of the search results.
"page" : [30, 10]; // this corresponds to the results 30-39.
"page" : [25, 3]; // this corresponds to the results 25, 26 and 27.
The client is solely responsible for implementing the pagination logic; the server only knows how to provide a specified range of results.
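As a minimal sketch of the client-side arithmetic described above, the helpers below turn a zero-based page number into a [startIndex, count] pair and list the result indices that a pair covers. The function names are hypothetical and not part of the API.

```python
def page_param(page_number: int, page_size: int) -> list[int]:
    """Return the [startIndex, count] pair for a zero-based page number."""
    if page_number < 0 or page_size <= 0:
        raise ValueError("page_number must be >= 0 and page_size must be > 0")
    return [page_number * page_size, page_size]


def covered_indices(page: list[int]) -> range:
    """Return the result indices that a [startIndex, count] pair covers."""
    start_index, count = page
    return range(start_index, start_index + count)


print(page_param(3, 10))               # [30, 10]
print(list(covered_indices([25, 3])))  # [25, 26, 27]
```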
Languages are identified by ISO 639-3 codes, which consist of 3 lowercase ASCII letters. The full list is long, so it is kept on a separate page; here is a sample:
- "ara" => "Arabic
- "eng" => "English"
- "jpn" => "Japanese"
- "fra" => "French"
A complete JSON-RPC 2.0 request object includes several mandatory fields:
{
"jsonrpc" : "2.0",
"id" : 123, // int
"method" : "search",
"params" : {
"version" : 1
"query": "honger",
"from" : "nld",
"to" : "eng",
"page" : [0,15],
"options" : 0x1 | 0x4,
}
}
The "jsonrcp", "id", and "method" fields are the mandatory part, the "header". The "id" field is to correlate the request with the response. The "params" field is where we store the data, the "message". For the rest of this document, the mandatory fields will be omitted and only the "params" field will be illustrated.
For transmission the messages will have to be minified. Each method handles its own minification.
The following message:
"query": "honger",
"from" : "nld",
"to" : "eng",
"page" : [0,15],
"options" : 0x1,
"ver" : 1
Can be minified to this:
"q":"honger","f":"nld","t":"eng","p":[0,15],"o":0x1,"v":1
Two steps to minify:
- use the short aliases for the fields
- remove all white-space characters.
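The two steps above can be sketched as follows. The alias table is taken from the example in this section and would differ per method; the function names are hypothetical. One caveat: the sketch removes white space between JSON tokens but leaves white space inside string values untouched, since stripping it would change the data.

```python
import json

# Aliases as they appear in the example above; each method defines its own.
ALIASES = {"query": "q", "from": "f", "to": "t", "page": "p", "options": "o", "ver": "v"}


def minify(params: dict) -> str:
    """Rename fields to their short aliases and serialize without white space."""
    short = {ALIASES.get(key, key): value for key, value in params.items()}
    return json.dumps(short, separators=(",", ":"), ensure_ascii=False)


def expand(text: str) -> dict:
    """Reverse of minify(): parse the text and restore the long field names."""
    reverse = {alias: name for name, alias in ALIASES.items()}
    return {reverse.get(key, key): value for key, value in json.loads(text).items()}


params = {"query": "honger", "from": "nld", "to": "eng", "page": [0, 15], "options": 0x1, "ver": 1}
print(minify(params))
```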
Unless otherwise noted, all "id" fields here exactly match the id fields currently registered in the Tatoeba database. "user_id" and "username" are returned with each sentence/comment/message so that the client has just enough information to display the username (the "owner") and construct a link to the user's profile.
Please pay special attention to what options are available.
A search for sentences, optionally including their translations. It is basically the same as the search bar at the top of the page on tatoeba.org.
"version" : 1,
"query": "honger", // string The query
"from" : "nld", // string The source language
"to" : "eng", // string The target language
"page" : [0,15],
"options" : 0x1 | 0x2 | 0x4
The "options" field indicates if translations and/or comments are requested.:
0x1 = include sentence meta
0x2 = direct translations (limited to 5)
0x4 = indirect translations (limited to 5)
The default value of "options" is 0x1. The numbers of direct translations, indirect translations, and comments returned are limited by default (translations aren't paginated). The user has to explicitly request more (e.g. by clicking a "see more translations" button). Sentence meta consists of the fields "tags", "audio", "user_id", "username", and "comments".
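Because the option values are powers of two, they can be combined and tested as bit flags. The sketch below models the three search options listed above; the class and member names are hypothetical.

```python
from enum import IntFlag


class SearchOption(IntFlag):
    """Bit flags for the "options" field of the search method."""
    META = 0x1      # include sentence meta
    DIRECT = 0x2    # direct translations
    INDIRECT = 0x4  # indirect translations


options = SearchOption.META | SearchOption.INDIRECT
print(int(options))                    # 5
print(SearchOption.DIRECT in options)  # False
```

Converting with int() gives the plain number that goes into the "options" field of the request.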
This method can be used in the following way:
- Call the method with all options and display the results.
- Display a "view more direct translations" widget.
- This widget calls getSentenceDetails() with option 0x1 and passes the ids of the translations you want.
- Use these results to display more translations.
- Likewise with comments.
Here is a sample response which includes full options. A separate request must be made to retrieve the comments belonging to the sentence (using the comment ids returned in the meta).
"version" : 1,
"total" : 214,
"sentences" : [
{
"id" : 123,
"text" : "Ik heb honger.",
"lang" : "nld",
"tags" : [45, 24, 234, 2434],
"audio" : 0,
"user_id" : 789,
"username" : "snape", // current owner of the sentence (not the same as author)
"comments" : [342, 352, 2213],
"direct": [234, 345, 43], // these fields can get up to 30 entries long
"indirect" : [678, 343, 5]
},
{
"id" : 234, // first direct
"text" : "I am hungry.",
"lang" : "eng",
"tags" : [45, 24, 234, 2434],
"audio" : 1,
"user_id" : 123,
"username" : "george"
},
...
{
"id" : 678, // first indirect
"text" : "I want to eat.",
"lang": "eng",
"tags" : [45, 24, 234, 2434],
"audio" : 1,
"user_id" : 345,
"username" : "ballface69"
},
...
,
{
// next sentence
}
]
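In the sample response, the translations arrive as ordinary entries in the same flat "sentences" list and are referenced by id from the "direct" and "indirect" fields. A client could regroup them as in the sketch below. It assumes that only search hits carry "direct" or "indirect" fields, which holds for the sample but is not stated by the specification; the function name is hypothetical.

```python
def group_translations(sentences: list[dict]) -> list[dict]:
    """Attach the translation entries to the search hit that references them."""
    by_id = {sentence["id"]: sentence for sentence in sentences}
    grouped = []
    for sentence in sentences:
        # Assumption: entries without these fields are translations, not hits.
        if "direct" not in sentence and "indirect" not in sentence:
            continue
        grouped.append({
            "sentence": sentence,
            # Ids beyond the returned limit are not in the list and are skipped.
            "direct": [by_id[i] for i in sentence.get("direct", []) if i in by_id],
            "indirect": [by_id[i] for i in sentence.get("indirect", []) if i in by_id],
        })
    return grouped


sample = [
    {"id": 123, "text": "Ik heb honger.", "direct": [234, 345, 43], "indirect": [678, 343, 5]},
    {"id": 234, "text": "I am hungry."},
    {"id": 678, "text": "I want to eat."},
]
grouped = group_translations(sample)
print(grouped[0]["direct"][0]["text"])  # I am hungry.
```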
Request details for a single or many sentences, by id.
"version" : 1,
"id": [8341, 342, 5252]
"options" : 0x1 | 0x2 | 0x4
The following options are supported:
0x1 = no translations or comments
0x2 = direct translations (limited to 5)
0x4 = indirect translations (limited to 5)
0x8 = comments (limited to 8)
The default value of "options" will be to include direct and indirect translations. Use 0x1 if retrieving sentences as translations.
"version" : 1
"sentence" : [
{
"id" : 8341,
"text" : "Ik heb honger.",
"lang" : "nld",
"tags" : [34, 56],
"audio" : 1,
"user_id" : 123,
"username" : "ronnal42",
"created" : "2013-04-15 01:12:34",
"modified" : "2013-06-01 07:14:01",
"tags" : [342,23,423],
"direct": [985, 34232, 34224],
"indirect" : [278, 8676, 3242]
},
{
"id" : 985,
"text" : "I am hungry.",
"lang" : "eng",
"tags" : [34, 56],
"audio" : 0,
"user_id" : 123,
"username" : "hermonie granger"
},
...
{
"id" : 278,
"text" : "I am hungry",
"lang": "eng",
"tags" : [34, 56],
"audio" : 0,
"user_id" : 123,
"username" : "Harry Potter"
},
...
],
"comments" : [
{
"id" : 134,
"sentence_id" : 5352,
"user_id" : 1,
"username" : "albus dumbledor",
"created" : "2011-12-20 13:01:43",
"modified" : "2011-12-20 13:01:43",
"text" : "Yeah. Indeed!"
},
...
]
Fetch a list of comments specified by their id's
"version" : 1,
"id" : [32, 5426, 21]
"options" : 0x1 | 0x2
"version" : 1 ,
"comments" : [
{
"id" : 32,
"sentence_id" : 5352,
"lang" : "en",
"text" : "Skyrim belongs to the Nords!",
"user_id" : 1,
"username" : "lydia"
"created" : "2011-12-20 13:01:43",
"modified" : "2011-12-20 13:01:43"
},
...
]
Get a user profile by id
"version" : 1
"id": 809
"version" : 1
"user" : {
"id" : 809,
"group_id" : 2,
"username" : "bob69",
"name" : "Bob Smith",
"lang" : "jp",
"country" : "Japan",
"since" : "2011-12-20 13:01:43",
"last_active" : "2013-01-01 10:09:04",
"desc" : "My name is Bob Smith. I work at Microsoft.",
"birthday" : "1988-06-15",
"homepage" : "http://www.celebpics.com",
"img" : "http://www.tatoeba.org/img/usrs/465465.jpg",
"send_notifications" : 0,
"level" : 1
}
Retrieves a list of users by their ids. Call getUserProfile() if you want to display a user's profile. This method is mostly used for the members view. See the options below.
"version" : 1,
"id" : 34 | [34,542,123], // a single int or an array
"page" : [0, 10],
"options" : 0x1 | 0x2
The default option is 0x1, which is intended for the members view, where the members need to be ordered by group_id (admin, corpus maintainer, advanced contributor, etc.; see http://www.tatoeba.org/users/all). Set it to 0x2 if you want to avoid this ordering and get users by their ids.
"version" : 1,
"users" : [
{
"id" : 34,
"group_id" : 2,
"username" : "bob69",
"since" : "2011-12-20 13:01:43",
"img" : "http://www.tatoeba.org/img/usrs/465465.jpg",
},
{
"id" : 542,
"group_id" : 2,
"username" : "big_chuck",
"since" : "2011-12-20 13:01:43",
"img" : "http://www.tatoeba.org/img/usrs/232.jpg",
},
...
]
For searching users by name. This method returns a list of users with similar names, so it does not return enough information to display a user profile. After you find the right user, call getUserProfile() to display the profile.
"version" : 1,
"query" : "bob smith", // username
"page" : [0, 10]
Same response as getUsers()
"users" : [
{
"id" : 34,
"group_id" : 2,
"username" : "bob smith",
"since" : "2011-12-20 13:01:43",
"img" : "http://www.tatoeba.org/img/usrs/465465.jpg",
},
{
"id" : 542,
"group_id" : 2,
"username" : "bobby smith",
"since" : "2011-12-20 13:01:43",
"img" : "http://www.tatoeba.org/img/usrs/232.jpg",
},
...
]
Get recent messages from the wall. Use this method to display the most recent messages on the wall view. For displaying a specific thread (a wall message and all its replies) use fetchWallThread().
"version" : 1
This method is not paginated. Since this method is unique to one view, it always returns 8 wall posts with up to 5 replies each. If a wall post has more than 5 replies, call fetchWallReplies(), which does paginate the replies. It does not return replies to replies.
Note that each post can be a reply and can itself have replies; the message structure (shown below) does not indicate the reply structure. The length of the "replies" field indicates how many replies there are.
"wallPosts" : [
{
"id" : 00001,
"user_id" : 1,
"username" : "sarah",
"created" : "2011-12-21 15:06:21",
"modified" : "2013-01-15 09:04:30",
"text" : "So how is the work on the API going?",
"replies" : [42342, 3324, 42422, 543534]
},
{
"id" : 42342,
"user_id" : 1,
"username" : "sarah",
"created" : "2011-12-21 15:06:21",
"modified" : "2013-01-15 09:04:30",
"text" : "Good.",
"replies" : [342, 531]
},
...
]
Get a wall message by ID with up to 10 replies. If there are more than 10 replies, call fetchWallReplies().
"version" : 1,
"id" : 3432
"version" : 1,
"wallPosts" : [
{
"id" : 3432,
"user_id" : 1,
"username" : "sarah",
"created" : "2011-12-21 15:06:21",
"modified" : "2013-01-15 09:04:30",
"text" : "The US government spies on the entire world.",
"replies" : [3324, 42422, 543534]
},
{
"id" : 3324,
"user_id" : 1,
"username" : "sarah",
"created" : "2011-12-21 15:06:21",
"modified" : "2013-01-15 09:04:30",
"text" : "Yes, but massive unprecedented surveillance keeps people safe.",
"replies" : [342, 531]
},
...
]
For getting replies to a wall post. This method is paginated so consecutive calls may be required to get all the replies for a given wall post. This method returns only replies, not replies to replies. If a reply has a reply, the client needs to retrieve that reply.
"version" = 1,
"wallPost_id" = 34234 // id of wall post
"page" = [0,43] // paginate the replies
A paginated list of wall replies will be returned.
"version" : 1,
"wallPosts" : [
{
"id" : 3432,
"user_id" : 1,
"username" : "sarah",
"created" : "2011-12-21 15:06:21",
"modified" : "2013-01-15 09:04:30",
"text" : "The US government spies on the entire world.",
"replies" : [3324, 42422, 543534]
},
{
"id" : 3324,
"user_id" : 1,
"username" : "sarah",
"created" : "2011-12-21 15:06:21",
"modified" : "2013-01-15 09:04:30",
"text" : "Yes, but massive unprecedented surveillance keeps people safe.",
"replies" : [342, 531]
},
...
]
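The consecutive calls mentioned for fetchWallReplies() might look like the loop below. This is only a sketch: the call argument stands in for whatever transport the client uses (hypothetical here), and the stop condition assumes that a page shorter than the requested count marks the end, which the specification does not state. If the server instead answers an out-of-range page with an error, the loop would need to handle that as well.

```python
from typing import Callable


def fetch_all_replies(call: Callable[[str, dict], dict],
                      wall_post_id: int,
                      page_size: int = 10) -> list[dict]:
    """Collect every reply to a wall post by requesting consecutive ranges."""
    replies = []
    start = 0
    while True:
        result = call("fetchWallReplies", {
            "version": 1,
            "wallPost_id": wall_post_id,
            "page": [start, page_size],
        })
        batch = result["wallPosts"]
        replies.extend(batch)
        if len(batch) < page_size:  # assumed end-of-list signal
            return replies
        start += page_size


def fake_call(method: str, params: dict) -> dict:
    """Stand-in server holding 23 replies, used only to exercise the loop."""
    data = [{"id": i} for i in range(23)]
    start, count = params["page"]
    return {"version": 1, "wallPosts": data[start:start + count]}


print(len(fetch_all_replies(fake_call, 34234)))  # 23
```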
A typical error packet in terms of JSON-RPC looks like this:
"id": "1",
"jsonrpc": "2.0",
"error": {
"code": -32601,
"message": "Method not found"
}
The fields "jsonrpc", "error" and "id" are considered to be the "header". For the sake of simplicity the following error packet descriptions will only contain the content of the "error" field.
Hence:
"code": -32601,
"message": "Method not found"
The proposed range of the error codes for the Tatoeba project is -1000..-5000. For the details on error representation see "5.1 Error object" at JSON-RPC 2.0 Specification.
The sentence with the given ID could not be found.
"code": -1010,
"message": "Sentence not found"
The requested version of the method was not found. The field "incorrect_ver" reflects the wrong version requested by the user.
"code": -1020,
"message": "Incorrect method version",
"incorrect_ver": 914
The provided language code was not recognized.
"code": -1030,
"message": "Incorrect language"
Two possible reasons for this error:
- No results range has been specified (See "pagination").
- Invalid range has been specified.
"code": -1040,
"message": "No range or wrong range was requested."
This section is meant for notes on optimization of the protocol.