Skip to content
This repository has been archived by the owner on Nov 9, 2017. It is now read-only.

Add endpoints for source and translation string details. #5

Merged
merged 1 commit into from
Aug 28, 2014

Conversation

davidmason
Copy link
Contributor

Required some changes to how endpoints are specified so that earlier endpoints would take precedence, and so that different query string parameters can be specified.

@alex-sl-eng
Copy link
Member

  • Given in editor its always pairs of source and target, shouldn't the end point always return info for both source and target in single request (projects/p/{projectSlug}/iterations/i/{versionSlug}/r/{encodedDocId})/source?ids={list-of-ids}/{localeId}?
  • Does this returns history of the translation as well?

@davidmason
Copy link
Contributor Author

Given in editor its always pairs of source and target, shouldn't the end point always return info for both source and target in single request (projects/p/{projectSlug}/iterations/i/{versionSlug}/r/{encodedDocId})/source?ids={list-of-ids}/{localeId}?

One use-case for downloading just source is if the editor were loaded before selecting a language, such as if it is deployed as a standalone application. It would also be useful in any case where someone wants to look at source strings.

The use-cases for downloading just translations include changing language, and requesting translations from another language for reference. In both cases the editor already has the source strings so it would be wasteful to make the server look them up from the database and transmit them over the connection.

@davidmason
Copy link
Contributor Author

Does this returns history of the translation as well?

No, just the head revision of the translation. I can imagine plenty of use-cases for wanting the head revision without the history, so it should not be included by default. It would make sense to allow clients to optionally request history, either in a separate request or as part of a trans or source+trans request, so I could add endpoints to include translation history:

  • .../history/{localeId}?ids={list-of-ids}
  • .../trans+history/{localeId}?ids={list-of-ids}
  • .../source+trans+history/{localeId}?ids={list-of-ids}

I will create an issue for adding these - this pull request has plenty in it already, and the endpoints work without history.

We may eventually want source history as well. I'm not sure if we have the data for that, but there are some file formats for which it makes sense (anything that provides an ID for strings). We can design a sensible endpoint for that to make sure we aren't painting ourselves into a corner with the proposed URLs. It could just use something like sourcehistory and be included with + like the others.

@davidmason
Copy link
Contributor Author

shouldn't the end point always return info for both source and target in single request

The editor can just ask for .../source+trans/{localeId}?ids={list-of-ids} when it needs both together.

@carlosmunoz
Copy link
Member

Just happened to look at the endpoints and had a question:
Wouldn't it be better to go with something like
.../textflow/{locale}?ids={list-of-ids}&includeTrans=true&includeHistory=true

I think I like the source+trans+history business, just wondering if that's a new convention for naming your REST endpoints.

@alex-sl-eng
Copy link
Member

@davidmason, so we are allowing the editor to be loaded with no selected locale (with selected document I assume), how is that gonna display on the text area?

@davidmason
Copy link
Contributor Author

@carlosmunoz

Wouldn't it be better to go with something like
.../textflow/{locale}?ids={list-of-ids}&includeTrans=true&includeHistory=true

Either one provides sufficient information to generate the response. I have a few reasons to prefer my proposal to this suggestion:

  • It is obvious what is being requested. Someone looking at .../source?ids=1234,1235 would expect source for a couple of IDs. Similarly .../source+trans/fr?ids=1234,1235 makes it pretty clear to me that it will return combined source and translations for the same IDs. What should someone expect if they see .../textflow/de?ids=1234,1235? Would it just give the source since it lacks an includeTrans parameter? We could discuss sensible defaults, but that discussion (and its reiteration by anyone trying to use the endpoint) can be avoided completely by being explicit.
  • No one needs to know what a "textflow" is. When designing the endpoint I was considering whether to use "textflow", "textunit", "transunit" or something else - none of them are good options because they probably don't mean much to most people. Using "source+trans" avoids the confusing terminology entirely.
  • A locale id is not needed to just get source data.
  • .../source and .../trans/fr are distinct things, it feels somewhat unnatural to have to conceptually combine them to a "textflow" then tease the concepts apart if only one is needed.

There is one aspect of my proposed endpoints that is not really sensible: the whole leading portion of the URL is redundant. Since I opted for the (Long) ID rather than resId, there is enough information to get all the data without knowing the project, version or document. I can only think of two reasons to keep all that other stuff in the URL, and both are weak:

  • In case we want to extend it later to allow the use of resId, which is only unique within a document. This could get messy, and it would probably be better to have different endpoints anyway to prevent confusion between id and resId.
    • In case they are needed for some future security checks if projects are restricted. There is enough information in the numeric ID to infer the project information that would be needed for such checks, so I don't think it stands up.

So basically I should probably be using /rest/source?ids={list-of-ids} rather than /rest/projects/p/{projectSlug}/iterations/i/{versionSlug}/r/{encodedDocId}/source?ids={list-of-ids}.

@alex-sl-eng
Copy link
Member

+1 on /rest/source?ids={list-of-ids}

@davidmason
Copy link
Contributor Author

@aeng

so we are allowing the editor to be loaded with no selected locale (with selected document I assume), how is that gonna display on the text area?

I'm not sure if we are doing this at the moment, but it seems likely we will want to do so in the future. Off the top of my head, I can picture showing the source on the left as it already does, and on the right showing "select a language".
Even if we never actually use the endpoint source-only endpoint, we still need the code to look up the source information, so it would not cost us much. At first we should only be implementing the endpoints that are actually being used by the server anyway - the rest can just stay in the fake server as proposals.

@carlosmunoz
Copy link
Member

So, in terms of /rest/source? vs /rest/projects/p/{projectSlug}/iterations/i/{versionSlug}/r/{encodedDocId}/source?ids={list-of-ids}, it really comes down to the question of business keys vs "storage" keys.

Sure, /rest/source is shorter and seemingly easier, but the alternative version has more meaning to it. I would do it like this instead: /rest/project/{p}/iteration/{i}/{encodedDocId}/source?, but the fact that we don't have a slug for text flows (sources) means we still need a way to identify those resources.

In terms of the editor, since it will probably do paging of some sort, why not use the index in the document to identify it. In the end, the url would be something like:
/rest/project/{p}/iteration/{i}/{encodedDocId}/source?from=0&count=100 (for easy paging)
/rest/project/{p}/iteration/{i}/{encodedDocId}/source?idx=1&idx=2&...&idx=34 (for random access)
/rest/project/{p}/iteration/{i}/{encodedDocId}/source+trans?from=0&count=100 (to get translations as well)

@davidmason
Copy link
Contributor Author

@carlosmunoz

the fact that we don't have a slug for text flows (sources) means we still need a way to identify those resources.

The closest we have is resId, which appears to be an arbitrary string - the lack of constraint on resId leads to 2 problems:

  • The URL to request several text flows could get very long in a document that uses longer IDs - even just the content hash that is used for gettext documents is pretty long.
  • Since resId can contain any character, it would need very pessimistic escaping that would further bloat the URL, and require every client to escape consistently - this strikes me as making people's lives worse, so I would like to have significant benefit to balance that out.

why not use the index in the document to identify it

That would be a reasonable option - good for URL length (shorter on average than database ID) and for readability, and more semantically meaningful than the database ID. It is also expressive enough to fetch data about any individual row or set of rows from the document.

It does add one limitation that does not exist when using database IDs: if the editor condenses multiple small documents into a single view (as has been requested by users, and presumably it one day will do so), the editor would not be able to request text flow data in a range that crosses a document boundary. This would be most noticeable when loading a set of single-string documents: if a page is 20 rows, it would require 20 requests to fetch the first page, whereas using database ID would require only one'*.

'* I would probably use 3 though: fetch the first row on its own immediately, then the next 3 or so, then the rest of the page, the goal being to allow the user to start editing immediately without waiting for the whole page to load.

@davidmason
Copy link
Contributor Author

@carlosmunoz I might as well add both database-ID and document-row endpoints to the fake server, so we can try both in the editor and evaluate them with a bit of practical experience. I'll create an issue to track the document-row endpoint creation. I will keep /r/ in the URL since it is necessary to keep it consistent with other URLs (without it, a document named "r" would be indistinguishable from a request for all documents metadata).

@carlosmunoz
Copy link
Member

This would be most noticeable when loading a set of single-string documents

That shouldn't be a very common case.

whereas using database ID would require only one

Maybe splitting those is not such a bad idea. Keeping your requests short and quick is a good way of preventing long lived operations from hogging the system (even if they are reads). some applications restrict the number of elements that can be read (written) in a single operation, maybe we should consider the same.

'* I would probably use 3 though: fetch the first row on its own immediately, then the next 3 or so, then the rest of the page, the goal being to allow the user to start editing immediately without waiting for the whole page to load.

I think that's a very good idea.

I might as well add both database-ID and document-row endpoints to the fake server

Sounds good

@davidmason
Copy link
Contributor Author

That shouldn't be a very common case.

It is common enough that translators to have suggested combining documents several times. I think it is worth considering. There is a chance that we would decide that designs that cater well to the tiny-documents use-case require too much of a trade-off in other areas, but if we end up with a couple of options that are equally worthy in other areas, tiny-document handling could be the decider.

some text flows.
- may respond with 403 (Forbidden) if the list of ids is too long. The
limit should be set at a sufficiently high number as to prevent the editor
getting a 403 response in normal operation.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some applications restrict the number of elements that can be read (written) in a single operation, maybe we should consider the same.

I agree completely. We can figure out a sensible limit once we see what the editor tends to use normally. I suspect it will also force better async design in the editor, by making us think about breaking up requests that would be too large.

@davidmason davidmason added bug and removed bug labels Aug 28, 2014
@alex-sl-eng
Copy link
Member

👍

This adds 3 endpoints: source strings, translation strings, and both.
The endpoints use query parameter 'ids' to request a specific set of
translation units using the textflow numeric id (comma-separated list).

This also replaces the simple list of path strings with a list of thunks,
allowing far more flexibility in specifying paths, particularly sets
of similar paths that would require a lot of repetition.
@davidmason davidmason merged commit 332abc0 into master Aug 28, 2014
@davidmason davidmason deleted the add-tu-detail-endpoints branch August 28, 2014 05:43
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants