
Mathoid: Add public and back-end endpoints#339

Merged
d00rman merged 12 commits into wikimedia:master from
d00rman:mathoid/post-routes
Oct 9, 2015

Contributor

d00rman commented Sep 17, 2015

This PR adds two public routes:

  • POST /{domain}/media/math/{format}
  • GET /{domain}/media/math/{format}/{hash}

The first allows clients to perform a POST request to ask for a formula to be rendered. It takes the same parameters as Mathoid, namely q (the formula to be rendered) and type (the format or type of the supplied formula), with format (the desired output format) being present in the URI. Only q is mandatory; type defaults to tex if not supplied. Internally, the request is saved using the post_data module. Saving the request yields its hash, which is then used to determine whether the given formula has already been rendered by Mathoid. If so, the stored version is returned. Otherwise, a request is issued to Mathoid (to its /complete endpoint) and its result is stored. The API returns only the portion (body, headers) relevant to the sought format. In either case, the response contains the x-resource-location header, which can be used to perform subsequent, low-latency requests to obtain the same render (but possibly in a different format) via the second public endpoint.
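The store-or-render flow described above can be sketched roughly as follows. This is only an illustration, not RESTBase's code: the in-memory `_store`, the `request_hash` helper, the use of SHA-1 and the `mathoid` callable are all assumptions made for the example; the real hash comes from the post_data module.

```python
import hashlib
import json

# Hypothetical in-memory stand-in for the post_data-backed storage.
_store = {}


def request_hash(q, type_="tex"):
    """Derive a stable hash from the request parameters (SHA-1 is an assumption)."""
    payload = json.dumps({"q": q, "type": type_}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def render(q, fmt, type_="tex", mathoid=None):
    """Return (headers, body) for `fmt`, calling `mathoid` only on a cache miss."""
    key = request_hash(q, type_)
    if key not in _store:
        # `mathoid` stands in for a request to Mathoid's /complete endpoint; it
        # is expected to return {"svg": {"headers": {...}, "body": "..."}, ...}.
        _store[key] = mathoid(q, type_)
    part = _store[key][fmt]
    # Only the portion relevant to the sought format is returned, plus the hash.
    headers = dict(part["headers"])
    headers["x-resource-location"] = key
    return headers, part["body"]
```

Because the stored entry holds every format, a second call for the same formula in a different format is served from storage without contacting Mathoid again.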

Note that the endpoints and their storage are organised in such a way as to be domain-independent. This PR introduces a new, global domain - wikimedia.org - and all Mathoid requests are internally remapped to it. In other words, if a render with the data hash 12345 exists, it can be retrieved equally from all domains, regardless of the initial request's domain.

Also, this PR slightly reorganises the configuration files.

Bug: T102030

Contributor Author

d00rman commented Sep 17, 2015

Contributor Author

d00rman commented Sep 17, 2015

Bike-shedding about window.wm.org @ T103811.

Contributor Author

I must admit I like formula much better than q, but left it for compatibility.

@physikerwelt
Member

@d00rman I'm not really sure how to use this.
Currently, my understanding of the restified rendering process is the following:
The Math extension sends one POST request with the q, format, etc. parameters, gets the response with the hash ($hash), and delivers the following information to the user's browser:

  • Inline MathML element
  • Link to a fallback SVG image
  • Link to a fallback PNG image

What would be the actual link to the images:
GET /{domain}/media/math/{hash}
does not provide an option to specify the format (SVG/PNG). Did I miss something?

Contributor Author

d00rman commented Sep 17, 2015

@physikerwelt I agree that what you have described would be the ideal usage. Originally I thought about supplying format = json to Mathoid regardless of the desired render, so that all of them are stored, and one can then subsequently simply retrieve the others. However, there are currently two main obstacles to that:

Field Extraction

The current spec-driven way of hooking up services in RESTBase does not allow us to extract different response fields. When supplied with format = json, Mathoid returns a JSON comprising all of the renders it is able to produce. Consequently, RESTBase would store that as a JSON blob. But, when a specific format is sought, we'd need to extract only that field. This should be solvable on the RESTBase end, though.

Headers

The bigger problem I see are the headers - they vary depending on the actual rendering format (which, of course, makes perfect sense). However, that also means that we would need to keep track of them in two places - Mathoid and RESTBase. I am really not a fan of that idea. Potentially, though, this could be solved if Mathoid were to supply RESTBase with the headers as well for format = json. A possible candidate response could be:

{
  "mml": "...",
  "mml-headers": {
    "content-type": "xml+mathml",
    // other possibly mml-specific headers
  },
  // etc ...
}

Current Work-around

In order to achieve the behaviour you described, you would need to make three separate requests: one for format = mml, the other for format = svg and the third one for format = png (which isn't enabled in production, by the way). Each response contains an x-resource-location header holding the hash to use when making a request to /media/math/{hash}.

Member

gwicke commented Sep 22, 2015

The current spec-driven way of hooking up services in RESTBase does not allow us to extract different response fields

This surprises me. The tests do access different body parts of the request while templating a new request, so I would think this should work for named responses as well.

The bigger problem I see are the headers

Previously, we addressed a very similar issue by creating a mime-like format for Parsoid pagebundles:

{
  svg: {
    headers: { .... },
    body: '<svg>...</svg>'
  },
  png: {
    headers: {...},
    body: '<base64-encoded-binary (yeah..)>'
  },
  mml: {
    headers: {..},
    body: '...'
  }
}

We could recycle this.

What would be the actual link to the images: GET /{domain}/media/math/{hash}

The idea would be that the hash is sufficient to construct a URL for each of the variants. Example:

  • /{domain}/media/math/svg/{hash}
  • /{domain}/media/math/mml/{hash}
  • /{domain}/media/math/png/{hash}

Constructing this URL shouldn't be harder than concatenating two strings, so the POST end point wouldn't necessarily need to return more than the hash.
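As a rough illustration of how little is needed on the client side, the per-format URLs could be built like this. The function name `variant_urls` and the default format list are made up for the example; only the path layout comes from the list above.

```python
def variant_urls(domain, resource_hash, formats=("svg", "mml", "png")):
    """Map each output format to its /{domain}/media/math/{format}/{hash} path."""
    return {
        fmt: "/{}/media/math/{}/{}".format(domain, fmt, resource_hash)
        for fmt in formats
    }


# Example: fallback image links for a render whose hash is already known.
links = variant_urls("en.wikipedia.org", "12345")
```

A caller that only has the hash from the POST response can therefore emit the SVG and PNG links without any further round trip.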

Returning the MathML content on POST could be doable as a special case. Alternatively, the extension could GET MathML, and POST the spec / re-request the MML only if that 404s.

@physikerwelt
Member

"Alternatively, the extension could GET MathML, and POST the spec / re-request the MML only if that 404s."
In the current draft, the extension does not know the hash. I think it should POST, and get the MathML and the hash (that was calculated by RESTBase). The extension would then link to the PNG and SVG images. Those links would only be fetched when the user's browser decides that it needs that information.

Contributor Author

d00rman commented Sep 30, 2015

In the current draft, the extension does not know the hash. I think it should POST, and get the MathML and the hash (that was calculated by RESTBase). The extension would then link to the PNG and SVG images. Those links would only be fetched when the user's browser decides that it needs that information.

I'll get moving on that probably tomorrow. I'll need to make some changes to Mathoid first, though (most notably, allow it to send back the headers as well for format = json).

So the sequence of events should end up being:

  • the extension sends a POST request with the q and type params (no format)
  • RESTBase asks Mathoid for the same thing, but adds format = json and stores the response
  • the extension gets back the MML in the body and the hash in the header, which can then later be used for subsequent requests to GET the desired format.

@physikerwelt does the extension depend on the exact format returned by Mathoid for format = json ? I'd need to change that (and deploy that ASAP). If it currently does, I'm thinking of adding an alternative endpoint to Mathoid which would mainly serve RESTBase's needs. Thoughts?

Contributor Author

d00rman commented Oct 1, 2015

I'll get moving on that probably tomorrow. I'll need to make some changes to Mathoid first, though (most notably, allow it to send back the headers as well for format = json).

I've added a new output format (complete) that implements this in Mathoid in Gerrit 242847.

d00rman force-pushed the mathoid/post-routes branch from 2b46c62 to 90bacf7 on October 1, 2015 12:07
Contributor Author

d00rman commented Oct 1, 2015

This surprises me. The tests do access different body parts of the request while templating a new request, so I would think this should work for named responses as well.

So, I tried to do this:

return:
  status: 200
  headers: '{$$.merge({$$.merge($.mathoid.headers, {"x-resource-location": $.request.headers.x-resource-location})},$.mathoid.body.{$.request.params.format}.headers)}'
  body: '{$.mathoid.body.{$.request.params.format}.body}'

... but I'm getting syntax errors from tassembly. For the headers line it's:

SyntaxError: Expected ":", [ \t\n] or [a-z0-9_$-]i but "." found.

at offset 12, column 13, while for the body line the error is:

SyntaxError: Expected [a-z_$]i but "{" found.

So it seems like extracting fields based on a named param is not going to fly :( Perhaps #347 might help there @gwicke ?

Member

gwicke commented Oct 2, 2015

@d00rman, try this syntax (untested):

return:
  status: 200
  headers: '{$$.merge($$.merge($.mathoid.headers,{"x-resource-location": $.request.headers.x-resource-location}),$.mathoid.body[$.request.params.format].headers)}'
  body: '{$.mathoid.body[$.request.params.format].body}'
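For readers unfamiliar with the template syntax, here is a plain-Python reading of what the return block above appears to compute. It is a sketch only: `build_response` and the dict shapes are invented for the example, and it assumes that later arguments to merge override earlier ones, which may not match the actual `$$.merge` semantics.

```python
def build_response(mathoid_res, request):
    """Pick the requested format out of Mathoid's response and merge the headers."""
    fmt = request["params"]["format"]
    # Equivalent of $.mathoid.body[$.request.params.format]
    part = mathoid_res["body"][fmt]

    headers = {}
    headers.update(mathoid_res["headers"])
    headers["x-resource-location"] = request["headers"]["x-resource-location"]
    # Assumption: format-specific headers win over the generic ones.
    headers.update(part["headers"])

    return {"status": 200, "headers": headers, "body": part["body"]}
```

The bracket lookup is what makes the format selectable at request time; the earlier attempt tried to splice a nested `{...}` expression into a dotted path, which the parser rejected.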

d00rman force-pushed the mathoid/post-routes branch from 90bacf7 to 6e052f3 on October 2, 2015 17:04
Contributor Author

d00rman commented Oct 2, 2015

@d00rman, try this syntax (untested):

This did the trick. Thanks a lot @gwicke !

Contributor Author

d00rman commented Oct 2, 2015

Soo, now the checks are failing because all of the pageviews tests from #350 are failing. WTH?

Contributor Author

d00rman commented Oct 5, 2015

This PR is now ready for review @wikimedia/services && @physikerwelt . I have changed the PR description to reflect the new way of interacting with the API (TL;DR just as @physikerwelt wanted it to be)

@physikerwelt
Member

@d00rman at first glance it looks reasonable to me. However, given the long and informative discussion in this GitHub issue, I think it would be good to add some more documentation.

Contributor

In #336 and #360 we've added a client-IP check for endpoints that save to the post_data module.

So, this endpoint should be protected with a header_match security stanza, but I'm not sure how (or if) it's going to work with it (who is POSTing here?)

Contributor Author

@Pchelolo the security stanza has been added to the POST public endpoint in spec/media/v1/mathoid.yaml

Marko Obrovac added 6 commits October 8, 2015 10:54
This commit adds two public routes:

- `POST /{domain}/media/math`
- `GET /{domain}/media/math/{hash}`

The first allows clients to perform a POST request to ask for a formula
to be rendered. It takes the same parameters as Mathoid, namely `q` (the
formula to be rendered), `type` (the format or type of the supplied
formula) and `format` (the desired output format). Only `q` is
mandatory, `type` defaults to `tex`, while `format` is presumed to be
`json` if not supplied. Internally, the request is saved using the
*post_data* module. The request hash is therefore calculated and is used
to determine if the given formula has already been rendered by Mathoid
in the sought output format. If so, the stored version is returned.
Otherwise, a request is issued to Mathoid and its result is stored and
returned. In either case, the response contains the `x-resource-location`
header which can be used to perform subsequent, low-latency requests to
obtain the same render via the second public endpoint.

Note that the endpoints and their storage are organised in such a way as
to be domain-independent. This commit introduces a new, global domain -
`window.wikimedia.org` - and all Mathoid requests are internally
remapped to it. In other words, if a render with the data hash `12345`
exists, it can be retrieved equally from all domains, regardless of the
initial request's domain.

Also, this commit slightly reorganises the configuration files.

Global domain's /sys/ spec mistakenly contained a second definition of
the pageviews module, which was causing pageviews tests to fail. This
commit removes it.
d00rman force-pushed the mathoid/post-routes branch from 35e608b to 64b96c0 on October 8, 2015 09:24
Contributor Author

d00rman commented Oct 8, 2015

@physikerwelt I have added a better description of the endpoints in the public spec documentation so that users know how to use it.

@physikerwelt
Member

👍

Contributor

Why do we need to add this module to the default-sys domain? It's used only by a method that exists only in the global-sys domain. Maybe it would be better to inline it there?

Contributor Author

Heh, we actually can't. Both of them are used by the routes loaded during testing from specs/test.yaml, so they have to stay in default-sys.

Contributor

But we have config.test.yaml for that?

Contributor Author

And have these slight differences between the two? I'm voting for sanity here.

Contributor

Works for me, it was just a little thing.

Contributor Author

To clarify, neither of these modules creates any resources on start-up, so the net result is the same.

Contributor

Pchelolo commented Oct 8, 2015

LGTM

d00rman pushed a commit that referenced this pull request Oct 9, 2015
Mathoid: Add public and back-end endpoints
d00rman merged commit 318d718 into wikimedia:master on Oct 9, 2015
d00rman deleted the mathoid/post-routes branch on October 9, 2015 10:31
physikerwelt pushed a commit to physikerwelt/mathoid-server that referenced this pull request Nov 10, 2015
This patch set adds the 'complete' output format, which is equal to the
'json' one except that it also includes the headers for individual types
in the response body. The output specification is identical,
except that the 'mml', 'svg' and 'png' fields are now objects containing
the 'body' and 'headers' fields.

Note: this format is needed by RESTBase for
wikimedia/restbase#339

Bug: T102030
Change-Id: I37d45cda1ceb5255a580b3a2f291268c1f270c53