Skip to content

Set up citoid proxy endpoint#699

Merged
d00rman merged 5 commits into
wikimedia:masterfrom
Pchelolo:citoid
Dec 2, 2016
Merged

Set up citoid proxy endpoint#699
d00rman merged 5 commits into
wikimedia:masterfrom
Pchelolo:citoid

Conversation

@Pchelolo
Copy link
Copy Markdown
Contributor

As the first step and the beginning of the discussion, set up a simple proxying Citoid endpoint in RESTBase.

Bug: https://phabricator.wikimedia.org/T108646
cc @wikimedia/services

@mvolz
Copy link
Copy Markdown
Collaborator

mvolz commented Oct 28, 2016

One possible hiccup that I'm not sure affects this change in a bug upstream in citoid master: bibtex is currently getting returned as a Buffer for some reason and not plain text. In tests, I get Buffer from bibtex requests. Requests behave fine with curl. The browser doesn't seem to handle it well. https://phabricator.wikimedia.org/T115271

Also, this seems to be missing a query param, or does this all go into /{query}/ somehow? There's an additional param called basefields, documented here: https://www.mediawiki.org/wiki/Citoid/API#Arguments

@Pchelolo
Copy link
Copy Markdown
Contributor Author

@mvolz Just tested the buffer issue, it seems to work fine. Also added the basefields parameter.

Also, fyi, in the current version of the patch nothing is stored in RESTBase, the requests are just proxied.

@gwicke
Copy link
Copy Markdown
Member

gwicke commented Oct 28, 2016

I would hold off on adding the baseFields parameter until we understand how it's used, and whether it could be folded into format. It is essentially requesting a different format after all.

@mvolz
Copy link
Copy Markdown
Collaborator

mvolz commented Oct 31, 2016

Yeah, actually we could just roll it into format; basefields is only a valid param for mediawiki format, so we could proxy mediawikibasefields into basefields = 1 and then just mediawiki to leave off basefields param.

@gwicke
Copy link
Copy Markdown
Member

gwicke commented Oct 31, 2016

Naming ideas for the basefields=1 format: mediawiki_extended, mediawiki_complete

@mvolz
Copy link
Copy Markdown
Collaborator

mvolz commented Oct 31, 2016

The addition of the base fields is meant to be temporary for backwards compatibility issues. In the future, it should be only base fields replacing the type specific fields. There's not really a completeness element to this.

Comment thread v1/citoid.yaml Outdated
method: get
uri: '{{options.host}}/api?format={format}&search={query}'
headers:
accept-language: '{{accept-language}}'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user doesn't provide the header, we should default to the assumed language based on the domain (en, fr, etc)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that's not super easy actually.. For simple stuff like en of fr that's trivial, but there're a whole bunch of exceptions, like simple or mediawiki.org.

So ideally we would want to fetch the wiki language from the MediaWiki SiteInfo api general.lang property - it won't be a huge overhead since all of the siteInfo requests are cached in memory, so we normally have all of the config locally. However, siteInfo fetch returns a promise, and we don't support promises in the functions we call from the template. So, I will need to add support for promises there before we can proceed with this feature.

Another option seem a bit hackier but way simpler to me - create a /sys endpoint in RESTBase that would expose the siteInfo by domain, and then fetch it in a step prior to the execution of the actual request and transclude the default from there.

I'm on the edge here - adding promise support is hard and it might have perfomance implications on the Template code which would affect how fast everything works, adding a /sys endpoint is quite hacky, but certainly easier.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the other option hackier? I think an endpoint like this might be useful for other services, too. I doubt it would need all of siteInfo. If we start with the minimal properties required that would be enough for a start. More could be added later if needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@berndsi That would be an internal RESTBase-only endpoint, we have no intension to expose it publicly or to other services (right now, we may change the opinion if that feels useful)

Why is that more hacky is because normally client would set the accept-language header, so the information obtained from that endpoint would be useful only on a small percentage of requests, but our current tech powering the request templates would require us to fetch it regardless of whether the accept-language was provided or not.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for exposing the site info via /sys. Agreed that it's a bit hacky, but it will certainly be useful and much easier than the alternative.

@d00rman
Copy link
Copy Markdown
Contributor

d00rman commented Nov 2, 2016

Naming ideas for the basefields=1 format: mediawiki_extended, mediawiki_complete

Perhaps mw-complete or mw-extended - shorter and nicer. In any case, +1 on putting basefields as a different format

@gwicke
Copy link
Copy Markdown
Member

gwicke commented Nov 2, 2016

Another option: mediawiki++. Just kidding.

I have to say that to an outsider, the purposes of all these very similar format options appear rather vague. The docs really need to explain why you would use format a, b, or c, and what the trade-offs are. If we struggle to find a strong reason for using one of those formats over the others, then that could be a strong hint for deprecating or even outright removing it. Less is often more.

@Pchelolo
Copy link
Copy Markdown
Contributor Author

Pchelolo commented Nov 2, 2016

The docs really need to explain why you would use format a, b, or c, and what the trade-offs are.

I don't have any experience with these formats, so some help from @mvolz on the docs side would be much appreciated.

@mvolz
Copy link
Copy Markdown
Collaborator

mvolz commented Nov 2, 2016

There is some documentation here:
https://www.mediawiki.org/wiki/Citoid/API#Requests

Zotero and bibtex are standard formats used by other people and software.
mediawiki, which as you can tell by its name, is designed specifically for
mediawiki, to be used with templateData. The VE team decided together (Me,
jdforrester, trevor et al. ) on a format that would work given the limits
of TemplateData. We would have just used the Zotero format directly except
it had structural properties which made using it with TD difficult.

Mediawiki uses the same field names as Zotero for convenience. Zotero has
"basefields" which are basically super field names for fields and are in
parentheses in the typemap list. For example, if you look at
https://aurimasv.github.io/z2csl/typeMap.xml#map-bookSection, the basefield
for 'bookTitle' is 'publicationTitle'. The basefields parameter was
intended to replace the field with the parenthesis field name where
present. However, currently for backwards compatibility, it merely includes
them as well without replacing them.

MWDeprecated was a very early format that we discarded once we made changes
to the TD extension. I do know some user gadgets were made using it and I
don't know if all of them switched over. We should definitely have a plan
to eliminate this entirely.

There is actually an even larger set of formats to export from would be
trivial for us to allow, see:
https://www.zotero.org/support/dev/data_formats.

On Wed, Nov 2, 2016 at 6:03 PM, Petr Pchelko notifications@github.com
wrote:

The docs really need to explain why you would use format a, b, or c, and
what the trade-offs are.

I don't have any experience with these formats, so some help from @mvolz
https://github.com/mvolz on the docs side would be much appreciated.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#699 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAVbQNA3kVwiPfsyyR4LG_phU7qYepiLks5q6NB1gaJpZM4Ki6ig
.

@Pchelolo
Copy link
Copy Markdown
Contributor Author

Pchelolo commented Nov 4, 2016

I've updated the PR trying to resolve all the comments. Because the mediawiki-baseields format is rewritten to the format=mediawiki&basefields=true we need a JS module for it. Later we can update Citoid to allow the mediawiki-basefields format, then we'd be able to get rid of the JS module.

@d00rman
Copy link
Copy Markdown
Contributor

d00rman commented Dec 2, 2016

Prod config patchset is here.

@d00rman d00rman merged commit d1aef9e into wikimedia:master Dec 2, 2016
@d00rman d00rman added the API label Dec 2, 2016
Comment thread v1/citoid.yaml
get:
tags:
- Citation
summary: Get citation data given an article identifyer.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identifier is misspelled :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants