-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added PDF endpoint #696
Added PDF endpoint #696
Conversation
Create a PR for swagger-ui to show the download link nicer |
method: get | ||
uri: 'https://pdf-electron.wmflabs.org/pdf?accessKey=secret&url=https://{{domain}}/wiki/{title}' | ||
- return_response: | ||
return: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for a new step, the return
block can be put directly in the previous block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, you say it's done, but I still see this as being a two-step procedure - one to fetch the pdf, the other to return it.
- get_pdf_from_backend: | ||
request: | ||
method: get | ||
uri: 'https://pdf-electron.wmflabs.org/pdf?accessKey=secret&url=https://{{domain}}/wiki/{title}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be better to contact the instance present in BetaCluster (deployment-pdfrender.deployment-prep.eqiad.wmflabs
) since that one will match the one that will be in production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it's not exposed publicly right now? Should we assing a domain to it, something like pdfrender-beta.wmflabs.org ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, scratch my earlier comment. We need to parametrise two things here:
- the service's URI (much like we do with Parsoid and all the others)
- the
accessKey
query attribute, as it changes from one environment to the other; it also needs to beencodeURIComponent
-ed since in production it contains unsafe characters
description: The PDF render of an article | ||
schema: | ||
type: file | ||
# TODO: Add an etag somehow. How??? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get the latest rev ID from storage, can't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now in the revision-table-access-check-filter we indeed go to the revision table, so we know the latest rev id. However, when we switch to the restriction-table-access-check-filter, we wouldn't know the latest revision any more. Obtaining it would require one more round-trip to Cassandra, which is not ideal. It's still not a huge deal since the PDF generation is very slow and a round-trip to cassandra could be done in parallel.
Other approach is to make MW emit an ETag (or just some custom header with a revision number) and make the service pass it through.
Yet another approach is to apply the new printable styles to Parsoid HTML and render the PDF based on Parsoid HTML and preserve the etag, but that's actually not a solution since Parsoid HTML doesn't have the table of contents and in general the output PDF is much worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that doing another round trip to Cassandra is not such a big penalty, given that we don't expect much traffic here and the amount of time it takes to actually render a page in pdf.
Minor nits and comments, otherwise LGTM overall, but we probably don't want to deploy this before the service hits production. |
Ye, for sure don't deploy yet, wait until it's deployed in prod. I've created a PR to avoid loosing the code Also, I wanna continue the discussion about adding an optional? |
@@ -32,6 +32,8 @@ default_project: &default_project | |||
cache_control: s-maxage=86400, max-age=86400 | |||
# 10 days Varnish caching, one day client-side | |||
purged_cache_control: s-maxage=864000, max-age=86400 | |||
# Cache PDF for 5 minutes since it's not purged | |||
pdf_cache_control: s-maxage=300, max-age=300 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be 10 minutes probably.
The service can accept a window height and width, so this would be possible. The defaults in prod are set to 1024x768, so yeah, we could discuss both the default options and the need for |
LGTM in general, modulo minor comments (in-lined). |
@d00rman Updated to address your comments. Would require a puppet patch to set up the options before deployment. Additionally I've upgraded our swagger-ui fork with this PR: swagger-api/swagger-ui#2465 so it should whow the correct link now and properly download the file. |
pdf: | ||
# Cache PDF for 5 minutes since it's not purged | ||
cache_control: s-maxage=600, max-age=600 | ||
uri: https://pdf-electron.wmflabs.org/pdf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have kind of gotten into the habit of putting only the protocol://host:port
part in the parameters, so let's keep the consistency with the other config options. Additionally, if we ever decide to use the other formats supported by the service (png, jpg), we won't need to change the options in the config.
Puppet config change patch is here |
@d00rman Done! |
👍 |
I've already almost lost this code once in my local branches, creating a PR to avoid loosing it again.
We have 2 issues:
cc @wikimedia/services