Refactor document serve view to be agnostic of file storage backends #1420
Comments
This could be done as follows:
|
For 3. Would it be possible to make use of a new |
Sure, that makes sense. (I thought about suggesting modified_date as another field on the documents table, but I dare say the Etag approach is more robust...) |
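For illustration, the ETag approach mentioned here could look roughly like the sketch below. This is not Wagtail code: deriving the ETag from a SHA-256 hash of the file content is an assumption, and `compute_etag` and `is_not_modified` are hypothetical names.

```python
import hashlib


def compute_etag(content: bytes) -> str:
    """Return a quoted ETag value derived from the file content (assumed scheme)."""
    return '"%s"' % hashlib.sha256(content).hexdigest()


def is_not_modified(if_none_match, etag):
    """Return True when the If-None-Match header matches the current ETag.

    Handles the "*" wildcard and comma-separated lists, and ignores the
    weak-validator prefix (W/), since weak comparison is allowed for GET.
    """
    if not if_none_match:
        return False
    candidates = [c.strip() for c in if_none_match.split(",")]
    if "*" in candidates:
        return True
    normalised = [c[2:] if c.startswith("W/") else c for c in candidates]
    return etag in normalised
```

Because the hash depends only on the content, a check like this would also notice files changed directly on the filestore, which a modified_date column in the database would miss.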
Hi all - has there been any progress on setting content-disposition to display PDFs in the browser instead of downloading them? I saw the |
I spent some time looking over this yesterday. A few thoughts/questions:
|
Assigning to @robmoorman to review |
OK - so basically, the distinction between
Yep, the method on the model seems like the right place to put the logic for returning one URL or the other depending on how it's configured.
You mean being able to switch storage backends after documents have already been uploaded, without breaking the existing ones? I don't think that's a requirement - I'd say there's no expectation for us to provide any more 'magic' than Django's storage API gives us already.
Afraid I can't quite follow what you're proposing here - is this the suggested behaviour for
I'd be happy with that. Perhaps we can also have a setting like |
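To make the "method on the model" idea above concrete, here is a rough sketch of how the URL choice might depend on the configured serve method. It is plain Python rather than actual Wagtail code. Only 'serve_view' is named in this thread; the value names "direct" and "redirect", and the helper names `link_url` and `serve_action`, are assumptions made for illustration. `get_file_url` stands in for something like document.file.url, which may raise NotImplementedError on backends that have no public URL.

```python
def link_url(serve_method, serve_view_url, get_file_url):
    """Return the URL to use when linking to a document (e.g. in rich text)."""
    if serve_method == "direct":
        try:
            return get_file_url()
        except NotImplementedError:
            # Backend has no public URL: fall back to the serve view.
            return serve_view_url
    # "redirect" and "serve_view" both link to the serve view, so the
    # document_served signal and any access checks still run.
    return serve_view_url


def serve_action(serve_method, get_file_url):
    """Return what the serve view itself should do for a request."""
    if serve_method == "redirect":
        try:
            return ("redirect", get_file_url())
        except NotImplementedError:
            # Assumption: degrade to streaming rather than fail outright.
            pass
    return ("stream", None)
```

Keeping both decisions in small functions like these would let templates and rich text call one place for the link, while the view stays responsible for signals and access checks.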
@gasman to provide a custom
causes this migration to be created:
Even though I'm not keen on the idea of making our own storage class wrapper and would rather split this (supporting different storage backends for images and files) off into a separate issue if possible. This also feels like something that should be solved in Django rather than in Wagtail. What do you think? |
As another thought, I believe providing a custom storage is already possible via the |
For posterity: I chatted with @gasman on Slack and we agreed it makes sense to defer implementing any ability to customize the file storage backend (beyond what Django gives you) at this time. |
Today I had to override the document model only to change the storage backend. The problem is documents were uploaded to the Changing the configuration so sendfile serves from If |
@jdavid The problem is that changing |
Maybe (draft):
There would be a migration in Wagtail, but then projects could change It doesn't solve the problem to use something else than
|
…/document/doc_id/filename. This is a workaround for a wart in wagtail that insists on serving documents as attachments. It is a mild compromise in that the purported built-in wagtail features of counting document references and controlling access to files are circumvented. However, by design, we don't want different access controls to files for different viewers of the IETF website, and we are looking at different ways to gather information like references. See wagtail/wagtail#4359, wagtail/wagtail#1158, and the knot at wagtail/wagtail#1420.
I don't see any mention of a "manual override" for Content-Type, where site owners can specify a mapping that bypasses the mimetype guessing, like:
|
@Pomax If we go with the approach described in #1158 (comment) (which is probably a better place to continue this discussion, as the HTTP headers are a somewhat separate detail from serving methods), it'll be possible to override the default I wouldn't be against adding a mapping to the settings (similar to the proposed |
Mostly to ensure people stay in control of what can happen: some folks will want to force specific extensions to "inert" form so that browsers never try to interpret them (or quite the opposite, making it active content), where others may want to serve a content type that And having that custom document model would be valuable, of course, but I would imagine that most people would make use of a settings.py list if they had that option (not having to create custom files but setting a list means a lower bug and maintenance surface, which is always a big plus). Happy to "redo" this comment in #1158 if that helps move it over to the right place! |
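As a sketch of what a settings-based override might look like: the mapping below is consulted first, and mimetypes guessing is only used as a fallback. The name `CONTENT_TYPE_OVERRIDES`, its contents, and the helper `content_type_for` are all hypothetical; no such Wagtail setting is implied.

```python
import mimetypes
import os

# Hypothetical settings-style mapping: file extension -> forced Content-Type.
CONTENT_TYPE_OVERRIDES = {
    ".svg": "text/plain",    # force an "inert" type so browsers don't interpret it
    ".md": "text/markdown",  # serve a type that guessing may not know about
}


def content_type_for(filename, overrides=CONTENT_TYPE_OVERRIDES):
    """Return the Content-Type for a filename, preferring explicit overrides."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in overrides:
        return overrides[ext]
    guessed, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when nothing can be guessed.
    return guessed or "application/octet-stream"
```

A flat mapping like this keeps the override in one place in settings.py, which is the lower-maintenance option argued for above; a custom document model would still be the route for anything more dynamic.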
The main part of this (the WAGTAILDOCS_SERVE_METHOD setting) is now completed in #5296, and #1158 has been opened to cover the content-type / content-disposition headers. The remaining points mentioned here are the If-Modified-Since, Content-Length and Etag headers, and I don't think those are important enough to justify keeping this open (PRs still welcome of course...) - so I'm calling this complete. |
For Wagtail 1.0, the document serving view was rewritten, based around the django-sendfile API, to provide various enhancements: support for django-sendfile backends, serving the correct MIME type, and support for If-Modified-Since headers. However, the django-sendfile API assumes the use of a local storage backend (i.e. one that implements the path operation), and so we had to implement a fallback (#1417) to restore support for remote storage backends such as boto-s3. As with previous versions of Wagtail, this fallback code involves reading the file content through the open operation and serving it through Django, which is particularly inefficient for remote storage.

I propose the following new approach for document serving:

1. Add a WAGTAILDOCS_FILE_STORAGE setting, which by default is the same as DEFAULT_FILE_STORAGE, and use that as the storage backend for the file field of the Document model. This makes it possible to set a storage backend for documents independently of other media (e.g. images).

2. Introduce a new configuration setting (proposed name WAGTAILDOCS_SERVE_METHOD) to specify the overall top-level behaviour for document serving. This will be one of the following values (names are provisional, and not all will necessarily be implemented in the first instance):

   - […] document.file.url (if the backend implements it; if not, we fall back on the URL of the 'serve' view as usual) whenever we want to refer to the document URL, e.g. as the result of expanding a <a linktype="document"> link in rich text. This means that we end up bypassing the document_served signal, and any other application logic that might exist in the serve view (download stats tracking, access restrictions) - this is most appropriate for fully static sites (django-medusa / bakery).
   - […] document.file.url (assuming the backend implements it). This is probably the best option for remote storage backends; it means that the document_served signal will still be fired for people following the link, but it's still possible to bypass it if they know the direct URL (which presents a loophole if access restrictions are in use).
   - […] open operation (i.e. the current Wagtail 1.0 behaviour). This is also the only option which doesn't require the backend to expose a url property; ideally, the site implementer should ensure that the file isn't available through a direct URL, to prevent bypassing the serve view.

   (Open question: Given that the 'sensible default' for this setting will vary depending on the backend in use, is there a good way of selecting a suitable default value at startup, in the case that WAGTAILDOCS_FILE_STORAGE / DEFAULT_FILE_STORAGE is specified but WAGTAILDOCS_SERVE_METHOD isn't?)

For all methods except 'serve_view', we're now done. For 'serve_view', we continue as follows (note that wherever possible, we're using the Django storages API rather than using direct file access, and catching NotImplementedError to account for varying capabilities - this way, all storage backends are following the same code path for the most part):

3. Handling If-Modified-Since: If the backend implements the modified_time property, check this and return a 'not modified' response if applicable. (Aside: if we added an updated_at field to Document, we could do this without relying on the storage backend - provided we don't care about modifications that are done directly on the filestore, bypassing the database)

4. Set a Content-Type header using mimetypes.guess_type(filename), as our sendfile logic currently does. (We could potentially sniff file content too, either for local storages only, or for all backends if we did it at upload time and stored it in the database - but that's something for another ticket...)

5. If the backend implements the size property, set the Content-Length header. (TODO: confirm that this is indeed optional, as per the discussion on #1417 (diff))

6. If the backend implements the path property, and a SENDFILE_BACKEND is specified, hand off to django-sendfile. (Since sendfile is now optional and only activated when this setting is present, this possibly negates the need to bundle our own sendfile in wagtail.utils. On the other hand, if django-sendfile duplicates most of the header handling logic above, we might be better off with our own stripped-down implementation...)

7. Otherwise, serve the file via StreamingHttpResponse.

TODO: decide how to incorporate #733 (setting content-disposition to display PDFs in the browser) into this. Special case for PDF, or make it configurable?
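The 'serve_view' steps above can be sketched in plain Python as follows. This is a rough model of the decision flow, not the proposed implementation: `plan_response` is a hypothetical name, the storage argument is any object exposing modified_time(name), size(name) and path(name) methods as stand-ins for the backend capabilities named above, and the actual hand-off to django-sendfile or StreamingHttpResponse is reduced to a label.

```python
import mimetypes
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def plan_response(storage, name, if_modified_since=None, sendfile_backend=None):
    """Return (status, headers, delivery) for a document request.

    Each capability is probed by calling it and catching NotImplementedError,
    so local and remote backends follow the same code path.
    """
    headers = {}

    # If-Modified-Since: only possible when the backend reports a timestamp.
    try:
        mtime = int(storage.modified_time(name))  # seconds since the epoch
    except NotImplementedError:
        mtime = None
    if mtime is not None:
        headers["Last-Modified"] = formatdate(mtime, usegmt=True)
        if if_modified_since:
            try:
                since = parsedate_to_datetime(if_modified_since)
                if since.tzinfo is None:
                    since = since.replace(tzinfo=timezone.utc)
                if mtime <= since.timestamp():
                    return 304, headers, None
            except (TypeError, ValueError):
                pass  # unparseable header: ignore it and serve normally

    # Content-Type from the filename, with a generic fallback.
    guessed, _encoding = mimetypes.guess_type(name)
    headers["Content-Type"] = guessed or "application/octet-stream"

    # Content-Length is optional: skip it if the backend cannot report a size.
    try:
        headers["Content-Length"] = str(storage.size(name))
    except NotImplementedError:
        pass

    # Hand off to sendfile only for local paths with a configured backend.
    if sendfile_backend:
        try:
            storage.path(name)
            return 200, headers, "sendfile"
        except NotImplementedError:
            pass

    # Otherwise, stream the file content through Django.
    return 200, headers, "stream"
```

With a backend that implements everything and a sendfile backend configured, the result is a 200 with delivery "sendfile"; a remote-style backend that raises NotImplementedError for path and modified_time gets no Last-Modified header and falls through to "stream".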