Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Don't trust content_type of Http::UploadedFile #869

Closed
wants to merge 7 commits into
from

Conversation

Projects
None yet
4 participants

rfc2822 commented May 12, 2012

Unfortunately, browsers often sent a wrong Content-Type for uploaded files because the file type is not registered on the client system or for other reasons. From the application provider's sight, this information should be only used as fallback when the determination via the file command (like for non-uploaded files) fails. I adapted the UploadedFileAdapater to do it like the FileAdapter, but use the Content-Type as fallback. Tests included.

(I hope this pull request is OK. It should be only +37 -3 lines, but it shows 6 commits because I had to get my fork up-to-date from upstream.)

Please tell me if you can't use this pull request for some reasons and how I can change it, because I'd rather like to use the upstream paperclip in my application than my custom one.

I have also updated the documentation to point out that ATTACHMENT_updated_at is required for hashing (at least, when :updated_at is used, which it is by default).

Thanks for paperclip.

rfc2822 commented May 12, 2012

test/fixtures/unknown.type contains random data.

Contributor

sikachu commented May 12, 2012

Hmm ... not sure if this is the right approach we should take. See on #858 that user actually want we to honor the content type from the browser instead.

@jyurek what do you think?

rfc2822 commented May 12, 2012

In my case, I have an attachment that should be validated to be an .xcfbz2 (compressed GIMP graphics) file. Mozilla/Linux sends application/octet-stream although the file is either application/bzip2 or image/x-compressed-xcf, as you like. Other browers may send other values (I especially noticed Internet Explorer to send various wrong Content-Type values). So, I suggest that the file type is checked on server-side. Personally, I like the file approach more than MIME::Types, but I have taken the way you chose for FileAdapter and adapted it for UploadedFileAdapter.

Contributor

phene commented May 14, 2012

I'm +1 for not trusting uploaded file's content_type. Because of this, I have built in my own override of the content_type checker/override.

One compromise we can make, though, is by creating an API for defining where the content type comes from, and which order we check our content type matchers. I, for one, really abhor the MIME::Types collection and shelling out to the file command. For this reason, I override all of the content_type matchers with my own.

Instead, it would be nice to define an abstract ContentTypeMatcher, and specify, at the paperclip level, the order in which they are checked for a match. For example, we could have an option to specify the ordering of:

ProvidedMatcher
MimeCollectionMatcher
FileCommandMatcher

and I could define my own FileMagicMatcher, using the native file magic gem, and place it in front of the others.

So I don't support this particular implementation, but I would like to see (and will likely submit) a refactor that supports this sort of content type negotiation.

As a side note, I wish paperclip made it easier to replace various components that shell out commands with components that use native bindings. I've had to patch Geometry.from_file and create my own thumbnail processing just to use RMagick.

rfc2822 commented May 15, 2012

+1 for phene's suggestion. In the meanwhile, I suggest using my patch as a temporary solution.

Owner

mike-burns commented May 17, 2012

@phene See https://github.com/jpmcgrath/paperclip-thumbnailer for a better Paperclip thumbnailing abstraction.

rfc2822 commented Jun 5, 2012

Any news on this? Shall I cancel the pull request and open another one for the documentation issue?

Owner

mike-burns commented Jun 6, 2012

I don't take pull requests with temporary solutions.

We are having a conversation internally on MIME types and their use inside paperclip. I see it as two situations: a) the processor needs to know something about the file, and b) the HTTP headers need to know something about the file. Currently we use the same data for both, but perhaps this does not make sense.

Please weigh in with any thoughts.

Thanks,
-Mike

@rfc2822 rfc2822 closed this Jun 6, 2012

rfc2822 commented Oct 30, 2012

I have made a new pull request #1065 that introduces an option that allows to detect the MIME type by the server.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment