Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect type from file content #73

Merged
merged 3 commits into from
Nov 28, 2018
Merged

Detect type from file content #73

merged 3 commits into from
Nov 28, 2018

Conversation

justisb
Copy link
Contributor

@justisb justisb commented Nov 28, 2018

Currently the content type/file extension is extracted from the data url provided by the user, rather than determined by the system based on the actual decoded content. This is not ideal since the user could maliciously provide the wrong type, or because the user's system may unwittingly provide an incorrect or generic type.

This PR adds an additional library to determine the actual MIME type and extension from the decoded content, and only falls back to the user-provided value if it is unable to be determined.

A specific example- using FileReader.readAsDataURL client-side on an OTF file on a Mac returns a generic application/octet-stream type, rather than a font type. So then when the data url string is passed to an uploader with a content_type_whitelist only allowing font types, it would incorrectly deny the file due to the invalid user-provided content type value. This problem may be part of issue #40.

Even in the gem's own specs, the adapter_spec and base64_string_io_spec tests were using a PNG image incorrectly labeled as a JPEG image, as well as several other incorrect fixture types. I updated the specs to use a small actual file for each previously tested file type, which are now correctly determined by the content.

Also included is a fix to prefer the last-added MIME::Type, which should be the user's customized type, if one has been provided. I added a MIME::Type override to the test setup to simulate a user changing a preferred file extension, so that is properly tested now as well.

Current workarounds for the main issue include skipping CarrierWave's validations and using another file validator to determine the type after the file is parsed by the uploader, but preferably it would be handled inside the gem so that CarrierWave's built-in whitelists can be used.

@y9v
Copy link
Owner

y9v commented Nov 28, 2018

@justisb thank you for this great pull request, I will release a new gem version tomorrow

@y9v y9v merged commit 918abf5 into y9v:master Nov 28, 2018
@y9v
Copy link
Owner

y9v commented Nov 29, 2018

Released version 2.8.0

@AlexandreK38
Copy link

Hi!
I updated today from 2.7 to 2.8 but I have an issue due to this pull request.
I am sending a custom mime type, let says "application/xyz", (with .xzy file extension) which is basically a zip file and register this type as explained in the doc.
Now, as the mime type is recognized and not read from the bytes, Carrierwave recognizes it directly as "application/zip" and no more my custom content, and as a consequence the content_type_whitelist doesn't pass (requiring "application/xyz").
Is there a way to still achieve that?
Thanks

@justisb
Copy link
Contributor Author

justisb commented Dec 24, 2018

You don't want to rely on the content type provided by the user, or even trust their file extension, as they could maliciously set the wrong type. I would probably take a step back and see if it's necessary that you need to implement a custom MIME type.

That said, you could add your own detection to MimeMagic (which is now used for file type detection) via their docs (https://www.rubydoc.info/github/minad/mimemagic/MimeMagic#add-class_method).

This gem could potentially expose its own "add" method that would add the custom types to both the mime-types and mimemagic dependencies, if someone has time to put together that PR.

@AlexandreK38
Copy link

Thanks for the answer!
To explain a bit the context, the content type sent by the user is actually from an iOS client, created by myself too.
As it's a file that is public and I didn't want to expose that it is a zip archive, I used my own content type.
Thanks to you, I managed to use MimeMagic.add with "magic" depending on my archive files, I think it was the best solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants