Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(#908) Update BINARY_SUPPORT to use Content-Encoding to identify if data is binary #1155

Merged
merged 22 commits into from
Nov 11, 2022

Conversation

monkut
Copy link
Collaborator

@monkut monkut commented Jul 28, 2022

migrate #971 to lastest master

Contribution by @Quidge

Description

This PR re-implements the same functional change that was shown in a PR from the pre-fork Miserlou/zappa repository.

The change in that PR didn't make it into the Zappa 52 release from what I can tell. My company has been using that PR's code (in combination with the flask_compress library) for a couple weeks without issue so I'm opening this PR that copies that code.

Also, when using whitenoise for caching, which provides compression, binary types may include mimetypes, "text/", "application/json":

response.mimetype.startswith("text/")
response.mimetype == "application/json"
Assuming that Content-Encoding will be set (as whitenoise apparently does) this allows compression to be applied to assets which have mimetypes "text/" and "application/json", while allowing for uncompressed "text/" and "application/json" still to be served.

About Content-Encoding:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding

This PR adds a new settings, additional_text_mimetypes, to allow the user to define types they want to be considered as text when BINARY_SUPPORT is True.

GitHub Issues

#908

@monkut
Copy link
Collaborator Author

monkut commented Jul 28, 2022

Migrating conversions from previous PR

zeevt commented on 9 Dec 2021

CSV (and plain text) can be in any character encoding, it is not guaranteed that it's ASCII or UTF-8.
If just .decode() works, great.
If there is a header like Content-Type: text/csv; charset=windows-1252 (there usually isn't) then it can be decoded to str without relying on default charset.
But if neither of those is true, then you have bytes that should be base64 encoded.

javulticat commented on 9 Dec 2021

.zeevt that's my understanding as well, so I believe it would be fair to say that this is a potential concern for any text file, correct? I was confused because .Quidge's message implied this was a concern specific to CSVs (which I definitely may have just misinterpreted - sorry!)
If so, would a solution be to sniff the encoding of plain text and use that to decode it using the proper encoding (and, if that fails, fall back on using base64-encoded bytes)? Something like:

import chardet

# If plain text
try:
    encoding = chardet.detect(response.data)["encoding"]
    decoded = response.data.decode(encoding)
except Exception:
    # base64 encode the bytes

zeevt commented on 9 Dec 2021

javulticat You understood me correctly, yes. I have two concerns with this, now that I thought about it a tiny bit more:
Maybe someone is trying to return an http response with "content-type: text/foo" and with a body encoded with whatever encoding on purpose, and the proposed code would decode it and re-encode it in UTF-8 if chardet guesses correctly. This would be a bug. They should be able to ensure the exact bytes they intended are received by the HTTP client, without forcing application/octet-stream content type.
How fast is chardet? I would hesitate running it on every response without measuring first.
I now think it's only safe to decode bytes to str if it actually decodes as UTF-8.

@coveralls
Copy link

coveralls commented Jul 28, 2022

Coverage Status

Coverage increased (+0.2%) to 74.303% when pulling 15532f1 on feature/issue-908-update-binarysupport-handling into 34e3065 on master.

@monkut
Copy link
Collaborator Author

monkut commented Jul 28, 2022

It appears that the CSV related discussion is not directly related to BINARY_SUPPORT = True, but relates to the line below where text is decoded using the werkzug Response object's get_data(as_text=True) method.

zappa_returndict["body"] = response.get_data(as_text=True)

If that's correct, I think text decode is a separate issue from what the BINARY_SUPPORT feature is trying to do in this PR.

@monkut monkut marked this pull request as ready for review August 5, 2022 10:00
♻️ define "DEFAULT_TEXT_MIMETYPES" and move to utilities.py
@monkut
Copy link
Collaborator Author

monkut commented Sep 8, 2022

@javulticat If you have some time, can you take a look at this, I tried to update it and and cleanup the implementation.

@monkut monkut mentioned this pull request Oct 20, 2022
@monkut
Copy link
Collaborator Author

monkut commented Oct 25, 2022

(currently running this branch in a personal project of mine to confirm stability)

🎨 change commented lines to docstring for test app
Copy link
Collaborator

@souravjamwal77 souravjamwal77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests ran fine on my local and on CI pipelines also. So, approved.

@monkut monkut merged commit bad24dd into master Nov 11, 2022
@Quidge
Copy link

Quidge commented Nov 11, 2022

Thank you maintainers for running with this for as long as you have. 🙏

Ian288 pushed a commit to tackle-io/Zappa that referenced this pull request Jul 11, 2023
… if data is binary (zappa#1155)

* 🔧 migrate zappa#971 to lastest master

* 🎨 run black/isort

* ♻️ refactor to allow for other binary ignore types based on mimetype. (currently openapi schema can't be passed as text.

* 🎨 run black/fix flake8

* 🔧 add EXCEPTION_HANDLER setting

* 🐛 fix zappa_returndict["body"] assignment

* 📝 add temp debug info

* 🔥 delete unnecessary print statements

* ♻️ Update comments and minor refactor for clarity

* ♻️ refactor for ease of testing and clarity

* 🎨 fix flake8

* ✨ add `additional_text_mimetypes` setting
✅ add testcases for additional_text_mimetypes handling

* 🔧 Expand default text mimetypes mentioned in zappa#1023
♻️ define "DEFAULT_TEXT_MIMETYPES" and move to utilities.py

* 🎨 run black/isort

* 🎨 run black/isort

* 🎨 remove unnecesasry comment (black now reformats code)
🎨 change commented lines to docstring for test app
BarNehemia added a commit to Lightricks/Zappa that referenced this pull request Aug 10, 2023
…zappa-0.57.0

* commit '0b1eab14ca39c3a3bfb4e915347e07495171dcba': (27 commits)
  updating workflow actions to remove deprecation warnings (zappa#1243)
  📝 CHANGELOG.md update for 0.57.0 (zappa#1246)
  fixing compatibility with Django 4.2 (zappa#1237)
  Update Readme with patreon and donors (zappa#1234)
  adding support for Python 3.10 (zappa#1231)
  Nose to pytest (zappa#1239)
  lint: updating code style with `make black` (zappa#1238)
  Alternative way to check if running in Docker (zappa#1204)
  📝 CHANGELOG.md update for 0.56.0 release (zappa#1187)
  Improve `get_best_match_zone()` to match by most matched components instead of by length of domain (zappa#1193)
  Bypass python version runtime check when code run in docker (zappa#1180)
  Be able to pass in a batch window when using batch size (zappa#1118)
  Correction to README. (zappa#1177)
  (zappa#908) Update BINARY_SUPPORT to use Content-Encoding to identify if data is binary (zappa#1155)
  Remove special check for Django<1.7, fix for zappa#1158  (zappa#1159)
  Resolve (zappa#410) Logs are missing query strings (zappa#1165)
  Handle spaces in x-forwared-for/remove six (zappa#1127) (zappa#1163)
  add feature (zappa#704) Check if args/kwargs are JSON Serializable while running locally (zappa#1154)
  docs: Add documentation for s3 event object key_filters (zappa#1169)
  Add feature pyenv virtualenv detecting .python-version file (zappa#1153)
  ...

# Conflicts:
#	zappa/cli.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Migrated] Update BINARY_SUPPORT to use Content-Encoding to identify if data is binary
6 participants