fix non-ASCII header exception in Python 3 #2357

Closed

BennyThink

Hi there!

This pull request contains a fix for non-ASCII (emoji, CJK characters, etc.) HTTP headers in Python 3.
For example, consider the following code:

def get(self):
    file_name, file_content = 'hello😁.txt', 'how are you'
    self.set_header("Content-Type", "application/bin; charset='utf-8'")
    self.set_header("Content-Disposition", "attachment; filename=%s" % file_name)
    self.write(file_content)

The code above should make the browser download hello😁.txt instead of opening it.
However, it raises the following exception:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 45-46: ordinal not in range(256)
Setting a non-ASCII header in any field raises the same exception,
for example self.set_header("laugh😁", "emoji")

The error occurs at line 385 of http1connection.py:
lines.extend(l.encode('latin1') for l in header_lines)
Encoding an emoji or a CJK character as latin1 naturally raises an exception.
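
The failure can be reproduced outside Tornado with a short sketch. The header line and the `encode_header_line` helper below are hypothetical stand-ins for the quoted statement, not code taken from http1connection.py:

```python
# Hypothetical header line, mirroring the Content-Disposition example above.
header_line = "Content-Disposition: attachment; filename=hello😁.txt"


def encode_header_line(line):
    """Encode one header line the same way the quoted statement does."""
    return line.encode("latin1")


try:
    encode_header_line(header_line)
    failed = False
except UnicodeEncodeError as exc:
    # The emoji has no latin1 representation, so encoding is rejected.
    failed = True
    print("encode failed:", exc.reason)

# A header line made only of latin1 characters encodes without error.
ascii_ok = encode_header_line("Content-Type: text/plain")
```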

There are two possible solutions: either call url_escape(file_name) in application code, or fix the implementation in http1connection.py.
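
The first option might look roughly like the sketch below. It is only an illustration under some assumptions: it uses the standard library's `urllib.parse.quote` instead of Tornado's `url_escape` so that it runs on its own, the `content_disposition` helper is a hypothetical name, and the `filename*=UTF-8''...` form is the extended parameter syntax from RFC 6266 / RFC 5987, which the PR itself does not mention.

```python
from urllib.parse import quote


def content_disposition(file_name):
    """Build an ASCII-only Content-Disposition value (hypothetical helper)."""
    # Percent-encode the UTF-8 bytes of the name; safe="" also escapes "/".
    escaped = quote(file_name, safe="")
    # filename*=UTF-8''<escaped> is the extended parameter form (RFC 6266).
    return "attachment; filename*=UTF-8''" + escaped


value = content_disposition("hello😁.txt")

# The value now contains only ASCII, so the latin1 encoding step succeeds.
encoded = value.encode("latin1")
```

In a handler, the resulting value would be passed to `self.set_header("Content-Disposition", value)`.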

That said, according to RFC 7230 section 3.2.4:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

HTTP header values should be limited to ASCII.

However, this PR is probably only a temporary solution, since the Python 2 version simply sends the header raw.

Anyway, thanks for your precious time in reviewing this.

@bdarnell
Member

The RFC is clear here: headers must be sent in latin1, or, for specific headers that permit this, RFC 2047 encoding (the Content-Disposition header is one that permits RFC 2047 encoding). The Python 3 behavior is correct. If we change anything, it should be to make Python 2 reject these invalid headers in the same way Python 3 does (and maybe to add some helpers that make it easier to work with RFC 2047).
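
For reference, a helper of the kind suggested above could be sketched with the standard library's `email.header` module. The name `rfc2047_encode` is hypothetical and no such helper is part of this PR; whether a given client decodes this form in a particular header is not something the thread establishes.

```python
from email.header import Header, decode_header


def rfc2047_encode(text):
    """Encode text as an RFC 2047 encoded-word (hypothetical helper)."""
    return Header(text, "utf-8").encode()


encoded_name = rfc2047_encode("hello😁.txt")

# Raises UnicodeEncodeError if any non-ASCII character were left over.
encoded_name.encode("ascii")

# Round-trip through the decoder to confirm nothing was lost.
decoded = b"".join(part for part, _ in decode_header(encoded_name)).decode("utf-8")
```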

@BennyThink BennyThink closed this Apr 22, 2018
@kinow kinow mentioned this pull request Sep 6, 2018