Encode Non-ASCII HTTP Headers #45

Merged Oct 9, 2018 (1 commit)

@ikreymer (Member) commented Oct 6, 2018

To address #38, this adds %-encoding of any HTTP header that contains non-ASCII characters.
(Latin-1/ISO-8859-1 headers are technically allowed, but to be extra safe all non-ASCII headers are encoded.)

A header such as:

Content-Disposition: attachment; filename="Lancement du Système d’Échange Local (SEL).pdf"

will be converted to:

Content-Disposition: attachment; filename*=UTF-8''Lancement%20du%20Syst%C3%A8me%20d%E2%80%99%C3%89change%20Local%20%28SEL%29.pdf

before being written to the WARC (per https://tools.ietf.org/html/rfc8187#section-3.2.3).

Adds StatusAndHeaders.to_ascii_bytes() to perform this conversion where necessary.
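
For readers who want to see the idea in isolation, here is a minimal Python 3 sketch of this kind of conversion. It is not the code from this PR: the function name `encode_header_value`, the regex, and the fallback for values without a quoted parameter are all assumptions made for illustration. Only `urllib.parse.quote` and `re` from the standard library are used.

```python
import re
from urllib.parse import quote

# Hypothetical helper for illustration only; not warcio's implementation.
# Matches quoted parameters such as: filename="some value"
_QUOTED_PARAM = re.compile(r'([\w-]+)="([^"]*)"')


def _is_ascii(text):
    """Return True if text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def encode_header_value(value):
    """Return an ASCII-only form of a header value.

    ASCII values pass through unchanged. Quoted parameters containing
    non-ASCII characters are rewritten as name*=UTF-8''<percent-encoded>,
    following the extended parameter syntax of RFC 8187.
    """
    if _is_ascii(value):
        return value

    def to_ext_value(match):
        name, param = match.group(1), match.group(2)
        if _is_ascii(param):
            return match.group(0)  # leave ASCII parameters untouched
        # safe="" so that spaces, parentheses, etc. are percent-encoded too
        return "{}*=UTF-8''{}".format(name, quote(param, safe=""))

    encoded = _QUOTED_PARAM.sub(to_ext_value, value)

    # Assumed fallback: percent-encode any non-ASCII runs that remain
    # (for example, a value with no quoted parameter at all).
    return re.sub(r"[^\x00-\x7f]+", lambda m: quote(m.group(0)), encoded)


print(encode_header_value(
    'attachment; filename="Lancement du Système d’Échange Local (SEL).pdf"'
))
```

With the example header from this description, the sketch prints the same `filename*=UTF-8''...` value shown above. How `StatusAndHeaders.to_ascii_bytes()` handles other cases may differ from the fallback assumed here.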

…tempt to

%-encode any non-ascii headers as utf-8 as specified in RFC 5987, RFC 8187
Add StatusAndHeaders.to_ascii_bytes() which ensures an ascii only encoding of the headers
Addresses #38
@ikreymer ikreymer requested a review from N0taN3rd October 6, 2018 16:28
@coveralls
Coverage increased (+0.003%) to 99.844% when pulling eca8182 on percent_encode_non_ascii_headers into 3f6e2d7 on develop.

@ikreymer ikreymer merged commit 29204d0 into develop Oct 9, 2018
@ikreymer ikreymer deleted the percent_encode_non_ascii_headers branch October 9, 2018 20:18