Fixes #575 by improved documentation. (#593)
sallner committed May 9, 2019
1 parent 7130902 commit ef0ef3b
Showing 1 changed file with 22 additions and 5 deletions.
src/ZPublisher/WSGIPublisher.py: 27 changes (22 additions, 5 deletions)
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 ##############################################################################
 #
 # Copyright (c) 2002 Zope Foundation and Contributors.
@@ -283,12 +284,28 @@ def publish_module(environ, start_response,
 
     path_info = environ.get('PATH_INFO')
     if path_info and PY3:
-        # The WSGI server automatically treats the PATH_INFO as latin-1 encoded
-        # bytestrings. Typically this is a false assumption as the browser
-        # delivers utf-8 encoded PATH_INFO. We, therefore, need to encode it
-        # again with latin-1 to get a utf-8 encoded bytestring.
+        # BIG Comment, see discussion at
+        # https://github.com/zopefoundation/Zope/issues/575
+        #
+        # The WSGI server automatically treats headers, including the
+        # PATH_INFO, as latin-1 encoded bytestrings, according to PEP-3333. As
+        # this causes headache I try to show the steps a URI takes in WebOb,
+        # which is similar in other wsgi server implementations.
+        # UTF-8 URL-encoded object-id 'täst':
+        # http://localhost/t%C3%A4st
+        # unquote('/t%C3%A4st'.decode('ascii')) results in utf-8 encoded bytes
+        # b'/t\xc3\xa4st'
+        # b'/t\xc3\xa4st'.decode('latin-1') latin-1 decoding due to PEP-3333
+        # '/täst'
+        # We now have a latin-1 decoded text, which was actually utf-8 encoded.
+        # To reverse this we have to encode with latin-1 first.
         path_info = path_info.encode('latin-1')
-        # But in Python 3 we need text here, so we decode the bytestring.
+
+        # So we can now decode with the right (utf-8) encoding to get text.
+        # This encode/decode two-step with different encodings works because
+        # of the way PEP-3333 restricts the type of string allowable for
+        # request and response metadata. The allowed characters match up in
+        # both latin-1 and utf-8.
         path_info = path_info.decode('utf-8')
 
     environ['PATH_INFO'] = path_info
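The steps listed in the new comment can be replayed in plain Python 3. This is a minimal sketch, not Zope's or WebOb's code: the helper names `server_path_info` and `recover_path_info` are hypothetical, and `urllib.parse.unquote_to_bytes` stands in for whatever percent-decoding the actual WSGI server performs. It also shows the value PATH_INFO really holds after the PEP 3333 latin-1 decode: the two UTF-8 bytes of 'ä' become two separate characters, giving '/tÃ¤st' rather than '/täst'.

```python
from urllib.parse import unquote_to_bytes


def server_path_info(raw_path):
    """Hypothetical stand-in for a PEP 3333 server: percent-decode the
    request path to bytes, then decode those bytes as latin-1 to obtain
    the native str that ends up in environ['PATH_INFO']."""
    return unquote_to_bytes(raw_path).decode('latin-1')


def recover_path_info(path_info):
    """The two-step from the commit: encode with latin-1 to get the
    original bytes back, then decode them as UTF-8."""
    return path_info.encode('latin-1').decode('utf-8')


raw = '/t%C3%A4st'                     # UTF-8 URL-encoded object id 'täst'
print(unquote_to_bytes(raw))           # b'/t\xc3\xa4st'
path_info = server_path_info(raw)
print(path_info)                       # /tÃ¤st
print(recover_path_info(path_info))    # /täst
```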
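The closing comment credits PEP 3333's restrictions on metadata strings for making the two-step work. The property the first step actually relies on is narrower: latin-1 maps each of the 256 byte values to the code point with the same number (U+0000 to U+00FF), so `encode('latin-1')` always returns exactly the bytes the server decoded. Only the second step depends on the client having sent UTF-8. The sketch below checks both points; `recover_path_info` is a hypothetical name for the committed two-step, and the '%E4' request is an assumed example of a client that percent-encodes latin-1 instead of UTF-8.

```python
from urllib.parse import unquote_to_bytes

EVERY_BYTE = bytes(range(256))


def latin1_is_lossless():
    """True if latin-1 maps every byte to the code point with the same
    value and re-encoding returns the original bytes unchanged."""
    text = EVERY_BYTE.decode('latin-1')
    same_values = [ord(ch) for ch in text] == list(range(256))
    return same_values and text.encode('latin-1') == EVERY_BYTE


def recover_path_info(path_info):
    """Hypothetical name for the committed two-step: latin-1 back to the
    original bytes, then UTF-8 to text."""
    return path_info.encode('latin-1').decode('utf-8')


print(latin1_is_lossless())  # True

# Assumed client that percent-encodes 'ä' in latin-1 (%E4) instead of UTF-8.
legacy = unquote_to_bytes('/t%E4st').decode('latin-1')
print(legacy.encode('latin-1'))  # b'/t\xe4st' (bytes recovered exactly)
try:
    recover_path_info(legacy)
except UnicodeDecodeError as exc:
    print('not valid UTF-8:', exc.reason)
```

In other words, the latin-1 step can never lose information, but with the strict `decode('utf-8')` shown in this diff, a request path whose bytes are not valid UTF-8 raises `UnicodeDecodeError` at the second step.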
