
Directory listing in SimpleHTTPRequestHandler does not work well in non-UTF-8 locale #133889

Closed
serhiy-storchaka opened this issue May 11, 2025 · 2 comments
Labels: 3.13 (bugs and security fixes), 3.14 (bugs and security fixes), 3.15 (new features, bugs and security fixes), stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@serhiy-storchaka
Member

serhiy-storchaka commented May 11, 2025

For a directory, SimpleHTTPRequestHandler generates an index.html page containing a list of files. It uses the filesystem encoding for the page, which is reasonable, because file names are encoded with that encoding. The problem is that the directory path, which is included in the title, can contain the query part of the URL, and that part may not be encodable with the filesystem encoding.

This causes a test failure when running in a non-UTF-8 locale:

$ LC_ALL=uk_UA ./python -m test -vuall test_httpservers -m test_undecodable_parameter
...
test_undecodable_parameter (test.test_httpservers.SimpleHTTPServerTestCase.test_undecodable_parameter) ... ----------------------------------------
Exception occurred during processing of request from ('127.0.0.1', 48062)
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/socketserver.py", line 318, in _handle_request_noblock
    self.process_request(request, client_address)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Lib/socketserver.py", line 349, in process_request
    self.finish_request(request, client_address)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Lib/socketserver.py", line 362, in finish_request
    self.RequestHandlerClass(request, client_address, self)
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Lib/http/server.py", line 721, in __init__
    super().__init__(*args, **kwargs)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Lib/socketserver.py", line 766, in __init__
    self.handle()
    ~~~~~~~~~~~^^
  File "/home/serhiy/py/cpython/Lib/http/server.py", line 485, in handle
    self.handle_one_request()
    ~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/serhiy/py/cpython/Lib/http/server.py", line 473, in handle_one_request
    method()
    ~~~~~~^^
  File "/home/serhiy/py/cpython/Lib/http/server.py", line 725, in do_GET
    f = self.send_head()
  File "/home/serhiy/py/cpython/Lib/http/server.py", line 769, in send_head
    return self.list_directory(path)
           ~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/home/serhiy/py/cpython/Lib/http/server.py", line 874, in list_directory
    encoded = '\n'.join(r).encode(enc, 'surrogateescape')
  File "/home/serhiy/py/cpython/Lib/encodings/koi8_u.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 178: character maps to <undefined>
encoding with 'koi8-u' codec failed
----------------------------------------
ERROR

======================================================================
ERROR: test_undecodable_parameter (test.test_httpservers.SimpleHTTPServerTestCase.test_undecodable_parameter)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/test/test_httpservers.py", line 559, in test_undecodable_parameter
    response = self.request(self.base_url + '/?x=%bb').read()
               ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Lib/test/test_httpservers.py", line 131, in request
    return self.connection.getresponse()
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/serhiy/py/cpython/Lib/http/client.py", line 1430, in getresponse
    response.begin()
    ~~~~~~~~~~~~~~^^
  File "/home/serhiy/py/cpython/Lib/http/client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ~~~~~~~~~~~~~~~~~^^
  File "/home/serhiy/py/cpython/Lib/http/client.py", line 300, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
                             " response")
http.client.RemoteDisconnected: Remote end closed connection without response

----------------------------------------------------------------------
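
The encoding step that fails in the traceback above can be reproduced without a server or a special locale. The following is a minimal sketch, not the handler's actual code: the helper names `title_path_for` and `can_encode` are hypothetical, and it assumes the title text is produced by unquoting the whole request path with the default `errors='replace'`, which is what yields the `'\ufffd'` character reported in the error.

```python
import urllib.parse


def title_path_for(request_path):
    """Unquote the whole request path, query included (hypothetical helper)."""
    # With the default errors='replace', the undecodable byte 0xBB
    # becomes U+FFFD (REPLACEMENT CHARACTER).
    return urllib.parse.unquote(request_path)


def can_encode(text, encoding):
    """Return True if text can be encoded with 'surrogateescape' (hypothetical helper)."""
    try:
        text.encode(encoding, 'surrogateescape')
    except UnicodeEncodeError:
        return False
    return True


# Same request path as in test_undecodable_parameter.
displaypath = title_path_for('/?x=%bb')
print(ascii(displaypath))
print(can_encode(displaypath, 'koi8-u'))  # KOI8-U has no mapping for U+FFFD
print(can_encode(displaypath, 'utf-8'))
```

The `surrogateescape` handler only helps with lone surrogates produced by decoding undecodable bytes; it does nothing for an ordinary character such as U+FFFD that the target charset simply lacks.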

I suspect that there may also be issues if some files in the directory have non-decodable names or if the path of the directory is non-decodable, but I have not tested this yet.


@serhiy-storchaka serhiy-storchaka added 3.13 bugs and security fixes 3.14 bugs and security fixes 3.15 new features, bugs and security fixes labels May 11, 2025
@serhiy-storchaka serhiy-storchaka self-assigned this May 11, 2025
@serhiy-storchaka serhiy-storchaka added the type-bug An unexpected behavior, bug, or error label May 11, 2025
@picnixz picnixz added the stdlib Python modules in the Lib dir label May 11, 2025
@StanFromIreland
Contributor

There is no good way to guarantee it will work with a simple locale, so why not encode in UTF-8 instead? It is becoming the default in Python anyway, and I believe it is the default in the majority of web browsers. We could make using the system encoding optional and default to the web standard?

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue May 16, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 17, 2025
…-134102)

(cherry picked from commit fcaf009)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 17, 2025
…-134102)

(cherry picked from commit fcaf009)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue May 17, 2025
…) (GH-134122)

(cherry picked from commit fcaf009)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue May 17, 2025
…) (GH-134121)

(cherry picked from commit fcaf009)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@serhiy-storchaka
Member Author

Always using the UTF-8 encoding, as well as using the "xmlcharrefreplace" error handler, will fix the failure. But it will also change the page representation. In the long run that may be good, but note that currently you get the same binary representation of file names regardless of the locale of the server.
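
To make the trade-off concrete, here is a small sketch of what each option does to the bytes of the page. The sample title text and the file name are illustrative assumptions, and KOI8-U is used only because it is the codec shown in the traceback.

```python
# Title text containing U+FFFD, as produced for the request '/?x=%bb'.
text = '/?x=\ufffd'

# 'xmlcharrefreplace' replaces unencodable characters with numeric
# character references instead of raising UnicodeEncodeError.
as_koi8 = text.encode('koi8-u', 'xmlcharrefreplace')
# UTF-8 can encode U+FFFD directly.
as_utf8 = text.encode('utf-8')
print(as_koi8)
print(as_utf8)

# A hypothetical Cyrillic file name, stored on disk as KOI8-U bytes.
name_bytes = '\u0442\u0435\u0441\u0442.txt'.encode('koi8-u')
# Decoding and re-encoding with the filesystem encoding round-trips the bytes.
listed = name_bytes.decode('koi8-u', 'surrogateescape')
print(listed.encode('koi8-u', 'surrogateescape') == name_bytes)
# Encoding the same name as UTF-8 produces different bytes on the page.
print(listed.encode('utf-8') == name_bytes)
```

Both options avoid the exception, but only re-encoding with the filesystem encoding preserves the original bytes of the file names in the generated page.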

On the other hand, I think that applying urllib.parse.unquote() to the whole URL is incorrect. The path can contain ? or # (percent-encoded in the URL); after unquoting, it will look like a wrong URL. Since the query and fragment are ignored in any case, I think that they should not be shown in the page title. Then we will not have the problem of encoding non-encodable characters in the page.
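
The ambiguity and the proposed remedy can be sketched as follows. This is an illustration of the idea only, not the code of the actual patch; `display_path` is a hypothetical helper name and the sample URLs are made up.

```python
import urllib.parse


def display_path(request_path):
    """Return the unquoted path without query or fragment (hypothetical helper)."""
    # Strip the fragment and the query while they are still percent-encoded,
    # so that a literal '?' or '#' in the path (sent as %3F or %23) survives.
    path = request_path.split('#', 1)[0].split('?', 1)[0]
    return urllib.parse.unquote(path)


# Unquoting the whole URL makes an escaped '?' in the path look like
# the start of a query.
print(urllib.parse.unquote('/a%3Fb/?x=1'))

# Splitting first keeps the path unambiguous and drops the query,
# including its undecodable byte.
print(display_path('/a%3Fb/?x=%bb#top'))
print(display_path('/?x=%bb'))
```

With the query removed before unquoting, the title for the failing test request is just `/`, which any filesystem encoding can represent.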

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue May 17, 2025
…rTestCase page

The query and the fragment are ambiguous and not used.
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue May 17, 2025
…stHandler page

The query and the fragment are ambiguous and not used.
serhiy-storchaka added a commit that referenced this issue May 18, 2025
…ler page (GH-134135)

The query and fragment are ambiguous and not used.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 18, 2025
…stHandler page (pythonGH-134135)

The query and fragment are ambiguous and not used.
(cherry picked from commit 5cbc8c6)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 18, 2025
…stHandler page (pythonGH-134135)

The query and fragment are ambiguous and not used.
(cherry picked from commit 5cbc8c6)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue May 18, 2025
…estHandler page (GH-134135) (GH-134190)

The query and fragment are ambiguous and not used.
(cherry picked from commit 5cbc8c6)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue May 18, 2025
…estHandler page (GH-134135) (GH-134191)

The query and fragment are ambiguous and not used.
(cherry picked from commit 5cbc8c6)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>