UnicodeDecodeError in curl_httpclient's _curl_debug() #3183

Open
eliasp opened this issue Sep 4, 2022 · 1 comment

eliasp commented Sep 4, 2022

We ran into an issue in SaltStack with curl_httpclient when using a proxy and trying to download a binary file:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py", line 497, in _curl_debug
    debug_msg = native_str(debug_msg)
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/escape.py", line 219, in to_unicode
    return value.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf4 in position 1: invalid continuation byte
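The same error can be reproduced without curl or a proxy. This is only a minimal sketch: the sample bytes are invented for illustration (they are not the data from the actual download) and were chosen so that 0xf4 sits at index 1 and is followed by a byte that is not a UTF-8 continuation byte.

```python
# Hypothetical payload, not the real data from the report: 0xf4 is a
# valid UTF-8 lead byte, but the "A" that follows it is not a
# continuation byte, so strict UTF-8 decoding fails at index 1.
sample = b"\x00\xf4AB"

try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc  # kept so the failure position can be inspected
    print(exc)
```

Any binary response body that passes through the debug callback can hit this, since arbitrary bytes are rarely valid UTF-8.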

It seems the fix for this issue is quite similar to the one for #1608 in d7d9c46, but I'm not sure whether it would also have to be extended to the other cases a few lines below, since 27a6103 changed the usage of native_str().

This hotfix (applied to the Tornado bundled with Salt) at least made it work for me, and it seems .decode('latin1') isn't used in current master either.

diff --git a/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py.orig b/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py
index 8652343..6ef3349 100644
--- a/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py.orig
+++ b/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py
@@ -494,7 +494,7 @@ class CurlAsyncHTTPClient(AsyncHTTPClient):

     def _curl_debug(self, debug_type, debug_msg):
         debug_types = ('I', '<', '>', '<', '>')
-        debug_msg = native_str(debug_msg)
+        debug_msg = native_str(debug_msg.decode('latin1'))
         if debug_type == 0:
             curl_log.debug('%s', debug_msg.strip())
         elif debug_type in (1, 2):
@eliasp eliasp changed the title UnicodeDecodeError in curl_httpclient's __curl_debug() UnicodeDecodeError in curl_httpclient's _curl_debug() Sep 4, 2022
@bdarnell
Member

Hmm, I can't find any documentation of what character encoding libcurl uses for its debug messages. It looks like it must not be utf-8, though, which is what native_str hard-codes. So replacing native_str(debug_msg) with debug_msg.decode('latin1') is probably the right thing to do (using both native_str and decode was for Python 2/3 compatibility and now only one is needed at a time).
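For context on why latin1 avoids the exception: it maps each of the 256 byte values to exactly one code point, so decoding cannot fail. A small sketch of that property, using only built-in Python codecs (the variable names are illustrative):

```python
# Every possible byte value has a latin1 mapping, so this never raises
# UnicodeDecodeError, whatever the input bytes are.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin1")

# The mapping is lossless: encoding returns the original bytes.
round_trip = text.encode("latin1")

# The byte from the traceback decodes to a printable character, which
# may or may not be meaningful for the message in question.
decoded_f4 = b"\xf4".decode("latin1")
```

This guarantees the absence of an exception, not the legibility of the result.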

However, if the message is not in fact latin1 text (say it's a binary blob, or some other encoding entirely), this could just print out garbage. It may be better to catch the UnicodeDecodeError and if we see one, to log it with %r instead of %s as we do a few lines below.
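One way the fallback idea might look, as a rough sketch rather than the actual Tornado change. The helper name `describe_debug_msg` is hypothetical, and trying UTF-8 first is an assumption; the thread does not settle which encoding should be attempted before falling back.

```python
import logging

# Logger name used here for illustration only.
curl_log = logging.getLogger("tornado.curl_httpclient")


def describe_debug_msg(debug_msg: bytes) -> str:
    """Return readable text if the bytes decode, else their repr.

    Hypothetical helper: assumes UTF-8 is the encoding worth trying.
    """
    try:
        return debug_msg.decode("utf-8").strip()
    except UnicodeDecodeError:
        # Equivalent to logging with %r: escapes make binary data
        # visible instead of raising or printing garbage.
        return "%r" % (debug_msg,)


# Text messages come through unchanged, binary blobs are escaped.
curl_log.debug("%s", describe_debug_msg(b"Connected to proxy\n"))
curl_log.debug("%s", describe_debug_msg(b"\x00\xf4AB"))
```

Compared with an unconditional latin1 decode, this keeps binary payloads recognisable as binary in the log at the cost of a try/except per message.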

When you added the decode('latin1') did the message print legibly or did you get garbage? (byte 0xf4 is ô in latin1, is that what you expected in the message?)
