Encoding issue #91

Closed
tnorth opened this Issue · 8 comments

@tnorth

Posted by Vesa Halttunen on the RedHat bugzilla:

"I changed the current directory (with cd) to a directory that
had its name encoded in ISO-8859-15 while the current locale was UTF-8. The
directory name had the character ä in it (0xe4 in ISO-8859-1)."

Error message:
:utf_8.py:16:decode:UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in
position 68: invalid continuation byte
:
:Traceback (most recent call last):
: File "/usr/bin/autojump", line 314, in <module>
: success=shell_utility()
: File "/usr/bin/autojump", line 226, in shell_utility
: dicadd(path_dict, decode(args[-1]))
: File "/usr/bin/autojump", line 69, in decode
: return text.decode(encoding,errors)
: File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
: return codecs.utf_8_decode(input, errors, True)
:UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 68:
invalid continuation byte
:
:Local variables in innermost frame:
:input:
'/media/f217cada-c236-44a7-850c-51ee16003a3a/Backup/Vesuri/Pictures/j\xe4te'
:errors: 'strict'

Full bug report is here:
https://bugzilla.redhat.com/show_bug.cgi?id=783833

@joelthelion
Collaborator

This is an interesting issue... Autojump uses sys.getfilesystemencoding() (http://docs.python.org/library/sys.html#sys.getdefaultencoding) to figure out the encoding of the filesystem. However, this function takes no argument and returns a single value for the whole process, so it's bound to fail if different filesystems have different encodings.

Does anyone know what the right thing to do would be? Is this a bug in the python standard library? How are you supposed to find out the encoding of a filesystem on the various Unixes?
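For anyone who wants to see the failure without the original directory, here is a minimal sketch (Python 3). The byte string is a shortened stand-in for the reported directory name, and `try_decode` is a hypothetical helper, not something from autojump. It only shows that the same bytes are valid ISO-8859-15 but invalid UTF-8, and that sys.getfilesystemencoding() offers no way to ask about a particular path.

```python
import sys

# Bytes of a directory name like the one in the report: "jäte" with ä stored
# as the single byte 0xe4 (ISO-8859-15), not as the UTF-8 sequence 0xc3 0xa4.
RAW_NAME = b"j\xe4te"


def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # One process-wide value; there is no way to pass a path or mount point.
    print("filesystem encoding:", sys.getfilesystemencoding())
    for enc in ("utf-8", "iso8859-15"):
        print(enc, "->", repr(try_decode(RAW_NAME, enc)))
```

With a UTF-8 locale, the strict decode fails on exactly the byte shown in the traceback (0xe4), while the ISO-8859-15 decode succeeds.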

@wting
Owner

FYI, from Python3UnicodeDecodeError:

On UNIX (and other operating systems), it's possible to mount different file systems using different charsets. sys.getdefaultencoding() will be the same for the different file systems since this encoding is only used between Python and the Linux kernel, not between the kernel and the file system which may use a different charset.

I tried to reproduce the bug locally but couldn't, so I've requested more info in the RedHat bug report.
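As a rough illustration of how Python 3 copes with path bytes that do not match the locale, here is a sketch using the "surrogateescape" error handler, which as far as I know is what os.fsdecode() relies on on POSIX systems. The helper names `path_to_text` and `text_to_path` are made up for this example, and UTF-8 is hard-coded so the result does not depend on the machine's locale.

```python
def path_to_text(raw):
    """Decode path bytes to str; undecodable bytes become lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_path(text):
    """Inverse of path_to_text: recover the exact original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    raw = b"/Pictures/j\xe4te"  # shortened, hypothetical path
    text = path_to_text(raw)
    # ascii() is used because a lone surrogate cannot be printed directly
    # on a UTF-8 terminal.
    print(ascii(text))
    print(text_to_path(text) == raw)
```

The decoded string is not pretty, but the round trip is lossless, so such a path can be stored and handed back to the OS unchanged.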

@tnorth

BTW, another similar bug was filed:
https://bugzilla.redhat.com/show_bug.cgi?id=835974

Might not be more informative though :/

@wting
Owner

Thanks for following up on this issue.

I need to set aside some time to work on it. At the same time, I'm hoping that when RedHat upgrades to a newer version of Autojump running on Python 3, these Unicode-related problems will magically go away. One can hope...

@tnorth

Python3 is available in Fedora. Should we force autojump to use it?

@wting
Owner

Yes.

Right now Autojump supports Python 2 and 3. When it detects Python 2, it applies a basic UTF wrapping around strings. The traceback from the second bug shows a problem with one of those functions (called encode() and decode()). Since Python 3 uses Unicode strings natively, I'm hoping that will solve the 2nd bug.
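To make the wrapping idea concrete, here is a minimal sketch of a `decode()` helper that runs on both Python 2 and 3. It is not the actual autojump code. The `errors="replace"` default is my assumption (the traceback shows `'strict'` being used), chosen so that a stray byte yields a replacement character instead of an exception.

```python
import sys


def decode(text, encoding=None, errors="replace"):
    """Return `text` as a unicode string on both Python 2 and Python 3."""
    if not isinstance(text, bytes):
        # Already a unicode string (the normal case on Python 3).
        return text
    if encoding is None:
        # Assumption: fall back to UTF-8 if the interpreter reports nothing.
        encoding = sys.getfilesystemencoding() or "utf-8"
    # Python 2 `str` and Python 3 `bytes` both end up here.
    return text.decode(encoding, errors)
```

The trade-off is that the replaced name no longer matches the directory on disk, so a jump to that entry would fail, but the rest of the database keeps working.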

What Tanguy Ortolo (Debian maintainer) did was apply a quick patch that changes the first line of autojump from #!/usr/bin/python to #!/usr/bin/python3 and updates the respective package dependencies.

@tnorth

OK, I might do that then, and update to the latest version. I guess I am a bit behind again :)

@wting
Owner

No problem. I hope to release v21 in a month or two. v21-rc is available right now; I've been quite busy over the past few months (changelog).

@wting wting closed this