Unicode: use "target encoding" while transcoding for output #782
for line in o:
for line in output:
    if isinstance(line, bytes):
        line = line.decode("utf-8", "replace")  # noqa: PLW2901
This buggy line screams that we should review the Canvas logic and ultimately switch to Unicode strings in canvas contents.
Currently the standard chain for 99.9% of widgets is str -> bytes -> str (independent of the terminal encoding).
During transcoding we use the "target encoding", but in fact it is useful only for input processing (and not everywhere: on Windows and with curses we may already have decoded data).
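To illustrate the str -> bytes -> str chain described above, here is a minimal sketch in plain Python. It is not urwid code: `round_trip` and its parameters are hypothetical names used only for illustration. It shows that decoding with a hard-coded "utf-8" only round-trips cleanly when the bytes were produced with the same encoding.

```python
def round_trip(text, target_encoding, decode_as):
    """Hypothetical helper: str -> bytes -> str, as in the widget chain."""
    raw = text.encode(target_encoding, "replace")  # str -> bytes
    return raw.decode(decode_as, "replace")        # bytes -> str


# Encoding and decoding agree: the text survives.
same = round_trip("naïve", "utf-8", "utf-8")

# Bytes produced in latin-1 but decoded as utf-8: the non-ASCII
# character is replaced with U+FFFD.
mismatch = round_trip("naïve", "latin-1", "utf-8")
```

With matching encodings the text comes back unchanged; with a mismatch, the "replace" error handler silently substitutes U+FFFD for the undecodable byte.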
urwid/display/_raw_display_base.py
Outdated
output.append(escape.SO)
lastcs = cs
o.append(run)
output.append(run.decode(util.get_encoding(), "replace"))
Hmm, this code was originally working with bytes so that applications could be running with a non-utf-8 encoding without converting to Unicode and back (potentially losing data in the process) but I guess everything in raw display uses Unicode now?
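The data-loss concern can be sketched as follows. This is an illustrative example in plain Python, not code from the PR; the byte string and variable names are assumptions chosen for the demonstration. Bytes that are valid in a legacy encoding but not in UTF-8 do not survive a decode/encode round trip through the wrong codec.

```python
# Bytes for "café" in latin-1; 0xE9 is not valid UTF-8 on its own.
original = b"caf\xe9"

# Round trip through the wrong codec: the bad byte becomes U+FFFD,
# which re-encodes to three different bytes.
via_utf8 = original.decode("utf-8", "replace").encode("utf-8")

# Round trip through the codec the bytes were actually written in.
via_latin1 = original.decode("latin-1").encode("latin-1")
```

Here `via_latin1` equals `original`, while `via_utf8` does not, which is the kind of loss that passing bytes through untouched avoided.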
$ python
Python 3.11.4 (main, Dec 7 2023, 15:43:41) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
>>> sys.stdout.write("test str")
test str8
>>> sys.stdout.write(b"test bytes")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: write() argument must be str, not bytes
>>> sys.stdin
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>
IO now uses strings, so we have to decode/encode in order to work with bytes.
Curses uses strings and can return keystrokes as strings and decoded mouse input.
The Windows raw display has native access to both str and bytes input from the same object.
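A minimal sketch of what "decode before writing" looks like for a text stream, assuming mixed str/bytes output chunks. `write_chunks` is a hypothetical helper, not an urwid function, and `io.StringIO` stands in for a text-mode stdout.

```python
import io


def write_chunks(stream, chunks, encoding="utf-8"):
    """Hypothetical helper: write str/bytes chunks to a text stream."""
    for chunk in chunks:
        if isinstance(chunk, bytes):
            # A text stream rejects bytes, so decode with the encoding
            # the bytes were produced in.
            chunk = chunk.decode(encoding, "replace")
        stream.write(chunk)


buf = io.StringIO()  # stand-in for a text-mode stdout
write_chunks(buf, ["plain ", "текст ".encode("koi8-r"), "ok"], encoding="koi8-r")
result = buf.getvalue()
```

The point of the sketch is only that the decode step needs the encoding the bytes were created with; a hard-coded "utf-8" would garble the KOI8-R chunk.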
> Hmm, this code was originally working with bytes so that applications could be running with a non-utf-8 encoding without converting to Unicode and back (potentially losing data in the process) but I guess everything in raw display uses Unicode now?
Output was already Unicode at least as far back as version 2.0 (and it even ignored the target encoding): https://github.com/urwid/urwid/blob/release-2.0.0/urwid/raw_display.py#L848
@wardi in Python 3 we have STDOUT, which is always "Unicode". And it's not always buffered (so we cannot use the buffer for output).
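One reading of this point is that a text stream is not guaranteed to expose an underlying binary `buffer`, so writing raw bytes cannot be relied on. The sketch below illustrates that reading in plain Python. `emit` is a hypothetical helper and this is not how the PR handles output.

```python
import io


def emit(stream, data, encoding="utf-8"):
    """Hypothetical helper: write bytes to a text stream, using its
    binary buffer when one exists and decoding otherwise."""
    binary = getattr(stream, "buffer", None)
    if binary is not None:
        stream.flush()        # keep text and binary writes in order
        binary.write(data)
        binary.flush()
        return "bytes"
    stream.write(data.decode(encoding, "replace"))
    return "text"


no_buffer = io.StringIO()                    # text stream without .buffer
with_buffer = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

mode_a = emit(no_buffer, b"hello")
mode_b = emit(with_buffer, b"hello")
```

Because the decoding path is the only one that works for every text stream, always producing str output is the simpler design.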
8cc01eb to 4cc8933
* apply `Canvas.content` method API normalization and annotation:
  * Overloaded methods should accept arguments as base class method
4cc8933 to 96cfaaa
`Canvas.content` method API normalization and annotation
Checklist
* `master` or `python-dual-support` branch
* `tox` successfully in local environment