Skip to content

Conversation

@penguinolog
Copy link
Collaborator

  • apply Canvas.content method API normalization and annotation
    • Overloaded methods should accept arguments as base class method
    • Overloaded methods can use own defaults
Checklist
  • I've ensured that similar functionality has not already been implemented
  • I've ensured that similar functionality has not earlier been proposed and declined
  • I've branched off the master or python-dual-support branch
  • I've merged fresh upstream into my branch recently
  • I've ran tox successfully in local environment
  • I've included docstrings and/or documentation and/or examples for my code (if this is a new feature)

@penguinolog penguinolog added the Unicode Issues related to Unicode <-> bytes conversion label Jan 31, 2024
@penguinolog penguinolog requested a review from wardi January 31, 2024 16:02
@coveralls
Copy link

coveralls commented Jan 31, 2024

Pull Request Test Coverage Report for Build 7813470994

  • -5 of 14 (64.29%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.002%) to 70.89%

Changes Missing Coverage Covered Lines Changed/Added Lines %
urwid/display/curses.py 1 2 50.0%
urwid/display/_raw_display_base.py 0 4 0.0%
Totals Coverage Status
Change from base Build 7813019446: 0.002%
Covered Lines: 8774
Relevant Lines: 12462

💛 - Coveralls

for line in o:
for line in output:
if isinstance(line, bytes):
line = line.decode("utf-8", "replace") # noqa: PLW2901
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this buggy line screams: we should review Canvas logic and ultimately switch to the Unicode strings in canvas contents.
Currently we have standard chain for 99,9% of widgets str -> bytes -> str (terminal encoding independent).
During transcoding we're using "target encoding", but in fact it useful only in input processing (not everywhere: in windows and with curses we already can have decoded data).

output.append(escape.SO)
lastcs = cs
o.append(run)
output.append(run.decode(util.get_encoding(), "replace"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this code was originally working with bytes so that applications could be running with a non-utf-8 encoding without converting to Unicode and back (potentially losing data in the process) but I guess everything in raw display uses Unicode now?

Copy link
Collaborator Author

@penguinolog penguinolog Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ python
Python 3.11.4 (main, Dec  7 2023, 15:43:41) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
>>> sys.stdout.write("test str")
test str8
>>> sys.stdout.write(b"test bytes")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: write() argument must be str, not bytes
>>> sys.stdin
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>

IO is now using strings, so we have to use decode/encode to operate bytes.
Curses is using strings and can return keystrokes as strings and decoded mouse input.
Windows raw display has access natively to str and bytes input from the same object

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this code was originally working with bytes so that applications could be running with a non-utf-8 encoding without converting to Unicode and back (potentially losing data in the process) but I guess everything in raw display uses Unicode now?

Output in unicode was as minimum in version 2.0 (and even ignored target encoding): https://github.com/urwid/urwid/blob/release-2.0.0/urwid/raw_display.py#L848

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wardi in python 3 we have STDOUT, which is always "Unicode". And it's not always buffered (so we cannot use buffer for output).

@penguinolog penguinolog requested a review from wardi January 31, 2024 17:49
@penguinolog penguinolog force-pushed the typing_canvas branch 4 times, most recently from 8cc01eb to 4cc8933 Compare February 6, 2024 08:21
* apply `Canvas.content` method API normalization and annotation:
* * Overloaded methods should accept arguments as base class method
@penguinolog penguinolog merged commit ea3e3ab into urwid:master Feb 7, 2024
@penguinolog penguinolog deleted the typing_canvas branch February 7, 2024 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Unicode Issues related to Unicode <-> bytes conversion

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants