
Enable pagination for sign/verify message #1337

Closed · wants to merge 8 commits

Conversation

@mmilata (Member) commented Nov 4, 2020

Fixes #1271. Required rather intrusive changes to render_text().

Some notes:

  • Some of the recorded screens changed because we were previously printing a normal-font space (5 px) even when showing monospace text (12 px space).
  • Changed the Cardano confirm-transaction screen to use the same pagination logic. As a result there are 5 lines on the initial page instead of 4; I don't know whether the empty last line was intentional.
  • Noticed that we're using the normal font for signing a message but monospace for verifying; is that intentional?
  • Some paginated messages don't look very good: (screenshot)

@matejcik (Contributor) commented Nov 4, 2020

  • Some paginated messages don't look very good:

we should make the right margin wider in that case

Review comment on:

```python
) -> None:
    if not pre_linebreaked:
```

Contributor:
I don't like having both pre_linebreaked and new_lines. What is currently the difference? Could we maybe get rid of new_lines now?

Member Author:

  • both=False: words are treated as words; if a word fits on a single line it is never broken, and if it doesn't fit it's broken so that as many characters as possible stay on the first line. Spaces are treated like any other character when breaking a word that contains them.
  • new_lines=True: words are treated more like lines, with a BR inserted after each one. Otherwise behaves as above. This is the most common use in our codebase.
  • pre_linebreaked=True: words are expected to already contain line breaks in the right places and are not processed before rendering.

Thinking about this, the result will often be the same for the latter two options.

Then there's another behaviour we want that is not part of this PR: accept strings containing spaces and break on spaces wherever possible. Perhaps it would make sense to merge all these flags into something like mode, which would take three different values?
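For illustration, the three behaviours could collapse into a single mode parameter along these lines. This is a sketch only: the constant names, the plain-int encoding, and the normalize_items helper are invented here, not part of the PR.

```python
# Illustrative sketch: the flag combinations folded into one "mode" value.
MODE_BREAK_WORDS = 0    # current new_lines=False: break long words mid-word
MODE_LINE_PER_ITEM = 1  # current new_lines=True: a BR after every item
MODE_PRE_BROKEN = 2     # current pre_linebreaked=True: items come pre-broken

BR = "\n"  # stand-in for the firmware's BR symbol

def normalize_items(items, mode):
    """Flatten the items according to the mode; rendering happens later."""
    if mode == MODE_PRE_BROKEN:
        return list(items)  # breaks are already in place, do nothing
    out = []
    for item in items:
        out.append(item)
        if mode == MODE_LINE_PER_ITEM:
            out.append(BR)  # treat each item as its own line
    return out
```

A single mode value also makes the invalid flag combinations unrepresentable, which the two independent booleans do not.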

Contributor:

I'm still a little confused, so I'm going to make a full table:

| new_lines | pre_linebreaked | Behavior |
| --- | --- | --- |
| False | False | Text is split at (a) explicit BRs, (b) end of screen |
| True | False | Text is split at (a) explicit BRs, (b) end of screen, (c) end of each individual item. This is the most common. |
| Any | True | Text is split at (a) explicit BRs only, i.e. no magic happens if the text overflows. |

Is this correct?

I believe that going forward, we will be able to completely get rid of the (c) variant, "end of individual item". But for this PR it stays.

I'm not sure, however, how valuable keeping the (b) variant is. If we removed it, there would be no difference in behavior between pre_linebreaked=True and `False`.

Also, implementation-wise it's just a matter of "reflowing" the words parameter via a separate call.
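A toy sketch of such a separate reflow pass. It uses a character budget as a stand-in for pixel measurement (the real render_text measures glyph widths), so the function name and signature are illustrative only:

```python
def reflow(words, max_chars):
    """Pre-break every word so that no fragment exceeds max_chars
    characters; the result can then be rendered as pre-broken lines
    without any further line-breaking logic."""
    out = []
    for word in words:
        # emit the word in max_chars-sized pieces
        for i in range(0, len(word), max_chars):
            out.append(word[i : i + max_chars])
    return out
```

With a pass like this in front, the renderer only ever sees fragments that are guaranteed to fit, which is what makes the pre_linebreaked path trivial.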

(BTW, I believe the reason new_lines is a parameter of render_text() is to save the memory and/or processing time otherwise needed to insert the BR symbols into the list. Going forward we'll be splitting strings anyway, so I don't think this is relevant, but let's keep it in mind.)

Anyway, I think pre_linebreaked should be an argument of the Text class only, not of the render_text call, and Text should be responsible for calling break_lines.

Also, I'd call it something else, along the lines of auto_linebreaks=True instead of pre_linebreaked=False.

Member Author:

Yeah, I'd say the table is correct. Agreed that it probably doesn't make sense to keep (b) long-term.

Regarding memory & time, this PR probably makes things worse but not sure how to get around that.

Will look into your suggestions, thanks!

Member Author:

Regarding memory & time, this PR probably makes things worse but not sure how to get around that.

Total allocations for device tests in text.py went from 180000 to 390000, that's pretty bad ...

Review comment on:

```python
offset_x: int = TEXT_MARGIN_LEFT,
offset_y: int = TEXT_HEADER_HEIGHT + TEXT_LINE_HEIGHT,
offset_x_max: int = ui.WIDTH,
) -> List[List[TextContent]]:
```

Contributor:

Why return List[List] if you convert it to a flat list anyway?

Member Author:

In apps/common/signverify.py (and then apps/common/layout) we first call chunks on the result before passing it to render_text, which flattens it.
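For context, a minimal version of such a chunks helper (the real helper in the firmware, which I believe lives in trezor.utils, behaves similarly):

```python
def chunks(items, size):
    """Yield consecutive size-length slices of items, so a flat list of
    lines becomes a list of per-page line groups."""
    for i in range(0, len(items), size):
        yield items[i : i + size]

# group five lines into pages of two lines each
pages = list(chunks(["line1", "line2", "line3", "line4", "line5"], 2))
```

Each yielded slice is one page, which explains why the intermediate List[List] shape appears before flattening.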

@mmilata commented Nov 5, 2020

Changed:

  • move break_lines out of render_text
  • rename pre_linebreaked to auto_linebreaks
  • get rid of nested lists
  • wider margin when scroll-dots are displayed

@matejcik commented Nov 5, 2020

Yeaahh... I guess I take everything back 👀 this here is the reason rewriting the UI in Rust would be so nice.

Total allocations for device tests in text.py went from 180000 to 390000, that's pretty bad ...
(...)
we first call chunks on the result before passing it to render_text

Although the refactor looks good on paper, I'm afraid this shows that it's the wrong approach. The advantage of the original render_text() method is that it does almost no allocations. The new approach does a lot; even the original, unmodified split_message() is doing work that is technically unnecessary and implies copying (!) the whole message to be signed.

(my code from #1312 also does text.split("\n") which might actually be better handled inside render_text)

A rough outline of a solution that looks like it would work better in this regard:

  • a function measure_line(string, string_offset=0, break_words=False) -> int returns a number of characters such that string[string_offset : string_offset + result] fits on one line. If break_words is False, then either (a) string_offset + result == len(string), (b) string[string_offset + result] is whitespace, or (c) the respective part of the string is an unbroken word that does not fit on the line
  • the above is used to calculate the number of lines in the text, and the number of characters that fit on a single page
  • it is also used in render_text, where we need additional checks that tell us whether we should stick a "-" or "..." at the end

Then the message pagination is done either by pre-breaking the message at the given indexes, or we add a string_offset argument to Text so that you can point to the same string on multiple pages. I'm trying to find out whether string slices make copies; if not, slicing would be the more readable solution.

And we use this to get rid of split_message() completely, because paginate_lines() will do all the work.
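The proposed measure_line could look roughly like this sketch. The max_width and char_width parameters are illustrative additions: a constant per-character width stands in for the real per-glyph font metrics, and rfind with bounds is used instead of slicing to avoid an allocation.

```python
def measure_line(string: str, max_width: int, string_offset: int = 0,
                 break_words: bool = False, char_width: int = 6) -> int:
    """Number of characters starting at string_offset that fit in
    max_width pixels, using char_width as a stand-in font metric."""
    max_chars = max_width // char_width
    remaining = len(string) - string_offset
    if remaining <= max_chars:
        return remaining  # (a) the rest of the string fits on this line
    fit = max_chars
    if not break_words:
        # search for whitespace within bounds; no slice is allocated
        space = string.rfind(" ", string_offset, string_offset + fit + 1)
        if space > string_offset:
            fit = space - string_offset  # (b) break at the whitespace
        # otherwise (c): an unbroken word longer than the line; keep fit
    return fit
```

Calling it in a loop, advancing string_offset by the returned count (plus any skipped whitespace), yields line counts and page boundaries without copying the message.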

@matejcik commented Nov 5, 2020

I'm trying to find out whether string slices make copies

not sure why, but it seems that they do
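Under CPython this is easy to observe: a full-range slice of a str returns the same object, while any proper slice allocates a new one (and the finding above is that MicroPython slices copy as well). A quick check:

```python
s = "message to be signed " * 10

# full-range slice of an immutable str: CPython returns the original object
assert s[:] is s

# proper slice: a new string object is allocated
t = s[1:]
assert t is not s
assert t == s[1:]
```

The `s[:] is s` behavior is a CPython implementation detail for immutable types, not a language guarantee, which is why it cannot be relied on across interpreters.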

@matejcik commented Nov 5, 2020

of course, if your "get rid of nested lists" modification is OK memory-wise, we don't actually need to do the above :)

@mmilata commented Nov 5, 2020

Removing the nesting decreased the allocations from 390k to 330k which is far from OK. Back to the drawing board then ...

Commit messages from the force-push:

  • Changed: use a MONO-sized space instead of a NORMAL-sized space when rendering MONO text; fix a corner case where a word is broken before the 0th character; add a flag for breaking text on spaces.
  • Avoid enumerate(). Decreases the number of allocations by ~90k.
  • Turn the line-breaking generator into a function that returns a list. Decreases the number of allocations by ~30k.
  • Get rid of the next_line() closure in break_lines(). Decreases the number of allocations by ~90k.
@mmilata mmilata force-pushed the mmilata/sign-verify-pagination branch from 1598add to f5da228 Compare November 15, 2020 19:55
@mmilata commented Nov 15, 2020

I've force-pushed a version where break_lines returns a list of indexes (3-tuples: list index, char index, break marker). It's less performant than I thought it would be, and I'd be grateful for some high-level feedback. Here's a summary of how it looks in terms of the number of allocations over all device tests:

| what | text.py #allocs | Δ |
| --- | --- | --- |
| master | 182000 | |
| factor out break_lines, return indexes | 513000 | +331k |
| extend modtrezorui to avoid string slices | 446000 | -67k |
| do not use enumerate | 360000 | -86k |
| turn break_lines from a generator into a function that returns a list | 327000 | -33k |
| manually inline the next_line closure | 230000 | -97k |

@tsusanka tsusanka modified the milestones: 2020-12, 2021-01 Nov 18, 2020
@tsusanka tsusanka removed their request for review November 25, 2020 17:56
@matejcik commented Dec 1, 2020

Thanks for the measurements; these show the hotspots nicely.

So, a problem I see is returning and manipulating tuples repeatedly: each "return tuple()" is an allocation. I'm not sure about this, but maybe tuple unpacking is also an allocation? Furthermore, you're reusing the results of break_lines, which necessitates storing them, so all the tuples are kept around?

My suggestion would be to rework this so that:
(1) what is now next_sentinel fills out a passed-in structure instead of returning one,
(2) then you don't need the helper and can do this directly in break_lines,
(3) making break_lines a generator again instead of having it return a list.

I'm not sure I completely understand the reason for the next_sentinel helper, so I'm not sure whether it makes sense to get rid of it (it doesn't seem to make things worse, but it complicates reading the code somewhat).
Does it exist to avoid wrapping everything in try/except? If yes, maybe next_sentinel should return True/False so that the detection can be postponed?

I believe that by modifying break_lines/next_sentinel to fill out a structure, we completely avoid the issue you worked around in e06cdb2, and can thus call break_lines multiple times again, sacrificing runtime for memory efficiency. Ideally, running break_lines should cause exactly one allocation: instantiating the generator.

```python
from typing import Iterator, Optional

class BreakEntry:
    def __init__(self) -> None:
        self.set(0, 0, None)

    def set(self, word_index: int, char_index: int,
            break_split: Optional[str]) -> None:
        self.word_index = word_index
        self.char_index = char_index
        self.break_split = break_split

def break_lines(...arguments, result: BreakEntry) -> Iterator[None]:
    ...
    # instead of yielding a tuple, write the break into the shared entry
    result.set(word_index, char_index, break_split)
    yield
    ...
    return

def next_helper(break_iterator: Iterator[None]) -> bool:
    try:
        next(break_iterator)
        return True
    except StopIteration:
        return False

def usage(...arguments):
    break_entry = BreakEntry()
    break_lines_iter = break_lines(...arguments, break_entry)
    should_continue = True
    while should_continue:
        should_continue = next_helper(break_lines_iter)
        if break_entry.char_index > blabla:
            ...
```
For now I wouldn't worry about calling break_lines often, if the above approach does lead to a decrease in allocations. (As the code stands, you'd have to call it once to count lines in paginate_content, and a second time to assign a fresh generator to text.breaks. I believe we can get rid of that with some careful factoring, but let's solve the allocation thing first.)
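To make the pattern concrete, here is a minimal runnable version of the sketch above, with a toy break_lines that breaks every word at a fixed character budget. The budget, the "-" marker, and the break logic are illustrative only; the point is that the generator writes into one shared entry instead of allocating a tuple per break.

```python
from typing import Iterator, List, Optional

class BreakEntry:
    def __init__(self) -> None:
        self.set(0, 0, None)

    def set(self, word_index: int, char_index: int,
            break_split: Optional[str]) -> None:
        self.word_index = word_index
        self.char_index = char_index
        self.break_split = break_split

def break_lines(words: List[str], max_chars: int,
                result: BreakEntry) -> Iterator[None]:
    """Toy line breaker: yield once per break, recording its position
    in the shared entry (a single allocation for the generator itself)."""
    word_index = 0
    for word in words:  # no enumerate(), echoing the allocation findings
        char_index = max_chars
        while char_index < len(word):
            result.set(word_index, char_index, "-")
            yield
            char_index += max_chars
        word_index += 1

def next_helper(it: Iterator[None]) -> bool:
    try:
        next(it)
        return True
    except StopIteration:
        return False

entry = BreakEntry()
breaks = []  # collected only to demonstrate the loop; a renderer would draw
it = break_lines(["pagination", "ok"], 4, entry)
while next_helper(it):
    breaks.append((entry.word_index, entry.char_index, entry.break_split))
# breaks is [(0, 4, "-"), (0, 8, "-")]
```

Here the driver loop copies the entry into tuples purely for demonstration; the real consumer would act on the entry in place.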

@matejcik commented Dec 1, 2020

What worries me is that the next_line closure is so costly.

Perhaps we could extend the BreakEntry trick above and add the other variables (line_no etc.) to that class? Then next_line could happen as part of entry.set(), or something like that.
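That could look something like this sketch; the field names and the default line height are assumptions, not values from the firmware:

```python
from typing import Optional

class BreakEntry:
    """Shared mutable state: the break position plus the per-line
    counters that the next_line closure used to carry."""

    def __init__(self) -> None:
        self.word_index = 0
        self.char_index = 0
        self.break_split: Optional[str] = None
        self.line_no = 0
        self.offset_y = 0

    def next_line(self, line_height: int = 9) -> None:
        # advance to the next line as a method on the shared state,
        # so no closure has to capture and mutate nonlocal variables
        self.line_no += 1
        self.offset_y += line_height
```

Since the entry already travels through break_lines, folding the line counters into it adds no allocations per line.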

@matejcik commented Dec 1, 2020

I'm also kind of surprised that measure_line is relatively cheap. Still, perhaps it would be worth using the same trick: make it return None and fill out a passed-in List[int].
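A sketch of that trick, with a hypothetical two-slot out list holding the (start, length) of the last measured chunk; character counting again stands in for pixel measurement, and the function name is invented:

```python
from typing import List

def measure_line_into(string: str, max_chars: int, out: List[int]) -> None:
    """Write (start, length) of the next chunk into the caller-provided
    list instead of returning a fresh tuple on every call."""
    start = out[0] + out[1]  # continue right after the previous chunk
    out[0] = start
    out[1] = min(max_chars, len(string) - start)

out = [0, 0]
measure_line_into("hello world", 6, out)  # first chunk starts at 0
first = list(out)
measure_line_into("hello world", 6, out)  # second chunk continues at 6
```

The caller allocates the list once and reuses it across the whole pagination pass.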

@matejcik commented Dec 1, 2020

What worries me is that the next_line closure is so costly.

Would you mind retrying with next_line as a closure, if it turns out that the BreakEntry thing helps? I suspect its bad behavior could be some sort of artifact.

@tsusanka tsusanka modified the milestones: 2021-01, 2021-02 Dec 29, 2020
@mmilata mmilata closed this Jan 4, 2021
@mmilata mmilata deleted the mmilata/sign-verify-pagination branch September 29, 2021 20:13