New realization / Text show rules now work across elements #4876

laurmaedje · 2024-09-02T12:16:04Z

This pull request contains a full rewrite of Typst's realization subsystem. It is the result of long-term planning and of incremental improvements that made these changes possible.

Motivation

Realization is the process by which arbitrary user content is turned into well-known elements ready for further processing. Among other things, it applies show rules and various transformations, and it inserts tags for introspection. Until now, the realization process was hard-coded to produce a specific structure ingested by the layout engine. This PR makes the process much more flexible and less fixated on layout.

A major motivation for these changes is upcoming work on HTML, which imposes various requirements on realization that differ from those of layout. Another, more distant, motivation is enabling more powerful selector mechanisms based on the neighbourhood of elements.

The changes also made it much easier to group textual elements (text, spaces, linebreaks, and smartquotes) for regex show rule processing. As a result, regex show rules now apply across these four types of elements as long as their styles are uniform.

```typ
#show "Lorem's": set text(red)
#lorem(1).trim(".")'s theorem.
```
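The following minimal Rust sketch shows the idea behind this grouping. It collects adjacent textual elements into maximal runs with the same style and concatenates each run's text, so that a pattern can match across element boundaries. All names here are hypothetical simplifications and not Typst's actual API: `Elem`, `Styled`, `textual_runs`, and an integer style id standing in for a style chain. Real text show rules also use regexes and map match offsets back to elements. As the author explains later in the thread, smart quotes are matched in their plain form. The sketch models this by letting a smart quote contribute `'` or `"`, which is why `Lorem's` can match across the text, the smart quote, and the following text.

```rust
use std::ops::Range;

// Hypothetical, simplified element model; not Typst's real types.
#[derive(Debug, Clone, PartialEq)]
enum Elem {
    Text(String),
    Space,
    Linebreak,
    SmartQuote { double: bool },
    Other(String), // any non-textual element, e.g. an image
}

#[derive(Debug, Clone, PartialEq)]
struct Styled {
    elem: Elem,
    style: u32, // stand-in for a style chain: equal ids mean uniform styles
}

fn is_textual(e: &Elem) -> bool {
    matches!(e, Elem::Text(_) | Elem::Space | Elem::Linebreak | Elem::SmartQuote { .. })
}

/// Text that an element contributes to its run for matching purposes.
fn plain(e: &Elem) -> String {
    match e {
        Elem::Text(s) => s.clone(),
        Elem::Space => " ".into(),
        Elem::Linebreak => "\n".into(), // assumed representation
        // Smart quotes are matched in their plain ASCII form.
        Elem::SmartQuote { double } => if *double { "\"".into() } else { "'".into() },
        Elem::Other(_) => String::new(),
    }
}

/// Splits `elems` into maximal runs of textual elements sharing one style and
/// returns each run's element index range together with its concatenated text.
fn textual_runs(elems: &[Styled]) -> Vec<(Range<usize>, String)> {
    let mut runs = Vec::new();
    let mut i = 0;
    while i < elems.len() {
        if !is_textual(&elems[i].elem) {
            i += 1;
            continue;
        }
        let start = i;
        let style = elems[i].style;
        let mut text = String::new();
        while i < elems.len() && is_textual(&elems[i].elem) && elems[i].style == style {
            text.push_str(&plain(&elems[i].elem));
            i += 1;
        }
        runs.push((start..i, text));
    }
    runs
}
```

If the smart quote and the following `s` carried a different style from `Lorem`, they would land in a separate run and the pattern would no longer match. This corresponds to the "as long as their styles are uniform" condition.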

Fixes #86
Fixes #3693

Details

Grouping

The primary new capability of realization is dynamic grouping rules, which define the transformations to apply. Previously, these groupings were hard-coded for layout. The new flexibility made it possible to use the real realization in math instead of an approximation. While the interaction between math and layout remains quite broken, this demonstrates that the more flexible process works.
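One way to think of a grouping rule is as plain data made of three parts:

- a predicate that starts a group,
- a predicate that decides which following elements belong to it,
- a function that builds the grouped element.

The Rust sketch below assumes exactly that shape. `GroupingRule`, `Node`, `apply`, `paragraph_rule`, and `list_rule` are hypothetical names. The actual rules in Typst's realization are more involved, since they interact with styles, tags, and show rules. The key point is that the rules are passed in as a slice, so a different target, such as HTML or math, can supply its own set.

```rust
// Hypothetical, simplified node model; not Typst's real types.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Text(String),
    Space,
    Parbreak,
    ListItem(String),
    Par(Vec<Node>),
    List(Vec<Node>),
}

struct GroupingRule {
    /// Starts a new group when this returns true.
    trigger: fn(&Node) -> bool,
    /// Keeps extending the group while this returns true for the next node.
    inner: fn(&Node) -> bool,
    /// Builds the grouped node from the collected children.
    finish: fn(Vec<Node>) -> Node,
}

fn paragraph_rule() -> GroupingRule {
    GroupingRule {
        trigger: |n| matches!(n, Node::Text(_)),
        inner: |n| matches!(n, Node::Text(_) | Node::Space),
        finish: Node::Par,
    }
}

fn list_rule() -> GroupingRule {
    GroupingRule {
        trigger: |n| matches!(n, Node::ListItem(_)),
        inner: |n| matches!(n, Node::ListItem(_)),
        finish: Node::List,
    }
}

/// Applies the first matching rule at each position; unmatched nodes pass through.
fn apply(rules: &[GroupingRule], input: Vec<Node>) -> Vec<Node> {
    let mut out = Vec::new();
    let mut iter = input.into_iter().peekable();
    while let Some(node) = iter.next() {
        let rule = rules.iter().find(|r| (r.trigger)(&node));
        match rule {
            Some(rule) => {
                let mut children = vec![node];
                // Consume following nodes only while they belong to the group.
                while let Some(next) = iter.next_if(|n| (rule.inner)(n)) {
                    children.push(next);
                }
                out.push((rule.finish)(children));
            }
            None => out.push(node),
        }
    }
    out
}
```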

Behaviours

This PR removes the concept of behaviours from the realization process. Behaviours used to define how various elements (spaces, weak spacing, and so on) interact and collapse. They had various edge-case problems. For example, it is often not possible to fully resolve weak spacing at the realization stage: spacing may collapse due to layout artifacts, and determining the maximum of multiple weak spacings might require region information for relative sizing. These problems are now fixed: weak spacing is fully resolved during layout, while space collapsing is handled by the dynamic paragraph grouping rule during realization. As a result, the realization process itself makes far fewer assumptions about the content passing through it. A few things remain hard-coded, but most are handled via show or grouping rules.
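Space collapsing inside the paragraph grouping rule could look roughly like the following Rust sketch, which post-processes the children of a single paragraph. The exact rules are assumptions chosen for illustration and are not a transcription of Typst's implementation:

- consecutive spaces merge into one,
- spaces at the paragraph edges are dropped,
- spaces next to a linebreak are dropped.

`Inline` and `collapse_spaces` are hypothetical names. Weak spacing is deliberately absent, because the PR defers it to layout.

```rust
// Hypothetical inline content of one paragraph; not Typst's real types.
#[derive(Debug, Clone, PartialEq)]
enum Inline {
    Text(String),
    Space,
    Linebreak,
}

/// Collapses spaces: merges consecutive spaces and removes spaces at the
/// start, at the end, and directly before or after a linebreak.
fn collapse_spaces(children: Vec<Inline>) -> Vec<Inline> {
    let mut out: Vec<Inline> = Vec::with_capacity(children.len());
    for child in children {
        match child {
            Inline::Space => {
                // Drop a space at the start, after another space, or after a linebreak.
                if matches!(out.last(), None | Some(Inline::Space) | Some(Inline::Linebreak)) {
                    continue;
                }
                out.push(Inline::Space);
            }
            Inline::Linebreak => {
                // A space directly before a linebreak is removed.
                if out.last() == Some(&Inline::Space) {
                    out.pop();
                }
                out.push(Inline::Linebreak);
            }
            other => out.push(other),
        }
    }
    // A trailing space at the end of the paragraph is removed.
    if out.last() == Some(&Inline::Space) {
        out.pop();
    }
    out
}
```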

Performance

There hasn't been sufficient investigation to say how much of a bottleneck realization is in a typical Typst document. However, due to its stateful nature, it is one of the least incrementalized parts of Typst. Therefore, I think it is somewhat safe to say that it can become a bottleneck in incremental compilations of large documents.

For this reason, this PR is written with performance in mind. Most of realization and grouping happens within a single mutable buffer (similar to how the parser works). It also relies heavily on efficient arena allocation (more so than the old realization).
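Here is a rough Rust sketch of the single-buffer idea. Elements are pushed onto one flat `Vec`, and the index where a potential group begins is remembered. When the group is complete, everything pushed since that index is drained and replaced by a single group item. This mirrors a common parser technique. `Buffer`, `Item`, `checkpoint`, and `wrap` are hypothetical names, and arena allocation is left out for brevity.

```rust
// Hypothetical flat buffer in which realization and grouping happen in place.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Leaf(&'static str),
    Group(&'static str, Vec<Item>),
}

struct Buffer {
    items: Vec<Item>,
}

impl Buffer {
    fn new() -> Self {
        Buffer { items: Vec::new() }
    }

    /// Remembers the current end of the buffer as the start of a potential group.
    fn checkpoint(&self) -> usize {
        self.items.len()
    }

    fn push(&mut self, item: Item) {
        self.items.push(item);
    }

    /// Moves every item pushed since `start` into one group item at the end
    /// of the same buffer, so no separate per-group buffer is needed while
    /// the children are being collected.
    fn wrap(&mut self, start: usize, kind: &'static str) {
        let children: Vec<Item> = self.items.drain(start..).collect();
        self.items.push(Item::Group(kind, children));
    }
}
```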

However, I haven't yet benchmarked or properly tuned things. The reason is that some of Typst's core infrastructure (in particular, the `StyleChain`) is conceptually not capable of supporting the desired performance. I therefore believe that the performance work already done would be hard to notice with the current inefficient style chain traversals. This does not mean the work was in vain: there are plans to improve style matching in the future, which will then also unlock the performance benefits of the work done here.

Andrew15-5 · 2024-09-02T22:47:12Z

So, does parbreak still split paragraph text into ungroupable parts?

```typ
a
b
```

I don't remember whether we actually have an issue open for that, but I remember running into it before. I usually split paragraphs into 80-character lines, like a typical code or Markdown file. But then it becomes impossible to apply `show "a b": "c"`.

So I either have to be careful when formatting text, not use multi-word show rules at all, or add an ugly `#[]` wrapper.

laurmaedje · 2024-09-03T06:43:47Z

What you showed should work fine now. A single linebreak is converted to a space, which can be matched.

A parbreak will not be matched by a space, but it is only parsed from two consecutive linebreaks, i.e., a blank line.
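A simplified Rust model of this whitespace handling follows. A run of whitespace containing at most one newline becomes a `Space`. A run containing two or more newlines, i.e. a blank line, possibly with stray spaces on it, becomes a `Parbreak`. The `Token` type and the `tokenize` function are hypothetical and ignore all other markup. They only illustrate why `a` and `b` on consecutive lines still form the text `a b` that a show rule can match.

```rust
// Hypothetical, heavily simplified markup tokens.
#[derive(Debug, PartialEq)]
enum Token {
    Text(String),
    Space,
    Parbreak,
}

/// Splits markup into words and whitespace runs; a whitespace run with two or
/// more newlines is a paragraph break, any other whitespace run is a space.
fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            let mut newlines = 0;
            while let Some(w) = chars.next_if(|w| w.is_whitespace()) {
                if w == '\n' {
                    newlines += 1;
                }
            }
            tokens.push(if newlines >= 2 { Token::Parbreak } else { Token::Space });
        } else {
            let mut word = String::new();
            while let Some(w) = chars.next_if(|w| !w.is_whitespace()) {
                word.push(w);
            }
            tokens.push(Token::Text(word));
        }
    }
    tokens
}
```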

Andrew15-5 · 2024-09-03T08:18:12Z

Oh, alright. Maybe I misremembered/mixed up parbreak and line break when I had that conversation a long time ago.

I skimmed through the tests (twice) and didn't find an example like mine, only examples with a backslash at the end. I think an example without a backslash should also be added. I can do that.

laurmaedje · 2024-09-03T08:19:54Z

Feel free to make a PR with a test for this.

davystrong · 2024-09-03T08:42:44Z

This looks great! However, I tried to use it with an ellipsis and it didn't quite work as I expected: `show "..."` doesn't match `...` in the text, but `show "…"` (the single character) does. I'm not saying this should be different; I'm just asking: is this intended?

laurmaedje · 2024-09-03T11:38:02Z

This is intentional. Text show rules always match on the fully realized text. The only exception is smart quotes, because they are somewhat ill-defined without their neighborhood and because matching them in realized form would not be particularly useful anyway.

But in the case of an ellipsis, I think it's good that it's possible to distinguish between three dots and an ellipsis.
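Put differently, markup shorthands are resolved before text show rules see the text. Below is a minimal Rust sketch of that ordering. It assumes a single shorthand, `...` to U+2026 HORIZONTAL ELLIPSIS, and uses plain substring matching in place of Typst's regex machinery. Both function names are hypothetical.

```rust
/// Hypothetical: resolves the `...` markup shorthand into U+2026 (…).
fn realize_shorthands(markup: &str) -> String {
    markup.replace("...", "\u{2026}")
}

/// A text show rule pattern is checked against the realized text,
/// not against the source markup.
fn rule_matches(pattern: &str, markup: &str) -> bool {
    realize_shorthands(markup).contains(pattern)
}
```

With this ordering, a pattern of three dots never sees three dots in the realized text, while the single ellipsis character does. This is the behaviour reported above.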

Commit 4493259: Rewrite realization and fix text show rules

laurmaedje added this pull request to the merge queue Sep 2, 2024

Merged via the queue into main with commit cfde809 Sep 2, 2024
12 checks passed

laurmaedje deleted the realize-refactor branch September 2, 2024 18:56

Andrew15-5 mentioned this pull request Sep 3, 2024

Add show-text line wrapping test #4890

Merged

This was referenced Sep 4, 2024

#show does not affect all spaces character #4672

Closed

Spaces after « and before » should be nobreak in french #1920

Open

EpicEricEE mentioned this pull request Sep 15, 2024

Heading spacing ignored in bibliography #4963

Closed

knuesel mentioned this pull request Sep 20, 2024

Text show rule doesn't match smartquote #4991

Open

rwmpelstilzchen mentioned this pull request Sep 23, 2024

Show rule doesn’t catch spaces around formatted text #5009

Open

davystrong mentioned this pull request Oct 2, 2024

Add default typst implementation code where possible #5095

Open
