New realization / Text show rules now work across elements #4876

Merged
1 commit merged into main from realize-refactor on Sep 2, 2024


laurmaedje
Member

This pull request contains a full rewrite of Typst's realization subsystem. This work is the result of extensive planning and incremental improvements aimed at making these changes possible.

Motivation

Realization is the process by which arbitrary user content is turned into well-known elements ready for further processing. Among other things, it applies show rules and various transformations, and inserts tags for introspection. Until now, the realization process was hard-coded to produce a specific structure ingested by the layout engine. This PR makes the process much more flexible and less fixated on layout.

A major motivation for these changes is upcoming work on HTML, which imposes requirements on realization that differ from those of layout. Another, more distant motivation is enabling more powerful selector mechanisms based on the neighbourhood of elements.

The changes also made it much easier to group textual elements (text, spaces, linebreaks, and smartquotes) for regex show rule processing. As a result, regex show rules now apply across these four types of elements as long as their styles are uniform.

```typ
#show "Lorem's": set text(red)
// "Lorem" comes from a function call, the apostrophe is a smart quote.
#lorem(1).trim(".")'s theorem.
```
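The grouping described above can be modelled with a short Rust sketch. It is not Typst's actual implementation: the `Elem` enum, the integer `style` standing in for a style chain, and the helpers `text_runs` and `count_matches` are hypothetical simplifications. Adjacent textual elements that share a style are concatenated into one string (a smart quote contributes a plain `'` or `"`), and the pattern is searched in that string, so a match can span element boundaries. Plain substring search stands in for regex matching to keep the sketch dependency-free.

```rust
/// Hypothetical, simplified realized elements. `style` is an integer stand-in
/// for a style chain: equal numbers mean uniform styles.
#[derive(Debug, Clone, PartialEq)]
enum Elem {
    Text { text: String, style: u32 },
    Space { style: u32 },
    Linebreak { style: u32 },
    SmartQuote { double: bool, style: u32 },
    /// Any non-textual element (e.g. a box); it always ends a text run.
    Other,
}

/// Returns the style of a textual element, or `None` for anything else.
fn textual_style(elem: &Elem) -> Option<u32> {
    match elem {
        Elem::Text { style, .. }
        | Elem::Space { style }
        | Elem::Linebreak { style }
        | Elem::SmartQuote { style, .. } => Some(*style),
        Elem::Other => None,
    }
}

/// The text an element contributes to its run. Smart quotes contribute the
/// plain ASCII quote, so a pattern like "Lorem's" can match them.
fn run_text(elem: &Elem) -> &str {
    match elem {
        Elem::Text { text, .. } => text.as_str(),
        Elem::Space { .. } => " ",
        Elem::Linebreak { .. } => "\n",
        Elem::SmartQuote { double: true, .. } => "\"",
        Elem::SmartQuote { double: false, .. } => "'",
        Elem::Other => "",
    }
}

/// Concatenates maximal runs of textual elements with a uniform style.
fn text_runs(elems: &[Elem]) -> Vec<String> {
    let mut runs = Vec::new();
    let mut current = String::new();
    let mut current_style: Option<u32> = None;
    for elem in elems {
        let style = textual_style(elem);
        // A style change or a non-textual element closes the open run.
        if style != current_style && current_style.is_some() {
            runs.push(std::mem::take(&mut current));
        }
        current_style = style;
        if style.is_some() {
            current.push_str(run_text(elem));
        }
    }
    if current_style.is_some() {
        runs.push(current);
    }
    runs
}

/// Counts pattern occurrences. A match may span several elements of a run,
/// but never crosses a run boundary.
fn count_matches(elems: &[Elem], pattern: &str) -> usize {
    text_runs(elems).iter().map(|run| run.matches(pattern).count()).sum()
}
```

With `Lorem`, a single smart quote, and `s` all in the same style, the run reads `Lorem's theorem.` and the pattern matches once; giving `s` a different style splits the run and the match disappears, mirroring the "as long as their styles are uniform" condition.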

Fixes #86
Fixes #3693

Details

Grouping

The primary new capability of realization is dynamic grouping rules, which define the transformations to apply. Previously, these groupings were hard-coded for layout. Making them dynamic allowed math to use the real realization process instead of a fake approximation. While the interaction of math and layout remains quite broken, this demonstrates that the more flexible process works.

Behaviours

This PR removes the concept of behaviours from the realization process. Behaviours defined how various elements (spaces, weak spacing, and so on) interact and collapse, and they had various edge-case problems. For example, it is often not possible to fully resolve weak spacing at the realization stage: Spacing may collapse due to layout artifacts, and determining the maximum of multiple weak spacings might require region information for relative sizing. These problems are now fixed: Weak spacing is fully resolved during layout, while space collapsing is handled in the dynamic paragraph grouping rule during realization. This means the realization process itself makes far fewer assumptions about the content passing through it. While a few things remain hard-coded, most are handled via show or grouping rules.
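The split between the two phases can be illustrated with a minimal Rust sketch. The `Item` and `Weak` types, the specific collapsing rules, and the assumption that weak spacing is an absolute amount plus a fraction of the region width are hypothetical simplifications, not Typst's actual types or rules. Space collapsing only needs neighbouring items, so it fits into a grouping rule at realization time; the maximum of several weak spacings depends on the region width, which is only known during layout.

```rust
/// Hypothetical items of a paragraph after realization.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Text(String),
    Space,
    Linebreak,
}

/// Realization time: collapse spaces using only neighbouring items.
/// Consecutive spaces become one, and spaces at the start or end of the
/// paragraph or next to a linebreak are dropped.
fn collapse_spaces(items: Vec<Item>) -> Vec<Item> {
    let mut out: Vec<Item> = Vec::with_capacity(items.len());
    for item in items {
        match item {
            Item::Space => {
                // Keep a space only directly after text.
                if matches!(out.last(), Some(Item::Text(_))) {
                    out.push(Item::Space);
                }
            }
            Item::Linebreak => {
                // Drop a space that directly precedes the linebreak.
                if matches!(out.last(), Some(Item::Space)) {
                    out.pop();
                }
                out.push(Item::Linebreak);
            }
            text => out.push(text),
        }
    }
    // Drop a trailing space at the end of the paragraph.
    if matches!(out.last(), Some(Item::Space)) {
        out.pop();
    }
    out
}

/// Hypothetical weak spacing: an absolute amount in points plus a fraction
/// of the region width.
#[derive(Debug, Clone, Copy)]
struct Weak {
    abs: f64,
    rel: f64,
}

/// Layout time: adjacent weak spacings collapse to their maximum, which can
/// only be computed once the region width is known.
fn resolve_weak(run: &[Weak], region_width: f64) -> Option<f64> {
    run.iter()
        .map(|w| w.abs + w.rel * region_width)
        .reduce(f64::max)
}
```

For two weak spacings of `10pt` and `25%`, the winner changes with the region: at a width of `200pt` the relative one wins (`50pt`), at `20pt` the absolute one does (`10pt`). This is why deciding the maximum before layout is not generally possible.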

Performance

There hasn't been sufficient investigation to say how much of a bottleneck realization is in a typical Typst document. However, due to its stateful nature, it is one of the least incrementalized parts of Typst. Therefore, I think it is fairly safe to say that it can become a bottleneck in incremental compilations of large documents.

For this reason, the PR is written in a fairly performance-conscious way. Most of the realization and grouping happens within a single mutable buffer (similar to how the parser works). It also relies heavily on efficient arena allocation (more so than the old realization).
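The single-buffer technique can be sketched in Rust as follows. This is an illustration under assumptions, not Typst's code: `Node`, `GroupingRule`, and its `trigger` and `inner` predicates are hypothetical names, and the arena allocation mentioned above is not modelled. Elements are pushed into one `Vec`; when a rule triggers, only the start index is recorded, and once an element arrives that does not belong to the group, the tail of the buffer is drained into a single group node, in the spirit of a parser wrapping already-parsed children into a parent node.

```rust
/// Hypothetical content nodes: leaves carry a kind name, groups wrap children.
#[derive(Debug, PartialEq)]
enum Node {
    Leaf(&'static str),
    Group(&'static str, Vec<Node>),
}

/// A hypothetical grouping rule. `trigger` decides which node opens a group,
/// `inner` decides whether a following node still belongs to it.
struct GroupingRule {
    name: &'static str,
    trigger: fn(&Node) -> bool,
    inner: fn(&Node) -> bool,
}

/// Replaces everything from `start` to the end of the buffer with one group.
fn finish(buf: &mut Vec<Node>, start: usize, name: &'static str) {
    let children: Vec<Node> = buf.drain(start..).collect();
    buf.push(Node::Group(name, children));
}

/// Realizes `input` into a single buffer, grouping in place.
fn realize(input: Vec<Node>, rule: &GroupingRule) -> Vec<Node> {
    let mut buf: Vec<Node> = Vec::with_capacity(input.len());
    // Start index of the currently open group, if any.
    let mut start: Option<usize> = None;
    for node in input {
        // A node that does not belong to the open group closes it.
        if let Some(s) = start {
            if !(rule.inner)(&node) {
                finish(&mut buf, s, rule.name);
                start = None;
            }
        }
        // A triggering node opens a new group at the current end of the buffer.
        if start.is_none() && (rule.trigger)(&node) {
            start = Some(buf.len());
        }
        buf.push(node);
    }
    // Close a group that is still open at the end of the input.
    if let Some(s) = start {
        finish(&mut buf, s, rule.name);
    }
    buf
}
```

Because a group is formed by draining the tail of the same buffer, no separate allocation per group is needed while elements are being collected; only the finished group owns its children.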

However, I haven't yet benchmarked and properly tuned things. The reason is that some of Typst's core infrastructure (in particular, the StyleChain) is conceptually incapable of delivering the desired performance. Therefore, I believe the performance work already done would be hard to notice given the current inefficient style chain traversals. This does not mean the work was in vain: there are plans to improve style matching in the future, which will then also unlock the performance benefits of the work done here.

laurmaedje added this pull request to the merge queue on Sep 2, 2024
Merged via the queue into main with commit cfde809 on Sep 2, 2024
12 checks passed
laurmaedje deleted the realize-refactor branch on September 2, 2024, 18:56
Andrew15-5
Contributor

So, does parbreak still split paragraph text into ungroupable parts?

```typ
a
b
```

I don't remember whether we actually have an issue open for that, but I remember running into it before. I usually split paragraphs into 80-character lines, like a typical code or Markdown file. But then it becomes impossible to apply `show "a b": "c"`.

So I either have to be careful when formatting text, avoid multi-word show rules entirely, or add an ugly `#[]` wrapper.

@laurmaedje
Member Author

What you showed should work fine now. A single linebreak is converted to a space, which can be matched.

A space in a pattern will not match a parbreak, but a parbreak is only parsed from two consecutive linebreaks, i.e., one blank line.
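The whitespace rule described here can be modelled with a small Rust sketch. It ignores all other markup and is not Typst's parser; `paragraphs` is a hypothetical helper. A single linebreak is treated like any other whitespace and becomes one space, while a blank line ends the paragraph, so `a` and `b` on consecutive lines yield the text `a b` that a `"a b"` pattern can match.

```rust
/// Hypothetical helper: splits markup-like text into paragraphs. Words
/// within a paragraph are joined with single spaces.
fn paragraphs(src: &str) -> Vec<String> {
    let mut pars = Vec::new();
    let mut words: Vec<&str> = Vec::new();
    for line in src.lines() {
        if line.trim().is_empty() {
            // A blank line, i.e. two consecutive linebreaks, ends the paragraph.
            if !words.is_empty() {
                pars.push(words.join(" "));
                words.clear();
            }
        } else {
            // A single linebreak is just whitespace, so it becomes one space.
            words.extend(line.split_whitespace());
        }
    }
    if !words.is_empty() {
        pars.push(words.join(" "));
    }
    pars
}
```

Since each paragraph ends up as a separate string in this model, a pattern containing a space can never match across a blank line.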

Andrew15-5
Contributor

Andrew15-5 commented Sep 3, 2024

Oh, alright. Maybe I misremembered or mixed up parbreaks and linebreaks when I had that conversation a long time ago.

I skimmed through the tests (twice) and didn't find any example like mine, only examples with a backslash at the end of the line. I think an example without a backslash should be added too. I can do that.

@laurmaedje
Member Author

Feel free to make a PR with a test for this.

davystrong

This looks great! However, I tried it with an ellipsis and it didn't quite work as I expected: `show "..."` doesn't match `...` in the text, but `show "…"` (the single character) does. I'm not saying this should be different; I'm just asking: is this intended?

@laurmaedje
Member Author

This is intentional. Text show rules always match on the fully realized text. The only exception is smart quotes, because they are somewhat ill-defined without their neighborhood and because matching on their realized form would not be particularly useful.

But in the case of an ellipsis, I think it's good that it's possible to distinguish between three dots and an ellipsis.

Successfully merging this pull request may close these issues:
Text and regex selectors don't match smart quotes
Text show rules don't work across text nodes
3 participants