New realization / Text show rules now work across elements #4876
So, does parbreak still split paragraph text into ungroupable parts? a
b I don't remember if we actually have an issue open for that, but I remember running into it before. I usually split paragraphs into 80-character lines, like a typical code/Markdown file. But then it becomes impossible to apply So I either have to be careful when formatting text or not use multi-word show rules at all. Or add an ugly
What you showed should work fine now. A single linebreak is converted to a space, which can be matched. A parbreak will not be matched by a space, but it is only parsed from two consecutive linebreaks, i.e. one blank line.
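As a rough illustration of that distinction, here is a toy model in Python. It is not Typst's parser: the function names `paragraphs` and `matches` are made up for this sketch, and the only assumption is the rule stated above, namely that a blank line ends a paragraph while a single linebreak becomes a space.

```python
import re

def paragraphs(source: str):
    """Toy model: a blank line is a parbreak, a single linebreak becomes a space."""
    # Split wherever a newline is followed (after optional spaces) by another newline.
    parts = re.split(r"\n[ \t]*\n+", source.strip())
    # Inside a paragraph, every remaining linebreak collapses into one space.
    return [re.sub(r"\s*\n\s*", " ", part) for part in parts]

def matches(source: str, needle: str) -> bool:
    """A multi-word needle can only match within a single paragraph."""
    return any(needle in par for par in paragraphs(source))

print(matches("a\nb", "a b"))    # True: the linebreak was turned into a space
print(matches("a\n\nb", "a b"))  # False: the blank line is a paragraph break
```

So wrapping source text at 80 characters does not get in the way of a multi-word show rule, as long as no blank line falls inside the phrase.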
Oh, alright. Maybe I misremembered or mixed up parbreak and linebreak when I had that conversation a long time ago. I skimmed through the tests (twice) and didn't find any example like mine, only examples with a backslash at the end. I think an example without a backslash should also be added. I can do that.
Feel free to make a PR with a test for this.
This looks great! However, I tried to use it with an ellipsis and it didn't quite work as I expected:
This is intentional. Text show rules always match on the fully realized text. The only exception is smart quotes, because they are a bit ill-defined without their neighborhood and also because it would just not be particularly useful. But in the case of an ellipsis, I think it's good that it's possible to distinguish between three dots and an ellipsis.
This pull request contains a full rewrite of Typst's realization subsystem. It is the result of a long period of planning and of incremental improvements that made these changes possible.
Motivation
Realization is the process by which arbitrary user content is turned into well-known elements ready for further processing. Among other things, it applies show rules and various transformations, and inserts tags for introspection. So far, the realization process was hard-coded to produce a specific structure ingested by the layout engine. This PR makes the process much more flexible and less fixated on layout.
A large motivation for these changes is upcoming work on HTML which imposes various requirements on realization differing from those of layout. Another, more distant, motivation is the ability for more powerful selector mechanisms based on the neighbourhood of elements.
The changes also made it much easier to group textual elements (text, spaces, linebreaks, and smartquotes) for regex show rule processing. As a result, regex show rules now apply across these four types of elements as long as their styles are uniform.
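To make the "uniform styles" condition concrete, here is a minimal Python sketch. It is a model of the idea only, not Typst's implementation: the `Elem` type, its fields, and `find_matches` are all hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass
from itertools import groupby

# Hypothetical element model; `styles` stands in for whatever style
# information the real elements carry.
@dataclass(frozen=True)
class Elem:
    kind: str                       # "text", "space", "linebreak", "smartquote", ...
    text: str                       # the realized text of this element
    styles: frozenset = frozenset()

TEXTUAL = {"text", "space", "linebreak", "smartquote"}

def find_matches(elems, pattern):
    """Return regex matches found in each run of uniformly styled textual elements."""
    found = []
    # Adjacent elements share a run only if both are textual and have equal styles.
    key = lambda e: (e.kind in TEXTUAL, e.styles)
    for (textual, _), run in groupby(elems, key=key):
        if not textual:
            continue
        joined = "".join(e.text for e in run)
        found.extend(m.group(0) for m in re.finditer(pattern, joined))
    return found

plain = [Elem("text", "Hello"), Elem("space", " "), Elem("text", "world")]
mixed = [Elem("text", "Hello"), Elem("space", " "),
         Elem("text", "world", frozenset({"strong"}))]

print(find_matches(plain, r"Hello world"))  # ['Hello world']
print(find_matches(mixed, r"Hello world"))  # []
```

In the second case the differently styled word starts a new run, so the pattern never sees both words in one string.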
Fixes #86
Fixes #3693
Details
Grouping
The primary new capability of realization is dynamic grouping rules, which define the transformations to be applied. Previously, these groupings were hard-coded for layout. Making them dynamic allowed math to use the real realization instead of a fake approximation. While the interaction of math and layout remains quite broken, this demonstrates that the more flexible process is working.
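One way to picture a grouping rule is as a small record of callbacks that a generic loop consults. The Python sketch below is an assumption-laden illustration, not the code in this PR: the field names `trigger`, `inner`, and `finish`, the tuple-based elements, and the `realize` function are chosen for the example only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Elem = Tuple[str, str]  # hypothetical (kind, value) pairs

@dataclass
class GroupingRule:
    trigger: Callable[[Elem], bool]       # elements that start or extend a group
    inner: Callable[[Elem], bool]         # elements tolerated inside an open group
    finish: Callable[[List[Elem]], Elem]  # turns the collected elements into one

def realize(elems, rule):
    out, group = [], []

    def flush():
        # Trailing inner-only elements (e.g. a space after the last item)
        # are handed back to the output instead of being grouped.
        tail = []
        while group and not rule.trigger(group[-1]):
            tail.append(group.pop())
        if group:
            out.append(rule.finish(list(group)))
        out.extend(reversed(tail))
        group.clear()

    for elem in elems:
        if rule.trigger(elem) or (group and rule.inner(elem)):
            group.append(elem)
        else:
            flush()
            out.append(elem)
    flush()
    return out

list_rule = GroupingRule(
    trigger=lambda e: e[0] == "item",
    inner=lambda e: e[0] == "space",
    finish=lambda g: ("list", g),
)
elems = [("item", "one"), ("space", " "), ("item", "two"),
         ("space", " "), ("text", "after")]
print(realize(elems, list_rule))
# [('list', [('item', 'one'), ('space', ' '), ('item', 'two')]), ('space', ' '), ('text', 'after')]
```

Because the loop knows nothing about lists, paragraphs, or math, a different target only needs a different set of rules rather than a different realization pass.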
Behaviours
This PR removes the concept of behaviours from the realization process. Behaviours used to define how various elements (spaces, weak spacing, and so on) interact and collapse. They had various edge-case problems. For example, it is often not possible to fully resolve weak spacing at the realization stage: spacing may collapse due to layout artifacts, and determining the maximum of multiple weak spacings might require region information for relative sizing. These problems are now fixed: weak spacing is fully resolved during layout, while space collapsing is handled in the dynamic paragraph grouping rule during realization. As a result, the realization process itself makes far fewer assumptions about the content passing through it. While a few things remain hard-coded, most are handled via show or grouping rules.
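The point about relative sizing can be shown with a few lines of arithmetic. The Python sketch below assumes a simplified spacing value with an absolute part and a part relative to the region; the `Rel` type and `max_weak_spacing` are hypothetical names, not Typst's types.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:
    abs_pt: float         # absolute part, in points
    ratio: float = 0.0    # fraction of the region size, e.g. 0.25 for 25%

    def resolve(self, region_pt: float) -> float:
        return self.abs_pt + self.ratio * region_pt

def max_weak_spacing(spacings, region_pt):
    """Adjacent weak spacings collapse to the largest one, measured in this region."""
    return max(s.resolve(region_pt) for s in spacings)

spacings = [Rel(10.0), Rel(0.0, 0.25)]      # 10pt versus 25% of the region
print(max_weak_spacing(spacings, 32.0))     # 10.0: the absolute spacing wins
print(max_weak_spacing(spacings, 100.0))    # 25.0: the relative spacing wins
```

Which spacing survives depends on the region size, and that number only exists during layout, so the comparison cannot be made any earlier.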
Performance
There hasn't been sufficient investigation to say how much of a bottleneck realization is in a typical Typst document. However, due to its stateful nature, it is one of the least incrementalized parts of Typst. Therefore, I think it is fairly safe to say that it can become a bottleneck in incremental compilations of large documents.
For this reason, the PR is written in a quite performance-sensitive way. Most of realization and grouping occurs within a single mutable buffer (similar to how the parser works). It also heavily relies on efficient arena allocation (more so than the old realization).
However, I haven't yet benchmarked and properly tuned things. The reason is that some of Typst's core infrastructure (in particular, the StyleChain) is conceptually not capable of supporting the desired performance. Therefore, I believe all the performance work already done would be hard to notice with the current inefficient style chain traversals. This does not mean the work was in vain: there are plans to improve style matching in the future, which will then also unlock the performance benefits of the work done here.
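The single-buffer idea mentioned above can be sketched in Python as follows. This is only an analogy for the technique (remember where a group started, then replace the tail of the buffer with one node, much like a parser wrapping child nodes); the names `wrap` and `realize` and the tuple elements are invented for the example, and Python lists stand in for the arena-backed storage.

```python
def wrap(sink, start, kind):
    """Replace sink[start:] with a single node that owns those children."""
    children = sink[start:]   # a copy here; the real design aims to avoid extra allocations
    del sink[start:]
    sink.append((kind, children))

def realize(elems):
    sink = []       # the one buffer all output passes through
    start = None    # index in `sink` where the open paragraph began, if any
    for kind, value in elems:
        textual = kind in ("text", "space")
        if textual and start is None:
            start = len(sink)            # open a group by remembering a position
        elif not textual and start is not None:
            wrap(sink, start, "par")     # close the group in place
            start = None
        sink.append((kind, value))
    if start is not None:
        wrap(sink, start, "par")
    return sink

elems = [("text", "a"), ("space", " "), ("text", "b"),
         ("heading", "H"), ("text", "c")]
print(realize(elems))
# [('par', [('text', 'a'), ('space', ' '), ('text', 'b')]), ('heading', 'H'), ('par', [('text', 'c')])]
```

Opening a group costs one stored index instead of a new collection, which is the property that makes this style attractive for a hot path.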