Content rework 2 - Electric Boogaloo #2504

Merged: 70 commits, Nov 6, 2023

@Dherse Dherse commented Oct 27, 2023

After the failure that was #878, I am back with version 2. It only took six months.

This also lays the groundwork for the Value rework discussed with @laurmaedje in the sense that it shows that structs are better than dynamic values if possible, and that Arc<dyn Trait> is a valid and performant option.

This PR is massive and has deep performance implications for typst while not changing any behaviour from the user's point of view:

  • Elements are now struct based instead of backed by a dynamic EcoVec of attributes
  • Element fields are now accessed using their ID, which is unique within each element. These IDs are defined on a #[repr(u8)] enum called {ElementName}Fields (e.g. for HeadingElem it is HeadingElemFields).
  • The ID 255 is reserved for the "label" """virtual""" field (a weird workaround that predates this PR).
  • Queries now use the ID instead of the field name
  • Style chains now use the ID instead of the field name
  • Fields marked #[internal] are now truly internal: the user cannot access them, and they no longer need to be FromValue or IntoValue
  • Made the bibliography synthesized, which is possible thanks to the aforementioned #[internal] change
  • The where() selector now converts the field name into the ID when called and checks that the field exists on the element being selected (this is a breaking change)
  • Made getters return references where possible (i.e. when there is no style chain lookup), which removes an enormous number of rather expensive Arc clones (less expensive than a full clone, but each one involves an atomic operation and therefore cache invalidation)
  • Added the #[empty(<expr>)] attribute, which allows setting a default value for a field. It is used when we need to create an empty element that is (if possible) non-allocating. Currently this only happens when getting the supplement of figures; perhaps we can find a better way of handling this.
  • Added the #[not_hash] attribute for marking a field as not being part of the hashing of the element.
  • Added the #[borrowed] attribute so that a field that can also be resolved from the style chain is borrowed instead of cloned
  • Added some caching in par.rs for the line-breaking algorithm.
  • Added the Block type, a type that can store a value (essentially dyn Any) either on the stack within a small space or on the heap using a Box<T>. This is used in the style chain for the following change:
  • The style chain no longer stores Values but instead the "native" data structure used in code, to avoid converting to/from Value every time.
  • Added the Element object-safe trait (hence why I am not re-using NativeElement) which allows safe access to an element in a type-erased manner.
  • Generally tried to not use unsafe with the exception of obtaining references and mut references of the original type, safety is clearly explained where relevant and checks are in place to avoid issues.
  • Added the #[variant(<u8>)] attribute to control the numerical ID of a field. This is only used for TextElem (where the text field is always zero) and HeadingElem.
  • Automatically generate the following fields for each element: span, location, label, prepared, and guards. These fields should probably be moved into an ElementMetadata; I just haven't gotten around to that yet.
  • The rest of the API is the same.
  • Added an optional allocator (mimalloc) behind the feature flag "mimalloc" on the CLI. I mostly used this for testing, but I cannot stress enough how huge the impact is, especially on Windows. I would be tempted to have this enabled all the time, but since it introduces a C dependency it's really up to @laurmaedje.
  • Added calls to ittapi for tracing. These calls are only enabled with the feature flag "ittapi" on the CLI and are only ever useful for profiling the code, as this is the API that Intel VTune uses for pausing and resuming profiling, which I use specifically to profile typst watch. Note that I am not attached to having this included in the PR and I can easily remove it, but I think it's convenient to have the calls already in the right place for those that need it (i.e. just me, afaik).
  • Removed as much cloning as possible all over the code base.
  • Labels are now based on interned strings, which makes them much cheaper to compare and hash and gives a nice performance bump.
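
To make the field-ID scheme in the list above more concrete, here is a minimal sketch. It is not the code generated by the elem macro: the enum name follows the {ElementName}Fields convention from the description, but the variant names, the `from_name` helper, and the `LABEL_FIELD_ID` constant are assumptions made purely for illustration.

```rust
/// Hypothetical field enum for a heading element. The real enum is
/// generated by the macro and its variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[repr(u8)]
enum HeadingElemFields {
    Level = 0,
    Numbering = 1,
    Body = 2,
}

/// The description reserves ID 255 for the virtual "label" field.
/// (The constant name is an assumption.)
const LABEL_FIELD_ID: u8 = 255;

impl HeadingElemFields {
    /// Resolve a user-facing field name to its numeric ID once, up front
    /// (as a `where()` selector could), returning `None` when the field
    /// does not exist on this element.
    fn from_name(name: &str) -> Option<u8> {
        match name {
            "level" => Some(Self::Level as u8),
            "numbering" => Some(Self::Numbering as u8),
            "body" => Some(Self::Body as u8),
            "label" => Some(LABEL_FIELD_ID),
            _ => None,
        }
    }
}

fn main() {
    assert_eq!(HeadingElemFields::from_name("level"), Some(0));
    assert_eq!(HeadingElemFields::from_name("label"), Some(255));
    // Unknown fields are rejected when the selector is built,
    // not when it is later evaluated.
    assert_eq!(HeadingElemFields::from_name("colour"), None);
}
```

After this one-time lookup, queries and style chain lookups only compare `u8` values instead of strings.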

Performance impact:

The performance impact is rather huge. Not so much in cold compiles, where it is about 24% on an Intel i9 12900k on Ubuntu and 15% on an AMD 7950X3D on Windows, but in incremental compiles the gains are H-U-G-E: we are talking 3.5x on the 12900k and 2.5x on the 7950X3D. That takes the incremental compile times from 2.49s to just 0.7s and from 2.25s to 0.9s, respectively. These gains are gigantic and really noticeable on larger documents. (These tests were done with my thesis.)
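
The speedup factors follow directly from the quoted incremental timings. The sketch below only redoes that arithmetic and shows how a speedup factor relates to the share of wall-clock time saved; the helper names are made up for illustration.

```rust
/// Speedup factor: how many times faster the new build is.
fn speedup(before_s: f64, after_s: f64) -> f64 {
    before_s / after_s
}

/// Share of the original wall-clock time that was saved, in percent.
fn time_saved_percent(before_s: f64, after_s: f64) -> f64 {
    (1.0 - after_s / before_s) * 100.0
}

fn main() {
    // Incremental compile times quoted above, in seconds.
    let runs = [("12900k", 2.49, 0.7), ("7950X3D", 2.25, 0.9)];
    for (machine, before, after) in runs {
        println!(
            "{machine}: {:.2}x faster, {:.0}% less time",
            speedup(before, after),
            time_saved_percent(before, after)
        );
    }
}
```

A 2.5x speedup corresponds to 60% less time per incremental compile, which is why the two ways of quoting gains should not be compared directly.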

For @Enter-tainer's 2.5k-page document, the gains are smaller: 11% in incremental compiles and no change in cold compiles on the 7950X3D. And 16% in cold compiles and 17% in incremental compilation. Mind you, I also tested on my 7950X3D without the 3D V-cache, and there the gains in incremental are 80%, which is amazing too.

Therefore, it is fair to say that the gains depend on the document, but generally documents containing lots of queries and large bibliographies should benefit the most. Since there are very few changes to the layout engine, documents that are bound by said engine (as @Enter-tainer's document is) will see very little in the way of gains. From these results, as well as other benchmarks I have done in the past, it becomes clear that memory latency and cache size are major factors when it comes to the performance of Typst.

Still left to do:

  • Fix merge conflicts
  • ElementMetadata instead of direct fields and repeated code
  • Remove usage of Cow<T> in some places (it had no measurable performance impact)
  • Discuss allocator and ittapi usage
  • Discuss new attributes (although I think they're mostly fine)
  • Fix up some documentation I have not yet updated
  • Remove the old elem macro and rename selem (the new one) to elem

@Dherse Dherse marked this pull request as draft October 27, 2023 17:43

Dherse commented Nov 2, 2023

> Based on your comments, I thought that the unsafe block impl wasn't needed anymore, but it's still there. Is it just not yet done or did I misunderstand something?

It's still used in the style chain, just not in selectors. I am unsure how to remove it, since a Box<dyn Any> doesn't work; it would have to be a Box<dyn Blockable>. Additionally, I really like that it has stack-based storage for smaller values (which are the vast majority of values in the style chain). I will try experimenting with the smallbox crate instead.
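
For readers wondering why a plain Box<dyn Any> is not enough: style chain values also need to be cloned and hashed, which `Any` alone does not offer. The following is a rough sketch of the heap-only Box<dyn Blockable> alternative in safe Rust. Only the names Block and Blockable come from this discussion; the methods `as_any`, `dyn_clone`, `dyn_hash`, and `downcast` are assumptions for illustration, and the sketch deliberately omits the stack-based small-value storage.

```rust
use std::any::Any;
use std::collections::hash_map::DefaultHasher;
use std::fmt::Debug;
use std::hash::{Hash, Hasher};

/// Object-safe trait for values that can be stored type-erased.
/// (Method names are assumptions, not the PR's actual API.)
trait Blockable: Debug + 'static {
    fn as_any(&self) -> &dyn Any;
    fn dyn_clone(&self) -> Block;
    fn dyn_hash(&self, state: &mut dyn Hasher);
}

/// Blanket impl: any debuggable, clonable, hashable type qualifies.
impl<T: Debug + Clone + Hash + 'static> Blockable for T {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn dyn_clone(&self) -> Block {
        Block(Box::new(self.clone()))
    }
    fn dyn_hash(&self, mut state: &mut dyn Hasher) {
        // `&mut dyn Hasher` itself implements `Hasher`.
        self.hash(&mut state);
    }
}

/// Heap-only type-erased storage, with no inline small-value buffer.
#[derive(Debug)]
struct Block(Box<dyn Blockable>);

impl Block {
    fn new<T: Blockable>(value: T) -> Self {
        Block(Box::new(value))
    }
    /// Safe access to the original type; `None` on a type mismatch.
    fn downcast<T: 'static>(&self) -> Option<&T> {
        (*self.0).as_any().downcast_ref()
    }
}

impl Clone for Block {
    fn clone(&self) -> Self {
        (*self.0).dyn_clone()
    }
}

impl Hash for Block {
    fn hash<H: Hasher>(&self, state: &mut H) {
        (*self.0).dyn_hash(state);
    }
}

fn main() {
    let a = Block::new(42u32);
    let b = a.clone();
    assert_eq!(b.downcast::<u32>(), Some(&42));
    assert!(b.downcast::<String>().is_none());

    let hash_of = |block: &Block| {
        let mut hasher = DefaultHasher::new();
        block.hash(&mut hasher);
        hasher.finish()
    };
    assert_eq!(hash_of(&a), hash_of(&b));
}
```

The trade-off is one heap allocation per stored value, which is exactly what the stack-based storage was meant to avoid for small values.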

Dherse commented Nov 2, 2023

Ok, I think I covered all code review items. I removed a good chunk of unsafe code and shortened the remaining unsafe code in the elem macro (using the new methods on Content directly). There is no more unsafe code than there was before, with the exception of unpack_owned, which is new.

@laurmaedje laurmaedje left a comment

A few more notes:

  • The unpack comment was marked resolved, but not addressed.
  • The Fields trait comment was marked resolved, but not addressed.
  • I think the enum -> u8 should be done with as and then we can remove the impl Into<u8> everywhere.
  • I checked out the code of smallbox and it's full of unsafe without any safety comments :/ Are the gains really worth it?

Resolved review threads (outdated):

  • crates/typst/src/model/content.rs (2 threads)
  • crates/typst/src/meta/mod.rs (2 threads)
  • crates/typst/src/model/block.rs (3 threads)
  • crates/typst/Cargo.toml (1 thread)

@laurmaedje

I've pushed my changes, feel free to take a look. Main changes:

  • Fixed style chain lifetimes
  • Changed fields! to select_where!
  • Changed some impls from generated to provided trait method
  • Moved some stuff around
  • Other nitpicks (derive order etc.)

Dherse commented Nov 6, 2023

Looks great 🎉

Dherse commented Nov 6, 2023

A small update on the performance improvement: we have lost a couple of percentage points due to slightly more dynamic dispatch than there was, the fact that Block now contains a Box instead of a small box, and a couple of other miscellaneous changes. Overall, though, we are still talking about (on my 7950X3D on Windows) a 10% improvement in cold compile times and a 170% to 230% improvement in incremental compilation.

@laurmaedje laurmaedje merged commit c0f6d20 into typst:main Nov 6, 2023
3 checks passed
@laurmaedje

Great work, thank you! 🎉
