HTML Export #721
Comments
I fail to see why that is
Documentation says "It may also be useful to redact content because its arguments are not included in the output.", so simply using opacity is not an option
It should at least be possible to get separate pages for chapters/sections, with a menu for navigation. |
I'll always vote in favor of semantic tags. That means There are some more tags that may be used with this in mind, like |
Re this:
There are CSS Providing for something like |
Thanks for the comments, updated the comparison above! ❤️ On the note of pages, we could have some form of metadata in the HTML document marking where we want a page break, though that might miss the point: exporting to a printable document should go through the PDF export directly instead of exporting to HTML and then printing that. But yes, we can look into supporting that too. |
Math should use MathML! |
That would be ideal in principle, but maybe not great in practice... Quoting MDN:
Maybe MathML Core, but I don't know how that would work out.
|
We can definitely focus on using MathML Core, my current translation should also only contain MathML Core elements as far as I'm aware. |
Hey! Thanks for this issue. I wanted to chip in with a few thoughts here:
In general, you carry over quite a bit of styling! We should ask ourselves what the goal of HTML output will be: Is it to strive for a pixel-perfect reproduction of the PDF or to produce a document that feels "web native" and can easily be styled with downstream CSS or be used as an artifact in, e.g., a static site generator? Or do we want something in-between (eBooks)? The answer to these questions should inform the design with respect to the more invasive layout functions like I, personally, think that Typst's HTML output should be more semantic than a pixel-perfect reproduction of the PDF. However, I do not have an opinion yet on the extent to which we should "bake in" styles as CSS and apply functions like |
Updated On the note of I'm definitely on board with keeping the outputted HTML file as semantic as possible, so I propose the following balance between semantics and styling.
Let me know if that aligns with your thoughts on an HTML export or if I should think this through further. |
That's correct. There are some things in your list which are already resolved by the time HTML export will start, including |
Personally, I would prefer this to be something to do outside of the document, and I would avoid making the document "know" about existing environment options. What I mean is that the document would only describe what it would like to do and something else would provide the how. (Not a proposal, but picture something like CSS custom properties that can only be set outside of their usage.) |
Note that there are multiple HTML tags that have the same style by default but different semantic connotations: |
The semantic distinction is important for accessibility, is it not? |
I think this would be very useful. The typst language looks perfect for a static site builder, by being both a templating language and a markup language all in one. |
I'm currently investigating forking and adding HTML support myself now that I've learned how typst works internally. It would require making a non-fixed-position version of the I'm wondering, since I've never contributed to a large project like this, how much I can modify the library? Typst seems to make almost everything public, so I'm hesitant to rename or modify builtin function signatures; some feedback on what's acceptable would be appreciated. |
I wouldn't care too much about breaking changes and HTML export should indeed be 1-step instead of 2-step. However, we have planned to split up the current layout phase into two, where the first results in a fully styled, semantic document model. This would then serve as the source for HTML export and layouting. |
Understood. I'll hold off in my efforts for now.
I disagree. For example, when using Typst for a static site builder, it might be useful to paginate things, as with a list of articles. |
Afaik KaTeX uses MathML under the hood where possible but is more fully featured and might be less hassle than MathML itself. |
That would require converting typst math to LaTeX math instead, and KaTeX supports an even smaller subset of that than MathJax. MathML Core is the way to go. |
I see. Out of curiosity, how big of an undertaking is HTML export? Doable in weeks/few months/many months? |
What if we convert the typst to markdown, then let the markdown ecosystem take care of the rest? Am I thinking about this incorrectly? |
That conversion would be very lossy. |
We first need to rework some internals. I'd say a few months. |
This is more of an architectural thought (one I'm naively suggesting without fully understanding typst internals), but I wonder if it would make sense to create an intermediate representation, something akin to LLVM IR but for documents. For example, rustc (like clang and many other compilers) first compiles to LLVM IR and then uses LLVM tooling to convert the IR to some sort of machine code. This sort of pipeline would look something like:
This would be nice for a couple of reasons.
|
I've extended the Pandoc reader and writer to support Now <div class="warning">
<p>This is a warning</p>
</div> is equivalent to #block(class: "warning")[This is a warning] in both directions (Typst ⇒ Html and Html ⇒ Typst). This also works for other formats than Html. This is not officially supported by Typst currently. It will give you an error on "an unexpected attribute: class". Try it out, let me know if it's useful. The code is here. |
I think we’ll eventually want to have support for We would also need a way to specify the |
The concern is generally valid. |
That wouldn’t solve the problem of clashes between packages, unless you mean having the prefix be unique for each package. Of course, you could leave the problem of avoiding name clashes to the packages themselves, in a similar way as C libraries do in the absence of namespaces, but that requires conscious effort. |
This is somewhat of a solved problem in c++ and rust with name mangling. e.g. https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html |
Localization
I want to add to this that when following MathML Core, the development of HTML export should take into account (at least) the different usage of thousand and decimal separators when it comes to other languages.
Context
This might prove a challenge, since depending on the context in MathML (Core) a comma can be a decimal separator in a number
This is not apparent in the MathML Core specification, but is shown in the examples in MathML 3 (3.2.5.4 Examples with fences and separators) and MathML 4 (3.2.5.4 Examples with fences and separators). I understand that MathML 3 isn't well supported, but it is the recommendation. Seeing that MathML 4 is being worked on it would be a good idea for future-proofing that these things would be taken into account when handling the HTML export. This might prove difficult to implement when wanting to maintain Typst's simple typing experience, since this case (and many others) are context-based.
Invisible operators
It should also be taken into account that there are invisible operators (MathML 4) such as the invisible times and function application, which are used to make the expression There should be a way to give these as input in Typst. It is an issue of correct markup and accessibility.
Separate issue?
I don't know if these MathML specific things should be actually brought up as its own issue, since there is a lot more to go into and it might derail what is mainly about the HTML output. I do think that the MathML is essential for HTML output as well, which was reflected well in the first comment. Much appreciated! |
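The two MathML points in the comment above can be made concrete with a minimal Python sketch. This is not how Typst does or will do it; the helper names are invented for illustration. It assumes a localized number is kept whole inside a single `<mn>` token, and that the invisible operators are emitted as `<mo>` elements containing U+2061 (function application) and U+2062 (invisible times).

```python
from xml.sax.saxutils import escape

FUNCTION_APPLICATION = "\u2061"  # invisible "apply function" operator
INVISIBLE_TIMES = "\u2062"       # invisible multiplication operator


def mn(number: str) -> str:
    """Wrap a number, including its locale-specific separator, in one <mn>."""
    return f"<mn>{escape(number)}</mn>"


def mi(name: str) -> str:
    """Wrap an identifier in <mi>."""
    return f"<mi>{escape(name)}</mi>"


def mo(op: str) -> str:
    """Wrap an operator (visible or invisible) in <mo>."""
    return f"<mo>{escape(op)}</mo>"


def apply_function(func: str, arg: str) -> str:
    """Markup for func(arg) with an explicit function-application operator."""
    return (
        "<mrow>" + mi(func) + mo(FUNCTION_APPLICATION)
        + "<mrow>" + mo("(") + mi(arg) + mo(")") + "</mrow></mrow>"
    )


def implicit_product(coefficient: str, variable: str) -> str:
    """Markup for a product written without a visible operator, e.g. 2x."""
    return "<mrow>" + mn(coefficient) + mo(INVISIBLE_TIMES) + mi(variable) + "</mrow>"
```

With these helpers, `mn("3,14")` keeps the decimal comma inside the number token instead of treating it as a separator, which is the context-dependent decision the comment describes.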
Ignoring this doesn't make sense, pages on the web can be printed, for use cases like blogs, documentation/reference manuals, web novels/books, etc. printing is a natural consideration. Edit: My bad, just saw similar arguments have been made in the past P.S. |
FWIW there is a work in progress posted at d7107be |
A quick note about MathML, since I'm pretty involved in some initiatives for creating interoperable MathML (specifically on the fediverse, like on Mastodon): If typst generates MathML, it should leave an This annotation node needs an encoding attribute, hopefully containing the MIME type. I don't think typst has a MIME type yet, but to match TeX this should be |
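As a rough illustration of the annotation idea in the comment above, here is a small Python sketch. The function name is hypothetical, and the encoding for Typst source is deliberately left as a caller-supplied parameter because the thread does not settle on a value; the usage example uses TeX source with the conventional `application/x-tex` encoding instead.

```python
from xml.sax.saxutils import escape, quoteattr


def with_source_annotation(presentation_mathml: str, source: str, encoding: str) -> str:
    """Wrap presentation MathML in <semantics> and attach the original source.

    presentation_mathml must be a single MathML element (wrap several in <mrow>).
    encoding is whatever type string the producer chooses; it is not fixed here.
    """
    return (
        "<math><semantics>"
        + presentation_mathml
        + f"<annotation encoding={quoteattr(encoding)}>{escape(source)}</annotation>"
        + "</semantics></math>"
    )


# Usage with TeX source; a Typst exporter would pass its own source and encoding.
example = with_source_annotation(
    "<mfrac><mi>a</mi><mi>b</mi></mfrac>", r"\frac{a}{b}", "application/x-tex"
)
```

Keeping the source in an `<annotation>` lets tools that understand the source format recover it, while browsers simply render the first child of `<semantics>`.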
Work on HTML export is properly starting now. We've opened a tracking issue where you can follow the progress: #5512 |
Regarding invisible operators (previously mentioned in this thread by @samimaat), they are the topic of section 22.6 of the Unicode 16.0.0 Core Specification, as well as section 2.14 of UTR #25. Ideally, Typst should probably output them by default. Sadly, this is probably impossible to do right. If we want to allow users to emit those characters easily, they should be added as symbols to Codex, which is the topic of typst/codex#45. |
A minor suggestion: Typst comments should be exported to HTML comments. This would simplify altering generated files from the very beginning, e.g. include custom HTML code:
|
Should they? I generally use comments for things I don't want to include in the final published text (e.g. answers to exercises, which I don't want the students to see). I don't really want them to appear in the HTML output, even if you have to use inspect element to read them. |
@FeldrinH Generated HTML would require post-processing in most cases (at least for inclusion/publishing) until Typst becomes a full-fledged website builder (I hope it won't). Moreover, HTML is not a human-readable format; it's a markup language. Non-obfuscated output might be beneficial in many cases. I would definitely prefer to have high-quality HTML, rather than a publishing-ready (e.g., minified) version. Having some options to control this would be helpful, of course. |
For those use cases, because source comments are not expected in a built artefact like Typst's HTML output, I think that the inclusion of comments should be opt-in, and perhaps make use of |
There should probably be a way to emit comments (such as with a |
My point is that, for consistency, if PDF outputs do not contain comments, then HTML outputs should not either. |
One suggestion/request: I think typst to semantic HTML conversion is a bit of a waste of time (probably a lot of time), because if typst were "compatible" with semantic HTML, typst itself wouldn't be necessary in the first place. The reason I am using typst is that it is the only way I can write scientific content which can be efficiently converted to SVG using a small wasm plugin. Moreover, some typst extensions overlap with what slint does better, and this will become worse over time. Integrating typst with slint would allow many relevant extensions of typst. |
I'm not sure I see the connection to slint and makepad. These are UI toolkits rather than document processors. It sounds like you want to use Typst-as-WASM in a MathJax-like way, to embed equations into UI? That's a valid use case and fairly easy to do as a community plugin. But the idea behind the HTML work is a very different one. It's about creating complete, semantic websites that can be viewed without a Typst WASM runtime. |
You are stating the connection in your own reply: "It's about creating complete, semantic websites that can be viewed without a Typst WASM runtime". This is precisely what slint does well (apparently, at least; I am not a slint expert). Note that slint runs on low-memory devices, so it makes particular design choices, different from JavaScript-like websites, but this is a matter of taste. A code example: slintpad.com About the ePub use case: since slint runs on low-memory devices, it can in theory run on all devices that run ePub and many more. Of course, in the short term some devices may run ePub and not slint, but most of these devices can be software updated, so I guess it would be a short-term issue. About the static site generator (example https://www.npmjs.com/package/hexo-renderer-typst ): wasm (and so slint) is compatible with static site generators. It is not single file, but the overall complexity is, I think, lower with slint (this is a bit of a matter of taste, many people may disagree, but it is what I think). The main problem with HTML is that if this were feasible, typst would probably not have been created and MathML would be a success; in fact, the complexity is large. |
The primary use case I see for Typst to HTML is to generate ePub documents, which still see a lot of use and are often more accessible than PDFs. |
Another use case: adding Typst to HTML output to static site generator inputs in order to support single sourcing. For this, pure semantic HTML is all we need (and all we really want). |
These are very much my draft notes for now and will be worked on in the coming days, so take everything with a grain of salt.
tl;dr This is looking really promising already in terms of translatability. The main challenge here will likely be the rewrite of the layout process into separate steps, as discussed.
Comparison Typst => HTML
How easily we could convert most functions into an HTML/CSS counterpart.
Text
#lorem() -> #text
#emph -> <em>
#linebreak -> <br>
#lower -> <span class="lower"> or similar with text-transform: lowercase
#overline -> <span class="overline"> with text-decoration: overline
#raw -> <code> with syntax highlighting done over a <div> instead (possibly highlight.js)
#smallcaps -> <span class="smallcaps">
#smartquote -> #text
#strike -> <s>
#strong -> <strong>
#sub -> <sub>
#super -> <sup>
#text -> <span>
#underline -> <span class="underline"> with text-decoration: underline
#upper -> <span class="upper"> or similar with text-transform: uppercase
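The text mappings listed above can be expressed as a lookup table. The following Python sketch is only an illustration of that idea, not a proposed implementation; `TEXT_MAPPING`, `CLASS_CSS`, `render_text`, and `stylesheet` are names made up for this example, and only a subset of the functions is covered.

```python
from html import escape

# Typst text function -> (HTML tag, optional CSS class), following the list above.
TEXT_MAPPING = {
    "emph": ("em", None),
    "strong": ("strong", None),
    "strike": ("s", None),
    "sub": ("sub", None),
    "super": ("sup", None),
    "lower": ("span", "lower"),
    "upper": ("span", "upper"),
    "overline": ("span", "overline"),
    "underline": ("span", "underline"),
}

# CSS declarations backing the classes used above.
CLASS_CSS = {
    "lower": "text-transform: lowercase",
    "upper": "text-transform: uppercase",
    "overline": "text-decoration: overline",
    "underline": "text-decoration: underline",
}


def render_text(func: str, body: str) -> str:
    """Render plain-text body wrapped in the element mapped to func."""
    tag, css_class = TEXT_MAPPING[func]
    class_attr = f' class="{css_class}"' if css_class else ""
    return f"<{tag}{class_attr}>{escape(body)}</{tag}>"


def stylesheet() -> str:
    """Emit one CSS rule per class so the output can be restyled downstream."""
    return "\n".join(f".{name} {{ {decl}; }}" for name, decl in CLASS_CSS.items())
```

Using classes plus a separate stylesheet, rather than inline styles, keeps the markup semantic and leaves room for downstream CSS to override the defaults.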
Math
Good sauce: https://fred-wang.github.io/TeXZilla/
accent -> <mover>
attach -> <munderover> / <msubsup>
scripts -> <msubsup>
limits -> <munderover>
binom -> <mrow><mo>(</mo><mfrac linethickness="0px"><mi>n</mi><mi>k</mi></mfrac><mo>)</mo></mrow>
cases -> <mrow><mo>{</mo><mtable columnalign="left left" displaystyle="false"></mtable></mrow>
equation -> <math>
frac -> <mfrac>
lr -> <mrow> with <mo>
mat -> <mtable>
root -> <mroot> and <msqrt>
round -> Same as lr
styles -> with CSS
op -> <mo> or <mi>
under/over -> <munderover>
variants -> #text
vec -> <mtable>
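To show how the binom and frac rows above might translate into generated markup, here is a small Python sketch. It is a simplification under stated assumptions: both operands are single identifiers (so each goes into one `<mi>`), and the function names are invented for this example.

```python
from xml.sax.saxutils import escape


def frac(numerator: str, denominator: str) -> str:
    """A plain fraction of two single identifiers."""
    return f"<mfrac><mi>{escape(numerator)}</mi><mi>{escape(denominator)}</mi></mfrac>"


def binom(upper: str, lower: str) -> str:
    """A binomial coefficient: a fraction without a bar, fenced by parentheses."""
    return (
        "<mrow><mo>(</mo>"
        f'<mfrac linethickness="0px"><mi>{escape(upper)}</mi><mi>{escape(lower)}</mi></mfrac>'
        "<mo>)</mo></mrow>"
    )


def equation(body: str, block: bool = False) -> str:
    """Wrap MathML content in <math>, optionally as a display (block) equation."""
    display = ' display="block"' if block else ""
    return f"<math{display}>{body}</math>"
```

For example, `equation(binom("n", "k"), block=True)` yields a display equation containing exactly the `<mrow>` structure given in the list.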
Layout
#align -> <span class="center"> with text-align: center
#block -> <div>
#box -> <div class="box"> with display: inline-block
#list -> <ul>
#colbreak -> maybe possible with CSS break-before: ...
#columns -> possible with column-count: 2
#grid -> <div class="grid"> with display: grid and so on
#hide -> either with opacity: 0 or using a <div> with a fixed width
#measure -> maybe possible with JS?
#move -> with position: relative; top: y; left: x
#enum -> <ol> (tbh, the name could maybe be improved)
#pad -> with padding: 10pt 2pt ...
#page -> Doesn't make sense in context of web media
#pagebreak -> Same as #page
#par -> <p>
#parbreak -> beginning of new <p>
#place -> with position: absolute; top: ...
#repeat -> Could maybe be possible with width: 100%; overflow: hidden but very hacky
#rotate -> with transform: rotate(...)
#scale -> with transform: scale(...)
#h -> possibly with display: inline-block; width: ...
#v -> same as #h but with height instead of width
#stack -> with display: flex
#table -> <table>
#terms -> <dl>
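Several of the layout rows above boil down to "wrap the content in a `<div>` and set a CSS declaration". A minimal Python sketch of that pattern follows; the table and function names are hypothetical, values are passed as strings that are assumed to already be valid CSS (for example `45deg` or `10pt`), and inline styles are used only to keep the example short.

```python
# Typst layout function -> CSS declaration template, following the list above.
LAYOUT_CSS = {
    "rotate": "transform: rotate({angle})",
    "scale": "transform: scale({factor})",
    "move": "position: relative; left: {dx}; top: {dy}",
    "pad": "padding: {padding}",
    "columns": "column-count: {count}",
    "h": "display: inline-block; width: {amount}",
    "v": "display: inline-block; height: {amount}",
}


def styled_div(func: str, body_html: str, **params: str) -> str:
    """Wrap already-rendered HTML in a <div> carrying the CSS for func.

    body_html is inserted as-is, so it must already be escaped HTML.
    """
    css = LAYOUT_CSS[func].format(**params)
    return f'<div style="{css}">{body_html}</div>'
```

For instance, `styled_div("move", "<p>hi</p>", dx="2pt", dy="4pt")` produces a relatively positioned wrapper, matching the #move row.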
Visualize
Easily possible using SVG
#circle -> <circle>
#ellipse -> <ellipse>
#image -> <image>
#line -> <line>
#path -> <path>
#polygon -> <polygon>
#rect -> <rect>
#square -> <rect>
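The shape mappings above can be sketched as thin wrappers around SVG elements. The Python below is illustrative only: the helper names are made up, sizes are plain numbers in SVG user units, and styling such as fill and stroke is left out.

```python
from xml.sax.saxutils import quoteattr


def svg_element(tag: str, **attrs: object) -> str:
    """Render a self-closing SVG element with the given attributes."""
    rendered = "".join(f" {name}={quoteattr(str(value))}" for name, value in attrs.items())
    return f"<{tag}{rendered}/>"


def circle(radius: float) -> str:
    """#circle -> <circle>, centered so the shape sits at the origin corner."""
    return svg_element("circle", cx=radius, cy=radius, r=radius)


def rect(width: float, height: float) -> str:
    """#rect -> <rect>."""
    return svg_element("rect", width=width, height=height)


def square(size: float) -> str:
    """#square -> <rect> with equal sides."""
    return rect(size, size)


def svg(width: float, height: float, *children: str) -> str:
    """Wrap shapes in an <svg> root so the result can be inlined into HTML."""
    body = "".join(children)
    return f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">{body}</svg>'
```

A square is just a rectangle with equal sides, which is why both #rect and #square map to `<rect>` in the list.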
Meta
#bibliography -> Likely with <a href="#el-id"> but has to be looked into
#cite -> same as #bibliography
#counter -> #text
#document -> should be used to insert meta information into the HTML document.
#figure -> <figure>
#heading -> <h1> / <h2> / ...
#link -> <a>
#locate -> Doesn't make much sense in the context of web media / possibly with JS
#numbering -> #text
#outline -> #text
#query -> can be ignored
#ref -> <a>
#state -> might be ignored
#style -> possibly can be ignored
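For the heading, link, and ref rows above, a minimal Python sketch might look like this. The function names are invented for illustration. One assumption to flag loudly: HTML only defines `<h1>` through `<h6>`, so this sketch clamps deeper heading levels to `<h6>`; the notes above do not say how deeper levels should be handled.

```python
from html import escape


def heading(level: int, text: str) -> str:
    """Map a heading level to <h1>..<h6>, clamping deeper levels (an assumption)."""
    if level < 1:
        raise ValueError("heading level must be at least 1")
    tag_level = min(level, 6)
    return f"<h{tag_level}>{escape(text)}</h{tag_level}>"


def link(dest: str, text: str) -> str:
    """#link -> <a> pointing at an arbitrary destination."""
    return f'<a href="{escape(dest, quote=True)}">{escape(text)}</a>'


def ref(label: str, text: str) -> str:
    """#ref -> <a> pointing at an element id within the same document."""
    return link("#" + label, text)
```

Here `ref("fig-1", "Figure 1")` points at an element id in the same page, which is also the mechanism suggested for #bibliography and #cite.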