Add more details to the stretchy algorithm #238

Open
NSoiffer opened this issue Apr 17, 2024 · 4 comments

Comments

@NSoiffer
Contributor

This comes from a side discussion about #103... It incorporates two related questions about the algorithm for stretchy characters.

Some stretchy characters have a few fixed sizes before the layout falls back to assembling glyph pieces. For these fixed sizes, there is the question of which glyph variant to choose when the target size falls between two sizes. I see three options when moving from smaller sizes to larger sizes:

  1. when the size exceeds the target size, choose the smaller size (i.e., the previous entry)
  2. when the size exceeds the target size, choose the larger size (i.e., the current entry)
  3. when the size exceeds the target size, choose the size that is closest to the target size
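To make the three options concrete, here is a minimal sketch that picks a variant index from a list of fixed variant sizes, smallest first. The function name `pick_variant`, the list-of-sizes input, and the choice to treat an exact match as a hit for every option are illustrative assumptions, not taken from any spec or implementation; `-1` stands for "no variant is large enough, fall back to a glyph assembly".

```python
from typing import Sequence


def pick_variant(sizes: Sequence[float], target: float, strategy: int) -> int:
    """Hypothetical helper: choose a size variant under option 1, 2 or 3.

    sizes: advance measurements of the fixed variants, smallest first.
    Returns the chosen index, or -1 if no variant reaches the target.
    """
    for i, size in enumerate(sizes):
        if size == target:
            return i  # exact fit: all three options agree (assumption)
        if size > target:
            if strategy == 1:
                return max(i - 1, 0)  # previous (smaller) entry, if there is one
            if strategy == 2:
                return i  # current (larger) entry
            if strategy == 3:
                # closest to the target; ties go to the larger entry
                if i > 0 and target - sizes[i - 1] < size - target:
                    return i - 1
                return i
            raise ValueError(f"unknown strategy {strategy}")
    return -1  # nothing is large enough: use the glyph assembly instead
```

With sizes `[10, 14, 20, 28]` and a target of 15, option 1 picks 14, option 2 picks 20, and option 3 picks 14 because it is only 1 unit short.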

Note that the target size is affected by the symmetric property.
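As a rough illustration of how `symmetric` changes the target, the sketch below assumes the usual convention that a symmetric operator is stretched equally above and below the math axis, so the target becomes twice the larger distance from the axis. The function name and parameters are hypothetical, and descent is taken as a positive distance below the baseline; check section 3.3.1.1 for the exact wording.

```python
def target_size(ascent: float, descent: float, axis_height: float,
                symmetric: bool) -> float:
    """Hypothetical: block-axis target size for a stretchy operator.

    ascent/descent: extent of the content above/below the baseline
    (descent is a positive number). axis_height: math axis above baseline.
    """
    if not symmetric:
        return ascent + descent
    # Symmetric: cover the larger half on both sides of the math axis.
    half = max(ascent - axis_height, descent + axis_height)
    return 2 * half
```

For content with ascent 10, descent 2 and an axis height of 3, the non-symmetric target is 12, while the symmetric one is 2 × max(7, 5) = 14.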

@davidcarlisle: what does TeX do?

My gut reaction is that choosing the closest size is best. However, I'm not entirely sure that is right; perhaps the target size should already have been shrunk (TeX uses parens/brackets/braces that are ~90% the size of the contents for \left( ... \right) -- see below).

5.3 Size variants for operators (MathVariants) says:

Browse the list of MathGlyphVariantRecord in MathGlyphConstruction.mathGlyphVariantRecord. If one MathGlyphVariantRecord.advanceMeasurement is at least T then use normal shaping and bounding box for MathGlyphVariantRecord.variantGlyph, the MathItalicsCorrectionInfo for that glyph as italic correction and exit with success.

So the spec seems to be saying to use '2'. I didn't see an issue where there was a discussion of what the right thing to do is (hence this issue).

The determination of the target size is in 3.3.1.1 Algorithm for stretching operators along the block axis.

TeX has:

  • \delimitershortfall: specifies the maximum space not covered by a delimiter (default 5pt)
  • \delimiterfactor: the ratio for variable delimiters, times 1000 (default 901).

I suspect that sizing will be better if we follow the TeX model. This would mean a change to 3.3.1.1. I don't know how common it is to change these values. Maybe @dginev can tell from arXiv? Assuming changing them is not common (and they aren't in the OpenType table), then I suggest hard coding them into the algorithm in the spec for computing the size to use.

@davidcarlisle: are these variables used for other stretchy chars like \sum and horizontal stretching, or just when \left and \right are used (which would slightly complicate the algorithm by having to check the first/last element in an mrow for being a prefix/postfix mo)?

@davidcarlisle
Collaborator

TeX does (2): it stops at (and uses) the first character that is at least the requested size. Note that TeX only ever stretches symmetrically about the math axis, so size here is effectively height plus depth.

The TeXbook describes it thus:

Another subroutine sets box $x$ to a specified variable ^{delimiter},
having a specified minimum height plus depth.  This means that a search is
conducted as follows: The delimiter is defined by two symbols, a ``small
character''~$a$ in family~$f$ and a ``large character''~$b$ in family~$g$.
The search looks first at character $a$ in scriptscriptfont~$f$, if $C\le
\it SS$; then it looks at $a$ in scriptfont~$f$, if $C\le S$; then it looks at
$a$ in textfont~$f$.  If nothing suitable is found from $a$ and~$f$, the
larger alternative $b$ and $g$ is examined in the same way.  Either
$(a,f)$ or $(b,g)$ may be $(0,0)$, which means that the corresponding part
of the search is to be bypassed.  When looking at a character in a
font, the search stops immediately if that character has sufficient height
plus depth, or if the character is ^{extensible}; furthermore, if the
character does not stop the search, and if it has a ^{successor} in the
font, the successor is looked at next. \ (See the \MF\ manual or the
^^{METAFONT}
system documentation of |tfm| files for further information about
successors and extensible characters.) \ If the search runs all the way to
completion without finding a suitable character, the one with greatest
height plus depth is chosen. If no characters at all were found (either
because $a=f=b=g=0$ or because the characters did not exist in the fonts),
$x$~is set to an empty box whose width is ^|\nulldelimiterspace|.  If an
extensible character was found, $x$~is set to a vbox containing enough
pieces to build up a character of sufficient size; the height of this vbox
is the height of the topmost piece, and the width is the width of the
repeatable piece. ^^{built-up characters} Otherwise
$x$~is set to an hbox containing the character that was found; the italic
correction of the character is included in the width of this box.
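The search in that quote can be sketched as follows. This is a simplified model, not TeX's code: `Char` is a hypothetical stand-in for a TFM character with a height plus depth, an extensible flag and an optional successor, and the caller is assumed to have already built the ordered list of starting characters (small char in scriptscript/script/text fonts as the style allows, then the large char), using `None` for bypassed entries.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Char:
    """Hypothetical model of a font character in a successor chain."""
    height_plus_depth: float
    extensible: bool = False
    successor: Optional["Char"] = None


def find_delimiter(starts: Sequence[Optional[Char]],
                   min_size: float) -> Optional[Char]:
    """Return the first sufficient (or extensible) character, else the largest seen."""
    best: Optional[Char] = None
    for start in starts:
        c = start
        while c is not None:
            # Stop immediately on a big-enough or extensible character;
            # an extensible one would then be built up from pieces.
            if c.extensible or c.height_plus_depth >= min_size:
                return c
            if best is None or c.height_plus_depth > best.height_plus_depth:
                best = c
            c = c.successor  # follow the successor chain in the same font
    # Ran to completion: greatest height plus depth (None if nothing existed,
    # which TeX turns into an empty box of width \nulldelimiterspace).
    return best
```

This is exactly option (2) from the opening comment, plus TeX's fallback of taking the largest character when nothing is big enough.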

@NSoiffer

(TeX uses parens/brackets/braces that are ~90% the size of the contents for \left( ... \right) -- see below).

Not really. There is no built-in factor; there are two user-settable parameters, \delimitershortfall and \delimiterfactor.
When determining the "minimum size" (as used in the quote above) from the size of the content, TeX does:

 Replace the boundary items by
delimiters whose height plus depth is at least $\max(\lfloor\delta/500\rfloor
f,2\delta-l)$, where $f$ is the ^|\delimiterfactor|
and $l$ is the ^|\delimitershortfall|.

LaTeX sets these to

\delimiterfactor=901
\delimitershortfall=5pt

so the delimiter must be at least 90.1% of the size and no more than 5pt short of it. But these are settable at any point and can be changed for each expression within a document; they are not built in to TeX.
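A small sketch of that rule, working in TeX's integer scaled points (65536sp = 1pt). The quote does not define δ; here it is assumed to be the maximum distance of the content from the math axis, max(h − a, d + a) with a the axis height, which matches the "symmetric about the axis" remark earlier. The function name is hypothetical, and δ is assumed nonnegative so that Python's `//` matches truncating division.

```python
SP_PER_PT = 65536  # TeX scaled points per point


def min_delimiter_size(height: int, depth: int, axis_height: int,
                       delimiter_factor: int = 901,
                       delimiter_shortfall: int = 5 * SP_PER_PT) -> int:
    """Hypothetical: minimum height+depth (in sp) for \\left/\\right delimiters."""
    # delta: the larger distance of the content from the math axis (assumption).
    delta = max(height - axis_height, depth + axis_height)
    # max(floor(delta/500) * f, 2*delta - l), using LaTeX's defaults for f and l.
    return max((delta // 500) * delimiter_factor, 2 * delta - delimiter_shortfall)
```

For content 10pt high and 4pt deep with a 2.5pt axis, δ = 7.5pt and the factor term wins (885683sp, about 13.51pt, i.e. 90.1% of 15pt). For content 100pt high, the shortfall term wins (195pt), so for tall content the 5pt limit is the binding constraint.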

are these variables used for other stretchy chars like \sum and horizontal stretching, or just when \left and \right are used (which would slightly complicate the algorithm by having to check the first/last element in an mrow for being a prefix/postfix mo

All this is just about \left/\right. In TeX, \sum is never stretchy: it just comes in two sizes, one used in display style and one in text style, and the size doesn't depend on other content in the expression. Normally \int is the same as \sum, but there are some fonts which experiment with a vertically extensible integral, set as \left\int content \right. (classic TeX cannot really handle limits on a delimiter, but LuaTeX has some experimental extensions in that area). In most fonts, though, the integral is not extensible and just comes in two sizes, like \sum and other big operators (union, etc.).

Classic TeX cannot stretch characters horizontally. Some wide accents choose from a fixed set by measuring "by hand" within the macro layer, and stretchy arrows just draw horizontal or vertical rules, not using glyphs apart from the arrowhead.
LuaTeX has some support for OpenType-specified horizontal stretching.

@fred-wang
Contributor

MathML Core says we should pick the first size that is at least the target size (so 2 above).

Besides what David explained, there are two (arguably not very strong) hints that justify this choice:

  • MathML 3 / Full say to "stretch to cover" (the height and depth, or the width) which may be interpreted as saying at least as large as the target... Indeed, the smaller sizes may only partially cover...

  • The OpenType MATH spec seems to mention that we try and find glyphs that are at least the required size:

    First, an attempt is made to find glyph among provided variants. If the required size is larger than any of the glyph variants provided, however, then the general mechanism can be employed to typeset the curly braces as a glyph assembly.

Regarding implementations, Chromium and WebKit try to find at least the target size.

Gecko is doing more complicated things, but IIUC:

  • it's looking for a size that is +/-10% of the target, and for vertical and displaystyle largeop that is additionally at least the target minus 5pt. Note this corresponds to TeX's \delimiterfactor and \delimitershortfall mentioned above.
  • in both cases, it's trying to find the size that is closest in absolute value to the target (incidentally, there is a code comment about whether we should use a log scale instead)
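A rough model of the two bullets above, for comparison with option (2). This is not Gecko's code (which also involves stretch hints and post-scaling); the function name, the ±10% tolerance parameter and the assumption that sizes are in points (so the 5pt shortfall can be compared directly) are illustrative only. `None` means no variant is acceptable.

```python
from typing import Optional, Sequence


def pick_closest_acceptable(sizes: Sequence[float], target: float,
                            vertical_or_largeop: bool,
                            shortfall: float = 5.0,
                            tolerance: float = 0.10) -> Optional[int]:
    """Hypothetical: closest variant within +/-tolerance of the target.

    For vertical stretching and displaystyle largeop, the variant must
    additionally be at least (target - shortfall). Sizes assumed in pt.
    """
    best: Optional[int] = None
    for i, size in enumerate(sizes):
        ok = abs(size - target) <= tolerance * target
        if vertical_or_largeop:
            ok = ok and size >= target - shortfall
        if not ok:
            continue
        # Keep the variant closest to the target in absolute value.
        if best is None or abs(size - target) < abs(sizes[best] - target):
            best = i
    return best
```

For a 100pt target with variants of 94pt and 109pt, the closest (94pt) wins in the horizontal case, but in the vertical case it is more than 5pt short, so 109pt is chosen.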

It's actually doing even more; see stretch hints (and also IsSizeOK, IsSizeBetter and GetStretchHint) for details. Also, by default the mathml.scale_stretchy_operators.enabled pref is true, so it scales the result to match the target size anyway.

My preference would be to keep the algorithm simple, as it's easier to understand (especially when writing or debugging WPT tests). If we really want something like delimiterfactor or delimitershortfall prefs, it would be better to have them as CSS properties (so that web developers can customize them if needed) rather than hardcoded magic values. I assume they would be defined as a percentage and a length respectively, and could take different values for the element stretching the embellished operator, the embellished operator and the core operator (so we would need to decide which of the three possibilities to consider).

@NSoiffer
Contributor Author

Small update from the MathML Full meeting on 18/4/24:

@davidcarlisle pointed out that \delimitershortfall=5pt is probably irrelevant on the web, and that the only change needed to make the layout similar to TeX is adding a step that uses a fixed 90% of the target size to the vertical stretchy algorithm in the spec. It appears that \delimiterfactor is almost never changed in TeX, so there is no need to expose it as something that can be changed.

@fred-wang
Contributor

I understand this would mean tweaking https://w3c.github.io/mathml-core/#dfn-shape-a-stretchy-glyph so that steps 2 and 3 look for a size of at least 90.1% of T (instead of at least T) when shaping to block dimension T.
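A minimal sketch of what the tweaked variant lookup might look like, assuming the change only replaces the "at least T" threshold with "at least 0.901 × T". The constant name and function are hypothetical; with `factor=1.0` it reduces to the current "first variant that is at least T" rule, and `None` means the caller falls back to the glyph assembly.

```python
from typing import Optional, Sequence

# Hypothetical hard-coded value borrowed from LaTeX's \delimiterfactor=901.
DELIMITER_FACTOR = 0.901


def shape_block_variant(advances: Sequence[float], target: float,
                        factor: float = DELIMITER_FACTOR) -> Optional[int]:
    """Return the first variant whose advance is at least factor * target."""
    threshold = factor * target
    for i, advance in enumerate(advances):
        if advance >= threshold:
            return i
    return None  # no variant is large enough: build a glyph assembly
```

With variants of 10, 14, 20 and 28 units and T = 15, the threshold drops to 13.515, so the 14-unit variant is chosen instead of the 20-unit one that the current rule would select.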


3 participants